__Ideas for
improvement.__

** **

The general idea of the
implementation of NPL analysis in this project is to iterate over inheritance
vectors in the outermost loop and over alleles in the inner loop: for each new
inheritance vector possibly there are O(F+N) alleles
to rearrange, computation of Spairs takes O(A) and
computation of Sall – O(2^A) [ F – number of
founders, N – number of nonfounders, A – number of affected
]. Overall number of inheritance vectors is O(2^(2*N-F)), so the computation of
Spairs for all vectors takes O((F+N+A)*2^(2*N-F)) and
the computation of Sall – O((F+N+2^A)*2^(2*N-F)), as
it was already mentioned in complexity analysis.

But this may be reduced to
O(2^(2*N-F)) for Spairs and to O(2^(A+2*N-F)) for Sall, if we will use another principle: not to iterate over
vectors in the outermost loop, but to “build” vector bit by bit and for each
new bit of vector to compute new score using score (either Spairs
or Sall) of the previous “shorter” vector (and
without taking into consideration the rest of the bits that will be added to
the vector later), when we start with vector of length 1 and compute initial
score assuming only 1 bit. This principle avoids duplicated calculations for
vectors that differ only for branches in the pedigree.

If the previous vector was v,
we added some bit to v and got v’, then we can calculate new scores in the
following manner (assuming that the previous scores were Spairs(v) and Sall(v)):

__Spairs____(v’)__: let’s assume we
have array of 2*F alleles “f_alleles” in which for
each founder allele j we store number of times allele j appears in pedigree in
different affected individuals assuming vector v; if the added bit corresponds
to allele i in some affected individual then: Spairs(v’) <– Spairs(v) + f_alleles[i] ; f_alleles[i]++;

this simple calculation takes O(1).

__Sall____(____v’)__: this case is much
more complicated, because in order to use the previous score we need the scores
for all collections (for each collection h we need to store ∏bi(h)! –
multiplication of factorials of number of times each allele i
appear in h), and not only that, for each collection h we need also array “f_alleles” (for each allele j storing the number of times j
appears in h), that means memory complexity increases up to O(2*F*2^A); if we
have all this and the added bit corresponds to allele i
in some affected individual then we loop over all collections and for each
collection h’ we use the score of corresponding collection h (from vector v)
and array f_alleles of that collection and get: f_alleles[i]++; ∏bi(h’)! <- (∏bi(h)!)*f_alleles[i];

we take sum over all
new scores of the collections (dividing of sum by 2^A is performed only at the
last step);

for each collection
O(1) operations are performed, so computing the new Sall
score takes O(2^A) time.

The number of inheritance
vectors is O(2^(2*N-F)), so we get the desired time
complexity.

It is not the exact
implementation, its only the main idea and perhaps can
be improved further (especially memory complexity in the calculation of new Sall score).