External documentation

 

         In order to implement the algorithms which were described in “Algorithms and complexity” part, the following data structures were used:

         - childless founders were excluded using linked list “unlinkedFounders”:

 

         typedef struct UnlinkedFounder

{

                   int founder;  /* index of childless founder in the Pedigree */

                   struct UnlinkedFounder* next;

}UnlinkedFounder;

UnlinkedFounder* unlinkedFounders;

 

This structure is defined in “gdata.h”, is initiated in “normalize.c” in function “prepareInput” by using array “tmpFounders”. “unlinkedFounders” is used in “NPLstudents.c” in function “getRealFounders” in order to exclude all the childless founders from the computation, and returns the array “RealFounders” which is the shortened version of array “Founders” from which were excluded all childless founders(simply, “RealFounders” = “Founders” – “unlinkedFounders”).

 

- all the relevant information (for the computation of NPL scores) for each person in the pedigree is stored in the array “ped” of pointers to “ped_node”s:

 

typedef struct ped_node

{

         struct ped_node* next_af;        /* only for affected individuals:

                                                            pointer to the next affected individual

                                                            in the pedigree                                                          */

         int mother;                               /* mother index     (starting 0, in founder -1) */

         int father;                                 /* father index       (starting 0, in founder -1) */

         Bool af;                                   /* false - not affected; true - affected  */

         int allele1;                                /* allele1 - from father; allele2 - from mother;                  */

         int allele2;                                /* two of 2*f alleles {0,1,...,2*f-1}               */

         Bool usefull;                            /* shows if the person is usefull in the computation of NPL           

                                                           (meaning he is affected or has affected children) */

}ped_node;

ped_node** ped;

 

This structure is defined in “NPLstudents.h” and is initiated in “NPLstudents.c” in the function “InitPed” using 3 arrays: “RealFounders”, “NonFounders” and “Ped” (=AllPedigreesList->PedigeeGraph). Usage of  “RealFounders” and “NonFounders” is essential for topological sorting of all the persons in “ped”, so all the indexes of parents are smaller than indexes of their children in “ped”, it is critical in the following computations.

 

- all the affected individuals are pointed from the array “affected”, this is a fast way to access affecteds (is used in many functions):

ped_node** affected;

is defined in “NPLstudents.h” and initiated in “NPLstudents.c” in the function “InitAffected”; also all the affected individuals are linked together by the “next_af” pointers of “ped_node”s (of affected individuals), the last affected points to NULL. The size of “affected” array (and of the linked list of affected individuals):

int af_num; 

 

- in the computation of Sall score we pick up collections of alleles from affected individuals, 1 allele from each affected, so we need to store the state of collection, t.m. for each affected which of 2 alleles is chosen, we use array:

int* aff;

of size “af_num” (aff[i] = 1 or aff[i] = 2 depending on which allele of affected “i” is chosen). “aff” is defined in “NPLstudents.h”, initiated in “NPLstudents.c” in the function “InitAffected”, and is used in functions “getSall” and “getSallTmp”.

 

- in the computation of Sall and Spairs scores we need to count the number of times each founder allele (out of 2*f founder alleles) appears in some collection (for Sall) or between pairs of distinct affected individuals (for Spairs), for this purpose we use:

int* f_alleles;

is defined in “NPLstudents.h” and initiated in “NPLstudents.c” in the function “InitFounderAlleles”, is used in “getSall”, “getSallTmp” and “getSpairs” functions.

 

- we also need to store each specific inheritance vector for which Spairs and Sall scores are computed; each cell in this vector is “responsible” for some allele (paternal or maternal) in some individual:

 

typedef struct vector_node

{

         int inheritance;                                  /*       grandpaternal(0) or grandmaternal(1)  */

                                                               /*       for the current allele                          */

         Bool usefull;                                    /*       like in ped_node                               */

         int person;                                       /*       index of the person in "ped"              */

         int parent;                                        /*       current allele inherited from father(0)  */

                                                               /*       or mother(1)                                    */

}vector_node;

vector_node* Ivector;

 

This structure is defined in “NPLstudents.h” and initiated in “NPLstudents.c” in the function “InitVector” (inheritance in each cell is initiated to 0, the size of “Ivector” is 2*n-f), is used in the function “getNewScores”.

 

- 3 variables are used to store the result of computation for each inheritance vector:

double Spairs;

double Sall;         

double SallTmp;

The meaning of 2 first variables is obvious. “SallTmp” stores the result of [multiplying factorials of number of times each founder allele appears in some collection], while “Sall” gets the sum of “SallTmp”s for all collections.

These variables are in “NPLstudents.h” and are used in “NPLstudents.c” in functions for computing NPL scores.

 

- the result of computation is stored in 2 arrays of size 2*n-f:

double* SpairsArray;

double* SallArray;

 

- the iteration through inheritance vectors is done in “gray code” order (and it is not the natural order of iteration), so for each new vector when the NPL score is computed, we need to compute also to which cell in “SpairsArray” or “SallArray” we should store the result, depending on vector that is treated. This can be done in O(1) if we store the index of previous cell (which was filled for the previous vector), the index of bit which has changed in vector, and array of size 2*n-f with powers of 2 stored (for example, if the size is 4, then the array is {1,2,4,8}). The index of the previous cell and the array:

 

int next;       (is called “next”, because it gets also the index of the next cell to fill)

int* powersOf2;

 

are defined in “NPLstudents.h”. “powersOf2” is initiated in “NPLstudents.c” in the function “computeScores”.

The index of bit to be changed in vector is computed in each iteration by the function “compute_algorithm” (defined in “sequence.h” and “sequence.c”), which passes this index to “getNewScores” (where the computation of a new “next” is performed, using this index, “next” and “powersOf2”).

 

 

 

Integration of the code into original

 

1) Add files NPLstudents.h, NPLstudents.c, sequence.h, sequence.c to the project

2) In file gh_main.c

- add:  #include "NPLstudents.h"

- modify function “main()” by adding between existing lines “createTestPoints();” and “likehood();”

the new line computeNPL();

3) In file gdata.h

         - add: typedef enum {S_PAIRS, S_ALL} SCORE;

         - add the definition of unlinkedFounders;

4) In the file normalize.c modify function “prepareInput()” to create linked list  unlinkedFounders;

5) In the file likehood.h add definition: double sumVector(VECTOR Vector);

6) In the file likehood.c:

         - add: #include "NPLstudents.h"

         - add implementation of double sumVector(VECTOR Vector);

- modify function likehood() to perform the multiplication of multipoint probabilities vectors with SpairsArray and SallArray , and free these 2 arrays in the end of the function.

7) In the file programSpec.h

         - add: #include "NPLstudents.h"

         - in the definition of the struct TestPoint add:        double NPL_Spairs;

                                                                                    double NPL_Sall;

8) In the file programSpec.c modify function “printTestPoints()” to print NPL scores Spairs and Sall.