The aim of the project was to find the values of theta that maximize the probability of the evidance for a given pedigree or pedigees. Our project was to find these values using the EM algoritm.
We implemented our program as a function of SuperLink called: calcLikelihoodSIlink. Therefor our function uses the data structures in SuperLink and it's input files are in the same format as in SuperLink.
We implemented the EM algoritm by calculating the theta values based on the data given, updating the data according to the new theta values we calculated, and calculating the theta values again. This was done as long as the differance between two consecutive results for the theta values was smaller than a value we set in advance.
The differance between two consecutive result was calculated using this function:
diff = sum over all the loci [ (oldthetaval-newthetaval)/ (oldthetaval)]
There is also an option to stop the iteration according to the likelihood, in that case the differance is calculated by:
diff = (oldlikelihood-newlikelihood)/ oldlikelihood.
The calculation of the theta values at each iteration was done using the BTE Algoritm to find the probability functions for the selectors of the Loci of the pedigree now being calculated. From them we got the expected number of recombinant meioses and theta was calculated by dividing the number of expected recombinant meioses by the total number of meioses.
We implemented the BTE algoritm according to: K. Kask, R.Dechter, J.
Larrosa & F. Cozeman. Unifying Tree-Decomposition Schemes for Automated
The third column of the table holds the value used as the stopping condition for the iterations of the EM algoritm , the stopping condition was calculated according to the difference in the theta values.
The values of the running time are in seconds.
|Input files||No. of people||No. of Loci||No. of allels||Loops||
The main problem we encountered was shortage in memory. The probability tables we calculate can become very large and the computer running the program runs out of memory to allocate for the program. When runing the program on a file contating 10 people and 25 Loci the program terminated with an error message printed as a result of the malloc function failing ,the same happens when running the program on input files containing 20 people and 3 Loci. However these restrictions on the size of the pedigree are not absolut , other factors effect the size of the probability tables such as the number of nodes eliminated by SuperLink or the elimination order (that is also set in SuperLink) that can effect the shape of the tree and so effect the merging of the buckets . For these reasons the former examples should be looked at as "ball park" figures.
Executable for Windows (zip)