The aim of the project was to find the values of theta that maximize the probability of the evidance for a given pedigree or pedigees. Our project was to find these values using the EM algoritm.
Implementation
We implemented our program as a function of SuperLink called: calcLikelihoodSIlink. Therefor our function uses the data structures in SuperLink and it's input files are in the same format as in SuperLink.
We implemented the EM algoritm by calculating the theta values based on the data given, updating the data according to the new theta values we calculated, and calculating the theta values again. This was done as long as the differance between two consecutive results for the theta values was smaller than a value we set in advance.
The differance between two consecutive result was calculated using this function:
diff = sum over all the loci [ (oldthetavalnewthetaval)/ (oldthetaval)]
There is also an option to stop the iteration according to the likelihood, in that case the differance is calculated by:
diff = (oldlikelihoodnewlikelihood)/ oldlikelihood.
The calculation of the theta values at each iteration was done using the BTE Algoritm to find the probability functions for the selectors of the Loci of the pedigree now being calculated. From them we got the expected number of recombinant meioses and theta was calculated by dividing the number of expected recombinant meioses by the total number of meioses.
We implemented the BTE algoritm according to: K. Kask, R.Dechter, J.
Larrosa & F. Cozeman. Unifying TreeDecomposition Schemes for Automated
Reasoning. http://www.ics.uci.edu/~dechter/publications/
The third column of the table holds the value used as the stopping condition for the iterations of the EM algoritm , the stopping condition was calculated according to the difference in the theta values.
The values of the running time are in seconds.
Input files  No. of people  No. of Loci  No. of allels  Loops 







pedfile1.dat,datafile1.dat  15  4  34  no 

0.05 

15  6.325  data1 
pedfile2.dat,datafile2.dat  15  3  23  yes 

0.07  3.725  5  0.745  data2 
pedfile3.dat,datafile3.dat 





0.26 

5  24.74  data3 
pedfile4.dat,datafile4.dat  15  3  23  no 

0.04  0.971  10  0.0971  data4 
pedfile5.dat,datafile5.dat  15  4  3  no 

0.18  27.5  24  1.1458  data5 
pedfile6.dat,datafile6.dat  10  5  23  no 

0.03  0.2  7  0.02857  data6 
The main problem we encountered was shortage in memory. The probability tables we calculate can become very large and the computer running the program runs out of memory to allocate for the program. When runing the program on a file contating 10 people and 25 Loci the program terminated with an error message printed as a result of the malloc function failing ,the same happens when running the program on input files containing 20 people and 3 Loci. However these restrictions on the size of the pedigree are not absolut , other factors effect the size of the probability tables such as the number of nodes eliminated by SuperLink or the elimination order (that is also set in SuperLink) that can effect the shape of the tree and so effect the merging of the buckets . For these reasons the former examples should be looked at as "ball park" figures.