Geppetto - Population Synthesis Software
Amit Tzur (sendwithchibo AT gmail DOT com),
Daneial Anvar (danielanvar AT gmail DOT com)
January 2nd, 2008,
under guidance of
Prof. Geiger Dan (dang AT cs DOT technion DOT ac DOT il) and
Bercovici Sivan (sberco AT cs DOT technion DOT ac DOT il)
Geppetto is population synthesis software. It produces genomes of individuals from genotype
data provided by the user, creating each genome by assigning an allele to each marker in the
genotype data. The genotype data defines a probability for each allele at each marker, and
Geppetto assigns the alleles according to those probabilities.
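The per-marker assignment described above can be sketched as a weighted random draw. This is only an illustration and not Geppetto's actual code: the class name AlleleSampler, its method names, and the map-based representation of a marker are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch; these names do not come from the Geppetto sources.
class AlleleSampler {

    // Draws one allele from a marker, given as an allele -> probability map.
    static String sampleAllele(Map<String, Double> alleleProbs, Random rng) {
        double total = 0.0;
        for (double p : alleleProbs.values()) {
            total += p;
        }
        // Scale the draw by the total so slightly unnormalized input still works.
        double r = rng.nextDouble() * total;
        double cumulative = 0.0;
        String lastPossible = null;
        for (Map.Entry<String, Double> e : alleleProbs.entrySet()) {
            if (e.getValue() > 0.0) {
                lastPossible = e.getKey();
            }
            cumulative += e.getValue();
            if (r < cumulative) {
                return e.getKey();
            }
        }
        // Reached only through floating-point round-off (or an empty marker).
        return lastPossible;
    }

    // Builds one genome by drawing an allele independently for every marker.
    static List<String> synthesizeGenome(List<Map<String, Double>> markers, Random rng) {
        List<String> genome = new ArrayList<>();
        for (Map<String, Double> marker : markers) {
            genome.add(sampleAllele(marker, rng));
        }
        return genome;
    }
}
```

Passing a seeded Random makes a run reproducible, which may be convenient when comparing synthesized populations.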
Geppetto aims to provide an extensible tool that can create diverse populations under
different scenarios, with sick and healthy individuals under a certain disease model.
Geppetto is written in Java. Because of the size of the genotype data, the size of the output
(each individual genome is a collection of text files), and the variety of population scenarios,
Geppetto can use large amounts of CPU time, memory, and disk space, and its execution time may
range from a few seconds to several minutes (depending on the type of population creation and
the genotype data). To keep track of a Geppetto run, watch the console logs during execution
(Tip: use the trace verbosity level).
The admixed population creation scenarios that require sick people to be created are the most
prone to lengthy execution times: Geppetto creates a person and then determines, according to
the disease model, whether that person is indeed sick. If the person is not sick, Geppetto tries
again, which can result in a long execution time. The execution time therefore depends on the
disease model and on the complexity of the created population (the admixed population creation
methods being the most complex).
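The retry behaviour described above amounts to rejection sampling, and a minimal sketch makes the cost visible. The class RejectionSampler, its parameters, and the maxAttempts safety cap are all hypothetical; nothing here is taken from Geppetto's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch; these names do not come from the Geppetto sources.
class RejectionSampler {

    // Keeps creating candidates until 'wanted' of them pass the disease-model check.
    // 'maxAttempts' is an assumed safety cap so a very rare disease cannot loop forever.
    static <T> List<T> createSick(int wanted,
                                  Supplier<T> createPerson,
                                  Predicate<T> isSick,
                                  long maxAttempts) {
        List<T> sick = new ArrayList<>();
        long attempts = 0;
        while (sick.size() < wanted) {
            if (attempts >= maxAttempts) {
                throw new IllegalStateException(
                        "gave up after " + attempts + " attempts with " + sick.size() + " sick individuals");
            }
            attempts++;
            T candidate = createPerson.get();
            if (isSick.test(candidate)) {
                sick.add(candidate); // accepted: the disease model marks this person as sick
            }
            // otherwise the candidate is discarded and the loop tries again
        }
        return sick;
    }
}
```

If a fraction p of created candidates turns out sick under the disease model, such a loop needs about 1/p attempts per sick individual on average, which is why a disease model with a low p leads to long runs.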
Geppetto is highly configurable: the user can define an extensive disease model, the rate of
genetic recombination in the pedigree, the number of sick and healthy people to produce, and
the number of generations of admixture that the created population underwent (see the admixed
population creation scenarios for more info: HI, CGF and Straight Admixed).
Full report (PDF)
Source code (.zip)
A detailed description of the project is available.