The genomeSIM software is not yet available. Please check back soon.

Method Description:

Genome-wide association studies have become a reality in the study of the genetics of complex disease. This technology provides a wealth of genomic information on patient samples, from which we hope to learn novel biology and detect important genetic and environmental factors for disease processes. Because strategies for analyzing these data have not kept pace with the laboratory methods that generate them, it is unlikely that these advances will immediately lead to an improved understanding of the genetic contribution to common human disease and drug response. Currently, no single analytical method can extract all of the information from a whole-genome association study, so many novel methods are being proposed and developed. For these new methods to succeed, it will be vital to be able to simulate datasets consisting of polymorphisms throughout the genome with realistic linkage disequilibrium patterns. Within these datasets, we can embed genetic models of disease and then evaluate the ability of novel methods to detect the simulated effects.

genomeSIM is a new data simulation package for simulating large-scale genomic data in population-based case-control samples. It allows single-SNP models, as well as gene-gene interaction models, to be associated with disease risk. genomeSIM can generate datasets in two ways. In the first, an initial population is generated from the allele frequencies of the SNPs, further generations are created by crossing the members of successive generations, and the simulator assigns affection status only after a specified number of generations. Alternatively, the simulator can construct a case-control dataset by generating individuals as above, assigning affection status, and selecting cases and controls until the dataset is complete.
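
The first generation method can be illustrated with a minimal sketch. It is not genomeSIM's actual code: the type and function names (Haplotype, Individual, initialPopulation, nextGeneration) are hypothetical, and the sketch assumes biallelic SNPs, a constant population size, random mating, and a single recombination probability between adjacent SNPs.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical types for illustration only; not genomeSIM's classes.
using Haplotype = std::vector<int>;  // one allele (0 or 1) per SNP
struct Individual { Haplotype maternal, paternal; };

// Draw one haplotype: allele 1 appears at SNP i with probability freq[i].
Haplotype drawHaplotype(const std::vector<double>& freq, std::mt19937& rng) {
    Haplotype h(freq.size());
    for (std::size_t i = 0; i < freq.size(); ++i) {
        std::bernoulli_distribution allele(freq[i]);
        h[i] = allele(rng) ? 1 : 0;
    }
    return h;
}

// Build the initial population from per-SNP allele frequencies.
std::vector<Individual> initialPopulation(std::size_t n,
                                          const std::vector<double>& freq,
                                          std::mt19937& rng) {
    std::vector<Individual> pop;
    pop.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        pop.push_back({drawHaplotype(freq, rng), drawHaplotype(freq, rng)});
    return pop;
}

// Form a gamete by copying from one parental haplotype and switching to the
// other with probability `recomb` between adjacent SNPs (an assumed model).
Haplotype makeGamete(const Individual& parent, double recomb, std::mt19937& rng) {
    std::bernoulli_distribution coin(0.5), crossover(recomb);
    bool useMaternal = coin(rng);
    Haplotype g(parent.maternal.size());
    for (std::size_t i = 0; i < g.size(); ++i) {
        if (i > 0 && crossover(rng)) useMaternal = !useMaternal;
        g[i] = useMaternal ? parent.maternal[i] : parent.paternal[i];
    }
    return g;
}

// Create the next generation by crossing randomly chosen pairs of parents.
// Simplification: both parents are drawn from the whole population, so the
// same individual may occasionally be picked twice.
std::vector<Individual> nextGeneration(const std::vector<Individual>& parents,
                                       double recomb, std::mt19937& rng) {
    assert(!parents.empty());
    std::uniform_int_distribution<std::size_t> pick(0, parents.size() - 1);
    std::vector<Individual> children;
    children.reserve(parents.size());
    for (std::size_t i = 0; i < parents.size(); ++i) {
        const Individual& mother = parents[pick(rng)];
        const Individual& father = parents[pick(rng)];
        children.push_back({makeGamete(mother, recomb, rng),
                            makeGamete(father, recomb, rng)});
    }
    return children;
}
```

Under these assumptions, calling nextGeneration repeatedly for the specified number of generations, and only then assigning affection status, would mirror the first method described above.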

genomeSIM uses a penetrance table to set the affection status of individuals. To determine status, the simulator first looks up the individual's genotype at the disease SNPs. It then finds the penetrance for that genotype (the probability of being affected given the genotype) and generates a random number to determine whether the individual is affected. The penetrance table represents the disease model being simulated.
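
The penetrance lookup can be sketched as follows. This is an illustration under stated assumptions rather than genomeSIM's implementation: genotypes are assumed to be coded 0, 1, or 2 (copies of one allele), the table is assumed to be a flat array indexed by reading the disease-SNP genotypes as base-3 digits, and the names genotypeIndex and isAffected are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Map the genotypes at the disease SNPs to a row of the penetrance table by
// treating them as base-3 digits (assumed layout; 3^k rows for k SNPs).
std::size_t genotypeIndex(const std::vector<int>& genotypes) {
    std::size_t idx = 0;
    for (int g : genotypes) {
        assert(g >= 0 && g <= 2);
        idx = idx * 3 + static_cast<std::size_t>(g);
    }
    return idx;
}

// Look up the penetrance for the individual's multi-locus genotype, draw a
// uniform random number in [0, 1), and call the individual affected when the
// draw falls below the penetrance.
bool isAffected(const std::vector<int>& diseaseGenotypes,
                const std::vector<double>& penetrance,
                std::mt19937& rng) {
    const std::size_t idx = genotypeIndex(diseaseGenotypes);
    assert(idx < penetrance.size());
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    return unif(rng) < penetrance[idx];
}
```

With two disease SNPs the table has nine entries, so a gene-gene interaction model can be expressed simply by choosing which of the nine genotype combinations carry elevated penetrance.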

genomeSIM is written in ANSI C++ and compiled with the GNU compiler into a library that can be linked to other programs, so that datasets can be generated without intermediate files. A simulation can be run by calling functions in the library classes, or the library can accept a configuration file as input for easy linkage with existing programs. The configuration file consists of keywords and values.
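
As a rough illustration of a keyword-and-value configuration format, the sketch below reads one "KEYWORD value" pair per line. The exact syntax and keyword names used by genomeSIM are not given here, so everything in this example is an assumption: the whitespace-separated layout, the '#' comment character, and the helper names parseConfig and intSetting are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Read "KEYWORD value" pairs, one per line. Text after '#' is treated as a
// comment (an assumed convention); lines without both fields are skipped.
std::map<std::string, std::string> parseConfig(std::istream& in) {
    std::map<std::string, std::string> settings;
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);
        std::istringstream fields(line);
        std::string keyword, value;
        if (fields >> keyword >> value) settings[keyword] = value;
    }
    return settings;
}

// Fetch a required integer setting; the keyword must be present.
int intSetting(const std::map<std::string, std::string>& settings,
               const std::string& keyword) {
    const auto it = settings.find(keyword);
    assert(it != settings.end() && "missing required keyword");
    return std::stoi(it->second);
}
```

A calling program could pass a std::ifstream or a std::istringstream to parseConfig and then query individual settings; placeholder keywords such as POPSIZE or GENERATIONS would be examples only, not documented genomeSIM keywords.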

Updated November 05 2009 13:09:26.