The genomeSIM software is not yet available. Please check back
soon.
Method Description:
Genome-wide association studies have become a reality in the study of the genetics of
complex disease. This technology provides a wealth of genomic information on patient
samples, from which we hope to learn novel biology and detect important genetic and
environmental factors for disease processes. Because strategies for analyzing these data
have not kept pace with the laboratory methods that generate them, it is unlikely that
these advances will immediately lead to an improved understanding of the genetic
contribution to common human disease and drug response. Currently, no single analytical
method can extract all of the information from a whole-genome association study.
Thus, many novel methods are being proposed and developed. For these new methods to
succeed, it will be vital to be able to simulate datasets consisting of
polymorphisms throughout the genome with realistic linkage disequilibrium patterns.
Within these datasets, we can embed genetic models of disease and then evaluate
the ability of novel methods to detect the simulated effects.
genomeSIM is a new package for simulating large-scale genomic data in
population-based case-control samples. It allows single-SNP models, as well as gene-gene interaction
models, to be associated with disease risk. genomeSIM uses two different methods to generate datasets. In the first, an initial
population is generated on the basis of the allele frequencies of the SNPs, and
further generations are then created by crossing the members of successive
generations. The simulator assigns affection status only after a specified
number of generations. Alternatively, the simulator can construct a case-control
dataset by generating individuals as above, assigning affection status, and
selecting cases and controls until the dataset is complete.
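The population-based method above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions, not genomeSIM's actual code: the names Individual, drawHaplotype, initialPopulation, makeGamete and nextGeneration are hypothetical, alleles are coded 0/1, alleles at different SNPs are drawn independently in the founders, parents are chosen uniformly at random (so the same parent may be picked twice), and a single uniform recombination probability between adjacent SNPs stands in for whatever crossing scheme the simulator really uses.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical representation: two haplotypes per individual, alleles coded 0/1.
struct Individual {
    std::vector<int> maternal;
    std::vector<int> paternal;
};

// Draw one haplotype; allele 1 appears at SNP i with probability freq[i].
template <class Rng>
std::vector<int> drawHaplotype(const std::vector<double>& freq, Rng& rng) {
    std::vector<int> hap(freq.size());
    for (std::size_t i = 0; i < freq.size(); ++i) {
        std::bernoulli_distribution allele(freq[i]);
        hap[i] = allele(rng) ? 1 : 0;
    }
    return hap;
}

// Build the founder generation from allele frequencies alone.
template <class Rng>
std::vector<Individual> initialPopulation(std::size_t n,
                                          const std::vector<double>& freq,
                                          Rng& rng) {
    std::vector<Individual> pop;
    pop.reserve(n);
    for (std::size_t k = 0; k < n; ++k) {
        Individual ind;
        ind.maternal = drawHaplotype(freq, rng);
        ind.paternal = drawHaplotype(freq, rng);
        pop.push_back(ind);
    }
    return pop;
}

// Form a gamete by copying one parental haplotype and switching to the other
// with probability recombProb between adjacent SNPs (assumed uniform rate).
template <class Rng>
std::vector<int> makeGamete(const Individual& parent, double recombProb, Rng& rng) {
    std::bernoulli_distribution coin(0.5);
    std::bernoulli_distribution crossover(recombProb);
    bool useMaternal = coin(rng);
    std::vector<int> gamete(parent.maternal.size());
    for (std::size_t i = 0; i < gamete.size(); ++i) {
        if (i > 0 && crossover(rng)) useMaternal = !useMaternal;
        gamete[i] = useMaternal ? parent.maternal[i] : parent.paternal[i];
    }
    return gamete;
}

// Cross randomly chosen members of one generation to produce the next.
template <class Rng>
std::vector<Individual> nextGeneration(const std::vector<Individual>& parents,
                                       std::size_t n, double recombProb, Rng& rng) {
    assert(!parents.empty());
    std::uniform_int_distribution<std::size_t> pick(0, parents.size() - 1);
    std::vector<Individual> children;
    children.reserve(n);
    for (std::size_t k = 0; k < n; ++k) {
        Individual child;
        child.maternal = makeGamete(parents[pick(rng)], recombProb, rng);
        child.paternal = makeGamete(parents[pick(rng)], recombProb, rng);
        children.push_back(child);
    }
    return children;
}
```

Calling nextGeneration repeatedly for the specified number of generations lets recombination build up linkage disequilibrium before affection status is assigned. Seeding the generator (for example std::mt19937 rng(42)) makes a run reproducible.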
genomeSIM uses a penetrance table to set the affection status of individuals. The penetrance of a genotype is the
probability that an individual carrying it is affected. To assign status, the
simulator reads the individual's genotype at the disease SNPs, looks up the
penetrance for that genotype, and generates a
random number to decide whether the individual is affected. The penetrance table represents the disease
model being simulated.
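The lookup-and-draw step described above can be sketched in C++ as follows. This is a minimal illustration, not genomeSIM's own interface: the names PenetranceTable and isAffected are hypothetical, and the 0/1/2 genotype coding (copies of one allele) and the base-3 table layout are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical penetrance table for illustration only.
// Genotypes at each disease SNP are coded 0, 1 or 2 (assumed coding).
struct PenetranceTable {
    std::size_t numDiseaseSnps;
    std::vector<double> penetrance;  // 3^numDiseaseSnps entries: P(affected | genotype)

    // Map a multi-locus genotype to its table row using base-3 indexing.
    std::size_t index(const std::vector<int>& genotypes) const {
        assert(genotypes.size() == numDiseaseSnps);
        std::size_t idx = 0;
        for (int g : genotypes) {
            assert(g >= 0 && g <= 2);
            idx = idx * 3 + static_cast<std::size_t>(g);
        }
        return idx;
    }

    double lookup(const std::vector<int>& genotypes) const {
        return penetrance.at(index(genotypes));
    }
};

// Draw a uniform number in [0, 1) and call the individual affected
// when the draw falls below the penetrance of their genotype.
template <class Rng>
bool isAffected(const PenetranceTable& table,
                const std::vector<int>& diseaseGenotypes, Rng& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return uniform(rng) < table.lookup(diseaseGenotypes);
}
```

With two disease SNPs the table has nine entries, so a purely interactive model can be expressed by giving non-zero penetrance only to particular two-locus genotype combinations.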
genomeSIM is written in ANSI C++ and compiled with the GNU compiler
into a library that can be linked to programs to generate datasets without the
need for intermediate files. A simulation can be run by calling functions in the library classes, or the library
can accept a configuration file as input for easy linkage with existing programs.
The configuration format consists of keywords and their values.
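As an illustration of a keyword-and-value format, a reader for such a file might look like the sketch below. Everything in it is an assumption: the text does not specify genomeSIM's actual keywords or syntax, so the function name parseConfig, the one-pair-per-line layout, the whitespace separator, and the '#' comment convention are all hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parse lines of the form "KEYWORD value" into a map (hypothetical format).
// Blank lines and lines whose first token starts with '#' are skipped.
std::map<std::string, std::string> parseConfig(std::istream& in) {
    std::map<std::string, std::string> settings;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string keyword;
        if (!(fields >> keyword) || keyword[0] == '#') continue;
        std::string value;
        // Everything after the keyword is the value; it stays empty if absent.
        std::getline(fields >> std::ws, value);
        settings[keyword] = value;
    }
    return settings;
}
```

A calling program could pass a std::ifstream for a file on disk, or a std::istringstream when the settings are built in memory, which fits the goal of avoiding intermediate files.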