The goal of simfam is to simulate and model families with founders drawn from a structured population. The main function simulates a random pedigree for many generations with realistic features. Additional functions calculate kinship matrices, admixture matrices, and draw random genotypes across arbitrary pedigree structures starting from the corresponding founder values.
You can install the released version of simfam from CRAN with:
install.packages("simfam")
The current development version can be installed from the GitHub repository using devtools:
install.packages("devtools") # if needed
library(devtools)
install_github('OchoaLab/simfam', build_vignettes = TRUE)
You can see the package vignette, which has more detailed documentation and examples, by typing this into your R session:
vignette('simfam')
These are some basic ways of calling the main functions.
# load package!
library(simfam)
Simulate a random pedigree with a desired number of individuals per generation n and a number of generations G:
data <- sim_pedigree( n, G )
# creates a plink-formatted FAM table
# (describes pedigree, most important!)
fam <- data$fam
# lists of IDs split by generation
ids <- data$ids
# and local kinship of last generation
kinship_local_G <- data$kinship_local
A pedigree is encoded in a fam table (a data.frame) as follows: every individual in the pedigree is a row, column id identifies the individual with a unique number or string, columns pat and mat identify the parents of the individual (who are themselves earlier rows), and sex encodes the sex of the individual numerically (1=male, 2=female). The functions below work with arbitrary pedigrees/fam data.frames.
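Before turning to those functions, here is a minimal hand-built table in this layout, written in base R only. The IDs are made up for illustration, and coding the founders' missing parents as 0 is an assumption borrowed from the plink FAM convention rather than something stated above.

```r
# Hypothetical toy pedigree: two founders and their two children.
# Founders have no parents in the table; the "0" code is an assumption
# taken from the plink FAM convention.
fam <- data.frame(
  id  = c("f1", "f2", "c1", "c2"),
  pat = c("0",  "0",  "f1", "f1"),
  mat = c("0",  "0",  "f2", "f2"),
  sex = c(1L,   2L,   2L,   1L),
  stringsAsFactors = FALSE
)

# Check that every parent appears in an earlier row than its child
is_founder <- fam$pat == "0" & fam$mat == "0"
pat_row <- match(fam$pat, fam$id)
mat_row <- match(fam$mat, fam$id)
child_rows <- which(!is_founder)
parents_first <- all(pat_row[child_rows] < child_rows) &&
  all(mat_row[child_rows] < child_rows)
```

The final check is the kind of sanity test worth running on any hand-made table before passing it on, since the row-order requirement is easy to violate when editing a pedigree manually.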
Prune a given fam, to speed up simulations etc., by removing individuals without descendants among a set of individuals ids (in this example, the last generation from the output of sim_pedigree):
fam <- prune_fam( fam, ids[[G]] )
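To make the idea of pruning concrete, the following base R sketch keeps the focal individuals plus everyone reachable by following parent links upward. It is only an illustration of the concept on a made-up pedigree: prune_sketch is a hypothetical helper, the "0" missing-parent code is an assumption, and this is not necessarily how prune_fam works internally.

```r
# Hypothetical pedigree: c1 is a child of f1 and f2, c2 is a child of
# f3 and f4, and g1 is a child of c1 and f4. Missing parents are "0".
fam <- data.frame(
  id  = c("f1", "f2", "f3", "f4", "c1", "c2", "g1"),
  pat = c("0",  "0",  "0",  "0",  "f1", "f3", "c1"),
  mat = c("0",  "0",  "0",  "0",  "f2", "f4", "f4"),
  sex = c(1L,   2L,   1L,   2L,   1L,   2L,   2L),
  stringsAsFactors = FALSE
)

# Keep the focal individuals and all of their ancestors; any other row
# has no descendants among the focal IDs and is dropped.
prune_sketch <- function(fam, ids_keep) {
  keep <- ids_keep
  frontier <- ids_keep
  while (length(frontier) > 0) {
    rows <- match(frontier, fam$id)
    parents <- unique(c(fam$pat[rows], fam$mat[rows]))
    # drop missing-parent codes and individuals already collected
    parents <- setdiff(parents[parents %in% fam$id], keep)
    keep <- c(keep, parents)
    frontier <- parents
  }
  fam[fam$id %in% keep, , drop = FALSE]
}

fam_pruned <- prune_sketch(fam, "g1")
```

Here f3 and c2 are removed because neither is an ancestor of g1, while the original row order of the remaining individuals is preserved.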
Draw genotypes X through pedigree, starting from genotypes of founders (X_1):
X <- geno_fam( X_1, fam )
# Version for last generation only, which uses less memory.
# (`ids` must be as from `sim_pedigree`,
# a list partitioning non-overlapping generations)
X_G <- geno_last_gen( X_1, fam, ids )
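As a rough picture of what drawing genotypes through a pedigree involves, the base R sketch below draws a single child from two parents under Mendelian inheritance. It assumes genotypes are coded as allele counts (0, 1, 2) and that loci are drawn independently; draw_child is a hypothetical helper, not part of the package, and the package's own procedure may differ in its details.

```r
# Draw one child's genotypes from its parents', one value per locus.
# Each parent passes on the counted allele with probability equal to
# half of its own genotype, and the child's genotype is the sum.
draw_child <- function(x_pat, x_mat) {
  m <- length(x_pat)
  rbinom(m, 1, x_pat / 2) + rbinom(m, 1, x_mat / 2)
}

set.seed(1)
x_pat <- c(0, 2, 2, 1)
x_mat <- c(0, 2, 0, 1)
x_child <- draw_child(x_pat, x_mat)
# The first three loci are fully determined by the parents (0, 2 and 1);
# only the last locus, where both parents are heterozygous, is random.
```

Repeating this step for every non-founder row, in row order, would propagate the founders' genotypes down the whole pedigree.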
Calculate kinship through pedigree, starting from kinship of founders (kinship_1):
kinship <- kinship_fam( kinship_1, fam )
# Version for last generation only, which uses less memory.
kinship_G <- kinship_last_gen( kinship_1, fam, ids )
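The standard kinship recursion behind this kind of calculation can be sketched in base R as follows: a child's kinship with anyone already processed is the average of its parents' kinship with that person, and its self-kinship is (1 + kinship between its parents) / 2. The helper kinship_sketch and the toy pedigree are hypothetical, the founders are assumed unrelated and non-inbred (so their kinship matrix is 1/2 on the diagonal), and this is not claimed to be the package's implementation.

```r
# Kinship by recursion; rows of `fam` must list parents before children,
# and `kinship_founders` must carry the founder IDs as row/column names.
kinship_sketch <- function(kinship_founders, fam) {
  n <- nrow(fam)
  K <- matrix(0, n, n, dimnames = list(fam$id, fam$id))
  done <- rownames(kinship_founders)
  K[done, done] <- kinship_founders
  for (i in seq_len(n)) {
    id <- fam$id[i]
    if (id %in% done) next
    p <- fam$pat[i]
    m <- fam$mat[i]
    # kinship with everyone processed so far is the parents' average
    K[id, done] <- (K[p, done] + K[m, done]) / 2
    K[done, id] <- K[id, done]
    # self-kinship includes inbreeding due to related parents
    K[id, id] <- (1 + K[p, m]) / 2
    done <- c(done, id)
  }
  K
}

# Hypothetical toy pedigree: two founders and their two children
fam <- data.frame(
  id  = c("f1", "f2", "c1", "c2"),
  pat = c("0",  "0",  "f1", "f1"),
  mat = c("0",  "0",  "f2", "f2"),
  sex = c(1L,   2L,   2L,   1L),
  stringsAsFactors = FALSE
)
kinship_1 <- diag(2) / 2
dimnames(kinship_1) <- list(c("f1", "f2"), c("f1", "f2"))
K <- kinship_sketch(kinship_1, fam)
```

In this example the parent-child and full-sibling kinship values both come out as 1/4, matching the textbook values for an outbred family.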
Calculate expected admixture proportions through pedigree, starting from admixture of founders (admix_proportions_1):
admix_proportions <- admix_fam( admix_proportions_1, fam )
# Version for last generation only, which uses less memory.
admix_proportions_G <- admix_last_gen( admix_proportions_1, fam, ids )
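Expected admixture proportions follow a simple rule: a child's expected proportions are the average of its parents'. The base R sketch below applies that rule on a made-up pedigree; admix_sketch, the ancestry labels, and the "0" missing-parent code are all assumptions for illustration, and the layout (individuals in rows, ancestries in columns) is likewise assumed rather than taken from the text above.

```r
# Expected admixture by averaging parents; rows of `fam` must list
# parents before children, and `admix_founders` must carry the founder
# IDs as row names.
admix_sketch <- function(admix_founders, fam) {
  A <- matrix(NA_real_, nrow(fam), ncol(admix_founders),
              dimnames = list(fam$id, colnames(admix_founders)))
  A[rownames(admix_founders), ] <- admix_founders
  for (i in seq_len(nrow(fam))) {
    id <- fam$id[i]
    if (id %in% rownames(admix_founders)) next
    A[id, ] <- (A[fam$pat[i], ] + A[fam$mat[i], ]) / 2
  }
  A
}

# Hypothetical pedigree: c1 is a child of f1 and f2, g1 of f3 and c1
fam <- data.frame(
  id  = c("f1", "f2", "f3", "c1", "g1"),
  pat = c("0",  "0",  "0",  "f1", "f3"),
  mat = c("0",  "0",  "0",  "f2", "c1"),
  sex = c(1L,   2L,   1L,   2L,   1L),
  stringsAsFactors = FALSE
)
admix_proportions_1 <- matrix(
  c(1, 0,
    0, 1,
    1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"), c("anc1", "anc2"))
)
A <- admix_sketch(admix_proportions_1, fam)
```

Because each child row is an average of two rows that sum to one, every row of the result also sums to one.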