The simtrait R package simulates complex traits with a user-set number
of causal loci and a desired heritability (the proportion of trait
variance due to genetic effects).
The main function requires a simulated genotype matrix, including the
true ancestral allele frequencies. These parameters are necessary to
correctly specify the desired correlation structure. See the package
bnpsd for simulating genotypes for admixed individuals (example below).
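To illustrate why the ancestral allele frequencies matter, here is a minimal base-R sketch of one way to build a trait with a target heritability. It is not the simtrait implementation: the variable names, the toy unstructured genotypes, and the scaling step are all assumptions made for illustration. The idea is that, under an additive model, a locus with ancestral allele frequency p and effect size b contributes 2p(1-p)b^2 to the genetic variance in the ancestral population, so the effects can be rescaled until the total equals the desired heritability, and independent noise with variance 1 - herit supplies the rest.

```r
set.seed(1)
# toy dimensions (hypothetical values, chosen only for this sketch)
m_loci <- 1000
n_ind <- 200
m_causal <- 50
herit <- 0.8

# known ancestral allele frequencies and unstructured genotypes (loci x individuals)
p_anc <- runif(m_loci, 0.1, 0.9)
X <- matrix(rbinom(m_loci * n_ind, 2, p_anc), nrow = m_loci, ncol = n_ind)

# pick causal loci and draw unscaled effect sizes
causal_indexes <- sample(m_loci, m_causal)
causal_coeffs <- rnorm(m_causal)
p <- p_anc[causal_indexes]

# rescale effects so the implied genetic variance equals herit
var_genetic <- sum(2 * p * (1 - p) * causal_coeffs^2)
causal_coeffs <- causal_coeffs * sqrt(herit / var_genetic)

# genetic component, centered with the known frequencies
G <- drop(crossprod(X[causal_indexes, , drop = FALSE] - 2 * p, causal_coeffs))

# non-genetic component carries the remaining variance
noise <- rnorm(n_ind, sd = sqrt(1 - herit))
trait <- G + noise
```

If the frequencies used for scaling and centering are wrong, the genetic variance is no longer the value requested, which is why the true ancestral values are preferred when they are available.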
Simulating a trait from real genotypes is possible with a good
kinship matrix estimate. See the package popkin for accurate kinship
estimation.
You can install the released version of simtrait from CRAN with:
install.packages("simtrait")
Install the latest development version from GitHub:
install.packages("devtools") # if needed
library(devtools)
install_github("OchoaLab/simtrait", build_vignettes = TRUE)
You can see the package vignette, which has more detailed documentation, by typing this into your R session:
vignette('simtrait')
The code below has two parts: (1) simulate genotypes, and (2) simulate the trait.
The first step is to simulate genotypes from an admixed population,
to have an example where there is population structure and known
ancestral allele frequencies. We use the external package bnpsd to
achieve this.
library(bnpsd) # to simulate an admixed population

# dimensions of data/model
# number of loci
m_loci <- 10000
# number of individuals, smaller than usual for easier visualizations
n_ind <- 30
# number of intermediate subpops
k_subpops <- 3

# define population structure
# FST values for k = 3 subpopulations
inbr_subpops <- 1 : k_subpops
# bias coeff of standard Fst estimator
bias_coeff <- 0.5
# desired final Fst of admixed individuals
Fst <- 0.3
obj <- admix_prop_1d_linear(
  n_ind,
  k_subpops,
  bias_coeff = bias_coeff,
  coanc_subpops = inbr_subpops,
  fst = Fst
)
admix_proportions <- obj$admix_proportions
# rescaled Fst vector for intermediate subpops
inbr_subpops <- obj$coanc_subpops

# get pop structure parameters of the admixed individuals
coancestry <- coanc_admix(admix_proportions, inbr_subpops)
kinship <- coanc_to_kinship(coancestry)

# draw allele freqs and genotypes
out <- draw_all_admix(admix_proportions, inbr_subpops, m_loci)
# genotypes
X <- out$X
# ancestral allele frequencies
p_anc <- out$p_anc
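The coanc_to_kinship step converts a coancestry matrix into a kinship matrix. As a rough base-R sketch of that conversion, assuming the usual convention that the two matrices agree off the diagonal while the coancestry diagonal holds inbreeding coefficients f and the kinship diagonal holds (1 + f) / 2, a toy example looks like this. The matrix values and the `_toy` names are hypothetical and are not part of bnpsd.

```r
# toy coancestry matrix for 3 individuals; the diagonal holds inbreeding coefficients
coancestry_toy <- matrix(
  c(0.30, 0.10, 0.00,
    0.10, 0.20, 0.05,
    0.00, 0.05, 0.10),
  nrow = 3,
  byrow = TRUE
)

# off-diagonal values carry over unchanged; only the diagonal is transformed
kinship_toy <- coancestry_toy
diag(kinship_toy) <- (1 + diag(coancestry_toy)) / 2
```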
Here we apply our package to this simulated genotype data.
library(simtrait) # load this package

# parameters of simulation
m_causal <- 100
herit <- 0.8

# create simulated trait and associated data
# version 1: known p_anc (preferred, only applicable to simulated data)
obj <- sim_trait(X = X, m_causal = m_causal, herit = herit, p_anc = p_anc)
# version 2: known kinship (more broadly applicable but fewer guarantees)
obj <- sim_trait(X = X, m_causal = m_causal, herit = herit, kinship = kinship)

# outputs in both versions:
# trait vector
obj$trait
# randomly-picked causal locus index
obj$causal_indexes
# locus effect size vector
obj$causal_coeffs

# theoretical covariance of the simulated traits
V <- cov_trait(kinship = kinship, herit = herit)
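For intuition about the theoretical covariance, the standard additive polygenic model gives a trait covariance of 2 * herit * kinship for the genetic part plus (1 - herit) times the identity for the independent non-genetic part. The base-R sketch below applies that formula to a toy kinship matrix, assuming a total variance scale of 1. The `_toy` names and values are hypothetical, and the exact options of cov_trait should be checked in its documentation.

```r
# toy inputs (hypothetical values for illustration)
herit_toy <- 0.8
kinship_toy <- matrix(
  c(0.5, 0.1,
    0.1, 0.6),
  nrow = 2,
  byrow = TRUE
)

# genetic covariance plus independent non-genetic variance on the diagonal
V_toy <- 2 * herit_toy * kinship_toy + (1 - herit_toy) * diag(nrow(kinship_toy))
```

Under this formula, an outbred individual (self-kinship of 0.5) has a trait variance of exactly 1, while inbred individuals have a larger variance and related pairs have a positive covariance.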