--- title: "Command Line Arguments" author: "Sha Zhu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{dEploid-Arguments} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- #Program parameters and options ## Mostly used * `-vcf [file]` File path of the isolate vcf. Assume all variants are ``PASS`` in the ``QUAL`` column, the VCF file also reqires the ``AD`` field. Note: In the current implementation, ``DEploid`` only take the first sample in the VCF file. ``DEploid`` DO NOT handle multi-allelic variants, nor indels. The ``FILTER`` column will not be used. * `-plaf [file]` File path of population level allele frequencies (tab-delimited plain text file), for example CHROM | POS | PLAF ------|-----|----- Pf3D7_01_v3|93157|0.0190612159917058 Pf3D7_01_v3|94422|0.135502358766423 Pf3D7_01_v3|94459|0.156294363760064 Pf3D7_01_v3|94487|0.143439298925837 * `-panel [file]` File path of the reference panel (tab-delimited plain text file), for example CHROM | POS | 3D7 | Dd2 | Hb3 | 7G8 ------|-----|-----|-----|-----|----- Pf3D7_01_v3|93157|0|0|0|1 Pf3D7_01_v3|94422|0|0|0|1 Pf3D7_01_v3|94459|0|0|0|1 Pf3D7_01_v3|94487|0|0|0|1 * `-noPanel` Use population level allele frequency as prior. Warning: Flags `-panel` and `-noPanel` should not be used together. * '-exclude [file]' File path of sites to be excluded (tab-delimited plain text file). * '-o [string]' Specify the file name prefix of the output. * '-k [int]' Number of strain (default value 5). * '-nSample [int]' Number of MCMC samples (default value 800). * '-rate [int]' MCMC sample rate (default value 5). * '-burn [float]' MCMC burn rate (default value 0.5). * '-v , -version' DEploid version. * '-vcfOut' Save final halpotypes into a VCF file. ## You may also try * '-ref [file] -alt [file]' File path of reference and alternative allele count (tab-delimited plain text file). Note: In early ``dEploid`` versions (prior to `v0.2-release`), allele counts extracted from the vcf file are placed in two files, and parsed by flags `-ref [file]` and `-alt [file]`. Tab-delimited plain text for input. First and second columns record chromosome and position labels respectively. Third columns records the reference allele count or alternative allele count. For example, CHROM | POS | PG0390.C -------|-----|--------- Pf3D7_01_v3|93157|85 Pf3D7_01_v3|94422|77 Pf3D7_01_v3|94459|90 Pf3D7_01_v3|94487|79 CHROM | POS | PG0390.C -------|-----|--------- Pf3D7_01_v3|93157|0 Pf3D7_01_v3|94422|0 Pf3D7_01_v3|94459|0 Pf3D7_01_v3|94487|0 Warning: Flags `-ref` and `-alt` should not be used with `-vcf`. * `-forbidUpdateProp` Forbid MCMC moves to update proportions. * `-forbidUpdateSingle` Forbid MCMC moves to update single haplotype. * `-forbidUpdatePair` Forbid MCMC moves to update pair haplotypes. * `-exportPostProb` Save the posterior probabilities of the final iteration of all strains. * `-miss [float]` Miss copying probability * `-recomb [float]` Constant recombination probability * `-initialP [float ...]` Initialize proportions. * `-p [int]` Output precision (default value 8). # Example Data exploration, plot the read count `ALT` vs `REF`. ```{r, fig.show='hold', fig.width=5, fig.height=5} library(DEploid) vcfFile <- system.file("extdata", "PG0390-C.test.vcf.gz", package = "DEploid") PG0390 <- extractCoverageFromVcf(vcfFile, "PG0390-C") plotAltVsRef(PG0390$refCount, PG0390$altCount) ``` Plot the histogram of the observed allele frequency within sample. ```{r, fig.show='hold', fig.width=5, fig.height=5} obsWSAF <- computeObsWSAF(PG0390$altCount, PG0390$refCount) histWSAF(obsWSAF) ``` Load prior information: PLAF and reference panel ```{r, fig.show='hold', fig.width=5, fig.height=5} plafFile <- system.file("extdata", "labStrains.test.PLAF.txt", package = "DEploid") plaf <- extractPLAF(plafFile) panelFile <- system.file("extdata", "labStrains.test.panel.txt", package = "DEploid") ``` Deconvolute the haplotypes ```{r, fig.show='hold', fig.width=5, fig.height=5} set.seed(1) PG0390.deconv <- dEploid(paste("-vcf", vcfFile, "-plaf", plafFile, "-panel", panelFile)) prop <- PG0390.deconv$Proportions[dim(PG0390.deconv$Proportions)[1], ] expWSAF <- t(PG0390.deconv$Haps) %*% prop ``` Plot the allele frequency within sample (observed in red, expected in blue) against the population level allele frequency. ```{r, fig.show='hold', fig.width=5, fig.height=5} plotWSAFvsPLAF(plaf, obsWSAF, expWSAF) ``` Plot the history of the MCMC proportion estimates. ```{r, fig.show='hold', fig.width=5, fig.height=5} plotProportions(PG0390.deconv$Proportions) ``` Plot the allele frequency within sample, observed against expected. ```{r, fig.show='hold', fig.width=5, fig.height=5} plotObsExpWSAF(obsWSAF, expWSAF) ``` Reporting Bugs ============== If you encounter any problem when using ``dEploid``, please file a short bug report by using the [issue tracker](https://github.com/mcveanlab/DEploid/issues) on GitHub or email joe.zhu (at) well.ox.ac.uk. Please include the output of ``dEploid -v`` and the platform you are using ``dEploid`` on in the report. If the problem occurs while executing ``dEploid``, please also include the command you are using and the random seed. Thank you! Citing DEploid ============== If you use `dEploid` with the flag `-ibd`, please cite the following paper: Zhu, J. S., J. A. Hendry, J. Almagro-Garcia, R. D. Pearson, R. Amato, A. Miles, D. J. Weiss, T. C. D. Lucas, M. Nguyen, P. W. Gething, D. Kwiatkowski, G. McVean, and for the Pf3k Project. (2018) The origins and relatedness structure of mixed infections vary with local prevalence of *P. falciparum* malaria. *eLife*, 40845, doi: https://doi.org/10.7554/eLife.40845. If you use `dEploid` in your work, please cite the program: Zhu, J. S., J. A. Garcia, G. McVean. (2018) Deconvolution of multiple infections in *Plasmodium falciparum* from high throughput sequencing data. *Bioinformatics* 34(1), 9-15. doi: https://doi.org/10.1093/bioinformatics/btx530. Bibtex record:: @article {Zhu387266, author = {Zhu, Sha Joe and Hendry, Jason A. and Almagro-Garcia, Jacob and Pearson, Richard D. and Amato, Roberto and Miles, Alistair and Weiss, Daniel J. and Lucas, Tim C.D. and Nguyen, Michele and Gething, Peter W. and Kwiatkowski, Dominic and McVean, Gil and ,}, title = {The origins and relatedness structure of mixed infections vary with local prevalence of P. falciparum malaria}, year = {2018}, doi = {10.1101/387266}, publisher = {Cold Spring Harbor Laboratory}, URL = {https://www.biorxiv.org/content/10.1101/387266v1}, eprint = {https://www.biorxiv.org/content/biorxiv/early/2018/08/09/387266.full.pdf}, journal = {bioRxiv} } @article {Zhubtx530, author = {Zhu, Sha Joe and Almagro-Garcia, Jacob and McVean, Gil}, title = {Deconvolution of multiple infections in {{\em Plasmodium falciparum}} from high throughput sequencing data}, year = {2017}, doi = {10.1093/bioinformatics/btx530}, URL = {https://doi.org/10.1093/bioinformatics/btx530}, journal = {Bioinformatics} }