---
title: "Using GxEScanR"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Using GxEScanR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette shows some examples of using GxEScanR to perform
genome-wide association study (GWAS)
and genome-wide by environment interaction study (GWEIS)
scans using all the options available to the user.

```{r setup}
library(GxEScanR)
```

# Introduction

With growing number of SNPs that can be imputed it is necessary to have
software that can efficiently perform GWAS and GWEIS scans. GxEScanR can
do this using files that were saved in the BinaryDosage format. The BinaryDosage
package can convert VCF and GEN files into the BinaryDosage format. The
BinaryDosage format was designed to keep the file with the genetic data
small with fast read times. GxEScanR uses this and efficient large scale
regression routines to perform GWAS and GWEIS scans quickly.

# Example Files

The examples below use three sample files. The first contains a data frame
that has subject data. The second file is a genetic data file in the
BinaryDosage format. The last file contains the information returned by
the BinaryDosage::getbdinfo routine that returns information about the binary
dosage file that makes reading it fast.

``` {r, eval = T, echo = T, message = F, warning = F, tidy = T}
covdatafile <- system.file("extdata", "covdata.rds", package = "GxEScanR")
covdata <- readRDS(covdatafile)
```

``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
knitr::kable(covdata[1:5,], caption = "First 5 Subjects")
```
To load the binary dosage information file, it is necessary to update the
file name since the file has been moved from its original location during the
installation process. The following loads the binary dosage information and
updates the file name.

``` {r, eval = T, echo = T, message = F, warning = F, tidy = T}
bdinfofile <- system.file("extdata", "pdata_4_1.bdinfo", package = "GxEScanR")
bdinfo <- readRDS(bdinfofile)
bdinfo$filename <- system.file("extdata", "pdata_4_1.bdose", package = "GxEScanR")
```

``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
modeldf <- readRDS(system.file("extdata", "models.rds", package = "GxEScanR"))
knitr::kable(modeldf, caption = "Models Fit")
```

# Examples

## Linear Regression GWAS

The simplest scan to do is a linear regression GWAS. The following model is
first when doing a linear regression GWAS.
``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
knitr::kable(modeldf[1,], caption = "Model Fit")
```

In the example data set,
the phenotype, y, is coded 0,1. When GxEScanR sees the phenotype codes this way
it assumes the outcome is binary and uses logistic regression. To perform
a linear regression GWAS the binary option needs to be set to FALSE. The
following shows how to do a linear GWAS along with the results.

The routine outputs the number of subjects used in the analysis and returns
a data frame with the results.

``` {r, eval = T, echo = T, message = F, warning = F, tidy = T}
lingwas1 <- gwas(data = covdata,
                 bdinfo = bdinfo,
                 binary = FALSE)
```

``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
knitr::kable(lingwas1, caption = "Linear Regression GWAS")
```

The output can be redirected to output file that produces a plain test
version of the results in a tab delimited file that can be read into R
using the read.table routine. In this case, the gwas routine returns a
value of 0.

``` {r, eval = T, echo = T, message = F, warning = F, tidy = T}
outfile <- tempfile()
lingwas2 <- gwas(data = covdata,
                 bdinfo = bdinfo,
                 outfile = outfile,
                 binary = FALSE)
lingwas2
lingwas2 <- read.table(outfile, header = TRUE, sep ='\t')
```

``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
knitr::kable(lingwas2, caption = "Linear Regression GWAS")
```

## Linear Regression GWEIS

The gweis routine takes the same parameters as the gwas function but performs
additional tests. The models fit for a linear regression GWAS are.
``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
knitr::kable(modeldf[1:2,], caption = "Models Fit")
```

Note: When doing a GWEIS the interaction covariate is in the last column of the
subject data frame.

In this test the minmaf option was used. When minmaf is specified the minor
allele for a SNP must exceed minmaf to be test. Notice that only 5 SNPs are
in the output data frame. This is because one of the SNPs has a minor allele
frequency below 0.2.

``` {r, eval = T, echo = T, message = F, warning = F, tidy = T}
lingweis1 <- gweis(data = covdata,
                   bdinfo = bdinfo,
                   minmaf = 0.2,
                   binary = FALSE)
```

``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
knitr::kable(lingweis1, caption = "Linear Regression GWEIS")
```

If the user is interested in see what happened to SNPs that weren't included
in the data frame, the skipfile option can be used. The skipfile value is the
name of a file to write the skipped SNPs to. The skipfile option can be used
along with the outfile option. The skip file is in the same format as the
output file. Below is an example using the skipfile option.

``` {r, eval = T, echo = T, message = F, warning = F, tidy = T}
skipfile = tempfile()
lingweis2 <- gweis(data = covdata,
                   bdinfo = bdinfo,
                   skipfile = skipfile,
                   minmaf = 0.2,
                   binary = FALSE)
```

``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
knitr::kable(lingweis2, caption = "Linear Regression GWEIS")
skipsnps <- read.table(skipfile, header = TRUE, sep = '\t')
```

``` {r, eval = T, echo = T, message = F, warning = F, tidy = T}
knitr::kable(skipsnps, caption = "Skipped SNPs")
```

The following table lists the reasons SNPs were skipped given the skipped value.
``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
reasondf <- readRDS(system.file("extdata", "skipreason.rds", package = "GxEScanR"))
knitr::kable(reasondf, caption = "Skipped Reasons")
```

## Logistic Regression GWAS

In this example, the phenotype is coded (0,1). The gwas and gweis routines check
for this an will run logistic regressions if the outcome is coded (0,1) unless
binary is set to FALSE. If the use wants to make sure the outcome is coded
(0,1), the user may set binary to TRUE. In this case, if the outcome is not
coded (0,1) an error is produced.

The following model is fit when doing a logistic regression GWAS.
``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
knitr::kable(modeldf[1,], caption = "Model Fit")
```

``` {r, eval = T, echo = T, message = F, warning = F, tidy = T}
loggwas1 <- gwas(data = covdata,
                 bdinfo = bdinfo,
                 blksize = 2,
                 binary = TRUE)
```

``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
knitr::kable(loggwas1, caption = "Logistic Regression GWAS", digits = 4)
```

In this example, the option blksize is used. When an analysis is run several
SNPs are read in at one time. This saves disk time. The following are the
default values for given the number of subjects. These values were chosen to
keep the program running using less than 4GB of RAM. The user is allowed to
specify a value up to twice the default value. Little performance gain is seen
going with larger values. If the user enters 0 for blksize, the default value
is used.

``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
defaultdf <- readRDS(system.file("extdata", "defaultblk.rds", package = "GxEScanR"))
knitr::kable(defaultdf, caption = "Default blksize")
```

## Logistic Regression GWEIS

A logistic regression GWEIS fits an additional 4 models that produce 7 more
tests. 3 of these models use the the interaction covariate as the outcome.
The following show all the models fit in a logistic regression GWEIS.

The following model is fit when doing a logistic regression GWAS.
``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
knitr::kable(modeldf, caption = "Models Fit")
```

Note: When doing a GWEIS the interaction covariate is in the last column of
the data frame.

### Logistic Regression GWEIS with Binary Covariate

In the example subject data, the covariate is coded (0,1). In this case, the
gweis routine will use logistic regression to fit the last 3 models.

``` {r, eval = T, echo = T, message = F, warning = F, tidy = T}
loggweis1 <- gweis(data = covdata,
                   bdinfo = bdinfo,
                   snps = 1:2,
                   binary = TRUE)
```

``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
knitr::kable(loggweis1, caption = "Logistic Regression GWEIS", digits = 4)
```

In this example the snps options was used. The snps option can either be a
vector of indices indicating what SNPs to include or a list of SNPs by SNP ID.
A vector of indices was used in this example.

### Logistic Regression GWEIS with a Continuous Covariate

In the example subject data, the covariate is coded (0,1) which the gweis
routine sees a binary covariate to make the routine do a linear regression
1 can be added to the interaction covariate. This will change the coding to
(1,2) which the routine sees as a continuous covariate.

``` {r, eval = T, echo = T, message = F, warning = F, tidy = T}
covdata2 <- covdata
covdata2$e <- covdata2$e + 1
loggweis2 <- gweis(data = covdata2,
                   bdinfo = bdinfo,
                   snps = c("1:10001", "1:10002"))
```

``` {r, eval = T, echo = F, message = F, warning = F, tidy = T}
knitr::kable(loggweis2, caption = "Logistic Regression GWEIS", digits = 4)
```

In this example the snps options was used with the SNP IDs.