---
title: "Getting Started with AIPW"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with AIPW}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  fig.width = 6,
  comment = "#>"#,
  # cache=TRUE
)
```


Contents:

  * [Installation](#Installation)
  * [One-line version](#one_line)
  * [Longer version](#details)
    + [Create an AIPW object](#constructor)
    + [Fit the object](#fit)
    + [Calculate average treatment effects](#ate)
    + [Calculate average treatment effects among the treated](#att)
  * [Parallelization](#par)
  * [Using tmle/tmle3 as input](#tmle_input)
    + [tmle](#tmle)


## <a id="Installation"></a>Installation
1. Install AIPW from [GitHub](https://github.com/yqzhong7/AIPW)

```{r, eval = FALSE}
install.packages("remotes")
remotes::install_github("yqzhong7/AIPW")
```
__* CRAN version only supports SuperLearner and tmle. Please install the Github version (master branch) to use sl3 and tmle3.__

2. Install [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) or [sl3](https://tlverse.org/sl3/articles/intro_sl3.html)

```{r, eval = FALSE}
#SuperLearner
install.packages("SuperLearner")
#sl3
remotes::install_github("tlverse/sl3")
install.packages("Rsolnp")
```

## Input data for analyses 

```{r example data}
library(AIPW)
library(SuperLearner)
library(ggplot2)
set.seed(123)
data("eager_sim_obs")
cov = c("eligibility","loss_num","age", "time_try_pregnant","BMI","meanAP")
```

## Using AIPW to estimate the average treatment effect
### <a id="one_line"></a>One line version (Method chaining from R6class)
Using native AIPW class allows users to define different covariate sets for the exposure and the outcome models, respectively.

```{r one_line}
AIPW_SL <- AIPW$new(Y= eager_sim_obs$sim_Y,
                    A= eager_sim_obs$sim_A,
                    W= subset(eager_sim_obs,select=cov), 
                    Q.SL.library = c("SL.mean","SL.glm"),
                    g.SL.library = c("SL.mean","SL.glm"),
                    k_split = 10,
                    verbose=FALSE)$
  fit()$
  #Default truncation is set to 0.025; using 0.25 here is for illustrative purposes and not recommended
  summary(g.bound = c(0.25,0.75))$
  plot.p_score()$
  plot.ip_weights()
```

### <a id="details"></a>A more detailed tutorial

#### 1. <a id="constructor"></a>Create an AIPW object 

  * ##### Use [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) libraries
```{r SuperLearner, message=FALSE,eval=F}
library(AIPW)
library(SuperLearner)

#SuperLearner libraries for outcome (Q) and exposure models (g)
sl.lib <- c("SL.mean","SL.glm")

#construct an aipw object for later estimations 
AIPW_SL <- AIPW$new(Y= eager_sim_obs$sim_Y,
                    A= eager_sim_obs$sim_A,
                    W= subset(eager_sim_obs,select=cov), 
                    Q.SL.library = sl.lib,
                    g.SL.library = sl.lib,
                    k_split = 10,
                    verbose=FALSE)
```

If outcome is missing, analysis assumes missing at random (MAR) by estimating propensity scores with I(A=a, observed=1). Missing exposure is not supported.

#### 2. <a id="fit"></a>Fit the AIPW object

This step will fit the data stored in the AIPW object to obtain estimates for later average treatment effect calculations.

```{r}
#fit the AIPW_SL object
AIPW_SL$fit()
# or you can use stratified_fit
# AIPW_SL$stratified_fit()
```

#### 3. <a id="ate"></a>Calculate average treatment effects 

  * ##### Estimate the ATE with propensity scores truncation
```{r}
#estimate the average causal effects from the fitted AIPW_SL object 
AIPW_SL$summary(g.bound = 0.25) #propensity score truncation 
```

  * ##### Check the balance of propensity scores and inverse probability weights by exposure status after truncation
```{r ps_trunc}
library(ggplot2)
AIPW_SL$plot.p_score()
AIPW_SL$plot.ip_weights()
```

#### 4. <a id="att"></a>Calculate average treatment effects among the treated/controls
* ##### `stratified_fit()` fits the outcome model by exposure status while `fit()` does not. Hence, `stratified_fit()` must be used to compute ATT/ATC [(Kennedy et al. 2015)](http://www.ehkennedy.com/uploads/5/8/4/5/58450265/2015_kennedy_et_al_-_semiparametric_causal_inference_in_matched_cohort_studies.pdf)

```{r}
suppressWarnings({
  AIPW_SL$stratified_fit()$summary()
})
```


## <a id="par"></a>Parallelization with future.apply

In default setting, the `AIPW$fit()` method will be run sequentially. The current version of AIPW package supports parallel processing implemented by [future.apply](https://github.com/HenrikBengtsson/future.apply) package under the [future](https://github.com/HenrikBengtsson/future) framework. Before creating a `AIPW` object, simply use `future::plan()` to enable parallelization and `set.seed()` to take care of the random number generation (RNG) problem:

```{r parallel, eval=FALSE}
# install.packages("future.apply")
library(future.apply)
plan(multiprocess, workers=2, gc=T)
set.seed(888)
AIPW_SL <- AIPW$new(Y= eager_sim_obs$sim_Y,
                    A= eager_sim_obs$sim_A,
                    W= subset(eager_sim_obs,select=cov), 
                    Q.SL.library = sl3.lib,
                    g.SL.library = sl3.lib,
                    k_split = 10,
                    verbose=FALSE)$fit()$summary()
```



## <a id="tmle_input"></a>Use `tmle` fitted object as input

AIPW shares similar intermediate estimates (nuisance functions) with the Targeted Maximum Likelihood / Minimum Loss-Based Estimation (TMLE). Therefore, `AIPW_tmle` class is designed for using `tmle` fitted object as input. Details about these two packages can be found [here](https://www.jstatsoft.org/article/view/v051i13) and [here](https://tlverse.org/tlverse-handbook/). This feature is designed for debugging and easy comparisons across these three packages because cross-fitting procedures are different in `tmle`. In addition, this feature does not support ATT outputs.

#### <a id="tmle"></a>`tmle`

As shown in the message, [tmle](https://CRAN.R-project.org/package=tmle) only support cross-fitting in the outcome model. 

```{r tmle, eval=F}
# install.packages("tmle")
library(tmle)
library(SuperLearner)

tmle_fit <- tmle(Y=eager_sim_obs$sim_Y,
                 A=eager_sim_obs$sim_A,
                 W=eager_sim_obs[,-1:-2], 
                 Q.SL.library=c("SL.mean","SL.glm"),
                 g.SL.library=c("SL.mean","SL.glm"),
                 family="binomial",
                 cvQinit = TRUE)

cat("\nEstimates from TMLE\n")
unlist(tmle_fit$estimates$ATE)
unlist(tmle_fit$estimates$RR)
unlist(tmle_fit$estimates$OR)

cat("\nEstimates from AIPW\n")
a_tmle <- AIPW_tmle$
  new(A=eager_sim_obs$sim_A,Y=eager_sim_obs$sim_Y,tmle_fit = tmle_fit,verbose = TRUE)$
  summary(g.bound=0.025)
```