---
title: "Simple Emax model fit with Stan"
author: "Kenta Yoshida"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Simple Emax model fit with Stan}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r message=FALSE}
library(rstanemax)
library(dplyr)
library(ggplot2)
set.seed(12345)
```

This vignette provide an overview of the workflow of Emax model analysis using this package.


# Typical workflow

## Model run with `stan_emax` function

`stan_emax()` is the main function of this package to perform Emax model analysis on the data.
This function requires minimum two input arguments - `formula` and `data`.
In the `formula` argument, you will specify which columns of `data` will be used as exposure and response data, in a format similar to `stats::lm()` function, e.g. `response ~ exposure`.

```{r, results="hide"}
data(exposure.response.sample)

fit.emax <- stan_emax(response ~ exposure,
  data = exposure.response.sample,
  # the next line is only to make the example go fast enough
  chains = 2, iter = 1000, seed = 12345
)
```

```{r}
fit.emax
```

`plot()` function shows the estimated Emax model curve with 95% credible intervals of parameters.

```{r plot_example, fig.show='hold'}
plot(fit.emax)
```

Output of `plot()` function (for `stanemax` object) is a `ggplot` object, so you can apply additional settings as you would do in `ggplot2`.  
Here is an example of using log scale for x axis (note that exposure == 0 is hanging at the very left, making the curve a bit weird).

```{r plot_example_log, fig.show='hold'}
plot(fit.emax) + scale_x_log10() + expand_limits(x = 1)
```


Raw output from `rstan` is stored in the output variable, and you can access it with `extract_stanfit()` function.

```{r}
class(extract_stanfit(fit.emax))
```

## Prediction of response with new exposure data

`posterior_predict()` function allows users to predict the response using new exposure data.
If `newdata` is not provided, the function returns the prediction on the exposures in original data.
The default output is a matrix of posterior predictions, but you can also specify "dataframe" or "tibble" that contain posterior predictions in a long format.
See help of `rstanemax::posterior_predict()` for the description of two predictions, `respHat` and `response`.

```{r}
response.pred <- posterior_predict(fit.emax, newdata = c(0, 100, 1000), returnType = "tibble")

response.pred %>% select(mcmcid, exposure, respHat, response)
```

You can also get quantiles of predictions with `posterior_predict_quantile()` function.

```{r}
resp.pred.quantile <- posterior_predict_quantile(fit.emax, newdata = seq(0, 5000, by = 100))
resp.pred.quantile
```

Input data can be obtained in a same format with `extract_obs_mod_frame()` function.

```{r}
obs.formatted <- extract_obs_mod_frame(fit.emax)
```

These are particularly useful when you want to plot the estimated Emax curve.

```{r plot_with_pp, fig.show='hold'}
ggplot(resp.pred.quantile, aes(exposure, ci_med)) +
  geom_line() +
  geom_ribbon(aes(ymin = ci_low, ymax = ci_high), alpha = .5) +
  geom_ribbon(aes(ymin = pi_low, ymax = pi_high), alpha = .2) +
  geom_point(
    data = obs.formatted,
    aes(y = response)
  ) +
  labs(y = "response")
```

Posterior draws of Emax model parameters can be extracted with `extract_param()` function.

```{r}
posterior.fit.emax <- extract_param(fit.emax)
posterior.fit.emax
```



# Fix parameter values in Emax model

You can fix parameter values in Emax model for Emax, E0 and/or gamma (Hill coefficient).
See help of `stan_emax()` for the details.
The default is to fix gamma at 1 and to estimate Emax and E0 from data.

Below is the example of estimating gamma from data.

```{r, results="hide"}
data(exposure.response.sample)

fit.emax.sigmoidal <- stan_emax(response ~ exposure,
  data = exposure.response.sample,
  gamma.fix = NULL,
  # the next line is only to make the example go fast enough
  chains = 2, iter = 1000, seed = 12345
)
```

```{r}
fit.emax.sigmoidal
```

You can compare the difference of posterior predictions between two models (in this case they are very close to each other):

```{r plot_with_gamma_fix, fig.width = 6, fig.height = 4, fig.show='hold'}
exposure_pred <- seq(min(exposure.response.sample$exposure),
  max(exposure.response.sample$exposure),
  length.out = 100
)

pred1 <-
  posterior_predict_quantile(fit.emax, exposure_pred) %>%
  mutate(model = "Emax")
pred2 <-
  posterior_predict_quantile(fit.emax.sigmoidal, exposure_pred) %>%
  mutate(model = "Sigmoidal Emax")

pred <- bind_rows(pred1, pred2)


ggplot(pred, aes(exposure, ci_med, color = model, fill = model)) +
  geom_line() +
  geom_ribbon(aes(ymin = ci_low, ymax = ci_high), alpha = .3) +
  geom_ribbon(aes(ymin = pi_low, ymax = pi_high), alpha = .1, color = NA) +
  geom_point(
    data = exposure.response.sample, aes(exposure, response),
    color = "black", fill = NA, size = 2
  ) +
  labs(y = "response")
```



# Set covariates

You can specify categorical covariates for Emax, EC50, and E0.
See help of `stan_emax()` for the details.


```{r, results="hide"}
data(exposure.response.sample.with.cov)

test.data <-
  mutate(exposure.response.sample.with.cov,
    SEX = ifelse(cov2 == "B0", "MALE", "FEMALE")
  )

fit.cov <- stan_emax(
  formula = resp ~ conc, data = test.data,
  param.cov = list(emax = "SEX"),
  # the next line is only to make the example go fast enough
  chains = 2, iter = 1000, seed = 12345
)
```

```{r plot_with_cov, fig.width = 6, fig.height = 4, fig.show='hold'}
fit.cov
plot(fit.cov)
```


You can extract MCMC samples from raw stanfit and evaluate differences between groups.

```{r compare_emax, fig.show='hold'}
fit.cov.posterior <-
  extract_param(fit.cov)

emax.posterior <-
  fit.cov.posterior %>%
  select(mcmcid, SEX, emax) %>%
  tidyr::pivot_wider(names_from = SEX, values_from = emax) %>%
  mutate(delta = FEMALE - MALE)

ggplot2::qplot(delta, data = emax.posterior, bins = 30) +
  ggplot2::labs(x = "emax[FEMALE] - emax[MALE]")

# Credible interval of delta
quantile(emax.posterior$delta, probs = c(0.025, 0.05, 0.5, 0.95, 0.975))

# Posterior probability of emax[FEMALE] < emax[MALE]
sum(emax.posterior$delta < 0) / nrow(emax.posterior)
```