---
title: "joineRML and the broom package"
author: "Alessandro Gasparini"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{joineRML and the broom package}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r load-joineRML, include=FALSE}
library(RcppArmadillo)
RcppArmadillo::armadillo_throttle_cores(2)

Sys.setenv(OMP_THREAD_LIMIT = 1)
Sys.setenv(OMP_NUM_THREADS = 1)
Sys.setenv(OPENBLAS_NUM_THREADS = 1)
Sys.setenv(ARMA_OPENMP_THREADS = 1)
Sys.setenv(R_INSTALL_NCPUS = 1)
options(Ncpus = 1)

library(joineRML)
library(knitr)
```

# Introduction

Tidiers for objects of class `mjoint` have been included in latest release of `joineRML` package (0.4.5).

The purpose of these tidiers are described in the introductory vignette to `broom`:

> The broom package takes the messy output of built-in functions in R, such as `lm`, `nls`, or `t.test`, and turns them into tidy data frames.

There are three distinct tidiers included with `broom`:

* `tidy`: constructs a data frame that summarises the model estimates;

* `augment`: add columns to the original data that was modeled;

* `glance`: construct a concise one-row summary of the model. 

These methods are specifically useful when plotting results of a joint model or when comparing several models (e.g. in terms of fit).

# Example

We use the sample example from the introductory vignette to `joineRML` using the heart valve data.

```{r vignette, eval=FALSE, purl=FALSE}
vignette("joineRML", package = "joineRML")
help("heart.valve", package = "joineRML")
```

We analyse only the records with case-complete data for heart valve gradient (`grad`) and left ventricular mass index (`lvmi`):

```{r hvd_data}
data(heart.valve)
hvd <- heart.valve[!is.na(heart.valve$grad) & !is.na(heart.valve$lvmi), ]
```

Further to that, we only select the first 50 individuals to speed up these examples:

```{r hvd_data_small}
hvd <- hvd[hvd$num <= 50, ]
```

```{r hvd_model_fit}
set.seed(12345)
fit <- mjoint(
  formLongFixed = list(
    "grad" = log.grad ~ time + sex + hs,
    "lvmi" = log.lvmi ~ time + sex
  ),
  formLongRandom = list(
    "grad" = ~ 1 | num,
    "lvmi" = ~ time | num
  ),
  formSurv = Surv(fuyrs, status) ~ age,
  data = list(hvd, hvd),
  timeVar = "time"
)
```

## `tidy` method

The `tidy` method returns a tidy dataset with model estimates. 

```{r tidy}
tidy(fit)
```

By default the `tidy` method returns the estimated coefficients for the survival component of the joint model; however, it is possible to extract each component by setting the `component` argument:

```{r tidy-long}
tidy(fit, component = "longitudinal")
```

It is also possible to require confidence intervals to be calculated by setting `conf.int = TRUE`, and modify the confidence level by setting the `conf.level` argument:

```{r tidy-ci}
tidy(fit, ci = TRUE)
tidy(fit, ci = TRUE, conf.level = 0.99)
```

The standard errors reported by default are based on the empirical information matrix, as in `mjoint`. It is of course possible to use bootstrapped standard errors as follows:

```{r tidy-boot, eval=FALSE, purl=FALSE}
bSE <- bootSE(fit, 
              nboot = 100, 
              safe.boot = TRUE, 
              progress = FALSE,
              ncores = 1L)
tidy(fit, boot_se = bSE, conf.int = TRUE)
```

The results of this example are not included as it would take too long to run for CRAN.

The `tidy` method is useful for custom plotting (e.g. forest plots) of results from `joineRML` models, all in a tidy framework:

```{r tidy-plotting, fig.height=5, fig.width=5, fig.align="center"}
library(ggplot2)
out <- tidy(fit, conf.int = TRUE)
ggplot(out, aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high)) +
  geom_point() +
  geom_errorbar()
```

## `augment` method

The `augment` method returns a dataset with added predictions from the joint model. In particular, population-level and individual-level fitted values and residuals are added to the data frame returned by the method:

```{r augment-nd-grad}
preds <- augment(fit)
head(preds[, c("num", "log.grad", ".fitted_grad_0", ".fitted_grad_1", ".resid_grad_0", ".resid_grad_1")])
```

```{r augment-nd-lvmi}
head(preds[, c("num", "log.lvmi", ".fitted_lvmi_0", ".fitted_lvmi_1", ".resid_lvmi_0", ".resid_lvmi_1")])
```

We can plot the resulting predictions for four distinct individuals:

```{r augment-plot}
out <- preds[preds$num %in% c(26, 36, 227, 244), ]
ggplot(out, aes(x = time, colour = num)) +
  geom_line(aes(y = log.grad, linetype = "Measured")) +
  geom_line(aes(y = .fitted_grad_1, linetype = "Fitted")) +
  labs(linetype = "Type", colour = "ID", y = "Aortic gradient")
```

## `glance` method

The `glance` method allows extracting summary statistics from the joint model:

```{r glance-fit}
glance(fit)
```

This allows comparing competing models easily. Say for instance that we fit a second model with only random intercepts:

```{r glance-fit2}
set.seed(67890)
fit2 <- mjoint(
  formLongFixed = list(
    "grad" = log.grad ~ time + sex + hs,
    "lvmi" = log.lvmi ~ time + sex
  ),
  formLongRandom = list(
    "grad" = ~ 1 | num,
    "lvmi" = ~ 1 | num
  ),
  formSurv = Surv(fuyrs, status) ~ age,
  data = list(hvd, hvd),
  timeVar = "time"
)
```

We can go ahead and compare the models in terms of AIC and BIC:

```{r glance-comparison}
glance(fit)
glance(fit2)
```

## Additional examples

Several examples of how to use `broom` including more details are available on its introductory vignette:

```{r vignette-broom, eval=FALSE, purl=FALSE}
vignette(topic = "broom", package = "broom")
```

```{r, echo=FALSE, message=FALSE}
RcppArmadillo::armadillo_reset_cores()
```