---
title: "GLMMadaptive Basics"
author: "Dimitris Rizopoulos"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{GLMMadaptive Basics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library("GLMMadaptive")
```


# Generalized Linear Mixed Models - Theory
## Model Specification
Package **GLMMadaptive** provides a suite of functions for fitting and post-processing 
mixed effects models for grouped/clustered outcomes that have a distribution other than a
normal distribution. In particular, let $y_i$ denote the vector of grouped/clustered outcomes for
the $i$-th sample unit ($i = 1, \ldots, n$). The conditional distribution of $y_i$ given a 
vector of random effects $b_i$ is assumed to be a member of the extended exponential 
family, with linear predictor given by $$g\{E(y_i \mid b_i)\} = X_i \beta + Z_i b_i,$$ 
where $g(\cdot)$ denotes a monotonic link function, $X_i$ a design matrix for the fixed 
effects coefficients $\beta$, and $Z_i$ a design matrix for the random effects 
coefficients $b_i$. Typically, matrix $Z_i$ is assumed to be a subset of $X_i$. The random
effects are assumed to follow a normal distribution with mean 0 and variance-covariance 
matrix $D$. In addition, the distribution $[y_i \mid b_i]$ may have extra 
dispersion/shape parameters $\phi$ (e.g., the dispersion parameter of a negative binomial 
distribution).
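
For example, for a dichotomous outcome $y_{ij}$, measured at occasion $j$ within unit $i$, 
and the logit link, the model reads 
$$\log \frac{\Pr(y_{ij} = 1 \mid b_i)}{1 - \Pr(y_{ij} = 1 \mid b_i)} = x_{ij}^\top \beta + 
z_{ij}^\top b_i,$$ 
where $x_{ij}$ and $z_{ij}$ denote the rows of $X_i$ and $Z_i$ corresponding to the $j$-th 
measurement.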

## Estimation
The package focuses on settings in which the distribution $[y_i \mid b_i]$ is not normal
and/or the link function $g(\cdot)$ is not the identity. In these settings, the estimation
of the model is complicated by the fact that the marginal log-likelihood function of the 
observed $y_i$ cannot be derived analytically. In particular, the log-likelihood function
has the form: 
$$\begin{eqnarray*}
\ell(\theta) & = & \sum_{i = 1}^n \log p(y_i; \theta)\\
& = & \sum_{i = 1}^n \log \int p(y_i \mid b_i; \theta) \, p(b_i; \theta) \, db_i,
\end{eqnarray*}$$
where $\theta$ denotes the full parameter vector, including the fixed effects $\beta$, the
potential extra dispersion/shape parameters $\phi$, and the unique elements of the covariance 
matrix $D$, and $p(\cdot)$ denotes a probability density or probability mass function. The 
integral in the definition of $\ell(\theta)$ does not have a closed-form solution, and 
numerical approximations are required to obtain the maximum likelihood estimates.
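
To make this integral concrete, the following minimal sketch (illustrative only, not code 
from the package) approximates the marginal likelihood contribution $p(y_i; \theta)$ of a 
single hypothetical subject with a random intercept, using plain Monte Carlo integration 
over $b_i \sim N(0, D)$; all values below are made up for illustration:
```{r}
set.seed(321)
y_i <- c(0, 1, 1, 0)                 # hypothetical binary responses of subject i
eta_fixed <- c(-0.5, 0.2, 0.9, 1.3)  # hypothetical fixed-effects part X_i %*% beta
D <- 0.6                             # hypothetical variance of the random intercept

# p(y_i; theta) = E_b{p(y_i | b; theta)}, approximated by averaging over draws of b
b_draws <- rnorm(100000, sd = sqrt(D))
p_y_given_b <- sapply(b_draws, function (b)
    prod(dbinom(y_i, size = 1, prob = plogis(eta_fixed + b))))
log(mean(p_y_given_b))
```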

In the literature several approaches have been proposed to approximate such integrals, and
a nice overview is given in 
[Pinheiro and Chao (2006)](https://doi.org/10.1198/106186006X96962). A typical approach to
approximate these integrals is the Laplace approximation. However, in the standard but 
difficult cases of binary/dichotomous data and of count data with small counts and few 
repeated measurements, the accuracy of this approximation is rather low. For this reason, 
the general consensus is that the gold standard numerical approximation method is the 
adaptive Gaussian quadrature rule (*note: we focus here on maximum likelihood estimation; 
under the Bayesian paradigm, approaches such as MCMC and Hamiltonian Monte Carlo also 
provide accurate evaluation of the integrals*). Adaptive quadrature is more computationally 
intensive than the Laplace approximation but also more accurate. This package provides an efficient 
implementation of the adaptive Gaussian quadrature rule, allowing for multiple correlated 
random effects (e.g., random intercepts, linear and quadratic random slopes) but currently
a single grouping factor (i.e., no nested or crossed random effects designs).
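
As a rough illustration of the quadrature idea (non-adaptive for simplicity, and again not 
the package's internal code), the same one-dimensional integral from the sketch above can 
be approximated with a standard Gauss-Hermite rule; this assumes the **statmod** package 
for the nodes and weights, which is why the chunk is not evaluated here. The adaptive 
version used by **GLMMadaptive** additionally re-centers and re-scales the nodes per 
subject:
```{r, eval = FALSE}
# standard Gauss-Hermite rule: integral of f(x) exp(-x^2) dx ~ sum_k w_k f(x_k)
GH <- statmod::gauss.quad(11, kind = "hermite")

# the change of variables b = sqrt(2 * D) * x maps the N(0, D) integral to the
# Gauss-Hermite form; y_i, eta_fixed, and D are reused from the sketch above
b_nodes <- sqrt(2 * D) * GH$nodes
log(sum(GH$weights / sqrt(pi) *
            sapply(b_nodes, function (b)
                prod(dbinom(y_i, size = 1, prob = plogis(eta_fixed + b))))))
```
For a smooth, unimodal integrand like this one, a modest number of points already gives an 
accurate approximation.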

A hybrid optimization procedure is implemented, starting with an EM algorithm that treats 
the random effects as 'missing data', followed by direct optimization with a quasi-Newton 
algorithm. Fine control of this procedure is possible via a series of control arguments.
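
As an example of such control (a sketch of the interface, not a recommendation), the 
`iter_EM` control argument documented in `?mixed_model` sets the number of EM iterations; 
setting it to zero skips the EM phase and relies solely on the quasi-Newton optimizer. The 
chunk is not evaluated because the data frame `DF` is only simulated in the next section:
```{r, eval = FALSE}
fm_EM0 <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
                      family = binomial(), control = list(iter_EM = 0))
```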

# Generalized Linear Mixed Models - Practice
## Mixed Effects Logistic Regression
We illustrate the use of the package in the standard case of a mixed effects logistic 
regression. That is, the distribution $[y_i \mid b_i]$ is binomial, and the distribution
$[b_i]$ is multivariate normal. 

We start by simulating some data for a binary longitudinal outcome:
```{r}
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time

# we construct a data frame with the design: 
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
                 time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
                 sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))

# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)

betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes

# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- drop(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
```

We fit a mixed effects logistic regression for `y`, assuming random
intercepts for the random-effects part. The basic model-fitting function in 
**GLMMadaptive** is called `mixed_model()`, and has four required arguments, namely 
`fixed`, a formula for the fixed effects; `random`, a formula for the random effects; 
`family`, a family object specifying the type of response variable; and `data`, a data 
frame containing the variables in the aforementioned formulas. Hence, the call to fit the 
random intercepts logistic regression is:
```{r}
fm1 <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
                   family = binomial())
```

The summary method gives a detailed output of the model:
```{r}
summary(fm1)
```

We continue by checking the impact of the chosen number of quadrature points on the
parameter estimates and the log-likelihood value at convergence. First, we refit the 
model with an increasing number of quadrature points. The default when the number of 
random effects is less than or equal to two is 11 points; we then refit with 15 and 21 points:
```{r}
fm1_q11 <- fm1
fm1_q15 <- update(fm1_q11, nAGQ = 15)
fm1_q21 <- update(fm1_q11, nAGQ = 21)

models <- list("nAGQ=11" = fm1_q11, "nAGQ=15" = fm1_q15, "nAGQ=21" = fm1_q21)
```

We now extract from the models the estimated parameters for the fixed effects (using 
function `fixef()`), the variance of the random intercepts, and the log-likelihood (using 
function `logLik()`):
```{r}
extract <- function (obj) {
    c(fixef(obj), "var_(Intercept)" = obj$D[1, 1], "logLik" = logLik(obj))
}

sapply(models, extract)
```

We observe that the fit is quite stable, with virtually no differences across the different
numbers of quadrature points.

Next, we compare the model with a simple logistic regression that does not include any
random effects:
```{r}
km <- glm(y ~ sex * time, data = DF, family = binomial())

anova(fm1, km)
```

We obtain a highly significant p-value suggesting that there are correlations in the data
that cannot be ignored. *Note:* the `anova()` method that performs the likelihood ratio
test calculates the p-value using the standard $\chi^2$ distribution, here with one 
degree of freedom. However, because the null hypothesis for testing variance parameters is
on the boundary of the corresponding parameter space, it would be more appropriate to use 
a mixture of $\chi^2$ distributions.
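
For a single variance component, that mixture is a 50:50 mix of $\chi^2_0$ and $\chi^2_1$,
which amounts to halving the naive p-value; a minimal sketch of this correction is:
```{r}
# likelihood ratio statistic for the random intercepts variance
LRT <- as.numeric(2 * (logLik(fm1) - logLik(km)))
# p-value under the 0.5 * chi^2_0 + 0.5 * chi^2_1 mixture
0.5 * pchisq(LRT, df = 1, lower.tail = FALSE)
```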

We extend model `fm1` by also including a random slopes term; however, we assume that the 
covariance between the random intercepts and random slopes is zero. This is achieved by
using the `||` symbol in the specification of the `random` argument, i.e.,
```{r}
fm2 <- mixed_model(fixed = y ~ sex * time, random = ~ time || id, data = DF,
                   family = binomial())
```

The likelihood ratio test between the two models is computed with function `anova()`. When
two `"MixMod"` objects are provided, the function assumes that the first object represents
the model under the null hypothesis, and the second object the model under the 
alternative, i.e.,
```{r}
anova(fm1, fm2)
```

The results suggest that we need the random slopes term. We continue by testing whether 
the covariance between the random effects terms is zero. The model under the alternative
hypothesis is:
```{r, eval = TRUE}
fm3 <- mixed_model(fixed = y ~ sex * time, random = ~ time | id, data = DF,
                   family = binomial())
```

And again the likelihood ratio test is performed with a call to `anova()`:
```{r, eval = TRUE}
anova(fm2, fm3)
```

The results now suggest that indeed the covariance between the two random effects terms 
is not statistically different from zero.

## Penalized Mixed Effects Poisson Regression
We continue our illustration with a Poisson mixed effects model. We start again by 
simulating some data for a Poisson longitudinal outcome:
```{r}
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time

# we construct a data frame with the design: 
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
                 time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
                 sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))

# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)

betas <- c(2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts

# we simulate random effects
b <- rnorm(n, sd = sqrt(D11))
# linear predictor
eta_y <- drop(X %*% betas + b[DF$id])
# we simulate Poisson longitudinal data
DF$y <- rpois(n * K, exp(eta_y))
```

We fit a mixed effects Poisson regression for `y`, assuming random intercepts for the 
random-effects part.
```{r, eval = TRUE}
gm1 <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
                   family = poisson())
```

As previously, the summary method gives a detailed output of the model:
```{r, eval = TRUE}
summary(gm1)
```

In settings with few subjects or few repeated measurements, or in the presence of 
separation effects, the package also allows placing a penalty on the fixed effects 
regression coefficients $\beta$. The penalty/prior has the form of a Student's t 
distribution with mean 0, scale parameter 1, and 3 degrees of freedom, and it is placed on 
all $\beta$ coefficients except the intercept. The penalized model can be fitted by 
setting argument `penalized` to `TRUE`, i.e.,
```{r, eval = TRUE}
gm2 <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
                   family = poisson(), penalized = TRUE)
```

```{r, eval = TRUE}
cbind('unpenalized' = fixef(gm1), 'penalized' = fixef(gm2))
```

In this example we observe small differences between the penalized and unpenalized models. 
Users have the option to alter the specification of the Student's t penalty by directly 
specifying the mean, scale, and degrees of freedom of the distribution. For example, a 
ridge penalty can be obtained by setting the degrees of freedom to a high value. The call 
in this case is:
```{r, eval = FALSE}
gm3 <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
                   family = poisson(), 
                   penalized = list(pen_mu = 0, pen_sigma = 1, pen_df = 200))
```