---
title: "The Tendril Package"
author: "Martin Karpefors, Mark Edmondson-Jones, Hielke Bijlsma, Stefano Borini"
date: '`r Sys.Date()`'
output:
  rmarkdown::html_vignette: default
  rmarkdown::pdf_vignette: default
vignette: |
  %\VignetteIndexEntry{Intro to the Tendril plot usage}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

The `Tendril` package contains functions designed to compute the x-y
coordinates and to build a Tendril plot. Inspired by the [notabilia
visualization](http://notabilia.net), the Tendril plot was developed to capture
the relative effect of different kind of adverse events for two treatments,
including temporal aspects, in a single visualization. Specifically, each
tendril (branch) in the Tendril plot represents a type of adverse effect, and
the direction of the tendril is dictated by on which treatment arm the event is
occurring. If an event is occurring on the first of the two specified treatment
arms, the tendril bends clockwise (to the right). If an event is occurring on
the second of the treatment arms, the tendril bends anti-clockwise (to the
left). 

```{r example_plot, echo=TRUE, warning=FALSE, message=FALSE, fig.width=6, fig.height=5}
library(Tendril)

data("TendrilData")

test <- Tendril(mydata = TendrilData,
                rotations = Rotations,
                AEfreqThreshold = 9,
                Tag = "Comment",
                Treatments = c("placebo", "active"),
                Unique.Subject.Identifier = "subjid",
                Terms = "ae",
                Treat = "treatment",
                StartDay = "day",
                SubjList = SubjList,
                SubjList.subject = "subjid",
                SubjList.treatment = "treatment"
)
  
plot(test)
plot(test, coloring = "p.adj")
plot(test, term = c("AE40", "AE42", "AE44"))

```

In the plots above, a clinical trial with two treatment arms, placebo and
active, and 80 different adverse effects were simulated ("AE1" to "AE80"). As
outlined above, the Tendril plot is based on an algorithm that evaluates each
type of adverse event (AE) in sequence, producing a collection of tendrils
(branches) that effectively summarizes the time-resolved safety profile of a
clinical trial within a single plot. Events on the first treatment (placebo)
cause that tendril to bend clockwise to the right, and each event on the second
treatment (active) causes the tendril to bend anti-clockwise to the left. The
resulting tree-like structure clearly displays those adverse events having the
largest differences in relative risk (see AE40); AEs having only a transient
increased risk bending and then straightening (see AE42); and AEs that are
balanced over the treatment arms (see AE44).  In the first plot each tendril is
colored according to adverse event type and in the second, each event has been
colored according to the false discovery rate adjusted p value. There are a
number of statistical measures that could be used for colouring, see the
plot.Tendril documentation.

# Objects within the `Tendril` package

## The `Tendril` class
The result of the `Tendril` function is an object of the class `Tendril` that can be referenced as a base R list. It contains the following elements:

* `data` : a dataframe containing the original data, the calculated angles and coordinates used to produce the tendril plot and the statistical analysis results
* `Terms` : the name of the variable in the source dataset that records the event type (e.g. adverse event)
* `Treat` : the name of the variable in the source dataset that records the treatment
* `Treatments` : the available values of Treatments
* `StartDay` : the name of the variable in the source dataset that records the start day of the adverse event
* `Unique.Subject.Identifier` : the name of the variable in the source dataset that records the subject identifier
* `AEfreqThreshold` : the frequency threshold used to select tendrils
* `Tag` : a text label associated with the analysis
* `n.tot` : a dataframe with a single row and variables for the total number of events recorded for each of the treatments
* `SubjList` : A dataframe listing all the subjects in the trial, including those not having an AE, and corresponding treatments
* `SubjList.subject` :  the name of the column in `SubjList` containing the subject IDs
* `SubjList.treatment` : the name of the column in `SubjList` containing the treatments names

## The `TendrilPerm` class

The result of the `TendrilPerm` function is an object of the class `TendrilPerm`.  This object can also be referenced as a list with the following elements:

* `tendril` : A `Tendril` object corresponding to the arguments passed to `TendrilPerm`
* `PermTerm` : The event type for which permutations are computed
* `perm.data` : A dataframe recording the coordinates of the permuted tendril data
* `tendril.pi` : An object of class `TendrilPi` recording estimated percentiles on the assumption of balance between treatment arms

## The `TendrilPi` class

The `TendrilPerm` function outputs an object which contains an element of class
`TendrilPi`.  This is structurally similar to a data frame, with equal length
vector elements for event day (`StartDay`), `Terms`, `x` and `y` coordinates,
`Tag`, number of terms (`TermsCount`), `label` (whether upper or lower limit),
`type` (`"Percentile"`) and the day from which to permute (`perm.from.day`).

# Functions within the `Tendril` package

## `Tendril()`

The `Tendril` function requires several arguments.  The key argument is
`mydata` a data frame with at least four columns, corresponding to a subject
identifier, treatment arm, event type and day (relative to randomisation) of
onset.  Four character variables are also passed to denote the column name of
the required columns; these are `Unique.Subject.Identifier`, `Treat`, `Terms`
and `StartDay` respectively.  If any additional columns are present then these
are retained for subsequent analysis.

Additionally arguments are provided for the unadjusted angular displacement of
each event (`rotations`, either a single value for all records or a vector
which can vary by row of `mydata`); a minimum value for the number of events in
at least one arm (`AEfreqThreshold`); a text label to apply to the analysis as
a whole (`Tag`); the two treatments to be compared (`Treatments`, any other
treatments are ignored).

A data frame can optionally be passed as the argument `SubjList` which lists
all the subjects in the trial, including those not having an AE, and the
corresponding treatments to which each subject has been randomised along with
(optionally) the day to which each subject was followed up. 
Even though the SubjList data frame is optional, it is required to calculate
statistics and simulate permuted tendrils (described below).
Three character arguments (`SubjList.subject`, `SubjList.treatment` and
`SubjList.dropoutday`) are then also passed to allow the variables in
`SubjList` to be correctly identified.

Finally a number of binary flags can be passed to further control the analysis.
`compensate_imbalance_groups` allows for treatment group imbalance to be
compensated for, provided `SubjList` is present. `filter_double_events` allows
either all, or just the first event of each type to be recorded for each
subject. Finally, `suppress_warnings` allows warnings from the Chi-square test
to be disabled, as low counts can result in multiple warning messages.

A typical Tendril dataset might look like this:

```{r tendrildata_head, echo=FALSE, warning=FALSE, message=FALSE}

head(TendrilData)
```

Note the four columns containing the subject IDs (`subjid`), the treatment
(`treatment`), the adverse effect term (`ae`) and the days (`day`).


The `Tendril()` function could then be called as:
```{r call_Tendril, echo=TRUE, eval=TRUE, warning=FALSE, message=FALSE}
test <- Tendril(mydata = TendrilData,
                rotations = Rotations,
                AEfreqThreshold = 9,
                Tag = "Comment",
                Treatments = c("placebo", "active"),
                Unique.Subject.Identifier = "subjid",
                Terms = "ae",
                Treat = "treatment",
                StartDay = "day",
                SubjList = SubjList,
                SubjList.subject = "subjid",
                SubjList.treatment = "treatment"
)
```

NB: If there is any missing data in the subject identifier, treatment, event
type or onset day then such rows will be removed.

The function checks that the arguments are valid and then computes the angles
and coordinates of the x and y points in the tendrils based on the balance of
events between treatments with the angular displacement being determined by the
argument `rotations`.

The time between consecutive events of each type is proportional to the
distance between connected points on the tendril plot.  The angular
displacement at each point is determined by the excess number of treatments on
the first arm (rotating clockwise) or the excess number of treatments on the
second arm (rotating anticlockwise) at each point in time.

If the argument `SubjList` provides a data frame of subjects, treatments and
optionally drop-out days then `Tendril` calls the function `tendril_stat` to
estimate the statistical significance of the imbalance at each data point.
Statistical significance is estimated using an unadjusted Chi-square test
(`p`), a Chi-square test false discovery rate (FDR) adjusted locally per AE
(`p.adj`), or Fisher's Exact test (`fish`).  Additional statistics are provided
for the risk difference (`rdiff`), risk ratio (`RR`) and odds ratio (`OR`).

## `TendrilPerm`

The `TendrilPerm` function required an object of class `Tendril` to be passed (as `tendril`) which is the basis of permutations of the treatment assignment.  An argument is also supplied with the event type (`PermTerm`) for which permutations are required.  All other event types are ignored in the analysis and removed from the results.

Arguments can also be provided to specify the number of permutations (`n.perm`; defaults to 100), the day from which to permute treatments (`perm.from.day`; defaults to 1), the lower proportion to estimate (`pi.low`; defaults to 0.1, i.e. 10th percentile), and the upper proportion to estimate (`pi.high`; defaults to 0.9, i.e. 90th percentile).

As well as calculating permuted tendrils on the basis of randomly permuted treatment assignments (corresponding to a hypothesis of no imbalance between treatment arms) the `TendrilPerm` function also returns tendrils corresponding to the specified percentiles of these permutations.  These facilitate comparison with the observed tendril to identify any event types with significant imbalance between treatment arms.

The use of the `perm.from.day` argument can be useful to explore temporal effects, for example where there is a strong imbalance initially, which subsequently resolves, with balanced incidence of events from a certain point in time onward.

The function outputs a list with four elements including the input tendril data filtered for the selected event type, the event type selected, the permuted tendril details, and the percentile details in the form of a `TendrilPi` object.

An example of an invocation of the `TendrilPerm` function, using the `Tendril` object generated above is as follows.

```{r call_TendrilPerm, echo=TRUE, eval=TRUE, warning=FALSE, message=FALSE}
perm <- TendrilPerm(test,
                    "AE40",
                    n.perm = 500)
perm2 <- TendrilPerm(test,
                     "AE42",
                     n.perm = 500,
                     perm.from.day = 120)
```

## Plot functions

### `plot.Tendril`

This function can be invoked as `plot()` applied to a `Tendril` object.  As
well as providing a `Tendril` object the user can optionally supply `coloring`
and `term` arguments.  `coloring` controls how the points on the tendril plot
are coloured, and defaults to `Terms` meaning that each event type is coloured
differently and a legend provided.  Alternatively `p`, `p.adj`, `FDR.tot`, `fish`,
`rdiff`, `RR` and `OR` will colour each point on the tendril scale according to
the relevant statistic at that specific plot point.

The `term` argument allows the plot to display only specific event types.  The
default is `NULL` which means that all tendrils are displayed.  Alternatively,
a single value can be supplied, or a vector of multiple terms.  In all cases
tendrils are only displayed subject to the `AEfreqThreshold` argument.

The following plots illustrate some sample tendril plots.

```{r plot_tendril, echo=TRUE, eval=TRUE, warning=FALSE, message=FALSE, fig.width=6, fig.height=5}
plot(test)
plot(test, coloring = "p.adj")
plot(test, term = "AE40", coloring = "fish")
plot(test, term = paste("AE", c(40, 42, 44), sep = ""), coloring = "Terms")
```

These are generated using the `ggplot2` package and so can be amended using features from the `ggplot2` package.

Tendril plots can also be produced in interactive mode using the `plotly`
package.  These are requested by passing the optional argument
`interactive=TRUE`.  These interactive plots allow access to feature such as
zooming, and hovering over points to obtain information such as the event type,
the FDR p-value and the total number of events of that type.

### `plot.TendrilPerm`

This function is also invoked as `plot`, but applied to a `TendrilPerm` object.  As well as passing a `TendrilPerm` object the user can again specify a `coloring` argument.  This applies equivalent colouring to that used in `plot.Tendril` but applied only to the selected event type, as defined in the call to `TendrilPerm`.  Permuted tendrils are coloured in light grey.

There is also an optional `percentile=TRUE` argument which will overlay the percentiles specified in the call to `TendrilPerm`.  These are shown as two dark grey lines.

The following plots illustrate example permutation plots.

```{r plot_TendrilPerm, echo=TRUE, eval=TRUE, warning=FALSE, message=FALSE, fig.width=6, fig.height=5}
plot(perm)
plot(perm, coloring = "OR", percentile = TRUE)
plot(perm2, coloring = "p.adj", percentile = TRUE)
```

Again, these plots are produced using `ggplot2` and can be modified accordingly.

### `plot_timeseries`

This function requires a `Tendril` object to be supplied and optionally a `term` argument, which defaults to `NULL`.  The `term` argument operates in an equivalent manner to in the `plot.Tendril` function, and allows specific event types to be selected, unless `NULL` is supplied, in which case all tendrils are displayed.

The `plot_timeseries` function shows the event balance as a linear, rather than radial, plot, with time on the horizontal axis and the event balance on the vertical axis.

Example time series plots are shown below.

```{r plot_timeseries, echo=TRUE, eval=TRUE, warning=FALSE, message=FALSE, fig.width=6, fig.height=5}
plot_timeseries(test)
plot_timeseries(test, term = paste("AE", c(40, 42, 44), sep = ""))
```

# Complete usage example

The following code will use the provided sample data `TendrilData` and `SubjList` to produce a complete analysis: compute tendril data, compute statistics, compute permutations for one of the adverse effects and produce a plot:


```{r full_example, echo=TRUE, eval=FALSE, warning=FALSE, message=FALSE}
#load library
library("Tendril")
#compute tendril data
data.tendril <- Tendril(mydata = TendrilData,
                        rotations = Rotations,
                        AEfreqThreshold = 9,
                        Tag = "Comment",
                        Treatments = c("placebo", "active"),
                        Unique.Subject.Identifier = "subjid",
                        Terms = "ae",
                        Treat = "treatment",
                        StartDay = "day",
                        SubjList = SubjList,
                        SubjList.subject = "subjid",
                        SubjList.treatment = "treatment",
                        filter_double_events = FALSE,
                        suppress_warnings = FALSE)

#compute permutations
data.tendril <- TendrilPerm(tendril = data.tendril,
                            PermTerm="AE40",
                            n.perm = 200,
                            perm.from.day = 1)

#do plot
p <- plot(data.tendril$tendril)

#plot permutations
p <- plot(data.tendril)

#plot permutations and percentile
p <- plot(data.tendril, percentile = TRUE)

#save tendril coordinates and stat results
write.table(data.tendril$tendril$data, "mydata.txt", sep="\t", row.names = FALSE)

#save permutation coordinates
write.table(data.tendril$perm.data, "my_permutation_data.txt", sep="\t", row.names = FALSE)

#save permutation percentiles
write.table(data.tendril$tendril.pi, "my_percentile_data.txt", sep="\t", row.names = FALSE)

```

# References

Karpefors, M and Weatherall, J., "The Tendril Plot - a novel visual summary of the incidence, significance and temporal aspects of adverse events in clinical trials" - *JAMIA* 2018; 25(8): 1069-1073