---
title: "Introduction to Jellyfisher"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to Jellyfisher}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(jellyfisher)
```

**Jellyfisher** is an R package (a [htmlwidget](https://www.htmlwidgets.org/))
for visualizing tumor evolution and subclonal compositions using Jellyfish plots.
The package is based on the
[Jellyfish](https://github.com/HautaniemiLab/jellyfish) visualization tool,
bringing its functionality to R users. Jellyfisher supports both
[ClonEvol](https://github.com/hdng/clonevol) results and plain data frames,
making it compatible with various tools and workflows.

## Input data

The input data should follow specific structures for _samples_, _phylogeny_, 
subclonal _compositions_, and optional _ranks_.

The `jellyfisher` package includes an example data set
(`jellyfisher_example_tables`) based on the following publication:  
Lahtinen, A., Lavikka, K., Virtanen, A., et al. "Evolutionary states and
trajectories characterized by distinct pathways stratify patients with ovarian
high-grade serous carcinoma." _Cancer Cell_ **41**, 1103–1117.e12 (2023). DOI:
[10.1016/j.ccell.2023.04.017](https://doi.org/10.1016/j.ccell.2023.04.017).

```{r echo=F, message=F}
# Subset the data to keep the compiled vignette a bit smaller
jellyfisher_example_tables <- jellyfisher_example_tables |>
  select_patients(c("EOC69", "EOC677", "EOC495", "EOC809"))
```


### Samples
            
- `sample` (string): specifies the unique identifier for each sample.
- `displayName` (string, optional): allows for specifying a custom name for each sample. If the column is omitted, the `sample` column is used as the display name.
- `rank` (integer): specifies the position of each sample in the Jellyfish plot. For example, different stages of a disease can be ranked in chronological order: diagnosis (1), interval (2), and relapse (3). The zeroth rank is reserved for the root of the sample tree. Ranks can be any integer, and unused ranks are automatically excluded from the plot. If the `rank` column is
  absent, ranks are assigned based on each sample’s depth in the sample tree.
- `parent` (string): identifies the parent sample for each entry. Samples without a specified parent are treated as children of an imaginary root sample.

```{r}
head(jellyfisher_example_tables$samples, 25)
```

### Phylogeny

- `subclone` (string): specifies subclone IDs, which can be any string.
- `parent` (string): designates the parent subclone. The subclone without a parent is considered the root of the phylogeny.
- `color` (string, optional): specifies the color for the subclone. If the column is omitted, colors will be generated automatically.
- `branchLength` (number): specifies the length of the branch leading to the subclone. The length may be based on, for example, the number of unique mutations in the subclone. The branch length is shown in the Jellyfish plot's legend as a bar chart. It is also used when generating a phylogeny-aware color scheme.

```{r}
head(jellyfisher_example_tables$phylogeny, 25)
```

### Subclonal compositions

Subclonal compositions are specified in a
[tidy](https://vita.had.co.nz/papers/tidy-data.pdf) format, where each row
represents a subclone in a sample.

- `sample` (string): specifies the sample ID.
- `subclone` (string): specifies the subclone ID.
- `clonalPrevalence` (number): specifies the clonal prevalence of the subclone in the sample. The clonal prevalence is the proportion of the subclone in the sample. The clonal prevalences in a sample must sum to 1.

```{r}
head(jellyfisher_example_tables$compositions, 25)
```

### Ranks

The ranks in the example data set are used to indicate the time points when the
samples were acquired. 

- `rank` (integer): specifies the rank number. The zeroth rank is reserved for the inferred root of the sample tree. However, you are free to define a title for it.
- `title` (string): specifies the title for the rank.

```{r}
head(jellyfisher_example_tables$ranks, 6)
```

## Plotting

### Basic plotting 

The three tables are passed to the `jellyfisher` function as a named list. The
function generates an interactive Jellyfish plot based on the input data. If the
data set contains multiple patients, the Jellyfisher htmlwidget shows navigation
buttons to switch between patients.

```{r}
jellyfisher(jellyfisher_example_tables,
            width = "100%", height = 450)
```

### Plotting with custom options

```{r}
jellyfisher(jellyfisher_example_tables,
            options = list(
              sampleHeight = 70,
              sampleTakenGuide = "none",
              tentacleWidth = 3,
              showLegend = FALSE
            ),
            width = "100%", height = 400)
```

### Plotting a single patient

When plotting multiple patients, Jellyfisher shows buttons (Previous and Next)
to navigate between patients. When the data contains only one patient, these
buttons are hidden. The package also provides a `select_patients` function to
filter the data set with ease.

```{r}
jellyfisher_example_tables |>
  select_patients("EOC677") |>
  jellyfisher(width = "100%", height = 400)
```

## Adjusting the sample tree structure

The sample trees in the example data set were constructed as follows:

*"For each sample, we checked whether an earlier time point included exactly one
sample from the same anatomical location.  If such a sample existed, it was
assigned as the parent; otherwise, the inferred root was used as the parent."*

However, this mechanistic approach may not always produce credible sample trees.

### Changing parent

The *r1Bow1* (bowel) sample in the following jellyfish plot is derived
from an earlier bowel sample *p2Bow1_c*, which has no traces of the subclone 12.

```{r}
jellyfisher_example_tables |>
  select_patients("EOC809") |>
  jellyfisher(width = "100%", height = 600)
```

Using the `set_parents` function, we can adjust the parent of the *r1Bow1* sample
to be *p2Per1_cO* (peritoneum), which is a possible source of the metastasis due
to its proximity. The high prevalence of subclone 12 in this sample suggests
that it is the likely source of the metastasis in the *r1Bow1* sample.

```{r}
jellyfisher_example_tables |>
  select_patients("EOC809") |>
  set_parents(list("EOC809_r1Bow1_DNA1" = "EOC809_p2Per1_cO_DNA2")) |>
  jellyfisher(width = "100%", height = 600)
```

### Changing topology

While ranks (the columns) can indicate the time points when the samples were
acquired, they can also be used to simply show the sample's depth in the sample
tree. For instance, the following plot shows all the samples on the same rank,
indicating that they were diagnostic samples acquired at the same time.

```{r}
jellyfisher_example_tables |>
  select_patients("EOC495") |>
  jellyfisher(width = "100%", height = 650)
```

However, one can argue that the LN (lymph node) samples represent a later
development in the disease, and thus, they should be placed on a later rank.
We can remove the existing ranks, define new parent-child relationships, and
let Jellyfisher assign the ranks based on the sample tree depth.

```{r}
tables <- jellyfisher_example_tables |>
  select_patients("EOC495")

# Remove existing ranks. The ranks will be assigned automatically based
# on samples' depths in the sample tree.
tables$samples$rank <- NA

# Rank titles should be removed as well because they are no longer valid.
tables$ranks <- NULL

tables |>
  set_parents(list("EOC495_pLNL1_DNA1" = "EOC495_pLNR_DNA1",
                   "EOC495_pLNL2_DNA1" = "EOC495_pLNL1_DNA1")) |>
  jellyfisher(width = "100%", height = 500)
```

If we think that the lymph node samples represent an even later development,
we can manually assign ranks to the samples. The `set_ranks` function provides
an easy way to do this.

```{r}
tables |>
  set_parents(list("EOC495_pLNL1_DNA1" = "EOC495_pLNR_DNA1",
                   "EOC495_pLNL2_DNA1" = "EOC495_pLNL1_DNA1")) |>
  set_ranks(list("EOC495_pLNR_DNA1" = 2,
                 "EOC495_pLNL1_DNA1" = 3,
                 "EOC495_pLNL2_DNA1" = 4),
            default = 1) |>
  jellyfisher(width = "100%", height = 400)
```

```{r echo=F}
rm(tables)
```

## Handling non-aberrant cells

Typically in tumor evolution studies, the focus is on the subclonal compositions
of the tumor cells. Thus, the root node in the phylogenetic tree represents the
founding clone. However, the data may also contain non-aberrant cells, i.e., the
tumor purity is less than 100%. For these cases, the `normalsInPhylogenyRoot`
option instructs Jellyfisher to treat the root node as non-aberrant cells. The
option has two consequences: (1) No tentacles are drawn between the root
subclones, and (2) the root subclone is colored as white when the
phylogeny-aware color scheme is used.

```{r}
# Subclone N at the root represents the non-aberrant cells.
# The letter N has no special meaning in Jellyfisher.
non_aberrant <- list(
  samples = data.frame(sample = c("A", "B")),
  compositions = data.frame(
    sample = c("A", "A", "A", "B", "B", "B"),
    subclone = c("N", "1", "2", "N", "1", "2"),
    clonalPrevalence = c(0.2, 0.4, 0.4, 0.3, 0.3, 0.4)
  ),
  phylogeny = data.frame(
    subclone = c("N", "1", "2"),
    parent = c(NA, "N", "1")
  )
)
```

```{r}
non_aberrant |>
  jellyfisher(options = list(
    normalsAtPhylogenyRoot = TRUE
  ),
  width = "100%", height = 350)
```

In the above plot, the root clone, which represents the non-aberrant cells, is
hidden from the inferred root sample. However, sometimes a patient may have
multiple independent clones, and in these cases the root clone is shown:

```{r}
# Change the parent of subclone 2 to N
non_aberrant$phylogeny$parent[non_aberrant$phylogeny$subclone == "2"] <- "N"

non_aberrant |>
  jellyfisher(options = list(
    normalsAtPhylogenyRoot = TRUE
  ),
  width = "100%", height = 350)
```

## Session info

```{r}
sessionInfo()
```