---
title: "A phylogenetic modelling tutorial using Phylogenetic Eigenvector Maps (PEM) as implemented in R package MPSEM (0.4-4)"
author: "Guillaume Guénard"
date: "`r format(Sys.time(), '%Y-%m-%d')`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
bibliography: PEM_with_MPSEM.bib
vignette: >
  %\VignetteIndexEntry{A phylogenetic modelling tutorial using Phylogenetic Eigenvector Maps (PEM) as implemented in R package MPSEM (0.4-4)}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
hook_output <- knitr::knit_hooks$get("output")
knitr::knit_hooks$set(output = function(x, options) {
  if (!is.null(n <- options$out.lines)) {
    x <- xfun::split_lines(x)
    if (length(x) > n) {
      # truncate the output
      x <- c(head(x, n), "....\n")
    }
    x <- paste(x, collapse = "\n")
  }
  hook_output(x, options)
})
##
### Load packages here:
##
### Figure counter:
(
  function() {
    log <- list(
      labels = character(),
      captions = character()
    )
    list(
      register = function(label, caption) {
        log$labels <<- c(log$labels, label)
        log$captions <<- c(log$captions, caption)
        invisible(NULL)
      },
      getNumber = function(label) {
        which(log$labels == label)
      },
      getCaption = function(label) {
        a <- which(log$labels == label)
        cap <- log$captions[a]
        cat(sprintf("Fig. %d. %s\n\n---\n",a,cap))
        invisible(NULL)
      }
    )
  }
)() -> figCounter
```

# Introduction

Phylogenetic Eigenvector Maps (**PEM**) is a method to perform phylogenetic
modelling. Phylogenetic modelling consists in modelling trait evolution and
predicting trait values using phylogeny as an explanatory factor [@Guenard2013].
Phylogenetic modelling allows one to predict trait values when it is difficult
or impractical to obtain them, for instance when species are rare, extinct, or
when information is needed for several species and trait values are only
available for a relatively small number of them [@Guenard2011;@Guenard2014].

To apply phylogenetic modelling, one needs to have a set of species with known
phylogeny and trait values (hereafter referred to as the _model species_) as
well as to know the locations, with respect to the phylogeny of the models
species, of the species for which trait values are being predicted (hereafter
referred to as the _target species_). Phylogenetic modelling can be performed
jointly with trait correlation modelling: it is possible to use other traits
with known (or estimable) values for the target species to help predict a trait
of interest. Phylogenetic trees being acyclic graphs, I will hereby prefer terms
belonging to the graph theory over terms phylogeneticists may be more familiar
with. Therefore I will use _edge_ over _branches_ and _vertex_ over _root_,
_node_ or _tip_; safe in cases where I want to be specific about what a vertex
represents.

> ### The Phylogenetic eigenvector maps (**PEM**) expression
> 
> Statistical maps are a type of geographic map representing the values or
> states of a variable across space
> <https://encyclopedia2.thefreedictionary.com/statistical+map).
> 
> In a paper entitled "The interpretation of statistical maps", [@Moran1948]
> described tests of significance of the spatial relationships among values of
> qualitative variables on statistical maps.
> 
> "Moran's eigenvector maps" (**MEM**), an expression coined by @Dray2006,
> describes the variation of spatial eigenvectors whose eigenvalues are
> proportional to Moran's I spatial autocorrelation statistics [@Moran1950] of
> the corresponding eigenvectors. Spatial eigenvectors are mathematical
> constructs that describe the variation of quantities across space (or time) at
> different spatial scales. They were originally called Principal Coordinates of
> Neighbour Matrices by @Borcard2002.
> 
> Phylogenetic eigenvector maps (**PEM**) [@Diniz2012; @Guenard2013] are sets of
> eigenfunctions describing the structure of a phylogenetic graph, which
> represents either a Darwinian phylogenetic tree or a reticulated tree, i.e., a
> phylogenetic tree with reticulations. The various eigenvectors describe the
> variation across a phylogeny at different phylogenetic scales.
> 
> Contrary to MEM, the eigenvalues in PEM are not proportional to Moran's I
> autocorrelation coefficients of the corresponding eigenvectors.

The **PEM** work flow consists in

1) calculating the influence matrix of the graph,

2) specifying a model of trait evolution along the edges of the phylogenetic
tree,

3) calculating the left eigenvectors of the weighted and centred influence
matrix and

4) use these eigenvectors as descriptors [@Guenard2013].

An **R** language implementation of that approach is found in package **MPSEM**.
**MPSEM** was developed to make the aforementioned process as seamless as
possible. It is a work in progress; I welcome anyone to provide relevant
suggestions and constructive remarks aimed at making **MPSEM** a better, more
efficient and user-friendly, interface to phylogenetic modelling.

Assuming package **MPSEM** is installed, the first step to calculate a **PEM**
is to load package MPSEM, which depends on packages **ape** and **MASS**:

```{r load_package}
library(MPSEM)
```

# Preparing the data

For the present tutorial, we will use the data set **perissodactyla** from **R**
package **caper**. These data from @Purvis1995 are loaded into your **R**
workspace as follows:

```{r load_data}
data(perissodactyla,package="caper")
```

```{r plot_phylogeny, echo=FALSE, fig.height=4, fig.width = 7}
par(mar=c(2,2,2,2))
plot(perissodactyla.tree)
par(mar=c(5,4,4,2))
figCounter$register(
  "theTree",
  "The phylogenetic tree used for this example."
)
```
```{r, plot_phylogeny_cap, echo=FALSE, results='asis'}
figCounter$getCaption("theTree")
```

The **perissodactyla** data set contains `perissodactyla.tree`, a phylogenetic
tree encompassing `r length(perissodactyla.tree$tip.label)` odd-toed ungulate
species (Fig. `r figCounter$getNumber("theTree")`) and `perissodactyla.data`, a
data frame containing trait information about the species. For the present study
we will model the $\log_{10}$ gestation weight as a function of phylogeny and
$\log_{10}$ adult female weight:

```{r data_table, echo=FALSE, results="latex"}
knitr::kable(perissodactyla.data[,c(1L,2L,4L)])
```

Before going any further, it is important to make sure that the species in the
tree object are the same and presented in the same order as those in the data
table. Glancing at the data table, species clearly cannot match since the latter
feature information for only `r nrow(perissodactyla.data)` of the
`r length(perissodactyla.tree$tip.label)` species in the tree. We will therefore
match the tip labels of the original tree in the data table using the binary
(Latin) species names in a character vector `spmatch`. When no matching element
from the data table is found, the value `NA` appears at the corresponding
position in `spmatch`. We can therefore use these missing values to reference
the species that can be dropped from the tree using function `drop.tip()` from
package **ape** as follows:

```{r droping_species}
match(
  perissodactyla.tree$tip.label,
  perissodactyla.data[,1L]
) -> spmatch

drop.tip(
  perissodactyla.tree,
  perissodactyla.tree$tip.label[is.na(spmatch)]
) -> perissodactyla.tree
```

Now that the data match the tree in terms of species content, we then need to
make sure that species ordering also matches as follows:

```{r check_order}
cbind(perissodactyla.tree$tip.label, perissodactyla.data[,1L])
```

Since they do not, we need to recalculate `spmatch` with the new, reduced, tree
and re-order the data accordingly:

```{r re-order_species}
match(
  perissodactyla.tree$tip.label,
  perissodactyla.data[,1L]
) -> spmatch

perissodactyla.data[spmatch,] -> perissodactyla.data

all(perissodactyla.tree$tip.label == perissodactyla.data[,1L])
```

The last code line is just a last check to guarantee that all species names are
matching. As a last step before we are done with data manipulation, I will put
the binary names in place of the row names and delete the table's first row:

```{r change_rownames}
perissodactyla.data[,1L] -> rownames(perissodactyla.data)
perissodactyla.data[,-1L] -> perissodactyla.data
```

Our data of interest now appear as follows:

```{r re-arranged_data, echo=FALSE, results="latex"}
knitr::kable(perissodactyla.data[,c(1L,3L)])
```

Finally, for the sake of demonstrating how to obtain predictions, we will remove
the Sumatran rhinoceros (_Dicerorhinus sumatrensis_), the first species on top
of the table) to obtain our training data set {`perissodactyla.train`}, keep the
withdrawn data as {`perissodactyla.test`}, and calculate a tree without the
target species:

```{r training_testing_datasets}
perissodactyla.data[-1L,,drop=FALSE] -> perissodactyla.train
perissodactyla.data[1L,,drop=FALSE] -> perissodactyla.test
drop.tip(
  perissodactyla.tree,
  tip = "Dicerorhinus sumatrensis"
) -> perissodactyla.tree.train
```

# Calculating **PEM**

## Edge weighting function

As previously announced, I use the vocabulary of the graph theory when
describing **PEM**: a tree is a (directed) graph, a branch is an edge, and the
root, nodes, and tips are vertices. **PEM** allows one to specify a model of
trait evolution along the edges of the tree. The trait evolution model is a
power function of the edge lengths, with parameters $a$ and $\psi$ describing
the shape of the relationship between the edge lengths and trait evolution rate:

$$
w_{a,\psi}(\phi_{j})=
\begin{cases}
\psi\phi^{\frac{1-a}{2}} & \phi_{j}>0\\
0 & \phi_{j}=0,
\end{cases}
$$
where $a$ is the steepness parameter describing how abrupt the changes in trait
values occur with time following branching, $\psi$ is the evolution rate of the
trait, and $\phi_{j}$ is the length of edge $j$ [@Guenard2013].

```{r display_weighting, echo=FALSE, fig.height=5, fig.width = 7}
par(mar=c(4.5,4.5,1,7) + 0.1)
d <- seq(0, 2, length.out=1000)
a <- c(0,0.33,0.67,1,0.25,0.75,0)
psi <- c(1,1,1,1,0.65,0.65,0.4)
cc <- c(1,1,1,1,1,1,1)
ll <- c(1,2,2,2,3,3,3)
trial <- cbind(a, psi)
colnames(trial) <- c("a","psi")
ntrials <- nrow(trial)
nd <- length(d)
matrix(
  NA,
  ntrials,
  nd,
  dimnames=list(paste("a=", trial[,"a"], ", psi=", trial[,"psi"], sep=""),
                paste("d=", round(d,4), sep=""))
) -> w
for(i in 1:ntrials)
  w[i,] <- MPSEM::PEMweights(d, trial[i,"a"], trial[i,"psi"])
plot(NA, xlim=c(0,2), ylim=c(0,1.6), ylab="Weight", xlab="Distance", axes=FALSE)
axis(1, at=seq(0,2,0.5), label=seq(0,2,0.5))
axis(2, las=1)
text(expression(paste(~~~a~~~~~~~psi)),x=2.2,y=1.57,xpd=TRUE,adj=0)
for(i in 1:ntrials) {
  lines(x=d, y=w[i,], col=cc[i], lty=ll[i])
  text(paste(sprintf("%.2f", trial[i,1]), sprintf("%.2f",trial[i,2]), sep="  "),
       x=rep(2.2,1), y=w[i,1000], xpd=TRUE, adj=0)
}
rm(d,a,psi,cc,ll,trial,ntrials,nd,w,i)
figCounter$register(
  "edgeWeighting",
  paste(
    "Output of the edge weighting function for different sets of parameters",
    "$a$ and $\\psi$."
  )
)
```
```{r, display_weighting_cap, echo=FALSE, results='asis'}
figCounter$getCaption("edgeWeighting")
```

As the steepness parameter increases, the weight assigned to a given edge
increases more sharply with respect to the phylogenetic distance (or
evolutionary time; Fig. `r figCounter$getNumber("edgeWeighting")`). In the
context of **PEM**, the edge weight represent the relative rate of evolution of
the trait; the greater the edge weight, the greater the trait change along that
edge. When $a=0$, trait evolution is neutral and therefore proceeds by random
walk along edges. When $a=1$, edge weights no longer increase as a function of
edge lengths. That situation corresponds to the scenario in which trait
evolution is driven by the strongest possible natural selection: following a
speciation event, trait either change abruptly (directional selection) at the
vertex or do not change at all (stabilizing selection).

Also, the shape parameters may differ for different parts of the phylogenetic
tree or network. For instance, the trait may have evolved neutrally at a steady
rate from the base of a tree up to a certain node, and then, may have been
subjected to different levels of selection pressure and various evolution rate
from some of the nodes towards their descendent tips.

## Phylogenetic graph

The first step to build a **PEM** is to convert the phylogenetic tree. The is
done by giving the tree to function `Phylo2DirectedGraph()` as follows:

```{r convert_to_graph}
Phylo2DirectedGraph(
  perissodactyla.tree.train
) -> perissodactyla.pgraph
```

Here's a snipet showing how the graph container used by **MPSEM** stores graph
information:

```{r graph_storage,echo=FALSE}
str(perissodactyla.pgraph)
```

This list contains two main elements, `$edge` and `$vertex`, plus additional
information. `$edge` and `$vertex` elements each contain a list of sub-elements:

* Element `$edge` is a list containing information about the edges of the graph,
  namely the indices of their origin and destination vertices (the two first
  unnamed elements) and an arbitrary number of supplementary elements storing
  other edge properties. In the present case, a numeric vector created by
  `Phylo2DirectedGraph()` and called `$distance` stores the phylogenetic
  distances ($\phi_{j}$), which, in this example, correspond to the branch
  lengths of `perissodactyla.tree`.

* The element `$vertex` is a list containing an arbitrary number of elements
  storing vertex properties. In the present case, a logical vector created by
  `Phylo2DirectedGraph()` and called `$species` stores whether a given vertex
  represents a species (_i.e._, it is a tip) or not (_i.e._, it is a node).

* In addition to edge and vertex information, the container stores other useful
  information in the form of attributes:

  * `ev` stores the number of edges and vertices.

  * `elabel` stores the edge labels.
  
  * `vlabel` stores the vertex labels.

## Building the eigenvector map

In **MPSEM**, **PEM** are build using function `PEM.build()`. As an example, let
us assume that the steepness and evolution rate are $a=0.25$ and $\psi=2$ among
genus _Equus_, $a=0.8$ and $\psi=0.5$ among genus _Tapirus_, and $a=0$ and
$\psi=1$ from the root of the tree up to the vertex where the two latter genera
begin as well as among the other genera. The following figure will help us
figure out the indices of the edges involved:

```{r tree_labelled, fig.height=5, fig.width = 7}
perissodactyla.tree.train -> tree
paste("N",1L:tree$Nnode) -> tree$node.label

par(mar=c(2,2,2,2))
plot(tree,show.node.label=TRUE)

edgelabels(
  1L:nrow(tree$edge),
  edge=1L:nrow(tree$edge),
  bg="white",
  cex=0.75
)
```
```{r tree_labelled_cap, echo=FALSE, results='asis'}
figCounter$register(
  "trainingTree",
  "The labelled training species tree for this example."
)
figCounter$getCaption("trainingTree")
```


Hence, $a=0.25$ and $\psi=2$ for edges 15--21, $a=0.8$ and $\psi=0.5$ for edges
10--13, and $a=0$ and $\psi=1$ for edges 1--9 and 14:

```{r set_param}
rep(0,attr(perissodactyla.pgraph,"ev")[1L]) -> steepness
rep(1,attr(perissodactyla.pgraph,"ev")[1L]) -> evol_rate

steepness[15L:21] <- 0.25
evol_rate[15L:21] <- 2
steepness[9L:13] <- 0.8
evol_rate[9L:13] <- 0.5
```

The **PEM** is obtained as follows:

```{r calculate_PEM}
PEM.build(
  perissodactyla.pgraph,
  d="distance",
  sp="species",
  a=steepness,
  psi=evol_rate
) -> perissodactyla.PEM
```

In addition to the phylogenetic graph, function `PEM.build()` needs arguments
`d`, the name of the edge property where the phylogenetic distances are stored,
`sp`, the name of the vertex property specifying what vertex is a species, as
well as the user-specified steepness and evolution rate. When the vectors given
to arguments `a` or `psi`, have smaller sizes then the number of edges, the
values are recycled. The default values for `d` and `sp` are those produced by
`Phylo2DirectedGraph()`, and can therefore be omitted in most cases. The object
that **MPSEM** uses to store **PEM** information is rather complex and we will
hereby not browse through it. Method `as.data.frame` can be used to extract the
eigenvector from the `PEM-class` object. For a set of $n$ species, that method
returns a matrix encompassing at most $n-1$ column vectors that can be used in
models to represent phylogenetic structure in traits. Here the phylogenetic
patterns of variation described by two eigenvectors of the **PEM** we calculated
above:

```{r Eigenvector_example, fig.height=4, fig.width=7.5}
layout(matrix(c(1,1,1,2,2,3,3),1L,7L))
par(mar=c(5.1,2.1,4.1,2.1))

## Singular vectors are extracted using the as.data.frame method:
as.data.frame(perissodactyla.PEM) -> perissodactyla.U

plot(perissodactyla.tree.train, x.lim=60, cex=1.5)
plot(y = 1L:nrow(perissodactyla.train), ylab="", xlab = "Loading",
     x = perissodactyla.U[,1L], xlim=0.5*c(-1,1),
     axes=FALSE, main = expression(bold(v)[1]), cex=1.5)
axis(1)
abline(v=0)

plot(y = 1L:nrow(perissodactyla.train), ylab="", xlab = "Loading",
     x = perissodactyla.U[,5L], xlim=0.5*c(-1,1),
     axes=FALSE, main = expression(bold(v)[5]), cex=1.5)
axis(1)
abline(v=0)
```
```{r Eigenvector_example_cap, echo=FALSE, results='asis'}
figCounter$register(
  "eigenvectorExample",
  "Example of two eigenvectors obtained from the training species phylogeny."
)
figCounter$getCaption("eigenvectorExample")
```

The pattern shown by the first eigenvector essentially contrasts Equids and the
other odd-toed ungulate species whereas the pattern shown by the second
eigenvector essentially contrasts tapirs and Rhinocerotids.

## Estimate weighting parameters empirically

Because users do often not have information about the best set of weighting
function parameters to use for modelling, **MPSEM** as has a function called
`PEM.fitSimple()` that allows them to empirically estimate a single value of
parameter $a$ for the whole phylogeny using restricted maximum likelihood.
Function `PEM.fitSimple()` does not estimate parameter $\psi$ because the latter
has no effect when its value is assumed to be constant throughout the phylogeny.
A function to estimate different sets of weighting function parameters for
different portions of the phylogeny has yet to be developed. That function
requires a response variable that will be used to optimize the steepness
parameter (here the $\log_{10}$ neonate weight) as well as lower and upper
bounds for the admissible parameter values and is called as follows:

```{r PEM_opt1}
PEM.fitSimple(
  y = perissodactyla.train[,"log.neonatal.wt"],
  x = NULL,
  w = perissodactyla.pgraph,
  d = "distance",
  sp="species",
  lower = 0,
  upper = 1
) -> perissodactyla.PEM_opt1
```

If other traits are to be used in the model (here the $\log_{10}$ female
weight), they are passed to argument `x` as follows:

```{r PEM_opt2}
PEM.fitSimple(
  y = perissodactyla.train[,"log.neonatal.wt"],
  x = perissodactyla.train[,"log.female.wt"],
  w = perissodactyla.pgraph,
  d = "distance",
  sp="species",
  lower = 0,
  upper = 1
) -> perissodactyla.PEM_opt2
```

The results of the latter calls are a **PEM** similar to that obtained using
`PEM.build()`, with additional information resulting from the optimization
process. It is noteworthy that estimates of the steepness parameter (stored as
element `$optim\$par` of the **PEM** objects) and, consequently, the resulting
phylogenetic eigenvectors, will be different depending on the use of auxiliary
traits. In the example above, for instance, $a$ was estimated to
`r round(perissodactyla.PEM_opt1$optim$par,2)` by `PEM.fitSimple()` when no
auxiliary trait is involved (first call) and to
`r round(perissodactyla.PEM_opt2$optim$par,2)` when the female weight is used as
an auxiliary trait (second call).

## Phylogenetic modelling

To model trait values, **PEM** are used as descriptors in other modelling
method. Any suitable method can be used. For instance, package **MPSEM**
contains a utility function called `lmforwardsequentialAICc()` that does
step-wise variable addition in multiple regression analysis on the basis of the
corrected Akaike Information Criterion (AICc) [@Hurvich1993]:

```{r build_PEM_models}
lmforwardsequentialAICc(
  y = perissodactyla.train[,"log.neonatal.wt"],
  object = perissodactyla.PEM_opt1
) -> lm1
summary(lm1)

lmforwardsequentialAICc(
  y = perissodactyla.train[,"log.neonatal.wt"],
  x = perissodactyla.train[,"log.female.wt",drop=FALSE],
  object = perissodactyla.PEM_opt2
) -> lm2
summary(lm2) 
```

> It is noteworthy that in order to pass a single auxiliary trait to
> `lmforwardsequentialAICc()`, it is necessary, as exemplified above, to set
> `drop=FALSE` to the extraction operator (`[]`) to keep the `data.frame`
> property of the object. Hence, it is crucial that the auxiliary trait values
> be referenced with the same variable names in all `data.frame` objects
> involved in the analysis.
> 
> Since, the extraction operator drops the `data.frame` property on its output
> object by default whenever a single column is selected, it is mandatory to set
> `drop=FALSE` when a single variable is (or could be) selected. A `data.frame`
> object is required for making predictions using the resulting linear model.

To obtain predictions, we need to calculate the locations of the target species
with respect to the phylogeny of the model species. This is accomplished by
`getGraphLocations()`, to which we give the tree for all species (model +
targets) and the names (or indices) of the target species. Then, we use the
`predict()` method for **PEM** objects. The latter takes, in addition to the
**PEM** object, the locations of the target species as obtained by
`getGraphLocations()`, an `lm` (or `glm`) object involving the eigenvectors of
the **PEM**, and a table of auxiliary trait values for the target species, which
can be omitted if no auxiliary trait is present in the linear model.

```{r make_prediction}
getGraphLocations(
  perissodactyla.tree,
  targets = "Dicerorhinus sumatrensis"
) -> perissodactyla.loc

predict(
  object = perissodactyla.PEM_opt2,
  targets = perissodactyla.loc,
  lmobject = lm2,
  newdata = perissodactyla.test,
  interval = "prediction",
  level = 0.95) -> pred
```

Here, the predicted neonatal weight for the Sumatran rhinoceros is
`r round((10^pred$values)/1000,1)` $\,\mathrm{kg}$ and the bounds of the $95\%$
prediction interval are `r round((10^pred$lower)/1000,1)` and
`r round((10^pred$upper)/1000,1)` $\,\mathrm{kg}$, while the observed value was
actually `r round((10^perissodactyla.test$log.neonatal.wt)/1000,1)`
$\,\mathrm{kg}$.

# Cross-validating **PEM** predictions

Here is and example of how to perform a leave-one-out cross-validation of a data
set using the **R** code from the previous two sections. Predictions will be
added to table `perissodactyla.data`:

```{r cross-validation}
data.frame(
  perissodactyla.data,
  predictions = NA,
  lower = NA,
  upper = NA
) -> perissodactyla.data

jackinfo <- list()
for(i in 1L:nrow(perissodactyla.data)) {
  jackinfo[[i]] <- list()
  getGraphLocations(
    perissodactyla.tree,
    targets = rownames(perissodactyla.data)[i]
  ) -> jackinfo[[i]][["loc"]]
  PEM.fitSimple(
    y = perissodactyla.data[-i,"log.neonatal.wt"],
    x = perissodactyla.data[-i,"log.female.wt"],
    w = jackinfo[[i]][["loc"]]$x
  ) -> jackinfo[[i]][["PEM"]]
  lmforwardsequentialAICc(
    y = perissodactyla.data[-i,"log.neonatal.wt"],
    x = perissodactyla.data[-i,"log.female.wt",drop=FALSE],
    object = jackinfo[[i]][["PEM"]]
  ) -> jackinfo[[i]][["lm"]]
  predict(
    object = jackinfo[[i]][["PEM"]],
    targets = jackinfo[[i]][["loc"]],
    lmobject = jackinfo[[i]][["lm"]],
    newdata = perissodactyla.data[i,"log.female.wt",drop=FALSE],
    interval = "prediction",
    level = 0.95
  ) -> predictions
  unlist(predictions) -> perissodactyla.data[i, c("predictions", "lower", "upper")]
}
rm(i, predictions)
```

Because the result of `getGraphLocations()` includes the phylogenetic graph
without the target species _Dicerorhinus sumatrensis_, which was removed from
the tree given as the argument `tpall` and can be found as its element `$x`, it
is not necessary to re-calculate the tree with the target species dropped, as
well as the phylogenetic graph, as we did previously for explanatory purposes.
Also, the internal information about each cross-validation steps is stored into
a list (hereby called `jackinfo`), in order for the details of the analyses to
be accessible later on.

```{r plot_pred_obs, echo=FALSE, fig.height=7, fig.width=7}
par(mar=c(5,5,2,2)+0.1)
range(
  perissodactyla.data[,"log.neonatal.wt"],
  perissodactyla.data[,c("predictions","lower","upper")]
) -> rng

plot(NA, xlim = rng, ylim = rng, xlab = "observed", ylab = "Predicted",
     asp = 1, las = 1)
points(
  x = perissodactyla.data[,"log.neonatal.wt"],
  y = perissodactyla.data[,"predictions"]
)

abline(0,1)
arrows(
  x0 = perissodactyla.data[,"log.neonatal.wt"],
  x1 = perissodactyla.data[,"log.neonatal.wt"],
  y0 = perissodactyla.data[,"lower"],
  y1 = perissodactyla.data[,"upper"],
  length = 0.05,
  angle = 90,
  code = 3
)

figCounter$register(
  "crossPreds",
  paste(
    "Leave-one-out crossvalidated prediction of the neonatal weight for",
    nrow((perissodactyla.data)),
    "odd-toed ungulate species."
  )
)
```
```{r plot_pred_obs_cap, echo=FALSE, results='asis'}
figCounter$getCaption("crossPreds")
```

From the present cross-validation, we found that the ($\log_{10}$) neonatal body
mass can be predicted with a cross-validated $R^{2}$ of
`r round(1-(sum((perissodactyla.data[,"predictions"]-perissodactyla.data[,"log.neonatal.wt"])^2)/nrow(perissodactyla.data))/var(perissodactyla.data[,"log.neonatal.wt"]),2)`
(Fig. `r figCounter$getNumber("crossPreds")`).

# Other utility functions

## Influence matrix

The influence matrix is used internally to calculate **PEM**. It is a matrix
having as many rows as the number of vertices (species + nodes) and as many
columns as the number of edges. Any given element of the influence matrix is
coding whether a vertex, which is represented a row of the matrix is influenced
an edge, which is represented by a column of the matrix. In the context of
**PEM**, a vertex is influenced by an edge when the former has ancestors on the
latter or, in other words, when an edge is on the path leading from a tip to the
root of the tree. The influence matrix is obtained as follows:

```{r influence_matrix}
InflMat(perissodactyla.pgraph) -> res
res
```

## Update and forced **PEM** parameters

The calculation of the influence matrix performed by `PEM.build()` for a given
phylogenetic graph need not be done every time new weighting function parameters
are to be tried. For that reason, **MPSEM** provides a function called
`PEM.updater()` that takes a previously calculated **PEM** object, applies new
edge weighting, and recalculates the phylogenetic eigenvectors:

```{r PEM_updater}
PEM.updater(object = perissodactyla.PEM, a = 0, psi = 1) -> res
res
```

The result of `PEM.build()` and `PEM.updater()` does not contain all the
information necessary to predict trait values. Hence, neither of these functions
is given information about the response variable and auxiliary traits. To
perform these preliminary calculations, **MPSEM** provides the user with
function `PEM.forcedSimple()` that produce the same output as `PEM.fitSimple()`
with user-provided values of weighting parameters. It is called as follows:

```{r forcedSimple}
PEM.forcedSimple(
  y = perissodactyla.train[,"log.neonatal.wt"],
  x = perissodactyla.train[,"log.female.wt"],
  w = perissodactyla.pgraph,
  a = steepness,
  psi = evol_rate
) -> res
res
```

It is noteworthy that contrary to function `PEM.fitSimple()`, function
`PEM.forcedSimple()` can be used to apply different weighting parameters for
different edges.

## **PEM** scores

**PEM** scores are the values of target species on the eigenfunctions underlying
the **PEM**. These scores are calculated from the graph locations and a **PEM**
object using function `Locations2PEMscores()` as follows:

```{r get_scores}
Locations2PEMscores(
  object = perissodactyla.PEM_opt2,
  gsc = perissodactyla.loc
) -> scores
scores
```

The function is used internally by `PEM-class` `predict` method, and therefore
need not be called when performing linear phylogenetic modelling as exemplified
above. It comes in handy when the **PEM** is used together with other modelling
approaches (e.g. multivariate regression trees, linear discriminant analysis,
artificial neural networks) that have `predict` methods that are not specially
adapted for phylogenetic modelling.

## Miscellaneous

Package **MPSEM** comes with functions, some implemented in the **C** language,
to simulate quantitative traits evolution by Ornstein-Uhlenbeck process on
potentially large phylogenies [@Butler2004]. These functions are only useful to
perform simulations, which is a rather advanced matter outside the scope of the
present tutorial. I refer the user to **MPSEM** help files for further details.

In addition to function `Phylo2DirectedGraph()`, which we have seen previously
**MPSEM** also has built-in graph manipulation functions to populate a graph
with vertices, add and remove vertices and edges, etc. These functions were
mainly intended to be called internally by **MPSEM** functions. They were made
visible upon loading the package because of their potential usefulness to some
advanced applications that are outside the scope of the present tutorial. Again,
I refer the user to **MPSEM** help files for further details.

# References