---
title: "RCLabels"
author: "Matthew Kuperus Heun"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{RCLabels}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include = FALSE}
library(magrittr)
library(RCLabels)
```

## Introduction

Working with matrices
often requires manipulating row and column labels 
to achieve desired outcomes for matrix mathematics.
The `RCLabels` package (Row and Column Labels)
provides convenient tools for manipulating those labels.


## Use cases

Two applications of matrix mathematics are 
input-output analysis in economics and
physical supply-use table (PSUT) matrices 
for energy conversion chain (ECC) analysis.
In those contexts, 
row and column labels describe processing stages or
flows of goods or services between processing stages.
Respecting row and column labels during matrix operations
benefits those applications
by ensuring that only like quantities are 
added, subtracted, multiplied, or divided.
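
As a small illustration of why labels matter,
the following base-R sketch adds two matrices 
whose rows and columns are in different orders.
The matrices `A` and `B` and their values are invented for this example,
and the manual reordering is only a stand-in for label-aware arithmetic;
it is not meant to show how any package implements it.

```{r}
# Two hypothetical matrices with the same labels in different orders.
A <- matrix(c(1, 2, 3, 4), nrow = 2,
            dimnames = list(c("Coal", "Oil"), c("Industry", "Transport")))
B <- matrix(c(10, 20, 30, 40), nrow = 2,
            dimnames = list(c("Oil", "Coal"), c("Transport", "Industry")))

# Positional addition ignores the labels and mixes unlike quantities.
A + B

# Reordering B by the labels of A before adding
# ensures that like quantities are summed.
B_aligned <- B[rownames(A), colnames(A)]
sum_by_name <- A + B_aligned
sum_by_name
```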

One package that respects row and column labels is `matsbyname`,
thereby making economic and ECC input-output analyses easier.
Easy manipulation of row and column labels
is, therefore, an enabling capability for using the `matsbyname` package.
This package (`RCLabels`) provides easy manipulation of row and column labels.
In fact, the `matsbyname` package uses `RCLabels` functions internally.


## Label structure

Row and column labels are always character strings, 
often with a prefix--suffix structure, 
where the prefix and suffix are split by a separator
or delimited in other ways.
Example row and column labels include

* "pref -> suff" (separator "->")
* "pref [suff]" (suffix delimited by “ [” and “]”)
* "(pref) (suff)" (prefix and suffix both surrounded by “(” and “)”)
* "pref.suff" (separator ".")

Prefixes are usually the "thing" of interest, e.g.
an energy carrier ("Coal") or 
a processing stage in an energy conversion chain
("Main activity producer electricity plants"). 
Suffixes are usually modifiers or metadata about the thing (the prefix).
Suffixes can describe the destination of an energy carrier 
("Light [-> Industry in USA]").
Suffixes can describe the output of a processing stage 
("Production [of Coal in ZAF]"). 
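
To make the prefix--suffix idea concrete,
here is a minimal base-R sketch that pulls apart a label
whose suffix is delimited by " [" and "]".
The helper `split_bracket_label()` is hypothetical and exists only for this illustration;
it assumes that the label contains " [" and ends with "]".

```{r}
# Hypothetical helper: split a "pref [suff]" label with base R only.
split_bracket_label <- function(label) {
  # Position of the first " [", which marks the end of the prefix.
  start <- as.integer(regexpr(" [", label, fixed = TRUE))
  list(
    # Everything before " [" is the prefix.
    pref = substr(label, 1, start - 1),
    # Everything between " [" and the final "]" is the suffix.
    suff = substr(label, start + 2, nchar(label) - 1)
  )
}

pieces <- split_bracket_label("Light [-> Industry in USA]")
pieces
```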


## Working with row and column labels

The `RCLabels` package streamlines working with row and column labels.


### Notation

`RCLabels` enables creation of notation objects that describe the structure 
of a row or column label
via the `notation_vec()` function.

```{r}
# Create a notation object.
my_notation <- notation_vec(pref_start = "(", pref_end = ") ",
                            suff_start = "[", suff_end = "]")

# Notation objects are character vectors.
my_notation
```

Several notation objects are provided for convenience within `RCLabels`.

```{r}
arrow_notation
paren_notation
bracket_notation
first_dot_notation
from_notation
of_notation
to_notation
bracket_arrow_notation
```

Note that identical `pref_end` and `suff_start` values
(as shown in all notations above)
are interpreted as a single delimiter throughout the `RCLabels` package.
Empty strings (`""`) mean that no indication is given
for the start or end of a prefix or suffix.
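
The single-delimiter rule can be sketched in base R as follows.
The helper `paste_label()` and the two notation vectors are assumptions made for this illustration;
they mimic the four named elements of a notation object
but are not the package's own code.

```{r}
# Hypothetical helper: build a label from a prefix, a suffix,
# and a named character vector with four notation elements.
paste_label <- function(pref, suff, notation) {
  if (notation[["pref_end"]] == notation[["suff_start"]]) {
    # Identical pref_end and suff_start are written only once.
    join <- notation[["pref_end"]]
  } else {
    join <- paste0(notation[["pref_end"]], notation[["suff_start"]])
  }
  paste0(notation[["pref_start"]], pref, join, suff, notation[["suff_end"]])
}

# Empty strings mean "no delimiter at this position".
arrow <- c(pref_start = "", pref_end = " -> ", suff_start = " -> ", suff_end = "")
arrow_label <- paste_label("a", "b", arrow)
arrow_label

# Different pref_end and suff_start values are both written.
parens_brackets <- c(pref_start = "(", pref_end = ") ", suff_start = "[", suff_end = "]")
pb_label <- paste_label("a", "b", parens_brackets)
pb_label
```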


### Creating row and column labels

Row and column labels can be created with the `paste_pref_suff()` function.

```{r}
my_label <- paste_pref_suff(pref = "Coal", suff = "from Coal mines in USA", 
                            notation = my_notation)
my_label
```


### Manipulating row and column labels (prefixes and suffixes)

Row and column labels can be manipulated using several helpful functions.

```{r}
# Split the prefix from the suffix to obtain a named list of strings.
split_pref_suff(my_label, notation = my_notation)

# Flip the prefix and suffix, maintaining the same notation.
flip_pref_suff(my_label, notation = my_notation)

# Change the notation.
switch_notation(my_label, from = my_notation, to = paren_notation)

# Change the notation and flip the prefix and suffix.
switch_notation(my_label, from = my_notation, to = paren_notation, flip = TRUE)
```

The prefix or suffix can be extracted from a row or column label.

```{r}
get_pref_suff(my_label, which = "pref", notation = my_notation)
get_pref_suff(my_label, which = "suff", notation = my_notation)
```


### Vectors and lists of row and column labels

The functions in `RCLabels` work with vectors and lists of row and column labels.

```{r}
labels <- c("a [of b in c]", "d [of e in f]", "g [of h in i]")
labels

split_pref_suff(labels, notation = bracket_notation)
```

This feature means that the functions in `RCLabels` can be used on data frames.
Note that `transpose = TRUE` ensures that a single list
column is created.

```{r}
labels

df <- tibble::tibble(labels = labels)
result <- df %>% 
  dplyr::mutate(
    split = split_pref_suff(labels, notation = bracket_notation, transpose = TRUE)
  )
result$split[[1]]
result$split[[2]]
result$split[[3]]
```


## Nouns and prepositions

As discussed above, the prefix is often the "thing" of interest, and
the remainder of the label (the suffix) modifies the prefix.
This use case is so common that we introduce two more terms, 
which enable additional functionality.
The prefix is usually a _noun_ (one or more words), and
the suffix usually consists of _prepositional phrases_
(each consisting of a preposition and an object).
`RCLabels` includes a list of common prepositions.

```{r}
prepositions_list
```
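
To show how a list of prepositions can carve a suffix into prepositional phrases,
here is a minimal base-R sketch.
The helper `split_pps()` is hypothetical;
it assumes that the suffix begins with a preposition
and that words are separated by single spaces.

```{r}
# Hypothetical helper: split a suffix into objects named by their prepositions.
split_pps <- function(suff, prepositions) {
  words <- strsplit(suff, " ", fixed = TRUE)[[1]]
  # Each preposition starts a new phrase.
  phrase_id <- cumsum(words %in% prepositions)
  phrases <- split(words, phrase_id)
  # The first word of each phrase is the preposition; the rest is its object.
  preps <- vapply(phrases, function(p) p[1], character(1), USE.NAMES = FALSE)
  objects <- vapply(phrases, function(p) paste(p[-1], collapse = " "),
                    character(1), USE.NAMES = FALSE)
  names(objects) <- preps
  objects
}

pps <- split_pps("of Coal in South Africa", prepositions = c("of", "in"))
pps
```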


## Working with row and column labels (nouns and prepositions)

`RCLabels` supports the "nouns and prepositions" view
of row and column labels 
with several functions
that extract nouns, prepositional phrases, prepositions, and objects of prepositions.

```{r}
labels

# Extract the nouns.
get_nouns(labels, notation = bracket_notation)

# Extract the prepositional phrases.
get_pps(labels, notation = bracket_notation)

# Extract the prepositions themselves.
get_prepositions(labels, notation = bracket_notation)

# Extract the objects of the prepositions.
# Objects are named by the preposition of their phrase.
get_objects(labels, notation = bracket_notation)

# The get_piece() function is a convenience function
# that extracts just what you want.
get_piece(labels, piece = "noun", notation = bracket_notation)
get_piece(labels, piece = "pref")
get_piece(labels, piece = "suff")
get_piece(labels, piece = "of")
get_piece(labels, piece = "in")
# An empty string is returned when the preposition is missing.
get_piece(labels, piece = "bogus")
```

Labels can be split into their component pieces.

```{r}
labels
# Split the labels into pieces, named by "noun" and prepositions.
split_labels <- split_noun_pp(labels, 
                             prepositions = prepositions_list, 
                             notation = bracket_notation)
split_labels

# Recombine split labels.
paste_noun_pp(split_labels, notation = bracket_notation)

# Recombine with a new notation.
paste_noun_pp(split_labels, notation = paren_notation)
```


## Modifying row and column labels

To modify row and column labels, use one of the `modify_*` functions.

```{r}
labels

# Set new values for nouns.
modify_nouns(labels, 
             new_nouns = c("Coal", "Oil", "Natural gas"), 
             notation = bracket_notation)
```

To modify other pieces of labels, use the `modify_label_pieces()` function.
`modify_label_pieces()` enables assigning new values using a "one-to-many" approach
that enables aggregation.

```{r}
labels

# Change nouns in several labels to "Production" and "Manufacture",
# as indicated by the modification map.
modify_label_pieces(labels, 
                    piece = "noun", 
                    mod_map = list(Production = c("a", "b", "c", "d"),
                                   Manufacture = c("g", "h", "i", "j")), 
                    notation = bracket_notation)

# Change the objects of the "in" preposition, 
# according to the modification map.
modify_label_pieces(labels, 
                    piece = "in", 
                    mod_map = list(GHA = "c", ZAF = c("f", "i")), 
                    notation = bracket_notation)

# Change the objects of "of" prepositions,
# according to the modification map.
modify_label_pieces(labels, 
                    piece = "of", 
                    mod_map = list(Coal = "b", `Crude oil` = c("e", "h")), 
                    notation = bracket_notation)
```
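
The "one-to-many" idea behind a modification map can be sketched in base R.
In the map, each name is a new value and each element lists the old values it replaces,
so several old values can be aggregated into one new value.
The variable names below are invented for this illustration,
and the lookup approach is an assumption, not necessarily how `modify_label_pieces()` works internally.

```{r}
# A modification map: new value = old values to be replaced.
mod_map <- list(GHA = "c", ZAF = c("f", "i"))

# Invert the map into a lookup vector: names are old values, elements are new values.
lookup <- stats::setNames(rep(names(mod_map), lengths(mod_map)),
                          unlist(mod_map, use.names = FALSE))
lookup

# Replace only the pieces that appear in the map; leave the rest unchanged.
old_pieces <- c("c", "f", "i", "x")
new_pieces <- old_pieces
hit <- old_pieces %in% names(lookup)
new_pieces[hit] <- unname(lookup[old_pieces[hit]])
new_pieces
```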

To eliminate a piece of a label altogether, use the `remove_label_pieces()` function.

```{r}
labels

# Eliminate all of the prepositional phrases that begin with "in".
remove_label_pieces(labels, 
                    piece = "in", 
                    notation = bracket_notation)

# Eliminate all of the prepositional phrases that begin with "of" or "in".
# Note that some spaces remain.
remove_label_pieces(labels, 
                    piece = c("of", "in"), 
                    notation = bracket_notation)
```

With much power comes much responsibility!


## Detecting strings in labels

There are times when it is helpful to know if a string is in a label.
`match_by_pattern()` searches for matches in row and column labels
by regular expression.
Internally, `match_by_pattern()` uses `grepl()` for regular expression matching.

```{r}
labels <- c("Production [of b in c]", "d [of Coal in f]", "g [of h in USA]")

# With default `pieces` argument, matching is done for whole labels.
match_by_pattern(labels, regex_pattern = "Production")
match_by_pattern(labels, regex_pattern = "Coal")
match_by_pattern(labels, regex_pattern = "USA")

# Check beginnings of labels: match!
match_by_pattern(labels, regex_pattern = "^Production")
# Check at ends of labels: no match!
match_by_pattern(labels, regex_pattern = "Production$")

# Search by prefix or suffix.
match_by_pattern(labels, regex_pattern = "Production", pieces = "pref")
match_by_pattern(labels, regex_pattern = "Production", pieces = "suff")
# When pieces is "pref" or "suff", only one can be specified.
# The following function call gives an error.
# match_by_pattern(labels, regex_pattern = "Production", pieces = c("pref", "to"))

# Search by noun or preposition.
match_by_pattern(labels, regex_pattern = "Production", pieces = "noun")
match_by_pattern(labels, regex_pattern = "Production", pieces = "in")
# Searching can be done with complicated regex patterns.
match_by_pattern(labels, 
                 regex_pattern = make_or_pattern(c("c", "f")),
                 pieces = "in")
match_by_pattern(labels,
                 regex_pattern = make_or_pattern(c("b", "Coal", "USA")),
                 pieces = "in")
match_by_pattern(labels,
                 regex_pattern = make_or_pattern(c("b", "Coal", "USA")),
                 pieces = c("of", "in"))
# Works with custom lists of prepositions.
match_by_pattern(labels,
                 regex_pattern = make_or_pattern(c("b", "Coal", "GBR", "USA")),
                 pieces = c("noun", "of", "in", "to"),
                 prepositions = c("of", "to", "in"))
```
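
Because matching relies on `grepl()`, 
anchors in the pattern decide whether a whole piece or only part of it must match.
The base-R sketch below builds an anchored "or" pattern by hand;
the pattern string is an assumption for illustration
and is not necessarily the string that `make_or_pattern()` returns.

```{r}
# Candidate strings, e.g., objects of the "in" preposition.
objects <- c("c", "f", "USA", "cf")

# Anchored alternatives match only whole strings.
or_pattern <- paste0("^", c("c", "f"), "$", collapse = "|")
or_pattern
exact <- grepl(or_pattern, objects)
exact

# Without anchors, the pattern also matches inside longer strings.
loose <- grepl("c|f", objects)
loose
```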


## Replacing strings in labels

There are times when it is helpful to replace strings in labels.
The `replace_by_pattern()` function will replace strings
in row and column labels by regular expression pattern.
Note that `replace_by_pattern()` is similar to `match_by_pattern()`, 
except `replace_by_pattern()` has an additional argument, `replacement`.
Internally, `replace_by_pattern()` uses `gsub()` 
to perform regular expression matching and replacement.

```{r}
labels <- c("Production [of b in c]", "d [of Coal in f]", "g [of h in USA]")
labels

# If `pieces = "all"` (the default), the entire label is available for replacements.
replace_by_pattern(labels,
                   regex_pattern = "Production",
                   replacement = "Manufacture")
replace_by_pattern(labels,
                   regex_pattern = "Coal",
                   replacement = "Oil")
replace_by_pattern(labels,
                   regex_pattern = "USA",
                   replacement = "GHA")

# Replace by prefix and suffix.
replace_by_pattern(labels,
                   regex_pattern = "Production",
                   replacement = "Manufacture",
                   pieces = "pref")
replace_by_pattern(labels,
                   regex_pattern = "Coa",
                   replacement = "Bow",
                   pieces = "suff")
# Nothing should change, because USA is in the suffix.
replace_by_pattern(labels,
                   regex_pattern = "SA",
                   replacement = "SSR",
                   pieces = "pref")
# Now USA --> USSR, because USA is in the suffix.
replace_by_pattern(labels,
                   regex_pattern = "SA",
                   replacement = "SSR",
                   pieces = "suff")
# This will throw an error, because "pref" and "suff" cannot be specified together.
# replace_by_pattern(labels,
#                    regex_pattern = "SA",
#                    replacement = "SSR",
#                    pieces = c("pref", "suff"))

# Replace by noun or preposition.
replace_by_pattern(labels,
                   regex_pattern = "Production",
                   replacement = "Manufacture",
                   pieces = "noun")
replace_by_pattern(labels,
                   regex_pattern = "^Pro",
                   replacement = "Con",
                   pieces = "noun")
# Won't match: wrong side of string.
replace_by_pattern(labels,
                   regex_pattern = "Pro$",
                   replacement = "Con",
                   pieces = "noun")
# No change, because "Production" is a noun, not the object of "of".
replace_by_pattern(labels,
                   regex_pattern = "Production",
                   replacement = "Manufacture",
                   pieces = "of")
# Now try with "of".
replace_by_pattern(labels,
                   regex_pattern = "Coal",
                   replacement = "Oil",
                   pieces = "of")
# No change, because "Coal" is not "in" anything.
replace_by_pattern(labels,
                   regex_pattern = "Coal",
                   replacement = "Oil",
                   pieces = "in")
# Now try with "in".
replace_by_pattern(labels,
                   regex_pattern = "USA",
                   replacement = "GBR",
                   pieces = "in")
replace_by_pattern(labels,
                   regex_pattern = "A$",
                   replacement = "upercalifragilisticexpialidocious",
                   pieces = "in")

```
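
Since replacement relies on `gsub()`, 
every match within a string is replaced, not just the first one.
The following base-R sketch, with strings invented for this illustration,
contrasts `gsub()` with `sub()` and shows the effect of an end-of-string anchor.

```{r}
x <- c("USA", "Production", "USA to USA")

# gsub() replaces every occurrence of the pattern.
all_replaced <- gsub("SA", "SSR", x)
all_replaced

# sub() replaces only the first occurrence in each string.
first_replaced <- sub("SA", "SSR", x)
first_replaced

# An anchor restricts the match to the end of each string.
end_replaced <- gsub("A$", "B", x)
end_replaced
```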


## Conclusion

The `RCLabels` package streamlines the manipulation of row and column labels 
for matrices. 
Applications include input-output analysis in economics, 
energy conversion chain analysis,
and any other setting where row and column labels are important for matrix mathematics.