---
title: "Exporting objects and functions from the workspace"
author: "Phil Chalmers"
date: "`r Sys.Date()`"
output:
  html_document:
    fig_caption: false
    number_sections: true 
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
vignette: >
  %\VignetteIndexEntry{Exporting objects and functions from the workspace}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r nomessages, echo = FALSE}
knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  fig.height = 5,
  fig.width = 5
)
options(digits=4)
par(mar=c(3,3,1,1)+.1)
```


# Including fixed objects

R is a fun language for computer programming and statistics, but it's not without its quirks. For instance, R generally uses a recursive strategy when attempting to find objects within functions. If an object can't be found inside the function, R looks in the environment enclosing the function to see if the object can be located there, and if not, looks within even higher-level environments. This recursive search continues up to the user workspace/Global environment (and the packages attached beyond it), and only when the object can't be found anywhere along this path is an error thrown. This feature surprises most programmers who come from other languages, and when writing simulations it can cause some severely unwanted issues. This tutorial demonstrates how to make sure all required user-defined objects are visible to `SimDesign`. 

# Scoping

```{r include=FALSE}
set.seed(1234)
```

To demonstrate the issue, let's define two objects and a function which uses these objects. 

```{r}
obj1 <- 10
obj2 <- 20
```

When evaluated, these objects are visible to the user, and can be listed by typing `ls()` in the R console. Functions that do not define objects with the same names will also be able to locate these values.

```{r}
myfun <- function(x) obj1 + obj2
myfun(1)
```
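
As a minimal sketch of the lookup rule, the chunk below contrasts a function that relies on the workspace with one that defines its own local copy. The names `scope_val`, `uses_global()`, and `uses_local()` are hypothetical and exist only for this illustration.

```{r}
# 'scope_val' is a hypothetical object living in the workspace
scope_val <- 'global'

# no local definition, so R falls back to the workspace copy
uses_global <- function() scope_val

# a local definition with the same name takes precedence inside the function
uses_local <- function() {
    scope_val <- 'local'
    scope_val
}

uses_global()
uses_local()
scope_val   # the workspace copy is unchanged by uses_local()
```

The local assignment in `uses_local()` only exists for the duration of the call, which is why the workspace value is untouched afterwards.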

This behavior is indeed a bit strange, but it's one of R's quirks. Unfortunately, when running code in parallel across different cores these objects *will not be visible*, and therefore must be exported using other methods (e.g., in the `parallel` package this is done with `clusterExport()`). 

```{r eval = FALSE}
library(parallel)
cl <- makeCluster(2)
res <- try(parSapply(cl=cl, 1:4, myfun))
res
```

```{r echo=FALSE}
library(parallel)
cl <- makeCluster(2)
cat("Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: object 'obj1' not found")
```


Exporting the objects to the cluster fixes the issue. 

```{r}
clusterExport(cl=cl, c('obj1', 'obj2'))
parSapply(cl=cl, 1:4, myfun)
```
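
When it is unclear which objects a function silently pulls from the workspace, one possible check is `findGlobals()` from the `codetools` package (a recommended package that ships with most R installations, assumed here to be installed). The function `worker_fun()` and the objects `offset_a` and `offset_b` are hypothetical names used only for this sketch.

```{r eval=FALSE}
# hypothetical function that relies on two objects it never defines
worker_fun <- function(x) x + offset_a * offset_b

# merge = FALSE separates the global functions from the global variables
globals <- codetools::findGlobals(worker_fun, merge = FALSE)

# candidate objects that would need exporting before running in parallel
globals$variables
```

The reported variables are only candidates: the analysis is static, so it is worth reviewing the list by hand before passing it to `clusterExport()`.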

The same reasoning applies to functions defined in the R workspace as well, including functions defined within external R packages. Hence, in order to use functions from other packages they must either be explicitly loaded with `require()` or `library()` within the distributed code, or referenced via their namespace with the `::` operator (e.g., `mvtnorm::rmvnorm()`).

```{r echo=FALSE}
stopCluster(cl)
```
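
The two options for package functions can be sketched as follows. This example assumes the recommended `MASS` package is installed and uses a fresh, hypothetical cluster object `cl2`; the helper names `draw_qualified()` and `draw_attached()` are likewise made up for illustration.

```{r eval=FALSE}
library(parallel)
cl2 <- makeCluster(2)

# Option 1: reference the function through its namespace
draw_qualified <- function(i) MASS::mvrnorm(n = 1, mu = c(0, 0), Sigma = diag(2))
out1 <- parSapply(cl = cl2, 1:4, draw_qualified)

# Option 2: attach the package on every worker before calling the function
clusterEvalQ(cl2, library(MASS))
draw_attached <- function(i) mvrnorm(n = 1, mu = c(0, 0), Sigma = diag(2))
out2 <- parSapply(cl = cl2, 1:4, draw_attached)

stopCluster(cl2)

# each call returns a vector of length 2, so the results form 2 x 4 matrices
dim(out1)
dim(out2)
```

The `::` form keeps each function self-describing, while `clusterEvalQ()` is convenient when many functions from the same package are used.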

# Exporting objects example 

In order to make objects safely visible in `SimDesign` the strategy is very simple: wrap all desired objects into a named list (or other object), and pass this to the `fixed_objects` argument. From here, elements can be indexed using the `$` operator or `with()` function, or whatever other method may be convenient. Note, however, that this is only required for defined *objects*, not *functions*; `SimDesign` automatically makes user-defined functions available across all nodes.
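
The two indexing styles mentioned above can be tried on any named list in plain R. The list `demo_fixed` and its elements are hypothetical and only stand in for whatever would be passed to `fixed_objects`.

```{r}
# hypothetical named list standing in for 'fixed_objects'
demo_fixed <- list(SD = 2, mu = 10)

# explicit indexing with $
demo_fixed$SD

# with() evaluates the expression using the list elements as variables
with(demo_fixed, mu + SD)
```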

As an aside, an alternative approach is simply to define/source the objects within the respective `SimDesign` functions; that way they will clearly be visible at runtime. The following `fixed_objects` approach is really only useful when the defined objects contain a large amount of code.

```{r}
library(SimDesign)
#SimFunctions(comments = FALSE)

### Define design conditions and number of replications
Design <- createDesign(N = c(10, 20, 30))
replications <- 1000

# define custom functions and objects (or use source() to read these in from an external file)
SD <- 2
my_gen_fun <- function(n, sd) rnorm(n, sd = sd)
my_analyse_fun <- function(x) c(p = t.test(x)$p.value)
fixed_objects <- list(SD=SD)

#---------------------------------------------------------------------------

Generate <- function(condition, fixed_objects) {
    Attach(condition) # make condition names available (e.g., N)
    
    # further, can use with() to use 'SD' directly instead of 'fixed_objects$SD'
    ret <- with(fixed_objects, my_gen_fun(N, sd=SD))
    ret
}

Analyse <- function(condition, dat, fixed_objects) {
    ret <- my_analyse_fun(dat)
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- EDR(results, alpha = .05)
    ret
}

#---------------------------------------------------------------------------

### Run the simulation
res <- runSimulation(Design, replications, verbose=FALSE, fixed_objects=fixed_objects,
                     generate=Generate, analyse=Analyse, summarise=Summarise, debug='none')
res
```

By placing objects in a list and passing this to `fixed_objects`, the objects are safely exported to all relevant functions. As a consequence, the same code also runs correctly in parallel (see below), because all objects are properly exported to each core. 

```{r eval=FALSE}
res <- runSimulation(Design, replications, verbose=FALSE, fixed_objects=fixed_objects,
                     generate=Generate, analyse=Analyse, summarise=Summarise, debug='none',
                     parallel = TRUE)
```
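
To see why passing a list as an argument is safe in parallel, the following base-R sketch mimics the pattern with the `parallel` package directly. It is not a description of how `SimDesign` is implemented internally; the names `cl3`, `consts`, and `worker()` are hypothetical.

```{r}
library(parallel)
cl3 <- makeCluster(2)

# hypothetical constants, bundled in a named list
consts <- list(SD = 2)

# the list arrives as an ordinary argument, so nothing is looked up in the workspace
worker <- function(n, fixed_objects) with(fixed_objects, n * SD)

# extra arguments to parSapply() are sent to each worker along with the function
out <- parSapply(cl = cl3, c(10, 20, 30), worker, fixed_objects = consts)
stopCluster(cl3)
out
```

Because the list travels with the function call, no separate `clusterExport()` step is needed for its contents.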

Again, remember that this is only required for R **objects**, NOT for user-defined functions!