Non-Standard Evaluation (NSE hereafter) occurs when R expressions are captured and evaluated differently from how they would be evaluated without intervention. subset is a canonical example, which we use here with the built-in iris data set:
subset(iris, Sepal.Width > 4.1)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
16 5.7 4.4 1.5 0.4 setosa
34 5.5 4.2 1.4 0.2 setosa
Sepal.Width does not exist in the global environment, yet this works because subset captures the expression and evaluates it within iris.
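The capture-and-evaluate mechanism can be sketched in a few lines of base R. The function my_subset below is a hypothetical illustration, not part of oshka or base R. substitute() returns the unevaluated argument, and eval() evaluates it with the data frame's columns taking precedence over the caller's environment.

```r
# Hypothetical minimal NSE function (base R only, no oshka).
my_subset <- function(df, cond) {
  # Capture the argument as an unevaluated expression
  cond_expr <- substitute(cond)
  # Look up symbols in df's columns first, then in the caller's environment
  keep <- eval(cond_expr, df, parent.frame())
  # Treat NA as FALSE, as subset() does
  df[!is.na(keep) & keep, ]
}

res <- my_subset(iris, Sepal.Width > 4.1)
rownames(res)
```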
A limitation of NSE is that it is difficult to use programmatically:
exp.a <- quote(Sepal.Width > 4.1)
subset(iris, exp.a)
Error in subset.data.frame(iris, exp.a): 'subset' must be logical
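The error occurs because subset captures the symbol exp.a itself, not the expression stored in it. The base R sketch below shows the difference. Evaluating the symbol yields a call. Evaluating the stored call with iris as the data yields the logical vector that subset needs.

```r
exp.a <- quote(Sepal.Width > 4.1)
# Roughly what subset() does here: evaluate the captured symbol `exp.a`,
# which returns the stored call rather than a logical vector
captured <- eval(quote(exp.a), iris)
is.call(captured)
# Evaluating the stored call itself, with iris as data, gives the logical vector
keep <- eval(exp.a, iris)
sum(keep)
```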
oshka::expand facilitates programmable NSE, as with this simplified version of subset:
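library(oshka)  # provides expand()
subset2 <- function(x, subset) {
  # Capture the unevaluated argument and recursively substitute any
  # symbols that resolve to quoted expressions
  sub.exp <- expand(substitute(subset), x, parent.frame())
  # Evaluate with the columns of x masking the caller's environment
  sub.val <- eval(sub.exp, x, parent.frame())
  x[!is.na(sub.val) & sub.val, ]
}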
subset2(iris, exp.a)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
16 5.7 4.4 1.5 0.4 setosa
34 5.5 4.2 1.4 0.2 setosa
expand is recursive:
exp.b <- quote(Species == 'virginica')
exp.c <- quote(Sepal.Width > 3.6)
exp.d <- quote(exp.b & exp.c)
subset2(iris, exp.d)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
118 7.7 3.8 6.7 2.2 virginica
132 7.9 3.8 6.4 2.0 virginica
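A rough base R approximation of the recursion follows. expand_sketch is a hypothetical helper and not oshka's implementation. It leaves names that are columns of data alone, replaces any other symbol that is bound to a language object with that object, and then expands the result in turn. Unlike the real function, it does not guard against self-referential definitions, and it cannot handle empty arguments such as the one in x[, 1].

```r
# Hypothetical approximation of recursive expansion (NOT oshka's code).
expand_sketch <- function(expr, data = NULL, env = parent.frame()) {
  if (is.symbol(expr)) {
    nm <- as.character(expr)
    # Column names of `data` mask outside bindings: leave them alone
    if (nm %in% names(data)) return(expr)
    if (exists(nm, envir = env)) {
      val <- get(nm, envir = env)
      # Substitute only language objects, then expand them in turn
      if (is.language(val)) return(expand_sketch(val, data, env))
    }
    return(expr)
  }
  if (is.call(expr)) {
    # Expand every argument; the function position is left untouched
    for (i in seq_along(expr)[-1L]) {
      expr[[i]] <- expand_sketch(expr[[i]], data, env)
    }
  }
  expr
}

exp.b <- quote(Species == 'virginica')
exp.c <- quote(Sepal.Width > 3.6)
exp.d <- quote(exp.b & exp.c)
full <- expand_sketch(exp.d, iris)
full
iris[eval(full, iris), ]
```

Printing full shows the fully substituted call, Species == "virginica" & Sepal.Width > 3.6, which can be evaluated against iris like any hand-written condition.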
We abide by R semantics so that programmable NSE functions are almost identical to normal NSE functions, with programmability as a bonus.
If you write a function that uses a programmable NSE function and forwards its NSE arguments to it, you must ensure that the NSE expressions are evaluated in the correct environment, typically parent.frame(). This is no different from normal NSE functions. An example:
subset3 <- function(x, subset, select, drop=FALSE) {
  frm <- parent.frame() # as per note in ?parent.frame, better to call here
  sub.q <- expand(substitute(subset), x, frm)
  sel.q <- expand(substitute(select), x, frm)
  eval(bquote(base::subset(.(x), .(sub.q), .(sel.q), drop=.(drop))), frm)
}
We use bquote to assemble our substituted call and eval to evaluate it in the correct frame. The parts of the call that should be evaluated within subset3 are escaped with .(). This requires some work from the programmer, but the user reaps the benefits:
col <- quote(Sepal.Length)
sub <- quote(Species == 'setosa')
subset3(iris, sub & col > 5.5, col:Petal.Length)
Sepal.Length Sepal.Width Petal.Length
15 5.8 4.0 1.2
16 5.7 4.4 1.5
19 5.7 3.8 1.7
Notice that we used expand with the base NSE function subset. Because expand just generates language objects, you can use it with any NSE function.
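As a small base R illustration, a language object can be spliced into a call to with() just as easily as into a call to subset. The condition here is written by hand with quote(). In practice it could be the output of expand.

```r
# A stored condition, spliced into a call to with(), another NSE function
cond <- quote(Species == 'setosa' & Sepal.Length > 5.5)
cl <- bquote(with(iris, mean(Petal.Length[.(cond)])))
cl    # the assembled call, with the condition inlined
avg <- eval(cl)
avg
```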
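The forwarding is robust to unusual evaluation contexts. Symbols are resolved where they are used, so below the definitions inside local() mask the global ones: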
col.a <- quote(I_dont_exist)
col.b <- quote(Sepal.Length)
sub.a <- quote(stop("all hell broke loose"))
threshold <- 3.35
local({
  col.a <- quote(Sepal.Width)
  sub.a <- quote(Species == 'virginica')
  subs <- list(sub.a, quote(Species == 'versicolor'))
  lapply(
    subs,
    function(x) subset3(iris, x & col.a > threshold, col.b:Petal.Length)
  )
})
[[1]]
Sepal.Length Sepal.Width Petal.Length
110 7.2 3.6 6.1
118 7.7 3.8 6.7
132 7.9 3.8 6.4
137 6.3 3.4 5.6
149 6.2 3.4 5.4
[[2]]
Sepal.Length Sepal.Width Petal.Length
86 6 3.4 4.5
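The need for the bquote/eval step is easiest to see by contrast. The two wrappers below are hypothetical and use only base R. They do not call expand, so they would not substitute stored expressions. The naive wrapper passes its argument straight to subset. subset then captures the symbol cond, whose value is computed in the caller's environment, where Sepal.Width does not exist. The second wrapper rebuilds the call with the caller's expression spliced in, as subset3 does.

```r
# Naive forwarding: subset() captures the symbol `cond`; forcing it evaluates
# `Sepal.Width > 4.1` in the caller's environment, where no such column exists
naive_wrapper <- function(x, cond) subset(x, cond)
res_naive <- try(naive_wrapper(iris, Sepal.Width > 4.1), silent = TRUE)
inherits(res_naive, "try-error")

# Forwarding with bquote(): splice the caller's unevaluated expression into
# a fresh call to base::subset and evaluate that call in the caller's frame
forward_wrapper <- function(x, cond) {
  cond_q <- substitute(cond)
  eval(bquote(base::subset(.(x), .(cond_q))), parent.frame())
}
res_ok <- forward_wrapper(iris, Sepal.Width > 4.1)
rownames(res_ok)
```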
One drawback of the eval/bquote/.() pattern is that the actual objects inside .() are placed on the call stack. This is not an issue with symbols, but it can be bothersome with data or functions. For example, in:
my_fun_inner <- function(x) {
  # ... bunch of code
  stop("end")
}
my_fun_outer <- function(x) {
  eval(bquote(.(my_fun_inner)(.(x))), parent.frame())
}
my_fun_outer(mtcars)
traceback()
The entire deparsed function definition and data frame will be displayed in the traceback, which makes it difficult to see what is happening. A simple work-around is to use:
sapply(.traceback(), head, 1)
sapply(sys.calls(), head, 1) # sys.calls is similarly affected
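The underlying effect is easy to reproduce in a base R sketch. With .() the function object itself becomes the first element of the call. An unescaped name leaves only a symbol in the call. f is a hypothetical toy function. Inlining the symbol keeps tracebacks short, but it only works when the name can be found from the frame in which the call is evaluated.

```r
f <- function(x) x + 1
# .() inlines the function object; the call now carries its definition
cl_obj <- bquote(.(f)(1))
# Without .(), only the symbol `f` is stored and is looked up at eval time
cl_sym <- bquote(f(1))
is.function(cl_obj[[1]])
is.symbol(cl_sym[[1]])
# Both calls compute the same value
c(eval(cl_obj), eval(cl_sym))
```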
rlang
oshka is simple in design and purpose. It exports a single function that substitutes expressions into other expressions, and it hews closely to R semantics. rlang is more ambitious and, as a result, more complex. To use it you must learn new concepts and semantics.
One manifestation of the additional complexity in rlang is that you must unquote expressions to use them:
library(rlang)  # provides quo() and the !! operator
rlang.b <- quo(Species == 'virginica')
rlang.c <- quo(Sepal.Width > 3.6)
rlang.d <- quo(!!rlang.b & !!rlang.c)
dplyr::filter(iris, !!rlang.d)
As shown earlier, the expand version is more straightforward, as it uses the standard quote function and does not require unquoting:
exp.b <- quote(Species == 'virginica')
exp.c <- quote(Sepal.Width > 3.6)
exp.d <- quote(exp.b & exp.c)
subset2(iris, exp.d)
On the other hand, forwarding NSE arguments to NSE functions is simpler in rlang due to the environment-capture feature of quosures:
rlang_virginica <- function(subset) {
  subset <- enquo(subset)
  dplyr::filter(iris, Species == 'virginica' & !!subset)
}
Because oshka does not capture environments, we must resort to the eval/bquote pattern:
oshka_virginica <- function(subset) {
  subset <- bquote(Species == 'virginica' & .(substitute(subset)))
  eval(bquote(.(subset2)(iris, .(subset))), parent.frame())
}
oshka minimizes the complexity in what we see as the most common use case, and sticks to R semantics for the more complicated ones.
For additional discussion on rlang, see the following presentations: