When you write functions that operate on S3 or unclassed objects, you can either trust that your inputs are structured as expected, or tediously check that they are. vetr takes the tedium out of structure verification so that you can trust, but verify. It lets you express structural requirements declaratively with templates, and it auto-generates human-friendly error messages as needed. vetr is written in C to minimize overhead from parameter checks in your functions. It has no dependencies.
Declare a template that an object should conform to, and let vetr take care of the rest:
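# For example, against a scalar numeric template (the calls here are
# illustrative, in the order of the results below):
vet(numeric(1L), 1:3)
vet(numeric(1L), "hello")
vet(numeric(1L), 42)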
[1] "`length(1:3)` should be 1 (is 3)"
[1] "`\"hello\"` should be type \"numeric\" (is \"character\")"
[1] TRUE
The template concept is based on vapply, but generalizes to all S3 objects and adds some special features to facilitate comparison. For example, zero-length templates match any length:
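# e.g., a zero-length numeric template accepts numerics of any length
# (values illustrative):
vet(numeric(), runif(5))
vet(numeric(), 42)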
[1] TRUE
[1] TRUE
And for convenience, short (length <= 100) integer-like numerics are considered integer:
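# e.g. (values illustrative):
vet(integer(1L), 1)        # an integer-like double passes
vet(integer(1L), 1.0001)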
[1] TRUE
[1] "`1.0001` should be type \"integer-like\" (is \"double\")"
vetr can compare recursive objects such as lists or data.frames:
tpl.iris <- iris[0, ] # 0 row DF matches any number of rows in object
iris.fake <- iris
levels(iris.fake$Species)[3] <- "sibirica" # tweak levels
vet(tpl.iris, iris)
[1] TRUE
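vet(tpl.iris, iris.fake)   # the tweaked levels trigger the error below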
[1] "`levels(iris.fake$Species)[3]` should be \"virginica\" (is \"sibirica\")"
From our declared template iris[0, ], vetr infers all the required checks. In this case, vet(iris[0, ], iris.fake, stop=TRUE) is equivalent to:
stopifnot_iris <- function(x) {
stopifnot(
is.data.frame(x),
is.list(x),
length(x) == length(iris),
identical(lapply(x, class), lapply(iris, class)),
is.integer(attr(x, 'row.names')),
identical(names(x), names(iris)),
identical(typeof(x$Species), "integer"),
identical(levels(x$Species), levels(iris$Species))
)
}
stopifnot_iris(iris.fake)
Error in stopifnot_iris(iris.fake): identical(levels(x$Species), levels(iris$Species)) is not TRUE
vetr saved us typing, and the time and thought needed to come up with what needs to be compared.
You could just as easily have created templates for nested lists, or data frames in lists. Templates are compared to objects with the alike function. For a thorough description of templates and how they work see the alike vignette. For template examples see example(alike).
Let’s revisit the error message:
[1] "`levels(iris.fake$Species)[3]` should be \"virginica\" (is \"sibirica\")"
It tells us what the problem is, and exactly where it is: levels(iris.fake$Species)[3].
vetr does what it can to reduce the time from error to resolution. The location of failure is generated such that you can easily copy it, in part or in full, to the R prompt for further examination.
You can combine templates with && / ||:
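# For example, allowing either NULL or a scalar numeric (calls illustrative,
# in the order of the results below):
vet(NULL || numeric(1L), NULL)
vet(NULL || numeric(1L), 42)
vet(NULL || numeric(1L), "foo")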
[1] TRUE
[1] TRUE
[1] "`\"foo\"` should be `NULL`, or type \"numeric\" (is \"character\")"
Templates only check structure. When you need to check values use . to refer to the object:
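# e.g., requiring a positive scalar numeric (values illustrative):
vet(numeric(1L) && . > 0, -42)
vet(numeric(1L) && . > 0, 42)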
[1] "`-42 > 0` is not TRUE (FALSE)"
[1] TRUE
If you do use the . symbol in your vetting expressions in your packages, you will need to include utils::globalVariables(".") as a top-level call to avoid the "no visible binding for global variable '.'" R CMD check NOTE.
You can compose vetting expressions as language objects and combine them:
scalar.num.pos <- quote(numeric(1L) && . > 0)
foo.or.bar <- quote(character(1L) && . %in% c('foo', 'bar'))
vet.exp <- quote(scalar.num.pos || foo.or.bar)
vet(vet.exp, 42)
[1] TRUE
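vet(vet.exp, "foo")   # e.g., a character value that passes foo.or.bar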
[1] TRUE
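vet(vet.exp, "baz")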
[1] "At least one of these should pass:"
[2] " - `\"baz\" %in% c(\"foo\", \"bar\")` is not TRUE (FALSE)"
[3] " - `\"baz\"` should be type \"numeric\" (is \"character\")"
all_bw is available for value range checks (~10x faster than isTRUE(all(. >= x & . <= y)) for large vectors):
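# e.g., requiring all values to fall in [0, 1]:
vet(all_bw(., 0, 1), runif(5) + 1)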
[1] "`all_bw(runif(5) + 1, 0, 1)` is not TRUE (is chr: \"`1.419590` at index 1 not in `[0,1]`\")"
There are a number of predefined vetting tokens you can use in your vetting expressions such as:
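# e.g. NUM.POS, a predefined token for positive numerics:
vet(NUM.POS, -runif(5))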
[1] "`-runif(5)` should contain only positive values, but has negatives"
Vetting expressions are designed to be intuitive to use, but their implementation is complex. We recommend you look at example(vet) for usage ideas, or at the "Non Standard Evaluation" section of the vignette for the gory details.
vet captures the first argument unevaluated. For example, in:
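vet(. > 0, 1:3)   # e.g.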
. > 0 is captured, processed, and evaluated in a special manner. This is a common pattern in R (e.g. as in with, subset, etc.) called Non Standard Evaluation (NSE). One additional wrinkle with vet is that symbols in the captured expression are recursively substituted:
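# e.g., reusing scalar.num.pos from the composition example above (values illustrative):
vet(scalar.num.pos && . < 100, 200)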
The above is thus equivalent to:
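# scalar.num.pos expanded in place:
vet(numeric(1L) && . > 0 && . < 100, 200)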
The recursive substitution removes the typical limitation on “programming” with NSE, although there are a few things to know:
- Symbols used as function names are not substituted (e.g. fun in fun(a, b)); this extends to operators.
- . is never substituted, though you can work around that by escaping it with an additional . (i.e. ..).
To illustrate the last point, suppose we want to check that an object is a call in the form x + y; then we could use:
Or:
Additionally, you will need to ensure that x and y themselves do not evaluate to language objects in the parent frame.
Once a vetting expression has been recursively substituted, it is parsed into tokens. Tokens are the parts of the vetting expression bounded by the && and || operators and optionally enclosed in parentheses. For example, there are three tokens in the following vetting expression:
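logical(1) || (numeric(1) && (. > 0 & . < 1))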
They are logical(1), numeric(1), and . > 0 & . < 1. The last token is just one token, not because of the parentheses around it, but because it is a call to & as opposed to &&. Here we use the parentheses to remove parsing ambiguity caused by & and && having the same operator precedence.
After the tokens have been identified they are classified as standard tokens or template tokens. Standard tokens are those that contain the . symbol. Every other token is considered a template token.
Standard tokens are further processed by substituting any . with the value of the object being vetted. These tokens are then evaluated and if all(<result-of-evaluation>) is TRUE then the tokens pass, otherwise they fail. Note all(logical(0L)) is TRUE. With:
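vet(. > 0, 1:3)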
[1] TRUE
. > 0 becomes 1:3 > 0, which evaluates to c(TRUE, TRUE, TRUE), and the token passes.
Template tokens, i.e. tokens without a . symbol, are evaluated and the resulting R object is sent, along with the object to vet, to alike for structural comparison. If alike returns TRUE then the token passes, otherwise it fails.
Finally, the result of evaluating each token is plugged back into the original expression. So[1]:
vet(logical(1) || (numeric(1) && (. > 0 & . < 1)), 42)
# becomes:
alike(logical(1L), 42) || (alike(numeric(1L), 42) && all(42 > 0 & 42 < 1))
# becomes:
FALSE || (TRUE && FALSE)
# becomes:
FALSE
And the vetting fails:
[1] "At least one of these should pass:"
[2] " - `42 > 0 & 42 < 1` is not TRUE (FALSE)"
[3] " - `42` should be type \"logical\" (is \"double\")"
If you need to reference a literal dot (.) in a token, you can escape it by adding another dot, so that . becomes .. (and to reference ... you will need to use ....). If you have a standard token that does not reference the vetting object (i.e. does not use .), you can mark it as a standard token by wrapping it in .() (if you want to use a literal .() you can use ..()).
If you need && or || to be interpreted literally you can wrap the call in I to tell vet to treat the entire call as a single token:
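# A sketch (values illustrative): the || inside I() is evaluated normally
# rather than being treated as a token separator.
vet(I(length(.) > 1 || is.character(.)), letters)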
vet will stop searching for tokens at the first call to a function other than (, &&, and ||. The use of I here is just an example of this behavior, and is convenient since I does not change the meaning of the vetting token. An implication of this is that you should not nest template tokens inside functions, as vet will not identify them as template tokens and you may get unexpected results. For example:
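vet(I(logical(1L) && length(.) == 1L), TRUE)   # illustrative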
will always fail because logical(1L) is part of a standard token and is evaluated as FALSE rather than used as a template token for a scalar logical.
The vetr function streamlines parameter checks in functions. It behaves just like vet, except that you need only specify the vetting expressions. The objects to vet are captured from the function environment:
fun <- function(x, y, z) {
vetr(
matrix(numeric(), ncol=3),
logical(1L),
character(1L) && . %in% c("foo", "bar")
)
TRUE # do work...
}
fun(matrix(1:12, 3), TRUE, "baz")
Error in fun(x = matrix(1:12, 3), y = TRUE, z = "baz"): For argument `x`, `matrix(1:12, 3)` should have 3 columns (has 4)
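fun(matrix(1:12, 4), TRUE, "baz")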
Error in fun(x = matrix(1:12, 4), y = TRUE, z = "baz"): For argument `z`, `"baz" %in% c("foo", "bar")` is not TRUE (FALSE)
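fun(matrix(1:12, 4), TRUE, "foo")   # "foo" passes the z check ("bar" would too)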
[1] TRUE
The arguments to vetr are matched to the arguments of the enclosing function in the same way as with match.call. For example, if we wished to vet just the third argument:
fun <- function(x, y, z) {
vetr(z=character(1L) && . %in% c("foo", "bar"))
TRUE # do work...
}
fun(matrix(1:12, 3), TRUE, "baz")
Error in fun(x = matrix(1:12, 3), y = TRUE, z = "baz"): For argument `z`, `"baz" %in% c("foo", "bar")` is not TRUE (FALSE)
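fun(matrix(1:12, 3), TRUE, "foo")   # only z is vetted here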
[1] TRUE
Vetting expressions work the same way with vetr as they do with vet.
vetr is written primarily in C to minimize the performance impact of adding validation checks to your functions. It should be faster than stopifnot except for the most trivial of checks. The vetr function itself carries some additional overhead from matching arguments, but it should still be faster than stopifnot except in the simplest of cases. Here we run our checks on the valid iris object we used to illustrate declarative checks:
vetr_iris <- function(x) vetr(tpl.iris)
bench_mark(times=1e4,
vet(tpl.iris, iris),
vetr_iris(iris),
stopifnot_iris(iris) # defined in "Templates" section
)
Mean eval time from 10000 iterations, in microseconds:
vet(tpl.iris, iris) ~ 13.9
vetr_iris(iris) ~ 18.9
stopifnot_iris(iris) ~ 35.2
Performance is optimized for the success case. Failure cases should still perform reasonably well, but will be slower than most success cases.
Complex templates will be slower to evaluate than simple ones, particularly for lists with lots of nested elements. Note however that the cost of the vetting expression is a function of the complexity of the template, not that of the value being vetted.
We recommend that you predefine templates in your package and not in the validation expression since some seemingly innocuous template creation expressions carry substantial overhead:
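# e.g., timing the template creation by itself (bench_mark as used above):
bench_mark(times=1e3, data.frame(a = numeric()))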
Mean eval time from 1000 iterations, in microseconds:
data.frame(a = numeric()) ~ 100
In this case the data.frame call alone takes over 100 microseconds. In your package code you could use:
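# A sketch with illustrative names: define the template once at the package
# top level, then reference it from your function.
tpl.df <- data.frame(a = numeric())

my_fun <- function(x) {
  vetr(tpl.df)
  TRUE  # do work...
}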
This way the template is created once on package load and re-used each time your function is called.
There are many alternatives available to vetr. We do a survey of the following in our parameter validation functions review:
- stopifnot by R Core
- vetr by Yours Truly
- assertthat by Hadley Wickham
- assertive by Richie Cotton
- checkmate by Michel Lang
The following packages also perform related tasks, although we do not review them:
- valaddin v0.1.0 by Eugene Ha, a framework for augmenting existing functions with validation contracts. Currently the package is undergoing a major overhaul so we will add it to the comparison once the new release (v0.3.0) is out.
- ensurer v1.1 by Stefan M. Bache, a framework for flexibly creating and combining validation contracts. The development version adds an experimental method for creating type safe functions, but it is not published to CRAN so we do not test it here.
- validate by Mark van der Loo and Edwin de Jonge, with a primary focus on validating data in data frames and similar data structures.
- assertr by Tony Fischetti, also focused on data validation in data frames and similar structures.
- types by Jim Hester, which implements but does not enforce type hinting.
- argufy by Gábor Csárdi, which implements parameter validation via roxygen tags (not released to CRAN).
- typed by Antoine Fabri, which enforces types of symbols, function parameters, and return values.

[1] We take some liberties in this example for clarity. For instance, alike returns a character vector on failure, not FALSE, so really what vet is doing is isTRUE(alike(...)).