Using tidylog adds a small overhead to each function call. For instance, because tidylog needs to figure out how many rows were dropped when you use `tidylog::filter`, this call will be a bit slower than using `dplyr::filter` directly. The overhead is usually not noticeable, but can be for larger datasets, especially when using joins. The benchmarks below give some impression of how large the overhead is.
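To make the source of the overhead concrete, here is a minimal sketch of a logging wrapper around `dplyr::filter`. It is not tidylog's actual implementation: `logged_filter` is a hypothetical name and the message format is made up for illustration. The point is only that such a wrapper has to inspect the data before and after the call, which is work a bare `dplyr::filter` never does.

```r
library(dplyr)

# Hypothetical wrapper for illustration only; tidylog's real code differs.
logged_filter <- function(.data, ...) {
  n_before <- nrow(.data)               # extra work: row count before the call
  result <- dplyr::filter(.data, ...)   # the actual dplyr call
  n_dropped <- n_before - nrow(result)  # extra work: row count after the call
  message(sprintf(
    "logged_filter: removed %d rows, %d rows remaining",
    n_dropped, nrow(result)
  ))
  result
}

# Same result as dplyr::filter(mtcars, cyl == 4), plus a message.
small <- logged_filter(mtcars, cyl == 4)
```

The returned data is the same as from the plain dplyr call; the added cost is the bookkeeping and the message.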
`filter` on a small dataset:
library(dplyr) # provides %>%, tibble() and the band_* datasets
library(knitr) # provides kable()
bench::mark(
dplyr::filter(mtcars, cyl == 4),
tidylog::filter(mtcars, cyl == 4), iterations = 100
) %>%
dplyr::select(expression, min, median, n_itr) %>%
kable()
expression | min | median | n_itr |
---|---|---|---|
dplyr::filter(mtcars, cyl == 4) | 281µs | 289µs | 99 |
tidylog::filter(mtcars, cyl == 4) | 633µs | 665µs | 98 |
On a larger dataset:
df <- tibble(x = rnorm(100000))
bench::mark(
dplyr::filter(df, x > 0),
tidylog::filter(df, x > 0), iterations = 100
) %>%
dplyr::select(expression, min, median, n_itr) %>%
kable()
expression | min | median | n_itr |
---|---|---|---|
dplyr::filter(df, x > 0) | 636.32µs | 762.6µs | 96 |
tidylog::filter(df, x > 0) | 1.08ms | 1.2ms | 96 |
`mutate` on a small dataset:
bench::mark(
dplyr::mutate(mtcars, cyl = as.factor(cyl)),
tidylog::mutate(mtcars, cyl = as.factor(cyl)), iterations = 100
) %>%
dplyr::select(expression, min, median, n_itr) %>%
kable()
expression | min | median | n_itr |
---|---|---|---|
dplyr::mutate(mtcars, cyl = as.factor(cyl)) | 322µs | 335µs | 99 |
tidylog::mutate(mtcars, cyl = as.factor(cyl)) | 766µs | 798µs | 97 |
On a larger dataset:
df <- tibble(x = round(runif(10000) * 10))
bench::mark(
dplyr::mutate(df, x = as.factor(x)),
tidylog::mutate(df, x = as.factor(x)), iterations = 100
) %>%
dplyr::select(expression, min, median, n_itr) %>%
kable()
expression | min | median | n_itr |
---|---|---|---|
dplyr::mutate(df, x = as.factor(x)) | 2.59ms | 2.64ms | 99 |
tidylog::mutate(df, x = as.factor(x)) | 3.03ms | 3.1ms | 97 |
Joins are the most expensive operation, as tidylog has to do two additional joins behind the scenes.
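As a rough illustration of why logging a join costs extra joins, the sketch below counts the unmatched rows on each side with two `anti_join` calls in addition to the join itself. This is an assumption about the general approach, not tidylog's actual code, and `logged_inner_join` is a hypothetical helper.

```r
library(dplyr)

# Hypothetical helper for illustration only; tidylog's real code differs.
logged_inner_join <- function(x, y, by) {
  result <- dplyr::inner_join(x, y, by = by)

  # Two additional joins, needed only to describe what the join did.
  only_in_x <- nrow(dplyr::anti_join(x, y, by = by))
  only_in_y <- nrow(dplyr::anti_join(y, x, by = by))

  message(sprintf(
    "logged_inner_join: rows only in x: %d, rows only in y: %d, rows in result: %d",
    only_in_x, only_in_y, nrow(result)
  ))
  result
}

joined <- logged_inner_join(band_members, band_instruments, by = "name")
```

Each logged join here runs three joins instead of one, which is consistent with joins showing the largest relative overhead.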
On a small dataset:
bench::mark(
dplyr::inner_join(band_members, band_instruments, by = "name"),
tidylog::inner_join(band_members, band_instruments, by = "name"), iterations = 100
) %>%
dplyr::select(expression, min, median, n_itr) %>%
kable()
expression | min | median | n_itr |
---|---|---|---|
dplyr::inner_join(band_members, band_instruments, by = "name") | 418.32µs | 432.7µs | 98 |
tidylog::inner_join(band_members, band_instruments, by = "name") | 2.64ms | 2.7ms | 91 |
On a larger dataset (with many row duplications):
N <- 1000
df1 <- tibble(x1 = rnorm(N), key = round(runif(N) * 10))
df2 <- tibble(x2 = rnorm(N), key = round(runif(N) * 10))
bench::mark(
dplyr::inner_join(df1, df2, by = "key"),
tidylog::inner_join(df1, df2, by = "key"), iterations = 100
) %>%
dplyr::select(expression, min, median, n_itr) %>%
kable()
#> Warning in dplyr::inner_join(df1, df2, by = "key"): Detected an unexpected many-to-many relationship between `x` and `y`.
#> ℹ Row 1 of `x` matches multiple rows in `y`.
#> ℹ Row 23 of `y` matches multiple rows in `x`.
#> ℹ If a many-to-many relationship is expected, set `relationship =
#> "many-to-many"` to silence this warning.
expression | min | median | n_itr |
---|---|---|---|
dplyr::inner_join(df1, df2, by = "key") | 6.24ms | 6.42ms | 79 |
tidylog::inner_join(df1, df2, by = "key") | 3.42ms | 3.56ms | 88 |
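The warning printed with this benchmark comes from dplyr, which flags many-to-many matches unless told they are intended. A minimal sketch of silencing it, assuming dplyr 1.1.0 or later (where the `relationship` argument is available); the seed is an addition made here only so the example is reproducible and is not part of the original benchmark:

```r
library(dplyr)

set.seed(1) # assumption: added for reproducibility, not in the benchmark
N <- 1000
df1 <- tibble(x1 = rnorm(N), key = round(runif(N) * 10))
df2 <- tibble(x2 = rnorm(N), key = round(runif(N) * 10))

# Declaring the relationship tells dplyr that the row duplication is
# expected, so the many-to-many warning is not raised.
joined <- dplyr::inner_join(
  df1, df2,
  by = "key",
  relationship = "many-to-many"
)
```

The sketch calls dplyr directly; whether the argument is passed through by a wrapper depends on that wrapper, so it is worth checking before relying on it.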