bench_compare {dplyr}    R Documentation

Evaluate, compare, and benchmark operations across a set of srcs.

Description

These functions support the comparison of results and timings across multiple sources.

Usage

bench_tbls(tbls, op, ..., times = 10)

compare_tbls(tbls, op, ref = NULL, compare = equal_data_frame, ...)

compare_tbls2(tbls_x, tbls_y, op, ref = NULL,
  compare = equal_data_frame, ...)

eval_tbls(tbls, op)

eval_tbls2(tbls_x, tbls_y, op)

Arguments

tbls, tbls_x, tbls_y

A list of tbl()s.

op

A function with a single argument, called with each element of tbls in turn (and repeatedly when benchmarking).

...

For compare_tbls(): additional parameters passed on to the compare function.

For bench_tbls(): additional benchmarks to run.

times

For benchmarking, the number of times each operation is repeated.

ref

For checking, a data frame to test results against. If not supplied, defaults to the results from the first src.

compare

A function used to compare the results. Defaults to equal_data_frame, which ignores the order of rows and columns.
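
The way op, ref and compare fit together can be sketched in base R. This is only an illustration under stated assumptions, not dplyr's implementation: plain data frames stand in for tbls, lapply() plays the role of eval_tbls(), and same_ignoring_order() is a hypothetical helper that mimics the order-insensitive behaviour described for equal_data_frame.

```r
# Two "sources" holding the same data with different row and column order.
# Plain data frames are used here as stand-ins for tbls.
tbls <- list(
  a = data.frame(id = 1:4, year = c(2009, 2010, 2010, 2011)),
  b = data.frame(year = c(2011, 2010, 2010, 2009), id = 4:1)
)

# op takes a single argument and is applied to each element of tbls.
op <- function(x) x[x$year == 2010, , drop = FALSE]
results <- lapply(tbls, op)

# Hypothetical helper: compare two data frames while ignoring the order
# of rows and columns (in the spirit of equal_data_frame).
same_ignoring_order <- function(x, y) {
  if (!setequal(names(x), names(y)) || nrow(x) != nrow(y)) return(FALSE)
  cols <- sort(names(x))
  x <- x[do.call(order, unname(as.list(x[cols]))), cols, drop = FALSE]
  y <- y[do.call(order, unname(as.list(y[cols]))), cols, drop = FALSE]
  rownames(x) <- NULL
  rownames(y) <- NULL
  isTRUE(all.equal(x, y))
}

# With no ref supplied, the result from the first source is the reference.
ref <- results[[1]]
ok <- vapply(results, same_ignoring_order, logical(1), y = ref)
```

Here ok is a named logical vector with one entry per source; a comparison function that respected row or column order would report a mismatch for source b.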

Value

eval_tbls(): a list of data frames.

compare_tbls(): an invisible TRUE on success; otherwise an error is thrown.

bench_tbls(): an object of class "microbenchmark", as returned by microbenchmark::microbenchmark().
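
The return contract described for compare_tbls() (invisible TRUE on success, an error otherwise) can be illustrated with a small base R sketch. compare_sketch() is a hypothetical function written for this illustration; it takes already evaluated results rather than tbls and an op, and it uses identical() in place of equal_data_frame.

```r
# Hypothetical stand-in that mirrors the documented return contract.
compare_sketch <- function(results, ref = NULL, compare = identical) {
  # Default reference: the result from the first source.
  if (is.null(ref)) ref <- results[[1]]
  for (i in seq_along(results)) {
    if (!isTRUE(compare(results[[i]], ref))) {
      stop("Result ", i, " differs from the reference", call. = FALSE)
    }
  }
  invisible(TRUE)
}

# Success: the value is TRUE but is not printed at top level.
good <- list(data.frame(x = 1:3), data.frame(x = 1:3))
out <- withVisible(compare_sketch(good))

# Failure: a mismatch surfaces as an error, captured here as a message.
bad <- list(data.frame(x = 1:3), data.frame(x = 3:1))
err <- tryCatch(compare_sketch(bad), error = function(e) conditionMessage(e))
```

Because a mismatch is signalled as an error rather than returned as FALSE, a call like this can sit directly in a test script without an extra assertion around it.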

See Also

src_local() for working with local data

Examples

## Not run: 
if (require("microbenchmark") && has_lahman()) {
lahman_local <- lahman_srcs("df", "sqlite")
teams <- lapply(lahman_local, function(x) x %>% tbl("Teams"))

compare_tbls(teams, function(x) x %>% filter(yearID == 2010))
bench_tbls(teams, function(x) x %>% filter(yearID == 2010))

# You can also supply arbitrary additional arguments to bench_tbls
# if there are other operations you'd like to compare.
bench_tbls(teams, function(x) x %>% filter(yearID == 2010),
   base = subset(Lahman::Teams, yearID == 2010))

# A more complicated example using multiple tables
setup <- function(src) {
  list(
    src %>% tbl("Batting") %>% filter(stint == 1) %>% select(playerID:H),
    src %>% tbl("Master") %>% select(playerID, birthYear)
  )
}
two_tables <- lapply(lahman_local, setup)

op <- function(tbls) {
  semi_join(tbls[[1]], tbls[[2]], by = "playerID")
}
# compare_tbls(two_tables, op)
bench_tbls(two_tables, op, times = 2)

}

## End(Not run)

[Package dplyr version 0.8.5 Index]