filter {dplyr} | R Documentation
Use filter() to choose rows/cases where conditions are true. Unlike base subsetting with [, rows where the condition evaluates to NA are dropped.
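The difference in NA handling can be seen in a minimal sketch. The data frame `df` and its column `x` are hypothetical and serve only as illustration:

```r
library(dplyr)

# Hypothetical toy data: one value of x is missing.
df <- data.frame(x = c(1, NA, 3))

# Base subsetting turns the NA condition into a row filled with NA.
base_rows <- df[df$x > 1, , drop = FALSE]

# filter() keeps only the rows where the condition is TRUE.
dplyr_rows <- filter(df, x > 1)

# To keep rows with a missing x as well, ask for them explicitly.
with_na <- filter(df, x > 1 | is.na(x))
```

Here `base_rows` has two rows (one of them all NA), `dplyr_rows` has one, and `with_na` has two.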
filter(.data, ..., .preserve = FALSE)
.data |
A tbl. All main verbs are S3 generics and provide methods
for |
... |
Logical predicates defined in terms of the variables in The arguments in |
.preserve |
when |
Note that dplyr is not yet smart enough to optimise filtering on grouped datasets that don't need grouped calculations. For this reason, filtering is often considerably faster on ungroup()ed data.
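As a rough sketch of that advice, a condition that does not depend on the groups can be applied after ungroup(). The example assumes the built-in `mtcars` data and an arbitrary threshold; whether it is measurably faster will depend on the data and the dplyr version:

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)

# The condition compares each row with a constant, so grouping is not needed.
grouped_result <- filter(by_cyl, mpg > 25)

# Dropping the grouping first selects the same rows.
ungrouped_result <- by_cyl %>%
  ungroup() %>%
  filter(mpg > 25)
```

Both results contain the same rows. Only `grouped_result` is still a grouped tibble, so call group_by() again afterwards if later steps rely on the grouping.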
An object of the same class as .data.
Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))
With the grouped equivalent:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))
The former keeps rows with mass greater than the global average whereas the latter keeps rows with mass greater than the gender average.
It is valid to use grouping variables in filter expressions.
When applied on a grouped tibble, filter() automatically rearranges the tibble by groups for performance reasons.
When applied to a data frame, row names are silently dropped. To preserve them, convert them to an explicit variable with tibble::rownames_to_column().
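A minimal sketch of that conversion, assuming the built-in `mtcars` data, whose row names hold the car models. The column name `model` is an arbitrary choice for this illustration:

```r
library(dplyr)

# Copy the row names into a regular column before filtering.
cars_named <- tibble::rownames_to_column(mtcars, var = "model")

# The model names now travel with the rows like any other variable.
four_cyl <- filter(cars_named, cyl == 4)
```

Because `model` is an ordinary column, it can also be used in the filtering condition itself.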
The three scoped variants (filter_all(), filter_if() and filter_at()) make it easy to apply a filtering condition to a selection of variables.
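The sketch below shows one way the scoped variants might be used, combined with the all_vars() and any_vars() helpers from dplyr. The data frame `df` and its columns are hypothetical:

```r
library(dplyr)

# Hypothetical toy data with one character and two numeric columns.
df <- data.frame(
  name = c("a", "b", "c"),
  x = c(1, -2, 3),
  y = c(4, 5, -6),
  stringsAsFactors = FALSE
)

# filter_if(): apply the condition to numeric columns only.
all_positive <- filter_if(df, is.numeric, all_vars(. > 0))
any_negative <- filter_if(df, is.numeric, any_vars(. < 0))

# filter_at(): apply the condition to an explicit selection of columns.
x_positive <- filter_at(df, vars(x), all_vars(. > 0))

# filter_all(): apply the condition to every column of the input.
numeric_only <- filter_all(select(df, x, y), all_vars(. > 0))
```

all_vars() requires the condition to hold for every selected column, while any_vars() requires it for at least one.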
filter_all(), filter_if() and filter_at().
Other single table verbs: arrange, mutate, select, slice, summarise
filter(starwars, species == "Human")
filter(starwars, mass > 1000)

# Multiple criteria
filter(starwars, hair_color == "none" & eye_color == "black")
filter(starwars, hair_color == "none" | eye_color == "black")

# Multiple arguments are equivalent to and
filter(starwars, hair_color == "none", eye_color == "black")

# The filtering operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
#
# The following filters rows where `mass` is greater than the
# global average:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))

# Whereas this keeps rows with `mass` greater than the gender
# average:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))

# Refer to column names stored as strings with the `.data` pronoun:
vars <- c("mass", "height")
cond <- c(80, 150)
starwars %>%
  filter(
    .data[[vars[[1]]]] > cond[[1]],
    .data[[vars[[2]]]] > cond[[2]]
  )

# For more complex cases, knowledge of tidy evaluation and the
# unquote operator `!!` is required. See https://tidyeval.tidyverse.org/
#
# One useful and simple tidy eval technique is to use `!!` to bypass
# the data frame and its columns. Here is how to filter the columns
# `mass` and `height` relative to objects of the same names:
mass <- 80
height <- 150
filter(starwars, mass > !!mass, height > !!height)