mutate {dplyr} | R Documentation |
mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. Both functions preserve the number of rows of the input. New variables overwrite existing variables of the same name.
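As a minimal sketch of the difference, the following uses a small made-up tibble (`df` and its columns are hypothetical, not part of dplyr) to show which columns survive each verb and how a new variable replaces an existing one of the same name:

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
df <- tibble(x = 1:3, y = c(10, 20, 30))

# mutate() keeps x and y; the new `y` replaces the old one,
# and `z` is computed from the already-replaced `y`
kept <- mutate(df, y = y / 10, z = x + y)

# transmute() keeps only the variables it creates
dropped <- transmute(df, z = x + y)

names(kept)     # "x" "y" "z"
names(dropped)  # "z"
```

Both results still have three rows, matching the input.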
mutate(.data, ...)

transmute(.data, ...)
.data |
A tbl. All main verbs are S3 generics and provide methods
for |
... |
Name-value pairs of expressions, each with length 1 or the same
length as the number of rows in the group (if using The arguments in |
An object of the same class as .data.
Because mutating expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped mutate:
starwars %>% mutate(mass / mean(mass, na.rm = TRUE)) %>% pull()
With the grouped equivalent:
starwars %>% group_by(gender) %>% mutate(mass / mean(mass, na.rm = TRUE)) %>% pull()
The former normalises mass by the global average whereas the latter normalises by the averages within gender levels.
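The effect is easier to verify by hand on a tiny data set. This sketch assumes a made-up tibble `df` with a grouping column `g` and a value column `v` (all names are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: two groups with different means
df <- tibble(
  g = c("a", "a", "b", "b"),
  v = c(1, 3, 10, 30)
)

# Ungrouped: every value is divided by the overall mean (11)
overall <- df %>%
  mutate(rel = v / mean(v)) %>%
  pull(rel)

# Grouped: each value is divided by the mean of its own group (2 and 20)
by_group <- df %>%
  group_by(g) %>%
  mutate(rel = v / mean(v)) %>%
  pull(rel)

by_group  # 0.5 1.5 0.5 1.5
```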
Note that you can't overwrite a grouping variable within mutate().
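One possible workaround, sketched here rather than prescribed, is to remove the grouping, modify the column, and group again. The object names are illustrative, and other dplyr releases may treat grouping variables differently:

```r
library(dplyr)

gdf <- mtcars %>% group_by(cyl)

# Drop the grouping, modify the column, then regroup on the new values
regrouped <- gdf %>%
  ungroup() %>%
  mutate(cyl = cyl * 100) %>%
  group_by(cyl)

group_vars(regrouped)  # "cyl"
```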
mutate() does not evaluate the expressions when the group is empty.
The three scoped variants of mutate() (mutate_all(), mutate_if() and mutate_at()) and the three variants of transmute() (transmute_all(), transmute_if(), transmute_at()) make it easy to apply a transformation to a selection of variables.
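A brief sketch of how the scoped variants differ in what they select, using a made-up tibble (`df` and its columns are hypothetical). It assumes a dplyr version that accepts formula-style lambdas such as `~ . * 2` in these functions:

```r
library(dplyr)

# Hypothetical toy data with two numeric columns and one character column
df <- tibble(a = c(1, 2), b = c(3, 4), label = c("x", "y"))

# mutate_all(): transform every column (restricted to numeric ones here)
all_doubled <- df %>% select(a, b) %>% mutate_all(~ . * 2)

# mutate_if(): transform only columns that satisfy a predicate
num_doubled <- df %>% mutate_if(is.numeric, ~ . * 2)

# mutate_at(): transform columns picked by name
a_doubled <- df %>% mutate_at(vars(a), ~ . * 2)

# transmute_at(): same selection, but keep only the transformed column
only_a <- df %>% transmute_at(vars(a), ~ . * 2)
```

In each case the non-selected columns are left untouched by the mutate_*() variants, while transmute_at() drops them.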
When applied to a data frame, row names are silently dropped. To preserve them, convert them to an explicit variable with tibble::rownames_to_column().
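For instance, the row names of mtcars could be moved into a regular column before mutating. This is only a sketch: the column name `model` and the object name `mt` are arbitrary choices for illustration.

```r
library(dplyr)

# Move the row names of mtcars into a regular column named `model`
# (an arbitrary name chosen for this example), then mutate as usual
mt <- mtcars %>%
  tibble::rownames_to_column(var = "model") %>%
  mutate(cyl2 = cyl * 2)

head(mt$model, 2)  # "Mazda RX4" "Mazda RX4 Wag"
```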
Other single table verbs: arrange, filter, select, slice, summarise
# Newly created variables are available immediately
mtcars %>% as_tibble() %>% mutate(
  cyl2 = cyl * 2,
  cyl4 = cyl2 * 2
)

# You can also use mutate() to remove variables and
# modify existing variables
mtcars %>% as_tibble() %>% mutate(
  mpg = NULL,
  disp = disp * 0.0163871 # convert to litres
)

# window functions are useful for grouped mutates
mtcars %>%
  group_by(cyl) %>%
  mutate(rank = min_rank(desc(mpg)))
# see `vignette("window-functions")` for more details

# You can drop variables by setting them to NULL
mtcars %>% mutate(cyl = NULL)

# mutate() vs transmute --------------------------
# mutate() keeps all existing variables
mtcars %>%
  mutate(displ_l = disp / 61.0237)

# transmute keeps only the variables you create
mtcars %>%
  transmute(displ_l = disp / 61.0237)

# The mutate operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
# The following normalises `mass` by the global average:
starwars %>%
  mutate(mass / mean(mass, na.rm = TRUE)) %>%
  pull()

# Whereas this normalises `mass` by the averages within gender
# levels:
starwars %>%
  group_by(gender) %>%
  mutate(mass / mean(mass, na.rm = TRUE)) %>%
  pull()

# Note that you can't overwrite grouping variables:
gdf <- mtcars %>% group_by(cyl)
try(mutate(gdf, cyl = cyl * 100))

# Refer to column names stored as strings with the `.data` pronoun:
vars <- c("mass", "height")
mutate(starwars, prod = .data[[vars[[1]]]] * .data[[vars[[2]]]])

# For more complex cases, knowledge of tidy evaluation and the
# unquote operator `!!` is required. See https://tidyeval.tidyverse.org/
#
# One useful and simple tidy eval technique is to use `!!` to
# bypass the data frame and its columns. Here is how to divide the
# column `mass` by an object of the same name:
mass <- 100
mutate(starwars, mass = mass / !!mass)