library(tidyverse)
library(palmerpenguins)
penguins |> summarize(mx_bd = max(bill_depth_mm))
# A tibble: 1 × 1
mx_bd
<dbl>
1 NA
Creating a summary is one example of an operation on a table; joining two tables together is another. This section also covers creating summaries of groups of data.
Glue two tables together using bind_rows(). Experiment to learn how different variables are handled.
Split a dataset into groups, so that subsequent analysis is done “by group”, using group_by().
summarize() collapses your data table into one or more rows, where each row is a summary of the corresponding data: one row per group, or a single row if the data is not grouped.
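The bind_rows() item in the list above can be sketched on two small made-up tables. The tables `a` and `b` and their values are hypothetical and are not taken from palmerpenguins; the point is only to show what happens to a column that exists in one table but not the other.

```r
library(dplyr)

# Two small made-up tables; `island` exists only in the second one.
a <- tibble(species = c("Adelie", "Gentoo"), bill_depth_mm = c(18.7, 14.3))
b <- tibble(species = "Chinstrap", bill_depth_mm = 18.4, island = "Dream")

# bind_rows() matches columns by name and stacks the rows.
# A column missing from one table is filled with NA for that table's rows.
stacked <- bind_rows(a, b)
stacked
```

Here `stacked` has three rows and three columns, and `island` is NA for the two rows that came from `a`.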
library(tidyverse)
library(palmerpenguins)
penguins |> summarize(mx_bd = max(bill_depth_mm))
# A tibble: 1 × 1
mx_bd
<dbl>
1 NA
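The NA in the output above is what max() returns when any of its inputs is missing. A minimal sketch, using a made-up vector `depths` as a stand-in for bill_depth_mm:

```r
# A made-up vector with one missing value (hypothetical stand-in data).
depths <- c(18.7, NA, 21.5)

max(depths)                # NA: one missing value makes the result missing
max(depths, na.rm = TRUE)  # 21.5: missing values are dropped before comparing
```

Passing na.rm = TRUE is therefore the usual way to get a numeric summary from a column that contains missing values.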
penguins |> group_by(species) |>
  summarize(mx_bd = max(bill_depth_mm, na.rm = TRUE))
# A tibble: 3 × 2
species mx_bd
<fct> <dbl>
1 Adelie 21.5
2 Chinstrap 20.8
3 Gentoo 17.3
n = n()
penguins |> group_by(species) |>
  summarize(mx_bd = max(bill_depth_mm, na.rm = TRUE), n = n())
# A tibble: 3 × 3
species mx_bd n
<fct> <dbl> <int>
1 Adelie 21.5 152
2 Chinstrap 20.8 68
3 Gentoo 17.3 124
# Pitfall: n = n() sits inside max(), so the group size is compared with the bill depths
penguins |> group_by(species) |>
  summarize(mx_bd = max(bill_depth_mm, na.rm = TRUE, n = n()))
# A tibble: 3 × 2
species mx_bd
<fct> <dbl>
1 Adelie 152
2 Chinstrap 68
3 Gentoo 124
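The mx_bd values above equal the group sizes rather than bill depths. That is consistent with n = n() having been passed to max() as one more value to compare, because of where the closing parenthesis sits. A base R sketch of the same effect, using made-up numbers and assuming a group of 152 rows:

```r
# Made-up depths for one group that, by assumption, has 152 rows.
depths <- c(18.7, NA, 21.5)

# Intended call: na.rm is an option of max(); the count is computed elsewhere.
max(depths, na.rm = TRUE)            # 21.5

# Misplaced parenthesis: n = 152 falls into max()'s `...` and is compared too.
max(depths, na.rm = TRUE, n = 152)   # 152
```

Since max() accepts any number of values through `...`, it raises no error here, which makes this mistake easy to miss.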
penguins |> group_by(species) |> summarize(n = n())
# A tibble: 3 × 2
species n
<fct> <int>
1 Adelie 152
2 Chinstrap 68
3 Gentoo 124
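Grouping affects more than summarize(). As a sketch, the slice helpers described next can be tried on a small made-up table. The table `toy` and its columns `g` and `x` are hypothetical.

```r
library(dplyr)

# A made-up table with two groups and no ties in x.
toy <- tibble(
  g = c("a", "a", "a", "b", "b"),
  x = c(3, 1, 2, 10, 5)
)

by_g <- toy |> group_by(g)

by_g |> slice_head(n = 1)    # first row of each group: x = 3 and x = 10
by_g |> slice_min(x, n = 1)  # smallest x in each group: x = 1 and x = 5
by_g |> slice_max(x, n = 1)  # largest x in each group: x = 3 and x = 10
toy |> slice_max(x, n = 1)   # ungrouped: the single row with x = 10
```

On grouped data each helper works within a group; on ungrouped data it works on the whole table.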
df |> slice_head(n = 1) takes the first row from each group.
df |> slice_tail(n = 1) takes the last row in each group.
df |> slice_min(x, n = 1) takes the row with the smallest value of column x.
df |> slice_max(x, n = 1) takes the row with the largest value of column x.
df |> slice_sample(n = 1) takes one random row.
.by
To remove grouping, use ungroup(). To do “in-line” grouping on a per-operation basis, you can use the .by argument:
penguins |> summarize(
  mx_bd = max(bill_depth_mm),
  n = n(),
  .by = species)
# A tibble: 3 × 3
species mx_bd n
<fct> <dbl> <int>
1 Adelie NA 152
2 Gentoo NA 124
3 Chinstrap 20.8 68
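The difference between persistent grouping and per-operation grouping can be checked with group_vars(). This is a minimal sketch on a made-up table (`toy`, `g`, `h` and `x` are hypothetical), and it assumes dplyr 1.1.0 or later, where .by is available.

```r
library(dplyr)

# A made-up table with two grouping columns.
toy <- tibble(
  g = c("a", "a", "b"),
  h = c("u", "v", "u"),
  x = c(1, 2, 3)
)

# With two grouping variables, summarize() drops only the last one by default,
# so the result stays grouped by g until ungroup() is called.
persistent <- toy |>
  group_by(g, h) |>
  summarize(total = sum(x), .groups = "drop_last")
group_vars(persistent)           # "g"
group_vars(ungroup(persistent))  # character(0)

# .by groups for this one call only; the result comes back ungrouped.
per_call <- toy |> summarize(total = sum(x), .by = c(g, h))
group_vars(per_call)             # character(0)
```

With .by, groups are returned in order of first appearance rather than sorted, which is why the table above lists Gentoo before Chinstrap.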