Manipulating data with dplyr

Row Functions

arrange

Permute the ordering of the rows. (Note the r in arrange, r for rows.) If you provide more than one column name, each additional column will be used to break ties in the values of preceding columns.

library(tidyverse)
library(nycflights13)
arrange(flights, month, dep_time)
# A tibble: 336,776 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2013     1    13        1           2249        72      108           2357
 2  2013     1    31        1           2100       181      124           2225
 3  2013     1     9        2           2359         3      432            444
 4  2013     1    13        2           2359         3      502            444
 5  2013     1    16        2           2125       157      119           2250
 6  2013     1    10        3           2359         4      426            437
 7  2013     1    13        3           2030       213      340           2350
 8  2013     1    16        3           1946       257      212           2154
 9  2013     1    30        3           2159       124      100           2306
10  2013     1    31        4           2359         5      455            444
# ℹ 336,766 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>
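
The tie-breaking rule is easier to see on a few rows. The following is a minimal sketch on a made-up tibble (the toy data and the names by_month and latest_first are illustrative assumptions, not part of nycflights13); it also shows desc(), which reverses the sort direction for a single column.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration.
toy <- tibble(
  month    = c(2, 1, 1, 2),
  dep_time = c(900, 1500, 600, 700)
)

# month is sorted first; dep_time only decides the order within each month.
by_month <- arrange(toy, month, dep_time)

# desc() sorts one column in descending order; month is still ascending.
latest_first <- arrange(toy, month, desc(dep_time))

by_month
latest_first
```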

distinct

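Use distinct on several column names to find their unique combinations. By default only the named columns are returned; add the .keep_all = TRUE argument to retain all columns, in which case the first row of each combination is kept.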

library(nycflights13)
flights |> distinct(origin, dest)
# A tibble: 224 × 2
   origin dest 
   <chr>  <chr>
 1 EWR    IAH  
 2 LGA    IAH  
 3 JFK    MIA  
 4 JFK    BQN  
 5 LGA    ATL  
 6 EWR    ORD  
 7 EWR    FLL  
 8 LGA    IAD  
 9 JFK    MCO  
10 LGA    ORD  
# ℹ 214 more rows
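
To see what .keep_all = TRUE changes, here is a small sketch on a made-up tibble (the data and the names pairs_only and first_rows are assumptions for illustration). Without the argument only the named columns come back; with it, every column is kept and each combination is represented by its first row.

```r
library(dplyr)

# Hypothetical toy data: the first two rows share the same origin/dest pair.
toy <- tibble(
  origin = c("EWR", "EWR", "JFK", "JFK"),
  dest   = c("IAH", "IAH", "MIA", "BQN"),
  flight = c(1, 2, 3, 4)
)

# Only the two named columns, one row per unique combination.
pairs_only <- distinct(toy, origin, dest)

# All columns; the first row found for each combination is retained.
first_rows <- distinct(toy, origin, dest, .keep_all = TRUE)

pairs_only
first_rows
```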

Column Functions

mutate

Add new variables (columns), usually via a formula involving existing ones.

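  1. helper arguments that control where the new columns are placed
     1. .before = 1 puts the new columns first
     2. .after = some_var_name puts the new columns right after that column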
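
A minimal sketch of mutate with both placement arguments, using a made-up tibble (the column names and the speed formula are assumptions chosen for illustration):

```r
library(dplyr)

# Hypothetical toy data: distance in miles, air_time in minutes.
toy <- tibble(
  distance = c(100, 200),
  air_time = c(30, 50)
)

# By default a new column is appended on the right.
appended <- mutate(toy, speed = distance / air_time * 60)

# .before = 1 places the new column in the first position.
speed_first <- mutate(toy, speed = distance / air_time * 60, .before = 1)

# .after = distance places it immediately after the distance column.
speed_after <- mutate(toy, speed = distance / air_time * 60, .after = distance)

names(appended)
names(speed_first)
names(speed_after)
```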

select

Useful if you have too many columns: choose which columns you wish to keep.

  1. tips & helper functions
     1. use : to select a range
     2. use ! to exclude
     3. use where() with a predicate function such as is.factor, is.numeric, or is.character (pass the function itself, without parentheses)
     4. starts_with("abc"): matches names that begin with “abc”.
     5. ends_with("xyz"): matches names that end with “xyz”.
     6. contains("ijk"): matches names that contain “ijk”.
     7. num_range("x", 1:3): matches x1, x2 and x3.
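
The helpers above can be tried on a small made-up tibble. This is a sketch only; the data and the names toy and wide are assumptions, loosely modeled on the flights columns.

```r
library(dplyr)

# Hypothetical one-row tibble with a mix of column types.
toy <- tibble(
  year = 2013L, month = 1L, day = 1L,
  dep_time = 517L, dep_delay = 2,
  arr_time = 830L, arr_delay = 11,
  carrier = "UA"
)

date_cols   <- select(toy, year:day)             # range of adjacent columns
no_date     <- select(toy, !year:day)            # everything except that range
text_cols   <- select(toy, where(is.character))  # predicate passed without ()
dep_cols    <- select(toy, starts_with("dep"))
delay_cols  <- select(toy, ends_with("delay"))
time_cols   <- select(toy, contains("_time"))

# num_range() needs names made of a prefix plus a number.
wide <- tibble(x1 = 1, x2 = 2, x3 = 3, y = 4)
x_cols <- select(wide, num_range("x", 1:3))

names(no_date)
names(x_cols)
```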

rename

Explicitly rename variables using new_name = old_name. To standardize many names in bulk, use janitor::clean_names().
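
A short sketch of both approaches on a made-up tibble (the column names are assumptions). The janitor call is guarded because that package is separate from dplyr and may not be installed.

```r
library(dplyr)

# Hypothetical toy data with one awkward column name.
toy <- tibble(`Dep Time` = 517L, tailnum = "N14228")

# rename() takes new_name = old_name; backticks quote non-syntactic names.
renamed <- rename(toy, dep_time = `Dep Time`, tail_number = tailnum)
names(renamed)

# Bulk cleanup, only run if the janitor package is available.
if (requireNamespace("janitor", quietly = TRUE)) {
  cleaned <- janitor::clean_names(toy)
  print(names(cleaned))
}
```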

relocate

Permute the ordering of columns. (Notice the c in relocate, c for columns.)
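
As a minimal sketch on a made-up tibble (the data is an assumption): relocate moves the named columns to the front by default, and it accepts the same .before and .after placement arguments as mutate.

```r
library(dplyr)

# Hypothetical toy data.
toy <- tibble(year = 2013L, month = 1L, carrier = "UA", dest = "IAH")

# Default: the named columns move to the front, in the order given.
to_front <- relocate(toy, carrier, dest)

# .after moves a column to just after another one.
year_last <- relocate(toy, year, .after = dest)

names(to_front)
names(year_last)
```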

Table Operations

One way to glue two tables together is with bind_rows (stack one table on top of another) and bind_cols (place them side by side). Experiment to learn how different variables are handled.
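
One possible experiment, sketched on made-up tibbles (all names here are assumptions): bind_rows matches columns by name and fills the gaps with NA, while bind_cols matches rows by position and therefore needs tables with the same number of rows.

```r
library(dplyr)

# Hypothetical toy tables that share only the column x.
a <- tibble(x = 1:2, y = c("a", "b"))
b <- tibble(x = 3L, z = TRUE)

# Columns are matched by name; values missing from one table become NA.
stacked <- bind_rows(a, b)

# Rows are matched by position, so both inputs must have two rows here.
extra <- tibble(w = c(10, 20))
side_by_side <- bind_cols(a, extra)

stacked
side_by_side
```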