Advanced Functions in R

Functions that summarize and plot

Published

November 7, 2024

Fun with Functions

We want a function to compute a “grouped mean” of a given dataset. Try this, and see if it works:

grouped_mean <- function(data, group, var) {
  data |>
    group_by(group) |>
    summarize(mean(var, na.rm = TRUE))
}

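It doesn’t: group_by() looks for a column literally named group instead of the column you passed in, so the call stops with a "column not found" error.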
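To see the failure concretely, here is a minimal sketch. It assumes dplyr is installed and uses the built-in mtcars data as a stand-in dataset; the name grouped_mean_naive is only illustrative. The call is wrapped in tryCatch() so the error is captured rather than stopping the script.

```r
library(dplyr)

# Naive version: `group` and `var` are used as if they were column names.
grouped_mean_naive <- function(data, group, var) {
  data |>
    group_by(group) |>
    summarize(mean(var, na.rm = TRUE))
}

# mtcars has columns `cyl` and `mpg`, but no column called `group`,
# so the call fails; capture the error instead of halting.
result <- tryCatch(
  grouped_mean_naive(mtcars, cyl, mpg),
  error = function(e) e
)

inherits(result, "error")      # TRUE: the call did not succeed
cat(conditionMessage(result))  # dplyr's explanation of what went wrong
```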

The fix is the curly-curly operator (see ?"{{}}").

The embrace operator {{ is used to create functions that call other data-masking functions. It transports a data-masked argument (an argument that can refer to columns of a data frame) from one function to another.

It’s useful when passing an argument that has to be substituted in place before being evaluated in another context.

library(tidyverse)
library(nycflights13)
grouped_mean <- function(.data, group, var){
  .data |> group_by({{group}}) |> 
    summarize("{{var}}" := mean({{var}}, na.rm = TRUE))
}
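A quick way to check the embraced version is to run it on a tiny data frame. The sketch below assumes only dplyr is available; the toy tibble and its column names g and x are made up for illustration.

```r
library(dplyr)

# Same definition as above, repeated so this block runs on its own.
grouped_mean <- function(.data, group, var) {
  .data |>
    group_by({{ group }}) |>
    summarize("{{ var }}" := mean({{ var }}, na.rm = TRUE))
}

# Hypothetical toy data with one missing value.
toy <- tibble(
  g = c("a", "a", "b", "b"),
  x = c(1, 3, 10, NA)
)

res <- grouped_mean(toy, g, x)
res

# One row per group; the summary column is named after the embraced argument.
names(res)
```

Because the left-hand side of := is the string "{{ var }}", the output column keeps the name of whatever column was passed in, here x.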

Now a grouped function, as opposed to a grouped mean

grouped_f <- function(.data, group, .var, fn){
  # getting a string to match our 4th parameter
  fname <- as.character(substitute(fn))
  # compare output with: 
  # fname <- as.character(quote(fn))
  
  print(fname)
  
  # now get R-function that matches our 4th parameter
  my_fun <- match.fun(fn)
  
  # group_by is a data-masking function
  df <- .data |> group_by({{group}}) |> 
    summarize("{{.var}}" := my_fun({{.var}}, na.rm = TRUE))
  
  # vector of strings
  var_names <- df |> names()
  # this summarize creates two variables: the first is the "group",
  # the second is the ".var", so we glue fname
  # onto the name of the second variable
  new_name <- str_c(fname, "_", var_names[2])
  
  # sanity check
  print(new_name)
  
  df |> rename({{new_name}} := {{.var}})
  
}
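The comments in grouped_f suggest comparing substitute(fn) with quote(fn). The difference can be seen in isolation with base R alone; the two helper names below are hypothetical and exist only for this comparison.

```r
# substitute() returns the expression the caller supplied for an argument;
# quote() returns its own argument verbatim, here the symbol `fn`.
name_via_substitute <- function(fn) as.character(substitute(fn))
name_via_quote      <- function(fn) as.character(quote(fn))

name_via_substitute(median)  # "median"
name_via_quote(median)       # "fn"

# match.fun() accepts either a function or its name as a string.
identical(match.fun("median"), match.fun(median))  # TRUE
```

One caveat with this approach: it assumes the caller passes a bare function name. Something like stats::median would be turned into a three-element character vector, so the pasted column name would no longer be a single string.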

Plotting function

diamonds |> ggplot(aes(x = color)) + 
  geom_bar()

flights |> ggplot(aes(x = dest)) + 
  geom_bar()

barplot_f <- function(.data, .var) {
  s <- as.character(substitute(.var)) 
  print(s)
  .data |> ggplot(aes(x = {{ .var }})) + 
    geom_bar() + 
    labs(title = str_c("My cool ", s, " plot"))
}

A big summary function

my_summary <- function(data, var) {
  data |> summarize(
    min = min({{ var }}, na.rm = TRUE),
    mean = mean({{ var }}, na.rm = TRUE),
    median = median({{ var }}, na.rm = TRUE),
    max = max({{ var }}, na.rm = TRUE),
    n = n(),
    n_miss = sum(is.na({{ var }})),
    .groups = "drop"
  )
}

diamonds |> my_summary(carat)
# A tibble: 1 × 6
    min  mean median   max     n n_miss
  <dbl> <dbl>  <dbl> <dbl> <int>  <int>
1   0.2 0.798    0.7  5.01 53940      0
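The .groups = "drop" argument only matters when the input is grouped. As a hedged sketch of that case, the block below uses a shortened copy of my_summary and the built-in mtcars data (an assumption made so the example needs nothing beyond dplyr).

```r
library(dplyr)

# Compact variant of my_summary, repeated so this block runs on its own.
my_summary <- function(data, var) {
  data |> summarize(
    min = min({{ var }}, na.rm = TRUE),
    mean = mean({{ var }}, na.rm = TRUE),
    n = n(),
    n_miss = sum(is.na({{ var }})),
    .groups = "drop"
  )
}

by_cyl <- mtcars |>
  group_by(cyl) |>
  my_summary(mpg)

by_cyl                 # one row per value of cyl
is_grouped_df(by_cyl)  # FALSE, because of .groups = "drop"
```

Dropping the groups inside the helper means callers get a plain tibble back and do not need to remember to call ungroup() themselves.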