Filter & Storms dataset

Create subsets of data

Published

August 28, 2024

Warm-up

  1. What’s wrong with the following code?
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gapminder)
ggplot(gapminder,aes(x="year",y="pop")) + geom_point()

  2. Create an R script that does the following.
     1. Load the gapminder data.
     2. Create a variable where each value is a country in the gapminder data set, with no repeats.
     3. Use %in% to check whether the variable contains various countries of your choice.
     4. Note how to execute the script from the keyboard. Calling source() is one way to execute a script:
source("my_script.R",echo = TRUE)

The filter command

Below we make a subset of the data that contains only the rows whose country is China.

C <- filter(gapminder, 
       country == "China")

Run ?filter to learn how to combine conditions in the second argument using &, |, and other logical operators.
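As a minimal sketch of combining conditions, the example below uses the gapminder data from earlier. The variable names (china_recent, china_or_india, asia_2007) and the cutoff years are illustrative choices, not part of the original notes.

```r
library(dplyr)
library(gapminder)

# & keeps rows where BOTH conditions are TRUE
china_recent <- filter(gapminder, country == "China" & year >= 1990)

# | keeps rows where AT LEAST ONE condition is TRUE
china_or_india <- filter(gapminder, country == "China" | country == "India")

# Conditions separated by commas are combined as if joined with &
asia_2007 <- filter(gapminder, continent == "Asia", year == 2007)
```

When matching against several values of the same column, country %in% c("China", "India") is a shorter alternative to chaining | comparisons.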

Use a filter to reduce the size of the data, and then label the points on a scatterplot using geom_text_repel.

library(ggrepel)
hi_pop_countries <- filter(gapminder, 
                           pop > 500000000)
ggplot(hi_pop_countries, 
       aes(x = year, y = gdpPercap)) + 
       geom_point() + 
       geom_text_repel(aes(label = country))

Here’s another filter, along with a preview of boxplots and the reorder function.

hi_pop_countries <- filter(gapminder, 
                           pop > 50000000)
ggplot(hi_pop_countries, 
       aes(x = lifeExp, y = reorder(country,lifeExp))) + 
    geom_boxplot()

And here’s a histogram:

TCU <- filter(gapminder,country %in% c("Taiwan","China","United States"))

TCU |> ggplot(aes(x = lifeExp, fill = country)) + geom_histogram(bins = 10)

Explore the storms dataset

The data() command lists the datasets available in the packages that are currently loaded, including those that come with R and the tidyverse. Note that the storms data is in dplyr.

Use filter and various geoms, such as geom_point(), geom_histogram(), and geom_boxplot(), to compare storms across time.
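One possible starting point is sketched below. It assumes the storms columns status, year, and wind as documented in dplyr; the year cutoff and the variable name recent_hurricanes are arbitrary choices for illustration.

```r
library(dplyr)
library(ggplot2)

# Keep only hurricane observations from 2000 onward
recent_hurricanes <- filter(storms, status == "hurricane", year >= 2000)

# One boxplot of wind speed per year; factor() makes year a discrete axis
ggplot(recent_hurricanes, aes(x = factor(year), y = wind)) +
  geom_boxplot()
```

Swapping geom_boxplot() for geom_point(), or mapping only x = wind and using geom_histogram(), gives other views of the same subset.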

Assignment 3

Complete these exercises from the text. Append your answers to the Quarto document for Assignment 2. Submit it as an .html file.

  1. https://r4ds.hadley.nz/data-visualize.html#exercises

  2. https://r4ds.hadley.nz/data-visualize#exercises-1