Mini-Project 1

(now) Due: September 23

Published

September 16, 2025

In your first project, you will use the tools you have learned thus far to explore a real dataset.

Datasets

Use one of the datasets below:

  1. American Time Use Survey
    1. From the U.S. Bureau of Labor Statistics
    2. ATUS Overview - link
    3. The codebook: link
    4. Load the data via:
    library(tidyverse)
    atus_data <- read_rds("https://euclid.nmu.edu/~joshthom/teaching/dat309/week2/ATUS2/ATUS_data.RDS")
  2. Panel Study of Income Dynamics
    1. Based at the University of Michigan
    2. PSID Overview - link
    3. How to Download - link

Required Components

  1. Use one of the datasets above.
  2. Submit a .html file rendered via Quarto / Rmarkdown.
  3. Present your project to the class: 3-5 minutes (max!)
  4. Use
    1. ggplot() with good labels & legends
    2. filter()
    3. select()
    4. facet_wrap() or facet_grid()
    5. rename()
    6. group_by() & summarize()
  5. Include the following plots
    1. barplot
    2. boxplot
    3. scatterplot
  6. Include a brief summary and conclusion.
  7. Include a brief discussion of each of your plots. Two to three sentences per plot should suffice.
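
The required verbs and plot types above can be combined in one short pipeline. The following is only a minimal sketch, not a template for your submission: it assumes the `mpg` dataset bundled with ggplot2 as a stand-in for ATUS or PSID, and every column name and label in it is specific to `mpg`.

```r
library(tidyverse)

# Assumption: mpg (bundled with ggplot2) stands in for the project data.
cars_2008 <- mpg |>
  filter(year == 2008) |>                    # keep rows that match a condition
  select(class, drv, displ, cty, hwy) |>     # keep only the columns needed
  rename(city_mpg = cty, highway_mpg = hwy)  # new_name = old_name

# One row per vehicle class
class_summary <- cars_2008 |>
  group_by(class) |>
  summarize(mean_city_mpg = mean(city_mpg), n_cars = n())

# Barplot of a summarized value
p_bar <- ggplot(class_summary, aes(x = class, y = mean_city_mpg)) +
  geom_col() +
  labs(title = "Mean city mileage by vehicle class (2008 models)",
       x = "Vehicle class", y = "Mean city mileage (mpg)")

# Boxplot of a distribution within each group
p_box <- ggplot(cars_2008, aes(x = class, y = highway_mpg)) +
  geom_boxplot() +
  labs(title = "Highway mileage by vehicle class (2008 models)",
       x = "Vehicle class", y = "Highway mileage (mpg)")

# Scatterplot with a labeled legend, one panel per vehicle class
p_scatter <- ggplot(cars_2008, aes(x = displ, y = highway_mpg, color = drv)) +
  geom_point() +
  facet_wrap(~ class) +
  labs(title = "Engine size vs. highway mileage (2008 models)",
       x = "Engine displacement (litres)", y = "Highway mileage (mpg)",
       color = "Drive train")
```

The plots are stored as objects here, so type `p_bar`, `p_box`, or `p_scatter` on its own line to display one. In a .qmd file, put each plot in its own chunk so its two to three sentences of discussion can sit directly beneath it.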

Tips

  1. Create an R script to do your exploratory work.
  2. Comment your code well.
  3. After you have some good plots and/or statistics, copy your work into the .qmd file.
  4. Remember, your markdown environment is separate from your console, so you will have to load the tidyverse and the data again inside the .qmd file.
  5. This page sheds light on the haven package we’re using with the ATUS data. Two tools that might be useful:
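library(tidyverse)
library(janitor)

# load the ATUS data and convert the column names to snake_case
df <- read_rds("https://euclid.nmu.edu/~joshthom/teaching/dat309/week2/ATUS2/ATUS_data.RDS") |> clean_names()

# option 1: make the haven-labelled variable a factor (uses the value labels)
atus_factor <- df |> mutate(sex = haven::as_factor(sex))
# option 2: keep only the number part of the haven-labelled variable (drops the labels)
# start from df again: zap_labels() has nothing to strip once sex is already a factor
atus_numeric <- df |> mutate(sex = haven::zap_labels(sex))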
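
To see what these two tools do without downloading the ATUS file, here is a small sketch on a hand-made labelled vector. The vector and its labels (1 = Male, 2 = Female) are made up for illustration and are not taken from the ATUS codebook.

```r
library(haven)

# Hypothetical labelled vector: numeric codes with attached value labels
sex <- labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2))

# as_factor() swaps each code for its label
sex_factor <- as_factor(sex)
levels(sex_factor)

# zap_labels() drops the labels and keeps the plain numeric codes
sex_codes <- zap_labels(sex)
sex_codes
```

A factor is usually what you want for ggplot() legends and facet labels, while the plain codes are handy when you need to filter() on the values listed in the codebook.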