Mini-Project 1

Due: September 19

Published

September 19, 2024

In your first project, you will use the tools learned thus far to explore a real dataset.

Datasets

Use one of the datasets below:

  1. American Time Use Survey
    1. From the U.S. Bureau of Labor Statistics
    2. ATUS - link
  2. Panel Study of Income Dynamics
    1. Based at the University of Michigan
    2. PSID - link

Required Components

  1. Use one of the datasets above.
  2. Submit a .html file rendered via Quarto or R Markdown.
  3. Present your project to the class: 3-5 minutes (max!)
  4. Use
    1. ggplot() with good labels & legends
    2. filter()
    3. select()
    4. facet_wrap() or facet_grid()
    5. rename()
    6. group_by() & summarize()
  5. Include the following plots
    1. barplot
    2. boxplot
    3. scatterplot
  6. Include a brief summary and conclusion.
  7. Include a brief discussion of each of your plots. Two to three sentences per plot should suffice.
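
The required functions and plot types above can be combined in one short pipeline. The sketch below is only an illustration: it uses ggplot2's built-in `mpg` data as a stand-in (it is not one of the project datasets), and the object names `cars` and `class_summary` are made up for this example. Your own variables, filters, and labels will differ.

```r
library(tidyverse)

# Stand-in data: ggplot2's built-in `mpg` tibble, NOT one of the project datasets.
cars <- mpg |>
  filter(year == 2008) |>                           # keep only 2008 models
  select(manufacturer, class, drv, displ, hwy) |>   # keep the columns we need
  rename(engine_size = displ, highway_mpg = hwy)    # give columns readable names

# Mean highway mileage for each vehicle class and drive type
class_summary <- cars |>
  group_by(class, drv) |>
  summarize(mean_hwy = mean(highway_mpg), n = n(), .groups = "drop")

# Barplot of the summary, with one panel per drive type
bar_plot <- ggplot(class_summary, aes(x = class, y = mean_hwy, fill = class)) +
  geom_col() +
  facet_wrap(~ drv) +
  labs(
    title = "Mean highway mileage by vehicle class (2008 models)",
    x = "Vehicle class",
    y = "Mean highway miles per gallon",
    fill = "Vehicle class"
  )

# Boxplot of the row-level data
box_plot <- ggplot(cars, aes(x = class, y = highway_mpg)) +
  geom_boxplot() +
  labs(
    title = "Distribution of highway mileage by vehicle class",
    x = "Vehicle class",
    y = "Highway miles per gallon"
  )

# Scatterplot with a labeled color legend
scatter_plot <- ggplot(cars, aes(x = engine_size, y = highway_mpg, color = drv)) +
  geom_point() +
  labs(
    title = "Highway mileage versus engine size",
    x = "Engine displacement (liters)",
    y = "Highway miles per gallon",
    color = "Drive type"
  )

print(bar_plot)
print(box_plot)
print(scatter_plot)
```

Summarizing first and then plotting with `geom_col()` keeps the barplot tied to a table you can also report in your write-up.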

Tips

  1. Create an R script to do your exploratory work.
  2. Comment well.
  3. After you have some good plots and/or statistics, copy your work into the .qmd file.
  4. Remember, the environment used to render your markdown document is separate from your console, so you will have to load the data and the tidyverse again inside the .qmd file.
  5. This page sheds light on the haven package we’re using with the ATUS data. Two tools that might be useful:
library(tidyverse)
library(haven)

source("~/Google Drive/Teaching/DAT309/Week3/load_ATUS_data.R")
# convert the haven-labelled variable to a factor (uses the value labels)
df |> select(sex) |> as_factor()
# A tibble: 868,270 × 1
   sex   
   <fct> 
 1 Female
 2 Female
 3 Female
 4 Female
 5 Female
 6 Female
 7 Female
 8 Female
 9 Female
10 Female
# ℹ 868,260 more rows
# drop the labels and keep only the numeric codes of the haven-labelled variable
df |> select(sex) |> zap_labels()
# A tibble: 868,270 × 1
     sex
   <int>
 1     2
 2     2
 3     2
 4     2
 5     2
 6     2
 7     2
 8     2
 9     2
10     2
# ℹ 868,260 more rows
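
To experiment with these two functions without loading the full ATUS file, you can build a small labelled column by hand. This is a minimal sketch with made-up values: the tibble `toy` and its `hours` column are hypothetical, and the coding (1 = Male, 2 = Female) is assumed here only to mirror the output above.

```r
library(tidyverse)
library(haven)

# Toy data with a haven-labelled column (made-up values, not ATUS data)
toy <- tibble(
  sex   = labelled(c(2, 1, 2, 2), labels = c(Male = 1, Female = 2)),
  hours = c(7.5, 6, 8, 9)
)

# as_factor() swaps the numeric codes for their text labels
toy_factor <- toy |> mutate(sex = as_factor(sex))

# zap_labels() discards the labels and keeps the plain numeric codes
toy_codes <- toy |> mutate(sex = zap_labels(sex))

# The factor version gives readable groups in summaries and plot legends
toy_factor |>
  group_by(sex) |>
  summarize(mean_hours = mean(hours))
```

Converting with `as_factor()` inside `mutate()` before plotting is usually the more convenient choice, since ggplot() will then show the labels rather than the codes in axes and legends.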