Mini-Project 1

Due: September 19

Published

September 19, 2024

In your first project, you will use the tools learned thus far to explore a real dataset.

Datasets

Use one of the datasets below:

American Time Use Survey
1. From the U.S. Bureau of Labor Statistics
2. ATUS - link
Panel Study of Income Dynamics
1. Based at the University of Michigan
2. PSID - link

Required Components

Use one of the datasets above.
Submit a .html file rendered via Quarto or R Markdown.
Present your project to the class in 3-5 minutes (max!).
Use each of the following:
1. ggplot() with good labels & legends
2. filter()
3. select()
4. facet_wrap() or facet_grid()
5. rename()
6. group_by() & summarize()
Include the following plots
1. barplot
2. boxplot
3. scatterplot
Include a brief summary and conclusion.
Include a brief discussion of each of your plots. Two to three sentences should suffice.
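
As a rough sketch of how the required functions and plot types fit together, the example below uses the `mpg` dataset bundled with ggplot2 as a stand-in. It is not ATUS or PSID data, and the column names, renamed variables, and filter values are specific to `mpg` and chosen only for illustration.

```r
library(tidyverse)

# Stand-in data: `mpg` ships with ggplot2. Replace it with your ATUS or PSID tibble.
cars <- mpg |>
  select(manufacturer, class, cty, hwy, year) |>    # keep only the columns needed
  rename(city_mpg = cty, highway_mpg = hwy) |>      # rename() takes new_name = old_name
  filter(class %in% c("compact", "midsize", "suv")) # keep three vehicle classes

# Group-wise summary: one row per class
class_summary <- cars |>
  group_by(class) |>
  summarize(mean_highway = mean(highway_mpg), n = n())

# Barplot of the summary
bar_plot <- ggplot(class_summary, aes(x = class, y = mean_highway)) +
  geom_col() +
  labs(title = "Mean highway mileage by class",
       x = "Vehicle class", y = "Mean highway mileage (mpg)")

# Boxplot of the underlying distribution
box_plot <- ggplot(cars, aes(x = class, y = highway_mpg)) +
  geom_boxplot() +
  labs(title = "Highway mileage by class",
       x = "Vehicle class", y = "Highway mileage (mpg)")

# Scatterplot, faceted by model year, with a labeled legend
scatter_plot <- ggplot(cars, aes(x = city_mpg, y = highway_mpg, color = class)) +
  geom_point() +
  facet_wrap(~ year) +
  labs(title = "City vs. highway mileage",
       x = "City mileage (mpg)", y = "Highway mileage (mpg)",
       color = "Vehicle class")
```

Printing `bar_plot`, `box_plot`, or `scatter_plot` in a code chunk draws the plot in the rendered document.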

Tips

Create an R script to do your exploratory work.
Comment your code well.
After you have some good plots and/or statistics, copy your work into the .qmd file.
Remember, your document renders in its own R session, separate from your console, so you will have to load the data and the tidyverse again inside the .qmd file.
This page sheds light on the haven package we’re using with the ATUS data. Two tools that might be useful:

library(tidyverse)
library(haven)

source("~/Google Drive/Teaching/DAT309/Week3/load_ATUS_data.R")
# convert the haven-labelled (<int+lbl>) column to a factor using its value labels
df |> select(sex) |> as_factor()

# A tibble: 868,270 × 1
   sex   
   <fct> 
 1 Female
 2 Female
 3 Female
 4 Female
 5 Female
 6 Female
 7 Female
 8 Female
 9 Female
10 Female
# ℹ 868,260 more rows

# keep only the numeric codes of the haven-labelled column (drop the labels)
df |> select(sex) |> zap_labels()

# A tibble: 868,270 × 1
     sex
   <int>
 1     2
 2     2
 3     2
 4     2
 5     2
 6     2
 7     2
 8     2
 9     2
10     2
# ℹ 868,260 more rows
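
The ATUS loader script is not reproduced here, so the sketch below builds a tiny labelled column by hand to show both conversions on data you can run anywhere. The `toy` tibble and the coding `1 = Male` are assumptions made for illustration; only `2 = Female` is visible in the output above.

```r
library(tidyverse)
library(haven)

# Hypothetical stand-in for the ATUS `sex` column: integer codes plus value labels
toy <- tibble(
  id  = 1:3,
  sex = labelled(c(1L, 2L, 2L), labels = c(Male = 1L, Female = 2L))
)

# Convert inside mutate() so the other columns are kept
toy_fct <- toy |> mutate(sex = as_factor(sex))   # factor with levels from the labels
toy_num <- toy |> mutate(sex = zap_labels(sex))  # plain integer codes

# Factors give readable groups in summaries and plot legends
toy_fct |> count(sex)
```

Using `mutate()` rather than `select()` keeps the rest of the data frame intact, which is usually what you want before grouping or plotting.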