After you have downloaded R and RStudio, open RStudio and play around with the software. RStudio has four main panes: the Source Editor, the Workspace Browser, the Plots, and the Console, each with various tabs. Learn more here.
Try some calculations in the console. While you’re in the console, browse the other tabs.
Next, the function c() combines its arguments into a single collection, which R calls a vector. Make several collections, varying the kinds of elements they contain. What do you learn about the behavior?
3*4 - 5
[1] 7
x <- c(1:25)
x^2
[1] 1 4 9 16 25 36 49 64 81 100 121 144 169 196 225 256 289 324 361
[20] 400 441 484 529 576 625
last_names <- c("phillips",1:5,"lebowski")
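One way to explore how c() treats mixed elements is to ask R for the type of each result. The sketch below uses only base R; the variable names nums, words, and mixed are illustrative and not part of the lab.

```r
# c() builds a vector, and every element of a vector has the same type.
nums  <- c(1, 2.5, 3)          # all numbers
words <- c("a", "b")           # all character strings
mixed <- c("phillips", 1:5)    # numbers mixed with strings

class(nums)    # "numeric"
class(words)   # "character"
class(mixed)   # "character": the numbers were converted to strings
mixed[2]       # "1" (a string, not the number 1)
```

Because the numbers in mixed have become strings, arithmetic such as mixed[2] + 1 would fail with an error.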
Many data sets and much of the functionality of R exist as packages. The code below downloads the package tidyverse from CRAN. See Section 1.4.3 of the text.
install.packages("tidyverse")
The step above merely downloads the package to your machine. It does not load it in R. To use the package you check it out from the library as follows.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.3 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Visit section 1.4.4 and run the second “library” command. Can you explain what happened?
Data is everywhere - you’re encouraged to find data on your own. But to get started we’ll use some of the many datasets that are available as packages. This one is from a scientific study on penguins.
install.packages("palmerpenguins")
Remember, this just downloads the data. You only have to do this once. To use the data you need to load the package into R each session.
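The install-once, load-every-session pattern can be wrapped in a small helper. This is only a sketch: load_or_install is a hypothetical name, not a function from the text or from any package. It relies on the base R functions requireNamespace(), install.packages(), and library().

```r
# Hypothetical helper (not part of the lab): install a package only if it is
# missing, then load it for the current session.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # one-time download from CRAN
  }
  library(pkg, character.only = TRUE)   # needed every session
}

# "stats" ships with R, so this call loads it without downloading anything.
load_or_install("stats")
```

Calling load_or_install("palmerpenguins") would do the same for the penguin data: download it the first time, and simply load it in later sessions.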
To examine what is in this package you can use help as shown below.
help(package="palmerpenguins")
So now you see penguins is a data.frame inside this package. You can explore it in a number of ways. You can type View(penguins) to view the data in RStudio. You can just type penguins, or you can look at a summary of the data with summary(penguins). To see just the beginning of penguins, do
head(penguins)
# A tibble: 6 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
# ℹ 2 more variables: sex <fct>, year <int>
For example, the first row describes an Adelie penguin with a bill length of 39.1 mm, among many other things. We’ll learn how to make all kinds of graphics from datasets like this. One easy thing we can do is compare male and female bill lengths, as below.
Notice the <- symbol assigns the value on the right side to the name on the left.
a <- c(1:5)
The |> is the pipe, which sends the left side to the right side.
c(1:5) |> sum()
[1] 15
The usage of the pipe may seem weird at first, but it’s ubiquitous so get used to using |>.
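To see what the pipe does, it can help to write the same computation with and without it. The sketch below uses only base R functions and assumes R 4.1 or later, where |> is available.

```r
x <- c(1:5)

# These two lines compute the same thing.
sum(x)        # 15
x |> sum()    # 15

# Extra arguments stay inside the parentheses; the piped value comes first.
x |> head(3)            # same as head(x, 3): 1 2 3

# Pipes can be chained and are read left to right.
x |> rev() |> head(2)   # same as head(rev(x), 2): 5 4
```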
The ggplot function is one of the main plotting tools we’ll use. The first argument of ggplot is a data frame, but in the piped version of the code below it is given only the aes() argument. This is because what precedes the pipe always goes into the first argument of what follows. We’ll learn this in detail later.
penguins_complete <- penguins[complete.cases(penguins),]
ggplot(penguins_complete,aes(x = body_mass_g,y = bill_length_mm, color = sex)) +
  geom_point()
is equivalent to
penguins_complete <- penguins[complete.cases(penguins),]
penguins_complete |>
  ggplot(aes(x = body_mass_g,y = bill_length_mm, color = sex)) +
  geom_point()
Load a data set referenced in Chapter 1 and create some kind of plot from it. See https://jonpage.github.io/r-course/intro.html for inspiration. Note the syntax for referring to a specific variable: penguins$bill_length_mm.
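The $ syntax works the same way on any data frame. As a sketch, the example below uses mtcars, a data frame that ships with R, purely as a stand-in; with the penguins data loaded, penguins$bill_length_mm behaves the same way.

```r
# mtcars is a data frame that ships with R; it stands in for penguins here.
mpg <- mtcars$mpg            # pull out the column named mpg as a vector

class(mpg)                   # "numeric"
length(mpg) == nrow(mtcars)  # TRUE: one value per row
mean(mtcars$mpg)             # a column can be passed straight to a function

# Double brackets with a quoted name select the same column.
identical(mtcars$mpg, mtcars[["mpg"]])   # TRUE
```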
Export the plot as an image and upload it to your Samba Share folder.