Class 4: Data visualization
Slides
The slides are available online as an HTML file. You can also download them as a static PDF (for printing or storing for later).
Readings
- Chapter 1 in Wickham, Çetinkaya-Rundel, and Grolemund (2023)
- This short primer by Andrew Heiss on data visualization basics
Assignment
Before starting this exercise, make sure you have completed the short primer mentioned in the Readings section.
For this exercise you’ll practice grouping, summarizing, and plotting data using counts of the words spoken in the Lord of the Rings trilogy, broken down by film, gender, and fictional species.
As always, create a new RStudio project or use an existing one for your assignment.
You’ll need to download these CSV files and preferably put them in a folder named data in your project folder:
- Read in the separate data files. Make sure you have the tidyverse package loaded.
- Use the bind_rows function to merge the three data sets into a single data set. We haven’t covered this function yet, so look it up. Call the new merged data frame lotr (for “lord of the rings”).
- We later want to plot gender differences. Have a look at the data. Why is it not yet in a tidy format? Explain. Then use pivot_longer to reshape the data frame by adding two new variables, Gender and Words.
- Does a certain gender dominate a movie? (Hint: Make a new summary data frame for which you group by Gender and then sum the words.)
- Graph your summarized data. (Hint: use geom_col and the Words and Gender variables.)
- You’ve just plotted the totals summed across all three films. Now break them down by film. (Hint: Make a new summary data frame for which you group by both Gender and Film and then sum the words.)
- Try to make a new plot in which you differentiate between the different films. (Hint: facet by Gender or Film.)
- How about species? Does a certain species dominate overall (don’t differentiate between the three movies here)? (Hint: Proceed just as for Gender at the beginning: make a new summary data frame for which you group by Species and then sum the words.)
- Create a plot that visualizes the number of words spoken by species, gender, and film simultaneously. Use the complete tidy lotr data frame. You don’t need to create a new summarized data set (with group_by(Species, Gender, Film)) because the original data already has one row for each combination of those three variables (you could make a summarized data set, but it would be identical to the full version). You need to show Species, Gender, and Film at the same time, but you only have two possible aesthetics (x and fill), so you’ll also need to facet by the third. Play around with different combinations (e.g., try x = Species, then x = Film) until you find one that tells the clearest story. For fun, add a labs() layer with a title, subtitle, and caption.
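The steps above can be sketched in R roughly as follows. This is one possible approach, not the only solution, and it assumes R 4.1 or later (for the native |> pipe) with the tidyverse installed. Because the contents of the CSV files are not shown here, the sketch builds three tiny placeholder data frames with tribble() instead of reading the real files. The column names Female and Male for the untidy gender columns, the species labels, and every word count are assumptions invented for illustration, so the numbers and plots will not match the real data. All object names other than lotr are hypothetical.

```r
# A sketch of the full workflow on made-up placeholder data.
# Assumption: each file has the columns Film, Species, Female, Male.
library(tidyverse)

# Placeholder stand-ins for the three CSV files; these word counts are
# invented and are NOT the real ones. With the real files, use
# read_csv() on each file in your data folder instead.
fellowship <- tribble(
  ~Film,                        ~Species, ~Female, ~Male,
  "The Fellowship Of The Ring", "Elf",         10,    20,
  "The Fellowship Of The Ring", "Hobbit",       1,    30,
  "The Fellowship Of The Ring", "Man",          0,    15
)
towers <- tribble(
  ~Film,            ~Species, ~Female, ~Male,
  "The Two Towers", "Elf",          5,    10,
  "The Two Towers", "Hobbit",       0,    25,
  "The Two Towers", "Man",          8,    40
)
king <- tribble(
  ~Film,                    ~Species, ~Female, ~Male,
  "The Return Of The King", "Elf",          2,     5,
  "The Return Of The King", "Hobbit",       3,    35,
  "The Return Of The King", "Man",          4,    30
)

# Stack the three data frames row-wise into one data frame.
lotr <- bind_rows(fellowship, towers, king)

# Female and Male are two values of one variable (gender) stored in
# separate columns, so move them into a Gender column and a Words column.
lotr <- lotr |>
  pivot_longer(
    cols = c(Female, Male),
    names_to = "Gender",
    values_to = "Words"
  )

# Total words per gender, summed across all three films.
gender_totals <- lotr |>
  group_by(Gender) |>
  summarize(Words = sum(Words))

p_gender <- ggplot(gender_totals, aes(x = Gender, y = Words)) +
  geom_col()
print(p_gender)

# Total words per gender within each film, one panel per film.
gender_film <- lotr |>
  group_by(Gender, Film) |>
  summarize(Words = sum(Words), .groups = "drop")

p_film <- ggplot(gender_film, aes(x = Gender, y = Words)) +
  geom_col() +
  facet_wrap(vars(Film))
print(p_film)

# Total words per species, summed across films and genders.
species_totals <- lotr |>
  group_by(Species) |>
  summarize(Words = sum(Words))

# All three variables at once: Species on x, Gender as fill, Film as
# facets. geom_col() stacks the Gender bars within each Species by default.
p_all <- ggplot(lotr, aes(x = Species, y = Words, fill = Gender)) +
  geom_col() +
  facet_wrap(vars(Film)) +
  labs(
    title = "Words spoken in the Lord of the Rings trilogy",
    subtitle = "By species, gender, and film",
    caption = "Placeholder data for illustration only"
  )
print(p_all)
```

To work with the real data, replace the three tribble() calls with read_csv() calls on the files in your data folder; the rest of the pipeline should carry over unchanged, provided the real column names match the ones assumed here. Swapping the roles of Species, Gender, and Film between x, fill, and facet_wrap() is the experiment the last step asks for.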