Worksheet 9

Published

November 6, 2025

Questions are below. My solutions will be available after the tutorials are all finished. The whole point of these worksheets is for you to use your lecture notes to figure out what to do. In tutorial, the TAs are available to guide you if you get stuck. Once you have figured out how to do this worksheet, you will be prepared to tackle the assignment that depends on it.

If you are not able to finish in an hour, I encourage you to continue later with what you were unable to finish in tutorial.

Packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Chick weights

An experiment was carried out to study the effect of diet on the early growth of chicks. 50 chicks were randomly allocated to one of four different diets, and each chick's weight was measured (in grams) at various times after its birth. The data in http://ritsokiguess.site/datafiles/chick_weights.csv contain identifiers for the chick and for the diet that the chick was on (a chick was on the same diet all the way through the experiment). There are a lot of columns: the columns with names like weight_1 hold the weight of the chick in that row at the time point shown (time point 1 in this case), and the columns with names like Time_2 hold the time, in days since birth, that the time point shown corresponds to (time point 2 in this case).

  1. Read in and display (some of) the data.
  1. Rearrange the data to have one column containing all the weights and one column containing the times (in days since birth) at which those weights were measured, with each row identified by the Chick and the Diet that chick was on. Save your new dataframe.
  1. Make a spaghetti plot: that is to say, plot weight against time for each chick, joining the points for the same chick by lines, and colouring the points and lines by diet. Hint: follow the model in the lecture notes.
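
As a reminder of the general technique (not a solution to the questions above), here is a minimal sketch on made-up data. The dataframe `plants_wide` and its columns (`plant`, `treatment`, `height_1`, `day_1`, and so on) are hypothetical and are not part of the chick data; the sketch assumes paired columns whose names end in the time point number, and it may differ from the model in the lecture notes.

```r
library(tidyverse)

# Made-up wide data (hypothetical): one row per plant,
# with paired measurement and time columns for each time point
plants_wide <- tribble(
  ~plant, ~treatment, ~height_1, ~height_2, ~height_3, ~day_1, ~day_2, ~day_3,
  "p1",   "A",        2.0,       3.5,       5.1,       0,      7,      14,
  "p2",   "A",        1.8,       3.0,       4.4,       0,      7,      14,
  "p3",   "B",        2.2,       4.1,       6.3,       0,      7,      14
)

# ".value" says that the first part of each column name (height or day)
# names an output column; the second part becomes the time point
plants_long <- plants_wide %>%
  pivot_longer(
    cols = -c(plant, treatment),
    names_to = c(".value", "timepoint"),
    names_sep = "_"
  )

plants_long

# Spaghetti plot: one line per plant, coloured by treatment
g <- ggplot(plants_long,
            aes(x = day, y = height, colour = treatment, group = plant)) +
  geom_point() +
  geom_line()
g
```

The `group` aesthetic is what joins the points belonging to the same individual; without it, the lines would be drawn per colour rather than per individual.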

American Community Survey

The American Community Survey is a huge sample survey that addresses many aspects of American communities. The data in http://ritsokiguess.site/datafiles/acs4.txt, in aligned columns, contain estimates of the total housed population (people who own or rent a place to live), the total number of renters, and the median rent, in two US states. The column called error contains standard errors of the estimates (obtained using methods like the ones in STAC53). The states are identified by name and number, the latter in the column geoid.
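
For orientation, here is a small sketch of reading values laid out in aligned columns, using made-up text rather than the survey file. The column names `site`, `count`, and `mean_size` are hypothetical, and the sketch assumes the values are separated by runs of spaces with no spaces inside any value.

```r
library(tidyverse)

# Made-up aligned-column text (hypothetical); a string containing
# newlines is treated by readr as literal data rather than a file name
aligned_text <- "site   count  mean_size
alpha     12       3.40
beta       7      10.25
"

# read_table splits each line on one or more whitespace characters
sites <- read_table(aligned_text)
sites
```

The same function accepts a file name or URL in place of the literal text.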

  1. Read in and display the data.
  1. Create columns containing the values in estimate for each of the three items in variable. (That is to say, you should get three new columns; the names of those new columns are the items in variable.) This first attempt will probably give you six rows and some missing values, which is fine for the moment (we discuss why in the next part).
  1. Explain briefly why your output in the previous part came out as it did.
  1. Using techniques learned in this course and your insight from the previous part, arrange the data to have three columns of estimate values whose names are the three items in variable, and only two rows, one for each state.
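
The behaviour described in these questions can be reproduced on a small made-up example. This is only a sketch of one way the situation can arise and be handled, not my solution: the dataframe `survey_long` and its values are hypothetical, and removing a column with `select` is just one of several possible approaches.

```r
library(tidyverse)

# Made-up long data (hypothetical): two regions, two items each,
# with a standard error attached to every estimate
survey_long <- tribble(
  ~region, ~variable,    ~estimate, ~error,
  "North", "households", 1200,      35,
  "North", "median_age", 41,        1.2,
  "South", "households", 950,       28,
  "South", "median_age", 38,        0.9
)

# Every column not named in names_from or values_from is used to identify
# rows, so the distinct error values keep the four rows apart
wide_with_na <- survey_long %>%
  pivot_wider(names_from = variable, values_from = estimate)
wide_with_na

# With error removed, region is the only identifier left,
# so the result has one row per region
wide_clean <- survey_long %>%
  select(-error) %>%
  pivot_wider(names_from = variable, values_from = estimate)
wide_clean
```

Comparing the two outputs side by side shows which columns `pivot_wider` is using to decide what counts as the same row.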