STAC32 Assignment 8

You are expected to complete this assignment on your own: that is, you may discuss general ideas with others, but the writeup of the work must be entirely your own. If your assignment is unreasonably similar to that of another student, you can expect to be asked to explain yourself.

If you run into problems on this assignment, it is up to you to figure out what to do. The only exception is if it is impossible for you to complete this assignment, for example a data file cannot be read. (There is a difference between you not knowing how to do something, which you have to figure out, and something being impossible, which you are allowed to contact me about.)

You must hand in a rendered document that shows your code, the output that the code produces, and your answers to the questions. This should be a file with .html on the end of its name. There is no credit for handing in your unrendered document (ending in .qmd), because the grader cannot then see whether the code in it runs properly. After you have handed in your file, you should be able to see (in Attempts) what file you handed in, and you should make a habit of checking that you did indeed hand in what you intended to, and that it displays as you expect.

Hint: render your document frequently, and solve any problems as they come up, rather than trying to do so at the end (when you may be close to the due date). If your document will not successfully render, it is because of an error in your code that you will have to find and fix. The error message will tell you where the problem is, but it is up to you to sort out what the problem is.

1 Bread

What makes bread rise? Specifically, what are the effects of baking temperature and the amount of yeast on how much a loaf of bread will rise while baking? To find out, a batch of a certain bread mix was divided into 48 parts. Each part had a randomly chosen amount of yeast added (0.75, 1, or 1.25 teaspoons) and was then baked at a temperature of either 350 or 425 (degrees Fahrenheit). After baking, the height of each (very small) loaf of bread was measured (in inches). Apart from the yeast and the baking temperature, the ingredients for each small loaf were identical, so any differences in height can be attributed to one or both of the amount of yeast used and the baking temperature.

The data are in http://ritsokiguess.site/datafiles/bread_wide.csv.

(a) (1 point) Read in and display (most of) the data.

As usual:

library(tidyverse)
my_url <- "http://ritsokiguess.site/datafiles/bread_wide.csv"
bread <- read_csv(my_url)
Rows: 8 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (7): row, yeast0.75_temp350, yeast0.75_temp425, yeast1_temp350, yeast1_t...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
bread

There are only 8 rows, so you should see all of them and most of the columns. The column names are rather long, so you may not see all the columns.

(b) (3 points) The data you read in were originally stored in a spreadsheet with two rows of headers, one showing the amount of yeast and the other the baking temperature. (The headers were combined into one row for you.) Rearrange the data so that there is one column of heights, and columns showing the amount of yeast and the temperature that goes with each height. For maximum points, do your rearrangement with one command. Save your resulting dataframe.

This is one of the variations of pivot_longer, in particular the one with two names_to, because you want a column of yeast amounts and a column of temperatures. The two parts of the (current) column names are separated by an underscore, so:

bread %>% pivot_longer(-row, names_to = c("yeast", "temperature"),
                    names_sep = "_",
                    values_to = "height") -> bread_long
bread_long

Three points for that. -row is the best way to specify which columns to pivot-longer (“everything except for row”); the other column names are rather long, but using starts_with("yeast") or similar is also reasonable.
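To see that the two ways of choosing columns come to the same thing, here is a minimal sketch on a toy dataframe. The dataframe toy is hypothetical: its heights are made up, and it has only two rows and three of the wide columns, so treat it as an illustration and not as the real data.

```r
library(tidyverse)

# Toy stand-in for bread: the heights are made up, and only three of the
# wide columns are included to keep it small
toy <- tibble(
  row = 1:2,
  yeast0.75_temp350 = c(2.0, 2.5),
  yeast0.75_temp425 = c(3.0, 3.5),
  yeast1_temp350 = c(4.0, 4.5)
)

# Choose the columns to pivot by excluding row ...
by_exclusion <- toy %>%
  pivot_longer(-row, names_to = c("yeast", "temperature"),
               names_sep = "_", values_to = "height")

# ... or by saying what the wanted column names start with
by_prefix <- toy %>%
  pivot_longer(starts_with("yeast"), names_to = c("yeast", "temperature"),
               names_sep = "_", values_to = "height")

by_exclusion
all.equal(by_exclusion, by_prefix)
```

Note that the yeast and temperature columns contain text like yeast0.75 and temp350 (each column name split at the underscore). That is fine here, because both variables are going to be treated as categorical.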

If you didn’t manage that, do it in two steps: an ordinary pivot-longer:

bread %>% pivot_longer(-row, names_to = "yt", values_to = "height")

(you may have trouble coming up with a name for the column I called yt), and then use separate_wider_delim:

bread %>% pivot_longer(-row, names_to = "yt", values_to = "height") %>% 
  separate_wider_delim(yt, "_", names = c("yeast", "temperature"))

Two points for doing it this way. The last code chunk suffices as your answer. While you’re figuring out what to do, you should probably do the pivot-longer and then see what to do next, but it’s fine to hand in just the code chunk with the two commands in it.
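If you want to convince yourself that the two-step route ends up in the same place as the one-step one, a sketch like the following works. Again, toy is a hypothetical dataframe with made-up heights, and I am assuming a version of tidyr recent enough to have separate_wider_delim (1.3.0 or later).

```r
library(tidyverse)

# Toy stand-in for bread, with made-up heights
toy <- tibble(
  row = 1:2,
  yeast0.75_temp350 = c(2.0, 2.5),
  yeast0.75_temp425 = c(3.0, 3.5)
)

# One step: split the column names while pivoting
one_step <- toy %>%
  pivot_longer(-row, names_to = c("yeast", "temperature"),
               names_sep = "_", values_to = "height")

# Two steps: pivot first, then split the combined names at the underscore
two_step <- toy %>%
  pivot_longer(-row, names_to = "yt", values_to = "height") %>%
  separate_wider_delim(yt, "_", names = c("yeast", "temperature"))

# Compare the two results
all.equal(one_step, two_step)
```

The columns even come out in the same order, because separate_wider_delim puts the new columns where yt used to be.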

(c) (2 points) Make a suitable graph of the three columns (not including row) in your final dataframe.

These columns are yeast, temperature, and height (as I called them; you can use your own names), two categorical and one quantitative, so a grouped boxplot is called for. To make that, choose one of your categorical variables to be x and the other to be fill (or colour); the quantitative variable is y:

ggplot(bread_long, aes(x = yeast, y = height, fill = temperature)) + geom_boxplot()

There are three different values of yeast and only two of temperature, so I put yeast on the \(x\)-axis. It seems better to me to make the best use of the “real-estate” on the \(x\)-axis: there is lots of room for categories there, but having more than a few colours is difficult to sort out.

Having said that, for this question (with only three and two categories), I have no objection if you have three colours:

ggplot(bread_long, aes(fill = yeast, y = height, x = temperature)) + geom_boxplot()
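Whichever way around you draw it, it can be worth checking how many observations go into each box. This is a sketch on a hypothetical dataframe toy_long that mimics the design described in the question (3 yeast amounts, 2 temperatures, 8 loaves of each combination); the labels are my assumption about what the pivoted data look like, and the heights are random numbers, not the real ones.

```r
library(tidyverse)

set.seed(1)

# Hypothetical long-format data: the labels are assumed, the heights are made up
toy_long <- expand_grid(
  yeast = c("yeast0.75", "yeast1", "yeast1.25"),
  temperature = c("temp350", "temp425"),
  row = 1:8
) %>%
  mutate(height = runif(n(), min = 2, max = 6))

# Number of loaves behind each box in the grouped boxplot
cell_counts <- toy_long %>% count(yeast, temperature)
cell_counts
```

With your own data, replace toy_long with the dataframe you saved in (b).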