STAC32 Assignment 6

You are expected to complete this assignment on your own: that is, you may discuss general ideas with others, but the writeup of the work must be entirely your own. If your assignment is unreasonably similar to that of another student, you can expect to be asked to explain yourself.

If you run into problems on this assignment, it is up to you to figure out what to do. The only exception is if it is impossible for you to complete this assignment, for example a data file cannot be read. (There is a difference between you not knowing how to do something, which you have to figure out, and something being impossible, which you are allowed to contact me about.)

You must hand in a rendered document that shows your code, the output that the code produces, and your answers to the questions. This should be a file with .html on the end of its name. There is no credit for handing in your unrendered document (ending in .qmd), because the grader cannot then see whether the code in it runs properly. After you have handed in your file, you should be able to see (in Attempts) what file you handed in, and you should make a habit of checking that you did indeed hand in what you intended to, and that it displays as you expect.

Hint: render your document frequently, and solve any problems as they come up, rather than trying to do so at the end (when you may be close to the due date). If your document will not successfully render, it is because of an error in your code that you will have to find and fix. The error message will tell you where the problem is, but it is up to you to sort out what the problem is.

1 Digestion in horses

Horses eat straw, and horse breeders are interested in what will make straw easier for the horses to eat. In an experiment with six horses, the digestibility coefficient (in suitable units, with a higher value meaning that the horse finds the straw easier to digest) was measured twice for each horse: once after the horse had been fed straw treated with NaOH (sodium hydroxide) and once after the horse had been fed ordinary straw. The results are in a data file. The dataset contains columns identifying the horses, the digestibility coefficient when that horse ate ordinary straw, the digestibility coefficient when it ate the sodium-hydroxide straw, and the difference between the two digestibility coefficients (naoh minus ordinary). The aim of the experiment was to identify any differences between the two treatments.

(a) (1 point) Read in and display the data.

There are only six rows, so you can display all of it this time:

my_url <- ""
horses <- read_csv(my_url)
horses
Rows: 6 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): horse, ordinary, naoh, diff

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

You can check that the columns referred to in the question are respectively: horse, ordinary, naoh, and diff.
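One way to make that check concrete is to recompute the difference column and compare it with the one supplied. This is a minimal sketch only: `horses_demo` is a hypothetical stand-in with invented values (not the real data), and `recomputed` and `matches` are names made up for the illustration. With the real data you would run the same `mutate()` on `horses`.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the real data frame: values invented for illustration only
horses_demo <- tribble(
  ~horse, ~ordinary, ~naoh, ~diff,
  1, 50.0, 55.5,  5.5,
  2, 48.5, 47.0, -1.5,
  3, 52.0, 60.0,  8.0
)

# Recompute naoh minus ordinary and compare with the supplied diff column
check <- horses_demo %>%
  mutate(
    recomputed = naoh - ordinary,
    matches = near(diff, recomputed)  # near() allows for floating-point rounding
  )

check
all(check$matches)
```

If every value in `matches` is `TRUE`, the diff column really is naoh minus ordinary, and it can be used directly in what follows.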

(b) (2 points) Why is this a matched pairs experiment? Explain briefly.

Each horse was fed both of the two types of straw, so that there are two measurements on each individual horse. (The best answer gets at “there are two observations for each individual” by making it clear that you know which are the individuals (horses) and which are the observations (the two digestibility coefficients for the two types of straw).)

(c) (3 points) Make a graph that can be used to assess the key assumption of a matched pairs t-test. Explain briefly how your graph shows no obvious problems.

The best graph is a normal quantile plot of the differences, since (sufficient) normality of the differences is what matters here. The column diff contains the differences, so you can use them directly (no need to calculate differences):

ggplot(horses, aes(sample = diff)) + stat_qq() + stat_qq_line()

There is no real deviation from the line; the points are all close to it, and thus the differences are acceptably normal in distribution. This is especially important here since the sample size (6) is so small. The second-best graph is something like a histogram of the differences, but it’s hard to judge normality with so few observations:

ggplot(horses, aes(x = diff)) + geom_histogram(bins = 3)

The best you can say here is “it’s not obviously non-normal”.

A one-sample boxplot is likely to convey the same sort of message (or non-message):

ggplot(horses, aes(x = 1, y = diff)) + geom_boxplot()
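
As a footnote to the normal quantile plot: it may help to see roughly what is being drawn. The sketch below uses base R only and a vector `d` of invented differences (not the horse data). It assumes the default behaviour of `stat_qq()` and `stat_qq_line()` as I understand it: sorted observations plotted against normal quantiles at `ppoints(n)`, and a line through the first and third quartiles.

```r
# Invented differences for illustration only (not the horse data)
d <- c(-1.5, 0.5, 2.0, 3.5, 5.5, 8.0)
n <- length(d)

# Theoretical standard normal quantiles at the plotting positions ppoints(n)
theoretical <- qnorm(ppoints(n))

# Observed values sorted into increasing order
observed <- sort(d)

# Reference line through the first and third quartiles of data and normal distribution
y_q <- quantile(d, c(0.25, 0.75))
x_q <- qnorm(c(0.25, 0.75))
slope <- unname(diff(y_q) / diff(x_q))
intercept <- unname(y_q[1] - slope * x_q[1])

# One row per point: where it is plotted, and where the line says it "should" be
data.frame(theoretical, observed, on_line = intercept + slope * theoretical)
```

Points whose `observed` value is close to `on_line` are the ones that sit near the line on the plot; with only six points, a modest amount of wiggle is to be expected even when the data are normal.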