STAC32 Assignment 1

You are expected to complete this assignment on your own: that is, you may discuss general ideas with others, but the writeup of the work must be entirely your own. If your assignment is unreasonably similar to that of another student, you can expect to be asked to explain yourself.

If you run into problems on this assignment, it is up to you to figure out what to do. The only exception is if it is impossible for you to complete this assignment, for example because a data file cannot be read. (There is a difference between you not knowing how to do something, which you have to figure out, and something being impossible, which you are allowed to contact me about.)

You must hand in a rendered document that shows your code, the output that the code produces, and your answers to the questions. This should be a file with .html on the end of its name. There is no credit for handing in your unrendered document (ending in .qmd), because the grader cannot then see whether the code in it runs properly. After you have handed in your file, you should be able to see (in Attempts) what file you handed in, and you should make a habit of checking that you did indeed hand in what you intended to, and that it displays as you expect.

Hint: render your document frequently, and solve any problems as they come up, rather than trying to do so at the end (when you may be close to the due date). If your document will not successfully render, it is because of an error in your code that you will have to find and fix. The error message will tell you where the problem is, but it is up to you to sort out what the problem is.

1 Body Temperature of a Beaver

Beavers are large semi-aquatic rodents. They live in rivers and lakes, and build dams and lodges using tree branches, rocks, and mud. (Source: Wikipedia.) A study was carried out on beaver body temperature. A beaver’s body temperature was taken every 10 minutes, using a temperature-sensitive radio transmitter. At the same time, the location of the beaver was recorded in the column activ: no if it was inside the lodge (home), where activity was expected to be low, and yes if it was outside (where activity was expected to be higher). The data are in http://ritsokiguess.site/datafiles/beaver.csv, with the body temperature (in degrees Celsius) in the column temp.

(a) (3) Read the data directly from the data file into a dataframe, and display some of the dataframe. Hint: “display some of the dataframe” in this course means display 10 rows and as many columns as will fit.

If you don’t already have this at the top of your document, put it there:

library(tidyverse)

and then get to work.

I like to save the URL in a variable with a short name (URLs tend to be long), which you can do by copying and pasting it:

my_url <- "http://ritsokiguess.site/datafiles/beaver.csv"

and then read directly from this URL. This is a .csv file, as you can see from the URL, so use read_csv.1 After that, putting the name you gave the dataframe on a line by itself will display it according to the specifications:

beaver <- read_csv(my_url)
Rows: 100 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): activ
dbl (3): day, time, temp

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
beaver

I called my dataframe beaver. You can use whatever name you like, but it is better to use something that describes what the dataframe contains, so that you don’t get confused later.

To back up a bit: when you use read_csv or any of the other read_ functions, you get a little report of what it read in: how many rows and columns, what the columns were separated by (in “Delimiter”: the data values in a .csv file like this one are separated by commas), what the columns are called, and what kind of thing they contain. Here activ is text, and the others are (possibly decimal) numbers.2
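To see the pieces of that report on something small, here is a minimal sketch that reads a few made-up rows from literal text rather than from the URL. The values and the names csv_text and tiny are invented for illustration (they are not the real beaver data), and wrapping the text in I() assumes readr 2.0 or later:

```r
library(tidyverse)

# Made-up rows using the same column names as the beaver data (NOT the real values)
csv_text <- "day,time,temp,activ
1,10,36.5,no
1,20,37.2,yes
1,30,36.8,no"

# I() marks the string as literal data rather than a file name or URL;
# show_col_types = FALSE quiets the report, as the message itself suggests
tiny <- read_csv(I(csv_text), show_col_types = FALSE)

# spec() retrieves the column specification that the report would have shown
spec(tiny)

# A name on a line by itself displays the dataframe
tiny
```

As with the real file, activ should come out as text and the other columns as numbers.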

(b) (3) Make a suitable graph of the body temperatures (only).

The suitable graph of one quantitative variable is a histogram:

ggplot(beaver, aes(x = temp)) + geom_histogram(bins = 8)

Details:

  • in ggplot, the first thing is the name of your dataframe.

  • the next thing is aes for what to plot. A histogram requires an x, which here is the column temp.

  • geom_histogram. This is the “how to plot it”: we are making a histogram of the thing that was in x in the aes.

  • Use a number of bins that clearly displays the shape of the histogram. You must choose a number of bins because the default, 30, is always (for the kind of data we will see) too many. You might have to try several numbers of bins until you find one that gives a good picture of the shape. I think between about 7 and 12 is good.
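One way to try several numbers of bins is to draw the same histogram more than once and compare. This is only a sketch: it uses simulated values in a dataframe called fake (a name and data made up for illustration, not the beaver data), so that it runs on its own:

```r
library(tidyverse)

# Simulated values for illustration only (NOT the beaver data)
set.seed(1)
fake <- tibble(temp = c(rnorm(50, mean = 37, sd = 0.2),
                        rnorm(50, mean = 38, sd = 0.2)))

# Same data, two choices for the number of bins
p8  <- ggplot(fake, aes(x = temp)) + geom_histogram(bins = 8)
p12 <- ggplot(fake, aes(x = temp)) + geom_histogram(bins = 12)

# Display each graph and compare the shapes
p8
p12
```

With your own data, replace fake by the name of your dataframe and keep the number of bins that shows the shape most clearly.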

To think about choosing the number of bins: there are 100 observations. A starting point is to find the next power of 2 above the number of observations, which is \(128 = 2^7\), and then add one to the exponent. That gives you \(7+1 = 8\) bins as a starting point. This rule is called Sturges’ rule, but it really only works for bell-shaped distributions. When you have something not bell-shaped (like ours), you might need more bins to get a good picture. This is why I said up to about 12 bins was good.
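If you want to check the arithmetic, this short sketch does the calculation directly and then with base R's nclass.Sturges function, which looks only at how many values it is given. The variable name n is just for illustration:

```r
n <- 100  # number of observations in the beaver data

# Sturges' rule by hand: round log2(n) up to a whole number, then add 1
ceiling(log2(n)) + 1

# Base R has the same rule built in; numeric(n) is just n zeros,
# used here only because the function depends on the length of its input
nclass.Sturges(numeric(n))
```

Both lines should agree with the 8 bins worked out above.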

Extra: This is actually a bimodal shape (two peaks). You’ll see in a moment why that is.

(c) (3) The researchers believe that beavers have a higher body temperature on average when active than when they are not. Make a suitable plot to assess this belief.

One quantitative variable, body temperature, and one categorical, activity, so a boxplot:

ggplot(beaver, aes(x = activ, y = temp)) + geom_boxplot()

A boxplot requires an x and a y, with the x being the categorical variable (displayed on the \(x\) axis of the plot) and y being the quantitative one (on the \(y\) axis). There is nothing that you have to tweak with the boxplot (unlike a histogram’s bins).

Extra: ggplot’s boxplots go up and down. To make them go left and right, you can use coord_flip,3 which interchanges the roles of \(x\) and \(y\) on the graph:

ggplot(beaver, aes(x = activ, y = temp)) + 
  geom_boxplot() + coord_flip()