Worksheet 11

Published

November 17, 2023

The questions come first. My solutions to each question follow all of that question's parts; scroll down if you get stuck. For some questions there is extra discussion after the solutions, which you might find interesting to read, maybe after tutorial.

For these worksheets, you will learn the most by spending a few minutes thinking about how you would answer each question before you look at my solution. There are no grades attached to these worksheets, so feel free to guess: it makes no difference at all how wrong your initial guess is!

1 The boiling point of water

The boiling point of water is commonly known to be 100 degrees C (212 degrees F). But the actual boiling point of water depends on atmospheric pressure; when the pressure is lower, the boiling point is also lower. For example, at higher altitudes, the atmospheric pressure is lower because the air is thinner, so that in Denver, Colorado, which is 1600 metres above sea level, water boils at around 95 degrees C. Source.

Some data were collected on the atmospheric pressure at seventeen locations (pressure in the data file, in inches of mercury) and the boiling temperature of water at those locations (boiling, in degrees F). This is (evidently) American data. The data are in http://ritsokiguess.site/datafiles/boiling-point.csv. Our aim is to predict boiling point from atmospheric pressure.
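Because the file records pressure in inches of mercury and boiling point in degrees Fahrenheit, it can help to convert values to more familiar units when checking whether they look reasonable. A minimal sketch in base R: the helper names `f_to_c`, `c_to_f` and `inhg_to_hpa` are made up for illustration, and the factor of 33.8639 hPa per inch of mercury is the standard conversion, not something taken from the data file.

```r
# Hypothetical helper functions (not part of the worksheet) for unit conversion.
f_to_c <- function(f) (f - 32) * 5 / 9       # degrees F to degrees C
c_to_f <- function(cel) cel * 9 / 5 + 32     # degrees C to degrees F
inhg_to_hpa <- function(p) p * 33.8639       # inches of mercury to hectopascals

c_to_f(100)          # 212, the usual sea-level boiling point
c_to_f(95)           # 203, the approximate Denver figure quoted above
f_to_c(212)          # 100
inhg_to_hpa(29.92)   # about 1013 hPa, standard sea-level pressure
```

These are only for checking your intuition; the regression itself is done in the units of the data file.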

There are rather a lot of parts here. Questions like this tend to have rather a lot to explore. I wouldn’t necessarily put all these on an assignment, but it is certainly worth your while to work through these now, in case something like them comes up again later.

  1. Read in and display (some of) the data.

  2. Draw a suitable plot of these data.

  3. Comment briefly on your plot and any observations that appear not to belong to the pattern shown by the rest of the observations.

  4. Fit a suitable linear regression, and display the results. (Do this even if you think this is not appropriate.)

  5. Comment briefly on whether the slope makes sense, and on the overall fit of the model.

  6. Make two suitable plots that can be used to assess the appropriateness of this regression.

  7. When you looked at your scatterplot, you may have identified some observations that did not follow the pattern of the others. Describe briefly how these observations show up on the two plots you just drew.

  8. It turns out that the two observations with the lowest pressure are errors. Create a new dataframe with these observations removed, and repeat the regression. (You do not need to make the residual plots.)

  9. Compare the slope and the R-squared from this regression with the values you got in the first regression. Why is it sensible that the values differ in the way they do?

The boiling point of water - my solutions

  1. Read in and display (some of) the data.

Solution

This is a .csv file, so no great difficulty:

library(tidyverse)
my_url <- "http://ritsokiguess.site/datafiles/boiling-point.csv"
boiling_point <- read_csv(my_url)
Rows: 17 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): boiling, pressure

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
boiling_point

There are 17 rows of data, as promised, with two columns, boiling and pressure.
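The column specification message can be switched off in the way the message itself suggests, and `glimpse()` gives a compact check of the column names and types. A minimal sketch, assuming the tidyverse is installed and the URL is reachable:

```r
library(tidyverse)

my_url <- "http://ritsokiguess.site/datafiles/boiling-point.csv"

# show_col_types = FALSE suppresses the column specification message
boiling_point <- read_csv(my_url, show_col_types = FALSE)

# Compact overview: one line per column, with its type and first few values
glimpse(boiling_point)

# Number of rows and columns, to compare with the 17 locations and
# two variables described in the question
dim(boiling_point)
```

Whether to silence the message is a matter of taste; leaving it on is a cheap way to confirm that both columns were read as numbers.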

\(\blacksquare\)

  2. Draw a suitable plot of these data.

Solution

There are two quantitative variables, so a scatterplot is the thing. We are trying to predict boiling point, so that goes on the \(y\)-axis:

ggplot(boiling_point, aes(x = pressure, y = boiling)) + geom_point() + geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
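That message is ggplot2 reporting the smoother and formula it chose by default. A sketch, assuming the tidyverse is installed and the data URL is reachable, that names those same defaults explicitly so that no message is printed; the resulting plot should be unchanged:

```r
library(tidyverse)

my_url <- "http://ritsokiguess.site/datafiles/boiling-point.csv"
boiling_point <- read_csv(my_url, show_col_types = FALSE)

# Explanatory variable on x, response on y; the smoother and its formula
# are stated explicitly rather than left for geom_smooth() to pick.
p <- ggplot(boiling_point, aes(x = pressure, y = boiling)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)
p
```

The loess smooth does not assume a straight-line relationship, which makes it useful for judging the shape of the trend before any regression is fitted.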