Questions are below. My solutions will be available after the tutorials are all finished. The whole point of these worksheets is for you to use your lecture notes to figure out what to do. In tutorial, the TAs are available to guide you if you get stuck. Once you have figured out how to do this worksheet, you will be prepared to tackle the assignment that depends on it.
If you cannot finish in an hour, I encourage you to continue later with whatever you did not get to in tutorial.
Packages
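The startup messages below are what the tidyverse prints when it is attached. The call that produced them is not shown in this text, so the following is an assumption about what was run:

```r
# Attach the core tidyverse packages (dplyr, ggplot2, readr, and so on)
library(tidyverse)
```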
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
The boiling point of water
The boiling point of water is commonly known to be 100 degrees C (212 degrees F). But the actual boiling point of water depends on atmospheric pressure; when the pressure is lower, the boiling point is also lower. For example, at higher altitudes, the atmospheric pressure is lower because the air is thinner, so that in Denver, Colorado, which is 1600 metres above sea level, water boils at around 95 degrees C. Source.
Some data were collected on the atmospheric pressure at seventeen locations (pressure in the data file, in inches of mercury) and the boiling temperature of water at those locations (boiling, in degrees F). This is (evidently) American data. The data are in http://ritsokiguess.site/datafiles/boiling-point.csv. Our aim is to predict boiling point from atmospheric pressure.
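Because the boiling temperatures in the data are in degrees F while some of the figures quoted above are in degrees C, a small conversion helper may be handy when sanity-checking values. This is only a sketch; the function name c_to_f is made up for illustration and is not part of the worksheet:

```r
# Hypothetical helper: convert degrees Celsius to degrees Fahrenheit
c_to_f <- function(celsius) {
  celsius * 9 / 5 + 32
}

c_to_f(100) # 212, the usual sea-level boiling point
c_to_f(95)  # 203, roughly the Denver value quoted above
```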
- Read in and display (some of) the data.
- Draw a suitable plot of these data.
- Comment briefly on your plot and any observations that appear not to belong to the pattern shown by the rest of the observations.
- Fit a suitable linear regression, and display the results. (Do this even if you think this is not appropriate.)
- Comment briefly on whether the slope makes sense, and on the overall fit of the model.
- Make two suitable plots that can be used to assess the appropriateness of this regression.
- When you looked at your scatterplot, you may have identified some observations that did not follow the pattern of the others. Describe briefly how these observations show up on the two plots you just drew.
- It turns out that the two observations with the lowest pressure are errors. Create a new dataframe with these observations removed, and repeat the regression. (You do not need to make the residual plots.)
- Compare the slope and the R-squared from this regression with the values you got in the first regression. Why is it sensible that the values differ in the way they do?
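The steps above could be approached along the following lines. This is a rough sketch of one possible workflow, not the official solutions. The URL and the column names pressure and boiling come from the text above; the object names (boiling_point, boiling_1, diagnostics, boiling_clean, boiling_2) are invented for illustration, and taking "two suitable plots" to mean residuals against fitted values plus a normal quantile plot of the residuals is an assumption.

```r
library(tidyverse)

# URL and column names (pressure, boiling) come from the worksheet text
my_url <- "http://ritsokiguess.site/datafiles/boiling-point.csv"
boiling_point <- read_csv(my_url)
boiling_point

# Scatterplot with the response (boiling) on the y-axis
ggplot(boiling_point, aes(x = pressure, y = boiling)) +
  geom_point()

# Straight-line regression predicting boiling point from pressure
boiling_1 <- lm(boiling ~ pressure, data = boiling_point)
summary(boiling_1)

# Collect fitted values and residuals into a data frame for plotting
diagnostics <- tibble(
  fitted_value = fitted(boiling_1),
  residual = resid(boiling_1)
)

# Residuals against fitted values: look for patterns or unusually large residuals
ggplot(diagnostics, aes(x = fitted_value, y = residual)) +
  geom_point()

# Normal quantile plot of the residuals
ggplot(diagnostics, aes(sample = residual)) +
  stat_qq() +
  stat_qq_line()

# Drop the two observations with the lowest pressure, then refit
boiling_clean <- boiling_point %>%
  arrange(pressure) %>%
  slice(-(1:2))
boiling_2 <- lm(boiling ~ pressure, data = boiling_clean)
summary(boiling_2)

# Slopes and R-squared values side by side for comparison
tibble(
  model = c("all data", "two lowest removed"),
  slope = c(coef(boiling_1)[["pressure"]], coef(boiling_2)[["pressure"]]),
  r_squared = c(summary(boiling_1)$r.squared, summary(boiling_2)$r.squared)
)
```

The code only produces the plots and numbers; the comments and comparisons the questions ask for still have to come from looking at the output.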