Worksheet 10

Published: November 13, 2024

Questions are below. My solutions will be available after the tutorials are all finished. The whole point of these worksheets is for you to use your lecture notes to figure out what to do. In tutorial, the TAs are available to guide you if you get stuck. Once you have figured out how to do this worksheet, you will be prepared to tackle the assignment that depends on it.

If you are not able to finish in an hour, I encourage you to continue later with what you were unable to finish in tutorial.

Packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom)
library(MASS, exclude = "select")

Thermal spray coatings

A coating is sprayed onto stainless steel, and the strength of the bond between the coating and the stainless steel is measured (in megapascals). Five different thicknesses of coating are used (measured in micrometres), and an engineer is interested in the relationship between the thickness of the coating and its strength. Some data are in http://ritsokiguess.site/datafiles/coatings.csv.

  1. Read in and display (some of) the data.
  2. Draw a suitable graph that illustrates how the coating thickness influences the bond strength.
  3. Comment briefly on the kind of relationship you see here, if any.
  4. Fit a straight-line regression and display the results. (You will have an opportunity to criticize it shortly.)
  5. By making a suitable plot, demonstrate that the relationship is actually curved rather than linear.
  6. Add a squared term in thickness to your regression, and display the output.
  7. The Estimate for thickness-squared is very small in size. Why, nonetheless, was it definitely useful to add that squared term?
  8. Is the plot of residuals vs fitted values better from your second regression than it was from the first one? Draw it, and explain briefly.
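If you want to see the mechanics of comparing a straight-line fit with one that has a squared term, here is a minimal sketch. It uses simulated data, not the coatings data, and the names `sim`, `x`, `y`, `fit_1` and `fit_2` are made up for illustration. It also sticks to base R so that it runs without any extra packages; your own work can of course use the tidyverse and broom as in lecture.

```r
# Simulated data for illustration only; this is NOT the coatings data.
set.seed(1)
x <- rep(c(10, 20, 30, 40, 50), each = 4)                 # hypothetical predictor
y <- 5 + 1.2 * x - 0.02 * x^2 + rnorm(length(x), sd = 1)  # curved trend plus noise
sim <- data.frame(x, y)

# Straight-line fit, then the same model with a squared term added.
# I() is needed so that ^ means "raise to a power" inside the formula.
fit_1 <- lm(y ~ x, data = sim)
fit_2 <- lm(y ~ x + I(x^2), data = sim)

# Coefficient table: look at the P-value in the squared-term row,
# not only at the size of its Estimate.
print(summary(fit_2)$coefficients)

# Residuals against fitted values for each model, to compare patterns.
plot(fitted(fit_1), resid(fit_1), main = "Straight line")
plot(fitted(fit_2), resid(fit_2), main = "With squared term")
```

The size of an Estimate depends on the units of the variable it multiplies, which is why the sketch prints the whole coefficient table rather than the estimates alone.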

Houses in Duke Forest, North Carolina

The data in http://ritsokiguess.site/datafiles/duke_forest.csv are of houses that were sold around November 2020 in the Duke Forest area of Durham, North Carolina. For each house, the selling price (in US $), called price, was recorded, along with some other features of the house:

  • bed: the number of bedrooms
  • bath: the number of bathrooms
  • area: the area of the inside of the house, in square feet
  • year_built: the year the house was originally built

Our aim is to predict the selling price of a house from its other features. There are 97 houses in the data set.

Note: this is rather long, but I wanted to give you a chance to practice everything.

  1. Read in and display (some of) the data.
  2. Make a graph of selling price against each of the explanatory variables, using a single ggplot call.
  3. Comment briefly on your plots.
  4. Fit a regression predicting price from the other variables, and display the results.
  5. What is the meaning of the number in the bath row in the Estimate column?
  6. Plot the residuals from your regression against the fitted values. What evidence is there that a transformation of the selling prices might be a good idea? (Hint: look at the right side of your graph.)
  7. Run Box-Cox. What transformation of price is suggested, if any?
  8. Rerun your regression with a suitably transformed response variable, and display the results.
  9. Confirm that the plot of residuals against fitted values now looks better.
  10. Build a better model by removing any explanatory variables that play no role, one at a time.
  11. If you want to, make a full set of residual plots for your final model (residuals vs fitted values, normal quantile plot of residuals, residuals vs each of the explanatory variables) and convince yourself that all is now at least reasonably good. (I allow for the possibility that you are now bored with this and would like to move on to something else, but I had already done these, so…)
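The Box-Cox and variable-removal steps above can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated (they are not the Duke Forest houses), the names `sim`, `x1`, `x2`, `fit_raw`, `fit_log` and `fit_small` are hypothetical, and the response is deliberately built so that its log is linear in `x1`. What Box-Cox suggests for the real data is for you to find out.

```r
library(MASS)  # for boxcox()

# Simulated data for illustration only; this is NOT the Duke Forest data.
set.seed(2)
n <- 100
x1 <- runif(n, 1, 10)                        # hypothetical useful predictor
x2 <- runif(n, 1, 10)                        # hypothetical predictor with no real effect
y <- exp(1 + 0.3 * x1 + rnorm(n, sd = 0.3))  # positive response; log(y) is linear in x1
sim <- data.frame(x1, x2, y)

# Regression on the untransformed response.
fit_raw <- lm(y ~ x1 + x2, data = sim)

# Box-Cox: log-likelihood over a grid of lambda values.
# plotit = FALSE returns the grid (x) and log-likelihoods (y) without drawing.
bc <- boxcox(fit_raw, plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)

# A lambda near 0 points to a log transformation of the response.
fit_log <- lm(log(y) ~ x1 + x2, data = sim)
print(summary(fit_log)$coefficients)

# Removing one explanatory variable and refitting. Here x2 was simulated
# to have no effect; with real data, check the P-values first and remove
# one variable at a time.
fit_small <- update(fit_log, . ~ . - x2)
print(summary(fit_small)$coefficients)
```

Leaving out `plotit = FALSE` draws the usual Box-Cox plot with its confidence interval for lambda, which is normally what you would read the suggested transformation from.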