Worksheet 99

Published

November 2, 2024

Packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Motor Trend cars

In 1974, Motor Trend magazine collected data on fuel consumption and other features of 32 different cars. The data are available in the built-in dataset mtcars. The variables of interest to us are:

  • mpg: fuel consumption in miles per US gallon
  • cyl: number of cylinders in the engine
  • wt: weight of car, in thousands of pounds.
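To get a feel for these three variables before plotting, here is a minimal sketch that pulls them out of mtcars. The car names are stored as row names rather than in a column, so they are moved into a column first; the column name model is my own choice, not something in the dataset.

```r
library(tidyverse)

# mtcars keeps the car names as row names; move them into a column
# (the name "model" is an assumption made for this sketch)
cars <- mtcars %>%
  rownames_to_column("model") %>%
  as_tibble() %>%
  select(model, mpg, cyl, wt)

# one row per car, with just the variables of interest
cars
```

Displaying the tibble shows the first ten rows along with the number of rows and the type of each column.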

  1. Make a suitable plot of fuel consumption against weight.

An ordinary scatterplot:

ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() 

  2. Modify your plot to distinguish cars with different numbers of cylinders by colour.

Like this, treating cyl as categorical so that each number of cylinders gets its own colour:

ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) + geom_point()
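A note on how cyl is stored: it is a number in mtcars, and ggplot uses a continuous colour gradient when a numeric variable is mapped to colour. The sketch below checks this, counts the distinct values, and draws the plot with cyl converted to a factor so that the legend is discrete. The name g and the legend title are my own choices.

```r
library(tidyverse)

# cyl is stored as a number, so ggplot treats it as continuous by default
class(mtcars$cyl)

# how many cars there are for each number of cylinders
cyl_counts <- mtcars %>% count(cyl)
cyl_counts

# factor(cyl) gives one discrete colour per number of cylinders
g <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(colour = "cyl")
g
```

With only a few distinct values, a discrete legend is usually easier to read than a gradient.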

Making soap

A factory makes soap. There are two production lines, a and b. These can be run at different speeds; running the production line faster produces more soap, but it also produces more scrap (soap that cannot be sold). Does the amount of scrap differ by production line? Answer the questions below to find out. The data is in https://ritsokiguess.site/datafiles/soap.txt.

  1. Read in and display some of the data.

my_url <- "https://ritsokiguess.site/datafiles/soap.txt"
soap <- read_delim(my_url, " ")
Rows: 27 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: " "
chr (1): line
dbl (3): case, scrap, speed

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
soap

  2. Make a suitable plot of the scrap produced and the production line. How do the production lines compare?

A boxplot:

ggplot(soap, aes(x = line, y = scrap)) + geom_boxplot()

There is not much difference between the production lines, relative to the amount of variability present.
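If you want numbers to go with the boxplot, one possibility is to summarize the centre and spread of scrap for each production line. This is only a sketch: it re-reads the data so that it runs on its own, and the names soap_summary, median_scrap and iqr_scrap are my own.

```r
library(tidyverse)

my_url <- "https://ritsokiguess.site/datafiles/soap.txt"
soap <- read_delim(my_url, " ", show_col_types = FALSE)

# median and interquartile range of scrap for each production line
soap_summary <- soap %>%
  group_by(line) %>%
  summarize(
    n = n(),
    median_scrap = median(scrap),
    iqr_scrap = IQR(scrap)
  )
soap_summary
```

The median and interquartile range correspond to the centre line and the height of the box in each boxplot, so the two displays should tell the same story.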

  3. Do you get a different story if you include speed in your plot?

A scatterplot this time:

ggplot(soap, aes(x = speed, y = scrap, colour = line)) +
  geom_point() + geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'

Once you allow for the speed at which the production line is run, line a produces more scrap than line b at any given speed.
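One way to look at this comparison numerically is to fit the same kind of lines that geom_smooth draws and predict scrap for both production lines at a few speeds. This is a sketch rather than part of the worksheet's solution: the interaction model mirrors the separate line per group on the plot, the names soap_fit and new_speeds are hypothetical, and it assumes the production lines are labelled a and b in the data, as described in the question.

```r
library(tidyverse)

my_url <- "https://ritsokiguess.site/datafiles/soap.txt"
soap <- read_delim(my_url, " ", show_col_types = FALSE)

# a separate straight line for each production line, as on the plot
soap_fit <- lm(scrap ~ speed * line, data = soap)

# both production lines at the quartiles of the observed speeds
# (assumes the lines are labelled "a" and "b")
new_speeds <- expand_grid(
  speed = unname(quantile(soap$speed, c(0.25, 0.5, 0.75))),
  line = c("a", "b")
)

# predicted scrap, with one column per production line
new_speeds %>%
  mutate(predicted = predict(soap_fit, newdata = new_speeds)) %>%
  pivot_wider(names_from = line, values_from = predicted)
```

Comparing the two columns row by row shows the predicted difference in scrap between the production lines at each of those speeds.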