Worksheet 6

Published

October 12, 2025

Questions are below. My solutions will be available after the tutorials are all finished. The whole point of these worksheets is for you to use your lecture notes to figure out what to do. In tutorial, the TAs are available to guide you if you get stuck. Once you have figured out how to do this worksheet, you will be prepared to tackle the assignment that depends on it.

If you are not able to finish in an hour, I encourage you to continue later with what you were unable to finish in tutorial.

Packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(smmr)

Two-sample power

Suppose we have two populations, both of which we assume to be normally distributed. The first population has a mean of 20, and the second population has a mean of 25. Both populations have the same SD of 9.

  1. Suppose we take samples of 30 observations from each population. Use power.t.test to find the probability that a two-sample \(t\)-test will (correctly) reject the null hypothesis that the two populations have the same mean, in favour of a one-sided alternative. (Hint: delta should be positive.)
  1. Find the sample size needed to obtain a power of 0.75. Comment briefly on whether your sample size makes sense.
  1. Reproduce your power result from Question 1 by simulation. Some things to consider:
     • you will need to generate two columns of random samples, one from each population.
     • t.test can also run a two-sample \(t\)-test if you give it the two columns separately, rather than (as we have done before) one column with all the measurements and a separate column saying which group each one came from.
     • you will need to get the right alternative. With two columns input like this, the alternative is relative to the column you give first.
  1. Give an example of a situation where the simulation approach could be used but power.t.test could not.
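
As a sketch of the mechanics only, here is how the calculation and the simulation might be laid out. The numbers are made up and are deliberately not the ones in this question: means of 10 and 12, an SD of 4, samples of 20, and a target power of 0.80 are all assumptions for illustration. The simulation uses base R's replicate, which may not be the approach in your lecture notes. Note also that t.test defaults to the Welch test while power.t.test assumes the pooled one, so the two answers should be close but not identical.

```r
# Hypothetical values, chosen to differ from the worksheet's numbers
mean_1 <- 10
mean_2 <- 12
sigma <- 4
n_each <- 20

# Power by calculation: delta is the (positive) difference in means
calc <- power.t.test(n = n_each, delta = mean_2 - mean_1, sd = sigma,
                     type = "two.sample", alternative = "one.sided")
calc$power

# Sample size for a target power: leave n out and supply power instead
n_needed <- power.t.test(delta = mean_2 - mean_1, sd = sigma, power = 0.80,
                         type = "two.sample", alternative = "one.sided")
n_needed$n

# Power by simulation: draw two samples, test, keep the P-value, repeat
set.seed(457)
p_values <- replicate(10000, {
  x <- rnorm(n_each, mean_1, sigma)
  y <- rnorm(n_each, mean_2, sigma)
  # The alternative is relative to the first column given: x comes from
  # the population with the smaller mean, so the alternative is "less"
  t.test(x, y, alternative = "less")$p.value
})

# Estimated power is the proportion of simulated tests that reject at 0.05
sim_power <- mean(p_values <= 0.05)
sim_power
```

The n that comes back from power.t.test is usually not a whole number, so it would be rounded up to get a sample size you can actually use.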

The thickness of stamps

Collectors of postage stamps know that the same stamp may be made from several different batches of paper of different thicknesses. Our data set, in http://ritsokiguess.site/datafiles/stamp.csv, contains the thickness, in millimetres, of each of 485 stamps that were printed in 1872. It is suspected that the paper used in that year was thinner than in previous years.

  1. Read in and display (some of) the data.
  1. Make a suitable graph of these data. Justify your choice briefly.
  1. From your graph, why do you think it might be a good idea to do a sign test rather than a one-sample \(t\)-test on these data? Explain briefly.
  1. The median thickness in years prior to 1872 was 0.081 mm. Is there evidence that the paper on which stamps were printed in 1872 is thinner than in previous years? Explain briefly.
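
To see what a one-sided sign test is doing underneath, here is a hand-rolled sketch in base R. It is not the sign_test function from smmr, and both the twelve measurements and the null median of 4.2 are invented for illustration (they are not the stamp data). The idea is that if the null median is correct, each value is equally likely to fall above or below it, so the number of values below has a binomial distribution with success probability 0.5.

```r
# Made-up measurements (not the stamp data) and a hypothetical null median
x <- c(4.1, 3.8, 5.2, 3.9, 4.4, 3.5, 3.7, 4.0, 3.6, 4.8, 3.4, 3.9)
null_median <- 4.2

# Count values on each side; any values exactly equal to the null median
# are left out of both counts
below <- sum(x < null_median)
above <- sum(x > null_median)

# One-sided alternative "the median is less than the null value":
# evidence for it is an unusually large number of values below
result <- binom.test(below, below + above, p = 0.5, alternative = "greater")
result$p.value
```

For a two-sided alternative, alternative = "two.sided" in binom.test would be used instead.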

Canned tuna

Light tuna is sold in 170-gram cans. The tuna can be canned in either water or oil. Is there a difference in typical selling price between tuna canned in water and tuna canned in oil? To find out, 25 supermarkets were sampled. In 14 of them (randomly chosen), the price of a (randomly chosen) brand of tuna in water was recorded, and in the other 11 supermarkets, the price of a (randomly chosen) brand of tuna in oil was recorded. The data are in http://ritsokiguess.site/datafiles/tuna.csv, with the column canned_in saying whether the tuna was canned in oil or water, and the price being in cents.

  1. Read in and display (some of) the data.
  1. Make a graph of just the prices (that is to say, not including what each can of tuna was canned in).
  1. Make a normal quantile plot of the prices (again, ignoring what the tuna was canned in).
  1. What does your normal quantile plot tell you about the distribution of prices? Is that consistent with your first plot? (I would add “explain briefly” if this were for marks, but you should probably think about how your explanation would look in any case.)
  1. Some of the tuna was canned in oil and some in water, which may make a difference to the distribution of price. Make normal quantile plots of price for the tuna packed in oil and packed in water separately, with one ggplot command.
  1. Why might it be a bad idea to run a two-sample \(t\)-test on these data? Explain briefly.
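
One way to get a normal quantile plot for each group from a single ggplot command is to use facets. The sketch below uses an invented data frame (the names d, group and y, and the simulated values, are all assumptions, not the tuna data), so that only the shape of the command carries over.

```r
library(ggplot2)

# Made-up data: one quantitative column and one grouping column
set.seed(1)
d <- data.frame(
  group = rep(c("a", "b"), times = c(12, 10)),
  y = c(rnorm(12, mean = 50, sd = 5), rexp(10, rate = 1 / 40))
)

# aes(sample = ...) names the variable whose quantiles are plotted;
# facet_wrap draws one panel per group, each with its own scales
g <- ggplot(d, aes(sample = y)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~ group, scales = "free")
g
```

Using scales = "free" lets each panel fill its own plotting area, which makes it easier to judge how closely the points follow the line in each group.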