Questions are below. My solutions will be available after the tutorials are all finished. The whole point of these worksheets is for you to use your lecture notes to figure out what to do. In tutorial, the TAs are available to guide you if you get stuck. Once you have figured out how to do this worksheet, you will be prepared to tackle the assignment that depends on it.
If you are not able to finish in an hour, I encourage you to work through the remaining questions later.
Packages
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Fuel efficiency comparison
Some cars have on-board computers that calculate quantities related to the car’s performance. One of the things measured is the fuel efficiency, that is, how much gasoline the car uses. On an American car, this is measured in miles per (US) gallon. On one type of vehicle equipped with such a computer, the fuel efficiency was measured each time the gas tank was filled up, and the computer was then reset. Twenty observations were made, and are in http://ritsokiguess.site/datafiles/mpgcomparison.txt. The computer’s values are in the column Computer. The driver also calculated the fuel efficiency by hand, by noting the number of miles driven between fill-ups, and the number of gallons of gas required to fill the tank each time. The driver’s values are in Driver. The final column Diff is the difference between the computer’s value and the driver’s value for each fill-up. The data values are separated by tabs.
- Read in and display (some of) the data.
- What is it that makes this paired data? Explain briefly.
- Draw a suitable graph of these data, bearing in mind what you might want to learn from your graph.
- Is there any difference between the average results of the driver and the computer? (Average could be mean or median, whichever you think is best.) Do an appropriate test.
- The people who designed the car’s computer are interested in whether the values calculated by the computer and by the driver on the same fill-up are usually close together. Explain briefly why it is that looking at the average (mean or median) difference is not enough. Describe what you would look at in addition, and how that would help.
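As a starting point for the first part: values separated by tabs can be read with `read_tsv` from readr, which is loaded with the tidyverse. The sketch below is only an illustration. It uses three made-up rows typed inline (hypothetical numbers, not the real measurements), so that it runs without the real file; with the real data you would give the URL above in place of the inline text.

```r
library(readr)

# Hypothetical values typed inline for illustration; these are NOT the real data.
toy_text <- "Computer\tDriver\tDiff\n41.5\t36.5\t5.0\n50.7\t44.2\t6.5\n36.6\t37.2\t-0.6\n"

# I() tells readr to treat the string as literal data rather than a file name.
toy <- read_tsv(I(toy_text), show_col_types = FALSE)

# Printing a tibble displays only the first few rows, which is enough for a look.
toy
```

The same call with a file name or URL as its first argument reads a tab-separated file directly.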
Canned tuna
Light tuna is sold in 170-gram cans. The tuna can be canned in either water or oil. Is there a difference in typical selling price between tuna canned in water and tuna canned in oil? To find out, 25 supermarkets were sampled. In 14 of them (randomly chosen), the price of a (randomly chosen) brand of tuna in water was recorded, and in the other 11 supermarkets, the price of a (randomly chosen) brand of tuna in oil was recorded. The data are in http://ritsokiguess.site/datafiles/tuna.csv. The column canned_in says whether the tuna was canned in oil or water, and the prices are in cents.
- Read in and display (some of) the data.
- Make a suitable graph of these data. (There are actually two graphs that would be suitable; see if you can make both.) Comment on the shape of the distributions of prices.
- Why might it be a bad idea to run a two-sample \(t\)-test on these data? Explain briefly.
- We just decided that running a \(t\)-test was a bad idea here. Run Mood’s median test for these data to determine whether the median selling price differs between tuna canned in water and in oil. What do you conclude, in the context of the data? Hint: use something from the smmr package.
- Explain briefly why the counts of values above and below the overall median (in the previous part) are entirely consistent with the P-value that you found.
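To see how the counts and the P-value are connected, it can help to build a Mood's median test by hand on a small example. The sketch below uses base R and two made-up groups, `a` and `b` (hypothetical values, not the tuna data). It drops any values exactly equal to the overall median and uses a chi-squared test without continuity correction; both of those choices are assumptions of this sketch, and it is not meant as a description of how the smmr function is implemented.

```r
# Hypothetical values for two groups of 10; these are NOT the tuna data.
toy <- data.frame(
  group = rep(c("a", "b"), each = 10),
  value = c(60, 62, 65, 67, 70, 72, 74, 75, 91, 99,
            68, 80, 82, 85, 86, 88, 90, 93, 95, 97)
)

# Overall median of all the values, ignoring which group they came from.
overall_median <- median(toy$value)

# Drop values exactly equal to the overall median (an assumed convention),
# then count values above and below it within each group.
kept <- toy[toy$value != overall_median, ]
counts <- table(kept$group, ifelse(kept$value > overall_median, "above", "below"))
counts

# Chi-squared test of association between group and above/below,
# without continuity correction (again an assumption of this sketch).
result <- chisq.test(counts, correct = FALSE)
result$p.value
```

If the two groups had the same median, each group would be split roughly half-and-half above and below the overall median. The more lopsided the table of counts, the larger the chi-squared statistic and the smaller the P-value.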
Neuropathy and pain relief
The data in http://ritsokiguess.site/datafiles/exactRankTests_neuropathy.csv are the results of a study of pain relief in diabetic patients. The patients were randomly assigned to a standard treatment (“control”) or to a new treatment (“treat”), and for each patient a pain score was recorded, with a higher score indicating better pain relief. There were 58 patients altogether, 30 in the control group and 28 in the treatment group.
- Read the data and display at least some of it.
- Draw a suitable graph of the two variables.
- Does your plot suggest that there will be a significant treatment effect, or not? Explain briefly.
- The researchers decided to run a Mood’s median test to compare the pain scores for the treatment and control groups. Why do you think they decided to do that, on the evidence you have so far?
- Run Mood’s median test on these data. What do you conclude?