Worksheet 7

Published October 18, 2024

Questions are below. My solutions will be available after the tutorials are all finished. The whole point of these worksheets is for you to use your lecture notes to figure out what to do. In tutorial, the TAs are available to guide you if you get stuck. Once you have figured out how to do this worksheet, you will be prepared to tackle the assignment that depends on it.

If you are not able to finish in an hour, I encourage you to continue later with what you were unable to finish in tutorial.

Note: questions 11 through 25 are about analysis of variance. Some of these questions may make more sense after Thursday’s lecture. You might like to leave those until then. I will publish all my solutions on Wednesday evening as usual. Be guided by Thursday’s lecture as to how much of this you will need to know for the midterm. This makes for a rather heavyweight worksheet, but I want to make sure you get some practice at everything you might need for the midterm. (This same material is on Assignment 6, which will not be due until after the midterm.)

Fuel efficiency comparison

Some cars have on-board computers that calculate quantities related to the car’s performance. One of the things measured is the fuel efficiency, that is, how much gasoline the car uses. On an American car, this is measured in miles per (US) gallon. On one type of vehicle equipped with such a computer, the fuel efficiency was measured each time the gas tank was filled up, and the computer was then reset. Twenty observations were made, and are in http://ritsokiguess.site/datafiles/mpgcomparison.txt. The computer’s values are in the column Computer. The driver also calculated the fuel efficiency by hand, by noting the number of miles driven between fill-ups, and the number of gallons of gas required to fill the tank each time. The driver’s values are in Driver. The final column Diff is the difference between the computer’s value and the driver’s value for each fill-up. The data values are separated by tabs.

Read in and display (some of) the data.
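A minimal sketch of reading tab-separated data, written in Python (an assumption; any language with a delimited-file reader would do). It assumes the file has a header row naming the Computer, Driver and Diff columns described above. The two data rows are invented placeholders, not values from mpgcomparison.txt. The final check confirms that Diff is Computer minus Driver on each row.

```python
import csv
import io

# Invented placeholder rows in the layout described above (NOT the real data).
SAMPLE = "Computer\tDriver\tDiff\n30.0\t28.5\t1.5\n25.0\t25.5\t-0.5\n"

def read_tab_separated(text):
    """Parse tab-separated text with a header row into a list of dicts of floats."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{name: float(value) for name, value in row.items()} for row in reader]

rows = read_tab_separated(SAMPLE)
for row in rows[:6]:  # display (some of) the data
    print(row)

# Diff should be the computer's value minus the driver's value on each fill-up.
consistent = all(abs(r["Diff"] - (r["Computer"] - r["Driver"])) < 1e-9 for r in rows)
print("Diff = Computer - Driver on every row:", consistent)
```

For the real file, `pandas.read_csv(url, sep="\t")`, with `url` set to the address above, reads the data in one step.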

What is it that makes this paired data? Explain briefly.

Draw a suitable graph of these data, bearing in mind what you might want to learn from your graph.

Is there any difference between the average results of the driver and the computer? (Average could be mean or median, whichever you think is best.) Do an appropriate test.
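Because the data are paired, a test for a difference in averages amounts to a one-sample test on the differences. The Python sketch below shows two options under that view: an exact sign test for a median difference of zero, and the statistic of a paired t-test for a mean difference of zero. The differences are hypothetical, not taken from the data file. A library routine such as `scipy.stats.ttest_1samp` applied to the differences would supply the p-value for the t-test.

```python
from math import comb, sqrt
from statistics import mean, stdev

def sign_test(diffs, null_median=0.0):
    """Exact two-sided sign test of H0: the median difference equals null_median."""
    above = sum(d > null_median for d in diffs)
    below = sum(d < null_median for d in diffs)  # differences equal to null_median are dropped
    n = above + below
    k = min(above, below)
    # Under H0 the count above is Binomial(n, 1/2); double the smaller tail, cap at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def paired_t_statistic(diffs):
    """Paired t statistic: mean difference divided by its standard error."""
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical Computer - Driver differences, NOT the real data.
diffs = [1.2, 0.8, -0.3, 2.1, 1.5, 0.4, 1.0, 0.9]
print("sign test p-value:", sign_test(diffs))
print("paired t statistic:", round(paired_t_statistic(diffs), 3))
```

Which of the two is more appropriate depends on whether the differences look roughly normal, which your graph should help you judge.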

The people who designed the car’s computer are interested in whether the values calculated by the computer and by the driver on the same fill-up are usually close together. Explain briefly why it is that looking at the average (mean or median) difference is not enough. Describe what you would look at in addition, and how that would help.
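To see why the average difference alone is not enough, consider two hypothetical sets of differences with the same mean of zero. A measure of spread, such as the standard deviation of the differences or the fraction of fill-ups where the two values agree within some tolerance, separates them immediately. The 1 mpg tolerance in this Python sketch is an arbitrary illustrative choice, not something specified in the worksheet.

```python
from statistics import mean, stdev

def summarize_differences(diffs, tolerance=1.0):
    """Mean, standard deviation, and fraction of differences with |d| <= tolerance."""
    within = sum(abs(d) <= tolerance for d in diffs) / len(diffs)
    return mean(diffs), stdev(diffs), within

# Hypothetical differences: same mean (zero), very different spread.
close_together = [-0.5, 0.0, 0.5, -0.25, 0.25]
far_apart = [-5.0, 0.0, 5.0, -2.5, 2.5]

for label, diffs in [("close", close_together), ("far", far_apart)]:
    m, s, within = summarize_differences(diffs)
    print(f"{label}: mean={m}, sd={s:.3f}, within 1 mpg={within:.0%}")
```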

Neuropathy and pain relief

The data in http://ritsokiguess.site/datafiles/exactRankTests_neuropathy.csv are the results of a study of pain relief in diabetic patients. The patients were randomly assigned to a standard treatment (“control”) or to a new treatment (“treat”), and for each patient a pain score was recorded, with a higher score indicating better pain relief. There were 58 patients altogether, 30 in the control group and 28 in the treatment group.

Read the data and display at least some of it.

Draw a suitable graph of the two variables.

Does your plot suggest that there will be a significant treatment effect, or not? Explain briefly.

The researchers decided to run a Mood’s median test to compare the pain scores for the treatment and control groups. Why do you think they decided to do that, on the evidence you have so far?

Run Mood’s median test on these data. What do you conclude?
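As a reminder of what Mood's median test does, here is a from-scratch Python sketch: find the grand median of all the observations, count how many values in each group lie above and below it, and compare those counts with what would be expected if every group had the same median, using a chi-square test. Two conventions here are assumptions on which software varies: values equal to the grand median are dropped, and no continuity correction is applied. (`scipy.stats.median_test`, for example, by default counts ties with the grand median as "below" and applies Yates' correction when there are two groups.) The toy groups are hypothetical, not the neuropathy data.

```python
from math import erfc, exp, pi, sqrt
from statistics import median

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable X with a positive integer number of df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of extra terms
    total = erfc(sqrt(x / 2))
    term = sqrt(2 * x / pi) * exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

def mood_median_test(groups):
    """Mood's median test; groups maps a group name to a list of numbers.

    Returns (counts above/below the grand median per group, chi-square, df, p-value).
    """
    grand = median([x for values in groups.values() for x in values])
    # Values exactly equal to the grand median are counted neither above nor below.
    table = {name: (sum(x > grand for x in values), sum(x < grand for x in values))
             for name, values in groups.items()}
    total_above = sum(above for above, _ in table.values())
    total_below = sum(below for _, below in table.values())
    n = total_above + total_below
    stat = 0.0
    for above, below in table.values():
        row_total = above + below
        for observed, column_total in ((above, total_above), (below, total_below)):
            expected = row_total * column_total / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = len(groups) - 1
    return table, stat, df, chi2_sf(stat, df)

table, stat, df, p = mood_median_test({"control": [1, 2, 3, 4], "treat": [5, 6, 7, 8]})
print(table, stat, df, p)
```

The same function accepts any number of groups, so it also covers the multi-group comparisons later in this worksheet.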

Cuckoo eggs

The cuckoo is a European bird that lays its eggs in the nests of other birds (rather than building nests itself). The other bird, known as a “host”, raises and cares for the newly hatched cuckoo chick as if it were its own. Each cuckoo returns to the same territory year after year and lays its eggs in a nest of the same host species. Thus, cuckoos are actually several sub-species, each of which lays its eggs in the nests of a different host species. In a study, 120 cuckoo eggs were found in the nests of six other bird species: hedge sparrow, meadow pipit, pied wagtail, robin, tree pipit, and wren. These birds differ in size, so researchers were interested in whether the cuckoo eggs laid in the nests of different host species also differed in size. (For example, wrens are small birds, so you might be interested in whether cuckoo eggs laid in wren nests are smaller than cuckoo eggs laid in the nests of other birds. If so, the cuckoo eggs will look less different from the wren eggs in the same nest.)

The data are in http://ritsokiguess.site/datafiles/cuckoo.txt.

Read in and display (some of) the data. Note that some of the host bird names are misspelled. (You do not need to correct the misspellings.)

Bearing in mind that we will be interested in running some kind of ANOVA shortly, explain briefly why a normal quantile plot, for each host species separately, will be useful.

Draw a suitable normal quantile plot. Based on what you see, what would you recommend as a suitable test to compare the egg lengths in the nests of the different host species? Explain briefly.
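A normal quantile plot pairs each group's sorted values with the quantiles expected from a standard normal distribution. Points lying close to a straight line suggest the values are roughly normal. The Python sketch below computes those pairs for each host species separately, using plotting positions (i - 0.5)/n, which is the rule R's `qqnorm` uses when there are more than 10 observations. The egg lengths are hypothetical; the pairs could be drawn with any plotting library, one panel per species.

```python
from statistics import NormalDist

def normal_quantile_pairs(values):
    """(theoretical standard normal quantile, observed value) pairs, smallest first."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [(standard_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

# Hypothetical egg lengths for two host species, NOT the real cuckoo data.
lengths = {"wren": [21.0, 22.0, 23.0], "robin": [22.5, 21.5, 23.5, 22.0]}
for host, values in lengths.items():
    for theoretical, observed in normal_quantile_pairs(values):
        print(f"{host:6s} {theoretical:+.3f} {observed:.1f}")
```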

Run an (ordinary) analysis of variance, including any follow-up if warranted. What do you conclude, in the context of the data? (Run this analysis even if you don’t think it’s the best thing to do.)
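The F statistic behind an ordinary one-way ANOVA compares the variability between group means with the variability within groups. This Python sketch computes it from scratch on toy numbers (not the cuckoo data). In practice a library handles both the test and the follow-up: for example, `scipy.stats.f_oneway` for the F test and `scipy.stats.tukey_hsd` for Tukey's method.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, between-groups df, within-groups df) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Between-groups sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: how far each value is from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

A large F, relative to the F distribution with these degrees of freedom, indicates that at least one group mean differs from the others; the follow-up then identifies which ones.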

Run a Mood’s median test, and, if appropriate, follow-up tests. What do you now conclude, in the context of the data?
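If Mood's test is significant, one common follow-up is to run Mood's test on each pair of host species and adjust the resulting p-values for multiple comparisons. With six hosts there are 15 pairs. The Python sketch below lists the pairs and applies a Bonferroni adjustment (multiply each p-value by the number of comparisons, capping at 1). The host names are the correctly spelled ones from the question, so the labels in the data file may differ, and the raw p-values are made up for illustration.

```python
from itertools import combinations

HOSTS = ["hedge sparrow", "meadow pipit", "pied wagtail", "robin", "tree pipit", "wren"]

def bonferroni(raw_p):
    """Bonferroni-adjust a dict of p-values: multiply by the number of tests, cap at 1."""
    m = len(raw_p)
    return {key: min(1.0, p * m) for key, p in raw_p.items()}

pairs = list(combinations(HOSTS, 2))  # every pair of host species, once
print(len(pairs), "pairwise comparisons")

# Made-up raw p-values, purely to show the adjustment.
raw = {pair: 0.5 for pair in pairs}
raw[("hedge sparrow", "wren")] = 0.001
adjusted = bonferroni(raw)
print(adjusted[("hedge sparrow", "wren")], adjusted[("robin", "wren")])
```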

Compare all your significant results from the previous two parts. Are the results substantially different? Explain briefly.

Cars in 1993

The dataset in http://ritsokiguess.site/datafiles/Cars93.csv contains a lot of information about 93 different vehicles that were available in 1993. Each vehicle is classified as one of six Types, and for each vehicle, the gas mileage in miles per US gallon (of gas consumed) in city driving is recorded in MPG.city. Our aim in this question is to compare the city gas mileage of vehicles of different types, assuming that the ones in our dataset are (something like) a random sample of all vehicles available in North America in 1993.

Read in and display (some of) the data.

Make a graph that will enable you to compare the distributions of city gas mileage for vehicles of different types.

What can you say about the spread of the distribution of gas mileage for vehicle types where the median is large (compared to types where it is small)?

It is suggested that we should use a “reciprocal transformation” of gas mileage: that is, instead of using gas mileage itself in our analysis, we should use one divided by gas mileage instead. Explain briefly how this is also a sensible measure of gas consumption. (Hint: think about the units.)
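Miles per gallon is distance per unit of fuel, so its reciprocal is fuel per unit of distance (gallons per mile), the same form as metric fuel-consumption figures such as litres per 100 km. A small Python sketch of the conversion, scaled to gallons per 100 miles and also expressed in litres per 100 km using the exact definitions of the US gallon and the international mile:

```python
US_GALLON_LITRES = 3.785411784  # exact definition of the US gallon, in litres
MILE_KM = 1.609344              # exact definition of the international mile, in km

def gallons_per_100_miles(mpg):
    """Reciprocal of miles per gallon, scaled to gallons used per 100 miles."""
    return 100 / mpg

def litres_per_100_km(mpg):
    """The same consumption measure in metric units."""
    return 100 * US_GALLON_LITRES / (MILE_KM * mpg)

for mpg in (20, 25, 40):
    print(mpg, gallons_per_100_miles(mpg), round(litres_per_100_km(mpg), 2))
```

Note that the ordering reverses: a vehicle with higher mpg has a lower value on the reciprocal scale, so on the new scale smaller means more economical.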

Create and save a new column in your dataframe that contains the reciprocal of the gas mileage. Re-draw your plot from earlier to use your new column instead of the original MPG.city.

Explain why it is now at least somewhat defensible to run an ordinary ANOVA.

Run a suitable analysis of variance, along with any appropriate follow-up. (You don’t need to interpret the follow-up yet.)

Is there any type of vehicle that has significantly better or significantly worse fuel consumption than all of the other vehicle types?
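After Tukey's method (or any pairwise follow-up), the question is whether one type differs significantly from every other type, and in the same direction each time. The Python sketch below checks this mechanically, given group means and adjusted pairwise p-values. The three group labels and all the numbers are hypothetical. On the reciprocal scale, a higher value means more fuel used per mile, that is, worse fuel consumption.

```python
def extreme_groups(means, adjusted_p, alpha=0.05):
    """Groups significantly above all others, or significantly below all others.

    means maps group -> mean; adjusted_p maps frozenset({g1, g2}) -> adjusted p-value.
    """
    found = {}
    for g in means:
        others = [h for h in means if h != g]
        significant = all(adjusted_p[frozenset((g, h))] < alpha for h in others)
        if significant and all(means[g] > means[h] for h in others):
            found[g] = "higher than all others"
        elif significant and all(means[g] < means[h] for h in others):
            found[g] = "lower than all others"
    return found

# Hypothetical means and adjusted p-values for three groups.
means = {"A": 1.0, "B": 2.0, "C": 5.0}
adjusted_p = {frozenset(("A", "B")): 0.3,
              frozenset(("A", "C")): 0.01,
              frozenset(("B", "C")): 0.02}
result = extreme_groups(means, adjusted_p)
print(result)
```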

Run Mood’s median test on the original data (no assumption of normality or equal spreads needed), along with any appropriate follow-up. Compare the results with your ANOVA in the previous two questions.