Questions are below. My solutions are below all the question parts for a question; scroll down if you get stuck. There is extra discussion below that for some of the questions; you might find that interesting to read, maybe after tutorial.
For these worksheets, you will learn the most by spending a few minutes thinking about how you would answer each question before you look at my solution. There are no grades attached to these worksheets, so feel free to guess: it makes no difference at all how wrong your initial guess is!
1 Cuckoo eggs
The cuckoo is a European bird that lays its eggs in the nests of other birds (rather than building nests itself). The other bird, known as a “host”, raises and cares for the newly hatched cuckoo chick as if it were its own. Each cuckoo returns to the same territory year after year and lays its eggs in a nest of the same host species. Thus, cuckoos actually comprise several sub-species, each of which lays its eggs in the nests of a different host bird. In a study, 120 cuckoo eggs were found in the nests of six other bird species: hedge sparrow, meadow pipit, pied wagtail, robin, tree pipit, and wren. These are birds of different sizes, so researchers were interested in whether the cuckoo eggs laid in the nests of different host birds differed in size as well. (For example, wrens are small birds, so you might be interested in whether cuckoo eggs laid in wren nests are smaller than cuckoo eggs laid in the nests of other birds. If this is the case, the cuckoo eggs will look less different from the wren eggs in the same nest.)
Read in and display (some of) the data. Note that some of the host bird names are misspelled. (You do not need to correct the misspellings.)
Bearing in mind that we will be interested in running some kind of ANOVA shortly, explain briefly why a normal quantile plot, for each host species separately, will be useful.
Draw a suitable normal quantile plot. Based on what you see, what would you recommend as a suitable test to compare the egg lengths in the nests of the different host species? Explain briefly.
Run an (ordinary) analysis of variance, including any follow-up if warranted. What do you conclude, in the context of the data? (Run this analysis even if you don’t think it’s the best thing to do.)
Run a Mood’s median test, and, if appropriate, follow-up tests. What do you now conclude, in the context of the data?
Compare all your significant results from the previous two parts. Are the results substantially different? Explain briefly.
Cuckoo eggs: my solutions
The cuckoo is a European bird that lays its eggs in the nests of other birds (rather than building nests itself). The other bird, known as a “host”, raises and cares for the newly hatched cuckoo chick as if it were its own. Each cuckoo returns to the same territory year after year and lays its eggs in a nest of the same host species. Thus, cuckoos actually comprise several sub-species, each of which lays its eggs in the nests of a different host bird. In a study, 120 cuckoo eggs were found in the nests of six other bird species: hedge sparrow, meadow pipit, pied wagtail, robin, tree pipit, and wren. These are birds of different sizes, so researchers were interested in whether the cuckoo eggs laid in the nests of different host birds differed in size as well. (For example, wrens are small birds, so you might be interested in whether cuckoo eggs laid in wren nests are smaller than cuckoo eggs laid in the nests of other birds. If this is the case, the cuckoo eggs will look less different from the wren eggs in the same nest.)
Read in and display (some of) the data. Note that some of the host bird names are misspelled. (You do not need to correct the misspellings.)
Solution
Rows: 120 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: " "
chr (1): bird_species
dbl (1): egg_length
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
eggs
There are indeed 120 eggs, with lengths (evidently measured in millimetres) and with the two “pipets” misspelled. The column called bird_species is the host bird; the eggs themselves are all cuckoo eggs.
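The reading-in step might look something like the sketch below. The location of the real data file is not shown here, so this sketch writes a few made-up rows (not real egg lengths) to a temporary file in the same space-delimited layout and reads that instead; `my_file` is a name invented for the illustration. The column names and the single-space delimiter are taken from the output above.

```r
library(tidyverse)

# Hypothetical stand-in for the real data file: a few made-up rows in the same
# space-delimited layout, written to a temporary file so that the sketch runs as is.
my_file <- tempfile(fileext = ".txt")
write_lines(
  c(
    "bird_species egg_length",
    "wren 19.85",
    "wren 20.05",
    "robin 21.05",
    "robin 22.45"
  ),
  my_file
)

# The values are separated by single spaces, hence read_delim with delim = " "
eggs <- read_delim(my_file, delim = " ")

# Typing the name of the data frame displays (the first few rows of) it
eggs
```

With the real file, you would put its URL or file name in place of `my_file`; the rest should stay the same.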
\(\blacksquare\)
Bearing in mind that we will be interested in running some kind of ANOVA shortly, explain briefly why a normal quantile plot, for each host species separately, will be useful.
Solution
The major assumption of ANOVA is that the observations within each group are (approximately) normal in shape. Since we are specifically interested in normality, rather than shape generally, a normal quantile plot will be better than a boxplot.
\(\blacksquare\)
Draw a suitable normal quantile plot. Based on what you see, what would you recommend as a suitable test to compare the egg lengths in the nests of the different host species? Explain briefly.
Solution
Do the normal quantile plot facetted by groups (bird_species), since we are interested in normality within each group, not of all the observations taken together:
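A minimal sketch of a facetted normal quantile plot in `ggplot2` follows. So that it runs on its own, it uses simulated stand-in data (two made-up groups, not the real egg lengths) with the same column names as `eggs`; with the real data frame already read in, only the `ggplot` lines would be needed.

```r
library(tidyverse)

# Simulated stand-in data (NOT the real egg lengths): two groups of 15 values each
set.seed(1)
eggs <- tibble(
  bird_species = rep(c("robin", "wren"), each = 15),
  egg_length = c(rnorm(15, mean = 22, sd = 1), rnorm(15, mean = 21, sd = 1))
)

# The variable to assess goes in aes(sample = ...); stat_qq draws the points,
# stat_qq_line draws the reference line, and facet_wrap makes one panel per group
g <- ggplot(eggs, aes(sample = egg_length)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~bird_species)
g
```

Each panel gets its own line, fitted to that group's observations, which is what you want when judging normality group by group.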
We are looking for all the distributions to be roughly normal. Commentary (which you don’t need to write but you certainly need to think):
The Hedge Sparrow distribution has two or three mild outliers at the bottom. (This, to me, is better than calling it “long tails”, because there is only one high observation, and it is not really too high.)
The Meadow Pipit distribution has long tails. (I say this because the pattern seems to be a feature of the whole distribution, rather than of a few observations. Having said that, the four lowest values do seem to be noticeably lower than the rest, so you can sell the idea of those four lowest values being outliers.)
The Pied Wagtail distribution is, if anything, short-tailed, so this is not a problem. You could also call this acceptably normal.
I cannot see any problems at all with the Robin or the Wren.
The Tree Pipit distribution is, depending on how you see it, either acceptably normal or very slightly skewed to the left (is that a curved shape?).
The second part of the thought process is sample size, because the larger the sample size within a group, the less picky you have to be about normality. You can guess this by looking at how many dots there appear to be on the normal quantile plots, or you can get the exact sample sizes (better):
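One way to get the exact sample sizes is to count the rows for each host species. The sketch below uses a small made-up data frame (not the real egg lengths) with the same column names, so that it runs by itself; `sample_sizes` is a name invented for the illustration.

```r
library(tidyverse)

# Made-up stand-in data (NOT the real egg lengths), same column names as eggs
eggs <- tibble(
  bird_species = c("robin", "robin", "robin", "wren", "wren"),
  egg_length = c(21.1, 22.3, 22.9, 20.1, 21.0)
)

# count() gives one row per host species, with the number of eggs in a column called n
sample_sizes <- eggs %>% count(bird_species)
sample_sizes
```

On the real data, the `n` column tells you how much non-normality each group can tolerate: the larger the count, the less the shape matters.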