Worksheet 2
Questions are below. My solutions will be available after the tutorials are all finished. The whole point of these worksheets is for you to use your lecture notes to figure out what to do. In tutorial, the TAs are available to guide you if you get stuck. Once you have figured out how to do this worksheet, you will be prepared to tackle the assignment that depends on it.
There is probably too much here for you to finish in an hour, but I encourage you to continue later with what you were unable to finish in tutorial.
The Boat Race
Each year, the Oxford and Cambridge University rowing teams race against each other on the River Thames in London (England). For the 1992 race, the weights (in pounds) of the participants on each team were recorded, and can be found at https://ritsokiguess.site/datafiles/boat_race.txt.
- Take a look at the data file, and describe how the data values are separated one from the next.
- Read the file into a dataframe and display at least some of it.
- Make a suitable graph of your data.
- Would you say, based on your plot, that the average or typical weights of the rowers on the two teams are similar or different? Explain briefly.
- Each rowing team consists of eight rowers plus a cox, whose job is to keep the rowers in tempo. The cox does not row themselves. Which of the nine individuals in each team do you think is the cox? Explain briefly.
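The reading and plotting steps might look something like the following. This is a minimal sketch in R with the tidyverse, not the official solution. It assumes that the values are separated by single spaces and that the columns are named team and weight. Neither assumption is confirmed here, so check them against what you saw in the file and adjust the delim argument and the column names to match.

```r
library(tidyverse)

# Data file given in the worksheet
my_url <- "https://ritsokiguess.site/datafiles/boat_race.txt"

# ASSUMPTION: values separated by single spaces; change delim to match the file
boat <- read_delim(my_url, delim = " ")
boat

# ASSUMPTION: columns called team (categorical) and weight (quantitative).
# One categorical and one quantitative variable suggests side-by-side boxplots.
g <- ggplot(boat, aes(x = team, y = weight)) + geom_boxplot()
g
```

With only nine people per team, each observation matters, so it is worth looking at any points the boxplot flags as unusual.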
Intensive Care Unit patients
The Intensive Care Unit (ICU) at a hospital is where incoming patients who need the most urgent treatment are admitted. When a patient is admitted, a large number of measurements are taken, to help the ICU doctor decide on an appropriate treatment. The variables of interest to us here are these two (there are actually many others, as you will see):
- sta: vital status (0 = lived, 1 = died)
- typ: type of admission (0 = elective, 1 = emergency)
The data for 200 patients are in http://www.medicine.mcgill.ca/epidemiology/Joseph/courses/EPIB-621/icudat.txt.
- In your web browser, take a look at the data, and describe how the data are laid out.
- Read in and display (some of) the data.
- Make a suitable graph of the two variables of interest. Make sure you consider what type of variables these are (which might not be the same as how they are recorded). (Hint: you may need to use factor(typ) or factor(sta), or both, inside your aes().)
- What do you learn from your graph, in the context of the data?
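As a minimal sketch in R with the tidyverse (one possible approach, not the official solution), the data could be read and graphed as follows. It assumes the file has whitespace-separated columns with the variable names, including sta and typ, in its first row. The key idea from the hint is that sta and typ are stored as 0/1 numbers but are really categorical, so they are wrapped in factor() inside aes().

```r
library(tidyverse)

my_url <- "http://www.medicine.mcgill.ca/epidemiology/Joseph/courses/EPIB-621/icudat.txt"

# ASSUMPTION: whitespace-separated columns, variable names in the first row
icu <- read_table(my_url)
icu

# factor() makes ggplot treat the 0/1 codes as categories, not numbers.
# position = "fill" scales every bar to height 1, so the fill colours show
# the proportion who lived or died within each type of admission.
g <- ggplot(icu, aes(x = factor(typ), fill = factor(sta))) +
  geom_bar(position = "fill") +
  labs(x = "type of admission (0 = elective, 1 = emergency)",
       fill = "vital status (0 = lived, 1 = died)",
       y = "proportion")
g
```

Using position = "dodge" instead would show counts side by side. Proportions are used here because the two admission types may contain different numbers of patients, which makes raw counts harder to compare.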
Hummingbirds and flowers
The tropical flower Heliconia is fertilized by hummingbirds, a different species for each variety of Heliconia. Over time, the lengths of the flowers and the form of the hummingbirds’ beaks have evolved to match each other. The length of the Heliconia flower is therefore an important measurement. Does it differ among varieties?
The data set at http://ritsokiguess.site/datafiles/heliconia_long.csv contains the lengths (in millimetres) of samples of flowers from each of three varieties of Heliconia: bihai, caribaea red, and caribaea yellow.
- Read in and display (some of) the dataset.
- Why would a boxplot be a suitable graph for these data? Explain briefly.
- Draw a boxplot¹ of these data.
- What do you learn from your boxplot? Explain briefly.
- An alternative graph in this situation is a set of three histograms next to each other. Draw this graph. Consider your aims in drawing the graph; you will need to think about what “next to each other” most usefully means for you. Hint: problem 7.9(c) in PASIAS.
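One possible sketch of these steps in R with the tidyverse follows; it is not the official solution. It assumes the columns of heliconia_long.csv are named variety (categorical) and length (in millimetres); check the actual names when you display the data. The number of bins is an arbitrary starting value to experiment with.

```r
library(tidyverse)

my_url <- "http://ritsokiguess.site/datafiles/heliconia_long.csv"
heliconia <- read_csv(my_url)
heliconia

# ASSUMPTION: columns called variety and length
g_box <- ggplot(heliconia, aes(x = variety, y = length)) + geom_boxplot()
g_box

# One histogram per variety via facets. With ncol = 1 the panels are
# stacked vertically, so their (shared) length axes line up, which makes
# it easier to compare where each variety's lengths are centred.
# bins = 8 is an arbitrary choice; try other values.
g_hist <- ggplot(heliconia, aes(x = length)) +
  geom_histogram(bins = 8) +
  facet_wrap(~ variety, ncol = 1)
g_hist
```

Whether stacking vertically or placing the panels side by side is more useful depends on what you want to compare; the vertical layout favours comparing centres and spreads along the length axis.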
Footnotes
1. It’s up to you whether you call this one boxplot or one graph containing three side-by-side boxplots. Think about what makes more sense to you, and what will make more sense to your reader.