Worksheet 4
Questions are below. My solutions will be available after the tutorials are all finished. The whole point of these worksheets is for you to use your lecture notes to figure out what to do. In tutorial, the TAs are available to guide you if you get stuck. Once you have figured out how to do this worksheet, you will be prepared to tackle the assignment that depends on it.
If you are not able to finish in an hour, I encourage you to continue later with what you were unable to finish in tutorial.
Prison stress
Being in prison is stressful, for anybody. 26 prisoners took part in a study where their stress was measured at the start and end of the study. Some of the prisoners, chosen at random, completed a physical training program (for these prisoners, the Group column is Sport) and some did not (Group is Control). The researchers’ main aim was to see whether the physical training program reduced stress on average in the population of prisoners. The data are in http://www.ritsokiguess.site/datafiles/PrisonStress.csv, in four columns, respectively an identifier for the prisoner, whether or not they did physical training, their stress score at the start of the study, and their stress score at the end.
- Read in and display (some of) the data.
- Make a suitable graph of the stress scores at the end of the study and whether or not each prisoner was in the Sport group.
- Run the most appropriate \(t\)-test to compare the stress scores at the end of the study for the two groups of prisoners. Bear in mind what the researchers are trying to show. What do you conclude from your test, in the context of the data?
- Make a suitable plot of the stress measurements before the study for each group of prisoners. How, if at all, does that impact the conclusion you drew in the previous part? Explain briefly.
- Going back to your plot of the second part of this question (the boxplot of after scores against group), why might you be concerned about the Control group of prisoners for your \(t\)-test? Explain briefly (two reasons).
- Obtain a bootstrap sampling distribution of the sample mean for the PSSafter values in the Control group. From this distribution, do you think your \(t\)-test is reasonable? Explain briefly. (You may assume that we are happy with the distribution of PSSafter values in the Sport group.)
Home prices
A realtor kept track of the asking prices of 37 homes for sale in West Lafayette, Indiana, in a particular year. The asking prices are in http://ritsokiguess.site/datafiles/homes.csv. There are two columns, the asking price (in $) and the number of bedrooms that home has (either 3 or 4, in this dataset). The realtor was interested in whether the mean asking price for 4-bedroom homes was bigger than for 3-bedroom homes.
- Read in and display (some of) the data.
- Draw a suitable graph of these data. Hint: if you do the obvious thing, you’ll get a graph that makes no sense. What happened, and how can you fix it up? The warning you might get on your graph will give you a hint.
- Comment briefly on your plot. Does it suggest an answer to the realtor’s question? Do you have any doubts about the appropriateness of a \(t\)-test in this situation? Explain briefly. (Hint: your plot should have two groups. If it only has one, make sure you have asked a TA for help to get the right graph.)
- Sometimes prices work better on a log scale. This is because percent changes in prices are often of more interest than absolute dollar-value changes. Re-draw your plot using logs of asking prices. (In R, log() takes natural (base \(e\)) logs, which are fine here.) Do you like the shapes of the distributions better? Hint: you have a couple of options. One is to use the log right in your plotting (or, later, testing) functions. Another is to define a new column containing the log-prices and work with that.
- Run a suitable \(t\)-test to compare the log-prices. What do you conclude? Hint: as for the graph in the previous part, you can use log directly in t.test, or use the new columns with the log-prices in them (if you did that).
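A short aside on why logs suit percent changes, as mentioned in the log-scale part above. The symbols \(p_1\) and \(p_2\) are notation assumed here for two asking prices; they do not come from the data file. A difference of logs depends only on the ratio of the two prices:

\[
\log p_2 - \log p_1 = \log\left(\frac{p_2}{p_1}\right).
\]

For example, if \(p_2\) is 10% bigger than \(p_1\), then \(p_2/p_1 = 1.10\) and the difference in natural logs is \(\log(1.10) \approx 0.095\), whatever the value of \(p_1\). So equal percent changes become equal distances on the log scale, and comparing mean log-prices amounts to comparing prices in ratio terms rather than in dollars.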