STAC32 Assignment 1
You are expected to complete this assignment on your own: that is, you may discuss general ideas with others, but the writeup of the work must be entirely your own. If your assignment is unreasonably similar to that of another student, you can expect to be asked to explain yourself.
If you run into problems on this assignment, it is up to you to figure out what to do. The only exception is if it is impossible for you to complete this assignment, for example a data file cannot be read. (There is a difference between you not knowing how to do something, which you have to figure out, and something being impossible, which you are allowed to contact me about.)
You must hand in a rendered document that shows your code, the output that the code produces, and your answers to the questions. This should be a file with .html on the end of its name. There is no credit for handing in your unrendered document (ending in .qmd), because the grader cannot then see whether the code in it runs properly. After you have handed in your file, you should be able to see (in Attempts) what file you handed in, and you should make a habit of checking that you did indeed hand in what you intended to, and that it displays as you expect.
Hint: render your document frequently, and solve any problems as they come up, rather than trying to do so at the end (when you may be close to the due date). If your document will not successfully render, it is because of an error in your code that you will have to find and fix. The error message will tell you where the problem is, but it is up to you to sort out what the problem is.
Walking the dogs
The author of a statistics textbook owns dogs, and likes to walk (sometimes with the dogs, sometimes without). The author carries a simple phone app that measures the following quantities, except for Walk, which was added by the author:
- StepCount: Number of steps taken in the day
- Kcal: Calories burned (according to pedometer)
- Miles: Total distance walked (in miles)
- Weather: Classified as cold, rain, or shine (sunny)
- Day: Day of week (as you would guess, except R=Thursday, U=Sunday)
- Walk: Were the dogs walked? (1=yes or 0=no)
- Steps: Steps in units of 1,000 (so StepCount/1000)
The data are in http://ritsokiguess.site/datafiles/Stat2Data_WalkTheDogs.csv.
- (3 points) Read (into R) the datafile and display at least some of the dataframe. (Here, and elsewhere in the course, “display at least some of the dataframe” means to display at least 10 rows and as many columns as will fit in your display.)
- (3 points) Make a graph that shows how many times each type of weather was observed. Which type of weather was most common? Explain briefly.
- (2 points) Make a graph that shows the distribution of distance walked each day.
- (3 points) Does the person walk further on average on a day when they walked the dogs, as compared to when they did not? Make a suitable graph that will help you answer this question. What do you conclude? (Hint: for the column that indicates whether they walked the dogs, should this be quantitative or categorical? What will R treat this as? Your first attempt at a graph may not come out right. Read the message carefully to figure out what to do.)
- (2 points) Make a graph that shows how many times the author walked on each day of the week (without considering any of the other variables). What is wrong with your graph? Explain briefly. (If your graph does not have anything wrong with it, explain briefly how you fixed the problem you encountered.)
- (3 points) Is there a relationship between the number of steps taken in a day (explanatory) and the distance travelled in that day (response)? Make a graph to investigate. What do you learn from your graph? Based on what you can find out about pedometer apps, does your graph make sense? Explain briefly. (Cite a source for your answer.)
Reading files
- (2 points) Take a look at the file at http://ritsokiguess.site/datafiles/adhd-2.txt. There are four observations on a variable y for each of 24 individuals, measured at times 0, 15, 30, and 60 minutes after a treatment. Considering that you will want to read these data into a dataframe, what is the key issue that will help you decide how to do it?
- (2 points) Hence, read in and display (some of) the data.
The data in http://ritsokiguess.site/datafiles/heart-rates.xlsx are a spreadsheet of heart rate values for a sample of male and female patients (as identified by the patients themselves). We want to read these data into a dataframe and make a graph.
- (2 points) When reading a spreadsheet into R, the file with the spreadsheet in it has to be “local”, that is, on the same machine on which you are running R. R has a function download.file that takes two inputs: a URL and a name for the local version of the spreadsheet file. Run this to save a local copy of the spreadsheet. (Hint: if you are running R on your own machine and that machine is running Windows, you may need to add a third input, mode = "wb", to download.file.)
- (2 points) Read the spreadsheet file that you downloaded into a dataframe, and display (some of) that dataframe. There is only one sheet in the workbook, and it is called Sheet1. The columns are called gender and heartrate.
- (2 points) Make a suitable graph of the two columns in your dataframe.
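As a rough illustration of the download step described above, here is a minimal sketch of how download.file might be called with a URL, a local file name, and the optional mode = "wb" input. The helper name fetch_local_copy and the local file name used below are assumptions made for this illustration; they are not part of the assignment, and this is only one possible way to organize the call.

```r
# fetch_local_copy is a hypothetical helper name, introduced only for this sketch.
# It wraps download.file, which takes a URL and a name for the local copy.
fetch_local_copy <- function(url, destfile) {
  # mode = "wb" asks for a binary write; this matters on Windows, where a
  # spreadsheet downloaded in text mode can end up corrupted. It does no harm
  # on other operating systems.
  download.file(url, destfile, mode = "wb")
  # Return the local file name invisibly so it can be passed to a reading function.
  invisible(destfile)
}
```

Calling fetch_local_copy("http://ritsokiguess.site/datafiles/heart-rates.xlsx", "heart-rates.xlsx") should then leave a file called heart-rates.xlsx (an assumed name; any name would do) in the current working folder, ready to be read in as a spreadsheet.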