Worksheet 1

The questions here have solutions attached. If you cannot work out what to do on your own, follow the solutions.

It is well worth your while to work through these problems, and the ones in future tutorial worksheets, because they will get you used to how R operates and make you more comfortable with coding. If you do not work through these problems now, any issues that you could have dealt with this week (with help available) will come back to bite you later, when you have an assignment due. This is stress you would do well to be without.

If you don’t get to the end in tutorial, it’s a good idea to finish the remaining problems on your own time this week, maybe after Thursday’s lecture in the case of the last question.

Using R Studio online

Point your web browser at http://r.datatools.utoronto.ca. Click on the blue “Log In” button below “R Studio”. You might see a “CILogon” screen. If you do, make sure that it says University of Toronto near the bottom and click the green Log On. Log in with your UTorId and password, then wait for R Studio to start up.

Create a new Project for your work in this course. A good name for the project is the course code. See below for how to do this.

One last piece of testing: find the Console window (which is probably on the left). Click next to the blue >, and type library(tidyverse). Press Enter.

Getting started

This question is to get you started using R.

Start R Studio on r.datatools (or on your computer), in your course project that you created in the previous question.

We’re going to do some stuff in R here, just to get used to it. First, make a Quarto document by selecting File, New File and Quarto Document.

You can now delete the template code below the YAML block (that is, everything from the title “Quarto” to the end). Somewhere in the space opened up below the YAML block (it might say “Heading 2”, greyed out), type a /. As in Notion, this brings up a list of things you can insert there. Pressing Enter will insert a “code chunk”, sometimes known as a “code cell”. We are going to use this in a moment.

On the line below the {r}, type these two lines of code into the chunk in the Quarto document:
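
The two lines themselves are not reproduced above. A likely version, assuming the intent is to load the tidyverse and then display the built-in mtcars data frame that the later steps refer to, is:

```r
# Assumed contents of the first chunk (the worksheet's own listing is missing here)
library(tidyverse)  # loads ggplot2, dplyr, readr and related packages
mtcars              # a data frame that comes with R; typing its name displays it
```

If this is right, the output below the chunk should be a table of cars with columns such as mpg and hp.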

Run this command. To do that, look at the top right of your code chunk block (shaded in a slightly different colour). You should see a down arrow and a green “play button”. Click the play button. This will run the code, and show the output below the code chunk.

Something a little more interesting: summary obtains a summary of whatever you feed it (the five-number summary plus the mean for numerical variables). Obtain this for our data frame. To do this, create a new code chunk below the previous one, type summary(mtcars) into the code chunk, and run it.

Let’s make a histogram of the gas mileage data. Type the code below into another new code chunk, and run it:
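
The histogram code is likewise not reproduced above. A minimal sketch, assuming that the gas mileage is the mpg column of mtcars and using 8 bins (an arbitrary choice that the worksheet does not specify):

```r
library(tidyverse)

# Histogram of gas mileage; bins = 8 is an assumed value that you can change
ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 8)
```

Trying a few different values for bins is a reasonable way to see how much the shape of the histogram depends on that choice.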

Some aesthetics: Add some narrative text above and below your code chunks. Above the code chunk is where you say what you are going to do (and maybe why you are doing it), and below is where you say what you conclude from the output you just obtained. I find it looks better if you have a blank line above and below each code chunk.

Save your Quarto document (the usual way, with File and Save). This saves it on the jupyter servers (and not on your computer), so when you come back to it later, even from another device, the document will still be available to you. (This of course does not apply if you are running R Studio on your own computer.) Now click Render. This produces a pretty HTML version of your Quarto document, which will appear in a new tab of your web browser.¹ If you have a pop-up blocker, you might need to encourage the tab to appear by clicking Try Again.

The rendering process as you did it doesn’t produce that nice display of a dataframe that I had in one of my screenshots. To get that, alter the YAML block (at the very top) to read as below. Re-render, and note what it does.

format:
  html:
    df-print: paged
    embed-resources: true

You should keep anything else you had there before (such as a title), but rearrange the format-html part to look like this.

Practice handing in your rendered Quarto document, as if it were an assignment that was worth something. (It is good to get the practice in a low-stakes situation, so that you’ll know what to do next week.)

Something more ambitious: make a scatterplot of gas mileage, mpg, on the \(y\)-axis, against horsepower, hp, on the \(x\)-axis.
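
One possible approach, sketched with ggplot from the tidyverse; the attached solutions may do it differently:

```r
library(tidyverse)

# hp goes on the x-axis and mpg on the y-axis; geom_point draws one point per car
ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()
```

Compared with the histogram, the only changes are that aes now names both an x and a y variable, and that the geom is geom_point.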

Reading data from a file

In this question, we read a file from the web and do some descriptive statistics and a graph. This is very like what you will be doing on future assignments, so it’s good to practice it now.

Take a look at the data file at http://ritsokiguess.site/datafiles/jumping.txt. These are measurements on 30 rats that were randomly assigned to groups that did different amounts of jumping (we’ll see the details later in the course). The control group did no jumping, and the other groups did “low jumping” and “high jumping”. The first column says which jumping group each rat was in, and the second is the rat’s bone density (the experimenters’ supposition was that more jumping should go with higher bone density).

What are the two columns of data separated by? (The fancy word is “delimited”.)

Make a new Quarto document. Leave the YAML block, but get rid of the rest of the template document. Start with a code chunk containing library(tidyverse). Run it.

Put the URL of the data file in a variable called my_url. Then use read_delim to read in the file. (See solutions for how.) read_delim reads data files where the data values are always separated by the same single character, here a space. Save the data frame in a variable rats.
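
A minimal sketch of this step, assuming the file is reachable at the URL given above and really is delimited by single spaces:

```r
library(tidyverse)

# Store the URL first, so that the read_delim line stays short and readable
my_url <- "http://ritsokiguess.site/datafiles/jumping.txt"

# delim = " " says that the values on each line are separated by one space
rats <- read_delim(my_url, delim = " ")

# Typing the data frame's name displays it
rats
```

read_delim also prints a message describing the columns it found, which is a quick check that the file was read the way you expected.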

Take a look at your data frame, by making a new code chunk and putting the data frame’s name in it (as we did with mtcars).

Find the mean bone density for rats that did each amount of jumping.
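
One way to do this is with group_by and summarize. The sketch below assumes that the two columns are called group and density; check the names in your own display of the data frame and adjust if they differ.

```r
library(tidyverse)

my_url <- "http://ritsokiguess.site/datafiles/jumping.txt"
rats <- read_delim(my_url, delim = " ")

# Column names group and density are assumptions about the data file
rats %>%
  group_by(group) %>%                       # one set of calculations per jumping group
  summarize(mean_density = mean(density))   # mean bone density within each group
```

The result has one row per jumping group, with the group name and its mean bone density.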

Make a boxplot of bone density for each jumping group.
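
A sketch of the boxplot, again assuming the columns are called group and density:

```r
library(tidyverse)

my_url <- "http://ritsokiguess.site/datafiles/jumping.txt"
rats <- read_delim(my_url, delim = " ")

# Categorical variable on the x-axis, quantitative one on the y-axis
# (group and density are assumed column names)
ggplot(rats, aes(x = group, y = density)) + geom_boxplot()
```

This draws one box per jumping group, side by side, which makes it easy to compare the groups against the experimenters’ supposition.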

Footnotes

¹ Or possibly in the Viewer tab of R Studio, depending on how things are set up.