The first thing we need to do is to read in data, so that we can use our software to analyze it.
Consider these:
Spreadsheet data saved as a .csv file.
“Delimited” data such as values separated by spaces.
Actual Excel spreadsheets.
Packages for this section
library(tidyverse)
A spreadsheet
Save as .csv
.csv or “comma-separated values” is a way of turning spreadsheet values into plain text.
Easy to read into R, but does not preserve formulas. (This is a reason for doing all your calculations in your statistical software, and only having data in your spreadsheet.)
Upload this .csv file. (Bottom right, next to New Folder, Upload.) Click Choose File, find the file, click Open. Click OK. See the file appear bottom right.
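To see what "plain text" means for a .csv file, here is a minimal sketch. The data frame sheet is hypothetical (it is not the real test1.csv), and the file goes to a temporary location so nothing in your project is overwritten.

```r
library(readr)

# Hypothetical data standing in for a small spreadsheet (not the real test1.csv)
sheet <- data.frame(
  id = c("p1", "p2", "p3"),
  x = c(10, 12.5, 15),
  group = c("a", "b", "a")
)

# Write it out as .csv to a temporary file, then look at the raw text
csv_file <- tempfile(fileext = ".csv")
write_csv(sheet, csv_file)
csv_lines <- readLines(csv_file)

# One line of text per spreadsheet row, values separated by commas;
# the first line holds the column names
csv_lines
```

Only values are stored: there is nowhere in this format to keep a formula.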
Make a new Quarto document
File, New File, Quarto Document
…and get rid of the template document (leaving the first four lines).
Make a code chunk and in it put this. Run it.
library(tidyverse)
Reading in the file
Use read_csv with the name of the file, in quotes. Save what it reads in under a name of your choosing, here mydata. Make a new code chunk for this:
mydata <- read_csv("test1.csv")
mydata
More on the above
read_csv guesses what kind of thing is in each column. Here it correctly guesses that:
id and group are text (categorical variables). id is actually “identifier variable”: identifies individuals.
x and y are “double”: numbers that might have a decimal point in them.
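The guessing can be inspected, and overridden if it ever goes wrong. This is a sketch only: the values in csv_text are made up (same column names as the text, not the real file), and it assumes readr 2.0 or later, where I() marks a string as literal data rather than a file name.

```r
library(readr)

# Hypothetical stand-in for a small .csv file, given as literal text via I()
csv_text <- I("id,x,y,group\np1,10,20.5,a\np2,12,23,b\np3,15,27.25,a\n")

demo <- read_csv(csv_text, show_col_types = FALSE)

# One class per column: "character" for text, "numeric" for doubles
sapply(demo, class)

# The column specification that read_csv guessed
spec(demo)

# Stating the column types explicitly instead of relying on the guess
demo2 <- read_csv(csv_text, col_types = cols(
  id = col_character(),
  x = col_double(),
  y = col_double(),
  group = col_character()
))
```

Note that x is read as double even though its values are whole numbers.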
RStudio on your own computer
Put the .csv file in the same folder as your project. Then read it in as above like read_csv("test1.csv").
Or, use
f <- file.choose()
f
which brings up a file selector (as if you were going to find a file to load or save it). Find your .csv file, the address of which will be saved in f, and then:
mydata <- read_csv(f)
When you have selected the file, comment out the file.choose line by putting a # on the front of it. That way, running the chunk again by mistake will not make you find the file all over again. (Keyboard shortcut: go to the line, type control-shift-C or Mac equivalent with Cmd.)
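If read_csv says it cannot find the file, it usually helps to check which folder R is looking in. A minimal sketch follows; read_if_present is a hypothetical helper written for this illustration, not a readr function.

```r
library(readr)

getwd()                          # the folder where R looks for file names
list.files(pattern = "\\.csv$")  # the .csv files R can see in that folder

# Hypothetical helper: read the file if it is there, otherwise say where R looked
read_if_present <- function(path) {
  if (!file.exists(path)) {
    message("Cannot find ", path, " (working folder: ", getwd(), ")")
    return(NULL)
  }
  read_csv(path, show_col_types = FALSE)
}

mydata <- read_if_present("test1.csv")
```

If the result is NULL, move the file into the project folder or fall back on file.choose() as above.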
Looking at what we read in
Again, type the name of the thing to display it:
mydata
This is a “tibble” or data frame, the standard way of storing a data set in R.
Tibbles print as much as will display on the screen. If there are more rows or columns, it will say so.
You will see navigation keys to display more rows or columns (if there are more).
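How much of a tibble is printed can also be controlled in code. A small sketch, using a made-up tibble called big rather than the data from the text:

```r
library(tibble)

# Hypothetical tibble with more rows than we want to print at once
big <- tibble(row = 1:50, value = sqrt(1:50))

print(big, n = 5)    # show only the first 5 rows
print(big, n = Inf)  # show every row
dim(big)             # number of rows, then number of columns
glimpse(big)         # one line per column, with its type and first few values
```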
View-ing your data frame
Another way to examine your data frame is to View it, like this:
View(mydata)
…or find your data frame in the Global Environment top right and click it.
This pops up a “data frame viewer” top left:
This View
Read-only: cannot edit data
Can display data satisfying conditions: click on Filter, then:
for a categorical variable, type name of category you want
for a quantitative variable, use slider to describe values you want.
Can sort a column into ascending or descending order (click little arrows next to column name).
Clicking the symbol with arrow on it left of Filter “pops out” View into separate (bigger) window.
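What the viewer does with Filter and the sorting arrows can also be done in code, which has the advantage of being saved in your document. A sketch using dplyr (part of the tidyverse) on made-up data; the values and the cutoffs 11 and 16 are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical data, with column names like those in the text
demo <- tibble(
  id = c("p1", "p2", "p3", "p4"),
  x = c(10, 17, 12, 15),
  group = c("a", "b", "a", "b")
)

# Categorical variable: keep only the rows in one category
group_a <- filter(demo, group == "a")

# Quantitative variable: keep only the rows within a range of values
mid_x <- filter(demo, x >= 11, x <= 16)

# Sort by a column, ascending and then descending
by_x <- arrange(demo, x)
by_x_desc <- arrange(demo, desc(x))
```

Like View, none of these changes demo itself; each makes a new data frame.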
Summarizing what we read in
It is always a good idea to look at your data after you have read it in, to make sure you have believable numbers (and the right number of individuals and variables).
Quick check for errors: these often show up as values too high or too low, so the min and/or max will be unreasonable.
Five-number summary (plus the mean) for each quantitative column:
summary(mydata)
      id                  x               y            group
 Length:6           Min.   :10.00   Min.   :20.00   Length:6
 Class :character   1st Qu.:11.50   1st Qu.:22.00   Class :character
 Mode  :character   Median :14.00   Median :26.00   Mode  :character
                    Mean   :13.67   Mean   :25.67
                    3rd Qu.:15.75   3rd Qu.:29.25
                    Max.   :17.00   Max.   :31.00
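To illustrate the error check, here is a sketch with a made-up data frame in which one value was mistyped (170 instead of 17). The data and the "believable" range of 5 to 50 are assumptions for this example; in practice the range comes from what you know about the variable.

```r
library(tibble)

# Hypothetical data with a typing error in x: 170 was meant to be 17
bad <- tibble(
  id = c("p1", "p2", "p3", "p4"),
  x = c(10, 12, 170, 15)
)

dim(bad)        # right number of individuals and variables?
summary(bad$x)  # the max stands out as unreasonable
range(bad$x)    # just the min and max

# Pull out the rows whose x lies outside the assumed believable range
suspicious <- bad[bad$x < 5 | bad$x > 50, ]
suspicious
```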