---
title: "Drawing graphs"
editor:
markdown:
wrap: 72
---
## Our data
- To illustrate making graphs, we need some data.
- Data on 202 male and female athletes at the Australian Institute of
Sport.
- Variables:
    - categorical: sex of athlete, sport they play
    - quantitative: height (cm), weight (kg), lean body mass, red and
      white blood cell counts, haematocrit and haemoglobin (blood),
      ferritin concentration, body mass index, percent body fat.
- Values are separated by tabs, which affects how we read the file in.
## Packages for this section
```{r graphs-R-1}
library(tidyverse)
```
## Reading data into R
- Use `read_tsv` ("tab-separated values"), like `read_csv`.
- Data in `ais.txt`:
```{r graphs-R-2}
my_url <- "http://ritsokiguess.site/datafiles/ais.txt"
athletes <- read_tsv(my_url)
```
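As a rough illustration of what `read_tsv` does with tab-separated text, the sketch below reads a tiny made-up data set supplied as literal text rather than as a file. The values and the names `toy_text`, `toy` and `toy2` are invented for illustration and are not taken from `ais.txt`; passing literal text via `I()` assumes readr 2.0 or later.

```{r}
library(tidyverse)
# Made-up tab-separated text: "\t" is a tab, "\n" ends a line.
toy_text <- "Sex\tSport\tHt\nfemale\tNetball\t176.8\nmale\tSwim\t185.1\n"
# I() tells readr that this is literal data, not a file name.
toy <- read_tsv(I(toy_text), show_col_types = FALSE)
toy
# read_delim with an explicit tab delimiter reads the same text.
toy2 <- read_delim(I(toy_text), delim = "\t", show_col_types = FALSE)
```
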
## The data (some)
```{r graphs-R-3}
athletes
```
## Types of graph {.smaller}
Depends on number and type of variables:
| Categorical | Quantitative | Graph |
|-------------:|-------------:|:-------------------------------------------|
| 1 | 0 | bar chart |
| 0 | 1 | histogram |
| 2 | 0 | grouped bar charts |
| 1 | 1 | side-by-side boxplots |
| 0 | 2 | scatterplot |
| 2 | 1 | grouped boxplots |
| 1 | 2 | scatterplot with points identified by group (e.g. by colour) |
With more (categorical) variables, we might want *separate plots by
groups*. In `ggplot` this is called *faceting*.
## `ggplot`
- R has a graphing procedure `ggplot` (from the `ggplot2` package,
loaded as part of the tidyverse) that we use for all our graphs.
- Use it in different ways to get precisely the graph we want.
- Let's start with a bar chart of the sports played by the athletes.
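
A minimal sketch of how a `ggplot` call is put together, using a small made-up data frame (`toy` and its values are invented for illustration): `ggplot()` with `aes()` says which variables go where, and a `geom_` layer added with `+` says what to draw.

```{r}
library(tidyverse)
# Made-up data, for illustration only.
toy <- tibble(Sport = c("Row", "Row", "Swim", "Tennis", "Swim", "Row"))
# Step 1: data plus aesthetic mapping; on its own this draws empty axes.
g <- ggplot(toy, aes(x = Sport))
# Step 2: add a layer that counts the rows in each category and draws bars.
g_bar <- g + geom_bar()
g_bar
```
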
## Bar chart
```{r graphs-R-4, fig.height=3.9}
ggplot(athletes, aes(x = Sport)) + geom_bar()
```
## Histogram of body mass index
```{r graphs-R-5, fig.height=3.9}
ggplot(athletes, aes(x = BMI)) + geom_histogram(bins = 10)
```
## Which sports are played by males and females?
Grouped bar chart:
```{r graphs-R-6, fig.height=3.15}
ggplot(athletes, aes(x = Sport, fill = Sex)) +
  geom_bar(position = "dodge")
```
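The bar heights in a grouped bar chart are counts of each combination of the two categorical variables. As a sketch, `count()` from dplyr shows those numbers directly; the data frame `toy` below is made up for illustration and is not the athletes data.

```{r}
library(tidyverse)
# Made-up data, for illustration only.
toy <- tibble(
  Sport = c("Row", "Row", "Row", "Swim", "Swim", "Tennis"),
  Sex   = c("female", "male", "male", "female", "female", "male")
)
# One row per Sport-Sex combination that occurs, with its frequency in n.
toy_counts <- toy %>% count(Sport, Sex)
toy_counts
```

With the real data, `athletes %>% count(Sport, Sex)` should give the counts that the grouped bar chart displays.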
## BMI by gender
```{r graphs-R-7, fig.height=4}
ggplot(athletes, aes(x = Sex, y = BMI)) + geom_boxplot()
```
## Height vs. weight
Scatterplot:
```{r graphs-R-8, fig.height=3.4}
ggplot(athletes, aes(x = Ht, y = Wt)) + geom_point()
```
## With regression line
```{r graphs-R-9, fig.height=3.6}
ggplot(athletes, aes(x = Ht, y = Wt)) +
  geom_point() + geom_smooth(method = "lm")
```
## BMI by sport and gender
```{r graphs-R-10, fig.height=3.6}
ggplot(athletes, aes(x = Sport, y = BMI, fill = Sex)) +
  geom_boxplot()
```
A variation that uses `colour` instead of `fill`:
```{r}
ggplot(athletes, aes(x = Sport, y = BMI, colour = Sex)) +
  geom_boxplot()
```
## Height and weight by gender
```{r}
#| fig-height: 9
#| fig-width: 9
ggplot(athletes, aes(x = Ht, y = Wt, colour = Sex)) +
  geom_point()
```
## Height vs. weight by gender for each sport, with facets
```{r graphs-R-12, fig.height=3.6}
ggplot(athletes, aes(x = Ht, y = Wt, colour = Sex)) +
  geom_point() + facet_wrap(~Sport)
```
## Filling each facet
By default, every facet uses the same scales. To give each facet its
own scales, add `scales = "free"` to `facet_wrap`:
```{r graphs-R-13, fig.height=5.2}
ggplot(athletes, aes(x = Ht, y = Wt, colour = Sex)) +
  geom_point() + facet_wrap(~Sport, scales = "free")
```
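`facet_wrap` also accepts `scales = "free_x"` or `scales = "free_y"` to free only one axis. A sketch with made-up data (`toy` and `p_free` are names invented for illustration), keeping a common height axis but letting the weight axis vary by facet:

```{r}
library(tidyverse)
# Made-up data, for illustration only: two sports with very different sizes.
toy <- tibble(
  Sport = c("Gym", "Gym", "Gym", "Row", "Row", "Row"),
  Ht    = c(150, 155, 160, 180, 185, 190),
  Wt    = c(42, 45, 50, 78, 84, 90)
)
# "free_y" frees only the y (weight) axis; the x (height) axis stays shared.
p_free <- ggplot(toy, aes(x = Ht, y = Wt)) +
  geom_point() +
  facet_wrap(~Sport, scales = "free_y")
p_free
```
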
## Another view of height vs weight
```{r}
ggplot(athletes, aes(x = Ht, y = Wt)) +
  geom_point() + facet_wrap(~Sex)
```