Drawing graphs

Our data

  • To illustrate making graphs, we need some data.
  • Data on 202 male and female athletes at the Australian Institute of Sport.
  • Variables:
    • categorical: Sex of athlete, sport they play
    • quantitative: height (cm), weight (kg), lean body mass, red and white blood cell counts, haematocrit and haemoglobin (blood), ferritin concentration, body mass index, percent body fat.
  • Values are separated by tabs, which affects how we read the file in.

Packages for this section

library(tidyverse)

Reading data into R

  • Use read_tsv (“tab-separated values”), which works like read_csv.
  • Data in ais.txt:
my_url <- "http://ritsokiguess.site/datafiles/ais.txt"
athletes <- read_tsv(my_url)
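To see why the tabs matter, here is a small sketch that needs no download. It writes the mpg data (which ships with ggplot2 and is used here only as a stand-in for ais.txt) to a temporary tab-separated file, then reads it back three ways. read_tsv should agree with read_delim(delim = "\t"). read_csv looks for commas that are not there, so it should put each whole line into a single column.

```r
library(tidyverse)

# Stand-in for ais.txt: write the built-in mpg data to a temporary
# tab-separated file (no download needed)
tmp <- tempfile(fileext = ".txt")
write_tsv(mpg, tmp)

# read_tsv() reads tab-separated values ...
via_tsv <- read_tsv(tmp, show_col_types = FALSE)
# ... just like read_delim() with the delimiter set to a tab
via_delim <- read_delim(tmp, delim = "\t", show_col_types = FALSE)
# read_csv() splits on commas, finds none, so each line stays in one column
via_csv <- read_csv(tmp, show_col_types = FALSE)

ncol(via_tsv)  # one column per variable
ncol(via_csv)  # everything squashed into a single column
```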

The data (some)

athletes

Types of graph

The right type of graph depends on the number and type of variables:

Categorical vars  Quantitative vars  Graph
1                 0                  bar chart
0                 1                  histogram
2                 0                  grouped bar charts
1                 1                  side-by-side boxplots
0                 2                  scatterplot
2                 1                  grouped boxplots
1                 2                  scatterplot with points identified by group (e.g. by colour)

With more categorical variables, we might want a separate plot for each group. This is called facetting in R.
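The table can also be kept as a small lookup, pairing each row with the geom used for it in the examples in this section. The names graph_types and suggest_graph() are hypothetical, made up for illustration, and not part of any package.

```r
library(tidyverse)

# The table of graph types, with the ggplot geom used for each in this section
graph_types <- tribble(
  ~categorical, ~quantitative, ~graph,                    ~geom,
  1,            0,             "bar chart",               "geom_bar()",
  0,            1,             "histogram",               "geom_histogram()",
  2,            0,             "grouped bar charts",      "geom_bar(position = \"dodge\")",
  1,            1,             "side-by-side boxplots",   "geom_boxplot()",
  0,            2,             "scatterplot",             "geom_point()",
  2,            1,             "grouped boxplots",        "geom_boxplot() with fill",
  1,            2,             "scatterplot by group",    "geom_point() with colour"
)

# Hypothetical helper: look up the suggested graph for a mix of variables
# (returns zero rows for combinations not in the table)
suggest_graph <- function(n_cat, n_quant) {
  graph_types |>
    filter(categorical == n_cat, quantitative == n_quant) |>
    select(graph, geom)
}

suggest_graph(1, 1)
```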

ggplot

  • ggplot, from the ggplot2 package (loaded as part of the tidyverse), is the graphing function we use for all our graphs.
  • We add different layers to it to get exactly the graph we want.
  • Let’s start with a bar chart of the sports played by the athletes.

Bar chart

ggplot(athletes, aes(x = Sport)) + geom_bar()
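geom_bar counts the rows in each category for us. As a sketch of what it computes, the same bars can be drawn from counts made first with count() and then plotted with geom_col(), which uses the heights as given. To run without the athlete data, this uses the built-in mpg data, with class standing in for Sport.

```r
library(tidyverse)

# Stand-in data: mpg ships with ggplot2; its `class` column plays the role of Sport
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# The same bars from counts made up front: count() adds a column n,
# and geom_col() draws bars whose heights are taken as given
class_counts <- count(mpg, class)
p_col <- ggplot(class_counts, aes(x = class, y = n)) + geom_col()

# Bar heights ggplot computed for the first version
layer_data(p_bar)$count
```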

Histogram of body mass index

ggplot(athletes, aes(x = BMI)) + geom_histogram(bins = 10)
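The number of bins is a choice. If neither bins nor binwidth is given, ggplot uses 30 bins and prints a message suggesting a better value, so it is worth trying a few. A sketch using the built-in mpg data (hwy standing in for BMI) that sets either the number of bins or the width of each bin:

```r
library(tidyverse)

# Stand-in: hwy (highway miles per gallon) from mpg plays the role of BMI
# Choose how many bins ...
p_bins <- ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 10)
# ... or how wide each bin is, in the units of the x variable
p_width <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 2)

# Each observation lands in exactly one bin, so the counts add up to the rows
sum(layer_data(p_bins)$count)
nrow(mpg)
```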

Which sports are played by males and females?

Grouped bar chart:

ggplot(athletes, aes(x = Sport, fill = Sex)) +
  geom_bar(position = "dodge")
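If some sport is played by only one sex, position = "dodge" gives that sport's single bar the full width, so the bars end up different widths. position_dodge(preserve = "single") keeps every bar the same width. A sketch with the built-in mpg data, where class stands in for Sport and drv (drive type) for Sex; not every class comes with every drive type:

```r
library(tidyverse)

# Stand-in: class and drv from mpg play the roles of Sport and Sex
# Default dodging: a category with one group gets one full-width bar
p_dodge <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

# preserve = "single": every bar gets the same width
p_single <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = position_dodge(preserve = "single"))
```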

BMI by gender

ggplot(athletes, aes(x = Sex, y = BMI)) + geom_boxplot() 
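The box spans the quartiles and the line inside it marks the median, computed separately for each group. The whiskers and any separately plotted points follow a 1.5 times interquartile range rule, so they are not simply the minimum and maximum. A sketch that computes the quartiles and median directly with group_by() and summarise(), using the built-in mpg data (drv standing in for Sex, hwy for BMI); they should match what geom_boxplot() drew:

```r
library(tidyverse)

# Stand-in: drv and hwy from mpg play the roles of Sex and BMI
p_box <- ggplot(mpg, aes(x = drv, y = hwy)) + geom_boxplot()

# Quartiles and median for each group, computed directly
box_stats <- mpg |>
  group_by(drv) |>
  summarise(
    q1     = quantile(hwy, 0.25, names = FALSE),
    median = median(hwy),
    q3     = quantile(hwy, 0.75, names = FALSE)
  )
box_stats
```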

Height vs. weight

Scatterplot:

ggplot(athletes, aes(x = Ht, y = Wt)) + geom_point()

With regression line

ggplot(athletes, aes(x = Ht, y = Wt)) +
  geom_point() + geom_smooth(method = "lm")
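With method = "lm", geom_smooth fits a least-squares regression line of y on x, the same line lm() gives. The grey band is a confidence interval for the mean response, and se = FALSE removes it. Adding formula = y ~ x also silences the message about which formula was used. A sketch with the built-in mpg data (displ standing in for Ht, hwy for Wt) that checks the drawn line against lm():

```r
library(tidyverse)

# Stand-in: displ and hwy from mpg play the roles of Ht and Wt
p_lm <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# The same straight line, fitted directly
fit <- lm(hwy ~ displ, data = mpg)
coef(fit)

# Points along the line ggplot drew (layer 2 is the smooth)
line_pts <- layer_data(p_lm, 2)
```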

BMI by sport and gender

ggplot(athletes, aes(x = Sport, y = BMI, fill = Sex)) +
  geom_boxplot()

A variation that uses colour instead of fill:

ggplot(athletes, aes(x = Sport, y = BMI, colour = Sex)) +
  geom_boxplot()

Height and weight by gender

ggplot(athletes, aes(x = Ht, y = Wt, colour = Sex)) +
  geom_point()
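Because colour is set in ggplot() itself, every layer inherits it, so adding geom_smooth() gives one regression line per group. A sketch with the built-in mpg data, where drv (three drive types) stands in for Sex:

```r
library(tidyverse)

# Stand-in: drv from mpg plays the role of Sex
# colour is set in ggplot(), so both layers are split by group
p_groups <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```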

Height vs. weight by gender for each sport, with facets

ggplot(athletes, aes(x = Ht, y = Wt, colour = Sex)) +
  geom_point() + facet_wrap(~Sport)
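facet_wrap draws one panel per value of the facetting variable and wraps the panels into rows; ncol (or nrow) sets the layout. A sketch with the built-in mpg data, where class stands in for Sport, that counts the panels drawn:

```r
library(tidyverse)

# Stand-in: class from mpg plays the role of Sport, drv the role of Sex
p_facets <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  facet_wrap(~class, ncol = 4)  # at most 4 panels per row

# One panel per category of class
n_panels <- length(unique(layer_data(p_facets)$PANEL))
n_panels
```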

Filling each facet

By default, every facet uses the same scales. To give each facet its own scales, so that its points fill the panel, use scales = "free":

ggplot(athletes, aes(x = Ht, y = Wt, colour = Sex)) +
  geom_point() + facet_wrap(~Sport, scales = "free")

Another view of height vs weight

ggplot(athletes, aes(x = Ht, y = Wt)) +
  geom_point() + facet_wrap(~ Sex)