Data on 202 male and female athletes at the Australian Institute of Sport.
Variables:
categorical: sex of athlete, sport they play
quantitative: height (cm), weight (kg), lean body mass, red and white blood cell counts, haematocrit and haemoglobin (blood), ferritin concentration, body mass index, percent body fat.
Values are separated by tabs (which affects how we read the data in).
Packages for this section
library(tidyverse)
Reading data into R
Use read_tsv (“tab-separated values”), which works like read_csv.
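As a minimal sketch of how read_tsv behaves, the code below writes a tiny tab-separated file to a temporary location and reads it back. The column names and values are made up for illustration and are not taken from the real athletes data; the names tmp and athletes_demo are likewise hypothetical.

```r
library(tidyverse)

# Made-up tab-separated lines standing in for the real file ("\t" is a tab)
tmp <- tempfile(fileext = ".txt")
writeLines(
  c(
    "Sex\tSport\tHt\tWt",
    "female\tNetball\t176.8\t59.9",
    "male\tSwim\t185.2\t80.1"
  ),
  tmp
)

# read_tsv splits on tabs and guesses column types, just as read_csv does for commas
athletes_demo <- read_tsv(tmp)
athletes_demo
```

Reading the real data works the same way, with the file name or URL of the actual data file in place of tmp.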
The distribution of BMI for females is closer to normal, with only the highest few values being too high.
The distribution of BMI values for males might even be right-skewed: not only are the upper values too high, but some of the lowest ones are not low enough.
More normal quantile plots
How straight does a normal quantile plot have to be?
There is randomness in real data, so even a normal quantile plot from normal data won’t look perfectly straight.
With a small sample, the plot can look not very straight even when the data are normal.
We are looking for a systematic departure from a straight line; random wiggles ought not to concern us.
Look at some examples where we know the answer, so that we can see what to expect.
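One way to get a feel for how much random wiggle to expect, going a little beyond the single-sample examples that follow, is to draw several samples from a normal distribution and plot them side by side. This is only a sketch: the seed, the number of samples (6), the sample size (20) and the names sims and g_sims are arbitrary choices made here.

```r
library(tidyverse)

set.seed(457) # arbitrary seed, only so that the picture is reproducible

# Six independent samples of 20 values each, all from a standard normal
sims <- tibble(
  sample_id = rep(1:6, each = 20),
  x = rnorm(6 * 20)
)

# One normal quantile plot per sample, each with its own line
g_sims <- ggplot(sims, aes(sample = x)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~sample_id)
g_sims
```

Every panel comes from truly normal data, so whatever departures from the line show up here are the kind that should not concern us.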
Normal data, large sample
d <- tibble(x = rnorm(200))
ggplot(d, aes(x = x)) + geom_histogram(bins = 10)
The normal quantile plot
ggplot(d,aes(sample=x))+stat_qq()+stat_qq_line()
Normal data, small sample
Not so convincingly normal, but not obviously skewed:
d <- tibble(x = rnorm(20))
ggplot(d, aes(x = x)) + geom_histogram(bins = 5)
The normal quantile plot
The points follow the line well, apart from the highest and lowest points being slightly off. I’d call this good.
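A plot of this kind can be drawn with the same call as for the large sample. The sketch below regenerates a sample of 20 so that it runs on its own; the seed and the names d_small and g_small are assumptions made here, so the plot will not match the one described above point for point.

```r
library(tidyverse)

set.seed(20) # arbitrary seed; a different draw gives a slightly different plot

# A small sample from a standard normal distribution
d_small <- tibble(x = rnorm(20))

# Normal quantile plot: the points, plus the line they should follow if normal
g_small <- ggplot(d_small, aes(sample = x)) +
  stat_qq() +
  stat_qq_line()
g_small
```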
Comments