---
title: "Basic statistical inference"
---
## Packages for this section
```{r inference-1-R-1}
library(tidyverse)
```
## Inference for means (from STAB57)
Three kinds of inference for means of normally-distributed data:
- **One-sample $t$**: a single sample from a population, estimate that population's mean
- **Two-sample $t$**: one sample from each of 2 populations, estimate difference in population means
- **Matched pairs $t$**: two paired measurements on same (or matched) individuals, estimate population mean difference
Two forms of inference for a population parameter:
- **Confidence interval**: "what is the population parameter?"
- **Hypothesis test**: "could the population parameter be equal to this value?"
## Examples:
- Blue jays attendances (one-sample)
- Kids learning to read (two-sample)
- Pain relief (matched pairs)
## Confidence interval
- You have a sample from some population
- Imagine repeated sampling from that population
- Procedure that gives an interval containing the true parameter in 95% (or 90% or 99%) of all possible samples
## Hypothesis test
- Null hypothesis gives value for population parameter
- Alternative hypothesis says how you are trying to prove the null hypothesis wrong (not equal, greater, less).
- Test statistic measures "distance" between data and null hypothesis
- P-value gives probability of observing test statistic *as extreme or more extreme*, **if the null hypothesis is true**.
- Reject null hypothesis if P-value small enough (eg smaller than 0.05).
## Why 0.05? This man.
::: columns
::: {.column width="40%"}
![](fisher.png)
:::
::: {.column width="60%"}
- analysis of variance
- Fisher information
- Linear discriminant analysis
- Fisher's $z$-transformation
- Fisher-Yates shuffle
- Behrens-Fisher problem
Sir Ronald A. Fisher, 1890--1962.
:::
:::
## Why 0.05? (2)
- From The Arrangement of Field Experiments (1926):
![](fisher1.png){width="200%"}
- and
![](fisher2.png){width="200%"}
## $\alpha$ and errors
- Hypothesis test ends with decision:
- reject null hypothesis
- do not reject null hypothesis.
- but decision may be wrong:
| | Decision | |
|----------------|-------------------|-----------------|
| **Truth** | **Do not reject** | **reject null** |
| **Null true** | Correct | Type I error |
| **Null false** | Type II error | Correct |
- Either type of error is bad, but for now focus on controlling Type I
error: write $\alpha$ = P(type I error), and devise test so that
$\alpha$ small, typically 0.05.
- That is, **if null hypothesis true**, have only small chance to
reject it (which would be a mistake).
- Worry about type II errors later (when we consider power of test).
## One sample: the Blue Jays attendances
- The Toronto Blue Jays' average home attendance in part of 2015
season was 25,070 (up to May 27 2015, from baseball-reference.com).
- Does that mean the attendance at every game was exactly 25,070?
Certainly not. Actual attendance depends on many things, eg.:
- how well the Jays are playing
- the opposition
- day of week
- weather
- random chance
## Reading the attendances
...as a `.csv` file:
```{r inference-1-R-2}
my_url <- "http://ritsokiguess.site/datafiles/jays15-home.csv"
jays <- read_csv(my_url)
jays
```
## Another way
- This is a big data set: only 25 observations, but a lot of
*variables*.
- To see the first few values in all the variables, can also use
`glimpse`:
```{r inference-1-R-5}
glimpse(jays)
```
## Attendance histogram
```{r inference-1-R-6, fig.height=3.8}
ggplot(jays, aes(x = attendance)) + geom_histogram(bins = 6)
```
## Comments
- Attendances have substantial variability, ranging from just over
10,000 to around 50,000.
- Distribution somewhat skewed to right (but no outliers).
- These are a sample of "all possible games" (or maybe "all possible
games played in April and May"). What can we say about mean
attendance in all possible games based on this evidence?
## CI for mean attendance
- `t.test` function does CI and test. Look at CI first:
```{r inference-1-R-7}
t.test(jays$attendance)
```
- From 20,500 to 29,600.
## Or, 90% CI
- by including a value for conf.level:
```{r inference-1-R-8}
t.test(jays$attendance, conf.level = 0.90)
```
- From 21,300 to 28,800. (Shorter, as it should be.)
## Comments
- Need to say "column attendance within data frame `jays`" using `$`.
- 95% CI from about 20,000 to about 30,000.
- Not estimating mean attendance well at all!
- Generally want confidence interval to be shorter, which happens if:
- SD smaller
- sample size bigger
- confidence level smaller
- Last one is a cheat, really, since reducing confidence level
increases chance that interval won't contain pop. mean at all!
## Another way to access data frame columns
```{r inference-1-R-9}
with(jays, t.test(attendance))
```
## Hypothesis testing for Blue Jays attendances
- Previous year's mean attendance was 29,327, so test to see whether the mean is different from that in any way (two-sided test):
```{r inference-1-R-10}
t.test(jays$attendance, mu = 29327)
```
- See test statistic $-1.93$, P-value 0.065.
- Do not reject null at $\alpha=0.05$: no evidence that mean
attendance has changed.
## Another example: learning to read
- You devised new method for teaching children to read.
- Guess it will be more effective than current methods.
- To support this guess, collect data.
- Want to generalize to “all children in Canada”.
- So take random sample of all children in Canada.
- Or, argue that sample you actually have is “typical” of all children in
Canada.
- Randomization (1): whether or not a child in sample or not has
nothing to do with anything else about that child.
- Randomization (2): randomly choose whether each child gets new
reading method (t) or standard one (c).
## Reading in data
- File at .
- Proper reading-in function is `read_delim` (check file to see)
- Read in thus:
```{r inference-1-R-12}
my_url <- "http://ritsokiguess.site/datafiles/drp.txt"
kids <- read_delim(my_url," ")
```
## The data (some)
```{r inference-1-R-13}
kids
```
## Boxplots
```{r inference-1-R-14, fig.height=3.7}
ggplot(kids, aes(x = group, y = score)) + geom_boxplot()
```
## Two kinds of two-sample t-test
- Do the two groups have same spread (SD, variance)?
- If yes (shaky assumption here), can use pooled t-test.
- If not, use Welch-Satterthwaite t-test (safe).
- Pooled test derived in STAB57 (easier to derive, but assumes equal variances).
- Welch-Satterthwaite does not assume equality of variances.
- Assess (approx) equality of spreads using boxplot.
## The (Welch-Satterthwaite) t-test
- `c` (control) before `t` (treatment) alphabetically, so proper alternative
is “less”.
- R does Welch-Satterthwaite test by default
- Answer to "does the new reading program really help?"
- (in a moment) how to get R to do pooled test?
## Welch-Satterthwaite
```{r inference-1-R-15}
t.test(score ~ group, data = kids, alternative = "less")
```
## The pooled t-test
```{r inference-1-R-16}
t.test(score ~ group, data = kids,
alternative = "less", var.equal = TRUE)
```
## Two-sided test; CI
- To do 2-sided test, leave out `alternative`:
```{r inference-1-R-17}
t.test(score ~ group, data = kids)
```
## Comments:
- P-values for pooled and Welch-Satterthwaite tests very similar (even though the pooled test seemed inferior): 0.013 vs.\ 0.014.
- Two-sided test also gives CI: new reading program increases average scores by
somewhere between about 1 and 19 points.
- Confidence intervals inherently two-sided, so do 2-sided test to get
them.
## Pain relief
Some data:
\centering{
\includegraphics[height=0.7\textheight]{Screenshot_2019-04-26_13-41-29}
}
## Matched pairs data
- Data are comparison of 2 drugs for effectiveness at reducing pain.
- 12 subjects (cases) were arthritis sufferers
- Response is #hours of pain relief from each drug.
- In reading example, each child tried only one reading method.
- But here, each subject tried out both drugs, giving us two
measurements.
- Possible because, if you wait long enough, one drug has no influence
over effect of other.
- Advantage: focused comparison of drugs. Compare one drug with
another on same person, removes a lot of variability due to differences between people.
- Matched pairs, requires different analysis.
- Design: randomly choose 6 of 12 subjects to get drug A first, other 6
get drug B first.
## Paired t test: reading the data
Values aligned in columns:
```{r inference-4-R-1}
my_url <- "http://ritsokiguess.site/datafiles/analgesic.txt"
pain <- read_table(my_url)
```
## The data
```{r inference-4-R-2}
pain
```
## Paired *t*-test
\small
```{r inference-4-R-3}
with(pain, t.test(druga, drugb, paired = T))
```
\normalsize
- P-value is 0.053.
- Not quite evidence of difference between drugs.
## t-testing the differences
- Likewise, you can calculate the differences yourself and
do a 1-sample t-test on them.
- First calculate a column of differences:
\footnotesize
```{r inference-4-R-4}
(pain %>% mutate(diff=druga-drugb) -> pain)
```
\normalsize
## t-test on the differences
- then throw them into t.test, testing that the mean is zero, with
same result as before:
```{r inference-4-R-5}
with(pain, t.test(diff, mu=0))
```