---
title: "Analysis of Covariance"
editor:
markdown:
wrap: 72
---
## Analysis of covariance
- ANOVA: explanatory variables categorical (divide data into groups)
- Traditionally, analysis of covariance has categorical $x$'s plus one
numerical $x$ (the "covariate") to be adjusted for.
- `lm` handles this too.
- Simple example: two treatments (drugs `a` and `b`), with before
and after scores.
- Does knowing before score and/or treatment help to predict after
score?
- Does after score differ by treatment, by before score, or both?
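In symbols, one way to write the model with interaction is below. The indicator $d_i$ (1 for drug B, 0 for drug A) and the $\beta$'s are notation introduced here for illustration; drug A as the 0 level matches the baseline `lm` uses later.

$$
\text{after}_i = \beta_0 + \beta_1\,\text{before}_i + \beta_2\, d_i + \beta_3\,\text{before}_i\, d_i + \varepsilon_i,
\qquad
d_i = \begin{cases} 1 & \text{drug B} \\ 0 & \text{drug A} \end{cases}
$$

$$
\begin{aligned}
\text{drug A:}\quad \mathrm{E}(\text{after}) &= \beta_0 + \beta_1\,\text{before} \\
\text{drug B:}\quad \mathrm{E}(\text{after}) &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,\text{before}
\end{aligned}
$$

Dropping the interaction sets $\beta_3 = 0$, so both drugs share the slope $\beta_1$ and differ only by the constant shift $\beta_2$.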
## Data
Treatment, before, after:
```
a 5 20
a 10 23
a 12 30
a 9 25
a 23 34
a 21 40
a 14 27
a 18 38
a 6 24
a 13 31
b 7 19
b 12 26
b 27 33
b 24 35
b 18 30
b 22 31
b 26 34
b 21 28
b 14 23
b 9 22
```
\normalsize
## Packages
```{r bAncova-1}
library(tidyverse)
library(broom)
library(marginaleffects)
```
The last of these is for predictions.
## Read in data
```{r bAncova-2, message=F}
url <- "http://ritsokiguess.site/datafiles/ancova.txt"
prepost <- read_delim(url, " ")
prepost
```
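Since all 20 rows are listed above, the data can also be typed in directly instead of read from the URL. This is a base-R sketch; it assumes the file's columns are named `drug`, `before` and `after`, as the plot code below uses.

```{r}
# Type in the 20 rows from the Data slide
# (column names assumed to match the file: drug, before, after)
prepost <- data.frame(
  drug = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
            19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
str(prepost)
```

`read_delim` returns a tibble rather than a plain data frame, but `lm` treats the two the same way.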
## Making a plot
```{r ancova-plot, fig.height=4.5}
ggplot(prepost, aes(x = before, y = after, colour = drug)) +
  geom_point() + geom_smooth(method = "lm")
```
## Comments
- As before score goes up, after score goes up.
- Red points (drug A) generally above blue points (drug B), for
comparable before score.
- Suggests before score effect *and* drug effect.
## The means
```{r bAncova-3 }
prepost %>%
  group_by(drug) %>%
  summarize(
    before_mean = mean(before),
    after_mean = mean(after)
  )
```
- Mean "after" score slightly higher for treatment A.
- Mean "before" score much higher for treatment B.
- Greater *improvement* on treatment A.
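The improvement claim can be checked by computing after minus before for each person and averaging by drug. A base-R sketch, with the data typed in from the Data slide:

```{r}
prepost <- data.frame(
  drug = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
            19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
# change score for each person
prepost$improvement <- prepost$after - prepost$before
# mean change score by drug
imp <- tapply(prepost$improvement, prepost$drug, mean)
imp  # a: 16.1, b: 10.1
```

This raw change score implicitly assumes the after score rises one-for-one with the before score. ANCOVA instead estimates that slope from the data.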
## Testing for interaction
```{r bAncova-4 }
prepost.1 <- lm(after ~ before * drug, data = prepost)
anova(prepost.1)
summary(prepost.1)
```
- Interaction not significant. Will remove later.
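Because the interaction is the last term in the formula, its line in `anova(prepost.1)` is the same F-test as comparing the models with and without the interaction. A sketch of that comparison (the names `with_int` and `no_int` are made up here):

```{r}
prepost <- data.frame(
  drug = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
            19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
with_int <- lm(after ~ before * drug, data = prepost)
no_int <- lm(after ~ before + drug, data = prepost)
# F-test for the interaction as a comparison of nested models
cmp <- anova(no_int, with_int)
cmp
```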
## Predictions
Set up values to predict for: the quartiles of `before` (from
`summary`), for each drug.
```{r}
summary(prepost)
```
```{r}
new <- datagrid(before = c(9.75, 14, 21.25),
                drug = c("a", "b"), model = prepost.1)
new
```
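The three `before` values are the first quartile, median and third quartile reported by `summary`. A base-R sketch that computes them and builds the same six-row grid without `datagrid` (object names are illustrative):

```{r}
before <- c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
            7, 12, 27, 24, 18, 22, 26, 21, 14, 9)
# same quantile definition that summary() uses
q <- quantile(before, c(0.25, 0.5, 0.75))
q  # 9.75, 14, 21.25
# every combination of quartile and drug
new_grid <- expand.grid(before = unname(q), drug = c("a", "b"),
                        stringsAsFactors = FALSE)
new_grid
```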
## and then
```{r}
cbind(predictions(prepost.1, newdata = new)) %>%
  select(drug, before, estimate, conf.low, conf.high)
```
\normalsize
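As a cross-check that does not need `marginaleffects`, base R's `predict` gives point predictions with confidence intervals. This is a sketch; the point estimates should agree with the ones above, but the interval endpoints may differ slightly, because `predict.lm` uses the $t$ distribution and `marginaleffects` may use a different reference distribution by default.

```{r}
prepost <- data.frame(
  drug = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
            19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
prepost.1 <- lm(after ~ before * drug, data = prepost)
new_grid <- expand.grid(before = c(9.75, 14, 21.25), drug = c("a", "b"),
                        stringsAsFactors = FALSE)
# columns fit, lwr, upr: prediction and 95% confidence limits for the mean
ci <- predict(prepost.1, newdata = new_grid, interval = "confidence")
cbind(new_grid, ci)
```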
## Predictions (with interaction included), plotted
```{r, fig.height=4}
plot_predictions(model = prepost.1, condition = c("before", "drug"))
```
Lines almost parallel, but not quite.
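To see how far from parallel the lines are, pull the two slopes out of the interaction model: drug A's slope is the `before` coefficient, and drug B's adds the `before:drugb` coefficient. As a sketch, these match fitting a separate simple regression for each drug:

```{r}
prepost <- data.frame(
  drug = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
            19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
prepost.1 <- lm(after ~ before * drug, data = prepost)
cf <- coef(prepost.1)
# slopes implied by the interaction model
slopes_int <- c(a = cf[["before"]], b = cf[["before"]] + cf[["before:drugb"]])
# slopes from one simple regression per drug
slopes_sep <- sapply(split(prepost, prepost$drug),
                     function(d) coef(lm(after ~ before, data = d))[["before"]])
slopes_int
slopes_sep
```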
## Taking out interaction
\small
```{r bAncova-8 }
prepost.2 <- update(prepost.1, . ~ . - before:drug)
summary(prepost.2)
anova(prepost.2)
```
\normalsize
- Take out non-significant interaction.
- `before` and `drug` strongly significant.
- Do predictions again and plot them.
## Predictions
```{r}
cbind(predictions(prepost.2, newdata = new)) %>%
  select(drug, before, estimate)
```
## Plot of predicted values
```{r, fig.height=4}
plot_predictions(prepost.2, condition = c("before", "drug"))
```
This time the lines are *exactly* parallel. No-interaction model forces
them to have the same slope.
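Parallel lines mean the predicted gap between drugs is the same at every `before` value, and equal to the `drugb` coefficient. A base-R sketch checking this at the three quartiles used above:

```{r}
prepost <- data.frame(
  drug = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
            19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
prepost.2 <- lm(after ~ before + drug, data = prepost)
grid_a <- data.frame(before = c(9.75, 14, 21.25), drug = "a")
grid_b <- data.frame(before = c(9.75, 14, 21.25), drug = "b")
# drug B minus drug A at each before value
gap <- predict(prepost.2, grid_b) - predict(prepost.2, grid_a)
gap
coef(prepost.2)[["drugb"]]
```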
## Different look at model output
- `anova(prepost.2)` tests for significant effect of before score and
of drug, but doesn't help with interpretation.
- `summary(prepost.2)` treats the model as a regression and gives slopes:
\scriptsize
```{r bAncova-11 }
summary(prepost.2)
```
\normalsize
## Understanding those slopes
\footnotesize
```{r bAncova-12}
tidy(prepost.2)
```
\normalsize
- `before` ordinary numerical variable; `drug` categorical.
- `lm` uses the first category, `a`, as baseline, so there is no
`druga` term.
- Intercept is prediction of after score for before score 0 and *drug
A*.
- `before` slope is predicted change in after score when before score
increases by 1 (usual slope)
- Coefficient for `drugb` is the *change* in predicted after score for
being on drug B rather than drug A. Same for *any* before score (no
interaction).
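Putting these bullets together, a prediction from `prepost.2` is the intercept, plus the `before` slope times the before score, plus the `drugb` coefficient if on drug B. A sketch reproducing `predict` by hand (`by_hand` is a hypothetical helper, not part of any package):

```{r}
prepost <- data.frame(
  drug = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
            19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
prepost.2 <- lm(after ~ before + drug, data = prepost)
cf <- coef(prepost.2)
# hypothetical helper: intercept + slope * before + drugb shift (drug B only)
by_hand <- function(before, drug) {
  cf[["(Intercept)"]] + cf[["before"]] * before + cf[["drugb"]] * (drug == "b")
}
by_hand(14, "b")
predict(prepost.2, data.frame(before = 14, drug = "b"))
```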
## Summary
- ANCOVA model: fits different regression line for each group,
predicting response from covariate.
- ANCOVA model with interaction between factor and covariate allows
different slopes for each line.
- Sometimes those lines can cross over!
- If interaction not significant, take out. Lines then parallel.
- With parallel lines, groups have consistent effect regardless of
value of covariate.