ANOVA: explanatory variables categorical (divide data into groups)
Traditionally, analysis of covariance has categorical \(x\)’s plus one numerical \(x\) (the “covariate”) to be adjusted for.
lm handles this too.
Simple example: two treatments (drugs) (a and b), with before and after scores.
Does knowing before score and/or treatment help to predict after score?
Is after score different by treatment/before score?
Data
Treatment, before, after:
a 5 20
a 10 23
a 12 30
a 9 25
a 23 34
a 21 40
a 14 27
a 18 38
a 6 24
a 13 31
b 7 19
b 12 26
b 27 33
b 24 35
b 18 30
b 22 31
b 26 34
b 21 28
b 14 23
b 9 22
Packages
library(tidyverse)
library(marginaleffects) # provides datagrid(), used under Predictions
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.2 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
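The data frame `prepost` is printed next, but the code that reads it in is not shown. A minimal sketch, assuming the values are typed in directly from the data listing rather than read from the original file (whose location is not given here):

```r
library(tibble)

# Values copied from the data listing; the original source file is not shown,
# so building the tibble by hand is an assumption made for this sketch.
prepost <- tibble(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)

prepost
```

Entering the columns as plain numeric vectors gives `<dbl>` columns and a `<chr>` drug column, matching the printout.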
# A tibble: 20 × 3
drug before after
<chr> <dbl> <dbl>
1 a 5 20
2 a 10 23
3 a 12 30
4 a 9 25
5 a 23 34
6 a 21 40
7 a 14 27
8 a 18 38
9 a 6 24
10 a 13 31
11 b 7 19
12 b 12 26
13 b 27 33
14 b 24 35
15 b 18 30
16 b 22 31
17 b 26 34
18 b 21 28
19 b 14 23
20 b 9 22
# A tibble: 2 × 3
drug before_mean after_mean
<chr> <dbl> <dbl>
1 a 13.1 29.2
2 b 18 28.1
Mean “after” score slightly higher for treatment A (29.2 vs. 28.1).
Mean “before” score much higher for treatment B (18 vs. 13.1).
Greater improvement on treatment A (mean gain 16.1 vs. 10.1).
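The table of group means appears without its code. One way it could be produced, as a sketch: the `mean_gain` column and the name `means` are additions for illustration, not part of the original table.

```r
library(dplyr)
library(tibble)

# Same data as in the listing, entered by hand for a self-contained example
prepost <- tibble(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)

# Mean before and after score for each drug, plus the mean improvement
means <- prepost %>%
  group_by(drug) %>%
  summarize(
    before_mean = mean(before),
    after_mean  = mean(after),
    mean_gain   = mean(after - before) # extra column, not in the original table
  )

means
```

The raw gain ignores the fact that the two groups started from different before scores, which is what the analysis of covariance adjusts for.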
Testing for interaction
prepost.1 <- lm(after ~ before * drug, data = prepost)
anova(prepost.1)
Analysis of Variance Table
Response: after
            Df Sum Sq Mean Sq F value    Pr(>F)
before       1 430.92  430.92 62.6894  6.34e-07 ***
drug         1 115.31  115.31 16.7743 0.0008442 ***
before:drug  1  12.34   12.34  1.7948 0.1990662
Residuals   16 109.98    6.87
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(prepost.1)
Call:
lm(formula = after ~ before * drug, data = prepost)
Residuals:
Min 1Q Median 3Q Max
-4.8562 -1.7500 0.0696 1.8982 4.0207
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   16.4226     2.0674   7.944 6.08e-07 ***
before         0.9754     0.1446   6.747 4.69e-06 ***
drugb         -1.3139     3.1310  -0.420    0.680
before:drugb  -0.2536     0.1893  -1.340    0.199
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.622 on 16 degrees of freedom
Multiple R-squared: 0.8355, Adjusted R-squared: 0.8046
F-statistic: 27.09 on 3 and 16 DF, p-value: 1.655e-06
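With the interaction in the model, each drug gets its own line: drug a is the baseline, and the `drugb` and `before:drugb` coefficients are the changes in intercept and slope for drug b. A sketch of how the two lines could be pulled out of the fitted model (the names `b` and `lines_by_drug` are illustrative only):

```r
library(tibble)

# Same data as in the listing, entered by hand for a self-contained example
prepost <- tibble(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)

prepost.1 <- lm(after ~ before * drug, data = prepost)
b <- coef(prepost.1)

# Drug a uses the baseline coefficients; drug b adds its adjustments
lines_by_drug <- tibble(
  drug      = c("a", "b"),
  intercept = c(b[["(Intercept)"]], b[["(Intercept)"]] + b[["drugb"]]),
  slope     = c(b[["before"]], b[["before"]] + b[["before:drugb"]])
)

lines_by_drug
```

The slope for drug b comes out smaller than for drug a, but the test of `before:drugb` asks whether that difference is more than chance.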
Interaction not significant (P-value 0.199). Will remove later.
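Removing the interaction amounts to fitting parallel lines, one per drug, with a common slope. A minimal sketch of that step, assuming the name `prepost.2` for the smaller model (the original may name or fit it differently):

```r
library(tibble)

# Same data as in the listing, entered by hand for a self-contained example
prepost <- tibble(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)

prepost.1 <- lm(after ~ before * drug, data = prepost) # with interaction
prepost.2 <- lm(after ~ before + drug, data = prepost) # parallel lines (hypothetical name)

# F-test comparing the two models: this is the test of the interaction term
cmp <- anova(prepost.2, prepost.1)
cmp

summary(prepost.2)
```

Because the interaction has one degree of freedom, this comparison gives the same P-value as the `before:drug` line of the ANOVA table.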
Predictions
Set up values to predict for:
summary(prepost)
drug before after
Length:20 Min. : 5.00 Min. :19.00
Class :character 1st Qu.: 9.75 1st Qu.:23.75
Mode :character Median :14.00 Median :29.00
Mean :15.55 Mean :28.65
3rd Qu.:21.25 3rd Qu.:33.25
Max. :27.00 Max. :40.00
new <- datagrid(before = c(9.75, 14, 21.25), drug = c("a", "b"), model = prepost.1)
new
before drug rowid
1 9.75 a 1
2 9.75 b 2
3 14.00 a 3
4 14.00 b 4
5 21.25 a 5
6 21.25 b 6
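The three before scores in the grid are the quartiles from `summary(prepost)`. As a sketch of where this is heading, the same grid can be built and predicted for with base R alone; this assumes `expand.grid()` and `predict()` as stand-ins and is not necessarily how the predictions are obtained elsewhere in these notes. The names `q` and `grid` are illustrative.

```r
library(tibble)

# Same data as in the listing, entered by hand for a self-contained example
prepost <- tibble(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)

prepost.1 <- lm(after ~ before * drug, data = prepost)

# Quartiles of the before score: 9.75, 14, 21.25
q <- unname(quantile(prepost$before, c(0.25, 0.5, 0.75)))

# All combinations of quartile and drug (rows ordered by drug, then before)
grid <- expand.grid(before = q, drug = c("a", "b"), stringsAsFactors = FALSE)

# Predicted after score for each combination, from the interaction model
grid$fitted <- predict(prepost.1, newdata = grid)

grid
```

Comparing the two drugs at the same before score is the "adjusted for the covariate" comparison that the raw group means cannot give.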
Comments (on a scatterplot of after score against before score, coloured by drug)
As before score goes up, after score goes up.
Red points (drug A) generally above blue points (drug B), for comparable before score.
Suggests before score effect and drug effect.
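These comments describe a scatterplot that is not reproduced here. A sketch of a plot of that kind, assuming `ggplot2` defaults for the colours; the fitted line for each drug is an addition for illustration, and the name `g` is hypothetical.

```r
library(ggplot2)
library(tibble)

# Same data as in the listing, entered by hand for a self-contained example
prepost <- tibble(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)

# After score against before score, one colour per drug
g <- ggplot(prepost, aes(x = before, y = after, colour = drug)) +
  geom_point() +
  # separate least-squares line per drug (added here, not mentioned in the comments)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

g
```

With the default palette the first level (drug a) is drawn in a reddish colour and the second (drug b) in a bluish one.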