ANOVA: explanatory variables categorical (divide data into groups)
Traditionally, analysis of covariance (ANCOVA) has categorical \(x\)’s plus one numerical \(x\) (the “covariate”) to be adjusted for.
lm handles this too.
Simple example: two treatments (drugs), a and b, with before and after scores.
Does knowing before score and/or treatment help to predict after score?
Is after score different by treatment/before score?
Treatment, before, after:
a 5 20
a 10 23
a 12 30
a 9 25
a 23 34
a 21 40
a 14 27
a 18 38
a 6 24
a 13 31
b 7 19
b 12 26
b 27 33
b 24 35
b 18 30
b 22 31
b 26 34
b 21 28
b 14 23
b 9 22
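To reproduce the analysis, the table above can be entered in R as a data frame. This is a minimal sketch: the name prepost matches the data = prepost in the lm calls below, but typing the data in with data.frame() (rather than reading it from a file) is an assumption made so the example is self-contained.

```r
# Data from the table above, typed in directly (assumption: the original
# analysis may have read it from a file instead)
prepost <- data.frame(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
summary(prepost)
```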
the last of these for predictions.
Mean “after” score slightly higher for treatment A (29.2 vs. 28.1).
Mean “before” score much higher for treatment B (18.0 vs. 13.1).
Greater improvement on treatment A (mean after minus mean before: 16.1 vs. 10.1).
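The group comparisons above can be checked with a short computation. This sketch uses base R's aggregate(); the name means is hypothetical, and the data frame is rebuilt from the table so the block runs on its own.

```r
prepost <- data.frame(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
# Mean before and after score for each drug
means <- aggregate(cbind(before, after) ~ drug, data = prepost, FUN = mean)
# Mean improvement = mean after minus mean before, per drug
means$improvement <- means$after - means$before
means
```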
Call:
lm(formula = after ~ before * drug, data = prepost)
Residuals:
Min 1Q Median 3Q Max
-4.8562 -1.7500 0.0696 1.8982 4.0207
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.4226 2.0674 7.944 6.08e-07 ***
before 0.9754 0.1446 6.747 4.69e-06 ***
drugb -1.3139 3.1310 -0.420 0.680
before:drugb -0.2536 0.1893 -1.340 0.199
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.622 on 16 degrees of freedom
Multiple R-squared: 0.8355, Adjusted R-squared: 0.8046
F-statistic: 27.09 on 3 and 16 DF, p-value: 1.655e-06
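The output above comes from fitting the interaction model, as in the sketch below. The name prepost.1 is hypothetical (only prepost.2 is named in these notes). Because drug a is the baseline, the slope for drug B is the before slope plus the before:drugb interaction.

```r
prepost <- data.frame(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
# Interaction model: separate intercept and slope for each drug
prepost.1 <- lm(after ~ before * drug, data = prepost)  # hypothetical name
summary(prepost.1)
b <- coef(prepost.1)
slope_a <- b[["before"]]                         # baseline drug a
slope_b <- b[["before"]] + b[["before:drugb"]]   # drug b = baseline + interaction
c(a = slope_a, b = slope_b)
```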
Set up values to predict for:
     drug               before          after
 Length:20          Min.   : 5.00   Min.   :19.00
 Class :character   1st Qu.: 9.75   1st Qu.:23.75
 Mode  :character   Median :14.00   Median :29.00
                    Mean   :15.55   Mean   :28.65
                    3rd Qu.:21.25   3rd Qu.:33.25
                    Max.   :27.00   Max.   :40.00
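One way to set up the values to predict for, sketched here as an assumption rather than the notes' own code: cross the quartiles of before from the summary above with both drugs, then predict from the interaction model. The names new, q, gap and prepost.1 are hypothetical.

```r
prepost <- data.frame(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
prepost.1 <- lm(after ~ before * drug, data = prepost)  # hypothetical name
# Quartiles of before (default quantile type, as summary() uses)
q <- unname(quantile(prepost$before, c(0.25, 0.5, 0.75)))
# Every combination of quartile and drug (before varies fastest)
new <- expand.grid(before = q, drug = c("a", "b"))
new$pred <- predict(prepost.1, newdata = new)
new
# Drug A minus drug B at each before value; not constant, because the
# interaction model gives the two drugs different slopes
gap <- new$pred[new$drug == "a"] - new$pred[new$drug == "b"]
gap
```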
Lines almost parallel, but not quite: fitted slope 0.9754 for drug A, 0.9754 - 0.2536 = 0.7218 for drug B.
Call:
lm(formula = after ~ before + drug, data = prepost)
Residuals:
Min 1Q Median 3Q Max
-3.6348 -2.5099 -0.2038 1.8871 4.7453
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.3600 1.5115 12.147 8.35e-10 ***
before 0.8275 0.0955 8.665 1.21e-07 ***
drugb -5.1547 1.2876 -4.003 0.000921 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.682 on 17 degrees of freedom
Multiple R-squared: 0.817, Adjusted R-squared: 0.7955
F-statistic: 37.96 on 2 and 17 DF, p-value: 5.372e-07
Take out the non-significant interaction (P = 0.199).
Then before and drug are both strongly significant.
Do predictions again and plot them.
This time the lines are exactly parallel. No-interaction model forces them to have the same slope.
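Redoing the predictions with the no-interaction model, as sketched below, shows the parallel lines numerically: the gap between drugs is the same at every before value and equals minus the drugb coefficient. The grid of before quartiles and the names new, q and gap are assumptions for illustration.

```r
prepost <- data.frame(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
# No-interaction model: common slope, separate intercepts
prepost.2 <- lm(after ~ before + drug, data = prepost)
q <- unname(quantile(prepost$before, c(0.25, 0.5, 0.75)))
new <- expand.grid(before = q, drug = c("a", "b"))  # hypothetical name
new$pred <- predict(prepost.2, newdata = new)
# Drug A minus drug B: identical at every before value
gap <- new$pred[new$drug == "a"] - new$pred[new$drug == "b"]
gap
```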
anova(prepost.2) tests for a significant effect of before score and of drug (sequentially: before first, then drug after adjusting for before), but doesn’t help with interpretation.
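A sketch contrasting anova() with drop1(), which is an addition here rather than something these notes use. anova() on a single lm fit gives sequential tests, so before is tested ignoring drug, and drug is tested after allowing for before. drop1(..., test = "F") instead tests dropping each term from the full model; for one-degree-of-freedom terms these F-tests agree with the t-tests that summary() reports. The names seq_tab and drop_tab are hypothetical.

```r
prepost <- data.frame(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
prepost.2 <- lm(after ~ before + drug, data = prepost)
# Sequential tests: each term adjusted only for terms listed before it
seq_tab <- anova(prepost.2)
# Each term tested by dropping it from the full model
drop_tab <- drop1(prepost.2, test = "F")
seq_tab
drop_tab
```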
summary(prepost.2) views the model as a regression with slopes:
before is an ordinary numerical variable; drug is categorical.
lm uses the first category, drug a, as the baseline.
Intercept is the predicted after score for a before score of 0 on drug A (an extrapolation, since observed before scores run from 5 to 27).
The before slope is the predicted change in after score when before score increases by 1 (the usual slope).
The slope for drugb is the change in predicted after score for being on drug B rather than drug A (about 5.15 points lower). It is the same for any before score (no interaction).
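Writing out the fitted equations from the prepost.2 coefficients makes the parallel lines explicit. Because each drug gets its own intercept, least squares makes each line pass through that drug's mean before and mean after score, so the drugb coefficient is the raw difference in mean after score adjusted for the difference in mean before score. Here \(\bar{x}\) is a group's mean before score and \(\bar{y}\) its mean after score, computed from the data table:

\[
\begin{aligned}
\text{drug A:}\quad \widehat{\text{after}} &= 18.3600 + 0.8275\,\text{before}\\
\text{drug B:}\quad \widehat{\text{after}} &= (18.3600 - 5.1547) + 0.8275\,\text{before} = 13.2053 + 0.8275\,\text{before}\\[4pt]
\text{drugb coefficient} &= (\bar{y}_B - \bar{y}_A) - 0.8275\,(\bar{x}_B - \bar{x}_A)\\
&= (28.1 - 29.2) - 0.8275\,(18.0 - 13.1) \approx -1.1 - 4.05 \approx -5.15
\end{aligned}
\]

So drug B's small raw deficit in mean after score (1.1 points) becomes a much larger adjusted deficit once its higher before scores are taken into account.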
ANCOVA model: fits different regression line for each group, predicting response from covariate.
ANCOVA model with interaction between factor and covariate allows different slopes for each line.
Sometimes those lines can cross over!
If interaction not significant, take out. Lines then parallel.
With parallel lines, groups have consistent effect regardless of value of covariate.
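The “take out the interaction if not significant” step can also be done by comparing the two nested models directly, as sketched below. With a single interaction term, this F-test agrees with the t-test for before:drugb in the interaction-model output (P = 0.199), so the parallel-lines model is kept. prepost.1 and cmp are hypothetical names.

```r
prepost <- data.frame(
  drug   = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after  = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
             19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)
prepost.1 <- lm(after ~ before * drug, data = prepost)  # separate slopes (hypothetical name)
prepost.2 <- lm(after ~ before + drug, data = prepost)  # parallel lines
# F-test: do separate slopes improve significantly on parallel lines?
cmp <- anova(prepost.2, prepost.1)
cmp
```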
Comments
As before score goes up, after score goes up.
Red points (drug A) generally above blue points (drug B), for comparable before score.
Suggests before score effect and drug effect.