Standard ANOVA has just one response variable.
What if you have more than one response?
Try an ANOVA on each response separately.
But this might miss some kinds of interesting dependence between the responses that distinguish the groups.
Measure yield and seed weight of plants grown under 2 conditions: low and high amounts of fertilizer.
Data (fertilizer, yield, seed weight):
Yields overlap for fertilizer groups.
Weights overlap for fertilizer groups.
ANOVA for yield:
           Df Sum Sq Mean Sq F value Pr(>F)
fertilizer  1   12.5  12.500   2.143  0.194
Residuals   6   35.0   5.833
ANOVA for weight:
           Df Sum Sq Mean Sq F value Pr(>F)
fertilizer  1  3.125   3.125   1.471  0.271
Residuals   6 12.750   2.125
Neither response depends significantly on fertilizer. But…
response <- with(hilo, cbind(yield, weight))
hilo.1 <- manova(response ~ fertilizer, data = hilo)
summary(hilo.1)
           Df  Pillai approx F num Df den Df  Pr(>F)
fertilizer  1 0.80154   10.097      2      5 0.01755 *
Residuals   6
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Create new response variable by gluing together columns of responses, using cbind.
Use manova with new response; looks like lm otherwise.
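The same two steps can be tried on any data frame with several numeric responses. Below is a minimal sketch on R's built-in iris data, which is an assumption made only so the example runs on its own; it is not the fertilizer data. The test argument of summary selects a statistic other than the default Pillai.

```r
# Built-in iris data stand in for the slides' data (assumption for illustration).
# Step 1: glue two response columns together into one matrix.
response <- with(iris, cbind(Sepal.Length, Petal.Length))

# Step 2: manova takes a model formula just as lm does,
# with the response matrix on the left-hand side.
iris.1 <- manova(response ~ Species, data = iris)

# Pillai's trace is the default test; others can be requested by name.
s_pillai <- summary(iris.1)
s_wilks <- summary(iris.1, test = "Wilks")
s_pillai
s_wilks
```

The fitted object behaves like any other model object, so summary is the only new thing to learn.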
With more than 2 responses, cannot draw graph. What then?
If MANOVA test significant, cannot use Tukey. What then?
Use discriminant analysis (of which more later).
Using Manova from package car:
Type II MANOVA Tests:
Sum of squares and products for error:
yield weight
yield 35 -18.00
weight -18 12.75
------------------------------------------
Term: fertilizer
Sum of squares and products for the hypothesis:
yield weight
yield 12.50 6.250
weight 6.25 3.125
Multivariate Tests: fertilizer
Df test stat approx F num Df den Df Pr(>F)
Pillai 1 0.801542 10.09714 2 5 0.017546 *
Wilks 1 0.198458 10.09714 2 5 0.017546 *
Hotelling-Lawley 1 4.038855 10.09714 2 5 0.017546 *
Roy 1 4.038855 10.09714 2 5 0.017546 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Same result as small-m manova.
Manova will also do repeated measures, coming up later.
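The four multivariate statistics for fertilizer can be reproduced from the two sums-of-squares-and-products matrices printed in the output. The sketch below copies those matrices by hand; the names E, H and lambda are illustrative choices, not anything car creates. It also shows why all four tests give the same F here: with 1 hypothesis df there is only one nonzero eigenvalue.

```r
# Error (E) and hypothesis (H) SSCP matrices, copied from the Manova output.
vars <- c("yield", "weight")
E <- matrix(c(35, -18, -18, 12.75), nrow = 2, dimnames = list(vars, vars))
H <- matrix(c(12.5, 6.25, 6.25, 3.125), nrow = 2, dimnames = list(vars, vars))

# All four statistics are functions of the eigenvalues of E^{-1} H.
lambda <- Re(eigen(solve(E) %*% H, only.values = TRUE)$values)

pillai    <- sum(lambda / (1 + lambda))
wilks     <- prod(1 / (1 + lambda))
hotelling <- sum(lambda)
roy       <- max(lambda)

# One nonzero eigenvalue, so a single F statistic serves all four tests:
# largest eigenvalue * den Df / num Df, using the 2 and 5 df printed above.
f_stat  <- roy * 5 / 2
p_value <- pf(f_stat, 2, 5, lower.tail = FALSE)

round(c(Pillai = pillai, Wilks = wilks, HL = hotelling, Roy = roy,
        F = f_stat, p = p_value), 6)
```

With more than one hypothesis df the eigenvalues no longer collapse to a single number, and the four tests can disagree.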
For normal quantile plots, need “extra-long” with all the data values in one column:
There are only four observations per combination of response variable and treatment group, so the graphs are not very informative (over):
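A sketch of the reshaping step, assuming the tidyr and ggplot2 packages and using R's built-in iris data in place of the slides' data. The column names xname and xvalue are hypothetical choices for the new long-format columns.

```r
library(tidyr)
library(ggplot2)

# Built-in iris data stand in for the slides' data (assumption for illustration).
# "Extra-long" format: every measured value in one column (xvalue),
# with a second column (xname) recording which response it came from.
long <- pivot_longer(iris, -Species, names_to = "xname", values_to = "xvalue")

# One normal quantile plot per response-group combination.
p <- ggplot(long, aes(sample = xvalue)) +
  stat_qq() +
  stat_qq_line() +
  facet_grid(xname ~ Species, scales = "free")
p
```

Free scales matter here because the responses are measured on different scales.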
Box M-test for equal spread (package MVTests loaded first):
Box's M Test
Chi-Squared Value = 1.002964 , df = 3 and p-value: 0.801
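To see where the chi-squared value and its df come from, here is Box's M statistic written out in base R. This is a sketch of the usual chi-squared approximation, not the MVTests code; the function name box_m is hypothetical, and the illustration uses built-in iris data rather than the slides' data. The df formula p(p+1)(g-1)/2 gives 3 for two responses and two groups, matching the output above.

```r
# Box's M test in base R (a sketch; box_m is a hypothetical name).
box_m <- function(x, group) {
  x <- as.matrix(x)
  group <- factor(group)
  p <- ncol(x)                     # number of responses
  g <- nlevels(group)              # number of groups
  n <- as.vector(table(group))     # group sizes, in factor-level order
  N <- sum(n)

  # Covariance matrix within each group, and the pooled version.
  covs <- lapply(split(as.data.frame(x), group), cov)
  pooled <- Reduce(`+`, Map(function(S, ni) (ni - 1) * S, covs, n)) / (N - g)

  # M compares the log-determinant of the pooled matrix with the separate ones.
  log_dets <- sapply(covs, function(S) log(det(S)))
  M <- (N - g) * log(det(pooled)) - sum((n - 1) * log_dets)

  # Chi-squared approximation with its small-sample correction factor.
  corr <- (sum(1 / (n - 1)) - 1 / (N - g)) *
    (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))
  chisq <- M * (1 - corr)
  df <- p * (p + 1) * (g - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Illustration on built-in iris data: 4 responses, 3 groups.
res <- box_m(iris[, 1:4], iris$Species)
res
```

A large P-value, as for the fertilizer data, means no evidence that the groups' spreads differ.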
Three different varieties of peanuts (mysteriously, 5, 6 and 8) planted in two different locations.
Three response variables: y, smk and w.
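library(car)  # provides Manova
# Glue the three responses together, as before.
response <- with(peanuts, cbind(y, smk, w))
peanuts.1 <- lm(response ~ location * variety, data = peanuts)
peanuts.2 <- Manova(peanuts.1)
summary(peanuts.2)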
Type II MANOVA Tests:
Sum of squares and products for error:
y smk w
y 104.205 49.365 76.480
smk 49.365 352.105 121.995
w 76.480 121.995 94.835
------------------------------------------
Term: location
Sum of squares and products for the hypothesis:
y smk w
y 0.7008333 -10.6575 7.129167
smk -10.6575000 162.0675 -108.412500
w 7.1291667 -108.4125 72.520833
Multivariate Tests: location
Df test stat approx F num Df den Df Pr(>F)
Pillai 1 0.893484 11.18432 3 4 0.020502 *
Wilks 1 0.106516 11.18432 3 4 0.020502 *
Hotelling-Lawley 1 8.388243 11.18432 3 4 0.020502 *
Roy 1 8.388243 11.18432 3 4 0.020502 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------
Term: variety
Sum of squares and products for the hypothesis:
y smk w
y 196.1150 365.1825 42.6275
smk 365.1825 1089.0150 414.6550
w 42.6275 414.6550 284.1017
Multivariate Tests: variety
Df test stat approx F num Df den Df Pr(>F)
Pillai 2 1.709109 9.792388 6 10 0.0010562 **
Wilks 2 0.012444 10.619086 6 8 0.0019275 **
Hotelling-Lawley 2 21.375675 10.687838 6 6 0.0054869 **
Roy 2 18.187611 30.312685 3 5 0.0012395 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------
Term: location:variety
Sum of squares and products for the hypothesis:
y smk w
y 205.1017 363.6675 107.78583
smk 363.6675 780.6950 254.22000
w 107.7858 254.2200 85.95167
Multivariate Tests: location:variety
Df test stat approx F num Df den Df Pr(>F)
Pillai 2 1.290861 3.033867 6 10 0.058708 .
Wilks 2 0.074300 3.558197 6 8 0.050794 .
Hotelling-Lawley 2 7.544290 3.772145 6 6 0.065517 .
Roy 2 6.824094 11.373490 3 5 0.011340 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interaction not quite significant, but main effects are.
Combined response variable (y,smk,w) definitely depends on location and on variety.
Weak dependence of (y,smk,w) on the location-variety combination.
Understanding that dependence beyond our scope right now.
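For the interaction, Roy's test is significant while the other three are not. A sketch of why, using the error and interaction matrices copied by hand from the output (the names E, H_int and lambda are illustrative): there are now two nonzero eigenvalues, and Roy's statistic looks only at the largest one.

```r
vars <- c("y", "smk", "w")

# Error SSCP matrix, copied from the Manova output.
E <- matrix(c(104.205,  49.365,  76.480,
               49.365, 352.105, 121.995,
               76.480, 121.995,  94.835),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Hypothesis SSCP matrix for location:variety, copied as printed
# (the printout rounds, so it is very slightly asymmetric).
H_int <- matrix(c(205.1017, 363.6675, 107.78583,
                  363.6675, 780.6950, 254.22000,
                  107.7858, 254.2200,  85.95167),
                nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Eigenvalues of E^{-1} H, largest first; the third is zero up to rounding.
lambda <- sort(Re(eigen(solve(E) %*% H_int, only.values = TRUE)$values),
               decreasing = TRUE)

stats <- c(Pillai = sum(lambda / (1 + lambda)),
           Wilks = prod(1 / (1 + lambda)),
           HL = sum(lambda),
           Roy = max(lambda))
round(lambda, 4)
round(stats, 4)
```

Pillai, Wilks and Hotelling-Lawley combine both eigenvalues, which is why they tell a more cautious story than Roy here.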
this time there are only six observations per location and four per variety, so normality is still difficult to be confident about
y at location 1 seems to be the worst for normality (long tails / outliers), and maybe y at location 2 is skewed left, but the others are not bad
there is some evidence of unequal spread (slopes of lines), but is it bad enough to worry about? (Box M-test, over).
By location (2 groups, 3 responses, so df = 6):
Box's M Test
Chi-Squared Value = 12.47797 , df = 6 and p-value: 0.0521
By variety (3 groups, 3 responses, so df = 12):
Box's M Test
Chi-Squared Value = 10.56304 , df = 12 and p-value: 0.567
Comments
d by line.
geom_line inherits colour from aes in ggplot.
d has no fertilizer (previous colour), so have to unset.
High-fertilizer plants have both yield and weight high.
True even though no sig difference in yield or weight individually.
Drew line separating highs from lows on plot.
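A sketch of the colour-unsetting idea, assuming ggplot2. Both data frames are hypothetical toy values, not the slides' data: plants plays the role of the plotted data, and d holds the two endpoints of a separating line and has no fertilizer column.

```r
library(ggplot2)

# Hypothetical toy data, NOT the slides' data.
plants <- data.frame(
  fertilizer = rep(c("low", "high"), each = 3),
  yield = c(30, 32, 34, 33, 35, 37),
  weight = c(11, 10, 9, 14, 13, 12)
)

# Hypothetical endpoints of a line to overlay; note: no fertilizer column.
d <- data.frame(yield = c(29, 38), weight = c(13, 10))

# geom_line would inherit colour = fertilizer from ggplot's aes and fail to
# find it in d, so the inherited colour is unset with colour = NULL.
p <- ggplot(plants, aes(x = yield, y = weight, colour = fertilizer)) +
  geom_point() +
  geom_line(data = d, aes(colour = NULL))
built <- ggplot_build(p)
p
```

The x and y aesthetics are still inherited, which works because d uses the same column names as the plotted data.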