Standard ANOVA has just one response variable.
What if you have more than one response?
Try an ANOVA on each response separately.
But this might miss some kinds of interesting dependence between the responses that distinguish the groups.
Measure yield and seed weight of plants grown under 2 conditions: low and high amounts of fertilizer.
Data (fertilizer, yield, seed weight):
Yields overlap for fertilizer groups.
Weights overlap for fertilizer groups.
ANOVA for yield:
           Df Sum Sq Mean Sq F value Pr(>F)
fertilizer  1   12.5  12.500   2.143  0.194
Residuals   6   35.0   5.833
ANOVA for weight:
           Df Sum Sq Mean Sq F value Pr(>F)
fertilizer  1  3.125   3.125   1.471  0.271
Residuals   6 12.750   2.125
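As a check on the two tables: each F value is the fertilizer mean square divided by the residual mean square, and the P-value is the upper tail of an F distribution on (1, 6) degrees of freedom. A small base-R sketch; the helper name anova_f is made up for illustration:

```r
# Rebuild the F value and P-value of a one-way ANOVA table
# from its sums of squares and degrees of freedom.
anova_f <- function(ss_effect, df_effect, ss_resid, df_resid) {
  ms_effect <- ss_effect / df_effect  # mean square for the effect (fertilizer)
  ms_resid <- ss_resid / df_resid     # residual mean square
  f <- ms_effect / ms_resid
  p <- pf(f, df_effect, df_resid, lower.tail = FALSE)  # upper-tail probability
  c(F = f, p = p)
}

# Sums of squares and df copied from the yield and weight tables.
yield_test <- anova_f(12.5, 1, 35.0, 6)
weight_test <- anova_f(3.125, 1, 12.75, 6)
round(yield_test, 3)   # F about 2.143, p about 0.194
round(weight_test, 3)  # F about 1.471, p about 0.271
```

Both P-values are well above 0.05, which is what the tables say.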
Neither response depends significantly on fertilizer. But…
response <- with(hilo, cbind(yield, weight))
hilo.1 <- manova(response ~ fertilizer, data = hilo)
summary(hilo.1)
           Df  Pillai approx F num Df den Df  Pr(>F)
fertilizer  1 0.80154   10.097      2      5 0.01755 *
Residuals   6
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Create a new response variable by gluing together the columns of responses, using cbind.
Use manova with the new response; otherwise it looks like lm.
With more than 2 responses, we cannot draw a graph. What then?
If the MANOVA test is significant, we cannot use Tukey. What then?
Use discriminant analysis (of which more later).
Using Manova from package car:
Type II MANOVA Tests:
Sum of squares and products for error:
       yield  weight
yield     35  -18.00
weight   -18   12.75
------------------------------------------
Term: fertilizer
Sum of squares and products for the hypothesis:
       yield  weight
yield  12.50   6.250
weight  6.25   3.125
Multivariate Tests: fertilizer
                 Df test stat approx F num Df den Df   Pr(>F)
Pillai            1  0.801542 10.09714      2      5 0.017546 *
Wilks             1  0.198458 10.09714      2      5 0.017546 *
Hotelling-Lawley  1  4.038855 10.09714      2      5 0.017546 *
Roy               1  4.038855 10.09714      2      5 0.017546 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Same result as small-m manova.
Manova will also do repeated measures, coming up later.
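All four multivariate statistics in the Manova output come from the eigenvalues of the inverse of the error matrix E times the hypothesis matrix H. Because fertilizer has only one degree of freedom, H has rank 1, so there is a single nonzero eigenvalue and all four tests give the same approximate F. A base-R sketch that recomputes them from the two SSCP matrices in the output (the object names are mine, not from the original):

```r
# Error (E) and hypothesis (H) sums-of-squares-and-products matrices,
# copied from the Manova output for the fertilizer data.
E <- matrix(c( 35,   -18,
              -18, 12.75), nrow = 2, byrow = TRUE,
            dimnames = list(c("yield", "weight"), c("yield", "weight")))
H <- matrix(c(12.50, 6.250,
               6.25, 3.125), nrow = 2, byrow = TRUE,
            dimnames = dimnames(E))

# Eigenvalues of E^{-1} H (real here; Re() just drops any zero imaginary part).
lambda <- Re(eigen(solve(E) %*% H)$values)

pillai    <- sum(lambda / (1 + lambda))  # trace of H (H + E)^{-1}
wilks     <- prod(1 / (1 + lambda))      # equals det(E) / det(H + E)
hotelling <- sum(lambda)                 # trace of E^{-1} H
roy       <- max(lambda)                 # largest eigenvalue, as car reports it

round(c(Pillai = pillai, Wilks = wilks,
        "Hotelling-Lawley" = hotelling, Roy = roy), 6)
```

With one nonzero eigenvalue of about 4.0389, Pillai is that value divided by one more than itself and Wilks is one divided by one more than it, which is why the two add to 1 here.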
For normal quantile plots, we need the data “extra-long”, with all the data values in one column.
There are only four observations per response-variable/treatment-group combination, so the graphs are not very informative.
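A base-R sketch of the extra-long layout using stack() (tidyr's pivot_longer is the tidyverse way to do the same thing). The helper make_extra_long and the toy numbers are hypothetical, not the fertilizer data:

```r
# Reshape a data frame with one grouping column and several response
# columns into "extra-long" form: one row per observed value.
make_extra_long <- function(df, group, responses) {
  stacked <- stack(df[responses])  # `values` and `ind`, one response after another
  data.frame(
    group = rep(df[[group]], times = length(responses)),
    response = as.character(stacked$ind),
    value = stacked$values
  )
}

# Hypothetical toy data with the same column names as the fertilizer example.
toy <- data.frame(fertilizer = c("low", "low", "high", "high"),
                  yield = c(30, 32, 35, 36),
                  weight = c(11, 12, 13, 14))
long <- make_extra_long(toy, "fertilizer", c("yield", "weight"))

# One normal quantile plot per response / fertilizer combination.
pieces <- split(long$value, list(long$response, long$group))
par(mfrow = c(2, 2))
for (nm in names(pieces)) {
  qqnorm(pieces[[nm]], main = nm)
  qqline(pieces[[nm]])
}
```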
Box's M test for equal covariance matrices, using BoxM from package MVTests, which must be loaded first:
library(MVTests)
# hilo %>% select(yield, weight) -> numeric_values
summary(BoxM(response, hilo$fertilizer))
Box's M Test
Chi-Squared Value = 1.002964 , df = 3 and p-value: 0.801
Three different varieties of peanuts (mysteriously, 5, 6 and 8) planted in two different locations.
Three response variables: y, smk and w.
Analysis with manova:
                 Df  Pillai approx F num Df den Df   Pr(>F)
location          1 0.89348  11.1843      3      4 0.020502 *
variety           2 1.70911   9.7924      6     10 0.001056 **
location:variety  2 1.29086   3.0339      6     10 0.058708 .
Residuals         6
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interaction not quite significant, but main effects are.
Combined response variable (y,smk,w) definitely depends on location and on variety.
Weak dependence of (y,smk,w) on the location-variety combination.
Understanding that dependence is beyond our scope right now.
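The approx F column comes from converting Pillai's trace with a standard formula; to my knowledge this is the same conversion R's summary.manova uses for the Pillai test. The function pillai_f is a hypothetical helper, and the Pillai values fed in are the rounded ones printed in the output, so the F values agree only to about three decimals:

```r
# Convert Pillai's trace V to an approximate F statistic.
# p = number of responses, df_h = df for the term, df_e = residual df.
pillai_f <- function(V, p, df_h, df_e) {
  s <- min(p, df_h)
  m <- (abs(p - df_h) - 1) / 2
  n <- (df_e - p - 1) / 2
  num_df <- s * (2 * m + s + 1)
  den_df <- s * (2 * n + s + 1)
  f <- (2 * n + s + 1) / (2 * m + s + 1) * V / (s - V)
  c(F = f, num_df = num_df, den_df = den_df,
    p = pf(f, num_df, den_df, lower.tail = FALSE))
}

# Peanut data: three responses (y, smk, w) and 6 residual df.
peanut_rows <- rbind(
  location = pillai_f(0.89348, p = 3, df_h = 1, df_e = 6),
  variety = pillai_f(1.70911, p = 3, df_h = 2, df_e = 6),
  "location:variety" = pillai_f(1.29086, p = 3, df_h = 2, df_e = 6)
)
round(peanut_rows, 4)

# Same conversion for the fertilizer data: two responses, 6 residual df.
round(pillai_f(0.80154, p = 2, df_h = 1, df_e = 6), 4)
```

Note how the numerator df grow with the number of responses and the term's df (3 for location, 6 for variety), while the denominator df shrink as responses are added.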
This time there are only six observations per location and four per variety, so it is still difficult to be confident about normality.
y at location 1 seems to be the worst for normality (long tails / outliers), and maybe y at location 2 is skewed left, but the others are not bad.
There is some evidence of unequal spread (slopes of lines), but is it bad enough to worry about? (Box's M test below.)
For location:
Box's M Test
Chi-Squared Value = 12.47797 , df = 6 and p-value: 0.0521
For variety:
Box's M Test
Chi-Squared Value = 10.56304 , df = 12 and p-value: 0.567
Neither of these P-values is low enough to worry about. (Remember, the P-value here has to be really small to indicate a problem.)
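The degrees of freedom in these Box's M outputs follow the usual chi-squared approximation, p(p + 1)(g - 1)/2 for p responses and g groups, and the P-value is an upper chi-squared tail. A short base-R check using the printed chi-squared values (the helper names are hypothetical):

```r
# Degrees of freedom for the chi-squared approximation to Box's M,
# with p responses and g groups.
box_m_df <- function(p, g) p * (p + 1) * (g - 1) / 2

# Upper-tail P-value for a given chi-squared value.
box_m_p <- function(chisq, p, g) {
  pchisq(chisq, df = box_m_df(p, g), lower.tail = FALSE)
}

box_m_p(1.002964, p = 2, g = 2)  # fertilizer groups, df = 3
box_m_p(12.47797, p = 3, g = 2)  # two locations, df = 6
box_m_p(10.56304, p = 3, g = 3)  # three varieties, df = 12
```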
Box’s M test does not work well (and can fail to work at all) if the sample sizes are too small.
A Box's M test on the location-variety combinations still produces output, except that the result makes no sense. This is because there are only two observations per location-variety combination, which is not enough to estimate each combination's covariance matrix, and so the test no longer works.
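Box's M compares the log-determinants of the within-group covariance matrices. With only two observations in a group, that group's sample covariance matrix has rank at most 1, so with three responses its determinant is zero and its logarithm is minus infinity. A base-R illustration with made-up numbers for a single cell:

```r
# Two made-up observations on three responses, standing in for one
# location-variety cell (these are not the peanut data).
cell <- matrix(c(1, 3, 4,
                 2, 5, 7), nrow = 2, byrow = TRUE)

S <- cov(cell)                      # 3 x 3 sample covariance matrix for this cell
qr(S)$rank                          # 1: two points only determine one direction
eigen(S, symmetric = TRUE)$values   # one nonzero eigenvalue (7); the others are 0 up to rounding
det(S)                              # zero, up to rounding error
```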
Comments
The points in d are joined by a line.
geom_line inherits colour from aes in ggplot.
d has no fertilizer (the previous colour), so the colour has to be unset.
High-fertilizer plants have both yield and weight high.
True even though there is no significant difference in yield or weight individually.
Drew a line separating highs from lows on the plot.