Worksheet 7
Packages
library(tidyverse)
library(marginaleffects)
library(car)
Hair colour and pain tolerance
In lecture, we sped through a review example of one-way ANOVA, in which we investigated the effect of hair colour on pain tolerance. The data were in http://ritsokiguess.site/datafiles/hairpain.txt, with two columns hair (hair colour) and pain (pain tolerance, with a higher value indicating that the individual can withstand more pain). The hair colours were brown and blond, each subdivided into light and dark. The data values are separated by single spaces.
In this problem, we take a different approach to an analysis of the same data. In particular, we focus on three particular comparisons of pain tolerance:
- light brown hair vs. dark brown hair
- light blond hair vs. dark blond hair
- the average of brown hair vs the average of blond hair.
These are comparisons we have chosen to make before looking at the data (because, we suppose, they are particular research questions of interest to us).
- Read in and display (some of) the data.
The last sentence of the first paragraph indicates that we do not have a .csv (for a change), but rather something that needs to be read in using read_delim:
my_url <- "http://ritsokiguess.site/datafiles/hairpain.txt"
hairpain <- read_delim(my_url, " ")
Rows: 19 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: " "
chr (1): hair
dbl (1): pain
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hairpain
- For the analysis we are about to do, the categorical variable needs to be a factor. Create and save a dataframe for which this is the case.
The categorical variable is hair, so (overwriting the original dataframe, which appears to be safe):
hairpain %>%
  mutate(hair = factor(hair)) -> hairpain
hairpain
The fctr (or fct) at the top of the new hair column indicates that we have indeed made a factor column. (The contrasts stuff later needs a factor, hence making one now.)
- Find out how many observations you have for each hair type. (This has the side-effect of telling you which order the hair types are in, as far as R is concerned.)
This is most easily done with count:
hairpain %>% count(hair)
which reveals that the hair types are actually in alphabetical order. This is not exactly a surprise, but I figured it would make your life easier below to know exactly which hair type is first, second, etc. without you otherwise having to work it out.
- Set up contrasts to represent your comparisons of interest.
Something along these lines:
c_brown <- c(0, 1, 0, -1)
c_blond <- c(1, 0, -1, 0)
c_brown_blond <- c(-1/2, 1/2, -1/2, 1/2)
The names of the contrasts are chosen by you, but it is helpful to use names that say which things you are contrasting.
Pick out the two shades of brown (the second and fourth ones in question 3 with zeros elsewhere), the two shades of blond (the first and third ones in question 3 with zeros elsewhere), and the average of the two brown positive and of the two blond negative (for example; see below).
You have some alternatives:
- you can switch all the positives and all the negatives in any contrast (for example, the first one could be c(0, -1, 0, 1)), because it is still “the same contrast”, that is, it makes the same comparison, just the other way around.
- you can also multiply or divide all the numbers in a contrast by the same thing (for example, the last one could be c(-1, 1, -1, 1)). This also makes “the same contrast”. My take when I am comparing averages is to divide by how many things the average is based on, in this case 2 (two types of brown hair and two types of blond hair). If, say, one of them had been two types and the other three, it would have been necessary to divide by the right thing, for example c(1/2, 1/2, -1/3, -1/3, -1/3), to make the coefficients of the contrast add up to zero (see below).
The four coefficients in each of your contrasts need to add up to zero (this is actually the definition of a contrast), which you can check as follows:
sum(c_brown)
[1] 0
sum(c_blond)
[1] 0
sum(c_brown_blond)
[1] 0
If they do, and the coefficients are of the right sign (relative to each other) and the right relative size, you are good.
- Verify that your contrasts are orthogonal.
Let’s recall our contrasts:
c_brown
[1] 0 1 0 -1
c_blond
[1] 1 0 -1 0
c_brown_blond
[1] -0.5 0.5 -0.5 0.5
To verify that two contrasts are orthogonal, say the second and third here, multiply the first coefficient in one by the first coefficient in the other, then repeat for the other coefficients, then add up the results, checking that your final result is zero. Here, that gives
\[ (1)(-0.5) + (0)(0.5) + (-1)(-0.5) + (0)(0.5) = (-0.5) + 0 + 0.5 + 0 = 0.\]
Check. Once you are happy that you understand the process, get R to do the heavy lifting the rest of the way:
sum(c_brown * c_blond)
[1] 0
sum(c_brown * c_brown_blond)
[1] 0
sum(c_blond * c_brown_blond)
[1] 0
All three pairs of contrasts are indeed orthogonal.
Having orthogonal contrasts means that the tests we are going to do below are independent of each other, so that the result of one does not influence any of the others.
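If you would rather have R check all three pairs at once, here is a small sketch (not asked for in the question): crossprod of the contrasts bound into a matrix, which we are about to build in the next part anyway, computes every pairwise sum of products in one go, so all the off-diagonal entries should be zero.
# A sketch: off-diagonal zeros mean every pair of contrasts is orthogonal;
# the diagonal holds each contrast's sum of squared coefficients.
crossprod(cbind(c_brown, c_blond, c_brown_blond))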
- Set up your contrasts as a matrix, and set this matrix as the contrasts of your categorical variable. Then run the ANOVA as a regression.
That means this:
m <- cbind(c_brown, c_blond, c_brown_blond)
contrasts(hairpain$hair) <- m
cbind makes the matrix, and the thing inside contrasts should be your dataframe and categorical variable column separated by a $.
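If you want to see what got attached, a quick sketch: display the contrast matrix now stored on the hair factor. The rows are the four hair types in alphabetical order, and the columns carry the names you chose.
# A sketch: the contrasts attached to the factor; rows are the four hair
# types (alphabetical order), columns are the three named contrasts.
contrasts(hairpain$hair)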
Then, run what would be an ANOVA as a regression using lm, and display the summary:
hairpain.1 <- lm(pain ~ hair, data = hairpain)
summary(hairpain.1)
Call:
lm(formula = pain ~ hair, data = hairpain)
Residuals:
Min 1Q Median 3Q Max
-11.20 -5.45 -0.50 4.30 13.60
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 47.575 1.884 25.257 1.05e-13 ***
hairc_brown -2.550 2.741 -0.930 0.36695
hairc_blond -4.000 2.584 -1.548 0.14251
hairc_brown_blond -15.250 3.767 -4.048 0.00105 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.172 on 15 degrees of freedom
Multiple R-squared: 0.576, Adjusted R-squared: 0.4912
F-statistic: 6.791 on 3 and 15 DF, p-value: 0.004114
Interpretation is coming shortly, but you might note that the three rows are no longer the intercept and slopes you would have expected, but are the three contrasts you set up.
Extra: There were four groups (hair types), so in a regular ANOVA there would be three df for hair types, and your \(F\)-test would have 3 and 15 df.[1] When you analyze using contrasts, each contrast has 1 df and hence can be tested with a \(t\)-test, as here.[2] As a result, you can have as many contrasts as you have (numerator) df, here 3 because there are 4 groups. You have some freedom to define the contrasts according to the things you want to test, but you can only have a fixed number of them, and they have to be orthogonal.[3]
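For comparison (a sketch, not something the question asks for), the regular ANOVA table with its 3 and 15 df can be pulled out of the same fitted model:
# A sketch: the overall F-test for hair on 3 and 15 df, matching the
# F-statistic at the bottom of the summary output above.
anova(hairpain.1)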
- Interpret your results.
The three tests in the summary table are labelled with hair followed by the names you gave the contrasts. Ignore the intercept:
- there is no difference in pain tolerance between the people with light brown and dark brown hair (P-value 0.37)
- there is no difference in pain tolerance between the people with light blond and dark blond hair (P-value 0.14)
- there is a difference in pain tolerance between brown-haired and blond-haired people (P-value 0.0011).
Note how this is more focused than the regular ANOVA, which looked at all the differences between groups, including some that we didn’t really care about.
There are only two hair colours, brown and blond, so we don’t need to do Tukey, but we can figure out which hair colour is associated with higher pain tolerance either by looking at group means:
hairpain %>%
  group_by(hair) %>%
  summarize(mean_pain = mean(pain))
or by reasoning it out from the contrast. In my contrast:
c_brown_blond
[1] -0.5 0.5 -0.5 0.5
the positive coefficients went with brown hair. The Estimate in the summary was negative, which means that the pain tolerance was higher in the people with the negative coefficients, namely the blond-haired people. This is consistent with my table of means just above.
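If you like, a small sketch connecting the two approaches: apply the brown-vs-blond contrast coefficients to the group means (group_by puts the means in the same alphabetical order as question 3). A negative result again says the blond groups have the higher pain tolerance, and, because this contrast’s squared coefficients add up to 1, the value should land close to the hairc_brown_blond Estimate of about -15.25.
# A sketch: the brown-vs-blond contrast applied to the four group means;
# the negative sign says blond-haired people tolerate more pain.
hairpain %>%
  group_by(hair) %>%
  summarize(mean_pain = mean(pain)) %>%
  summarize(contrast_value = sum(c_brown_blond * mean_pain))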
Cats
144 cats, 47 female and 97 male, had been used in an experiment with the muscle-relaxing drug digitalis. For each cat, its Sex was recorded, along with the cat’s body weight (Bwt) in kilograms and the weight of its heart (Hwt) in grams. We are interested in the relationship between the weight of a cat’s heart (response) and its body weight (explanatory), and whether that relationship is different for male and female cats. The data are in the file http://ritsokiguess.site/datafiles/cats.csv.
- Read in and display (some of) the data.
As ever:
my_url <- "http://ritsokiguess.site/datafiles/cats.csv"
cats <- read_csv(my_url)
Rows: 144 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Sex
dbl (2): Bwt, Hwt
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
cats
144 rows, with the columns promised. The bit about digitalis is not actually at all relevant to what we are doing.
- Make a suitable graph of the three variables. Add appropriate regression lines to your graph.
Two quantitative and one categorical means a scatterplot of the two quantitative variables, bearing in mind which one of them is the response, with the categorical variable’s levels being indicated by colour. Here, we are predicting heart weight from body weight:
ggplot(cats, aes(x = Bwt, y = Hwt, colour = Sex)) + geom_point() +
geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'
Some things to note for yourself from the graph:
- the male cats tended to be bigger than the females (the heaviest female cat had a body weight of 3 kg, but there were many male cats heavier than that; a quick check of this is sketched after this list)
- cats with a bigger body weight tended to have a bigger heart, whether male or female
- is there a difference between male and female cats? Not clear.
- the slopes of the two regression lines look a bit different, but whether that difference is significant remains to be seen.
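A quick sketch to back up the first point, using only the Bwt and Sex columns we already have:
# A sketch: the heaviest and average body weight for each Sex; the heaviest
# female should be around 3 kg, with many males above that.
cats %>%
  group_by(Sex) %>%
  summarize(n = n(), max_bwt = max(Bwt), mean_bwt = mean(Bwt))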
- Fit a suitable analysis of covariance model, and display its output.
Include an interaction (and check whether it is necessary, which is coming up):
cats.1 <- lm(Hwt ~ Bwt * Sex, data = cats)
summary(cats.1)
Call:
lm(formula = Hwt ~ Bwt * Sex, data = cats)
Residuals:
Min 1Q Median 3Q Max
-3.7728 -1.0118 -0.1196 0.9272 4.8646
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.9813 1.8428 1.618 0.107960
Bwt 2.6364 0.7759 3.398 0.000885 ***
SexM -4.1654 2.0618 -2.020 0.045258 *
Bwt:SexM 1.6763 0.8373 2.002 0.047225 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.442 on 140 degrees of freedom
Multiple R-squared: 0.6566, Adjusted R-squared: 0.6493
F-statistic: 89.24 on 3 and 140 DF, p-value: < 2.2e-16
Extra:
- The interaction is significant (just), so we need to keep it. That means that the two lines on your graph do in fact have significantly different slopes (which may or may not be what you guessed)
- The Estimate for SexM says how the intercept for males compares to the baseline Sex, females. If you imagine a cat with body weight 0 (which is of course impossible) by extending the lines on your graph to the left, this says that the Male line will be considerably less than the Female line by the time you get back to a Bwt of zero. Unlike the example in lecture, the Male line ends up being above the Female one eventually (once Bwt gets above about 2.5), because it has a larger slope.
- The Estimate for Bwt is an ordinary regression slope, but since the slopes of the two lines are different, we have to say that it is the slope for the baseline Sex, females. That is, a female cat that is 1 kg heavier will have a heart that is 2.64 grams heavier, on average.
- The Intercept is likewise the intercept for females.
- The interaction term represents the difference in slopes for male and female cats: the slope for male cats is 1.68 bigger than the slope for female cats.
- Is the interaction term significant? What does your answer mean in the context of the data?
The interaction term has a P-value of 0.047, which is just less than 0.05, so it is significant (at that \(\alpha\)). This means that the two lines predicting heart weight from body weight for males and females have different slopes.
This is unlike the example we had in class, where the interaction term and hence the two slopes were not significantly different.
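If you want the two fitted slopes themselves rather than just the test of their difference, here is a sketch using slopes from marginaleffects (already loaded); it should give a Bwt slope of about 2.64 for females and about 4.31 for males.
# A sketch: the fitted slope of Hwt on Bwt, separately for each Sex.
slopes(cats.1, variables = "Bwt",
       newdata = datagrid(model = cats.1, Sex = c("F", "M")))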
- For male and female cats of body weights 2.5 and 3.5 kg (all four combinations), obtain predicted heart weights.
This uses marginaleffects in the customary way. Make a new dataframe of values to predict for first, and then do your predictions, displaying only the columns of interest.
Hence, first this:
new <- datagrid(model = cats.1, Bwt = c(2.5, 3.5), Sex = c("F", "M"))
new
and then this:
cbind(predictions(cats.1, new)) %>%
select(estimate, Bwt, Sex)
Add things like confidence limits for the predictions if you like (though we’re not using them).
- (2 points) Using your predictions, verify that the slopes for males and females are different.
Take a trip back to high school and recall that slope is rise over run (or search for it and tell me where you found out what to do). In this case, the “run” part is 1, because the two body weights are 1 kg apart, so the slope in each case is just the “rise” part:
Slope for (baseline) females is
12.21 - 9.57
[1] 2.64
and the slope for males is
13.91 - 9.60
[1] 4.31
These are indeed different, as required. (If you forgot to add an interaction term in your ANCOVA model, they will be the same.) You don’t need to be any more accurate than this.
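Equivalently, a sketch that lets R do the rise-over-run from the predictions dataframe (same column names as in the output above):
# A sketch: within each Sex, difference in predicted Hwt divided by the
# difference in Bwt (here 1 kg), giving the two slopes.
cbind(predictions(cats.1, new)) %>%
  select(estimate, Bwt, Sex) %>%
  group_by(Sex) %>%
  summarize(slope = diff(estimate) / diff(Bwt))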
Extra: going back to our output from earlier:
summary(cats.1)
Call:
lm(formula = Hwt ~ Bwt * Sex, data = cats)
Residuals:
Min 1Q Median 3Q Max
-3.7728 -1.0118 -0.1196 0.9272 4.8646
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.9813 1.8428 1.618 0.107960
Bwt 2.6364 0.7759 3.398 0.000885 ***
SexM -4.1654 2.0618 -2.020 0.045258 *
Bwt:SexM 1.6763 0.8373 2.002 0.047225 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.442 on 140 degrees of freedom
Multiple R-squared: 0.6566, Adjusted R-squared: 0.6493
F-statistic: 89.24 on 3 and 140 DF, p-value: < 2.2e-16
the slope for the baseline females is indeed the same as the Estimate for Bwt. The slope for males is the baseline slope, plus the change in slope between females and males, which is the interaction term, hence
2.64 + 1.68
[1] 4.32
as we found from the predictions.
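Or, as a sketch, pull the same two numbers out of the fitted model by coefficient name (names as in the summary output):
# A sketch: baseline (female) slope plus the male-vs-female slope difference.
coef(cats.1)["Bwt"] + coef(cats.1)["Bwt:SexM"]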
Neurocognition in individuals with schizophrenia
A study was carried out to evaluate patterns and levels of performance on neurocognitive measures among individuals with schizophrenia and schizoaffective disorder using a well-validated, comprehensive neurocognitive battery specifically designed for individuals with psychosis.
The main interest was in determining how well these measures distinguished among all groups and whether there were variables that distinguished between the schizophrenia and schizoaffective groups. Age and sex were also measured for each individual, but we ignore those in our analysis.
Variables of interest, all quantitative except for the first one:
- Dx: Diagnostic group, categorical with levels Schizophrenia, Schizoaffective, Control
- Speed: Speed of processing score
- Attention: Attention/Vigilance score
- Memory: Working memory score
- Verbal: Verbal Learning score
- Visual: Visual Learning score
- ProbSolv: Reasoning/Problem Solving score
- SocialCog: Social Cognition score
The clinical sample comprised 116 male and female patients who had a diagnosis of schizophrenia (\(n = 70\)) or schizoaffective disorder (\(n = 46\)) confirmed by a standard test.
Non-psychiatric control participants (\(n = 146\)) were screened for medical and psychiatric illness and history of substance abuse. Patients were recruited from three outpatient clinics in Hamilton, Ontario, Canada. Control participants were recruited through local newspaper and online classified advertisements for paid research participation.
The data are in http://ritsokiguess.site/datafiles/NeuroCog.csv.
- Read in and display (some of) the data.
Solution
The usual, exactly:
my_url <- "http://ritsokiguess.site/datafiles/NeuroCog.csv"
NeuroCog <- read_csv(my_url)
Rows: 242 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Dx, Sex
dbl (9): id, Speed, Attention, Memory, Verbal, Visual, ProbSolv, SocialCog, Age
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
NeuroCog
\(\blacksquare\)
- Why is this dataset suitable for a multivariate ANOVA analysis? Explain briefly.
Solution
All the quantitative test scores are response variables (they are all outcomes of the study). There is only one explanatory variable, Dx, with three levels. With more than one response variable and one or more categorical explanatory variables, a MANOVA is a sensible way to proceed.
\(\blacksquare\)
- Create a suitable response variable for a MANOVA. Show the first few rows of your response variable. NOTE: if you display it all, all 242 rows will be displayed. Be careful.
Solution
There are two ways to go here. The way you’ll probably think of is the “base R” way:
y <- with(NeuroCog, cbind(Speed, Attention, Memory, Verbal, Visual, ProbSolv, SocialCog))
head(y)
Speed Attention Memory Verbal Visual ProbSolv SocialCog
[1,] 19 9 19 33 24 39 28
[2,] 8 25 15 28 24 40 37
[3,] 14 23 15 20 13 32 24
[4,] 7 18 14 34 16 31 36
[5,] 21 9 35 28 29 45 28
[6,] 31 10 26 29 21 33 28
The problem with this way is that you have to name all seven of the response variables. The other way you can go is the “tidyverse” way, where you use tidyverse tools to select the columns you want without naming them, and then you turn it into a matrix at the end:
NeuroCog %>%
  select(Speed:SocialCog) %>%
  as.matrix() -> response
head(response)
Speed Attention Memory Verbal Visual ProbSolv SocialCog
[1,] 19 9 19 33 24 39 28
[2,] 8 25 15 28 24 40 37
[3,] 14 23 15 20 13 32 24
[4,] 7 18 14 34 16 31 36
[5,] 21 9 35 28 29 45 28
[6,] 31 10 26 29 21 33 28
The reason for the head is that what I called y and response are both R matrix objects, not dataframes. If you display a matrix, it shows you the whole thing, no matter how many rows it has, so you need to use head, or something like this:
response %>% as_tibble() %>% slice(1:10)
to display only enough of it to verify that you have the right thing. (The last one turns response back into a dataframe to use slice.)
\(\blacksquare\)
- Run a suitable MANOVA using the manova command, displaying the output.
Solution
This is straightforward, because it is just like aov except with the multi-column response that you made:
neuro.1 <- manova(response ~ Dx, data = NeuroCog)
summary(neuro.1)
Df Pillai approx F num Df den Df Pr(>F)
Dx 2 0.34795 7.0406 14 468 2.956e-13 ***
Residuals 239
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\(\blacksquare\)
- Run a suitable MANOVA using Manova from the car package, displaying the results.
Solution
Load car first, and then go through the two-step process to get the analysis:
neuro.2 <- lm(response ~ Dx, data = NeuroCog)
neuro.3 <- Manova(neuro.2)
summary(neuro.3)
Type II MANOVA Tests:
Sum of squares and products for error:
Speed Attention Memory Verbal Visual ProbSolv SocialCog
Speed 26951.463 18528.553 18905.32 11987.542 13615.554 11973.889 6644.373
Attention 18528.553 38318.045 21237.57 14631.133 13714.366 9205.879 13537.946
Memory 18905.318 21237.570 32337.99 14250.909 16211.156 13144.151 11766.618
Verbal 11987.542 14631.133 14250.91 20416.547 12810.992 7260.057 7000.052
Visual 13615.554 13714.366 16211.16 12810.992 26566.679 11753.637 6761.313
ProbSolv 11973.889 9205.879 13144.15 7260.057 11753.637 19861.366 5372.382
SocialCog 6644.373 13537.946 11766.62 7000.052 6761.313 5372.382 33805.811
------------------------------------------
Term: Dx
Sum of squares and products for the hypothesis:
Speed Attention Memory Verbal Visual ProbSolv SocialCog
Speed 8360.293 6813.042 5600.521 6238.566 5555.768 5865.127 6130.585
Attention 6813.042 5579.079 4581.959 5105.181 4528.576 4742.708 5141.087
Memory 5600.521 4581.959 3763.704 4193.298 3722.463 3904.419 4203.457
Verbal 6238.566 5105.181 4193.298 4671.981 4146.595 4347.563 4688.898
Visual 5555.768 4528.576 3722.463 4146.595 3692.081 3896.223 4079.538
ProbSolv 5865.127 4742.708 3904.419 4347.563 3896.223 4165.345 4101.841
SocialCog 6130.585 5141.087 4203.457 4688.898 4079.538 4101.841 5277.131
Multivariate Tests: Dx
Df test stat approx F num Df den Df Pr(>F)
Pillai 2 0.3479501 7.040631 14 468 2.9559e-13 ***
Wilks 2 0.6610440 7.653800 14 466 1.3276e-14 ***
Hotelling-Lawley 2 0.4991527 8.271673 14 464 5.9508e-16 ***
Roy 2 0.4702174 15.718695 7 234 < 2.22e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\(\blacksquare\)
- What are you able to conclude from your analyses? (The conclusion should be the same for both of them.)
Solution
The null hypothesis says that all the test scores have the same mean on all the diagnoses. This is clearly rejected, so the null is false in some way: there are test scores with different means on some of the diagnoses (or, it is not true that all the test scores are the same for all the diagnoses). If you use the null hypothesis, make sure it is clear enough which part of it is your null hypothesis and which part is your conclusion, since they are not the same.
Your conclusion will necessarily be vague, because we have no idea which diagnoses are different on which tests. To assess that properly, we need to do a discriminant analysis, which is coming up later. We will do an informal analysis with a graph.
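An informal sketch, if you want a rough idea of which individual test scores differ by diagnosis before the discriminant analysis: summary.aov on the manova fit runs one ordinary ANOVA per response (these seven P-values are not adjusted for multiple testing, so treat them as a rough guide only).
# A sketch: one univariate ANOVA of each test score on Dx, as an informal
# follow-up to the significant MANOVA (unadjusted P-values).
summary.aov(neuro.1)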
\(\blacksquare\)
- Carry out Box’s M test. What do you conclude from it?
Solution
Load (and install if you have to) MVTests, and then:
library(MVTests)
Attaching package: 'MVTests'
The following object is masked from 'package:datasets':
iris
summary(BoxM(response, NeuroCog$Dx))
Box's M Test
Chi-Squared Value = 77.84483 , df = 56 and p-value: 0.0284
This is significant at α=0.05, but recall that we do not reject equal spreads (across groups for each response variable) until this P-value goes below 0.001, so this says that the spreads are not unequal enough to be a problem.
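If you want to see the spreads that Box’s M is comparing, a quick sketch of the group standard deviations for each response (this uses the longer data layout that comes up again in the next part):
# A sketch: standard deviation of each test score within each diagnosis group.
NeuroCog %>%
  pivot_longer(Speed:SocialCog, names_to = "test_name", values_to = "test_score") %>%
  group_by(Dx, test_name) %>%
  summarize(sd_score = sd(test_score), .groups = "drop")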
\(\blacksquare\)
- Make boxplots of all seven response variables against diagnosis (Dx), on one ggplot. The best answer will display the graphs so that they are easy to read. Hint: the idea is the same as plotting residuals against all of the explanatory variables in a regression.
Solution
First, we need the data “longer”, with all the test scores in one column and a second column labelling which test they were:
NeuroCog %>%
  pivot_longer(Speed:SocialCog, names_to = "test_name", values_to = "test_score")
Having done that, we can now draw facetted boxplots, one for each response variable (test score):
NeuroCog %>%
  pivot_longer(Speed:SocialCog, names_to = "test_name", values_to = "test_score") %>%
  ggplot(aes(x = Dx, y = test_score)) + geom_boxplot() +
  facet_wrap(~test_name)
The scores are all on the same scale (0-100, presumably), so you can use the same scale for all the facets, or (equally good) use scales = "free" to give each facet its own scale, since you may not know until you draw the graph whether the tests really are all on the same scale.
Three points (when this was on an assignment) for getting this far. The last point was for noticing that the diagnosis names are long and overwrite each other on the \(x\)-axes. The standard way to work around this is to put them on the \(y\)-axis, which means flipping the \(x\) and \(y\) axes around using coord_flip and making the boxes go sideways:
NeuroCog %>%
  pivot_longer(Speed:SocialCog, names_to = "test_name", values_to = "test_score") %>%
  ggplot(aes(x = Dx, y = test_score)) + geom_boxplot() +
  facet_wrap(~test_name) + coord_flip()
The long diagnosis names take up a lot of plot real estate, but you can compare the boxes well enough (next part), so this is all right.
Another approach is to display the facets in two columns rather than three, which might give enough room for the diagnosis labels to be seen:
NeuroCog %>%
  pivot_longer(Speed:SocialCog, names_to = "test_name", values_to = "test_score") %>%
  ggplot(aes(x = Dx, y = test_score)) + geom_boxplot() +
  facet_wrap(~test_name, ncol = 2)
Extra: you could even combine the coord-flipping and the two columns:
NeuroCog %>%
  pivot_longer(Speed:SocialCog, names_to = "test_name", values_to = "test_score") %>%
  ggplot(aes(x = Dx, y = test_score)) + geom_boxplot() +
  facet_wrap(~test_name, scales = "free", ncol = 2) + coord_flip()
which gives more room to see the boxplots.
\(\blacksquare\)
- Looking at your boxplots, why do you think your MANOVA came out significant, and what do your boxplots tell you about the relative test scores for patients with diagnoses of schizophrenia or schizoaffective?
Solution
The significant MANOVA says that somewhere, there must be at least one response variable where the scores on average differ among the three groups. What seems to be happening here is that the scores are consistently higher for the control group than the other two groups, across all (or almost all, depending on how you see it) the response variables.
The researchers were interested in comparing the schizophrenia and schizoaffective patients, and as I see it, the scores on those are very similar across all the response variables, given the amount of variability there is (quite a lot). My guess is that there is actually no significant difference anywhere between these two groups.
Extra: what these boxplots don’t show us is whether there are any multivariate differences between these two groups (as we saw with the seed yield and seed weight in lecture). Maybe you have a good way to make a seven-dimensional plot, but I don’t think I do!
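Extra extra: a sketch of one informal way to check my guess above (my own addition, not part of the original analysis): drop the control group and re-run the MANOVA on just the two patient groups.
# A sketch: MANOVA comparing only Schizophrenia vs Schizoaffective on the
# seven scores; a large P-value would support the guess of no difference.
patients <- NeuroCog %>% filter(Dx != "Control")
patient_scores <- patients %>% select(Speed:SocialCog) %>% as.matrix()
summary(manova(patient_scores ~ Dx, data = patients))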
\(\blacksquare\)
Footnotes
[1] If you have done ANOVAs by hand, you would have looked up 3 and 15 df in your tables to get the P-value for your \(F\)-test, with a result like “the P-value is between 0.01 and 0.05”.
[2] An \(F\) with 1 df on the top is the square of a \(t\) with the same df on the bottom, so such tests are equivalent.
[3] Not strictly true, but it makes things a lot more complicated if you don’t have orthogonal contrasts, and that’s beyond our scope in this course. One of the PASIAS problems has some discussion on this if you want to learn more.