library(tidyverse)
library(marginaleffects)
Worksheet 6
Packages
Chocolate cake
In an experiment on the preparation of chocolate cakes, three recipes for preparing the batter were compared. Recipes R1 and R2 differed in that the chocolate was added at 40C and 60C, respectively, while recipe R3 contained extra sugar. In addition, six different baking temperatures were tested: these ranged in 10C steps from 175C to 225C. (For those more used to baking in Fahrenheit, this is between about 350F and 450F.) Several measurements were made on the cakes. The one shown here is the breaking angle. One half of a slab of cake is held fixed, while the other half is pivoted about the middle until breakage occurs. The angle through which the moving half has revolved is read on a circular scale. Since breakage is gradual, the reading tends to have a subjective element. (A higher breaking angle is considered better.)
The data are in http://ritsokiguess.site/datafiles/choccake.csv. There is an additional column batch
that you can ignore.
- Read in and display (some of) the data.
One of these days, I’m going to trick you, but today is not one of those days:
<- "http://ritsokiguess.site/datafiles/choccake.csv"
my_url <- read_csv(my_url) choccake
Rows: 270 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): recipe, batch, temp
dbl (1): breakang
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
choccake
Extra: there are three recipes and six temperatures, so there are this many replicate observations per recipe-temperature combination:
270 / (3*6)
[1] 15
In actual fact, there were 15 different batches of cake batter prepared, and each batch was used once for each recipe-temperature combination. You might expect there to be some differences among batches, and that therefore a more sophisticated analysis would account for batch as well. This is what is known as a “split-plot design”. For each recipe, each batch gave enough batter to make six cakes, one of which was baked at each temperature. In the jargon of split-plot designs, recipe is the “whole-plot” factor, and temperature is nested within batch.
But that’s by the way, since we ignore batch. The examples at the bottom of this page treat the recipe-batch1 combination as a “random effect” using a mixed model. We learn about those when we study repeated measures.
- Obtain a suitable boxplot of these data.
This is a grouped boxplot, with one of the explanatory variables as x
and the other one as fill
. I like the one that goes on the \(x\)-axis to be the one with more levels, which is here temp
(best):
ggplot(choccake, aes(x = temp, y = breakang, fill = recipe)) + geom_boxplot()
Each boxplot here is based on 15 observations, which is a decent number to assess shape. Some of the distributions appear to be skewed (mainly to the right), and there are a lot of outliers, almost all at the upper end.
Extra: If you do x
and colour
the other way around, you’ll have three sets of six boxplots, each one a different colour:
ggplot(choccake, aes(fill = temp, y = breakang, x = recipe)) + geom_boxplot()
Then you have to distinguish six colours, and see whether the pattern in each of the three groups of boxplots is similar. This seems to me like harder work, although things are consistent enough here that you can still do it: the three sets of six boxplots are all on a level (no recipe effect), and within each set, they all head uphill with temperature in about the same kind of way.
- Regardless of any misgivings you may have about your boxplot, fit an (otherwise) appropriate analysis of variance model, and display the results.
We don’t have any better models to fit in two-way ANOVA, so away we go:
.1 <- aov(breakang ~ recipe*temp, data = choccake)
choccakesummary(choccake.1)
Df Sum Sq Mean Sq F value Pr(>F)
recipe 2 135 67.5 1.084 0.340
temp 5 2100 420.1 6.742 6.44e-06 ***
recipe:temp 10 206 20.6 0.331 0.972
Residuals 252 15702 62.3
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Extra: with the outliers mostly at the top, this is the kind of situation where a transformation might be useful, with a transformation like square root or log being the kind of thing to expect. I would probably go with square root here, since the skewness doesn’t seem to be extreme, but you can also fire up Box-Cox to see whether that agrees:
library(MASS, exclude = "select")
boxcox(breakang ~ recipe*temp, data = choccake)
Well, I was wrong; the right transformations include reciprocal (power \(-1\)) and one-over-square-root (power \(-0.5\)). Let’s try the latter, first for the boxplot and second for the ANOVA:
ggplot(choccake, aes(x = temp, y = breakang^(-0.5), fill = recipe)) + geom_boxplot()
The distributions do look more symmetric now. There are still some outliers, but maybe with 15 observations per group we can get away with that. Then:
.1a <- aov(breakang^(-0.5) ~ recipe*temp, data = choccake)
choccakesummary(choccake.1a)
Df Sum Sq Mean Sq F value Pr(>F)
recipe 2 0.00174 0.000871 2.081 0.127
temp 5 0.01667 0.003335 7.968 5.44e-07 ***
recipe:temp 10 0.00138 0.000138 0.329 0.973
Residuals 252 0.10547 0.000419
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The P-values are not changed that much, so the transformation is not making as much of a difference as you might expect.
In any case, the point of these questions is to give you some practice with a two-way ANOVA, and I didn’t want to throw transformations into the mix.
- What do you conclude from the \(F\)-test in the above analysis, in the context of the data?
There is no significant interaction between recipe and temperature. That is, the breakage angle of the cake does not depend on the combination of recipe and temperature.
This, you’ll recall, is as far as you go. The next stage would be to remove the interaction term, re-fit and see about effects of recipe and/or temperature.
- Can you improve your model of the previous part? If so, do so; if not, explain briefly why not.
Remove the non-significant interaction:
.2 <- update(choccake.1, .~. - recipe:temp)
choccakesummary(choccake.2)
Df Sum Sq Mean Sq F value Pr(>F)
recipe 2 135 67.5 1.112 0.33
temp 5 2100 420.1 6.918 4.37e-06 ***
Residuals 262 15908 60.7
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You could reasonably also remove the non-significant recipe
now that you have removed the non-significant interaction. Up to you.
Extra: The interaction was so far from being significant that the P-values for recipe and temperature have barely changed.
- Obtain an analysis of the simple effects of recipe at temperature
175C
.
This means to only look at this one temperature, and test for an effect of recipe on breaking angle:
%>% filter(temp == "175C") -> temp175
choccake .3 <- aov(breakang ~ recipe, data = temp175)
choccakesummary(choccake.3)
Df Sum Sq Mean Sq F value Pr(>F)
recipe 2 38.6 19.29 0.377 0.688
Residuals 42 2146.4 51.10
This is all you need to do: there is no significant effect of recipe at this temperature, so there are no differences among recipes to find (and therefore there is no need to run Tukey).
- Obtain an analysis of the simple effects of recipe at temperature
225C
.
I literally copied, pasted, and edited the question, so you can literally copy, paste, and edit your code:
%>% filter(temp == "225C") -> temp225
choccake .4 <- aov(breakang ~ recipe, data = temp225)
choccakesummary(choccake.4)
Df Sum Sq Mean Sq F value Pr(>F)
recipe 2 4 1.76 0.021 0.98
Residuals 42 3583 85.30
Once again, this is all you need.
- Compare the results of the previous two parts. Considering the results of your previous work, explain briefly why this not a surprise.
Neither of the two simple effects analyses shows any significant differences among recipes, at either of the two temperatures. This is not surprising, because the interaction term was not significant earlier, and the analysis you did without it shows that there is an effect of temperature (that applies to all recipes) and no effect of recipe (that applies to all temperatures). Hence, we would expect a simple effects analysis to consistently show no difference among the recipes whichever temperature we are looking at, which is exactly what happened.
Extra: This is a very boring simple effects analysis. You could do all the temperatures, if you are in the mood for unparalleled tedium:
library(broom) # for tidy, below
%>%
choccake nest_by(temp) %>%
rowwise() %>%
mutate(my_aov = list(aov(breakang ~ recipe, data = data))) %>%
mutate(p_val = tidy(my_aov)$p.value[1])
There is, as the non-significant interactions indicate, no difference between the recipes at any of the temperatures.2
The time when it gets interesting is when the interaction is significant. If that had been the case here, we might have found a difference among recipes for some temperatures but not others, or maybe that one of the recipes is best at certain temperatures but a different recipe is best at others.
Let’s revisit our graph from earlier:
ggplot(choccake, aes(x = temp, y = breakang, fill = recipe)) + geom_boxplot()
The three coloured boxplots within each temperature are pretty much side by side (to within the amount of variability there is), but as you read across from left to right, the trios of boxplots are going gradually uphill: that is to say, as temperature increases, the breakage angle tends to increase as well. My guess is that there is some optimal temperature at which the breakage angle is largest (which might be 215 or 225 degrees), and if the experiment had included some higher temperatures, we would have seen the average breakage angle going down again.
Drug and electroshock and solving simple tasks
64 subjects were given some simple tasks to do, and the number of tasks completed in 10 minutes was recorded. This is shown in the column response
in our data. Before attempting the tasks, each subject was randomly assigned to either receive or not receive a drug (in xdrug
), and also to receive or not receive electroshock therapy (in xshock
). The researchers were interested in whether either the drug or the electroshock or the combination of both affected the number of tasks the subject could solve. (The subjects were all fully informed of the risks, and agreed to sign consent forms.) The data are in http://ritsokiguess.site/datafiles/drugshock.csv.
- Read in and display (some of) the data.
As usual. My dataframe is called task
(call yours whatever you like):
<- "http://ritsokiguess.site/datafiles/drugshock.csv"
my_url <- read_csv(my_url) task
Rows: 64 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): xdrug, xshock
dbl (1): response
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
task
Note that the categories of the two categorical variables were not just yes
and no
, but variations on yes
and no
that you can tell apart.
Extra: nothing challenging for you, but I had some work to do to get the data into this form for you (mostly done while waiting at Service Ontario to get my health card renewed). The data came from here:
data("Shkdrug", package = "BSDA")
Shkdrug
I think this was originally intended for a one-way analysis treating the four treatment combinations as levels of a single factor, but it looks like a two-factor analysis to me. Let’s see what those treatments are:
%>% count(treatment) Shkdrug
It looks like Drug
and NoDrug
, but sometimes NoDrug
is NoDg
. It also looks like Shock
and NoShock
, except that it isn’t that either. So we have some tidying up to do. First, split up the treatment
column into two columns, one for the drug and one for electroshock, which is done with separate_wider_delim
:
%>% separate_wider_delim(treatment,
Shkdrug names = c("xdrug", "xshock"),
delim = "/") -> task
task
I called my new dataframe task
, and for some reason called the new columns xdrug
and xshock
. Possibly I was distracted by waiting for my number to come up. Anyway. The next thing is to sort out those category levels. There are several possibilities for both xdrug
and xshock
, and we need to handle them all, so case_when
is better than ifelse
:
%>%
task mutate(xdrug = case_when(xdrug == "Drug" ~ "drug_yes",
== "NoDg" ~ "drug_no",
xdrug == "NoDrug" ~ "drug_no"
xdrug %>%
)) mutate(xshock = case_when(xshock == "NoS" ~ "shock_no",
== "Shk" ~ "shock_yes",
xshock == "S" ~ "shock_yes")) -> task
xshock task
case_when
always looks better if you take five seconds to line things up. If you don’t, it makes your reader wonder what else you are sloppy about.
I gave the new categories names starting with drug_
and shock_
rather than just yes
and no
, so as to be less confusing for you later. Let’s check that we have something sensible now:
%>% count(xdrug, xshock) task
There were supposed to be 16 subjects in each subcategory, so it looks good. This is what I saved for you.
- Draw a suitable graph of all three variables. Here, and in the next question, put drug on the \(x\) axis.
With one quantitative and two categorical variables, a grouped boxplot is the standard one:
ggplot(task, aes(x = xdrug, y = response, fill = xshock)) + geom_boxplot()
The other categorical variable, xshock
, goes as fill
and is distinguished by colour. Using colour
instead of fill
will outline the boxes with the appropriate colour, and is equally good.
You might already be suspecting an interaction, since there appears to be an effect of electroshock for the subjects that did not get the drug, but not for the subjects that did get the drug. Seeing the effect of one factor depending on the level of the other is a hallmark of an interaction.
I asked you to put xdrug
on the \(x\)-axis just to give the grader something more uniform to assess. In actual fact, you could just as well have xshock
on the \(x\)-axis and xdrug
as fill
.
- Draw an interaction plot for these data.
Again, put xdrug
on the \(x\)-axis. For this graph, you need the mean value of response
for each combination of xdrug
and xshock
, which is a group-by and summarize. Then plot the means as points joined by lines for each value of xshock
, which is done by setting xshock
as both colour
and group
:
%>% group_by(xdrug, xshock) %>%
task summarize(mean_res = mean(response)) %>%
ggplot(aes(x = xdrug, y = mean_res, colour = xshock, group = xshock)) +
geom_point() + geom_line()
`summarise()` has grouped output by 'xdrug'. You can override using the
`.groups` argument.
One point for finding the means, two for making the graph from them (the latter is the main thing here), when this was graded. Other approaches:
- you can save the result of the mean calculation in a dataframe,
d
say, and then use that dataframe as the first input toggplot
- I didn’t talk about
stat_summary
in lecture, but it’s in the slides, and if you can make it work to get this graph, I’m good with that.
- Based on your two graphs, do you expect to see an interaction, and do you expect it to be significant? Explain briefly.
The two lines on the interaction plot do not seem at all parallel, so we should expect to see an interaction. To decide whether it is going to be significant, remember that the interaction plot does not show variability, and to get a sense of how much variability there is, we need to go back to the boxplot. The two boxes on the left look definitely different and the two on the right definitely do not, so we expect to see significance somewhere. (To be more precise, an interaction means that the effect of xshock
depends on the level of xdrug
, and what we just said means that it definitely does appear to.)
- Run a suitable ANOVA and display the results. Were your suspicions confirmed?
Include an interaction as the first step of any two-way ANOVA, and see if it is significant:
.1 <- aov(response ~ xdrug * xshock, data = task)
tasksummary(task.1)
Df Sum Sq Mean Sq F value Pr(>F)
xdrug 1 1.00 1.00 0.568 0.45398
xshock 1 39.06 39.06 22.189 1.51e-05 ***
xdrug:xshock 1 18.06 18.06 10.260 0.00218 **
Residuals 60 105.62 1.76
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a P-value of 0.002, the interaction effect is definitely significant, confirming what we saw from the graphs.
Extra: The strong significance of xshock
is not relevant; it is actually significant because subjects who had the electroshock treatment got fewer of the tasks done on average than those who did not, but by studying the interaction with simple effects (next), we get a more nuanced conclusion than that.
- Why is a simple effects analysis appropriate here? Carry out a simple effects analysis of
xshock
for each level ofxdrug
, stating your conclusions in the context of the data.
A simple effects analysis is appropriate because the interaction is significant, and this will help us to understand why it was significant. One point.
Two points for each of the simple effects analyses, one for each level of xdrug
. In each case, extract the level of xdrug
you care about, run an aov
predicting response
from xshock
, and display the results. You might need to find out the two levels of xdrug
: you can either go back to the data and look, or you can count them:
%>% count(xdrug) task
Then do the two analyses, for example like this:
%>%
task filter(xdrug == "drug_no") %>%
aov(response ~ xshock, data = .) %>%
summary()
Df Sum Sq Mean Sq F value Pr(>F)
xshock 1 55.12 55.12 39.61 6.15e-07 ***
Residuals 30 41.75 1.39
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For the subjects who did not get the drug, there is a significant effect of xshock
on response
(P-value 0.0000006), and you can check the plots to see what kind of effect it was: if a subject did not get the drug, they did significantly more tasks if they didn’t get electroshock either.
And then the other one:
%>%
task filter(xdrug == "drug_yes") %>%
aov(response ~ xshock, data = .) %>%
summary()
Df Sum Sq Mean Sq F value Pr(>F)
xshock 1 2.00 2.000 0.939 0.34
Residuals 30 63.88 2.129
With a P-value of 0.34, this is not significant, so that if a subject did get the drug, then electroshock (or not) made no difference to the number of tasks they completed.
These conclusions, you note, are entirely what you would have guessed from the graphs, but now you have P-values to support them with. Two points for each filter
, aov
, summary
, and conclusion in words in the context of the data.
I mentioned the effect of xshock
in our first aov
, the one with the interaction in it. Now you see that inferring an overall effect of xshock
would have been a mistake: yes, subjects who received the electroshock treatment completed fewer tasks on average overall, but the reason for that was that the subjects who received electroshock completed a lot fewer tasks if they did not get the drug, and it took the simple effects analysis to enable us to see that. Hence, trying to interpret a main effect when the interaction is significant is a mistake, and doing so will likely mislead you.
Comments: I did these in one pipeline each, because that seemed to be simplest, and I knew I was only doing one thing each time (running aov
and displaying the results). There is no reason to do Tukey here (see next part); if there was, it would be better to save the result of the aov
each time, look at the summary
of it, and then if significant look at the TukeyHSD
of it. There are a couple of technical details in the way I did it, both concerning the fact that in a pipeline there is “the thing that came out of the previous step” that is taken as the first input of the next step. In aov
, what came out of the filter
is going to be the dataframe used in data
, which is not (by default) first, so I used the special .
notation to indicate “here is where the result of the previous step goes”. In the next line, the output of aov
is the input to summary
, which is how a pipeline works, so I don’t have to say what it is called (it doesn’t have a name beyond .
anyway).
For you, feel free to save the result of the filter
and/or the result of the aov
and use them as inputs to the next thing. As long as you get the same two ANOVA tables as I did by what looks like sensible code, I am happy.
Extra 1: As far as I can tell, there is no particular reason for this dataset to condition on xdrug
and look for an effect of xshock
. It would be perfectly good to do it the other way around. In the auto noise example in lecture, we cared more about the effect of filter type and the engine size was like a “block” (in experimental design terms, a factor that makes a difference but that we don’t really care about; putting it in the analysis will reduce the error sum of squares and thus give us better tests for the things we really do care about). In that context, it makes more sense to condition on the block and say “in this block, this happened”. Here, though, you can imagine that the researchers were interested in both factors, and so there was no special reason to do it this way around. I asked you to do it like this to be consistent with your graphs (which I was also more specific with than I might have been), so that you could find out why your significant simple effect was significant by looking straight at your graphs. I invite you to switch everything around: make xshock
the \(x\) on your graphs, and condition on xshock
for your simple effects, and see what difference it makes to your conclusions.
Extra 2: I did some digging and found out that the model formula input of aov
is called formula
. It seems that if you specify that by name, it will take data =
from the output of the previous line, and so you no longer need the data = .
:
%>%
task filter(xdrug == "drug_no") %>%
aov(formula = response ~ xshock) %>%
summary()
Df Sum Sq Mean Sq F value Pr(>F)
xshock 1 55.12 55.12 39.61 6.15e-07 ***
Residuals 30 41.75 1.39
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The logic, I think, is that the first two inputs to aov
are formula
and data
in that order; since we specified formula
by name, the first remaining input to aov
is now data
, so that’s where the output from the filter
goes.
- Why is there no value in running Tukey as a followup to any of your simple effects? Explain briefly.
Because the xshock
factor only has two levels, so if you find a significant difference due to xshock
, you already know that its two levels are different in terms of response
without any further analysis being needed.
I suppose the bit up to the first comma is all you need, but it is better to say something like the rest of it as well, to show that you understand.
Extra: if you do it anyway (as you can find out for yourself easily enough if you are not sure), this is what you get:
%>%
task filter(xdrug == "drug_no") %>%
aov(response ~ xshock, data = .) %>%
TukeyHSD()
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = response ~ xshock, data = .)
$xshock
diff lwr upr p adj
shock_yes-shock_no -2.625 -3.476797 -1.773203 6e-07
you get exactly the same P-value as you got before, so you have really gained nothing. You get a confidence interval for the difference in means, sure, but even that tells you not much more than your graph does (for subjects that did not receive the drug, the number of tasks completed was less for the subjects that received electroshock).