Worksheet 6

Published

February 9, 2025

Packages

library(tidyverse)
library(marginaleffects)

Chocolate cake

In an experiment on the preparation of chocolate cakes, three recipes for preparing the batter were compared. Recipes R1 and R2 differed in that the chocolate was added at 40C and 60C, respectively, while recipe R3 contained extra sugar. In addition, six different baking temperatures were tested: these ranged in 10C steps from 175C to 225C. (For those more used to baking in Fahrenheit, this is between about 350F and 450F.) Several measurements were made on the cakes. The one shown here is the breaking angle. One half of a slab of cake is held fixed, while the other half is pivoted about the middle until breakage occurs. The angle through which the moving half has revolved is read on a circular scale. Since breakage is gradual, the reading tends to have a subjective element. (A higher breaking angle is considered better.)

The data are in http://ritsokiguess.site/datafiles/choccake.csv. There is an additional column batch that you can ignore.

  1. Read in and display (some of) the data.

One of these days, I’m going to trick you, but today is not one of those days:

my_url <- "http://ritsokiguess.site/datafiles/choccake.csv"
choccake <- read_csv(my_url)
Rows: 270 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): recipe, batch, temp
dbl (1): breakang

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
choccake

Extra: there are three recipes and six temperatures, so there are this many replicate observations per recipe-temperature combination:

270 / (3*6)
[1] 15

In actual fact, there were 15 different batches of cake batter prepared, and each batch was used once for each recipe-temperature combination. You might expect there to be some differences among batches, and that therefore a more sophisticated analysis would account for batch as well. This is what is known as a “split-plot design”. For each recipe, each batch gave enough batter to make six cakes, one of which was baked at each temperature. In the jargon of split-plot designs, recipe is the “whole-plot” factor, and temperature is nested within batch.

But that’s by the way, since we ignore batch. The examples at the bottom of this page treat the recipe-batch1 combination as a “random effect” using a mixed model. We learn about those when we study repeated measures.

  1. Obtain a suitable boxplot of these data.

This is a grouped boxplot, with one of the explanatory variables as x and the other one as fill. I like the one that goes on the \(x\)-axis to be the one with more levels, which is here temp (best):

ggplot(choccake, aes(x = temp, y = breakang, fill = recipe)) + geom_boxplot()

Each boxplot here is based on 15 observations, which is a decent number to assess shape. Some of the distributions appear to be skewed (mainly to the right), and there are a lot of outliers, almost all at the upper end.

Extra: If you do x and colour the other way around, you’ll have three sets of six boxplots, each one a different colour:

ggplot(choccake, aes(fill = temp, y = breakang, x = recipe)) + geom_boxplot()

Then you have to distinguish six colours, and see whether the pattern in each of the three groups of boxplots is similar. This seems to me like harder work, although things are consistent enough here that you can still do it: the three sets of six boxplots are all on a level (no recipe effect), and within each set, they all head uphill with temperature in about the same kind of way.

  1. Regardless of any misgivings you may have about your boxplot, fit an (otherwise) appropriate analysis of variance model, and display the results.

We don’t have any better models to fit in two-way ANOVA, so away we go:

choccake.1 <- aov(breakang ~ recipe*temp, data = choccake)
summary(choccake.1)
             Df Sum Sq Mean Sq F value   Pr(>F)    
recipe        2    135    67.5   1.084    0.340    
temp          5   2100   420.1   6.742 6.44e-06 ***
recipe:temp  10    206    20.6   0.331    0.972    
Residuals   252  15702    62.3                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Extra: with the outliers mostly at the top, this is the kind of situation where a transformation might be useful, with a transformation like square root or log being the kind of thing to expect. I would probably go with square root here, since the skewness doesn’t seem to be extreme, but you can also fire up Box-Cox to see whether that agrees:

library(MASS, exclude = "select")
boxcox(breakang ~ recipe*temp, data = choccake)

Well, I was wrong; the right transformations include reciprocal (power \(-1\)) and one-over-square-root (power \(-0.5\)). Let’s try the latter, first for the boxplot and second for the ANOVA:

ggplot(choccake, aes(x = temp, y = breakang^(-0.5), fill = recipe)) + geom_boxplot()

The distributions do look more symmetric now. There are still some outliers, but maybe with 15 observations per group we can get away with that. Then:

choccake.1a <- aov(breakang^(-0.5) ~ recipe*temp, data = choccake)
summary(choccake.1a)
             Df  Sum Sq  Mean Sq F value   Pr(>F)    
recipe        2 0.00174 0.000871   2.081    0.127    
temp          5 0.01667 0.003335   7.968 5.44e-07 ***
recipe:temp  10 0.00138 0.000138   0.329    0.973    
Residuals   252 0.10547 0.000419                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The P-values are not changed that much, so the transformation is not making as much of a difference as you might expect.

In any case, the point of these questions is to give you some practice with a two-way ANOVA, and I didn’t want to throw transformations into the mix.

  1. What do you conclude from the \(F\)-test in the above analysis, in the context of the data?

There is no significant interaction between recipe and temperature. That is, the breakage angle of the cake does not depend on the combination of recipe and temperature.

This, you’ll recall, is as far as you go. The next stage would be to remove the interaction term, re-fit and see about effects of recipe and/or temperature.

  1. Can you improve your model of the previous part? If so, do so; if not, explain briefly why not.

Remove the non-significant interaction:

choccake.2 <- update(choccake.1, .~. - recipe:temp)
summary(choccake.2)
             Df Sum Sq Mean Sq F value   Pr(>F)    
recipe        2    135    67.5   1.112     0.33    
temp          5   2100   420.1   6.918 4.37e-06 ***
Residuals   262  15908    60.7                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You could reasonably also remove the non-significant recipe now that you have removed the non-significant interaction. Up to you.

Extra: The interaction was so far from being significant that the P-values for recipe and temperature have barely changed.

  1. Obtain an analysis of the simple effects of recipe at temperature 175C.

This means to only look at this one temperature, and test for an effect of recipe on breaking angle:

choccake %>% filter(temp == "175C") -> temp175
choccake.3 <- aov(breakang ~ recipe, data = temp175)
summary(choccake.3)
            Df Sum Sq Mean Sq F value Pr(>F)
recipe       2   38.6   19.29   0.377  0.688
Residuals   42 2146.4   51.10               

This is all you need to do: there is no significant effect of recipe at this temperature, so there are no differences among recipes to find (and therefore there is no need to run Tukey).

  1. Obtain an analysis of the simple effects of recipe at temperature 225C.

I literally copied, pasted, and edited the question, so you can literally copy, paste, and edit your code:

choccake %>% filter(temp == "225C") -> temp225
choccake.4 <- aov(breakang ~ recipe, data = temp225)
summary(choccake.4)
            Df Sum Sq Mean Sq F value Pr(>F)
recipe       2      4    1.76   0.021   0.98
Residuals   42   3583   85.30               

Once again, this is all you need.

  1. Compare the results of the previous two parts. Considering the results of your previous work, explain briefly why this not a surprise.

Neither of the two simple effects analyses shows any significant differences among recipes, at either of the two temperatures. This is not surprising, because the interaction term was not significant earlier, and the analysis you did without it shows that there is an effect of temperature (that applies to all recipes) and no effect of recipe (that applies to all temperatures). Hence, we would expect a simple effects analysis to consistently show no difference among the recipes whichever temperature we are looking at, which is exactly what happened.

Extra: This is a very boring simple effects analysis. You could do all the temperatures, if you are in the mood for unparalleled tedium:

library(broom) # for tidy, below
choccake %>% 
  nest_by(temp) %>% 
  rowwise() %>% 
  mutate(my_aov = list(aov(breakang ~ recipe, data =  data))) %>%
  mutate(p_val = tidy(my_aov)$p.value[1])

There is, as the non-significant interactions indicate, no difference between the recipes at any of the temperatures.2

The time when it gets interesting is when the interaction is significant. If that had been the case here, we might have found a difference among recipes for some temperatures but not others, or maybe that one of the recipes is best at certain temperatures but a different recipe is best at others.

Let’s revisit our graph from earlier:

ggplot(choccake, aes(x = temp, y = breakang, fill = recipe)) + geom_boxplot()

The three coloured boxplots within each temperature are pretty much side by side (to within the amount of variability there is), but as you read across from left to right, the trios of boxplots are going gradually uphill: that is to say, as temperature increases, the breakage angle tends to increase as well. My guess is that there is some optimal temperature at which the breakage angle is largest (which might be 215 or 225 degrees), and if the experiment had included some higher temperatures, we would have seen the average breakage angle going down again.

Drug and electroshock and solving simple tasks

64 subjects were given some simple tasks to do, and the number of tasks completed in 10 minutes was recorded. This is shown in the column response in our data. Before attempting the tasks, each subject was randomly assigned to either receive or not receive a drug (in xdrug), and also to receive or not receive electroshock therapy (in xshock). The researchers were interested in whether either the drug or the electroshock or the combination of both affected the number of tasks the subject could solve. (The subjects were all fully informed of the risks, and agreed to sign consent forms.) The data are in http://ritsokiguess.site/datafiles/drugshock.csv.

  1. Read in and display (some of) the data.

As usual. My dataframe is called task (call yours whatever you like):

my_url <- "http://ritsokiguess.site/datafiles/drugshock.csv"
task <- read_csv(my_url)
Rows: 64 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): xdrug, xshock
dbl (1): response

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
task

Note that the categories of the two categorical variables were not just yes and no, but variations on yes and no that you can tell apart.

Extra: nothing challenging for you, but I had some work to do to get the data into this form for you (mostly done while waiting at Service Ontario to get my health card renewed). The data came from here:

data("Shkdrug", package = "BSDA")
Shkdrug

I think this was originally intended for a one-way analysis treating the four treatment combinations as levels of a single factor, but it looks like a two-factor analysis to me. Let’s see what those treatments are:

Shkdrug %>% count(treatment)

It looks like Drug and NoDrug, but sometimes NoDrug is NoDg. It also looks like Shock and NoShock, except that it isn’t that either. So we have some tidying up to do. First, split up the treatment column into two columns, one for the drug and one for electroshock, which is done with separate_wider_delim:

Shkdrug %>% separate_wider_delim(treatment,
                                 names = c("xdrug", "xshock"),
                                 delim = "/") -> task
task

I called my new dataframe task, and for some reason called the new columns xdrug and xshock. Possibly I was distracted by waiting for my number to come up. Anyway. The next thing is to sort out those category levels. There are several possibilities for both xdrug and xshock, and we need to handle them all, so case_when is better than ifelse:

task %>% 
  mutate(xdrug = case_when(xdrug == "Drug"   ~ "drug_yes",
                           xdrug == "NoDg"   ~ "drug_no",
                           xdrug == "NoDrug" ~ "drug_no"
                           )) %>% 
  mutate(xshock = case_when(xshock == "NoS" ~ "shock_no",
                            xshock == "Shk" ~ "shock_yes",
                            xshock == "S"   ~ "shock_yes")) -> task
task

case_when always looks better if you take five seconds to line things up. If you don’t, it makes your reader wonder what else you are sloppy about.

I gave the new categories names starting with drug_ and shock_ rather than just yes and no, so as to be less confusing for you later. Let’s check that we have something sensible now:

task %>% count(xdrug, xshock)

There were supposed to be 16 subjects in each subcategory, so it looks good. This is what I saved for you.

  1. Draw a suitable graph of all three variables. Here, and in the next question, put drug on the \(x\) axis.

With one quantitative and two categorical variables, a grouped boxplot is the standard one:

ggplot(task, aes(x = xdrug, y = response, fill = xshock)) + geom_boxplot()

The other categorical variable, xshock, goes as fill and is distinguished by colour. Using colour instead of fill will outline the boxes with the appropriate colour, and is equally good.

You might already be suspecting an interaction, since there appears to be an effect of electroshock for the subjects that did not get the drug, but not for the subjects that did get the drug. Seeing the effect of one factor depending on the level of the other is a hallmark of an interaction.

I asked you to put xdrug on the \(x\)-axis just to give the grader something more uniform to assess. In actual fact, you could just as well have xshock on the \(x\)-axis and xdrug as fill.

  1. Draw an interaction plot for these data.

Again, put xdrug on the \(x\)-axis. For this graph, you need the mean value of response for each combination of xdrug and xshock, which is a group-by and summarize. Then plot the means as points joined by lines for each value of xshock, which is done by setting xshock as both colour and group:

task %>% group_by(xdrug, xshock) %>% 
  summarize(mean_res = mean(response)) %>% 
  ggplot(aes(x = xdrug, y = mean_res, colour = xshock, group = xshock)) + 
    geom_point() + geom_line()
`summarise()` has grouped output by 'xdrug'. You can override using the
`.groups` argument.

One point for finding the means, two for making the graph from them (the latter is the main thing here), when this was graded. Other approaches:

  • you can save the result of the mean calculation in a dataframe, d say, and then use that dataframe as the first input to ggplot
  • I didn’t talk about stat_summary in lecture, but it’s in the slides, and if you can make it work to get this graph, I’m good with that.
  1. Based on your two graphs, do you expect to see an interaction, and do you expect it to be significant? Explain briefly.

The two lines on the interaction plot do not seem at all parallel, so we should expect to see an interaction. To decide whether it is going to be significant, remember that the interaction plot does not show variability, and to get a sense of how much variability there is, we need to go back to the boxplot. The two boxes on the left look definitely different and the two on the right definitely do not, so we expect to see significance somewhere. (To be more precise, an interaction means that the effect of xshock depends on the level of xdrug, and what we just said means that it definitely does appear to.)

  1. Run a suitable ANOVA and display the results. Were your suspicions confirmed?

Include an interaction as the first step of any two-way ANOVA, and see if it is significant:

task.1 <- aov(response ~ xdrug * xshock, data = task)
summary(task.1)
             Df Sum Sq Mean Sq F value   Pr(>F)    
xdrug         1   1.00    1.00   0.568  0.45398    
xshock        1  39.06   39.06  22.189 1.51e-05 ***
xdrug:xshock  1  18.06   18.06  10.260  0.00218 ** 
Residuals    60 105.62    1.76                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With a P-value of 0.002, the interaction effect is definitely significant, confirming what we saw from the graphs.

Extra: The strong significance of xshock is not relevant; it is actually significant because subjects who had the electroshock treatment got fewer of the tasks done on average than those who did not, but by studying the interaction with simple effects (next), we get a more nuanced conclusion than that.

  1. Why is a simple effects analysis appropriate here? Carry out a simple effects analysis of xshock for each level of xdrug, stating your conclusions in the context of the data.

A simple effects analysis is appropriate because the interaction is significant, and this will help us to understand why it was significant. One point.

Two points for each of the simple effects analyses, one for each level of xdrug. In each case, extract the level of xdrug you care about, run an aov predicting response from xshock, and display the results. You might need to find out the two levels of xdrug: you can either go back to the data and look, or you can count them:

task %>% count(xdrug)

Then do the two analyses, for example like this:

task %>% 
  filter(xdrug == "drug_no") %>% 
  aov(response ~ xshock, data = .) %>% 
  summary()
            Df Sum Sq Mean Sq F value   Pr(>F)    
xshock       1  55.12   55.12   39.61 6.15e-07 ***
Residuals   30  41.75    1.39                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For the subjects who did not get the drug, there is a significant effect of xshock on response (P-value 0.0000006), and you can check the plots to see what kind of effect it was: if a subject did not get the drug, they did significantly more tasks if they didn’t get electroshock either.

And then the other one:

task %>% 
  filter(xdrug == "drug_yes") %>% 
  aov(response ~ xshock, data = .) %>% 
  summary()
            Df Sum Sq Mean Sq F value Pr(>F)
xshock       1   2.00   2.000   0.939   0.34
Residuals   30  63.88   2.129               

With a P-value of 0.34, this is not significant, so that if a subject did get the drug, then electroshock (or not) made no difference to the number of tasks they completed.

These conclusions, you note, are entirely what you would have guessed from the graphs, but now you have P-values to support them with. Two points for each filter, aov, summary, and conclusion in words in the context of the data.

I mentioned the effect of xshock in our first aov, the one with the interaction in it. Now you see that inferring an overall effect of xshock would have been a mistake: yes, subjects who received the electroshock treatment completed fewer tasks on average overall, but the reason for that was that the subjects who received electroshock completed a lot fewer tasks if they did not get the drug, and it took the simple effects analysis to enable us to see that. Hence, trying to interpret a main effect when the interaction is significant is a mistake, and doing so will likely mislead you.

Comments: I did these in one pipeline each, because that seemed to be simplest, and I knew I was only doing one thing each time (running aov and displaying the results). There is no reason to do Tukey here (see next part); if there was, it would be better to save the result of the aov each time, look at the summary of it, and then if significant look at the TukeyHSD of it. There are a couple of technical details in the way I did it, both concerning the fact that in a pipeline there is “the thing that came out of the previous step” that is taken as the first input of the next step. In aov, what came out of the filter is going to be the dataframe used in data, which is not (by default) first, so I used the special . notation to indicate “here is where the result of the previous step goes”. In the next line, the output of aov is the input to summary, which is how a pipeline works, so I don’t have to say what it is called (it doesn’t have a name beyond . anyway).

For you, feel free to save the result of the filter and/or the result of the aov and use them as inputs to the next thing. As long as you get the same two ANOVA tables as I did by what looks like sensible code, I am happy.

Extra 1: As far as I can tell, there is no particular reason for this dataset to condition on xdrug and look for an effect of xshock. It would be perfectly good to do it the other way around. In the auto noise example in lecture, we cared more about the effect of filter type and the engine size was like a “block” (in experimental design terms, a factor that makes a difference but that we don’t really care about; putting it in the analysis will reduce the error sum of squares and thus give us better tests for the things we really do care about). In that context, it makes more sense to condition on the block and say “in this block, this happened”. Here, though, you can imagine that the researchers were interested in both factors, and so there was no special reason to do it this way around. I asked you to do it like this to be consistent with your graphs (which I was also more specific with than I might have been), so that you could find out why your significant simple effect was significant by looking straight at your graphs. I invite you to switch everything around: make xshock the \(x\) on your graphs, and condition on xshock for your simple effects, and see what difference it makes to your conclusions.

Extra 2: I did some digging and found out that the model formula input of aov is called formula. It seems that if you specify that by name, it will take data = from the output of the previous line, and so you no longer need the data = .:

task %>% 
  filter(xdrug == "drug_no") %>% 
  aov(formula = response ~ xshock) %>% 
  summary()
            Df Sum Sq Mean Sq F value   Pr(>F)    
xshock       1  55.12   55.12   39.61 6.15e-07 ***
Residuals   30  41.75    1.39                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The logic, I think, is that the first two inputs to aov are formula and data in that order; since we specified formula by name, the first remaining input to aov is now data, so that’s where the output from the filter goes.

  1. Why is there no value in running Tukey as a followup to any of your simple effects? Explain briefly.

Because the xshock factor only has two levels, so if you find a significant difference due to xshock, you already know that its two levels are different in terms of response without any further analysis being needed.

I suppose the bit up to the first comma is all you need, but it is better to say something like the rest of it as well, to show that you understand.

Extra: if you do it anyway (as you can find out for yourself easily enough if you are not sure), this is what you get:

task %>% 
  filter(xdrug == "drug_no") %>% 
  aov(response ~ xshock, data = .) %>% 
  TukeyHSD()
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = response ~ xshock, data = .)

$xshock
                     diff       lwr       upr p adj
shock_yes-shock_no -2.625 -3.476797 -1.773203 6e-07

you get exactly the same P-value as you got before, so you have really gained nothing. You get a confidence interval for the difference in means, sure, but even that tells you not much more than your graph does (for subjects that did not receive the drug, the number of tasks completed was less for the subjects that received electroshock).

Footnotes

  1. In the dataset there, what we call batch is there called replicate.↩︎

  2. There is a bit of fiddling involved to get the ANOVA P-values out. This is one way to do it.↩︎