STAC33 Assignment 7

You are expected to complete this assignment on your own: that is, you may discuss general ideas with others, but the writeup of the work must be entirely your own. If your assignment is unreasonably similar to that of another student, you can expect to be asked to explain yourself.

If you run into problems on this assignment, it is up to you to figure out what to do. The only exception is if it is impossible for you to complete this assignment, for example a data file cannot be read. (There is a difference between you not knowing how to do something, which you have to figure out, and something being impossible, which you are allowed to contact me about.)

You must hand in a rendered document that shows your code, the output that the code produces, and your answers to the questions. This should be a file with .html on the end of its name. There is no credit for handing in your unrendered document (ending in .qmd), because the grader cannot then see whether the code in it runs properly. After you have handed in your file, you should be able to see (in Attempts) what file you handed in, and you should make a habit of checking that you did indeed hand in what you intended to, and that it displays as you expect.

Hint: render your document frequently, and solve any problems as they come up, rather than trying to do so at the end (when you may be close to the due date). If your document will not successfully render, it is because of an error in your code that you will have to find and fix. The error message will tell you where the problem is, but it is up to you to sort out what the problem is.

1 Smoke stack emissions

A manufacturing company measured the smoke stack emissions of carbon monoxide and compared them with the corresponding values from another company, with the aim of showing that the company’s emissions of carbon monoxide were lower than the competitor’s. The data are in http://ritsokiguess.site/datafiles/smokestack.csv. The column company shows whether each emission value comes from the manufacturer or its competitor, and the emission value is given in suitable units.

(a) (1 point) Read in and display (some of) the data.

As expected:

my_url <- "http://ritsokiguess.site/datafiles/smokestack.csv"
smokestack <- read_csv(my_url)

Rows: 19 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): company
dbl (1): emission

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

smokestack

(b) (2 points) Make a suitable graph of the data.

Two choices here:

the standard graph is a boxplot:

ggplot(smokestack, aes(x = company, y = emission)) + geom_boxplot()

if you look ahead to the next part, you might recognize that normality is going to be an issue, in which case you could reasonably draw normal quantile plots, one for each company, facetted:

ggplot(smokestack, aes(sample = emission)) + stat_qq() + stat_qq_line() +
  facet_wrap(~ company)

(c) (2 points) Bearing in mind that there are 9 observations of the `manufacturer` and 10 for the `competitor`, explain briefly why it might be better to use Mood’s median test as opposed to some kind of \(t\)-test.

The boxplots give an easier route to an answer here: both distributions are left-skewed, and the sample sizes are not large (so that we cannot expect much help from the Central Limit Theorem), so we should prefer a Mood’s median test over a two-sample \(t\)-test.

If you drew normal quantile plots, you have to work a bit harder, but you only need to find one distribution you don’t like. The competitor distribution has sort of a curved shape, so you could argue that this is skewed to the left; about all that’s wrong with the manufacturer distribution is that it has a short upper tail, which is not itself an argument against a \(t\)-test.

(d) (4 points) Run a suitable Mood’s median test. What do you conclude from it, in the context of the data?

Load smmr if you have not already done so, then:

median_test(smokestack, emission, company)

$grand_median
[1] 3.4

$table
              above
group          above below
  competitor       6     3
  manufacturer     0     6

$test
       what       value
1 statistic 6.666666667
2        df 1.000000000
3   P-value 0.009823275

The null hypothesis is that the manufacturer and the competitor have the same median emissions level, and the alternative should be that the manufacturer’s median emission is less (one-sided: see the beginning of the question).

The P-value here is two-sided, so (i) check that we are on the correct side. All the manufacturer’s emissions are below the grand median of 3.4 (some of them are equal but they are discarded from the calculation), while most of the competitors’ are above, so this is pointing towards the alternative and we are on the correct side. Then (ii) take the two-sided P-value of 0.0098 and halve it to get 0.0049. This is smaller than 0.05, so we can reject the null in favour of our one-sided alternative and conclude that median emissions are less for the manufacturer than for its competitor.

2 Cleansing skin

An experiment was designed to compare three different skin cleansing agents, labelled A, B, and C. Forty-five subjects with similar skin conditions were randomly assigned to one of the three agents, shown in agent. A patch of skin on each individual was exposed to a contaminant and then cleansed with the cleansing agent assigned to that individual. After 8 hours, the residual contaminants were measured. The results are shown in clean. A lower value means that the cleansing agent was more effective (that is, that there was less contaminant remaining). The data are in http://ritsokiguess.site/datafiles/clean.csv.

(a) (1 point) Read in and display (some of) the data.

An absence of surprises here:

my_url <- "http://ritsokiguess.site/datafiles/clean.csv"
skin <- read_csv(my_url)

Rows: 45 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): agent
dbl (1): clean

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

skin

Choose your own name for the dataframe, but avoid clean, since that is the name of one of the columns.

(b) (2 points) Make a suitable graph of the data, bearing in mind that you want to make a general comparison of the effectiveness of the cleansing agents.

The standard graph, with one categorical variable and one quantitative one, is a boxplot:

ggplot(skin, aes(x = agent, y = clean)) + geom_boxplot()

Unlike the example in lecture, this is a perfectly good order to display the cleansing agents in. You might be familiar with the terms “nominal” and “ordinal”: this one is nominal, where the values in agent are just labels for the cleansing agents, and you could display them in any order. In the lecture example with the jumping rats, the group was ordinal, because the amount of jumping was in order (from low to high on the second boxplot in lecture).

The last part of the question statement (after the comma) is to encourage you to draw a boxplot rather than something like a normal quantile plot. A normal quantile plot is great for assessing normality but is not great for anything else (like assessing medians or IQRs).

(c) (2 points) Explain briefly why you would be happy to run a standard ANOVA in this situation.

Looking at the boxplot, it seems that the three groups all have more or less equal spreads. The sample sizes (15 in each group) should be large enough to overcome any (mild, moderate, your choice of word) skewness we see.

You know where you want to get to, so your job is to make an argument that will get you there.

Extra: the data values are all recorded as integers, so the normal quantile plot looks like this:

ggplot(skin, aes(sample = clean)) + stat_qq() + stat_qq_line() +
  facet_wrap(~ agent)

The horizontal bands of points reflect that there are several equal observations (for agent A: 2, 3, 4, and 5). The best you can do for assessing normality here is to check that the points are all reasonably close to the line, which I think they are; they won’t hug the line perfectly as they might if the data values had been recorded more accurately.

(d) (3 points) Run a suitable analysis of variance, and display the results. What do you conclude, in the context of the data?

In the light of the previous part, a regular aov:

skin.1 <- aov(clean ~ agent, data = skin)
summary(skin.1)

            Df Sum Sq Mean Sq F value   Pr(>F)    
agent        2 108.13   54.07   47.44 1.68e-11 ***
Residuals   42  47.87    1.14                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The null hypothesis is that all the cleansing agents have the same mean residual contaminant. The P-value of \(1.7 \times 10^{-11}\) is very small, so this null hypothesis is rejected, and not all the cleansing agents have the same mean residual contaminant. If you prefer, “at least one of the cleansing agents has a different mean residual contaminant than the others”.

Give the P-value, and be careful with your conclusion: “the means are different”, or something like that, is not precise enough, because it could either mean that all the means are different (which we are not entitled to conclude yet), or that some of the means are different.

(e) (3 points) Are you justified in running Tukey’s method here? Explain briefly. If justified, run it, and interpret the results in the context of the data.

The analysis of variance was significant, so we need to use Tukey’s method to see which agents differ significantly from which in terms of residual contaminant:

TukeyHSD(skin.1)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = clean ~ agent, data = skin)

$agent
         diff       lwr        upr     p adj
B-A  3.733333  2.786274  4.6803925 0.0000000
C-A  2.466667  1.519608  3.4137258 0.0000004
C-B -1.266667 -2.213726 -0.3196075 0.0063206

All three P-values are smaller than 0.05, so all three cleansing agents differ significantly from the others in how well they clean. (The word “all”, or a close synonym, needs to be in your answer to make it clear that every single one is different from the others; otherwise, it might be like the example from lecture where one was different, but two were the same.)

(f) (2 points) Which cleansing agent or agents would you recommend for future use, based on the results of this study? Explain briefly.

A small amount of residual contamination is best. There are two things you need to say:

Agent A has the smallest mean residual contamination (most easily seen from the boxplot, but you can work it out from the Tukey output)
Agent A’s residual contamination is significantly smaller than that of both the other two agents, from the Tukey analysis.

The significance is also important, because you don’t want to recommend something that came out best by chance; you want to make a recommendation that will stand up to replication (as Fisher said, “a well-designed experiment will rarely fail to yield this level of significance”). Agent A is significantly better than the others, so you can expect it to be consistently the best.

Extras: it is not much work to run either of the other two possible analyses. If you were happy with the normality but not with the equal spreads, you would run a Welch ANOVA:

oneway.test(clean ~ agent, data = skin)


    One-way analysis of means (not assuming equal variances)

data:  clean and agent
F = 46.061, num df = 2.000, denom df = 27.949, p-value = 1.423e-09

and, seeing the significance, follow up with Games-Howell:

gamesHowellTest(clean ~ factor(agent), data = skin)


    Pairwise comparisons using Games-Howell test

data: clean by factor(agent)

  A       B     
B 1.1e-09 -     
C 1.3e-06 0.0093


P value adjustment method: none

alternative hypothesis: two.sided

The technical detail is that this function requires the categorical variable to be an actual factor, not just text, so we have to make it one. The results are the same as for the Tukey, and the P-values are all pretty similar. In the same way that the two variants of two-sample \(t\)-test will give similar results when they both apply, the two variants of ANOVA will do the same thing.¹

The other option is Mood’s median test. You can do it here, but it is probably not the best option because aov appears to be just fine:

median_test(skin, clean, agent)

$grand_median
[1] 6

$table
     above
group above below
    A     0    15
    B     9     1
    C     4     6

$test
       what        value
1 statistic 2.086538e+01
2        df 2.000000e+00
3   P-value 2.945366e-05

One again, strong significance (though not quite as strong as the other two tests, which hints at those other two tests being better for these data). The table of aboves and belows gives a pretty strong hint about why.²

Pairwise median tests to follow, therefore:

pairwise_median_test(skin, clean, agent)

The test for agents B vs C is not quite significant this time (P-value 0.057). The other two comparisons are still significant, though with larger P-values than in the other two analyses. Once again, we are probably suffering from a lack of power compared to the regular and Welch ANOVAs. All three tests are valid, but I think the aov is best because its assumptions appear to be all right here.

Footnotes

Welch’s algebra applied just as well to ANOVA as to the two-sample \(t\)-test, so you would expect things to be similar in practice for the two variants of ANOVA when they both apply. For that matter, you would also expect regular ANOVA to fail badly when it does not apply, when the groups have obviously different spreads, in just the same way that a pooled \(t\)-test can fail badly when it does not apply.↩︎
There were 15 observations in each group, but we appear to have “lost” a lot of observations from cleansing agents B and C. This is because their values were equal to the grand median of 6. The observations were only recorded as integers, so a lot of them are equal to each other, and the ones that happened to be equal to the grand median all got thrown away.↩︎

1 Smoke stack emissions

(a) (1 point) Read in and display (some of) the data.

(b) (2 points) Make a suitable graph of the data.

(c) (2 points) Bearing in mind that there are 9 observations of the manufacturer and 10 for the competitor, explain briefly why it might be better to use Mood’s median test as opposed to some kind of \(t\)-test.

(d) (4 points) Run a suitable Mood’s median test. What do you conclude from it, in the context of the data?

2 Cleansing skin

(a) (1 point) Read in and display (some of) the data.

(b) (2 points) Make a suitable graph of the data, bearing in mind that you want to make a general comparison of the effectiveness of the cleansing agents.

(c) (2 points) Explain briefly why you would be happy to run a standard ANOVA in this situation.

(d) (3 points) Run a suitable analysis of variance, and display the results. What do you conclude, in the context of the data?

(e) (3 points) Are you justified in running Tukey’s method here? Explain briefly. If justified, run it, and interpret the results in the context of the data.

(f) (2 points) Which cleansing agent or agents would you recommend for future use, based on the results of this study? Explain briefly.

Footnotes

(c) (2 points) Bearing in mind that there are 9 observations of the `manufacturer` and 10 for the `competitor`, explain briefly why it might be better to use Mood’s median test as opposed to some kind of \(t\)-test.