Questions are below. My solutions are below all the question parts for a question; scroll down if you get stuck. There may be extra discussion below that for some of the questions; you might find that interesting to read, maybe after tutorial.

For these worksheets, you will learn the most by spending a few minutes thinking about how you would answer each question before you look at my solution. There are no grades attached to these worksheets, so feel free to guess: it makes no difference at all how wrong your initial guess is!

questions first, from qq file

# 1 Child psychology

According to research in child psychology, working mothers spend a mean time of 11 minutes per day talking to their children, with a standard deviation of 2.3 minutes. Your research suggests that the mean should be greater than that, and you are planning a study of working mothers (who work outside the home) to see how many minutes they spend talking to their children. You think that the mean should be 12 minutes per day, and you want to design your study so that a mean of 11 should be rejected with a reasonably high probability.

1. If you interview 20 working mothers, what is the power of your test if your thought about the mean is correct? Estimate by simulation. Assume that the time that a mother spends talking to her children has a normal distribution.

2. Explain briefly why power.t.test can be used to calculate an answer to this problem, and use it to check your result in the previous part.

3. A standard level of power in studies in child psychology is 80% (0.80). According to what you have seen so far, is it necessary to interview 20 working mothers, or more, or less? Explain briefly. Use power.t.test to obtain an appropriate sample size, under the assumptions you have made.

My solutions

According to research in child psychology, working mothers spend a mean time of 11 minutes per day talking to their children, with a standard deviation of 2.3 minutes. Your research suggests that the mean should be greater than that, and you are planning a study of working mothers (who work outside the home) to see how many minutes they spend talking to their children. You think that the mean should be 12 minutes per day, and you want to design your study so that a mean of 11 should be rejected with a reasonably high probability.

1. If you interview 20 working mothers, what is the power of your test if your thought about the mean is correct? Estimate by simulation. Assume that the time that a mother spends talking to her children has a normal distribution.

Solution

For me, to kick off:

set.seed(457299)

The advantage to doing this is that every time you re-run code in your notebook (starting from the beginning and going all the way through), the set of random numbers in it will be the same, so that if you have talked about a set of random numbers that came out a certain way, your description won’t need to change if you run things again. The number inside the set.seed tells the random number generator where to start, and can be anything; some people use 1 or 123. Mine is an old phone number.

Now to work. The true mean (as far as you are concerned) is 12, so sample from a normal distribution with that mean (and the same standard deviation as the other research, since you don’t have a better value).1 The null mean is the one from the previous research, namely 11 (the one you want to reject), and because you want to prove that the mean is in fact greater than 11, you need an upper-tailed alternative. That leads to this:

library(tidyverse)

Then2 follow the familiar six lines of code:

tibble(sim = 1:1000) %>%
rowwise() %>%
mutate(my_sample = list(rnorm(20, 12, 2.3))) %>%
mutate(t_test = list(t.test(my_sample, mu = 11, alternative = "greater"))) %>%
mutate(p_value = t_test$p.value) %>% count(p_value <= 0.05) The estimated power is $$606/1000 = 0.606$$. By interviewing 20 working mothers, you are likely to be able to reject the null hypothesis that the mean time spent talking to their children is 11 minutes per day, in favour of an alternative that the mean is larger than this. Your answer will likely differ from mine by a bit (see below for how big “a bit” might be), but it should be somewhere near this. Indeed, if your answer is exactly the same as mine or as one of your classmates, this may cause the grader to look again at your code to make sure that what you did was sensible. The code: • set up a data frame with places for 1000 (or 10,000 or whatever) simulated samples • work one row at a time from here forward • draw a random sample of size 20 from the truth (as far as you know what it is) for each row • run a $$t$$-test testing whether the mean is 11 for each row (which you know, or at least think, it isn’t, but the onus is on your research to prove itself over the current state of affairs) • get the P-value from each test • count how many of those P-values are less than or equal to 0.05. The number of times this is true, divided by your number of simulations, is your estimate of the power. $$\blacksquare$$ 1. Explain briefly why power.t.test can be used to calculate an answer to this problem, and use it to check your result in the previous part. Solution We can use power.t.test for two reasons: (i) we actually are doing a $$t$$-test (for a population mean), and (ii) we are assuming that the times spent talking to children have a normal distribution. For power.t.test we need to specify the sample size n, the difference between the true and null means delta, the population SD sd, the type of test (a one-sample $$t$$), and whether it is one- or two-sided: power.t.test(n = 20, delta = 12 - 11, sd = 2.3, type = "one.sample", alternative = "one.sided") ## ## One-sample t test power calculation ## ## n = 20 ## delta = 1 ## sd = 2.3 ## sig.level = 0.05 ## power = 0.5908187 ## alternative = one.sided If you are like me, you probably typed alternative = "greater", as you would with t.test, but then the error message will tell you what you actually need, either one.sided or two.sided. You need to be aware that information about the one-sidedness of your test has to find its way into power.t.test somehow; it turns out that it doesn’t matter which kind of one-sided test you have (greater or less), just that you have a one-sided test at all.3 The power here is actually 0.590. My simulated value of 0.606 was close. Yours will probably be different from mine, but it ought to be somewhere near to 0.590. (See the Extra for how close you would expect to be.) $$\blacksquare$$ 1. A standard level of power in studies in child psychology is 80% (0.80). According to what you have seen so far, is it necessary to interview 20 working mothers, or more, or less? Explain briefly. Use power.t.test to obtain an appropriate sample size, under the assumptions you have made. Solution Our power is smaller than the target power of 0.8, so our sample size should be bigger than 20. (A bigger sample size will give a bigger power.) Use power.t.test, inserting a value for power, and leaving out n (since that is what we want the function to find for us): power.t.test(power = 0.80, delta = 12 - 11, sd = 2.3, type = "one.sample", alternative = "one.sided") ## ## One-sample t test power calculation ## ## n = 34.10007 ## delta = 1 ## sd = 2.3 ## sig.level = 0.05 ## power = 0.8 ## alternative = one.sided Round the sample size up to 35. We need to interview 35 working mothers to get 80% power. $$\blacksquare$$ # 2 Two-sample power Suppose we have two populations, which are supposing to both be normally distributed. The first population has a mean of 20, and the second population has a mean of 25. Both populations have the same SD of 9. 1. Suppose we take samples of 30 observations from each population. Use power.t.test to find the probability that a two-sample $$t$$-test will (correctly) reject the null hypothesis that the two populations have the same mean, in favour of a one-sided alternative. (Hint: delta should be positive.) 2. Find the sample size needed to obtain a power of 0.75. Comment briefly on whether your sample size makes sense. 3. Reproduce your power result from (a) by simulation. Some things to consider: • you will need to generate two columns of random samples, one from each population • t.test can also run a two-sample $$t$$-test by giving the two columns separately, rather than as we have done it before by having a column with all the measurements and a separate column saying which group they came from. • you will need to get the right alternative. With two columns input like this, the alternative is relative to the column you give first. 1. Give an example of a situation where the simulation approach could be used and power.t.test not. My solutions: 1. Suppose we take samples of 30 observations from each population. Use power.t.test to find the probability that a two-sample $$t$$-test will (correctly) reject the null hypothesis that the two populations have the same mean, in favour of a one-sided alternative. (Hint: delta should be positive.) Solution Note first that power.t.test works here because (i) the populations have normal distributions, (ii) the SDs are the same, (iii) the sample sizes from each population are the same. Then, subtracting the means so that the bigger one is first (and thus delta is positive): power.t.test(n = 30, delta = 25-20, sd = 9, type = "two.sample", alternative = "one.sided") ## ## Two-sample t test power calculation ## ## n = 30 ## delta = 5 ## sd = 9 ## sig.level = 0.05 ## power = 0.6849538 ## alternative = one.sided ## ## NOTE: n is number in *each* group The power is about 0.68. (Give about one more decimal if you prefer.) You’ll recall that we don’t say which one side we want to reject for. The way power.t.test does it is to assume the first population has the bigger mean, which is why I said to use a positive delta. (Otherwise you’ll get an extremely small power, which corresponds to concluding that the population with the smaller mean actually has the bigger one, which you could imagine is not going to happen much.) $$\blacksquare$$ 1. Find the sample size needed to obtain a power of 0.75. Comment briefly on whether your sample size makes sense. Solution The coding part is not meant to be hard: take what you just did, and replace the n with the power you are shooting for: power.t.test(power = 0.75, delta = 25-20, sd = 9, type = "two.sample", alternative = "one.sided") ## ## Two-sample t test power calculation ## ## n = 35.55396 ## delta = 5 ## sd = 9 ## sig.level = 0.05 ## power = 0.75 ## alternative = one.sided ## ## NOTE: n is number in *each* group The required sample size is 36 in each group, which, were you doing this on an assignment, you would need to say. (Remember also to round up to the next whole number, since you want at least that much power.) As to why the answer makes sense, well, before we had a sample size of 30 in each group and a power about 0.68. Our target power here was slightly bigger than that, and to increase the power (all else fixed) we need to increase the sample size as well. The sample size needed is slightly bigger than 30, which matches with the power desired being slightly bigger than 0.68. $$\blacksquare$$ 1. Reproduce your power result from (a) by simulation. Some things to consider: • you will need to generate two columns of random samples, one from each population • t.test can also run a two-sample $$t$$-test by giving the two columns separately, rather than as we have done it before by having a column with all the measurements and a separate column saying which group they came from. • you will need to get the right alternative. With two columns input like this, the alternative is relative to the column you give first. Solution With all that in mind, something like this: tibble(sim = 1:1000) %>% rowwise() %>% mutate(sample1 = list(rnorm(30, 20, 9)), sample2 = list(rnorm(30, 25, 9))) %>% mutate(my_test = list(t.test(sample1, sample2, alternative = "less"))) %>% mutate(p_val = my_test$p.value) %>%
count(p_val <= 0.05)

My estimated power is 0.675, entirely consistent with part (a).

Code notes:

• generate two columns of random samples, one from each population
• feed the two columns of samples into t.test. The first population actually had the smaller mean, so that is the one side that you would like the test to reject in favour of.

$$\blacksquare$$

1. Give an example of a situation where the simulation approach could be used and power.t.test not.

Solution

In my solution to (a) I said

“Note first that power.t.test works here because (i) the populations have normal distributions, (ii) the SDs are the same, (iii) the sample sizes from each population are the same.”

These all need to be true in order to use power.t.test, so to answer this one, describe a situation where one of them fails, eg:

• the population distributions are something other than normal
• the two populations have different SDs (spreads). I think power.t.test is using the pooled test behind the scenes.
• the sample sizes are different (for example, sampling from one population might be more expensive than sampling from the other, and you might want to achieve a certain power with one sample size being twice the other).

It is also worth noting that you could make these changes in your simulation code without great difficulty:

• change rnorm to something else
• change the 9’s in rnorm to something else (the one in sample1 different from the one in sample2)
• change the 30’s in rnorm to something else.

$$\blacksquare$$

# 3 Child psychology, revisited

This is a continuation of a previous question on this worksheet.

1. Another distribution that might be suitable for time spent talking to children is the gamma distribution. Values from a gamma distribution are guaranteed to be greater than zero (which is suitable for times spent talking to children). As far as R is concerned, a random value from a gamma distribution is generated using the function rgamma. This, for us, has three inputs: the number of values to generate, a parameter called shape for which we will use the value 27.23, and a parameter called scale for which we will use the value 0.44. Generate a random sample of 1000 values from a gamma distribution with the given parameter values. Hint: make sure that the inputs that need names actually have names.

2. Find the mean and SD of your random sample of values from the gamma distribution. Are the mean and SD somewhere close to the mean and SD you used in your first power analysis? Explain (very) briefly. Hint: for this part and the next one, it will help to have your random gamma values in a dataframe, so you may want to create one first if you didn’t already.

3. Make a histogram of your random sample of gamma-distributed values, and comment briefly on its shape.

4. Suppose now that you want to assume that the data have a gamma distribution with this scale and shape (and thus the same mean and SD that you used in 1(a)). Modify your simulation to estimate the power of the $$t$$-test against a null mean of 11 against the alternative that the mean is greater than 11, using a sample size of 20, with the data coming from this gamma distribution.

5. Compare the estimated power from 1(a) (or 1(b)) and the previous part. Does the similarity or difference make sense? Explain briefly.

My solutions

1. Another distribution that might be suitable for time spent talking to children is the gamma distribution. Values from a gamma distribution are guaranteed to be greater than zero (which is suitable for times spent talking to children). As far as R is concerned, a random value from a gamma distribution is generated using the function rgamma. This, for us, has three inputs: the number of values to generate, a parameter called shape for which we will use the value 27.23, and a parameter called scale for which we will use the value 0.44. Generate a random sample of 1000 values from a gamma distribution with the given parameter values. Hint: make sure that the inputs that need names actually have names, and organize your results as a column in a dataframe.

Solution

Following the hint, something like this:

d <- tibble(g = rgamma(1000, shape = 27.23, scale = 0.44))
d

Displaying the dataframe as usual shows the first 10 rows, which once again will enable anyone reading your work to see that you have the right kind of thing.

I am apparently using my standard “temporary dataframe name” of d. Give it and the column in it whatever names you think make sense. For example, you might use gamma for the column and gammas (plural) for the dataframe.

$$\blacksquare$$

1. Find the mean and SD of your random sample of values from the gamma distribution. Are the mean and SD somewhere close to the mean and SD you used in your first power analysis? Explain (very) briefly.

Solution

The first thing is the mean and SD of your random values. Starting from a dataframe, this is summarize:

d %>% summarize(g_mean = mean(g), g_sd = sd(g))

The mean and SD we used in the first simulation of power (in 1(a)) were 12 and 2.3 respectively, and these (for me) are very close to that. (Calculating the mean and SD and saying just “these are close” will confuse your reader, so make sure you say what they are supposed to be close to.)

$$\blacksquare$$

1. Make a histogram of your random sample of gamma-distributed values, and comment briefly on its shape.

Solution

ggplot(d, aes(x = g)) + geom_histogram(bins = 10)

This is very slightly skewed to the right, but it is not that far away from being normal.

$$\blacksquare$$

1. Suppose now that you want to assume that the data have a gamma distribution with this scale and shape (and thus the same mean and SD that you used in 1(a)). Modify your simulation to estimate the power of the $$t$$-test against a null mean of 11 against the alternative that the mean is greater than 11, using a sample size of 20, with the data coming from this gamma distribution.

Solution

power.t.test assumes normally-distributed data, which we no longer want to assume, and so we cannot use that. (We are still doing a $$t$$-test, so that part is not a problem.)

In the simulation from 1(a), there is actually only one line that needs to change, the third one, which generates the random samples from the truth. Instead of rnorm, this should have rgamma, using the shape and scale values from above, but with a sample size of 20. The test is the same, with the same hypotheses, so nothing else changes:

tibble(sim = 1:1000) %>%
rowwise() %>%
mutate(my_sample = list(rgamma(20, shape = 27.23, scale = 0.44))) %>%
mutate(t_test = list(t.test(my_sample, mu = 11, alternative = "greater"))) %>%
mutate(p_val = t_test\$p.value) %>%
count(p_val <= 0.05)

Copy, paste, and edit from what you did in 1(a). Here, change the rnorm to rgamma and put the right numbers in.

My estimated power is 0.586.

$$\blacksquare$$

1. Compare the estimated power from 1(a) (or 1(b)) and the previous part. Does the similarity or difference make sense? Explain briefly.

Solution

My simulated power values were 0.606 from the normal distribution (or 0.591 by calculation) and 0.586 from the gamma distribution. These are similar, and indicate that the power of the test does not depend much on the exact distribution that generates the data.

Does this make sense? Well, we saw from the histogram that this gamma distribution is actually rather close to looking normal in shape, and so the $$t$$-test should give about the same results whether we assume a normal or a gamma distribution. Thus the power should be about the same, as it is. A second, important, consideration is that with a sample size of 20, a slightly non-normal distribution, as we have, should have next to no impact on the $$t$$-test, and thus also on the power. The results we saw all make sense.

$$\blacksquare$$

1. You might suppose that if the mean is bigger, and this is a random variable that must be positive, that the SD might also be bigger than in the previous research, but unless you have data from a “pilot study”, that is, a small version of the study you are going to do, you have no way to supply a suitable bigger value, so the best you can do is to stick with the value you have.↩︎

2. The set.seed is actually part of “base R”, so doesn’t need the tidyverse to run. If you are in the probably good habit of running library(tidyverse) before you do anything else, that way also works here.↩︎

3. The one I’m thinking of from lecture didn’t have an alternative because the default is two-sided, which is what it was. The two-sample example from lecture was one-sided, because the new method for teaching reading was supposed to be better, so that is the one to borrow from.↩︎