Questions are below. My solutions are below all the question parts for a question; scroll down if you get stuck. There may be extra discussion below that for some of the questions; you might find that interesting to read, maybe after tutorial.
For these worksheets, you will learn the most by spending a few minutes thinking about how you would answer each question before you look at my solution. There are no grades attached to these worksheets, so feel free to guess: it makes no difference at all how wrong your initial guess is!
questions first, from qq file
According to research in child psychology, working mothers spend a mean time of 11 minutes per day talking to their children, with a standard deviation of 2.3 minutes. Your research suggests that the mean should be greater than that, and you are planning a study of working mothers (who work outside the home) to see how many minutes they spend talking to their children. You think that the mean should be 12 minutes per day, and you want to design your study so that a mean of 11 should be rejected with a reasonably high probability.
If you interview 20 working mothers, what is the power of your test if your thought about the mean is correct? Estimate by simulation. Assume that the time that a mother spends talking to her children has a normal distribution.
Explain briefly why power.t.test
can be used to
calculate an answer to this problem, and use it to check your result in
the previous part.
A standard level of power in studies in child psychology is 80%
(0.80). According to what you have seen so far, is it necessary to
interview 20 working mothers, or more, or less? Explain briefly. Use
power.t.test
to obtain an appropriate sample size, under
the assumptions you have made.
My solutions
According to research in child psychology, working mothers spend a mean time of 11 minutes per day talking to their children, with a standard deviation of 2.3 minutes. Your research suggests that the mean should be greater than that, and you are planning a study of working mothers (who work outside the home) to see how many minutes they spend talking to their children. You think that the mean should be 12 minutes per day, and you want to design your study so that a mean of 11 should be rejected with a reasonably high probability.
Solution
For me, to kick off:
set.seed(457299)
The advantage to doing this is that every time you re-run code in
your notebook (starting from the beginning and going all the way
through), the set of random numbers in it will be the same, so that if
you have talked about a set of random numbers that came out a certain
way, your description won’t need to change if you run things again. The
number inside the set.seed
tells the random number
generator where to start, and can be anything; some people use 1 or 123.
Mine is an old phone number.
Now to work. The true mean (as far as you are concerned) is 12, so sample from a normal distribution with that mean (and the same standard deviation as the other research, since you don’t have a better value).1 The null mean is the one from the previous research, namely 11 (the one you want to reject), and because you want to prove that the mean is in fact greater than 11, you need an upper-tailed alternative. That leads to this:
library(tidyverse)
Then2 follow the familiar six lines of code:
tibble(sim = 1:1000) %>%
rowwise() %>%
mutate(my_sample = list(rnorm(20, 12, 2.3))) %>%
mutate(t_test = list(t.test(my_sample, mu = 11, alternative = "greater"))) %>%
mutate(p_value = t_test$p.value) %>%
count(p_value <= 0.05)
The estimated power is \(606/1000 = 0.606\). By interviewing 20 working mothers, you are likely to be able to reject the null hypothesis that the mean time spent talking to their children is 11 minutes per day, in favour of an alternative that the mean is larger than this.
Your answer will likely differ from mine by a bit (see below for how big “a bit” might be), but it should be somewhere near this. Indeed, if your answer is exactly the same as mine or as one of your classmates, this may cause the grader to look again at your code to make sure that what you did was sensible.
The code:
\(\blacksquare\)
power.t.test
can be used to
calculate an answer to this problem, and use it to check your result in
the previous part.Solution
We can use power.t.test
for two reasons: (i) we actually
are doing a \(t\)-test (for a
population mean), and (ii) we are assuming that the times spent talking
to children have a normal distribution.
For power.t.test
we need to specify the sample size
n
, the difference between the true and null means
delta
, the population SD sd
, the type of test
(a one-sample \(t\)), and whether it is
one- or two-sided:
power.t.test(n = 20, delta = 12 - 11, sd = 2.3, type = "one.sample", alternative = "one.sided")
##
## One-sample t test power calculation
##
## n = 20
## delta = 1
## sd = 2.3
## sig.level = 0.05
## power = 0.5908187
## alternative = one.sided
If you are like me, you probably typed
alternative = "greater"
, as you would with
t.test
, but then the error message will tell you what you
actually need, either one.sided
or two.sided
.
You need to be aware that information about the one-sidedness of your
test has to find its way into power.t.test
somehow; it
turns out that it doesn’t matter which kind of one-sided test
you have (greater or less), just that you have a one-sided test at
all.3
The power here is actually 0.590. My simulated value of 0.606 was close. Yours will probably be different from mine, but it ought to be somewhere near to 0.590. (See the Extra for how close you would expect to be.)
\(\blacksquare\)
power.t.test
to obtain an appropriate sample size, under
the assumptions you have made.Solution
Our power is smaller than the target power of 0.8, so our sample size should be bigger than 20. (A bigger sample size will give a bigger power.)
Use power.t.test
, inserting a value for
power
, and leaving out n
(since that is what
we want the function to find for us):
power.t.test(power = 0.80, delta = 12 - 11, sd = 2.3, type = "one.sample", alternative = "one.sided")
##
## One-sample t test power calculation
##
## n = 34.10007
## delta = 1
## sd = 2.3
## sig.level = 0.05
## power = 0.8
## alternative = one.sided
Round the sample size up to 35. We need to interview 35 working mothers to get 80% power.
\(\blacksquare\)
Suppose we have two populations, which are supposing to both be normally distributed. The first population has a mean of 20, and the second population has a mean of 25. Both populations have the same SD of 9.
Suppose we take samples of 30 observations from each population.
Use power.t.test
to find the probability that a two-sample
\(t\)-test will (correctly) reject the
null hypothesis that the two populations have the same mean, in favour
of a one-sided alternative. (Hint: delta
should be
positive.)
Find the sample size needed to obtain a power of 0.75. Comment briefly on whether your sample size makes sense.
Reproduce your power result from (a) by simulation. Some things to consider:
t.test
can also run a two-sample \(t\)-test by giving the two columns
separately, rather than as we have done it before by having a column
with all the measurements and a separate column saying which group they
came from.alternative
. With two
columns input like this, the alternative is relative to the column you
give first.power.t.test
not.My solutions:
power.t.test
to find the probability that a two-sample
\(t\)-test will (correctly) reject the
null hypothesis that the two populations have the same mean, in favour
of a one-sided alternative. (Hint: delta
should be
positive.)Solution
Note first that power.t.test
works here because (i) the
populations have normal distributions, (ii) the SDs are the same, (iii)
the sample sizes from each population are the same.
Then, subtracting the means so that the bigger one is first (and thus
delta
is positive):
power.t.test(n = 30, delta = 25-20, sd = 9, type = "two.sample", alternative = "one.sided")
##
## Two-sample t test power calculation
##
## n = 30
## delta = 5
## sd = 9
## sig.level = 0.05
## power = 0.6849538
## alternative = one.sided
##
## NOTE: n is number in *each* group
The power is about 0.68. (Give about one more decimal if you prefer.)
You’ll recall that we don’t say which one side we want to
reject for. The way power.t.test
does it is to assume the
first population has the bigger mean, which is why I said to use a
positive delta
. (Otherwise you’ll get an extremely small
power, which corresponds to concluding that the population with the
smaller mean actually has the bigger one, which you could imagine is not
going to happen much.)
\(\blacksquare\)
Solution
The coding part is not meant to be hard: take what you just did, and
replace the n
with the power
you are shooting
for:
power.t.test(power = 0.75, delta = 25-20, sd = 9, type = "two.sample", alternative = "one.sided")
##
## Two-sample t test power calculation
##
## n = 35.55396
## delta = 5
## sd = 9
## sig.level = 0.05
## power = 0.75
## alternative = one.sided
##
## NOTE: n is number in *each* group
The required sample size is 36 in each group, which, were you doing this on an assignment, you would need to say. (Remember also to round up to the next whole number, since you want at least that much power.)
As to why the answer makes sense, well, before we had a sample size of 30 in each group and a power about 0.68. Our target power here was slightly bigger than that, and to increase the power (all else fixed) we need to increase the sample size as well. The sample size needed is slightly bigger than 30, which matches with the power desired being slightly bigger than 0.68.
\(\blacksquare\)
t.test
can also run a two-sample \(t\)-test by giving the two columns
separately, rather than as we have done it before by having a column
with all the measurements and a separate column saying which group they
came from.alternative
. With two
columns input like this, the alternative is relative to the column you
give first.Solution
With all that in mind, something like this:
tibble(sim = 1:1000) %>%
rowwise() %>%
mutate(sample1 = list(rnorm(30, 20, 9)),
sample2 = list(rnorm(30, 25, 9))) %>%
mutate(my_test = list(t.test(sample1, sample2, alternative = "less"))) %>%
mutate(p_val = my_test$p.value) %>%
count(p_val <= 0.05)
My estimated power is 0.675, entirely consistent with part (a).
Code notes:
t.test
. The first
population actually had the smaller mean, so that is the one side that
you would like the test to reject in favour of.\(\blacksquare\)
power.t.test
not.Solution
In my solution to (a) I said
“Note first that power.t.test
works here because (i) the
populations have normal distributions, (ii) the SDs are the same, (iii)
the sample sizes from each population are the same.”
These all need to be true in order to use power.t.test
,
so to answer this one, describe a situation where one of them fails,
eg:
power.t.test
is using the pooled test behind the
scenes.It is also worth noting that you could make these changes in your simulation code without great difficulty:
rnorm
to something elsernorm
to something else (the one in
sample1
different from the one in
sample2
)rnorm
to something else.\(\blacksquare\)
This is a continuation of a previous question on this worksheet.
Another distribution that might be suitable for time spent
talking to children is the gamma distribution. Values from a
gamma distribution are guaranteed to be greater than zero (which is
suitable for times spent talking to children). As far as R is concerned,
a random value from a gamma distribution is generated using the function
rgamma
. This, for us, has three inputs: the number of
values to generate, a parameter called shape
for which we
will use the value 27.23, and a parameter called scale
for
which we will use the value 0.44. Generate a random sample of 1000
values from a gamma distribution with the given parameter values. Hint:
make sure that the inputs that need names actually have names.
Find the mean and SD of your random sample of values from the gamma distribution. Are the mean and SD somewhere close to the mean and SD you used in your first power analysis? Explain (very) briefly. Hint: for this part and the next one, it will help to have your random gamma values in a dataframe, so you may want to create one first if you didn’t already.
Make a histogram of your random sample of gamma-distributed values, and comment briefly on its shape.
Suppose now that you want to assume that the data have a gamma distribution with this scale and shape (and thus the same mean and SD that you used in 1(a)). Modify your simulation to estimate the power of the \(t\)-test against a null mean of 11 against the alternative that the mean is greater than 11, using a sample size of 20, with the data coming from this gamma distribution.
Compare the estimated power from 1(a) (or 1(b)) and the previous part. Does the similarity or difference make sense? Explain briefly.
My solutions
rgamma
. This, for us, has three inputs: the number of
values to generate, a parameter called shape
for which we
will use the value 27.23, and a parameter called scale
for
which we will use the value 0.44. Generate a random sample of 1000
values from a gamma distribution with the given parameter values. Hint:
make sure that the inputs that need names actually have names, and
organize your results as a column in a dataframe.Solution
Following the hint, something like this:
d <- tibble(g = rgamma(1000, shape = 27.23, scale = 0.44))
d
Displaying the dataframe as usual shows the first 10 rows, which once again will enable anyone reading your work to see that you have the right kind of thing.
I am apparently using my standard “temporary dataframe name” of
d
. Give it and the column in it whatever names you think
make sense. For example, you might use gamma
for the column
and gammas
(plural) for the dataframe.
\(\blacksquare\)
Solution
The first thing is the mean and SD of your random values. Starting
from a dataframe, this is summarize
:
d %>% summarize(g_mean = mean(g), g_sd = sd(g))
The mean and SD we used in the first simulation of power (in 1(a)) were 12 and 2.3 respectively, and these (for me) are very close to that. (Calculating the mean and SD and saying just “these are close” will confuse your reader, so make sure you say what they are supposed to be close to.)
\(\blacksquare\)
Solution
Start with your dataframe:
ggplot(d, aes(x = g)) + geom_histogram(bins = 10)
This is very slightly skewed to the right, but it is not that far away from being normal.
\(\blacksquare\)
Solution
power.t.test
assumes normally-distributed data, which we
no longer want to assume, and so we cannot use that. (We are still doing
a \(t\)-test, so that part is not a
problem.)
In the simulation from 1(a), there is actually only one line that
needs to change, the third one, which generates the random samples from
the truth. Instead of rnorm
, this should have
rgamma
, using the shape and scale values from above, but
with a sample size of 20. The test is the same, with the same
hypotheses, so nothing else changes:
tibble(sim = 1:1000) %>%
rowwise() %>%
mutate(my_sample = list(rgamma(20, shape = 27.23, scale = 0.44))) %>%
mutate(t_test = list(t.test(my_sample, mu = 11, alternative = "greater"))) %>%
mutate(p_val = t_test$p.value) %>%
count(p_val <= 0.05)
Copy, paste, and edit from what you did in 1(a). Here, change the
rnorm
to rgamma
and put the right numbers
in.
My estimated power is 0.586.
\(\blacksquare\)
Solution
My simulated power values were 0.606 from the normal distribution (or 0.591 by calculation) and 0.586 from the gamma distribution. These are similar, and indicate that the power of the test does not depend much on the exact distribution that generates the data.
Does this make sense? Well, we saw from the histogram that this gamma distribution is actually rather close to looking normal in shape, and so the \(t\)-test should give about the same results whether we assume a normal or a gamma distribution. Thus the power should be about the same, as it is. A second, important, consideration is that with a sample size of 20, a slightly non-normal distribution, as we have, should have next to no impact on the \(t\)-test, and thus also on the power. The results we saw all make sense.
\(\blacksquare\)
You might suppose that if the mean is bigger, and this is a random variable that must be positive, that the SD might also be bigger than in the previous research, but unless you have data from a “pilot study”, that is, a small version of the study you are going to do, you have no way to supply a suitable bigger value, so the best you can do is to stick with the value you have.↩︎
The set.seed
is actually part of “base R”,
so doesn’t need the tidyverse
to run. If you are in the
probably good habit of running library(tidyverse)
before
you do anything else, that way also works here.↩︎
The one I’m thinking of from lecture didn’t have an
alternative
because the default is two-sided, which is what
it was. The two-sample example from lecture was one-sided,
because the new method for teaching reading was supposed to be
better, so that is the one to borrow from.↩︎