Questions are below. My solutions are below all the question parts for a question; scroll down if you get stuck. There may be extra discussion below that for some of the questions; you might find that interesting to read, maybe after tutorial.

For these worksheets, you will learn the most by spending a few minutes thinking about how you would answer each question before you look at my solution. There are no grades attached to these worksheets, so feel free to guess: it makes no difference at all how wrong your initial guess is!

questions first, from qq file

According to research in child psychology, working mothers spend a
mean time of 11 minutes per day talking to their children, with a
standard deviation of 2.3 minutes. Your research suggests that the mean
should be greater than that, and you are planning a study of working
mothers (who work outside the home) to see how many minutes
*they* spend talking to their children. You think that the mean
should be 12 minutes per day, and you want to design your study so that
a mean of 11 should be rejected with a reasonably high probability.

If you interview 20 working mothers, what is the power of your test if your thought about the mean is correct? Estimate by simulation. Assume that the time that a mother spends talking to her children has a normal distribution.

Explain briefly why

`power.t.test`

can be used to calculate an answer to this problem, and use it to check your result in the previous part.A standard level of power in studies in child psychology is 80% (0.80). According to what you have seen so far, is it necessary to interview 20 working mothers, or more, or less? Explain briefly. Use

`power.t.test`

to obtain an appropriate sample size, under the assumptions you have made.

My solutions

According to research in child psychology, working mothers spend a
mean time of 11 minutes per day talking to their children, with a
standard deviation of 2.3 minutes. Your research suggests that the mean
should be greater than that, and you are planning a study of working
mothers (who work outside the home) to see how many minutes
*they* spend talking to their children. You think that the mean
should be 12 minutes per day, and you want to design your study so that
a mean of 11 should be rejected with a reasonably high probability.

- If you interview 20 working mothers, what is the power of your test if your thought about the mean is correct? Estimate by simulation. Assume that the time that a mother spends talking to her children has a normal distribution.

Solution

For me, to kick off:

`set.seed(457299)`

The advantage to doing this is that every time you re-run code in
your notebook (starting from the beginning and going all the way
through), the set of random numbers in it will be the same, so that if
you have talked about a set of random numbers that came out a certain
way, your description won’t need to change if you run things again. The
number inside the `set.seed`

tells the random number
generator where to start, and can be anything; some people use 1 or 123.
Mine is an old phone number.

Now to work. The true mean (as far as you are concerned) is 12, so
sample from a normal distribution with that mean (and the same standard
deviation as the other research, since you don’t have a better value).^{1} The null
mean is the one from the previous research, namely 11 (the one you want
to reject), and because you want to prove that the mean is in fact
*greater* than 11, you need an upper-tailed alternative. That
leads to this:

`library(tidyverse)`

Then^{2}
follow the familiar six lines of code:

```
tibble(sim = 1:1000) %>%
rowwise() %>%
mutate(my_sample = list(rnorm(20, 12, 2.3))) %>%
mutate(t_test = list(t.test(my_sample, mu = 11, alternative = "greater"))) %>%
mutate(p_value = t_test$p.value) %>%
count(p_value <= 0.05)
```

The estimated power is \(606/1000 = 0.606\). By interviewing 20 working mothers, you are likely to be able to reject the null hypothesis that the mean time spent talking to their children is 11 minutes per day, in favour of an alternative that the mean is larger than this.

Your answer will likely differ from mine by a bit (see below for how big “a bit” might be), but it should be somewhere near this. Indeed, if your answer is exactly the same as mine or as one of your classmates, this may cause the grader to look again at your code to make sure that what you did was sensible.

The code:

- set up a data frame with places for 1000 (or 10,000 or whatever) simulated samples
- work one row at a time from here forward
- draw a random sample of size 20
*from the truth*(as far as you know what it is) for each row - run a \(t\)-test testing whether the mean is 11 for each row (which you know, or at least think, it isn’t, but the onus is on your research to prove itself over the current state of affairs)
- get the P-value from each test
- count how many of those P-values are less than or equal to 0.05. The number of times this is true, divided by your number of simulations, is your estimate of the power.

\(\blacksquare\)

- Explain briefly why
`power.t.test`

can be used to calculate an answer to this problem, and use it to check your result in the previous part.

Solution

We can use `power.t.test`

for two reasons: (i) we actually
*are* doing a \(t\)-test (for a
population mean), and (ii) we are assuming that the times spent talking
to children have a normal distribution.

For `power.t.test`

we need to specify the sample size
`n`

, the difference between the true and null means
`delta`

, the population SD `sd`

, the type of test
(a one-sample \(t\)), and whether it is
one- or two-sided:

`power.t.test(n = 20, delta = 12 - 11, sd = 2.3, type = "one.sample", alternative = "one.sided")`

```
##
## One-sample t test power calculation
##
## n = 20
## delta = 1
## sd = 2.3
## sig.level = 0.05
## power = 0.5908187
## alternative = one.sided
```

If you are like me, you probably typed
`alternative = "greater"`

, as you would with
`t.test`

, but then the error message will tell you what you
actually need, either `one.sided`

or `two.sided`

.
You need to be aware that information about the one-sidedness of your
test has to find its way into `power.t.test`

somehow; it
turns out that it doesn’t matter *which* kind of one-sided test
you have (greater or less), just that you have a one-sided test at
all.^{3}

The power here is actually 0.590. My simulated value of 0.606 was close. Yours will probably be different from mine, but it ought to be somewhere near to 0.590. (See the Extra for how close you would expect to be.)

\(\blacksquare\)

- A standard level of power in studies in child psychology is 80%
(0.80). According to what you have seen so far, is it necessary to
interview 20 working mothers, or more, or less? Explain briefly. Use
`power.t.test`

to obtain an appropriate sample size, under the assumptions you have made.

Solution

Our power is smaller than the target power of 0.8, so our sample size should be bigger than 20. (A bigger sample size will give a bigger power.)

Use `power.t.test`

, inserting a value for
`power`

, and leaving out `n`

(since that is what
we want the function to find for us):

`power.t.test(power = 0.80, delta = 12 - 11, sd = 2.3, type = "one.sample", alternative = "one.sided")`

```
##
## One-sample t test power calculation
##
## n = 34.10007
## delta = 1
## sd = 2.3
## sig.level = 0.05
## power = 0.8
## alternative = one.sided
```

Round the sample size up to 35. We need to interview 35 working mothers to get 80% power.

\(\blacksquare\)

Suppose we have two populations, which are supposing to both be normally distributed. The first population has a mean of 20, and the second population has a mean of 25. Both populations have the same SD of 9.

Suppose we take samples of 30 observations from each population. Use

`power.t.test`

to find the probability that a two-sample \(t\)-test will (correctly) reject the null hypothesis that the two populations have the same mean, in favour of a one-sided alternative. (Hint:`delta`

should be positive.)Find the sample size needed to obtain a power of 0.75. Comment briefly on whether your sample size makes sense.

Reproduce your power result from (a) by simulation. Some things to consider:

- you will need to generate
*two*columns of random samples, one from each population `t.test`

can also run a two-sample \(t\)-test by giving the two columns separately, rather than as we have done it before by having a column with all the measurements and a separate column saying which group they came from.- you will need to get the right
`alternative`

. With two columns input like this, the alternative is relative to the column you give first.

- Give an example of a situation where the simulation approach could
be used and
`power.t.test`

not.

My solutions:

- Suppose we take samples of 30 observations from each population. Use
`power.t.test`

to find the probability that a two-sample \(t\)-test will (correctly) reject the null hypothesis that the two populations have the same mean, in favour of a one-sided alternative. (Hint:`delta`

should be positive.)

Solution

Note first that `power.t.test`

works here because (i) the
populations have normal distributions, (ii) the SDs are the same, (iii)
the sample sizes from each population are the same.

Then, subtracting the means so that the bigger one is first (and thus
`delta`

is positive):

`power.t.test(n = 30, delta = 25-20, sd = 9, type = "two.sample", alternative = "one.sided")`

```
##
## Two-sample t test power calculation
##
## n = 30
## delta = 5
## sd = 9
## sig.level = 0.05
## power = 0.6849538
## alternative = one.sided
##
## NOTE: n is number in *each* group
```

The power is about 0.68. (Give about one more decimal if you prefer.)

You’ll recall that we don’t say *which* one side we want to
reject for. The way `power.t.test`

does it is to assume the
first population has the bigger mean, which is why I said to use a
positive `delta`

. (Otherwise you’ll get an extremely small
power, which corresponds to concluding that the population with the
smaller mean actually has the bigger one, which you could imagine is not
going to happen much.)

\(\blacksquare\)

- Find the sample size needed to obtain a power of 0.75. Comment briefly on whether your sample size makes sense.

Solution

The coding part is not meant to be hard: take what you just did, and
replace the `n`

with the `power`

you are shooting
for:

`power.t.test(power = 0.75, delta = 25-20, sd = 9, type = "two.sample", alternative = "one.sided")`

```
##
## Two-sample t test power calculation
##
## n = 35.55396
## delta = 5
## sd = 9
## sig.level = 0.05
## power = 0.75
## alternative = one.sided
##
## NOTE: n is number in *each* group
```

The required sample size is 36 in each group, which, were you doing
this on an assignment, you would need to say. (Remember also to round
*up* to the next whole number, since you want *at least*
that much power.)

As to why the answer makes sense, well, before we had a sample size of 30 in each group and a power about 0.68. Our target power here was slightly bigger than that, and to increase the power (all else fixed) we need to increase the sample size as well. The sample size needed is slightly bigger than 30, which matches with the power desired being slightly bigger than 0.68.

\(\blacksquare\)

- Reproduce your power result from (a) by simulation. Some things to consider:

- you will need to generate
*two*columns of random samples, one from each population `t.test`

can also run a two-sample \(t\)-test by giving the two columns separately, rather than as we have done it before by having a column with all the measurements and a separate column saying which group they came from.- you will need to get the right
`alternative`

. With two columns input like this, the alternative is relative to the column you give first.

Solution

With all that in mind, something like this:

```
tibble(sim = 1:1000) %>%
rowwise() %>%
mutate(sample1 = list(rnorm(30, 20, 9)),
sample2 = list(rnorm(30, 25, 9))) %>%
mutate(my_test = list(t.test(sample1, sample2, alternative = "less"))) %>%
mutate(p_val = my_test$p.value) %>%
count(p_val <= 0.05)
```

My estimated power is 0.675, entirely consistent with part (a).

Code notes:

- generate two columns of random samples, one from each population
- feed the two columns of samples into
`t.test`

. The first population actually had the smaller mean, so that is the one side that you would like the test to reject in favour of.

\(\blacksquare\)

- Give an example of a situation where the simulation approach could
be used and
`power.t.test`

not.

Solution

In my solution to (a) I said

“Note first that `power.t.test`

works here because (i) the
populations have normal distributions, (ii) the SDs are the same, (iii)
the sample sizes from each population are the same.”

These all need to be true in order to use `power.t.test`

,
so to answer this one, describe a situation where one of them fails,
eg:

- the population distributions are something other than normal
- the two populations have different SDs (spreads). I think
`power.t.test`

is using the pooled test behind the scenes. - the sample sizes are different (for example, sampling from one population might be more expensive than sampling from the other, and you might want to achieve a certain power with one sample size being twice the other).

It is also worth noting that you could make these changes in your simulation code without great difficulty:

- change
`rnorm`

to something else - change the 9’s in
`rnorm`

to something else (the one in`sample1`

different from the one in`sample2`

) - change the 30’s in
`rnorm`

to something else.

\(\blacksquare\)

This is a continuation of a previous question on this worksheet.

Another distribution that might be suitable for time spent talking to children is the

*gamma distribution*. Values from a gamma distribution are guaranteed to be greater than zero (which is suitable for times spent talking to children). As far as R is concerned, a random value from a gamma distribution is generated using the function`rgamma`

. This, for us, has three inputs: the number of values to generate, a parameter called`shape`

for which we will use the value 27.23, and a parameter called`scale`

for which we will use the value 0.44. Generate a random sample of 1000 values from a gamma distribution with the given parameter values. Hint: make sure that the inputs that need names actually have names.Find the mean and SD of your random sample of values from the gamma distribution. Are the mean and SD somewhere close to the mean and SD you used in your first power analysis? Explain (very) briefly. Hint: for this part and the next one, it will help to have your random gamma values in a dataframe, so you may want to create one first if you didn’t already.

Make a histogram of your random sample of gamma-distributed values, and comment briefly on its shape.

Suppose now that you want to assume that the data have a gamma distribution with this scale and shape (and thus the same mean and SD that you used in 1(a)). Modify your simulation to estimate the power of the \(t\)-test against a null mean of 11 against the alternative that the mean is greater than 11, using a sample size of 20, with the data coming from this gamma distribution.

Compare the estimated power from 1(a) (or 1(b)) and the previous part. Does the similarity or difference make sense? Explain briefly.

My solutions

- Another distribution that might be suitable for time spent talking
to children is the
*gamma distribution*. Values from a gamma distribution are guaranteed to be greater than zero (which is suitable for times spent talking to children). As far as R is concerned, a random value from a gamma distribution is generated using the function`rgamma`

. This, for us, has three inputs: the number of values to generate, a parameter called`shape`

for which we will use the value 27.23, and a parameter called`scale`

for which we will use the value 0.44. Generate a random sample of 1000 values from a gamma distribution with the given parameter values. Hint: make sure that the inputs that need names actually have names, and organize your results as a column in a dataframe.

Solution

Following the hint, something like this:

```
d <- tibble(g = rgamma(1000, shape = 27.23, scale = 0.44))
d
```

Displaying the dataframe as usual shows the first 10 rows, which once again will enable anyone reading your work to see that you have the right kind of thing.

I am apparently using my standard “temporary dataframe name” of
`d`

. Give it and the column in it whatever names you think
make sense. For example, you might use `gamma`

for the column
and `gammas`

(plural) for the dataframe.

\(\blacksquare\)

- Find the mean and SD of your random sample of values from the gamma distribution. Are the mean and SD somewhere close to the mean and SD you used in your first power analysis? Explain (very) briefly.

Solution

The first thing is the mean and SD of your random values. Starting
from a dataframe, this is `summarize`

:

`d %>% summarize(g_mean = mean(g), g_sd = sd(g))`

The mean and SD we used in the first simulation of power (in 1(a))
were 12 and 2.3 respectively, and these (for me) are very close to that.
(Calculating the mean and SD and saying just “these are close” will
confuse your reader, so make sure you say what they are supposed to be
close *to*.)

\(\blacksquare\)

- Make a histogram of your random sample of gamma-distributed values, and comment briefly on its shape.

Solution

Start with your dataframe:

`ggplot(d, aes(x = g)) + geom_histogram(bins = 10)`

This is very slightly skewed to the right, but it is not that far away from being normal.

\(\blacksquare\)

- Suppose now that you want to assume that the data have a gamma distribution with this scale and shape (and thus the same mean and SD that you used in 1(a)). Modify your simulation to estimate the power of the \(t\)-test against a null mean of 11 against the alternative that the mean is greater than 11, using a sample size of 20, with the data coming from this gamma distribution.

Solution

`power.t.test`

assumes normally-distributed data, which we
no longer want to assume, and so we cannot use that. (We are still doing
a \(t\)-test, so that part is not a
problem.)

In the simulation from 1(a), there is actually only one line that
needs to change, the third one, which generates the random samples from
the truth. Instead of `rnorm`

, this should have
`rgamma`

, using the shape and scale values from above, but
with a sample size of 20. The test is the same, with the same
hypotheses, so nothing else changes:

```
tibble(sim = 1:1000) %>%
rowwise() %>%
mutate(my_sample = list(rgamma(20, shape = 27.23, scale = 0.44))) %>%
mutate(t_test = list(t.test(my_sample, mu = 11, alternative = "greater"))) %>%
mutate(p_val = t_test$p.value) %>%
count(p_val <= 0.05)
```

Copy, paste, and edit from what you did in 1(a). Here, change the
`rnorm`

to `rgamma`

and put the right numbers
in.

My estimated power is 0.586.

\(\blacksquare\)

- Compare the estimated power from 1(a) (or 1(b)) and the previous part. Does the similarity or difference make sense? Explain briefly.

Solution

My simulated power values were 0.606 from the normal distribution (or 0.591 by calculation) and 0.586 from the gamma distribution. These are similar, and indicate that the power of the test does not depend much on the exact distribution that generates the data.

Does this make sense? Well, we saw from the histogram that this gamma distribution is actually rather close to looking normal in shape, and so the \(t\)-test should give about the same results whether we assume a normal or a gamma distribution. Thus the power should be about the same, as it is. A second, important, consideration is that with a sample size of 20, a slightly non-normal distribution, as we have, should have next to no impact on the \(t\)-test, and thus also on the power. The results we saw all make sense.

\(\blacksquare\)

You might suppose that if the mean is bigger, and this is a random variable that must be positive, that the SD might also be bigger than in the previous research, but unless you have data from a “pilot study”, that is, a small version of the study you are going to do, you have no way to supply a suitable bigger value, so the best you can do is to stick with the value you have.↩︎

The

`set.seed`

is actually part of “base R”, so doesn’t need the`tidyverse`

to run. If you are in the probably good habit of running`library(tidyverse)`

before you do anything else, that way also works here.↩︎The one I’m thinking of from lecture didn’t have an

`alternative`

because the default is two-sided, which is what it was. The two-sample example from lecture was*one*-sided, because the new method for teaching reading was supposed to be*better*, so that is the one to borrow from.↩︎