Worksheet 5

Published

February 2, 2024

Questions are below. My solutions appear below all the parts of each question; scroll down if you get stuck. Below that, there is extra discussion for some of the questions; you might find that interesting to read, maybe after tutorial.

For these worksheets, you will learn the most by spending a few minutes thinking about how you would answer each question before you look at my solution. There are no grades attached to these worksheets, so feel free to guess: it makes no difference at all how wrong your initial guess is!

If you don’t get to the end in tutorial, it’s a good idea to finish them on your own time this week.

1 Senior phones revisited

In a previous worksheet, we investigated the mean length of phone calls made by seniors, using a \(t\)-test. The data were in http://ritsokiguess.site/datafiles/senior_phone.csv. This time, we investigate whether a \(t\)-test was really the right thing to do.

  1. Read in and display (some of) the data. (Copy and paste what you did before.)

  2. Draw an appropriate graph of these data. Observe the skewed shape.

  3. Previously, we wondered whether the sample size was large enough to overcome that (considerable) skewness. Address this question by obtaining a bootstrap sampling distribution of the sample mean. What do you conclude?

My solutions

  1. Read in and display (some of) the data. (Copy and paste what you did before.)

Solution

my_url <- "http://ritsokiguess.site/datafiles/senior_phone.csv"
calls <- read_csv(my_url)
Rows: 200 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (1): call_length

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
calls

One column, called call_length, containing the call lengths, evidently in whole numbers of minutes, and 200 rows, one for each senior customer.

\(\blacksquare\)

  2. Draw an appropriate graph of these data. Observe the skewed shape.

Solution

One quantitative variable, so a histogram:

ggplot(calls, aes(x = call_length)) + geom_histogram(bins = 8)

That is certainly skewed to the right.

A one-sample boxplot is also possible:

ggplot(calls, aes(x = 1, y = call_length)) + geom_boxplot()

with the same conclusion about shape.

\(\blacksquare\)

  3. Previously, we wondered whether the sample size was large enough to overcome that (considerable) skewness. Address this question by obtaining a bootstrap sampling distribution of the sample mean. What do you conclude?

Solution

The steps are:

  • set up a dataframe with 1000 (or more) rows to hold the simulations
  • work rowwise
  • generate a bootstrap random sample on each row
  • work out the mean of each sample
  • draw a graph of the results.

The following code will do that:

tibble(sim = 1:1000) %>% 
  rowwise() %>% 
  mutate(my_sample = 
           list(sample(calls$call_length, replace = TRUE))) %>% 
  mutate(my_mean = mean(my_sample)) -> b
ggplot(b, aes(x = my_mean)) + geom_histogram(bins = 10)

Take a look at that graph and make a call about whether it is close to normal. If you think it is, the \(t\)-test is good because the sample size is big enough. (I think it is.) If there is an obvious way that it’s not normal, then the sample size is not big enough yet. For example, if the sample is skewed to the right like this one, a sample that is not big enough will produce a bootstrap sampling distribution that is also skewed to the right, but not by as much.

There is still a judgment call here, but not nearly as big a one as trying to guess whether the sample size is big enough given the amount of skewness present in the original data.
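The same five steps can also be compressed into base R with `replicate`. This is a sketch, not the pipeline above: since it has to run on its own, I generate some right-skewed data to stand in for `calls$call_length` (an assumption; with the real data you would use that column instead):

```r
# Base-R sketch of the same bootstrap. 'skewed' is simulated
# right-skewed data standing in for calls$call_length.
set.seed(1)
skewed <- rexp(200, rate = 1 / 10)

# 1000 bootstrap sample means: resample with replacement, take the mean
boot_means <- replicate(1000, mean(sample(skewed, replace = TRUE)))

# look at the shape of the bootstrap sampling distribution
hist(boot_means, breaks = 10)
```

The judgment call is the same as before: if the histogram of `boot_means` looks close to normal, the sample size is big enough for a \(t\)-test.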

\(\blacksquare\)

2 Child psychology

According to research in child psychology, working mothers spend a mean time of 11 minutes per day talking to their children, with a standard deviation of 2.3 minutes. Your research suggests that the mean should be greater than that, and you are planning a study of working mothers (who work outside the home) to see how many minutes they spend talking to their children. You think that the mean should be 12 minutes per day, and you want to design your study so that a mean of 11 should be rejected with a reasonably high probability.

  1. If you interview 20 working mothers, what is the power of your test if your thought about the mean is correct? Estimate by simulation. Assume that the time that a mother spends talking to her children has a normal distribution.

  2. Explain briefly why power.t.test can be used to calculate an answer to this problem, and use it to check your result in the previous part.

  3. A standard level of power in studies in child psychology is 80% (0.80). According to what you have seen so far, is it necessary to interview 20 working mothers, or more, or less? Explain briefly. Use power.t.test to obtain an appropriate sample size, under the assumptions you have made.

My solutions

According to research in child psychology, working mothers spend a mean time of 11 minutes per day talking to their children, with a standard deviation of 2.3 minutes. Your research suggests that the mean should be greater than that, and you are planning a study of working mothers (who work outside the home) to see how many minutes they spend talking to their children. You think that the mean should be 12 minutes per day, and you want to design your study so that a mean of 11 should be rejected with a reasonably high probability.

  1. If you interview 20 working mothers, what is the power of your test if your thought about the mean is correct? Estimate by simulation. Assume that the time that a mother spends talking to her children has a normal distribution.

Solution

For me, to kick off:

set.seed(457299)

The advantage to doing this is that every time you re-run code in your notebook (starting from the beginning and going all the way through), the set of random numbers in it will be the same, so that if you have talked about a set of random numbers that came out a certain way, your description won’t need to change if you run things again. The number inside the set.seed tells the random number generator where to start, and can be anything; some people use 1 or 123. Mine is an old phone number.
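To see the repeatability in action, here is a tiny check (using my seed, but any number works the same way):

```r
# Same seed in, same random numbers out
set.seed(457299)
x <- rnorm(5)
set.seed(457299)
y <- rnorm(5)
identical(x, y)   # TRUE: resetting the seed replays the same numbers
```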

Now to work. The true mean (as far as you are concerned) is 12, so sample from a normal distribution with that mean (and the same standard deviation as the other research, since you don’t have a better value).1 The null mean is the one from the previous research, namely 11 (the one you want to reject), and because you want to prove that the mean is in fact greater than 11, you need an upper-tailed alternative. That leads to this:

library(tidyverse)

Then2 follow the familiar six lines of code:

tibble(sim = 1:1000) %>% 
  rowwise() %>% 
  mutate(my_sample = list(rnorm(20, 12, 2.3))) %>% 
  mutate(t_test = list(t.test(my_sample, mu = 11, alternative = "greater"))) %>% 
  mutate(p_value = t_test$p.value) %>% 
  count(p_value <= 0.05)

The estimated power is \(606/1000 = 0.606\). By interviewing 20 working mothers, you are likely to be able to reject the null hypothesis that the mean time spent talking to their children is 11 minutes per day, in favour of an alternative that the mean is larger than this.

Your answer will likely differ from mine by a bit (see below for how big “a bit” might be), but it should be somewhere near this. Indeed, if your answer is exactly the same as mine or as one of your classmates’, this may cause the grader to look again at your code to make sure that what you did was sensible.

The code:

  • set up a data frame with places for 1000 (or 10,000 or whatever) simulated samples
  • work one row at a time from here forward
  • draw a random sample of size 20 from the truth (as far as you know what it is) for each row
  • run a \(t\)-test testing whether the mean is 11 for each row (which you know, or at least think, it isn’t, but the onus is on your research to prove itself over the current state of affairs)
  • get the P-value from each test
  • count how many of those P-values are less than or equal to 0.05. The number of times this is true, divided by your number of simulations, is your estimate of the power.
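The steps above can also be sketched in base R with `replicate`, as an alternative to the rowwise pipeline (this runs each simulated test directly and keeps only its P-value):

```r
# Base-R version of the same power simulation: 1000 samples of 20
# from N(12, 2.3), each tested against mu = 11, one-sided.
set.seed(457299)
pvals <- replicate(1000,
  t.test(rnorm(20, 12, 2.3),
         mu = 11, alternative = "greater")$p.value)

# proportion of rejections at alpha = 0.05 = estimated power
mean(pvals <= 0.05)   # somewhere near 0.59
```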

\(\blacksquare\)

  2. Explain briefly why power.t.test can be used to calculate an answer to this problem, and use it to check your result in the previous part.

Solution

We can use power.t.test for two reasons: (i) we actually are doing a \(t\)-test (for a population mean), and (ii) we are assuming that the times spent talking to children have a normal distribution.

For power.t.test we need to specify the sample size n, the difference between the true and null means delta, the population SD sd, the type of test (a one-sample \(t\)), and whether it is one- or two-sided:

power.t.test(n = 20, delta = 12 - 11, sd = 2.3, type = "one.sample", alternative = "one.sided")

     One-sample t test power calculation 

              n = 20
          delta = 1
             sd = 2.3
      sig.level = 0.05
          power = 0.5908187
    alternative = one.sided

If you are like me, you probably typed alternative = "greater", as you would with t.test, but then the error message will tell you what you actually need, either one.sided or two.sided. You need to be aware that information about the one-sidedness of your test has to find its way into power.t.test somehow; it turns out that it doesn’t matter which kind of one-sided test you have (greater or less), just that you have a one-sided test at all.3

The power here is actually 0.590. My simulated value of 0.606 was close. Yours will probably be different from mine, but it ought to be somewhere near 0.590. (See the Extra for how close you would expect to be.)
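If you are curious where the 0.5908187 comes from, it is a noncentral \(t\) probability, which you can reproduce by hand (a sketch of the calculation, not something you would need to do yourself):

```r
# The one-sample, one-sided power calculation done by hand:
# reject when the t statistic exceeds the alpha = 0.05 critical value,
# where the t statistic has a noncentral t distribution under the truth.
n <- 20; delta <- 1; sd <- 2.3; alpha <- 0.05
ncp <- delta / sd * sqrt(n)          # noncentrality parameter
crit <- qt(1 - alpha, df = n - 1)    # one-sided rejection point
power <- 1 - pt(crit, df = n - 1, ncp = ncp)
power   # matches power.t.test: 0.5908187
```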

\(\blacksquare\)

  3. A standard level of power in studies in child psychology is 80% (0.80). According to what you have seen so far, is it necessary to interview 20 working mothers, or more, or less? Explain briefly. Use power.t.test to obtain an appropriate sample size, under the assumptions you have made.

Solution

Our power is smaller than the target power of 0.8, so our sample size should be bigger than 20. (A bigger sample size will give a bigger power.)

Use power.t.test, inserting a value for power, and leaving out n (since that is what we want the function to find for us):

power.t.test(power = 0.80, delta = 12 - 11, sd = 2.3, type = "one.sample", alternative = "one.sided")

     One-sample t test power calculation 

              n = 34.10007
          delta = 1
             sd = 2.3
      sig.level = 0.05
          power = 0.8
    alternative = one.sided

Round the sample size up to 35. We need to interview 35 working mothers to get 80% power.
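To convince yourself that rounding up (and not to the nearest whole number) is right, you can feed the two candidate sample sizes back into power.t.test:

```r
# n = 34 falls just short of the target power; n = 35 clears it
p34 <- power.t.test(n = 34, delta = 1, sd = 2.3,
                    type = "one.sample", alternative = "one.sided")$power
p35 <- power.t.test(n = 35, delta = 1, sd = 2.3,
                    type = "one.sample", alternative = "one.sided")$power
c(p34, p35)   # p34 < 0.80 <= p35
```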

\(\blacksquare\)

3 Two-sample power

Suppose we have two populations, both of which we suppose to be normally distributed. The first population has a mean of 20, and the second population has a mean of 25. Both populations have the same SD of 9.

  1. Suppose we take samples of 30 observations from each population. Use power.t.test to find the probability that a two-sample \(t\)-test will (correctly) reject the null hypothesis that the two populations have the same mean, in favour of a one-sided alternative. (Hint: delta should be positive.)

  2. Find the sample size needed to obtain a power of 0.75. Comment briefly on whether your sample size makes sense.

  3. Reproduce your power result from (a) by simulation. Some things to consider:

  • you will need to generate two columns of random samples, one from each population
  • t.test can also run a two-sample \(t\)-test by giving the two columns separately, rather than as we have done it before by having a column with all the measurements and a separate column saying which group they came from.
  • you will need to get the right alternative. With two columns input like this, the alternative is relative to the column you give first.

  4. Give an example of a situation where the simulation approach could be used and power.t.test not.

My solutions:

  1. Suppose we take samples of 30 observations from each population. Use power.t.test to find the probability that a two-sample \(t\)-test will (correctly) reject the null hypothesis that the two populations have the same mean, in favour of a one-sided alternative. (Hint: delta should be positive.)

Solution

Note first that power.t.test works here because (i) the populations have normal distributions, (ii) the SDs are the same, (iii) the sample sizes from each population are the same.

Then, subtracting the means so that the bigger one is first (and thus delta is positive):

power.t.test(n = 30, delta = 25-20, sd = 9, type = "two.sample", alternative = "one.sided")

     Two-sample t test power calculation 

              n = 30
          delta = 5
             sd = 9
      sig.level = 0.05
          power = 0.6849538
    alternative = one.sided

NOTE: n is number in *each* group

The power is about 0.68. (Give about one more decimal if you prefer.)

You’ll recall that we don’t say which one-sided alternative we want. The way power.t.test does it is to assume the first population has the bigger mean, which is why I said to use a positive delta. (Otherwise you’ll get an extremely small power, corresponding to concluding that the population with the smaller mean actually has the bigger one, which you can imagine is not going to happen much.)
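As in the one-sample case, the number power.t.test produces is a noncentral \(t\) probability, which you can reproduce by hand if you like (a sketch; the pooled-test degrees of freedom are \(2(n-1)\)):

```r
# The two-sample, one-sided power calculation done by hand:
n <- 30; delta <- 5; sd <- 9; alpha <- 0.05
ncp <- delta / (sd * sqrt(2 / n))   # noncentrality parameter
nu <- 2 * (n - 1)                   # df for the pooled two-sample test
power <- 1 - pt(qt(1 - alpha, nu), nu, ncp = ncp)
power   # matches power.t.test: 0.6849538
```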

\(\blacksquare\)

  2. Find the sample size needed to obtain a power of 0.75. Comment briefly on whether your sample size makes sense.

Solution

The coding part is not meant to be hard: take what you just did, and replace the n with the power you are shooting for:

power.t.test(power = 0.75, delta = 25-20, sd = 9, type = "two.sample", alternative = "one.sided")

     Two-sample t test power calculation 

              n = 35.55396
          delta = 5
             sd = 9
      sig.level = 0.05
          power = 0.75
    alternative = one.sided

NOTE: n is number in *each* group

The required sample size is 36 in each group, which, were you doing this on an assignment, you would need to say. (Remember also to round up to the next whole number, since you want at least that much power.)

As to why the answer makes sense, well, before we had a sample size of 30 in each group and a power about 0.68. Our target power here was slightly bigger than that, and to increase the power (all else fixed) we need to increase the sample size as well. The sample size needed is slightly bigger than 30, which matches with the power desired being slightly bigger than 0.68.
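Again, you can check the rounding by running power.t.test at the two candidate sample sizes:

```r
# n = 35 is not quite enough for 0.75 power; n = 36 is
p35 <- power.t.test(n = 35, delta = 5, sd = 9,
                    type = "two.sample", alternative = "one.sided")$power
p36 <- power.t.test(n = 36, delta = 5, sd = 9,
                    type = "two.sample", alternative = "one.sided")$power
c(p35, p36)   # p35 < 0.75 <= p36
```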

\(\blacksquare\)

  3. Reproduce your power result from (a) by simulation. Some things to consider:
  • you will need to generate two columns of random samples, one from each population
  • t.test can also run a two-sample \(t\)-test by giving the two columns separately, rather than as we have done it before by having a column with all the measurements and a separate column saying which group they came from.
  • you will need to get the right alternative. With two columns input like this, the alternative is relative to the column you give first.

Solution

With all that in mind, something like this:

tibble(sim = 1:1000) %>% 
  rowwise() %>% 
  mutate(sample1 = list(rnorm(30, 20, 9)),
         sample2 = list(rnorm(30, 25, 9))) %>% 
  mutate(my_test = list(t.test(sample1, sample2, alternative = "less"))) %>% 
  mutate(p_val = my_test$p.value) %>% 
  count(p_val <= 0.05)

My estimated power is 0.675, entirely consistent with part (a).

Code notes:

  • generate two columns of random samples, one from each population
  • feed the two columns of samples into t.test. The first population actually had the smaller mean, so that is the one side that you would like the test to reject in favour of.
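The same simulation can also be sketched in base R with `replicate`, as an alternative to the rowwise pipeline:

```r
# Base-R version of the two-sample power simulation: sample 30 from
# each population, run a two-sample t-test with the smaller-mean
# sample first (hence alternative = "less"), keep the P-value.
set.seed(457299)
pvals <- replicate(1000,
  t.test(rnorm(30, 20, 9), rnorm(30, 25, 9),
         alternative = "less")$p.value)

# proportion of rejections = estimated power
mean(pvals <= 0.05)   # somewhere near 0.68
```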

\(\blacksquare\)

  4. Give an example of a situation where the simulation approach could be used and power.t.test not.

Solution

In my solution to (a) I said

“Note first that power.t.test works here because (i) the populations have normal distributions, (ii) the SDs are the same, (iii) the sample sizes from each population are the same.”

These all need to be true in order to use power.t.test, so to answer this one, describe a situation where one of them fails, for example:

  • the population distributions are something other than normal
  • the two populations have different SDs (spreads). I think power.t.test is using the pooled test behind the scenes.
  • the sample sizes are different (for example, sampling from one population might be more expensive than sampling from the other, and you might want to achieve a certain power with one sample size being twice the other).

It is also worth noting that you could make these changes in your simulation code without great difficulty:

  • change rnorm to something else
  • change the 9’s in rnorm to something else (the one in sample1 different from the one in sample2)
  • change the 30’s in rnorm to something else.
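To illustrate, here is a sketch where two of those assumptions fail at once: unequal SDs and unequal sample sizes, so t.test runs the Welch test, which power.t.test cannot handle. The means, SDs, and sample sizes here are made up for illustration:

```r
# Power by simulation where power.t.test does not apply:
# different SDs (5 vs 9) and different sample sizes (30 vs 40),
# so t.test uses the Welch procedure by default.
set.seed(457299)
pvals <- replicate(1000,
  t.test(rnorm(30, 20, 5), rnorm(40, 25, 9),
         alternative = "less")$p.value)
mean(pvals <= 0.05)   # estimated power of the Welch test
```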

\(\blacksquare\)

Footnotes

  1. You might suppose that if the mean is bigger, and this is a random variable that must be positive, that the SD might also be bigger than in the previous research, but unless you have data from a “pilot study”, that is, a small version of the study you are going to do, you have no way to supply a suitable bigger value, so the best you can do is to stick with the value you have.↩︎

  2. The set.seed is actually part of “base R”, so doesn’t need the tidyverse to run. If you are in the probably good habit of running library(tidyverse) before you do anything else, that way also works here.↩︎

  3. The one I’m thinking of from lecture didn’t have an alternative because the default is two-sided, which is what it was. The two-sample example from lecture was one-sided, because the new method for teaching reading was supposed to be better, so that is the one to borrow from.↩︎