Worksheet 5

Published

October 6, 2024

Questions are below. My solutions will be available after the tutorials are all finished. The whole point of these worksheets is for you to use your lecture notes to figure out what to do. In tutorial, the TAs are available to guide you if you get stuck. Once you have figured out how to do this worksheet, you will be prepared to tackle the assignment that depends on it.

If you are not able to finish in an hour, I encourage you to continue later with what you were unable to finish in tutorial.

Child psychology

According to research in child psychology, working mothers spend a mean time of 11 minutes per day talking to their children, with a standard deviation of 2.3 minutes. Your research suggests that the mean should be greater than that, and you are planning a study of working mothers (who work outside the home) to see how many minutes they spend talking to their children. You think that the mean should be 12 minutes per day, and you want to design your study so that a mean of 11 should be rejected with a reasonably high probability.

If you interview 20 working mothers, what is the power of your test if your belief about the mean is correct? Estimate it by simulation. Assume that the time a mother spends talking to her children has a normal distribution.
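The general pattern for estimating power by simulation can be sketched as below. This is a minimal base-R sketch, not the worksheet solution: every number in it is a placeholder rather than a value from the question, the 0.05 significance level is an assumption, and the helper name `one_p_value` is made up for illustration.

```r
set.seed(1)            # arbitrary seed, only so the sketch is reproducible

n_sim   <- 1000        # number of simulated studies
n       <- 15          # placeholder sample size
mu0     <- 50          # placeholder null mean
mu_true <- 53          # placeholder true mean
sigma   <- 6           # placeholder population SD

# One simulated study: draw a normal sample with the true mean,
# test the null mean against a one-sided alternative, keep the P-value.
one_p_value <- function() {
  x <- rnorm(n, mean = mu_true, sd = sigma)
  t.test(x, mu = mu0, alternative = "greater")$p.value
}

p_values <- replicate(n_sim, one_p_value())

# Estimated power: the proportion of simulated studies that reject
# the null hypothesis at the (assumed) 0.05 level.
est_power <- mean(p_values <= 0.05)
est_power
```

Because the estimate is a proportion out of `n_sim` simulated studies, it will vary a little from run to run; more simulations give a steadier answer.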

Explain briefly why power.t.test can be used to calculate an answer to this problem, and use it to check your result in the previous part.

A standard level of power in studies in child psychology is 80% (0.80). According to what you have seen so far, is it necessary to interview 20 working mothers, or more, or fewer? Explain briefly. Use power.t.test to obtain an appropriate sample size, under the assumptions you have made.
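As a reminder of how power.t.test is called for a one-sample, one-sided test, here is a hedged sketch with placeholder numbers (not the values from the question). You supply all but one of `n`, `delta`, `sd` and `power`, and the function solves for the one you leave out; `sig.level` defaults to 0.05.

```r
# Power for a given sample size (all values are placeholders).
pw <- power.t.test(n = 15, delta = 53 - 50, sd = 6,
                   type = "one.sample", alternative = "one.sided")
pw$power

# Sample size for a target power: leave out n and supply power instead.
ss <- power.t.test(power = 0.90, delta = 53 - 50, sd = 6,
                   type = "one.sample", alternative = "one.sided")
ss$n            # generally not a whole number
ceiling(ss$n)   # round up to be sure of reaching the target power
```

Here `delta` is the difference between the true mean and the null mean, so only that difference matters, not the two means individually.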

Two-sample power

Suppose we have two populations, both assumed to be normally distributed. The first population has a mean of 20, and the second population has a mean of 25. Both populations have the same SD of 9.

Suppose we take samples of 30 observations from each population. Use power.t.test to find the probability that a two-sample \(t\)-test will (correctly) reject the null hypothesis that the two populations have the same mean, in favour of a one-sided alternative. (Hint: delta should be positive.)
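For reference, a two-sample call to power.t.test looks like the sketch below. The numbers are placeholders rather than the ones in the question. Note that `n` is the number of observations in each group, and `delta` is the true difference between the two means, given as a positive number.

```r
# Two-sample, one-sided power calculation with placeholder values.
pw2 <- power.t.test(n = 40, delta = 4, sd = 10,
                    type = "two.sample", alternative = "one.sided")
pw2$power
```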

Find the sample size needed to obtain a power of 0.75. Comment briefly on whether your sample size makes sense.

Reproduce your power result from the first part of this question by simulation. Some things to consider:

You will need to generate two columns of random samples, one from each population.
t.test can also run a two-sample \(t\)-test if you give it the two columns separately, rather than (as we have done before) one column with all the measurements and a separate column saying which group each came from.
You will need to get the right alternative. With two columns as input like this, the alternative is relative to the column you give first.
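The three points above can be sketched as follows, again in base R with placeholder means, SD and sample size rather than the values in the question. The 0.05 level and the helper name `one_p_value` are assumptions made for illustration.

```r
set.seed(2)   # arbitrary seed, only so the sketch is reproducible

one_p_value <- function() {
  x <- rnorm(40, mean = 100, sd = 10)   # first sample: the smaller mean
  y <- rnorm(40, mean = 104, sd = 10)   # second sample: the larger mean
  # The alternative describes the first input relative to the second:
  # here the first mean is the smaller one, so the alternative is "less".
  t.test(x, y, alternative = "less")$p.value
}

p_values <- replicate(1000, one_p_value())
est_power <- mean(p_values <= 0.05)   # proportion of rejections
est_power
```

If the two samples were given to t.test in the other order, the alternative would have to be "greater" instead.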

Give an example of a situation where the simulation approach could be used but power.t.test could not.

Child psychology, revisited

This is a continuation of an earlier question about talking to children.

Another distribution that might be suitable for time spent talking to children is the gamma distribution. Values from a gamma distribution are guaranteed to be greater than zero (which is suitable for times spent talking to children). As far as R is concerned, a random value from a gamma distribution is generated using the function rgamma. This, for us, has three inputs: the number of values to generate, a parameter called shape for which we will use the value 27.23, and a parameter called scale for which we will use the value 0.44. Generate a random sample of 1000 values from a gamma distribution with the given parameter values. Hint: make sure that the inputs that need names actually have names, and organize your results as a column in a dataframe.
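To illustrate the hint about names: in rgamma, the argument that comes after `shape` by position is `rate`, not `scale`, so `scale` has to be given by name. A small sketch with placeholder parameter values (not the ones above), using a base-R data frame and a made-up column name `x`:

```r
set.seed(3)   # arbitrary seed, only so the sketch is reproducible

# 500 gamma random values stored as a column of a data frame.
# shape and scale are both named; the values are placeholders.
d <- data.frame(x = rgamma(500, shape = 4, scale = 2))
head(d)
```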

Find the mean and SD of your random sample of values from the gamma distribution. Are the mean and SD somewhere close to the mean and SD you used in your first power analysis? Explain (very) briefly.
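A standard fact that may help with this comparison: for a gamma distribution with shape \(k\) and scale \(\theta\) (symbols introduced here, not worksheet notation), the theoretical mean and SD are

\[
\text{mean} = k\theta, \qquad \text{SD} = \sqrt{k}\,\theta .
\]

The mean and SD of a random sample will be close to these values, but not exactly equal to them.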

Make a histogram of your random sample of gamma-distributed values, and comment briefly on its shape.

Suppose now that you want to assume that the data have a gamma distribution with this scale and shape (and thus the same mean and SD that you used previously). Modify your simulation to estimate the power of the \(t\)-test of a null mean of 11 against the alternative that the mean is greater than 11, using a sample size of 20, with the data coming from this gamma distribution.

Compare the estimated power from your earlier simulation (with normally distributed data) with the one from the previous part. Does the similarity or difference make sense? Explain briefly.