Worksheet 5

Published: October 3, 2025

Questions are below. My solutions will be available after the tutorials are all finished. The whole point of these worksheets is for you to use your lecture notes to figure out what to do. In tutorial, the TAs are available to guide you if you get stuck. Once you have figured out how to do this worksheet, you will be prepared to tackle the assignment that depends on it.

If you are not able to finish in an hour, I encourage you to continue later with what you were unable to finish in tutorial.

Packages you’ll need:

library(tidyverse)

Power and the exponential distribution

The exponential distribution is often used to model lifetimes of things like light bulbs or electronic components. It is not normal (bell-curved) in shape. In R, the function rexp draws a random sample from an exponential distribution. It takes two inputs: the sample size and the “rate” (which is one over the mean).
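As a quick illustration of the rate parameterization, here is a minimal sketch in base R. The seed and the sample size of 1000 are arbitrary choices made for this illustration, not values the questions below ask for.

```r
# Arbitrary seed so the draw is reproducible (any number works).
set.seed(457)

# An exponential distribution with mean 5 has rate 1/5.
x <- rexp(n = 1000, rate = 1 / 5)

# The sample mean should land near 5, though not exactly on it.
mean(x)

# All values are positive, as lifetimes should be.
min(x)
```

Note that rexp(1000, 5) would instead give values with mean 1/5, so it is worth writing the rate as one over the mean explicitly.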

  1. Draw a random sample of 100 observations from an exponential distribution with mean 5, saved as one column of a dataframe, and make a histogram of your sample. In what way is the distribution not normal in shape? Hint 1: tibble, with the same sort of code as in a mutate, will create a dataframe from scratch. Hint 2: call set.seed first so that your random sample won’t change from one run to the next. You can use any number as input to set.seed.
  2. For your sample, obtain a bootstrap sampling distribution of the sample mean. Comment briefly on its shape.
  3. Explain briefly why it makes sense to use a (one-sample) \(t\)-test for the population mean, even though the population does not appear to have a normal distribution.
  4. Estimate by simulation the power of a one-sample \(t\)-test to reject the null hypothesis that the population mean is 6 (against a two-sided alternative), when sampling from an exponential distribution with mean 5 and using a sample of size 100.
  5. Suppose our aim is to estimate (by simulation) the sample size needed to get a power of 0.75 in this same situation. By copying and pasting your code and making a small edit to it, run another simulation that will go some way towards meeting that aim. (This was previously an assignment question, so I made it definite about what to do, but on a worksheet, feel free to experiment further after you have run your one additional simulation.)
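The general pattern for estimating power by simulation can be sketched as follows. This sketch deliberately uses made-up values (normal data with mean 52 and SD 10, a null mean of 50, samples of size 30) rather than the ones in the questions, and it uses base R's replicate, which may differ from the approach in your lecture notes.

```r
# Arbitrary seed so the simulation is reproducible.
set.seed(1)

n_sim <- 1000  # number of simulated samples (an illustrative choice)

# For each simulation: draw a sample from the "true" distribution,
# test the (false) null hypothesis, and keep the P-value.
p_values <- replicate(n_sim, {
  x <- rnorm(n = 30, mean = 52, sd = 10)  # made-up truth: mean 52
  t.test(x, mu = 50)$p.value              # made-up null: mean 50, two-sided
})

# Estimated power: the proportion of simulations that reject at alpha = 0.05.
power_estimate <- mean(p_values <= 0.05)
power_estimate
```

To explore how power depends on sample size, the only line that needs to change is the one that draws the sample.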

Child psychology

According to research in child psychology, working mothers spend a mean time of 11 minutes per day talking to their children, with a standard deviation of 2.3 minutes. Your research suggests that the mean should be greater than that, and you are planning a study of working mothers (who work outside the home) to see how many minutes they spend talking to their children. You think that the mean should be 12 minutes per day, and you want to design your study so that a mean of 11 should be rejected with a reasonably high probability.

  1. If you interview 20 working mothers, what is the power of your test if your belief about the mean is correct? Estimate by simulation. Assume that the time that a mother spends talking to her children has a normal distribution.
  2. Explain briefly why power.t.test can be used to calculate an answer to this problem, and use it to check your result in the previous part.
  3. A standard level of power in studies in child psychology is 80% (0.80). According to what you have seen so far, is it necessary to interview 20 working mothers, or more, or fewer? Explain briefly. Use power.t.test to obtain an appropriate sample size, under the assumptions you have made.
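As a reminder of how power.t.test is called, here is a minimal sketch. The numbers are made up for illustration (a true mean 2 units away from the null, SD 10, two-sided test) and are not the ones from this question. You supply all but one of n, delta, sd, and power, and the function solves for the one you leave out.

```r
# Power for a given sample size (hypothetical values).
res_power <- power.t.test(
  n = 30,
  delta = 2,             # difference between true mean and null mean
  sd = 10,
  type = "one.sample",
  alternative = "two.sided"
)
res_power$power

# Sample size for a given power: leave out n and supply power instead.
res_n <- power.t.test(
  delta = 2,
  sd = 10,
  power = 0.80,
  type = "one.sample",
  alternative = "two.sided"
)
ceiling(res_n$n)         # round up to a whole number of observations
```

The defaults are a two-sample, two-sided test, so type and alternative need to be set to match the test you actually plan to run.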

Child psychology, revisited

This is a continuation of an earlier question about talking to children.

  1. Another distribution that might be suitable for time spent talking to children is the gamma distribution. Values from a gamma distribution are guaranteed to be greater than zero (which is suitable for times spent talking to children). In R, a random value from a gamma distribution is generated using the function rgamma. This, for us, has three inputs: the number of values to generate, a parameter called shape for which we will use the value 27.23, and a parameter called scale for which we will use the value 0.44. Generate a random sample of 1000 values from a gamma distribution with the given parameter values. Hint: make sure that the inputs that need names actually have names, and organize your results as a column in a dataframe.
  2. Find the mean and SD of your random sample of values from the gamma distribution. Are the mean and SD somewhere close to the mean and SD you used in your first power analysis? Explain (very) briefly.
  3. Make a histogram of your random sample of gamma-distributed values, and comment briefly on its shape.
  4. Suppose now that you want to assume that the data have a gamma distribution with this shape and scale (and thus the same mean and SD that you used previously). Modify your simulation to estimate the power of the \(t\)-test of a null mean of 11 against the alternative that the mean is greater than 11, using a sample size of 20, with the data coming from this gamma distribution.
  5. Compare the estimated power from the earlier question (with normally distributed data) with the one from the previous part. Does the similarity or difference make sense? Explain briefly.
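The hint about naming inputs matters because the third positional input to rgamma is rate, not scale. The sketch below shows the named form, and also computes the theoretical mean and SD using the standard gamma results (mean is shape times scale, SD is the square root of shape times scale). The seed and the tiny sample of 5 values are arbitrary choices for this illustration.

```r
shape <- 27.23
scale <- 0.44

# Theoretical mean and SD of a gamma distribution with these parameters.
theoretical_mean <- shape * scale        # about 11.98
theoretical_sd <- sqrt(shape) * scale    # about 2.30

# Arbitrary seed; name shape and scale explicitly so that 0.44
# is not silently treated as a rate.
set.seed(2)
x <- rgamma(n = 5, shape = shape, scale = scale)
x
```

Writing rgamma(5, 27.23, 0.44) would run without error but would treat 0.44 as the rate, giving values with a much larger mean.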