Worksheet 2

Published: January 9, 2025

Packages

library(tidyverse)
library(marginaleffects)

Low birth weight

Low birth weight, defined as a weight of less than 2500 grams at birth, is an outcome of concern because infant mortality rates and birth defect rates are very high for low birth weight babies. The mother’s behaviour during pregnancy is believed to have a great effect on whether the baby is of normal or low birth weight.

The variables of interest to us are:

  • low: underweight (low birth weight, under 2500 g) or normalweight (normal birth weight, 2500 g or more).
  • lwt: the mother’s weight at her last menstrual period (in pounds).
  • smoke: whether or not the mother smoked during the pregnancy (Yes or No).

The data, with these variables and a number of others, are in http://ritsokiguess.site/datafiles/lowbwt.csv.

  1. Read in and display some of the data.
  2. Fit a logistic regression predicting whether or not the baby is of low birth weight, as it depends on the mother’s weight at last menstrual period and whether or not the mother smoked. Display the results. Hint: the response variable is not coded as zero and one, so it needs to be a factor in the model.
  3. How do you know that your model is predicting the probability of a low birth weight baby, as opposed to a normal birth weight baby? Explain briefly.
  4. Should either of the explanatory variables be removed from the logistic regression? Explain briefly.
  5. Make a plot showing the fitted probability of a baby being of low birth weight as it depends on the mother’s weight at last menstrual period and whether or not the mother smoked. Hint: this is one line of code, using something from the marginaleffects package. Put the quantitative explanatory variable first.
  6. Using your plot, describe the effects of the two explanatory variables on the probability of a low birth weight baby.
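
The mechanics behind the parts above can be sketched as follows. This is a minimal illustration, not a solution: it uses a small made-up data frame (`toy`, with invented values) instead of the real lowbwt.csv, so its numbers mean nothing. Only the column names `low`, `lwt`, and `smoke` and their category labels come from the description above. The point is how `glm` treats a factor response: the first level is the baseline, and the model describes the probability of the other level. The plot at the end assumes the marginaleffects package is installed.

```r
# Toy data with invented values; only the column names follow the worksheet.
toy <- data.frame(
  low   = c("underweight", "normalweight", "underweight", "normalweight",
            "normalweight", "underweight", "normalweight", "normalweight",
            "underweight", "normalweight", "underweight", "normalweight"),
  lwt   = c(95, 150, 105, 130, 120, 140, 170, 110, 100, 160, 125, 135),
  smoke = c("Yes", "No", "Yes", "No", "Yes", "Yes", "No", "No",
            "No", "Yes", "No", "Yes")
)

# The response is text, so turn it into a factor before modelling.
toy$low   <- factor(toy$low)
toy$smoke <- factor(toy$smoke)

# Levels are sorted alphabetically; glm treats the first as the baseline
# and models the probability of the second.
levels(toy$low)  # "normalweight" "underweight"

# Logistic regression with one quantitative and one categorical predictor.
fit1 <- glm(low ~ lwt + smoke, data = toy, family = "binomial")
summary(fit1)

# One way to ask whether a term could be dropped: likelihood-ratio tests.
drop1(fit1, test = "Chisq")

# Fitted probabilities against lwt, one curve per smoke category
# (quantitative variable listed first). Skipped if the package is missing.
if (requireNamespace("marginaleffects", quietly = TRUE)) {
  print(marginaleffects::plot_predictions(fit1, condition = c("lwt", "smoke")))
}
```

With the real data, the same calls would apply after reading the file into a dataframe, for example with `read_csv`.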

Grain beetles

A number of grain beetles were exposed to ethylene oxide at one of ten different concentrations (in mg/l), in column conc. For each concentration, the number of beetles affected, and the total number exposed to that concentration, were recorded. The data are in http://ritsokiguess.site/datafiles/beetle.csv. Our aim is to see whether a beetle being affected or not depends on the concentration of ethylene oxide.

  1. Read in and display the data. (You should see all of the data values this time.)
  2. In this dataframe, does each row refer to one beetle or more than one beetle? Explain briefly.
  3. Fit a suitable logistic regression for predicting the probability that a beetle is affected, as it depends on the concentration of ethylene oxide, and display the output from the logistic regression.
  4. Is there a significant effect of concentration? If there is, does a larger concentration go with a larger or smaller probability of being affected? Explain briefly.
  5. For concentrations 15, 20, and 25, use your model to predict the probability of being affected. Display your predictions clearly, alongside the concentrations they are predictions for. Are the predictions consistent with what you said in the previous part? Explain briefly.
  6. Make a plot showing the predicted probability of being affected for concentrations covering the range of the data. On your plot, show confidence intervals for the probability at each concentration.
  7. Judging from your graph, would you say that the probabilities are accurately estimated, or not? Explain briefly.
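
When each row of a dataframe summarizes several individuals, the logistic regression takes a two-column response: the number of "successes" and the number of "failures" in each row. The sketch below illustrates this with invented counts in a made-up data frame `toy_beetle`, not the real beetle.csv. The column names `affected`, `exposed`, and `unaffected` are hypothetical (only `conc` is named above), and the marginaleffects calls assume that package is installed.

```r
# Toy grouped data with invented counts; `affected` and `exposed` are
# hypothetical column names (only `conc` is named in the worksheet).
toy_beetle <- data.frame(
  conc     = c(10, 14, 18, 22, 26, 30),
  affected = c(2, 5, 9, 14, 18, 19),
  exposed  = c(20, 20, 20, 20, 20, 20)
)
toy_beetle$unaffected <- toy_beetle$exposed - toy_beetle$affected

# Grouped-data logistic regression: the response is a two-column matrix
# of (number affected, number not affected) for each row.
fit2 <- glm(cbind(affected, unaffected) ~ conc,
            data = toy_beetle, family = "binomial")
summary(fit2)

# Predicted probabilities shown beside the concentrations they belong to.
new  <- data.frame(conc = c(15, 20, 25))
pred <- cbind(new, prob = predict(fit2, newdata = new, type = "response"))
pred

# The same predictions with confidence limits, then a plot of predicted
# probability over the range of conc with a confidence band.
if (requireNamespace("marginaleffects", quietly = TRUE)) {
  print(marginaleffects::predictions(fit2, newdata = new))
  print(marginaleffects::plot_predictions(fit2, condition = "conc"))
}
```

On a plot like this, a narrow confidence band suggests the probabilities are estimated precisely, and a wide one suggests they are not.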