Worksheet 3

Published

January 7, 2025

Packages

library(tidyverse)
library(marginaleffects)
library(MASS, exclude = "select")

Log odds and poisoning rats

In one of the lecture examples, we modelled the probability that a rat would live as a function of the dose of poison it received. Part of the output from the logistic regression is shown below:

summary(rat2.1)


Call:
glm(formula = response ~ dose, family = "binomial", data = rat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   2.3619     0.6719   3.515 0.000439 ***
dose         -0.9448     0.2351  -4.018 5.87e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 27.530  on 5  degrees of freedom
Residual deviance:  2.474  on 4  degrees of freedom
AIC: 18.94

Number of Fisher Scoring iterations: 4

For the calculations below, I suggest you use R as a calculator. If you prefer, you can use an actual calculator, but your numerical answers must then be close enough to the correct values to earn credit (this was originally an assignment problem).

Using the summary output, obtain a prediction for a dose of 3.2 units. What precisely is this a prediction of?

Convert your prediction into a predicted probability that a rat given this dose will live. Hint: if probability is denoted \(p\) and odds \(d\), we saw in class that \(d = p/(1-p)\). It follows (by algebra that I am doing for you) that \(p = d/(1+d)\).
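As a rough sketch of what "R as a calculator" can look like here: the linear predictor of a logistic regression lives on the log-odds scale, so one route is intercept plus slope times dose, then exp() to get odds, then \(p = d/(1+d)\). The variable names below are my own, the coefficients are copied from the summary output, and plogis() (base R's inverse logit) is used only as a cross-check.

```r
# Coefficients copied from the summary output above
intercept <- 2.3619
slope <- -0.9448
dose <- 3.2

# Linear predictor: this is on the log-odds scale
log_odds <- intercept + slope * dose

# Log-odds to odds, then odds to probability via p = d / (1 + d)
odds <- exp(log_odds)
p <- odds / (1 + odds)

# Cross-check with the built-in inverse logit
p_check <- plogis(log_odds)

c(log_odds = log_odds, odds = odds, p = p, p_check = p_check)
```

The two probability values should agree, since plogis() does the exp-and-divide step in one go.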

In the output given at the top of this question, there is a number \(-0.9448\). What is the interpretation of this number? (If you prefer, you can instead interpret the exp of this number.)
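Because the slope is on the log-odds scale, exponentiating it gives a multiplicative change in the odds for each one-unit increase in dose. A minimal sketch of that arithmetic follows; the helper odds_at() and the two doses used in the check are my own choices for illustration, not part of the original analysis.

```r
# Coefficients copied from the summary output above
intercept <- 2.3619
slope <- -0.9448

# Multiplicative change in the odds for each extra unit of dose
odds_ratio <- exp(slope)

# Check: the odds at dose 2 divided by the odds at dose 1 give the same number
odds_at <- function(dose) exp(intercept + slope * dose)
ratio_check <- odds_at(2) / odds_at(1)

c(odds_ratio = odds_ratio, ratio_check = ratio_check)
```

The same ratio comes out for any pair of doses one unit apart, which is what makes the exponentiated slope interpretable on its own.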

Carrots

In a consumer study, 103 consumers scored their preference for each of 12 Danish carrot types on a scale from 1 to 7, where 1 represents “strongly dislike” and 7 represents “strongly like”. The consumers also rated each carrot type on some other features, and some demographic information was collected. The data are in http://ritsokiguess.site/datafiles/carrots_pref.csv. We will be predicting preference score from the type of carrot and how often the consumer eats carrots (the latter treated as quantitative). The relevant variables are:

Frequency: how often the consumer eats carrots (1: once a week or more, 2: once every two weeks, 3: once every three weeks, 4: at least once a month, 5: less than once a month). We will treat this as quantitative.
Preference: consumer score on a seven-point scale, 7 being best
Product: type of carrot (there are 12 different named types).

Read in and display (some of) the data.

Why would ordinal logistic regression be a sensible method of analysis here?

Fit an ordinal logistic regression to this dataset. You do not need to display any output from this model yet. Hint: Preference is actually categorical, even though it looks like a number, so you should make sure that R treats it as categorical.
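To illustrate the hint, here is a sketch on simulated toy data (not the carrot data). I am assuming column names Preference, Product and Frequency as in the variable list above; the names in the actual file may differ. The point is that polr() from MASS needs a factor response, so a score stored as a number has to be converted first, here with ordered().

```r
library(MASS)

# Simulated stand-in data: three made-up products, scores 1 to 7.
# Column names are an assumption; the real file may use different ones.
set.seed(1)
n <- 240
toy <- data.frame(
  Product = rep(c("A", "B", "C"), each = n / 3),
  Frequency = rep(1:5, times = n / 5)
)
shift <- c(A = 0.8, B = 0, C = -0.8)
latent <- shift[toy$Product] + rlogis(n)
toy$Preference <- cut(latent, breaks = c(-Inf, -2, -1, -0.3, 0.3, 1, 2, Inf),
                      labels = FALSE)

is.numeric(toy$Preference)  # the scores arrive as plain numbers

# Convert to an ordered factor so that polr() accepts it as the response
toy$Preference_ord <- ordered(toy$Preference)

fit <- polr(Preference_ord ~ Product + Frequency, data = toy)
```

Passing the numeric column directly to polr() would stop with an error saying the response must be a factor.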

Can any explanatory variables be removed? Explain briefly.
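One common way to check whether a term can go is drop1() with test = "Chisq", which runs a likelihood-ratio test for removing each explanatory variable in turn. The sketch below uses simulated toy data with assumed column names (Preference, Product, Frequency), so its numbers say nothing about the real carrots.

```r
library(MASS)

# Simulated stand-in data (not the carrot data); column names are assumed
set.seed(1)
n <- 240
toy <- data.frame(
  Product = rep(c("A", "B", "C"), each = n / 3),
  Frequency = rep(1:5, times = n / 5)
)
shift <- c(A = 0.8, B = 0, C = -0.8)
latent <- shift[toy$Product] + rlogis(n)
toy$Preference <- cut(latent, breaks = c(-Inf, -2, -1, -0.3, 0.3, 1, 2, Inf),
                      labels = FALSE)

fit <- polr(ordered(Preference) ~ Product + Frequency, data = toy)

# One row per term: the test for dropping that term from the model
tests <- drop1(fit, test = "Chisq")
tests

# Illustration only: how a term would be removed if its test were
# non-significant (here Frequency, chosen purely as an example)
fit2 <- update(fit, . ~ . - Frequency)
```

A large P-value for a term suggests the model does about as well without it; update() then refits the smaller model without retyping the formula.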

If necessary, fit an improved model. (If not, explain briefly why not.)

We will be predicting the probability of each rating category for combinations of the explanatory variables remaining in the best model. Make a dataframe that includes all the different types of carrot, along with the values 1 and 5 for eat_carrots (the carrot-eating frequency variable) if that is in your best model. Hint: you can use count to get all the levels of a categorical variable.

Predict the probability of a consumer giving each carrot type each preference score. Display your results in such a way that you can easily compare the probability of each score for different types of carrot.
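A sketch of one possible layout, again on simulated toy data with assumed column names: one row per combination of explanatory values and one column per score, so that the probabilities for different products line up underneath each other. I use base predict() with type = "probs" here to keep the sketch dependent only on MASS; predictions() from marginaleffects, which this worksheet loads, would be an alternative.

```r
library(MASS)

# Simulated stand-in data (not the carrot data); column names are assumed
set.seed(1)
n <- 240
toy <- data.frame(
  Product = rep(c("A", "B", "C"), each = n / 3),
  Frequency = rep(1:5, times = n / 5)
)
shift <- c(A = 0.8, B = 0, C = -0.8)
latent <- shift[toy$Product] + rlogis(n)
toy$Preference <- cut(latent, breaks = c(-Inf, -2, -1, -0.3, 0.3, 1, 2, Inf),
                      labels = FALSE)

fit <- polr(ordered(Preference) ~ Product + Frequency, data = toy)

# All products crossed with the two frequency values of interest
# (leave Frequency out of the grid if it is not in the model)
new <- expand.grid(Product = sort(unique(toy$Product)),
                   Frequency = c(1, 5),
                   stringsAsFactors = FALSE)

# Matrix of predicted probabilities: one row per row of `new`,
# one column per preference score
probs <- predict(fit, newdata = new, type = "probs")

# Wide layout: explanatory values on the left, scores across the top
wide <- cbind(new, round(probs, 3))
wide
```

Reading down a column compares products on the probability of one particular score; reading across a row shows the whole predicted score distribution for one product.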

There was a significant difference in preference scores among the different types of carrot. What do your predictions tell you about why that is? Explain briefly.