Worksheet 4
Packages
```r
library(tidyverse)
library(marginaleffects)
library(nnet)
# library(MASS, exclude = "select")  # optional; exclude = "select" stops MASS masking dplyr's select()
```
Choice-box
A psychology experiment began by showing a video in which four German children demonstrated how to use a device called a “choice-box”, which consisted of three pipes. Three of the children in the video used pipe #1, demonstrating how to throw a ball into the pipe and receive a toy from a dispenser. The other child in the video used pipe #2, also throwing a ball into the pipe and receiving a toy from the dispenser. Pipe #3 was never used on the video.
The pipes on the choice-box were actually different colours, and different versions of the video were used in which the identities of pipes #1, #2, and #3 were varied at random. The order of the children using pipes #1 and #2 on the video was also varied at random: sometimes the three children demonstrating the same pipe appeared first, and sometimes the one child demonstrating the other pipe appeared first.
The 629 subjects of the experiment, who were other children of various ages, were each given one ball to use in the choice-box. The experimenter noted which pipe each subject threw the ball into, and how it related to the pipes used in the video that subject had watched. These outcomes are in the column y:
- majority: the subject threw their ball into the pipe demonstrated by three children on their video (what I called pipe #1).
- minority: the subject threw their ball into the pipe demonstrated by only one child on their video (what I called pipe #2).
- unchosen: the subject threw their ball into the pipe demonstrated by none of the children on their video (what I called pipe #3). I should probably point out that these subjects got a toy from the dispenser as well.
The aim of the experiment was to see whether the subjects were influenced by what happened on the video they saw: for example, was a subject more likely to choose the pipe demonstrated three times on their video? The experimenters also recorded the gender, age, and culture of each subject (coded as C1 through C8), along with whether the video showed three children using pipe #1 first, or one child using pipe #2 first. Did these other variables have an effect on which pipe a subject chose? This kind of experiment might shed some light on how children are influenced by what they see, and on what changes that influence.
The data are in http://ritsokiguess.site/datafiles/Boxes.csv.
- Read in and display (some of) the data.
- What assumption is made about the response categories in order to use multinom from package nnet?
- Fit an appropriate model for predicting the (treated as unordered) category of y from the other variables. Include a squared term in age. You don't need to display any results.
- To find out what, if anything, you can remove from your model, use step. The input to step is a model (here, the one you fitted in the previous part). The output from step is another model: the one obtained by removing, one at a time, every term whose removal lowers the AIC. Save this model. Running step displays some additional output, showing you what it is doing. (You might find that there is a lot of additional output; that was fine to hand in on this assignment.)
- For your best model, create a dataframe for predicting the probability that a child will choose the majority, minority, or unchosen pipe, for ages 5 through 13. What values have been used for the other explanatory variables?
- Calculate and display your predictions side by side with the corresponding explanatory variable values. Arrange your predictions in a way that makes them easier to compare.
- Plot the predictions as they depend on age. Hint: use the simplified procedure shown in lecture (which should also be in the slides).
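To see how the pieces of the choice-box questions fit together, here is a minimal sketch of the multinom workflow: a fit with a squared age term, step, datagrid plus predictions, a wide table, and a plot. It runs on simulated data, not on Boxes.csv. The gender column, the coefficients used to generate y, and all object names are hypothetical, and the real data has further explanatory variables (culture and video order) whose column names are not assumed here. The plot is drawn directly with ggplot, which is not necessarily the simplified procedure from lecture.

```r
library(tidyverse)
library(nnet)
library(marginaleffects)

set.seed(457)
n <- 600
# Simulated stand-in for the Boxes data (NOT the real data).
# 'age' and 'y' mirror the worksheet; 'gender' and every coefficient below are hypothetical.
toy <- tibble(
  age = sample(4:14, n, replace = TRUE),
  gender = sample(c("female", "male"), n, replace = TRUE)
)
# Made-up log-odds of minority and unchosen, each relative to majority
eta_min <- -2 + 0.3 * toy$age
eta_unc <- -1 - 0.1 * toy$age
probs <- cbind(1, exp(eta_min), exp(eta_unc))
probs <- probs / rowSums(probs)
toy$y <- factor(apply(probs, 1, function(pr)
  sample(c("majority", "minority", "unchosen"), 1, prob = pr)))

# Multinomial (unordered) logistic regression; I() makes ^ mean "square" inside a formula
fit <- multinom(y ~ age + I(age^2) + gender, data = toy, trace = FALSE)

# Backward elimination by AIC; trace = 0 hides the step-by-step printout
fit_best <- step(fit, trace = 0)

# Ages 5 to 13; any other predictor left in the model is held at its mean or mode
new <- datagrid(model = fit_best, age = 5:13)
p <- as_tibble(predictions(fit_best, newdata = new))

# One row per age and one column per pipe, so the probabilities are easy to compare
wide <- p %>%
  select(age, group, estimate) %>%
  pivot_wider(names_from = group, values_from = estimate)
wide

# Predicted probability of each pipe against age, one line per response category
g <- ggplot(p, aes(x = age, y = estimate, colour = group)) + geom_line()
g
```

By default datagrid holds every explanatory variable not named in the call at its mean (numeric) or its most common value (categorical), which is where the values for the "other explanatory variables" come from. Each row of wide should sum to 1, since the three category probabilities are exhaustive.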
NBA schedule
The NBA (National Basketball Association) runs North America’s major basketball league, whose games are played from October to April. The 2023-2024 schedule is at http://ritsokiguess.site/datafiles/nba_sched.csv. (This question came from an assignment where this was the current season.)
- Read in and display some of the data.
- NBA games are played on different days of the week. Which day of the week has the most games, and which day of the week has the fewest? Use the tools you saw in lecture, in this course and STAC32, to work this out. (Hint: does it seem to matter that the year only has two digits here?)
- You have a friend who lives in Auckland, New Zealand, who is a big basketball fan. They have a streaming package that enables them to watch any NBA game live. They get home from work at 4:00pm (local time) every day. What are some games they would have been able to watch from start to finish as they happened? Use tools we have seen in lecture to find this out. Hints:
  - your friend needs to know about games that start at 4:00pm or later Auckland time (16:00 or later).
  - you can use unite to glue a date and time together as text.
  - if your time does not have seconds, omit the s in the name of the appropriate parsing function (so ..._hm rather than ..._hms).
  - when you create a date-time that needs to be in a certain time zone, add the time zone when you create it. America/Toronto will do for Eastern time; use OlsonNames() to get a list of all the time zone names that R knows about. (The output from OlsonNames() is long, so just find what you need and use that.)
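The date and time hints can be tried out on a few made-up rows before touching the real file. In this sketch, the column names date and time, the month/day/two-digit-year date format, and the 24-hour Eastern start times are assumptions about nba_sched.csv, and the rows are not real games. It uses lubridate's mdy(), wday(), mdy_hm(), and with_tz(). Note that tidyverse 2.0 and later attaches lubridate, but here lubridate is loaded explicitly.

```r
library(tidyverse)
library(lubridate)

# Hypothetical rows: the real file's column names and formats may differ.
# Dates use two-digit years; times are Eastern start times without seconds.
sched <- tibble(
  date = c("10/24/23", "10/25/23", "10/25/23", "01/15/24"),
  time = c("19:30", "19:00", "22:00", "22:30")
)

# Day of week: mdy() reads the two-digit year "23" as 2023
by_day <- sched %>%
  mutate(game_date = mdy(date), dow = wday(game_date, label = TRUE)) %>%
  count(dow, sort = TRUE)
by_day

# Glue date and time as text, parse it as Eastern time, then express it in Auckland time
nz_games <- sched %>%
  unite(when, date, time, sep = " ") %>%
  mutate(
    start_et = mdy_hm(when, tz = "America/Toronto"),
    start_nz = with_tz(start_et, tzone = "Pacific/Auckland")
  ) %>%
  filter(hour(start_nz) >= 16)
nz_games
```

Only the last row survives the filter: 22:30 Eastern on 15 January 2024 is 16:30 on 16 January in Auckland. The 22:00 game on 25 October starts at 15:00 Auckland time, because the gap between the two cities is 17 hours while Toronto is on daylight time and 18 hours after it switches back. Letting with_tz() do the conversion avoids tracking those offsets by hand.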