Worksheet 5

Published: February 3, 2025

Packages

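library(tidyverse)        # data handling and ggplot2 graphics (tidyverse 2.0+ also attaches lubridate)
library(marginaleffects)  # predictions and prediction plots from fitted models
library(survival)         # Surv() and coxph() for survival analysis
# library(survminer)      # optional; not loaded here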

Hypernephroma

Hypernephroma, also known as renal cell carcinoma or kidney cancer, is a type of cancer that starts in the kidneys. It’s one of the most common types of kidney cancer in adults. Source.

The 33 patients in http://ritsokiguess.site/datafiles/lee_hypernephroma.csv all had hypernephroma and were all treated with chemotherapy, immunotherapy, and hormonal therapy. There are a lot of columns in our data, among them:

  • age of the patient, in years
  • gender of the patient, recorded as F or M
  • date of treatment_start, as text (month-day-year)
  • date of treatment_end (date of last follow-up or date of death), as text
  • status of the patient when last seen
  • the last five columns, which are the results of skin tests taken at the start of treatment
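Because both dates are stored as text in month-day-year order, they have to be parsed before any arithmetic is possible. The sketch below is a minimal illustration, assuming lubridate's mdy() as the parser; the two date strings are made-up values, not rows of the data file, and the exact separator used in the file is also an assumption (mdy() accepts "-", "/" and similar). It shows that subtracting two dates gives a difftime measured in days, which as.numeric() turns into a plain number.

```r
library(lubridate)  # mdy() parses month-day-year text into Date values

# Hypothetical example strings, NOT taken from lee_hypernephroma.csv
treatment_start_text <- c("1-15-1980", "03-02-1981")
treatment_end_text   <- c("6-30-1982", "11-20-1981")

treatment_start <- mdy(treatment_start_text)  # class "Date"
treatment_end   <- mdy(treatment_end_text)

# Subtracting two Dates gives a difftime whose units are days
elapsed <- treatment_end - treatment_start
print(elapsed)

# Drop the difftime class so the result is an ordinary number for plots and models
days <- as.numeric(elapsed)
print(days)  # 897 263
```

In the worksheet itself, the same calls would presumably sit inside a mutate() on the data frame read from the file.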

The researchers wanted to see whether any of the skin test results, as well as the age and gender of the patient, helped in predicting survival time after the start of treatment.

  1. Read in and display (some of) the data.
  2. Convert the treatment start and treatment end dates into actual dates.
  3. Work out the number of days between the start and the end of the treatment. Check that your result is indeed a number of days. If necessary, turn it into an ordinary number (with as.numeric), for use in plots later.
  4. Create a suitable response variable y for a Cox proportional-hazards model, and display it. (You don’t need to save it.) Does it distinguish correctly between the patients whose treatment_end was their date of death and the patients who were still alive at that date?
  5. Fit a Cox proportional-hazards model, predicting survival time from age, gender, and the five skin test results. Display the summary of the model. (Hint: copy your Surv from above into your modelling function.)
  6. (2 points) Use step to remove explanatory variables that do not help to predict survival time. Save and display the model that comes out of step. (Some of the explanatory variables will only be significant at 0.10, not 0.05. Keep those.)
  7. Plot predicted survival probabilities over time for five representative ages. (Hint: your procedure will use representative values for the other variables, so you do not need to supply values for those.)
  8. Describe the effect of increasing age on your plot, and explain briefly how this is consistent with the summary output from your model.
  9. Repeat the previous two questions for the mumps skin test values.
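The modelling questions above (the Surv response, the full Cox model, step, and predicted survival curves at representative ages) can be sketched end to end as follows. This is a minimal sketch on simulated data, not the hypernephroma file: the column names days, status, mumps and other_test, and the status values "dead" and "alive", are assumptions made for illustration, and other_test is a hypothetical skin test with no real effect. In the simulation, hazard rises with age and falls with mumps by construction, so nothing here says what the real data will show.

```r
library(survival)  # Surv(), coxph(), survfit()

set.seed(457)
n <- 200

# Simulated stand-in for the hypernephroma data (NOT the real file).
# Column names and the "dead"/"alive" status values are assumptions.
sim <- data.frame(
  age        = round(runif(n, 40, 80)),
  gender     = factor(sample(c("F", "M"), n, replace = TRUE)),
  mumps      = round(runif(n, 0, 20)),
  other_test = round(runif(n, 0, 20))  # hypothetical skin test with no effect
)

# Hazard rises with age and falls with mumps; gender and other_test do nothing
rate        <- 0.001 * exp(0.04 * (sim$age - 60) - 0.08 * (sim$mumps - 10))
event_time  <- rexp(n, rate)
censor_time <- runif(n, 0, 3000)  # end of follow-up for each patient
sim$days    <- pmin(event_time, censor_time)
sim$status  <- ifelse(event_time <= censor_time, "dead", "alive")

# Response: time, plus whether the end date was a death (TRUE) or censored (FALSE)
y <- with(sim, Surv(days, status == "dead"))
print(y[1:6])  # censored times are printed with a trailing "+"

# Full Cox model, with the Surv() call copied into the formula
fit_full <- coxph(Surv(days, status == "dead") ~ age + gender + mumps + other_test,
                  data = sim)
summary(fit_full)

# Backward elimination by AIC; trace = 0 hides the intermediate steps
fit_step <- step(fit_full, trace = 0)
summary(fit_step)

# Five representative ages (Tukey's five-number summary); other variables held
# at a typical value: the mean for numeric ones, the most common level for gender
ages <- fivenum(sim$age)
typical_gender <- names(which.max(table(sim$gender)))
new_patients <- data.frame(
  age        = ages,
  gender     = factor(typical_gender, levels = levels(sim$gender)),
  mumps      = mean(sim$mumps),
  other_test = mean(sim$other_test)
)

# One predicted survival curve per row of new_patients
curves <- survfit(fit_step, newdata = new_patients)
plot(curves, col = 1:5, xlab = "days since start of treatment",
     ylab = "predicted survival probability")
legend("topright", legend = paste("age", ages), col = 1:5, lty = 1)
```

Two design notes. First, step() selects by AIC rather than by p-values: dropping a one-parameter term lowers AIC whenever its likelihood-ratio chi-squared statistic is below 2, which corresponds to a p-value of about 0.157, so terms significant only at 0.10 tend to survive, in line with the hint about keeping them. Second, the curves here come from survfit() in the survival package, swapped in so the block needs no other packages; within the worksheet's setup, marginaleffects::plot_predictions() with condition = c("days", "age") and type = "survival" is presumably the intended route, but that call is an assumption to check against the marginaleffects documentation (which describes summarizing a numeric second condition variable by Tukey's five numbers, the values fivenum() returns).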