Worksheet 5
Packages
library(tidyverse)
library(marginaleffects)
library(survival)
# library(survminer)
Hypernephroma
Hypernephroma, also known as renal cell carcinoma or kidney cancer, is a type of cancer that starts in the kidneys. It’s one of the most common types of kidney cancer in adults. Source.
The 33 patients in http://ritsokiguess.site/datafiles/lee_hypernephroma.csv all had hypernephroma and were all treated with chemotherapy, immunotherapy, and hormonal therapy. There are a lot of columns in our data, among them:
- age of patient in years
- gender of patient, noted as F or M.
- date of treatment_start, as text (month - day - year)
- date of treatment_end (last followup or date of death), as text
- status of patient when last seen
- the last five columns are the results of skin tests taken at the start of treatment.
The researchers wanted to see whether any of the skin test results, as well as the age and gender of the patient, helped in predicting survival time after the start of treatment.
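Because the two treatment dates are stored as text in month-day-year order, they have to be parsed before any date arithmetic is possible. The following is a minimal sketch of that idea, not a worked answer: the two rows of `toy` are invented for illustration, the exact separator used in the real file is an assumption, and `start_date`, `end_date` and `days` are hypothetical column names.

```r
library(tidyverse)
library(lubridate)  # already attached by recent tidyverse versions; loaded here to be safe

# Invented example rows: these values do not come from the real data file
toy <- tibble(
  treatment_start = c("3-14-1975", "11-2-1976"),
  treatment_end   = c("6-20-1975", "1-15-1978")
)

toy_dates <- toy %>%
  mutate(
    start_date = mdy(treatment_start),              # text -> Date, month-day-year order
    end_date   = mdy(treatment_end),
    days       = as.numeric(end_date - start_date)  # difference of Dates is in days
  )
toy_dates
```

Subtracting two `Date` columns gives a time difference measured in days; wrapping it in `as.numeric` keeps only the number, which is easier to plot and to pass to modelling functions.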
- Read in and display (some of) the data.
- Convert the treatment start and treatment end dates into actual dates.
- Work out the number of days between the start and the end of the treatment. Check that your result is indeed a number of days. Turn it, if necessary, into an actual number (with as.numeric), for plots later.
- Create a suitable response variable y for a Cox proportional-hazards model, and display it. (You don’t need to save it.) Does it distinguish correctly between patients whose treatment_end was their date of death, and the patients who were still alive at this point?
- Fit a Cox proportional-hazards model, predicting survival time from age, gender, and the five skin test results. Display the summary of the model. (Hint: copy your Surv from above into your modelling function.)
- (2 points) Use step to remove explanatory variables that do not help to predict survival time. Save and display the model that comes out of step. (Some of the explanatory variables will only be significant at 0.10, not 0.05. Keep those.)
- Plot predicted survival probabilities over time for five representative ages. Hint: your procedure will use representative values for the other variables, so you do not need to supply values for those.
- Describe the effect of increasing age on your plot, and explain briefly how this is consistent with the summary output from your model.
- Repeat the previous two questions for mumps skin test values.
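The response, model-fitting and variable-selection questions above follow a common pattern, which can be sketched on simulated data. Everything in `sim` is made up: the values, the reduced set of explanatory variables, and in particular the status labels "dead" and "alive", which are assumptions to be checked against the values actually present in the status column. The names `sim`, `fit_full` and `fit_step` are hypothetical.

```r
library(survival)

set.seed(1)
n <- 60

# Simulated stand-in data: none of these values come from the real file
sim <- data.frame(
  age    = round(runif(n, 40, 75)),
  gender = sample(c("F", "M"), n, replace = TRUE),
  mumps  = round(runif(n, 0, 20))
)
# In this simulation only, older patients are given shorter survival times
sim$days <- ceiling(rexp(n, rate = exp(0.05 * (sim$age - 55)) / 300))
# "dead" / "alive" are assumed labels, not taken from the real data
sim$status <- sample(c("dead", "alive"), n, replace = TRUE, prob = c(0.8, 0.2))

# Response variable: the second argument is TRUE when the event (death) was observed.
# Censored observations (patients still alive) display with a trailing "+".
y <- with(sim, Surv(days, status == "dead"))
head(y)

# Full model, with the Surv() response written directly into the formula
fit_full <- coxph(Surv(days, status == "dead") ~ age + gender + mumps, data = sim)
summary(fit_full)

# Backward elimination; trace = 0 suppresses the step-by-step log
fit_step <- step(fit_full, direction = "backward", trace = 0)
fit_step
```

Note that `step` chooses variables by AIC rather than by P-value, which is one reason a variable that is only significant at 0.10 can remain in the selected model.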