STAD29 Assignment 5

You are expected to complete this assignment on your own: that is, you may discuss general ideas with others, but the writeup of the work must be entirely your own. If your assignment is unreasonably similar to that of another student, you can expect to be asked to explain yourself.

If you run into problems on this assignment, it is up to you to figure out what to do. The only exception is if it is impossible for you to complete this assignment, for example a data file cannot be read. (There is a difference between you not knowing how to do something, which you have to figure out, and something being impossible, which you are allowed to contact me about.)

You must hand in a rendered document that shows your code, the output that the code produces, and your answers to the questions. This should be a file with .html on the end of its name. There is no credit for handing in your unrendered document (ending in .qmd), because the grader cannot then see whether the code in it runs properly. After you have handed in your file, you should be able to see (in Attempts) what file you handed in, and you should make a habit of checking that you did indeed hand in what you intended to, and that it displays as you expect.

Hint: render your document frequently, and solve any problems as they come up, rather than trying to do so at the end (when you may be close to the due date). If your document will not successfully render, it is because of an error in your code that you will have to find and fix. The error message will tell you where the problem is, but it is up to you to sort out what the problem is.

Packages

library(tidyverse)
library(marginaleffects)
library(survival)

Waiting for a heart transplant

The Mayo Clinic says: A heart transplant is an operation in which a failing heart is replaced with a healthier donor heart. Heart transplant is a treatment that’s usually reserved for people whose condition hasn’t improved enough with medications or other surgeries. While a heart transplant is a major operation, the patient’s chance of survival is good with appropriate follow-up care.

Healthy donor hearts are difficult to come by, and so patients who need a heart transplant go on a waiting list and are matched with a donor heart when one becomes available.

Our data come from the Stanford heart transplant program, and are in http://ritsokiguess.site/datafiles/jasa.csv. Variables of interest to us include:

  • birth.dt: patient’s birth date
  • accept.dt: date patient added to waiting list
  • tx.date: transplant date
  • fu.date: date of last followup
  • fustat: status at last followup (1 = alive, 0 = dead)
  • surgery: patient previously had heart bypass surgery (1 = yes, 0 = no)
  • age: in years, at time of being accepted onto waiting list
  • futime: time in days between being accepted onto waiting list and last followup
  • wait.time: time between acceptance on waiting list and receiving a transplant (if the patient did), or NA otherwise
  • transplant: 1 if patient received transplant, 0 if they did not (before the end of the study)

We are concerned here with modelling the time until transplant, not the time until death.

  1. (1 point) Read in and display (some of) the data.

The usual:

my_url <- "http://ritsokiguess.site/datafiles/jasa.csv"
heart <- read_csv(my_url)
Rows: 103 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl  (10): fustat, surgery, age, futime, wait.time, transplant, mismatch, hl...
date  (4): birth.dt, accept.dt, tx.date, fu.date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
heart
  1. (2 points) Verify that at least some of the values in futime are correct, using the dates you have.

The followup time is defined as the number of days between acceptance onto the waiting list to the last followup time, so calculating the time difference should do it (as long as it displays the number of days so that you can compare):

heart %>% mutate(fudiff = fu.date - accept.dt) %>% 
  select(fudiff, futime)

The ten values shown here are all correct, which is sufficient to check for this question. (I wanted to see only that you could calculate the number of days of followup and that you knew how to check they were correct.) Make sure that you display the columns you want to compare and not too many other columns, so that the comparison is easy for your reader.

Optionally, if you want to check that all the values are correct:

heart %>% mutate(fudiff = fu.date - accept.dt) %>% 
  summarize(eq = all.equal(as.numeric(fudiff), futime))

and they are.

There is a bit of work to do here, because what I calculated as fudiff is actually a “difftime” (or “duration”), and has units days, while the futime from the original data is a number (dimensionless). So, to get things that can actually be tested for equality, I turned the numbers of days I calculated into (dimensionless) numbers and compared them with the numbers in futime.

Alternatively, turn the time between acceptance onto the waiting list and last followup into a period and work out the number of days in it:

heart %>% mutate(fudiff = as.period(accept.dt %--% fu.date) / days(1)) %>% 
  select(fudiff, futime)

This is like the one on the worksheet in that the number of days here is for some reason fractional, so what you do is wave your hands and say that the values are “about the same” or “usually round to the same values”. (I gotta find out why this doesn’t always come out exactly the same like the one in the lecture notes.)

  1. (3 points) Create, save, and display (some of) a new column time in your dataframe that is the time between acceptance onto the waiting list and receiving a transplant, if the patient did receive a transplant, and the time between acceptance onto the waiting list and last followup otherwise. Hints: (i) ifelse, (ii) choose what to display so that your reader can easily see that your answer is correct.

Go back and read the definitions of the relevant variables. The relevant ones are transplant, wait.time (with a dot), and futime (without a dot):

heart %>% 
  mutate(time = ifelse(transplant == 1, wait.time, futime)) -> heart
heart %>% select(wait.time, futime, transplant, time)

The best answer displays your new column time as well as the things it was made out of, so that you can see that you did the right thing. Note that the inputs to ifelse are (i) something that is true or false, (ii) the value to use if true, (iii) the value to use if false (like =IF in Excel and other spreadsheets).

Displaying the other relevant columns makes it easy to see that:

  • if transplant is 1, wait.time has a non-missing value, and that is the value that gets used for time.
  • if transplant is 0, wait.time is missing,1 and the followup time is what gets used for time.

Extra: I haven’t shown you this before, but the idea here can also be expressed as “use the value of one variable (wait.time) if it is not missing; otherwise use the value of another variable (futime).” In databases, there is a function COALESCE that does this; when Hadley Wickham was designing dplyr, he borrowed several ideas from databases, and this was one of them. Here’s how it works:

heart %>% 
  mutate(time = coalesce(wait.time, futime)) %>% 
  select(wait.time, futime, time)

This doesn’t even use the value of transplant, because we know that if wait.time has a non-missing value, transplant must be 1, and if its value is missing, transplant must be 0. The function coalesce can take any number of inputs; the first input that has a non-missing value is the output, and if all the inputs have missing values, then the answer is missing.

  1. (2 points) Create and display a suitable response variable y in your dataframe, bearing in mind that we will be predicting time until transplant (event of interest). (As on the worksheet, you do not need to save the new column.)

This means using Surv to create the response variable, using the time to event (the time that we created above) and the indication that the event transplant happened (the value 1):

heart %>% mutate(y = Surv(time, transplant == 1)) %>% 
  select(time, transplant, y)

I displayed the time and transplant variables, so that I could eyeball the values of y and see that they were correct.

  1. (2 points) Why, specifically, do some of your values of y have a plus sign next to them?

These are censored values. What that means is that these patients never had a heart transplant (which is specifically what “the event of interest never happened” means in this case).

Two point for getting to “never had a heart transplant”, only one point for saying “these observations are censored” or “the event never happened” without saying what the event actually is.

  1. (2 points) Fit a Cox proportional-hazards model from the patient’s age and whether or not they had prior bypass surgery. Display the output.

Copy the Surv that you made y out of:

heart.1 <- coxph(Surv(time, transplant == 1) ~ age + surgery, data = heart)
summary(heart.1)
Call:
coxph(formula = Surv(time, transplant == 1) ~ age + surgery, 
    data = heart)

  n= 103, number of events= 69 

           coef exp(coef) se(coef)     z Pr(>|z|)  
age     0.03113   1.03161  0.01398 2.227    0.026 *
surgery 0.04792   1.04909  0.31032 0.154    0.877  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
age         1.032     0.9694     1.004     1.060
surgery     1.049     0.9532     0.571     1.927

Concordance= 0.551  (se = 0.04 )
Likelihood ratio test= 5.77  on 2 df,   p=0.06
Wald test            = 5  on 2 df,   p=0.08
Score (logrank) test = 5.06  on 2 df,   p=0.08

Extra: as you see, there was really no need to create y above, but I had you do it so that you could see what it contained.

  1. (2 points) If appropriate, fit a better model and explain why. (If you have nothing to do, explain why not.) Display the output from your better model, if you have one.

surgery is not significant: that is to say, whether a patient had prior heart bypass surgery does not affect their time to transplant. (You might find this surprising; I would have guessed that if a patient had already had bypass surgery and now they have heart trouble again, this would bump them up the queue, but apparently not.)

Hence, remove surgery, either by copying, pasting, and editing your code, or by using update:

heart.2 <- update(heart.1, .~. - surgery)
summary(heart.2)
Call:
coxph(formula = Surv(time, transplant == 1) ~ age, data = heart)

  n= 103, number of events= 69 

       coef exp(coef) se(coef)     z Pr(>|z|)  
age 0.03117   1.03167  0.01394 2.236   0.0253 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    exp(coef) exp(-coef) lower .95 upper .95
age     1.032     0.9693     1.004      1.06

Concordance= 0.551  (se = 0.04 )
Likelihood ratio test= 5.75  on 1 df,   p=0.02
Wald test            = 5  on 1 df,   p=0.03
Score (logrank) test = 5.05  on 1 df,   p=0.02
  1. (2 points) Using the output from your best model, describe the effect of age in the context of the data.

The coefficient of age is 0.03117, positive. This means that if a patient is older, they are more likely to receive a heart transplant sooner, all else equal. (Precisely, the “hazard of event” is higher for an older patient, which means that the event of getting a heart transplant is more likely to happen sooner for them.)

You might imagine that the heart surgery team have different things that would cause them to prioritize one patient over another when a donor heart becomes available, and the patient’s age might be one of those things.

(If age is not in your best model, all you can say is that there is no effect of age at all for you.)

  1. (2 points) Display a graph of predicted survival curves for a representative collection of ages, using your best model.

This is plot_predictions, not forgetting the type on the end:

plot_predictions(model = heart.2, condition = c("time", "age"), type = "survival")

  1. (2 points) Explain briefly how the effect of age obtained from your graph and that obtained from the coxph output are consistent (or are not, if you think they are not).

The youngest patients have the best “survival”, but you need to think carefully what that means: it means here that the youngest patients are waiting the longest time to get a heart transplant on average, and the oldest patients are waiting the shortest time, so they are getting one quickest. This is one of those situations where having the event happen sooner is a good thing, as opposed to if the event is death (which it is not here).

In the output from summary(heart.2) (my best model), the coefficient for age was positive, which also says that older patients have a higher “hazard of event”, which means they are more likely to receive a heart transplant sooner.

Footnotes

  1. The patient never received a heart transplant, so that it makes no sense to talk about how long they waited for one.↩︎