Survival Analysis

Survival analysis

  • So far, have seen:

    • response variable counted or measured (regression)

    • response variable categorized (logistic regression)

  • But what if response is time until event (eg. time of survival after surgery)?

  • Additional complication: event might not have happened at end of study (eg. patient still alive). But knowing that patient has “not died yet” presumably informative. Such data called censored.

  • Enter survival analysis, in particular the “Cox proportional hazards model”.

  • Explanatory variables in this context often called covariates.

Packages

  • Install packages survival and survminer if not done.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(survival)
library(survminer)
Loading required package: ggpubr

Attaching package: 'survminer'

The following object is masked from 'package:survival':

    myeloma
library(broom)
library(marginaleffects)

Example: still dancing?

  • 12 women who have just started taking dancing lessons are followed for up to a year, to see whether they are still taking dancing lessons, or have quit. The “event” here is “quit”.

  • This might depend on:

    • a treatment (visit to a dance competition)

    • woman’s age (at start of study).

Data

Months  Quit   Treatment Age
1        1        0      16
2        1        0      24
2        1        0      18
3        0        0      27
4        1        0      25
7        1        1      26
8        1        1      36
10       1        1      38
10       0        1      45
12       1        1      47

About the data

  • months and quit are kind of combined response:

    • Months is number of months a woman was actually observed dancing

    • quit is 1 if woman quit, 0 if still dancing at end of study.

  • Treatment is 1 if woman went to dance competition, 0 otherwise.

  • Fit model and see whether Age or Treatment have effect on survival.

  • Want to do predictions for probabilities of still dancing as they depend on whatever is significant, and draw plot.

Read data

  • Column-aligned:
url <- "http://ritsokiguess.site/datafiles/dancing.txt"
dance <- read_table(url)

── Column specification ────────────────────────────────────────────────────────
cols(
  Months = col_double(),
  Quit = col_double(),
  Treatment = col_double(),
  Age = col_double()
)

The data

dance
# A tibble: 12 × 4
   Months  Quit Treatment   Age
    <dbl> <dbl>     <dbl> <dbl>
 1      1     1         0    16
 2      2     1         0    24
 3      2     1         0    18
 4      3     0         0    27
 5      4     1         0    25
 6      5     1         0    21
 7     11     1         0    55
 8      7     1         1    26
 9      8     1         1    36
10     10     1         1    38
11     10     0         1    45
12     12     1         1    47

Examine response and fit model

  • Response variable:
dance %>% mutate(mth = Surv(Months, Quit)) -> dance
dance
# A tibble: 12 × 5
   Months  Quit Treatment   Age    mth
    <dbl> <dbl>     <dbl> <dbl> <Surv>
 1      1     1         0    16     1 
 2      2     1         0    24     2 
 3      2     1         0    18     2 
 4      3     0         0    27     3+
 5      4     1         0    25     4 
 6      5     1         0    21     5 
 7     11     1         0    55    11 
 8      7     1         1    26     7 
 9      8     1         1    36     8 
10     10     1         1    38    10 
11     10     0         1    45    10+
12     12     1         1    47    12 
  • Then fit model, predicting mth from explanatories:
dance.1 <- coxph(mth ~ Treatment + Age, data = dance)

Output looks a lot like regression

summary(dance.1)
Call:
coxph(formula = mth ~ Treatment + Age, data = dance)

  n= 12, number of events= 10 

              coef exp(coef) se(coef)      z Pr(>|z|)  
Treatment -4.44915   0.01169  2.60929 -1.705   0.0882 .
Age       -0.36619   0.69337  0.15381 -2.381   0.0173 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          exp(coef) exp(-coef) lower .95 upper .95
Treatment   0.01169     85.554 7.026e-05    1.9444
Age         0.69337      1.442 5.129e-01    0.9373

Concordance= 0.964  (se = 0.039 )
Likelihood ratio test= 21.68  on 2 df,   p=2e-05
Wald test            = 5.67  on 2 df,   p=0.06
Score (logrank) test = 14.75  on 2 df,   p=6e-04

Conclusions

  • Use \(\alpha=0.10\) here since not much data.

  • Three tests at bottom like global F-test. Consensus that something predicts survival time (whether or not dancer quit and how long it took).

  • Age (definitely), Treatment (marginally) both predict survival time.

Behind the scenes

  • All depends on hazard rate, which is based on probability that event happens in the next short time period, given that event has not happened yet:

  • \(X\) denotes time to event, \(\delta\) is small time interval:

  • \(h(t) = P(X \le t + \delta | X \ge t) / \delta\)

  • if \(h(t)\) large, event likely to happen soon (lifetime short)

  • if \(h(t)\) small, event unlikely to happen soon (lifetime long).

Modelling lifetime

  • want to model hazard rate

  • but hazard rate always positive, so actually model log of hazard rate

  • modelling how (log-)hazard rate depends on other things eg \(X_1 =\) age, \(X_2 =\) treatment, with the \(\beta\) being regression coefficients:

  • Cox model \(h(t)=h_0(t)\exp(\beta_0+\beta_1X_1+\beta_2 X_2+\cdots)\), or:

  • \(\log(h(t)) = \log(h_0(t)) + \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots\)

  • like a generalized linear model with log link.

Model checking

  • With regression, usually plot residuals against fitted values.

  • Not quite same here (nonlinear model), but ``martingale residuals’’ should have no pattern vs. “linear predictor”.

  • ggcoxdiagnostics from package survminer makes plot, to which we add smooth. If smooth trend more or less straight across, model OK.

  • Martingale residuals can go very negative, so won’t always look normal.

Martingale residual plot for dance data

This looks good (with only 12 points):

ggcoxdiagnostics(dance.1) 

Predicted survival probs

  • The function we use is called survfit, though actually works rather like predict.
  • First create a data frame of values to predict from. We’ll do all combos of ages 20 and 40, treatment and not, using datagrid to get all the combos:
treatments <- c(0, 1)
ages <- c(20, 40)
dance.new <- datagrid(model = dance.1, Treatment = treatments, Age = ages)
Warning: Matrix columns are not supported as predictors and are therefore
  omitted. This may prevent computation of the quantities of interest. You
  can construct your own prediction dataset and supply it explicitly to
  the `newdata` argument.
dance.new
  Treatment Age rowid
1         0  20     1
2         0  40     2
3         1  20     3
4         1  40     4

The predictions

One prediction for each time for each combo of age and treatment in dance.new:


── Column specification ────────────────────────────────────────────────────────
cols(
  Months = col_double(),
  Quit = col_double(),
  Treatment = col_double(),
  Age = col_double()
)
s <- survfit(dance.1, newdata = dance.new, data = dance)
summary(s)
Call: survfit(formula = dance.1, newdata = dance.new, data = dance)

 time n.risk n.event survival1 survival2 survival3 survival4
    1     12       1  8.76e-01  1.00e+00  9.98e-01     1.000
    2     11       2  3.99e-01  9.99e-01  9.89e-01     1.000
    4      8       1  1.24e-01  9.99e-01  9.76e-01     1.000
    5      7       1  2.93e-02  9.98e-01  9.60e-01     1.000
    7      6       1 2.96e-323  6.13e-01  1.70e-04     0.994
    8      5       1  0.00e+00  2.99e-06  1.35e-98     0.862
   10      4       1  0.00e+00  0.00e+00  0.00e+00     0.000
   11      2       1  0.00e+00  0.00e+00  0.00e+00     0.000
   12      1       1  0.00e+00  0.00e+00  0.00e+00     0.000

Conclusions from predicted probs

  • Older women more likely to be still dancing than younger women (compare “profiles” for same treatment group).

  • Effect of treatment seems to be to increase prob of still dancing (compare “profiles” for same age for treatment group vs. not)

  • Would be nice to see this on a graph. This is ggsurvplot from package survminer:

g <- ggsurvplot(s, conf.int = F)

“Strata” (groups)

  • uses “strata” thus (dance.new):
  Treatment Age rowid
1         0  20     1
2         0  40     2
3         1  20     3
4         1  40     4

Plotting survival probabilities

g

Discussion

  • Survivor curve farther to the right is better (better chance of surviving longer).

  • Best is age 40 with treatment, worst age 20 without.

  • Appears to be:

    • age effect (40 better than 20)

    • treatment effect (treatment better than not)

    • In analysis, treatment effect only marginally significant.

A more realistic example: lung cancer

  • When you load in an R package, get data sets to illustrate functions in the package.

  • One such is lung. Data set measuring survival in patients with advanced lung cancer.

  • Along with survival time, number of “performance scores” included, measuring how well patients can perform daily activities.

  • Sometimes high good, but sometimes bad!

  • Variables below, from the data set help file (?lung).

The variables

Uh oh, missing values

lung
    inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
1      3  306      2  74   1       1       90       100     1175      NA
2      3  455      2  68   1       0       90        90     1225      15
3      3 1010      1  56   1       0       90        90       NA      15
4      5  210      2  57   1       1       90        60     1150      11
5      1  883      2  60   1       0      100        90       NA       0
6     12 1022      1  74   1       1       50        80      513       0
7      7  310      2  68   2       2       70        60      384      10
8     11  361      2  71   2       2       60        80      538       1
9      1  218      2  53   1       1       70        80      825      16
10     7  166      2  61   1       2       70        70      271      34
11     6  170      2  57   1       1       80        80     1025      27
12    16  654      2  68   2       2       70        70       NA      23
13    11  728      2  68   2       1       90        90       NA       5
14    21   71      2  60   1      NA       60        70     1225      32
15    12  567      2  57   1       1       80        70     2600      60
16     1  144      2  67   1       1       80        90       NA      15
17    22  613      2  70   1       1       90       100     1150      -5
18    16  707      2  63   1       2       50        70     1025      22
19     1   61      2  56   2       2       60        60      238      10
20    21   88      2  57   1       1       90        80     1175      NA
21    11  301      2  67   1       1       80        80     1025      17
22     6   81      2  49   2       0      100        70     1175      -8
23    11  624      2  50   1       1       70        80       NA      16
24    15  371      2  58   1       0       90       100      975      13
25    12  394      2  72   1       0       90        80       NA       0
26    12  520      2  70   2       1       90        80      825       6
27     4  574      2  60   1       0      100       100     1025     -13
28    13  118      2  70   1       3       60        70     1075      20
29    13  390      2  53   1       1       80        70      875      -7
30     1   12      2  74   1       2       70        50      305      20
31    12  473      2  69   2       1       90        90     1025      -1
32     1   26      2  73   1       2       60        70      388      20
33     7  533      2  48   1       2       60        80       NA     -11
34    16  107      2  60   2       2       50        60      925     -15
35    12   53      2  61   1       2       70       100     1075      10
36     1  122      2  62   2       2       50        50     1025      NA
37    22  814      2  65   1       2       70        60      513      28
38    15  965      1  66   2       1       70        90      875       4
39     1   93      2  74   1       2       50        40     1225      24
40     1  731      2  64   2       1       80       100     1175      15
41     5  460      2  70   1       1       80        60      975      10
42    11  153      2  73   2       2       60        70     1075      11
43    10  433      2  59   2       0       90        90      363      27
44    12  145      2  60   2       2       70        60       NA      NA
45     7  583      2  68   1       1       60        70     1025       7
46     7   95      2  76   2       2       60        60      625     -24
47     1  303      2  74   1       0       90        70      463      30
48     3  519      2  63   1       1       80        70     1025      10
49    13  643      2  74   1       0       90        90     1425       2
50    22  765      2  50   2       1       90       100     1175       4
51     3  735      2  72   2       1       90        90       NA       9
52    12  189      2  63   1       0       80        70       NA       0
53    21   53      2  68   1       0       90       100     1025       0
54     1  246      2  58   1       0      100        90     1175       7
55     6  689      2  59   1       1       90        80     1300      15
56     1   65      2  62   1       0       90        80      725      NA
57     5    5      2  65   2       0      100        80      338       5
58    22  132      2  57   1       2       70        60       NA      18
59     3  687      2  58   2       1       80        80     1225      10
60     1  345      2  64   2       1       90        80     1075      -3
61    22  444      2  75   2       2       70        70      438       8
62    12  223      2  48   1       1       90        80     1300      68
63    21  175      2  73   1       1       80       100     1025      NA
64    11   60      2  65   2       1       90        80     1025       0
65     3  163      2  69   1       1       80        60     1125       0
66     3   65      2  68   1       2       70        50      825       8
67    16  208      2  67   2       2       70        NA      538       2
68     5  821      1  64   2       0       90        70     1025       3
69    22  428      2  68   1       0      100        80     1039       0
70     6  230      2  67   1       1       80       100      488      23
71    13  840      1  63   1       0       90        90     1175      -1
72     3  305      2  48   2       1       80        90      538      29
73     5   11      2  74   1       2       70       100     1175       0
74     2  132      2  40   1       1       80        80       NA       3
75    21  226      2  53   2       1       90        80      825       3
76    12  426      2  71   2       1       90        90     1075      19
77     1  705      2  51   2       0      100        80     1300       0
78     6  363      2  56   2       1       80        70     1225      -2
79     3   11      2  81   1       0       90        NA      731      15
80     1  176      2  73   1       0       90        70      169      30
81     4  791      2  59   1       0      100        80      768       5
82    13   95      2  55   1       1       70        90     1500      15
83    11  196      1  42   1       1       80        80     1425       8
84    21  167      2  44   2       1       80        90      588      -1
85    16  806      1  44   1       1       80        80     1025       1
86     6  284      2  71   1       1       80        90     1100      14
87    22  641      2  62   2       1       80        80     1150       1
88    21  147      2  61   1       0      100        90     1175       4
89    13  740      1  44   2       1       90        80      588      39
90     1  163      2  72   1       2       70        70      910       2
91    11  655      2  63   1       0      100        90      975      -1
92    22  239      2  70   1       1       80       100       NA      23
93     5   88      2  66   1       1       90        80      875       8
94    10  245      2  57   2       1       80        60      280      14
95     1  588      1  69   2       0      100        90       NA      13
96    12   30      2  72   1       2       80        60      288       7
97     3  179      2  69   1       1       80        80       NA      25
98    12  310      2  71   1       1       90       100       NA       0
99    11  477      2  64   1       1       90       100      910       0
100    3  166      2  70   2       0       90        70       NA      10
101    1  559      1  58   2       0      100       100      710      15
102    6  450      2  69   2       1       80        90     1175       3
103   13  364      2  56   1       1       70        80       NA       4
104    6  107      2  63   1       1       90        70       NA       0
105   13  177      2  59   1       2       50        NA       NA      32
106   12  156      2  66   1       1       80        90      875      14
107   26  529      1  54   2       1       80       100      975      -3
108    1   11      2  67   1       1       90        90      925      NA
109   21  429      2  55   1       1      100        80      975       5
110    3  351      2  75   2       2       60        50      925      11
111   13   15      2  69   1       0       90        70      575      10
112    1  181      2  44   1       1       80        90     1175       5
113   10  283      2  80   1       1       80       100     1030       6
114    3  201      2  75   2       0       90       100       NA       1
115    6  524      2  54   2       1       80       100       NA      15
116    1   13      2  76   1       2       70        70      413      20
117    3  212      2  49   1       2       70        60      675      20
118    1  524      2  68   1       2       60        70     1300      30
119   16  288      2  66   1       2       70        60      613      24
120   15  363      2  80   1       1       80        90      346      11
121   22  442      2  75   1       0       90        90       NA       0
122   26  199      2  60   2       2       70        80      675      10
123    3  550      2  69   2       1       70        80      910       0
124   11   54      2  72   1       2       60        60      768      -3
125    1  558      2  70   1       0       90        90     1025      17
126   22  207      2  66   1       1       80        80      925      20
127    7   92      2  50   1       1       80        60     1075      13
128   12   60      2  64   1       1       80        90      993       0
129   16  551      1  77   2       2       80        60      750      28
130   12  543      1  48   2       0       90        60       NA       4
131    4  293      2  59   2       1       80        80      925      52
132   16  202      2  53   1       1       80        80       NA      20
133    6  353      2  47   1       0      100        90     1225       5
134   13  511      1  55   2       1       80        70       NA      49
135    1  267      2  67   1       0       90        70      313       6
136   22  511      1  74   2       2       60        40       96      37
137   12  371      2  58   2       1       80        70       NA       0
138   13  387      2  56   1       2       80        60     1075      NA
139    1  457      2  54   1       1       90        90      975      -5
140    5  337      2  56   1       0      100       100     1500      15
141   21  201      2  73   2       2       70        60     1225     -16
142    3  404      1  74   1       1       80        70      413      38
143   26  222      2  76   1       2       70        70     1500       8
144    1   62      2  65   2       1       80        90     1075       0
145   11  458      1  57   1       1       80       100      513      30
146   26  356      1  53   2       1       90        90       NA       2
147   16  353      2  71   1       0      100        80      775       2
148   16  163      2  54   1       1       90        80     1225      13
149   12   31      2  82   1       0      100        90      413      27
150   13  340      2  59   2       0      100        90       NA       0
151   13  229      2  70   1       1       70        60     1175      -2
152   22  444      1  60   1       0       90       100       NA       7
153    5  315      1  62   2       0       90        90       NA       0
154   16  182      2  53   2       1       80        60       NA       4
155   32  156      2  55   1       2       70        30     1025      10
156   NA  329      2  69   1       2       70        80      713      20
157   26  364      1  68   2       1       90        90       NA       7
158    4  291      2  62   1       2       70        60      475      27
159   12  179      2  63   1       1       80        70      538      -2
160    1  376      1  56   2       1       80        90      825      17
161   32  384      1  62   2       0       90        90      588       8
162   10  268      2  44   2       1       90       100     2450       2
163   11  292      1  69   1       2       60        70     2450      36
164    6  142      2  63   1       1       90        80      875       2
165    7  413      1  64   1       1       80        70      413      16
166   16  266      1  57   2       0       90        90     1075       3
167   11  194      2  60   2       1       80        60       NA      33
168   21  320      2  46   1       0      100       100      860       4
169    6  181      2  61   1       1       90        90      730       0
170   12  285      2  65   1       0      100        90     1025       0
171   13  301      1  61   1       1       90       100      825       2
172    2  348      2  58   2       0       90        80     1225      10
173    2  197      2  56   1       1       90        60      768      37
174   16  382      1  43   2       0      100        90      338       6
175    1  303      1  53   1       1       90        80     1225      12
176   13  296      1  59   2       1       80       100     1025       0
177    1  180      2  56   1       2       60        80     1225      -2
178   13  186      2  55   2       1       80        70       NA      NA
179    1  145      2  53   2       1       80        90      588      13
180    7  269      1  74   2       0      100       100      588       0
181   13  300      1  60   1       0      100       100      975       5
182    1  284      1  39   1       0      100        90     1225      -5
183   16  350      2  66   2       0       90       100     1025      NA
184   32  272      1  65   2       1       80        90       NA      -1
185   12  292      1  51   2       0       90        80     1225       0
186   12  332      1  45   2       0       90       100      975       5
187    2  285      2  72   2       2       70        90      463      20
188    3  259      1  58   1       0       90        80     1300       8
189   15  110      2  64   1       1       80        60     1025      12
190   22  286      2  53   1       0       90        90     1225       8
191   16  270      2  72   1       1       80        90      488      14
192   16   81      2  52   1       2       60        70     1075      NA
193   12  131      2  50   1       1       90        80      513      NA
194    1  225      1  64   1       1       90        80      825      33
195   22  269      2  71   1       1       90        90     1300      -2
196   12  225      1  70   1       0      100       100     1175       6
197   32  243      1  63   2       1       80        90      825       0
198   21  279      1  64   1       1       90        90       NA       4
199    1  276      1  52   2       0      100        80      975       0
200   32  135      2  60   1       1       90        70     1275       0
201   15   79      2  64   2       1       90        90      488      37
202   22   59      2  73   1       1       60        60     2200       5
203   32  240      1  63   2       0       90       100     1025       0
204    3  202      1  50   2       0      100       100      635       1
205   26  235      1  63   2       0      100        90      413       0
206   33  105      2  62   1       2       NA        70       NA      NA
207    5  224      1  55   2       0       80        90       NA      23
208   13  239      2  50   2       2       60        60     1025      -3
209   21  237      1  69   1       1       80        70       NA      NA
210   33  173      1  59   2       1       90        80       NA      10
211    1  252      1  60   2       0      100        90      488      -2
212    6  221      1  67   1       1       80        70      413      23
213   15  185      1  69   1       1       90        70     1075       0
214   11   92      1  64   2       2       70       100       NA      31
215   11   13      2  65   1       1       80        90       NA      10
216   11  222      1  65   1       1       90        70     1025      18
217   13  192      1  41   2       1       90        80       NA     -10
218   21  183      2  76   1       2       80        60      825       7
219   11  211      1  70   2       2       70        30      131       3
220    2  175      1  57   2       0       80        80      725      11
221   22  197      1  67   1       1       80        90     1500       2
222   11  203      1  71   2       1       80        90     1025       0
223    1  116      2  76   1       1       80        80       NA       0
224    1  188      1  77   1       1       80        60       NA       3
225   13  191      1  39   1       0       90        90     2350      -5
226   32  105      1  75   2       2       60        70     1025       5
227    6  174      1  66   1       1       90       100     1075       1
228   22  177      1  58   2       1       80        90     1060       0

A closer look

summary(lung)
      inst            time            status           age       
 Min.   : 1.00   Min.   :   5.0   Min.   :1.000   Min.   :39.00  
 1st Qu.: 3.00   1st Qu.: 166.8   1st Qu.:1.000   1st Qu.:56.00  
 Median :11.00   Median : 255.5   Median :2.000   Median :63.00  
 Mean   :11.09   Mean   : 305.2   Mean   :1.724   Mean   :62.45  
 3rd Qu.:16.00   3rd Qu.: 396.5   3rd Qu.:2.000   3rd Qu.:69.00  
 Max.   :33.00   Max.   :1022.0   Max.   :2.000   Max.   :82.00  
 NA's   :1                                                       
      sex           ph.ecog          ph.karno        pat.karno     
 Min.   :1.000   Min.   :0.0000   Min.   : 50.00   Min.   : 30.00  
 1st Qu.:1.000   1st Qu.:0.0000   1st Qu.: 75.00   1st Qu.: 70.00  
 Median :1.000   Median :1.0000   Median : 80.00   Median : 80.00  
 Mean   :1.395   Mean   :0.9515   Mean   : 81.94   Mean   : 79.96  
 3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.: 90.00   3rd Qu.: 90.00  
 Max.   :2.000   Max.   :3.0000   Max.   :100.00   Max.   :100.00  
                 NA's   :1        NA's   :1        NA's   :3       
    meal.cal         wt.loss       
 Min.   :  96.0   Min.   :-24.000  
 1st Qu.: 635.0   1st Qu.:  0.000  
 Median : 975.0   Median :  7.000  
 Mean   : 928.8   Mean   :  9.832  
 3rd Qu.:1150.0   3rd Qu.: 15.750  
 Max.   :2600.0   Max.   : 68.000  
 NA's   :47       NA's   :14       

Remove obs with any missing values

lung %>% drop_na() -> lung.complete
lung.complete %>%
  select(meal.cal:wt.loss) %>%
  slice(1:10)
   meal.cal wt.loss
2      1225      15
4      1150      11
6       513       0
7       384      10
8       538       1
9       825      16
10      271      34
11     1025      27
15     2600      60
17     1150      -5

Missing values seem to be gone.

Check!

summary(lung.complete)
      inst            time            status           age       
 Min.   : 1.00   Min.   :   5.0   Min.   :1.000   Min.   :39.00  
 1st Qu.: 3.00   1st Qu.: 174.5   1st Qu.:1.000   1st Qu.:57.00  
 Median :11.00   Median : 268.0   Median :2.000   Median :64.00  
 Mean   :10.71   Mean   : 309.9   Mean   :1.719   Mean   :62.57  
 3rd Qu.:15.00   3rd Qu.: 419.5   3rd Qu.:2.000   3rd Qu.:70.00  
 Max.   :32.00   Max.   :1022.0   Max.   :2.000   Max.   :82.00  
      sex           ph.ecog          ph.karno        pat.karno     
 Min.   :1.000   Min.   :0.0000   Min.   : 50.00   Min.   : 30.00  
 1st Qu.:1.000   1st Qu.:0.0000   1st Qu.: 70.00   1st Qu.: 70.00  
 Median :1.000   Median :1.0000   Median : 80.00   Median : 80.00  
 Mean   :1.383   Mean   :0.9581   Mean   : 82.04   Mean   : 79.58  
 3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.: 90.00   3rd Qu.: 90.00  
 Max.   :2.000   Max.   :3.0000   Max.   :100.00   Max.   :100.00  
    meal.cal         wt.loss       
 Min.   :  96.0   Min.   :-24.000  
 1st Qu.: 619.0   1st Qu.:  0.000  
 Median : 975.0   Median :  7.000  
 Mean   : 929.1   Mean   :  9.719  
 3rd Qu.:1162.5   3rd Qu.: 15.000  
 Max.   :2600.0   Max.   : 68.000  

No missing values left.

Model 1: use everything except inst

names(lung.complete)
 [1] "inst"      "time"      "status"    "age"       "sex"      
 [6] "ph.ecog"   "ph.karno"  "pat.karno" "meal.cal"  "wt.loss"  
  • Event was death, goes with status of 2:
lung.complete %>% 
   mutate(resp = Surv(time, status == 2)) ->
   lung.complete
lung.1 <- coxph(resp ~ . - inst - time - status,
  data = lung.complete
)

“Dot” means “all the other variables”.

summary of model 1

summary(lung.1)
Call:
coxph(formula = resp ~ . - inst - time - status, data = lung.complete)

  n= 167, number of events= 120 

                coef  exp(coef)   se(coef)      z Pr(>|z|)   
age        1.080e-02  1.011e+00  1.160e-02  0.931  0.35168   
sex       -5.536e-01  5.749e-01  2.016e-01 -2.746  0.00603 **
ph.ecog    7.395e-01  2.095e+00  2.250e-01  3.287  0.00101 **
ph.karno   2.244e-02  1.023e+00  1.123e-02  1.998  0.04575 * 
pat.karno -1.207e-02  9.880e-01  8.116e-03 -1.488  0.13685   
meal.cal   2.835e-05  1.000e+00  2.594e-04  0.109  0.91298   
wt.loss   -1.420e-02  9.859e-01  7.766e-03 -1.828  0.06748 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          exp(coef) exp(-coef) lower .95 upper .95
age          1.0109     0.9893    0.9881    1.0341
sex          0.5749     1.7395    0.3872    0.8534
ph.ecog      2.0950     0.4773    1.3479    3.2560
ph.karno     1.0227     0.9778    1.0004    1.0455
pat.karno    0.9880     1.0121    0.9724    1.0038
meal.cal     1.0000     1.0000    0.9995    1.0005
wt.loss      0.9859     1.0143    0.9710    1.0010

Concordance= 0.653  (se = 0.029 )
Likelihood ratio test= 28.16  on 7 df,   p=2e-04
Wald test            = 27.5  on 7 df,   p=3e-04
Score (logrank) test = 28.31  on 7 df,   p=2e-04

Overall significance

The three tests of overall significance:

glance(lung.1) %>% select(starts_with("p.value"))
# A tibble: 1 × 4
  p.value.log p.value.sc p.value.wald p.value.robust
        <dbl>      <dbl>        <dbl>          <dbl>
1    0.000205   0.000193     0.000271             NA

All strongly significant. Something predicts survival.

Coefficients for model 1

tidy(lung.1) %>% select(term, p.value) %>% arrange(p.value)
# A tibble: 7 × 2
  term      p.value
  <chr>       <dbl>
1 ph.ecog   0.00101
2 sex       0.00603
3 ph.karno  0.0457 
4 wt.loss   0.0675 
5 pat.karno 0.137  
6 age       0.352  
7 meal.cal  0.913  
  • sex and ph.ecog definitely significant here

  • age, pat.karno and meal.cal definitely not

  • Take out definitely non-sig variables, and try again.

Model 2

lung.2 <- update(lung.1, . ~ . - age - pat.karno - meal.cal)
tidy(lung.2) %>% select(term, p.value)
# A tibble: 4 × 2
  term      p.value
  <chr>       <dbl>
1 sex      0.00409 
2 ph.ecog  0.000112
3 ph.karno 0.101   
4 wt.loss  0.108   

Compare with first model:

anova(lung.2, lung.1)
Analysis of Deviance Table
 Cox model: response is  resp
 Model 1: ~ sex + ph.ecog + ph.karno + wt.loss
 Model 2: ~ (inst + time + status + age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss) - inst - time - status
   loglik Chisq Df Pr(>|Chi|)
1 -495.67                    
2 -494.03 3.269  3      0.352
  • No harm in taking out those variables.

Model 3

Take out ph.karno and wt.loss as well.

lung.3 <- update(lung.2, . ~ . - ph.karno - wt.loss)
tidy(lung.3) %>% select(term, estimate, p.value)
# A tibble: 2 × 3
  term    estimate  p.value
  <chr>      <dbl>    <dbl>
1 sex       -0.510 0.00958 
2 ph.ecog    0.483 0.000266
summary(lung.3)
Call:
coxph(formula = resp ~ sex + ph.ecog, data = lung.complete)

  n= 167, number of events= 120 

           coef exp(coef) se(coef)      z Pr(>|z|)    
sex     -0.5101    0.6004   0.1969 -2.591 0.009579 ** 
ph.ecog  0.4825    1.6201   0.1323  3.647 0.000266 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
sex        0.6004     1.6655    0.4082    0.8832
ph.ecog    1.6201     0.6172    1.2501    2.0998

Concordance= 0.641  (se = 0.031 )
Likelihood ratio test= 19.48  on 2 df,   p=6e-05
Wald test            = 19.35  on 2 df,   p=6e-05
Score (logrank) test = 19.62  on 2 df,   p=5e-05

Check whether that was OK

anova(lung.3, lung.2)
Analysis of Deviance Table
 Cox model: response is  resp
 Model 1: ~ sex + ph.ecog
 Model 2: ~ sex + ph.ecog + ph.karno + wt.loss
   loglik  Chisq Df Pr(>|Chi|)  
1 -498.38                       
2 -495.67 5.4135  2    0.06675 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Just OK.

Commentary

  • OK (just) to take out those two covariates.

  • Both remaining variables strongly significant.

  • Nature of effect on survival time? Consider later.

  • Picture?

Plotting survival probabilities

  • Create new data frame of values to predict for, then predict:
sexes <- c(1, 2)
ph.ecogs <- 0:3
lung.new <- datagrid(sex = sexes, ph.ecog = ph.ecogs, model = lung.3)
Warning: Matrix columns are not supported as predictors and are
  therefore omitted. This may prevent computation of the
  quantities of interest. You can construct your own prediction
  dataset and supply it explicitly to the `newdata` argument.
lung.new
  sex ph.ecog rowid
1   1       0     1
2   1       1     2
3   1       2     3
4   1       3     4
5   2       0     5
6   2       1     6
7   2       2     7
8   2       3     8

Making the plot

s <- survfit(lung.3, newdata = lung.new, data = lung)
summary(s)
Call: survfit(formula = lung.3, newdata = lung.new, data = lung)

 time n.risk n.event survival1 survival2 survival3 survival4
    5    167       1     0.996    0.9932   0.98908  0.982370
   11    166       1     0.992    0.9865   0.97825  0.965007
   12    165       1     0.987    0.9798   0.96743  0.947763
   13    164       1     0.983    0.9730   0.95660  0.930639
   15    163       1     0.979    0.9662   0.94577  0.913634
   26    162       1     0.975    0.9594   0.93502  0.896868
   30    161       1     0.970    0.9526   0.92427  0.880221
   31    160       1     0.966    0.9457   0.91352  0.863694
   53    159       2     0.957    0.9320   0.89222  0.831293
   54    157       1     0.953    0.9252   0.88162  0.815359
   59    156       1     0.949    0.9183   0.87103  0.799544
   60    155       2     0.940    0.9046   0.85000  0.768509
   61    153       1     0.936    0.8977   0.83959  0.753310
   62    152       1     0.931    0.8908   0.82922  0.738301
   65    151       1     0.927    0.8840   0.81894  0.723524
   79    150       1     0.922    0.8771   0.80865  0.708862
   81    149       1     0.918    0.8703   0.79845  0.694431
   88    148       1     0.913    0.8635   0.78835  0.680252
   92    147       1     0.909    0.8567   0.77830  0.666254
   93    146       1     0.904    0.8499   0.76829  0.652435
   95    145       2     0.895    0.8362   0.74834  0.625202
  107    142       1     0.891    0.8293   0.73836  0.611749
  110    141       1     0.886    0.8224   0.72843  0.598481
  118    140       1     0.882    0.8155   0.71856  0.585392
  135    139       1     0.877    0.8085   0.70860  0.572303
  142    138       1     0.872    0.8015   0.69869  0.559396
  145    137       1     0.868    0.7945   0.68884  0.546672
  147    136       1     0.863    0.7875   0.67907  0.534172
  153    135       1     0.858    0.7806   0.66940  0.521891
  156    134       2     0.849    0.7667   0.65018  0.497831
  163    132       3     0.834    0.7457   0.62158  0.462841
  166    129       1     0.829    0.7386   0.61212  0.451483
  167    128       1     0.825    0.7316   0.60266  0.440233
  170    127       1     0.820    0.7245   0.59329  0.429201
  176    124       1     0.815    0.7174   0.58386  0.418196
  179    122       1     0.810    0.7102   0.57446  0.407344
  180    121       1     0.805    0.7031   0.56512  0.396669
  181    120       2     0.794    0.6887   0.54650  0.375711
  183    118       1     0.789    0.6815   0.53728  0.365495
  197    114       1     0.784    0.6741   0.52779  0.355101
  199    112       1     0.778    0.6665   0.51827  0.344779
  201    111       1     0.773    0.6590   0.50882  0.334648
  207    108       1     0.768    0.6514   0.49933  0.324595
  210    107       1     0.762    0.6438   0.48991  0.314732
  212    105       1     0.756    0.6361   0.48045  0.304945
  218    104       1     0.751    0.6283   0.47099  0.295276
  222    102       1     0.745    0.6205   0.46148  0.285683
  223    100       1     0.739    0.6124   0.45186  0.276090
  226     97       1     0.733    0.6042   0.44210  0.266498
  229     96       1     0.727    0.5961   0.43248  0.257165
  230     95       1     0.720    0.5879   0.42294  0.248038
  239     93       1     0.714    0.5797   0.41343  0.239068
  245     90       1     0.708    0.5714   0.40388  0.230184
  246     89       1     0.702    0.5632   0.39447  0.221557
  267     85       1     0.695    0.5548   0.38501  0.213013
  268     84       1     0.689    0.5465   0.37569  0.204723
  269     83       1     0.682    0.5382   0.36652  0.196684
  270     81       1     0.676    0.5299   0.35738  0.188799
  283     79       1     0.669    0.5215   0.34827  0.181068
  284     78       1     0.662    0.5131   0.33926  0.173537
  285     76       2     0.649    0.4962   0.32136  0.158944
  286     74       1     0.642    0.4878   0.31259  0.151979
  288     73       1     0.635    0.4795   0.30397  0.145249
  291     72       1     0.628    0.4711   0.29536  0.138637
  293     69       1     0.621    0.4622   0.28642  0.131905
  301     66       1     0.614    0.4532   0.27745  0.125276
  303     64       1     0.606    0.4441   0.26841  0.118730
  305     62       1     0.598    0.4348   0.25937  0.112320
  310     61       1     0.590    0.4256   0.25052  0.106180
  320     60       1     0.582    0.4163   0.24180  0.100256
  337     58       1     0.574    0.4071   0.23320  0.094542
  345     57       1     0.566    0.3980   0.22479  0.089081
  348     56       1     0.558    0.3890   0.21657  0.083866
  351     55       1     0.550    0.3801   0.20859  0.078916
  353     54       2     0.534    0.3623   0.19306  0.069614
  361     52       1     0.526    0.3536   0.18556  0.065290
  363     51       2     0.510    0.3362   0.17097  0.057176
  371     49       1     0.502    0.3275   0.16390  0.053395
  390     45       1     0.494    0.3186   0.15677  0.049681
  426     42       1     0.485    0.3092   0.14934  0.045926
  428     41       1     0.476    0.2999   0.14215  0.042394
  429     40       1     0.467    0.2908   0.13517  0.039074
  433     39       1     0.457    0.2816   0.12833  0.035920
  444     38       1     0.448    0.2726   0.12175  0.032987
  450     37       1     0.439    0.2636   0.11532  0.030210
  455     36       1     0.430    0.2548   0.10910  0.027616
  457     35       1     0.421    0.2460   0.10310  0.025196
  460     33       1     0.411    0.2369   0.09701  0.022831
  473     32       1     0.401    0.2279   0.09107  0.020608
  477     31       1     0.392    0.2190   0.08536  0.018556
  519     29       1     0.381    0.2097   0.07957  0.016559
  520     28       1     0.371    0.2004   0.07393  0.014701
  524     27       1     0.360    0.1912   0.06855  0.013008
  550     25       1     0.349    0.1815   0.06302  0.011348
  558     23       1     0.337    0.1716   0.05749  0.009781
  567     21       1     0.325    0.1616   0.05217  0.008356
  574     20       1     0.312    0.1516   0.04704  0.007067
  583     19       1     0.300    0.1418   0.04224  0.005936
  613     18       1     0.287    0.1321   0.03764  0.004925
  641     17       1     0.273    0.1223   0.03325  0.004028
  643     16       1     0.260    0.1129   0.02919  0.003262
  655     15       1     0.247    0.1038   0.02546  0.002613
  687     14       1     0.234    0.0949   0.02203  0.002068
  689     13       1     0.221    0.0864   0.01892  0.001615
  705     12       1     0.207    0.0778   0.01598  0.001229
  707     11       1     0.193    0.0699   0.01341  0.000926
  731     10       1     0.178    0.0613   0.01085  0.000656
  765      8       1     0.162    0.0525   0.00843  0.000436
  791      7       1     0.146    0.0442   0.00638  0.000278
  814      5       1     0.126    0.0348   0.00434  0.000149
 survival5 survival6 survival7 survival8
     0.997     0.996    0.9934   0.98938
     0.995     0.992    0.9869   0.97884
     0.992     0.988    0.9803   0.96830
     0.990     0.984    0.9737   0.95776
     0.987     0.980    0.9671   0.94721
     0.985     0.975    0.9605   0.93673
     0.982     0.971    0.9538   0.92626
     0.980     0.967    0.9471   0.91577
     0.974     0.959    0.9338   0.89499
     0.972     0.954    0.9271   0.88465
     0.969     0.950    0.9204   0.87431
     0.964     0.942    0.9070   0.85377
     0.961     0.937    0.9003   0.84359
     0.958     0.933    0.8936   0.83346
     0.955     0.929    0.8870   0.82340
     0.953     0.924    0.8803   0.81334
     0.950     0.920    0.8736   0.80336
     0.947     0.916    0.8669   0.79347
     0.944     0.911    0.8603   0.78362
     0.941     0.907    0.8536   0.77382
     0.936     0.898    0.8402   0.75426
     0.933     0.894    0.8335   0.74448
     0.930     0.889    0.8267   0.73474
     0.927     0.885    0.8200   0.72505
     0.924     0.880    0.8132   0.71527
     0.921     0.876    0.8063   0.70554
     0.918     0.871    0.7995   0.69586
     0.915     0.866    0.7926   0.68626
     0.912     0.862    0.7858   0.67674
     0.906     0.853    0.7722   0.65784
     0.897     0.838    0.7516   0.62967
     0.894     0.834    0.7447   0.62035
     0.891     0.829    0.7378   0.61102
     0.887     0.824    0.7309   0.60178
     0.884     0.819    0.7239   0.59247
     0.881     0.814    0.7169   0.58319
     0.878     0.809    0.7099   0.57396
     0.871     0.799    0.6957   0.55556
     0.868     0.794    0.6887   0.54644
     0.864     0.789    0.6813   0.53705
     0.860     0.784    0.6739   0.52762
     0.857     0.778    0.6665   0.51826
     0.853     0.773    0.6590   0.50885
     0.849     0.768    0.6515   0.49951
     0.846     0.762    0.6439   0.49013
     0.842     0.757    0.6363   0.48074
     0.838     0.751    0.6286   0.47130
     0.834     0.745    0.6207   0.46173
     0.830     0.739    0.6126   0.45203
     0.826     0.733    0.6045   0.44246
     0.821     0.727    0.5965   0.43296
     0.817     0.721    0.5884   0.42349
     0.813     0.715    0.5802   0.41397
     0.808     0.708    0.5720   0.40458
     0.804     0.702    0.5638   0.39514
     0.799     0.696    0.5555   0.38583
     0.795     0.689    0.5474   0.37666
     0.790     0.683    0.5391   0.36752
     0.786     0.676    0.5308   0.35841
     0.781     0.670    0.5225   0.34938
     0.771     0.657    0.5058   0.33143
     0.766     0.650    0.4975   0.32264
     0.762     0.643    0.4892   0.31398
     0.757     0.636    0.4808   0.30532
     0.751     0.629    0.4720   0.29633
     0.746     0.622    0.4631   0.28729
     0.740     0.614    0.4540   0.27818
     0.734     0.606    0.4447   0.26907
     0.729     0.599    0.4356   0.26014
     0.723     0.591    0.4264   0.25132
     0.717     0.583    0.4172   0.24262
     0.711     0.575    0.4081   0.23411
     0.705     0.567    0.3991   0.22578
     0.699     0.559    0.3902   0.21768
     0.686     0.544    0.3725   0.20189
     0.680     0.536    0.3637   0.19426
     0.668     0.520    0.3463   0.17939
     0.661     0.512    0.3376   0.17217
     0.655     0.503    0.3287   0.16487
     0.647     0.494    0.3193   0.15727
     0.640     0.485    0.3099   0.14989
     0.633     0.476    0.3007   0.14273
     0.625     0.467    0.2915   0.13570
     0.618     0.458    0.2824   0.12893
     0.610     0.449    0.2734   0.12230
     0.602     0.440    0.2644   0.11588
     0.595     0.431    0.2556   0.10967
     0.586     0.421    0.2464   0.10337
     0.578     0.411    0.2372   0.09720
     0.570     0.402    0.2282   0.09127
     0.560     0.391    0.2188   0.08524
     0.551     0.381    0.2093   0.07936
     0.542     0.370    0.2000   0.07374
     0.531     0.359    0.1902   0.06794
     0.520     0.347    0.1800   0.06214
     0.509     0.335    0.1698   0.05653
     0.497     0.322    0.1596   0.05112
     0.485     0.310    0.1496   0.04604
     0.472     0.297    0.1396   0.04116
     0.459     0.283    0.1295   0.03647
     0.446     0.270    0.1198   0.03214
     0.432     0.257    0.1104   0.02813
     0.418     0.243    0.1012   0.02444
     0.403     0.230    0.0923   0.02107
     0.388     0.216    0.0834   0.01789
     0.373     0.202    0.0751   0.01508
     0.355     0.187    0.0661   0.01227
     0.335     0.170    0.0568   0.00960
     0.315     0.154    0.0481   0.00732
     0.288     0.133    0.0382   0.00504
g <- ggsurvplot(s, conf.int = F)

The plot

g

Discussion of survival curves

  • Best survival is teal-blue curve, stratum 5, females with ph.ecog score 0.

  • Next best: blue, stratum 6, females with score 1, and red, stratum 1, males score 0.

  • Worst: green, stratum 4, males score 3.

  • For any given ph.ecog score, females have better predicted survival than males.

  • For both genders, a lower score associated with better survival.

The coefficients in model 3

tidy(lung.3) %>% select(term, estimate, p.value)
# A tibble: 2 × 3
  term    estimate  p.value
  <chr>      <dbl>    <dbl>
1 sex       -0.510 0.00958 
2 ph.ecog    0.483 0.000266
summary(lung.3)
Call:
coxph(formula = resp ~ sex + ph.ecog, data = lung.complete)

  n= 167, number of events= 120 

           coef exp(coef) se(coef)      z Pr(>|z|)    
sex     -0.5101    0.6004   0.1969 -2.591 0.009579 ** 
ph.ecog  0.4825    1.6201   0.1323  3.647 0.000266 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
sex        0.6004     1.6655    0.4082    0.8832
ph.ecog    1.6201     0.6172    1.2501    2.0998

Concordance= 0.641  (se = 0.031 )
Likelihood ratio test= 19.48  on 2 df,   p=6e-05
Wald test            = 19.35  on 2 df,   p=6e-05
Score (logrank) test = 19.62  on 2 df,   p=5e-05
  • sex coeff negative, so being higher sex value (female) goes with less hazard of dying.

  • ph.ecog coeff positive, so higher ph.ecog score goes with more hazard of dying

  • Two coeffs about same size, so being male rather than female corresponds to 1-point increase in ph.ecog score. Note how survival curves come in 3 pairs plus 2 odd.

Martingale residuals for this model

No problems here:

ggcoxdiagnostics(lung.3) + geom_smooth(se = F)
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

When the Cox model fails

  • Invent some data where survival is best at middling age, and worse at high and low age:
age <- seq(20, 60, 5)
survtime <- c(10, 12, 11, 21, 15, 20, 8, 9, 11)
stat <- c(1, 1, 1, 1, 0, 1, 1, 1, 1)
d <- tibble(age, survtime, stat)
d %>% mutate(y = Surv(survtime, stat)) -> d
  • Small survival time 15 in middle was actually censored, so would have been longer if observed.

Fit Cox model

y.1 <- coxph(y ~ age, data = d)
summary(y.1)
Call:
coxph(formula = y ~ age, data = d)

  n= 9, number of events= 8 

       coef exp(coef) se(coef)     z Pr(>|z|)
age 0.01984   1.02003  0.03446 0.576    0.565

    exp(coef) exp(-coef) lower .95 upper .95
age      1.02     0.9804    0.9534     1.091

Concordance= 0.545  (se = 0.105 )
Likelihood ratio test= 0.33  on 1 df,   p=0.6
Wald test            = 0.33  on 1 df,   p=0.6
Score (logrank) test = 0.33  on 1 df,   p=0.6

Martingale residuals

Down-and-up indicates incorrect relationship between age and survival:

ggcoxdiagnostics(y.1) + geom_smooth(se = F)

Attempt 2

Add squared term in age:

y.2 <- coxph(y ~ age + I(age^2), data = d)
tidy(y.2) %>% select(term, estimate, p.value)
# A tibble: 2 × 3
  term     estimate p.value
  <chr>       <dbl>   <dbl>
1 age      -0.380    0.116 
2 I(age^2)  0.00483  0.0977
  • (Marginally) helpful.

Martingale residuals this time

Not great, but less problematic than before:

ggcoxdiagnostics(y.2) + geom_smooth(se = F)