Survival methods for right censored data (Survival Series 2)

Introduction
In Post 1 of this series, I introduced the survivor and hazard functions for survival data. One motivation for survival analysis is the topic of censoring. Censoring is defined as when the event occurs but the participant is no longer under observation. There are generally three reasons for right-censored data: 1) the participant did not experience the event by the end of the study period, 2) the participant prematurely withdrew from the study for some known reason, 3) the participant was lost to follow-up for unknown reasons before the study ended. In the case of censoring, some information is provided about the individual’s survival time, but we do not know the survival time exactly. The study of survival data requires the careful treatment of these censored events.
Let
Let
Kaplan-Meier survivor function
For right-censored data, we can the use the Kaplan-Meier (KM) estimator to compute
where
Imagine we are interested in the time to HIV infection. The expression for
To demonstrate how R
.
# Data from Klein & Kleinbaum Chap 2, pg 51
# censored times group 1 only
time1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23)
# uncensored times
time0 <- c(6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
time <- c(time1, time0)
cens <- c(rep(1, length(time1)), rep(0, length(time0)))
Next, I use the survival
package to set up the data and obtain the relevant estimates. For now, I show the data.frame of the number at risk, the number of events, and the number censored at time
set_surv <- survival::Surv(time, cens)
res <- survival::survfit(set_surv ~ 1)
sdat <- with(res, data.frame(time, n.risk, n.event, n.censor))
head(sdat)
time n.risk n.event n.censor
1 6 21 3 1
2 7 17 1 0
3 9 16 0 1
4 10 15 1 1
5 11 13 0 1
6 13 12 1 0
The first 3 events occurs at time R
to calculate the inner function,
# calculate surv probabilities st
st <- (sdat$n.risk - sdat$n.event) / sdat$n.risk
st
[1] 0.8571429 0.9411765 1.0000000 0.9333333 1.0000000 0.9166667 0.9090909
[8] 1.0000000 1.0000000 1.0000000 0.8571429 0.8333333 1.0000000 1.0000000
[15] 1.0000000 1.0000000
# multiply probabilities at t
St <- cumprod(st)
St
[1] 0.8571429 0.8067227 0.8067227 0.7529412 0.7529412 0.6901961 0.6274510
[8] 0.6274510 0.6274510 0.6274510 0.5378151 0.4481793 0.4481793 0.4481793
[15] 0.4481793 0.4481793
Fortunately, the surv
library calculates the survival probabilities for you, which can
be obtained using the summary
function.
sres <- summary(res)
sres
Call: survfit(formula = set_surv ~ 1)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
6 21 3 0.857 0.0764 0.720 1.000
7 17 1 0.807 0.0869 0.653 0.996
10 15 1 0.753 0.0963 0.586 0.968
13 12 1 0.690 0.1068 0.510 0.935
16 11 1 0.627 0.1141 0.439 0.896
22 7 1 0.538 0.1282 0.337 0.858
23 6 1 0.448 0.1346 0.249 0.807
Recall that
This expression says that the probability of being infected at time
# get the previous interval estimate of survival
Sji <- c(1, St[-length(St)])
ft <- (sdat$n.event / sdat$n.risk) * Sji
ft
[1] 0.14285714 0.05042017 0.00000000 0.05378151 0.00000000 0.06274510
[7] 0.06274510 0.00000000 0.00000000 0.00000000 0.08963585 0.08963585
[13] 0.00000000 0.00000000 0.00000000 0.00000000
We can double check this result with a second method for computing
Ft <- 1 - c(1, St)
ft2 <- diff(Ft)
ft2
[1] 0.14285714 0.05042017 0.00000000 0.05378151 0.00000000 0.06274510
[7] 0.06274510 0.00000000 0.00000000 0.00000000 0.08963585 0.08963585
[13] 0.00000000 0.00000000 0.00000000 0.00000000
We can also calculate the cumulative hazard function for right-censored data using the Nelson-Aalen estimator, given as:
where survival
results.
res$cumhaz
[1] 0.1428571 0.2016807 0.2016807 0.2683473 0.2683473 0.3516807 0.4425898
[8] 0.4425898 0.4425898 0.4425898 0.5854469 0.7521136 0.7521136 0.7521136
[15] 0.7521136 0.7521136
Random censoring
Random censoring, also known as Type III censoring, occurs when a study period is
fixed, but participants enter the study at different times. The definition of right censoring under random censoring is the same. Consider the data used from the previous
example where
# everyone starts at t0
t0 <- rep(0, length(cens))
# except these participants
t0[c(7, 8)] <- 9
t0[c(16, 19)] <- 13
t0[18] <- 15
Kaplan-Meier
In computing the survival function, R
must now account for these delayed entries. We can easily do this using the Surv
function by specifying the time
(the time the participant started) and time2
(the time the participant was censored) arguments.
sobj <- survival::Surv(time = t0, time2 = time, event = cens)
res1 <- survival::survfit(sobj ~ 1)
sdat1 <- with(res1, data.frame(time.t = time, n.risk,
n.enter = as.integer(n.enter), n.censor, n.event,
surv))
sdat1
time.t n.risk n.enter n.censor n.event surv
1 6 16 16 1 3 0.8125000
2 7 12 0 0 1 0.7447917
3 9 11 0 1 0 0.7447917
4 10 12 2 1 1 0.6827257
5 11 10 0 1 0 0.6827257
6 13 9 0 0 1 0.6068673
7 16 11 3 0 1 0.5516975
8 17 10 0 1 0 0.5516975
9 19 9 0 1 0 0.5516975
10 20 8 0 1 0 0.5516975
11 22 7 0 0 1 0.4728836
12 23 6 0 0 1 0.3940697
13 25 5 0 1 0 0.3940697
14 32 4 0 2 0 0.3940697
15 34 2 0 1 0 0.3940697
16 35 1 0 1 0 0.3940697
We now have a new n.enter
column in the output. At
# calculate inner function probabilities
st1 <- (sdat1$n.risk - sdat1$n.event) / sdat1$n.risk
# multiply probabilities to get KM estimates
St1 <- cumprod(st1)
St1
[1] 0.8125000 0.7447917 0.7447917 0.6827257 0.6827257 0.6068673 0.5516975
[8] 0.5516975 0.5516975 0.5516975 0.4728836 0.3940697 0.3940697 0.3940697
[15] 0.3940697 0.3940697
Probability density function
We can also confirm that two methods for obtaining the probability
density function
# first method
Sji <- c(1, St1[-length(St1)])
ft3 <- (sdat1$n.event / sdat1$n.risk) * Sji
ft3
[1] 0.18750000 0.06770833 0.00000000 0.06206597 0.00000000 0.07585841
[7] 0.05516975 0.00000000 0.00000000 0.00000000 0.07881393 0.07881393
[13] 0.00000000 0.00000000 0.00000000 0.00000000
# second method
Ft1 <- 1 - c(1, St1)
ft4 <- diff(Ft1)
ft4
[1] 0.18750000 0.06770833 0.00000000 0.06206597 0.00000000 0.07585841
[7] 0.05516975 0.00000000 0.00000000 0.00000000 0.07881393 0.07881393
[13] 0.00000000 0.00000000 0.00000000 0.00000000