Survival Methods for left truncated data (Survival Series 3)

Introduction

In this post I relate survival methods to kernel smoothing methods.

In Post 1 of this series, I showed how the probability density can be calculated for exact failure times.

$$\hat{f}(t) = \frac{\text{number of persons failing in interval } t}{(\text{number of persons at } t_0) \times (\text{interval width})}.$$

In the kernel smoothing post I showed that

$$\hat{f}(x) = \frac{1}{n} \times \frac{\text{number of } X_i \text{ in same bin as } x}{\text{width of bin containing } x}.$$

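These two equations are equivalent when every person eventually fails and the bins coincide with the intervals: the sample size n is then the number of persons at t0, and the bin width is the interval width. To demonstrate this, consider the data used in Post 1, in which all 40 persons fail. Using the first equation, we have: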

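# n: number at risk at the start of each interval; hiv: number failing in it
n <- c(40, 35, 28, 22, 18, 13, 9, 5, 5, 3, 2)
hiv <- c(5, 7, 6, 4, 5, 4, 4, 0, 2, 1, 2)
# Failures per interval divided by (number at t0) x (interval width of 5)
ft <- hiv / (n[1] * 5)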
ft
 [1] 0.025 0.035 0.030 0.020 0.025 0.020 0.020 0.000 0.010 0.005 0.010

For the second equation, we use the hist function on the individual-level failure times time1 (constructed further below):

p1 <- hist(time1, breaks = seq(0, 55, 5), plot = FALSE)
p1$density
 [1] 0.025 0.035 0.030 0.020 0.025 0.020 0.020 0.000 0.010 0.005 0.010

which gives the same result.
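
The comparison can also be made programmatically rather than by eye. The following is a minimal sketch, assuming the aggregate counts shown above. The `width` and `same` names are introduced here for illustration, and building `time1` with `rep(seq(...), times = hiv)` is a shorthand that assumes each failure is recorded at the upper bound of its interval.

```r
# Aggregate data from Post 1 (assumed as shown above)
n     <- c(40, 35, 28, 22, 18, 13, 9, 5, 5, 3, 2)
hiv   <- c(5, 7, 6, 4, 5, 4, 4, 0, 2, 1, 2)
width <- 5  # interval width (illustrative name)

# First equation: failures / (number at t0 x interval width)
ft <- hiv / (n[1] * width)

# Individual-level failure times: one entry per failure,
# placed at the upper bound of its interval
time1 <- rep(seq(5, 55, by = width), times = hiv)

# Second equation: histogram density with bins matching the intervals
p1 <- hist(time1, breaks = seq(0, 55, by = width), plot = FALSE)

# TRUE if the two estimates agree up to floating-point tolerance
same <- isTRUE(all.equal(ft, p1$density))
same
```

Note that `hist` uses right-closed bins by default, so a failure time of 5 falls in the bin (0, 5], which is what lines the bins up with the intervals here.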

We can also estimate the density using a Gaussian kernel with bandwidth 2.3. To do this, we reshape the data from aggregate to individual-level data.

time1  <- c(rep(5, 5), rep(10, 7), rep(15, 6),
  rep(20, 4), rep(25, 5), rep(30, 4), rep(35, 4),
  rep(45, 2), rep(50, 1), rep(55, 2))

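# Gaussian kernel density estimate, evaluated at 12 points from 0 to 55
# (cut has no effect here because from and to are both supplied)
ft2 <- density(time1, kernel = "gaussian",
  bw = 2.3, from = 0, to = 55, cut = 5, n = 12)
# Plot the histogram density and overlay the kernel estimate in red
hist(time1, breaks = seq(0, 55, 5), freq = FALSE, col = "white",
  main = "probability density")
lines(ft2$x, ft2$y, col = "red")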
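
To see what the kernel estimate is doing, it may help to compute it by hand. The sketch below evaluates the standard Gaussian kernel estimator, $\hat{f}(x) = \frac{1}{nh}\sum_i K\left(\frac{x - X_i}{h}\right)$ with $K$ the standard normal density, on the same 12 grid points. The `kde_at` helper and the `grid`, `manual`, and `builtin` names are hypothetical and are not part of the original analysis.

```r
# Individual-level failure times (same data as above, built compactly)
time1 <- rep(seq(5, 55, by = 5), times = c(5, 7, 6, 4, 5, 4, 4, 0, 2, 1, 2))
h <- 2.3  # bandwidth

# Hypothetical helper: kernel density estimate at a single point x,
# the average of Gaussian kernels centred at each observation, scaled by 1/h
kde_at <- function(x, data, h) mean(dnorm((x - data) / h)) / h

# Evaluate on the 12 grid points 0, 5, ..., 55
grid   <- seq(0, 55, by = 5)
manual <- sapply(grid, kde_at, data = time1, h = h)

# Built-in estimate on the same grid, for comparison
builtin <- density(time1, kernel = "gaussian", bw = h,
  from = 0, to = 55, n = 12)

# Largest absolute difference between the two estimates
max(abs(manual - builtin$y))
```

The two should agree closely but not exactly, because `density` computes the estimate with a binned approximation and then interpolates onto the requested grid.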
Alain Vandormael
Senior Data Scientist, PhD, MSc