Survival Methods for left truncated data (Survival Series 3)

Introduction

In this post I relate survival methods to kernel smoothing methods.

In Post 1 of this series, I showed how the probability density can be calculated for exact failure times.

\begin{equation*} \hat{f}(t) = \frac{\text{number of persons failing in interval } t}{(\text{number of persons at } t_0) \times (\text{interval width})}. \end{equation*}

In the kernel smoothing post I showed that

\begin{equation*} \hat{f}(x) = \frac{1}{n} \times \frac{\text{number of } X_i \text{ in same bin as } x}{\text{width of bin containing } x}. \end{equation*}

These two equations are equivalent when the bins coincide with the follow-up intervals and $n$ is the number of persons at $t_0$. To demonstrate this, consider the data used in Post 1. Using the first equation, we have:

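# Aggregate data from Post 1: persons at the start of each interval (n)
# and failures within each interval (hiv); each interval has width 5
n <- c(40, 35, 28, 22, 18, 13, 9, 5, 5, 3, 2)
hiv <- c(5, 7, 6, 4, 5, 4, 4, 0, 2, 1, 2)
ldat <- data.frame(n = n, hiv = hiv)
# Density: failures / (persons at t0 * interval width)
ft <- ldat$hiv / (ldat$n[1] * 5)
ft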
 [1] 0.025 0.035 0.030 0.020 0.025 0.020 0.020 0.000 0.010 0.005 0.010

For the second equation, we use the hist function:

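# One failure time per person: each interval's upper bound repeated hiv times
time1 <- rep(seq(5, 55, 5), times = hiv)
p1 <- hist(time1, breaks = seq(0, 55, 5), plot = FALSE)
p1$density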
 [1] 0.025 0.035 0.030 0.020 0.025 0.020 0.020 0.000 0.010 0.005 0.010

which gives the same result.
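
The agreement follows directly from the two formulas. Write $d_j$ for the number of failures in interval $j$, $n$ for the number of persons at $t_0$, and $w$ for the common interval width (notation introduced here only for illustration). Because all 40 persons in this data set are observed to fail, the sample size in the second equation equals the number of persons at $t_0$, and for any $x$ in interval $j$:

\begin{equation*} \hat{f}(x) = \frac{1}{n} \times \frac{d_j}{w} = \frac{d_j}{n \times w}. \end{equation*}

For the first interval, $d_1 = 5$, $n = 40$ and $w = 5$, so $\hat{f} = 5 / (40 \times 5) = 0.025$, which is the first entry of both outputs.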

We can also estimate the density with a Gaussian kernel and bandwidth $= 2.3$. To do this, we reshape the data from aggregate to individual-level data.

time1  <- c(rep(5, 5), rep(10, 7), rep(15, 6),
  rep(20, 4), rep(25, 5), rep(30, 4), rep(35, 4),
  rep(45, 2), rep(50, 1), rep(55, 2))

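# Gaussian kernel density, evaluated at 12 points (0, 5, ..., 55)
ft2 <- density(time1, kernel = "gaussian",
  bw = 2.3, from = 0, to = 55, cut = 5, n = 12)
# Overlay the kernel estimate (red) on the histogram density
hist(time1, breaks = seq(0, 55, 5), freq = FALSE, col = "white",
  main = "probability density")
lines(ft2$x, ft2$y, col = "red")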
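
For reference, the Gaussian kernel estimate can also be evaluated directly from the standard kernel density formula $\hat{f}(x) = \frac{1}{nh} \sum_{i} K\left(\frac{x - X_i}{h}\right)$, with $K$ the standard normal density and $h = 2.3$. The block below is a minimal sketch, not the algorithm `density` uses internally. `kde_gauss` is a hypothetical helper name introduced here, and `time1` is rebuilt from the interval counts so that the block runs on its own.

```r
# Individual-level failure times, rebuilt from the aggregate counts in the post
hiv   <- c(5, 7, 6, 4, 5, 4, 4, 0, 2, 1, 2)
time1 <- rep(seq(5, 55, 5), times = hiv)

# Hypothetical helper (not part of the original post):
# direct evaluation of (1 / (n * h)) * sum(K((x - X_i) / h)) with Gaussian K
kde_gauss <- function(x, data, h) {
  sapply(x, function(x0) mean(dnorm((x0 - data) / h)) / h)
}

# Evaluate on the same 12 points used in the post: 0, 5, ..., 55
grid    <- seq(0, 55, length.out = 12)
manual  <- kde_gauss(grid, time1, h = 2.3)
builtin <- density(time1, kernel = "gaussian", bw = 2.3,
                   from = 0, to = 55, n = 12)$y

# Side-by-side comparison of the two estimates
round(cbind(grid, manual, builtin), 4)
```

The two columns should agree closely but not exactly, because `density` computes the estimate with a binned approximation on a fine grid and then interpolates to the requested points.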
Alain Vandormael
Senior Data Scientist, PhD, MSc