Survival Methods for left truncated data (Survival Series 3)

Last updated on Nov 11, 2022

Introduction

In this post I relate survival methods to kernel smoothing methods.

In Post 1 of this series, I showed how the probability density can be calculated for exact failure times.

$\hat{f}(t) = \frac{\text{number of persons failing in interval } t}{(\text{number of persons at } t_{0}) \times (\text{interval width})}.$

In the kernel smoothing post I showed that

$\hat{f}(x) = \frac{1}{n} \times \frac{\text{number of } X_{i} \text{ in same bin as } x}{\text{width of bin containing } x}.$
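
One way to see how the two estimators line up, assuming $n$ is the number of persons at $t_{0}$ and each bin coincides with one of the survival intervals, is to fold the $1/n$ factor into the denominator:

$\hat{f}(x) = \frac{1}{n} \times \frac{\text{count in bin}}{\text{bin width}} = \frac{\text{count in bin}}{n \times (\text{bin width})}$

Under those assumptions, the count in the bin is the number of persons failing in that interval, which reproduces the first expression.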

These two equations are equivalent. To demonstrate this, consider the data used in Post 1. Using the first equation, we have:

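# Aggregate data from Post 1: persons at the start of each interval (n)
# and persons failing in each interval (hiv); interval width is 5
ldat <- data.frame(
  n   = c(40, 35, 28, 22, 18, 13, 9, 5, 5, 3, 2),
  hiv = c(5, 7, 6, 4, 5, 4, 4, 0, 2, 1, 2))
# failures / (persons at t0 * interval width)
ft <- ldat$hiv / (ldat$n[1] * 5)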
ft
 [1] 0.025 0.035 0.030 0.020 0.025 0.020 0.020 0.000 0.010 0.005 0.010

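For the second equation, we apply the hist function to the individual-level failure times, time1 (constructed further below by expanding the aggregate counts):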

p1 <- hist(time1, breaks = seq(0, 55, 5), plot = FALSE)
p1$density
 [1] 0.025 0.035 0.030 0.020 0.025 0.020 0.020 0.000 0.010 0.005 0.010

which gives the same result.
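
The comparison above can also be checked programmatically. The following is a minimal, self-contained sketch rather than the code from Post 1: the names `n0`, `width`, `upper`, `ft_surv` and `ft_hist` are introduced here for illustration, and it assumes each failure is recorded at the upper bound of its interval (which is how the individual-level times in this post are coded).

```r
# Aggregate data from Post 1: failures per interval of width 5
hiv   <- c(5, 7, 6, 4, 5, 4, 4, 0, 2, 1, 2)
n0    <- 40                      # persons at t0
width <- 5                       # interval width
upper <- seq(5, 55, by = width)  # assumption: failure time = interval upper bound

# Equation 1: failures / (persons at t0 * interval width)
ft_surv <- hiv / (n0 * width)

# Equation 2: expand to one value per person, then bin with hist()
times   <- rep(upper, times = hiv)
ft_hist <- hist(times, breaks = seq(0, 55, by = width), plot = FALSE)$density

# TRUE if the two estimates agree up to floating-point tolerance
same <- isTRUE(all.equal(ft_surv, ft_hist))
print(same)
```

Note that `hist` uses right-closed bins by default, so a failure time of 5 falls in the interval (0, 5]; with left-closed bins the counts would shift by one interval.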

We can also estimate the density using a Gaussian kernel with bandwidth $= 2.3$. To do this, we reshape the data from aggregate to individual-level data.

time1  <- c(rep(5, 5), rep(10, 7), rep(15, 6),
  rep(20, 4), rep(25, 5), rep(30, 4), rep(35, 4),
  rep(45, 2), rep(50, 1), rep(55, 2))

ft2 <-  density(time1, kernel = "gaussian", 
  bw = 2.3, from = 0, to = 55, cut = 5, n = 12)

hist(time1, breaks = seq(0, 55, 5), freq = FALSE, col = "white",
  main = "probability density")
lines(ft2$x, ft2$y, col = "red")
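
To make explicit what the kernel estimate computes, here is a rough sketch that evaluates the Gaussian kernel density by hand, $\hat{f}(x) = \frac{1}{nh} \sum_{i} K\left(\frac{x - X_{i}}{h}\right)$ with $K$ the standard normal density, and compares it with `density`. The helper `kde_gauss` is hypothetical (it is not part of base R), and since `density` uses a binned approximation internally, the two results should agree only approximately.

```r
# Individual-level failure times, same values as time1 above
time1 <- rep(seq(5, 55, by = 5), times = c(5, 7, 6, 4, 5, 4, 4, 0, 2, 1, 2))

# Hypothetical helper: Gaussian kernel density estimate at each point in x
# obs = observed values, h = bandwidth (sd of the Gaussian kernel)
kde_gauss <- function(x, obs, h) {
  sapply(x, function(x0) mean(dnorm((x0 - obs) / h)) / h)
}

# Same 12 evaluation points that density() uses with from = 0, to = 55, n = 12
grid      <- seq(0, 55, length.out = 12)
ft_manual <- kde_gauss(grid, time1, h = 2.3)

ft_dens <- density(time1, kernel = "gaussian", bw = 2.3,
                   from = 0, to = 55, n = 12)

# Largest absolute discrepancy between the manual and built-in estimates
max_diff <- max(abs(ft_manual - ft_dens$y))
print(max_diff)
```

For the Gaussian kernel, the `bw` argument of `density` is the standard deviation of the kernel, which is why `h` is passed directly to the scaling in `dnorm`.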



Alain Vandormael

Senior Data Scientist, PhD, MSc