Survival Methods for left truncated data (Survival Series 3)
Introduction
In this post I relate survival methods to kernel smoothing methods.
In Post 1 of this series, I showed how the probability density can be calculated for exact failure times:
\begin{equation*} \hat{f}(t) = \frac{\text{number of persons failing in interval } t}{(\text{number of persons at } t_0) \times (\text{interval width})}. \end{equation*}
In the kernel smoothing post I showed that
\begin{equation*} \hat{f}(x) = \frac{1}{n} \times \frac{\text{number of } X_i \text{ in same bin as } x}{\text{width of bin containing } x}. \end{equation*}
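These two equations are equivalent when the bins are the follow-up intervals and $n$ is the number of persons at $t_0$: both divide the number of failures in an interval by $n$ times the interval width. To demonstrate this, consider the data used in Post 1. Using the first equation, we have: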
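# n: persons at the start of each interval of width 5; hiv: failures in that interval
n <- c(40, 35, 28, 22, 18, 13, 9, 5, 5, 3, 2)
hiv <- c(5, 7, 6, 4, 5, 4, 4, 0, 2, 1, 2)
ldat <- data.frame(n, hiv)
# failures / (persons at t_0 * interval width)
ft <- ldat$hiv / (ldat$n[1] * 5)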
ft
[1] 0.025 0.035 0.030 0.020 0.025 0.020 0.020 0.000 0.010 0.005 0.010
For the second equation, we use the hist function:
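# individual-level failure times: one entry per person, built from the counts in hiv
time1 <- rep(seq(5, 55, 5), times = hiv)
p1 <- hist(time1, breaks = seq(0, 55, 5), plot = FALSE)
p1$density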
[1] 0.025 0.035 0.030 0.020 0.025 0.020 0.020 0.000 0.010 0.005 0.010
which gives the same result.
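The agreement can also be checked programmatically rather than by eye. The sketch below rebuilds both estimates from the counts above and compares them with `all.equal`. The names `n0`, `width`, `f_survival`, and `f_hist` are illustrative and do not come from the original posts.

```r
# Failures per interval of width 5 (same counts as above)
hiv <- c(5, 7, 6, 4, 5, 4, 4, 0, 2, 1, 2)
n0 <- 40      # number of persons at t_0 (n[1] above)
width <- 5    # interval width

# First equation: failures / (persons at t_0 * interval width)
f_survival <- hiv / (n0 * width)

# Second equation: histogram density on individual-level failure times
time1 <- rep(seq(5, 55, by = 5), times = hiv)
f_hist <- hist(time1, breaks = seq(0, 55, by = 5), plot = FALSE)$density

# TRUE when the two estimates agree up to floating-point tolerance
all.equal(f_survival, f_hist)
```

The match relies on `hist` using right-closed bins by default, so a failure time of 5 is counted in the interval (0, 5] rather than (5, 10].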
We can also estimate the density with a Gaussian kernel and bandwidth $=2.3$. To do this, we reshape the data from aggregate to individual-level data, with one failure time per person.
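# one failure time per person
time1 <- c(rep(5, 5), rep(10, 7), rep(15, 6),
           rep(20, 4), rep(25, 5), rep(30, 4), rep(35, 4),
           rep(45, 2), rep(50, 1), rep(55, 2))
# Gaussian kernel estimate evaluated at 12 equally spaced points on [0, 55]
ft2 <- density(time1, kernel = "gaussian",
               bw = 2.3, from = 0, to = 55, cut = 5, n = 12)
# histogram on the density scale with the kernel estimate overlaid
hist(time1, breaks = seq(0, 55, 5), freq = FALSE, col = "white",
     main = "probability density")
lines(ft2$x, ft2$y, col = "red")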
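To connect the smoothed curve back to a formula, recall that a Gaussian kernel estimate with bandwidth $h$ is the average of normal densities centred at each observation, $\hat{f}(x) = \frac{1}{n} \sum_{i} \frac{1}{h} \phi\left(\frac{x - X_i}{h}\right)$. The sketch below evaluates this directly. `kde_manual` is a hypothetical helper written for this illustration, and because `density()` computes its estimate with a binned approximation, the two columns should agree closely but not necessarily exactly.

```r
# Individual-level failure times, rebuilt from the aggregate counts
time1 <- rep(seq(5, 55, by = 5), times = c(5, 7, 6, 4, 5, 4, 4, 0, 2, 1, 2))
h <- 2.3  # bandwidth, i.e. the standard deviation of the Gaussian kernel

# Hypothetical helper: average of normal densities centred at each observation
kde_manual <- function(x, data = time1, bw = h) {
  vapply(x, function(xi) mean(dnorm(xi, mean = data, sd = bw)), numeric(1))
}

# Compare with density() at the same 12 evaluation points
grid <- seq(0, 55, by = 5)
ft2 <- density(time1, kernel = "gaussian", bw = h, from = 0, to = 55, n = 12)
round(cbind(x = grid, manual = kde_manual(grid), density = ft2$y), 4)

# A density should integrate to 1; check with a Riemann sum on a fine grid
fine <- seq(-20, 80, by = 0.05)
total_mass <- sum(kde_manual(fine)) * 0.05
total_mass
```

Writing the estimate out this way makes the parallel with the histogram formula explicit: both average a contribution from each observation, and only the shape of that contribution (a box over the bin versus a normal curve) differs.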