Kernel smoothed hazard rates (Survival Series 4)
Introduction
In an earlier post, I introduced a kernel smoothing method for univariate data. The aim of this post is to describe how to use survival and kernel smoothing methods to estimate the hazard rate function. I do so using an example of 5 exact HIV-infection times.
Kernel smoothing
As a brief review, let $X_1,\dots,X_n$ be a set of continuous random variables having a common probability density function $f$. The kernel density estimator is:
\begin{equation} \hat{f}(x;b) = (nb)^{-1} \sum_{i = 1}^n K\bigg([x - X_i]/b \bigg) \end{equation}
where $K$ is called the kernel and satisfies $\int K(x), dx =1$ and $b$ is the bandwidth (for $b >0$). This equation can be written more compactly by rescaling $K_b(u) = b^{-1} K(u/b)$ such that:
\begin{equation} \hat{f}(x;b) = n^{-1} \sum_{i = 1}^n K_b(x - X_i). \end{equation}
One common choice for $K_b$ is the Gaussian density, $N(0, \sigma^2)$,
\begin{equation} \phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(\frac{-x^2}{2 \sigma^2}\right), \end{equation}
which is $N(0, 1)$ for the standard normal density. Consider obtaining the kernel smoothed estimates with (bandwidth $b = 0.6$) for 5 HIV-infection times:
$$ X = {1.5, 2.0, 2.2, 3.8, 4.5}. $$
The kernel hazard functions for $X_i$ (the dashed color lines) and the overall density estimate (the solid line) are shown in the Figure below.
Smoothing the hazard rates using survival weights
Previously, Wand and Jones proposed a smoothing approach to the hazard rate with survival weights. This is:
\begin{equation} \hat{h}(x; b) = n^{-1} \sum_{i=1}^n \frac{K_b (x - X_i)}{1 - \tilde{F}(X_i)}, \end{equation}
where
\begin{equation} \tilde{F}(x) = \frac{ \sum_{i=1}^n 1_{{X_i \leq x }}}{n-1} \end{equation}
is a slightly modifed empirical distribution function that avoids division by zero. The idea is that a survival weight at $X_i$ (the denominator) is applied to each participant’s kernel density (the numerator) so that the weight increases through the ordered $X_i$’s as the probability of survival decreases. The kernel weights at each $X$ are therefore inversely proportional to the height of $1 - \tilde{F}(X_i)$.
In the R
code below I show the step by step calculations. First, I define a
function (edf
) to calculate the empirical distribution.
# func to calculate edf
edf <- function(x, Xi, adjust = TRUE) {
n <- length(Xi)
if (adjust) n <- n + 1
sXi <- sort(Xi)
sum(sXi <= x) / n
}
# calculate surv func s(t)
surv <- 1 - sapply(c(0, Xi), edf, Xi = Xi)
surv
[1] 1.0000000 0.8333333 0.6666667 0.5000000 0.3333333 0.1666667
Then, for each $X_i$, I multiply the kernel estimate of all $x$ by the inverse of the survival weight at $X_i$, which gives the hazard rate estimates, $\hat{h}(x)$.
# plot
set1 <- RColorBrewer::brewer.pal(7, "Set1")
out <- matrix(NA, nrow = length(Xi), ncol = length(x))
plot(c(0, Xi), surv, type = "s", ylab = "Rate", xlab = "Time (x)",
bty = "n", ylim = c(0, 1), xlim = c(0, 6), lty = 2)
# collect density at xi
for (i in Xi) {
pw <- 1 - edf(i, Xi) # get weight at Xi
haz <- kern(i, x, b = 0.6) / (pw * length(Xi))
out[which(Xi == i), ] <- haz
lines(x, haz, lty = 2, lwd =2, col = set1[which(Xi == i)])
}
points(Xi, rep(0, length(Xi)), pch = 18, cex = 1.8,
col = set1[seq(length(Xi))])
# get overall hazard rate, sum because already divide by length(Xi)
hazard <- apply(out, 2, sum)
lines(x, hazard)
The figure shows the hazard rate estimates (solid line), which is a function of the survivor function (dashed step line). An important point to make is that the kernel densities of the ordered observations increase as the probability of survival decreases over time. This is because the survival estimate at each $X_i$ is being applied to the kernel density estimate. For example, the adjusted survival estimate at $\hat{S}(1.5) = 0.83$, and hence the weight is $1/0.83 = 1.2$, which is used to multiply all the kernel density estimates at $x$ for observation $X_1$. At $x = 4.5$, the survival estimate is $\hat{S}(4.5) = 0.167$ and so the survival weight for $X_5$ is $1/0.167 \approx 6$. This explains why the maximum of the kernel density estimate for $X_5$ is higher than the maximum estimate of $X_1$.
Smoothing the hazard rate using the jump sizes of the cumulative hazard function
An alternative approach is kernel smoothed estimates derived from the cumulative hazard function ($\hat{H}[t]$). In Part 1 and Part 2 of this series, I introduced the cumulative hazard function and showed how to estimate it from the data:
\begin{equation} H(t) = \int_{0}^{t} h(u)du. \end{equation}
Theoretically, the above equation tells us that the hazard rate, $h(t)$, is the derivative of $H(t)$. However, since the cumulative hazard estimate is a step function, it is not differentiable. One solution is to estimate the hazard rate by obtaining the step or jump sizes of the Nelson-Aalen cumulative hazard rate. Thus, for each observed HIV infection time, the estimated hazard contribution is:
\begin{equation} \Delta \hat{H}(X_i) = \hat{H}(X_i) - \hat{H}(X_{i-1}). \end{equation}
We can then smooth using:
\begin{equation} \hat{h}(x) = b^{-1} \sum_{i=1}^n K_b \bigg( \frac{x-X_i}{b} \bigg) \Delta \hat{H}(X_i), \end{equation}
where $n$ represents the number of infection times and $b$ is the bandwidth.
Using the same 5 HIV-infection times as before, I first calculate the hazard rates by
dividing the number of infections by the number at risk at time $X_i$. We can also
extract the hazard rates from the survival
package.
Xi <- c(1.5, 2.0, 2.2, 3.8, 4.5)
set_surv <- survival::Surv(Xi)
dat <- survival::survfit(set_surv ~ 1)
sdat <- summary(dat)
sdat
Call: survfit(formula = set_surv ~ 1)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
1.5 5 1 0.8 0.179 0.5161 1
2.0 4 1 0.6 0.219 0.2933 1
2.2 3 1 0.4 0.219 0.1367 1
3.8 2 1 0.2 0.179 0.0346 1
4.5 1 1 0.0 NaN NA NA
ht <- sdat$n.event / sdat$n.risk
ht
[1] 0.2000000 0.2500000 0.3333333 0.5000000 1.0000000
# alternative
ht2 <- diff(c(0, sdat$cumhaz))
ht2
[1] 0.2000000 0.2500000 0.3333333 0.5000000 1.0000000
I now plot the data and show the individual density estimates for each $X_i$, which is multiplied by the hazard rate at $X_i$. The overall smoothed hazard rate function is shown in the solid line and the observed hazard rates and the non-smoothed hazard rate is shown in the dashed line.
x <- seq(0, 6, 0.1)
out <- matrix(NA, nrow = length(Xi), ncol = length(x))
# collect density at xi
plot(c(0, Xi, 6), c(0, ht, 1), type = "s", ylab = "Rate", xlab = "Time (x)",
bty = "n", ylim = c(0, 1.0), xlim = c(0, 6), lty = 2)
for (i in Xi) {
haz <- kern(i, x, b = 0.5) * ht[which(Xi == i)]
out[which(Xi == i), ] <- haz
lines(x, haz,
lty = 2, lwd =2, col = set1[which(Xi == i)])
}
hazard <- apply(out, 2, sum) # note that kern already divided by b
lines(x, hazard)
points(Xi, rep(0, length(Xi)), pch = 18, cex = 1.8,
col = set1[seq(length(Xi))])
legend("topleft", c("Hazard rate", "Hazard rate smoothed"),
lty = c(2, 1), col = "black", bty = "n")