Kernel smoothed hazard rates (Survival Series 4)

Introduction
In an earlier post, I introduced a kernel smoothing method for univariate data. The aim of this post is to describe how to use survival and kernel smoothing methods to estimate the hazard rate function. I do so using an example of 5 exact HIV-infection times.
Kernel smoothing
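As a brief review, let
$$\hat{f}(x) = \frac{1}{nb}\sum_{i=1}^{n} K\left(\frac{x - X_i}{b}\right),$$
where $X_1, \dots, X_n$ are the observations, $K$ is a kernel function and $b > 0$ is the bandwidth. One common choice for $K$ is the Gaussian kernel, which is the standard normal density:
$$K(u) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u^2}{2}\right).$$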
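The kernel hazard estimators for survival data described below build on this estimator by weighting the kernel of each observation.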
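The code in this post calls a helper `kern(xi, x, b)` that is not defined here. A minimal sketch, assuming a Gaussian kernel and that `kern` returns the kernel centred at `xi` already divided by `b` (a later code comment says as much), could look like the following; the original helper may use a different kernel. The function `kde` is a hypothetical name for the estimator $\hat{f}$ above.

```r
# Hypothetical helper (assumed Gaussian kernel): the kernel centred at xi,
# evaluated at the points x and already divided by the bandwidth b
kern <- function(xi, x, b) {
  dnorm((x - xi) / b) / b
}

# Kernel density estimate f(x) = (1 / (n b)) * sum_i K((x - X_i) / b),
# i.e. the average of the scaled kernels over all observations
kde <- function(x, Xi, b) {
  rowMeans(outer(x, Xi, function(x, xi) kern(xi, x, b)))
}

Xi <- c(1.5, 2.0, 2.2, 3.8, 4.5)
x <- seq(0, 6, 0.1)
fhat <- kde(x, Xi, b = 0.6)
sum(fhat) * 0.1  # Riemann sum over [0, 6]; close to 1
```

Because each scaled kernel integrates to one, their average does too, apart from the small amount of mass that falls outside the plotting range.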

Smoothing the hazard rates using survival weights
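Wand and Jones proposed smoothing the hazard rate with survival weights. This is:
$$\hat{h}(x) = \frac{1}{nb}\sum_{i=1}^{n} \frac{K\left(\frac{x - X_i}{b}\right)}{1 - \tilde{F}_n(X_i)},$$
where
$$\tilde{F}_n(x) = \frac{1}{n+1}\sum_{j=1}^{n} I(X_j \le x)$$
is a slightly modified empirical distribution function that avoids division by zero at the largest observation. The idea is that the survival weight at $X_i$, $1 - \tilde{F}_n(X_i)$, divides the kernel centred at $X_i$, so observations made when fewer subjects remain at risk contribute more to the hazard rate.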
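The formula can also be evaluated for all grid points at once. The sketch below assumes a Gaussian kernel (the post's own `kern` is not shown) and uses `outer()` instead of a loop; `hazard_sw` is a hypothetical name.

```r
# Sketch of the survival-weighted kernel hazard estimator
#   h(x) = (1 / (n b)) * sum_i K((x - X_i) / b) / (1 - F_n(X_i)),
# with F_n dividing by n + 1 and a Gaussian K assumed
hazard_sw <- function(x, Xi, b) {
  n <- length(Xi)
  # modified EDF at each observation: counts divided by n + 1
  Fn <- sapply(Xi, function(t) sum(Xi <= t)) / (n + 1)
  w <- 1 / (n * (1 - Fn))                # multiplier for each kernel
  K <- dnorm(outer(x, Xi, "-") / b) / b  # length(x) x n scaled kernels
  as.vector(K %*% w)                     # weighted sum over observations
}

Xi <- c(1.5, 2.0, 2.2, 3.8, 4.5)
x <- seq(0, 6, 0.1)
h_sw <- hazard_sw(x, Xi, b = 0.6)
```

If `kern` is Gaussian, this gives the same curve as the step-by-step loop that follows; the matrix product simply replaces accumulating one row per observation.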
In the R code below, I show the step-by-step calculations. First, I define a function (edf) to calculate the empirical distribution function.
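# 5 exact HIV-infection times
Xi <- c(1.5, 2.0, 2.2, 3.8, 4.5)
# empirical distribution function at x; adjust = TRUE divides by n + 1
edf <- function(x, Xi, adjust = TRUE) {
  n <- length(Xi)
  if (adjust) n <- n + 1
  sum(Xi <= x) / n
}
# survival function s(t) = 1 - edf(t), evaluated at 0 and at each Xi
surv <- 1 - sapply(c(0, Xi), edf, Xi = Xi)
surv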
[1] 1.0000000 0.8333333 0.6666667 0.5000000 0.3333333 0.1666667
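To see why the $n + 1$ adjustment matters, the sketch below compares the survival weights computed with the ordinary EDF (dividing by $n$) against the adjusted version used by edf(adjust = TRUE). It uses base R only.

```r
Xi <- c(1.5, 2.0, 2.2, 3.8, 4.5)
n <- length(Xi)
# number of observations at or below each X_i
counts <- sapply(Xi, function(t) sum(Xi <= t))
w_plain <- 1 - counts / n        # ordinary EDF: last weight is exactly 0
w_adj   <- 1 - counts / (n + 1)  # adjusted EDF, as with edf(adjust = TRUE)
w_plain
w_adj
```

The adjusted weights are the last five entries of `surv` above. Dividing the kernel at the largest observation by `w_plain[5]` would give `Inf`, which is what the adjustment avoids.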
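Then, for each $X_i$, I compute the survival weight as one minus the adjusted EDF at $X_i$, divide the kernel centred at $X_i$ by $n$ times this weight, and sum the resulting curves to obtain the hazard rate.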
# plot
# assumes kern(xi, x, b) returns the kernel centred at xi, already divided by b
set1 <- RColorBrewer::brewer.pal(7, "Set1")
x <- seq(0, 6, 0.1)  # grid on which the hazard is evaluated
out <- matrix(NA, nrow = length(Xi), ncol = length(x))
plot(c(0, Xi), surv, type = "s", ylab = "Rate", xlab = "Time (x)",
     bty = "n", ylim = c(0, 1), xlim = c(0, 6), lty = 2)
# collect the weighted kernel at each Xi
for (j in seq_along(Xi)) {
  pw <- 1 - edf(Xi[j], Xi)  # survival weight at Xi[j]
  haz <- kern(Xi[j], x, b = 0.6) / (pw * length(Xi))
  out[j, ] <- haz
  lines(x, haz, lty = 2, lwd = 2, col = set1[j])
}
points(Xi, rep(0, length(Xi)), pch = 18, cex = 1.8,
       col = set1[seq_along(Xi)])
# overall hazard rate: a plain sum, because each row is already divided by length(Xi)
hazard <- colSums(out)
lines(x, hazard)

The figure shows the hazard rate estimate (solid line), which is a function of the survivor function (dashed step line). An important point is that the weighted kernels of the ordered observations grow taller as the probability of survival decreases over time. This is because the survival estimate at each $X_i$ sits in the denominator, so a smaller survival weight produces a larger contribution to the hazard rate.
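The growth of the individual curves can be read directly from the multiplier $1/\{n(1 - F_n(X_i))\}$ applied to each kernel, where $F_n$ is the adjusted EDF computed by edf(adjust = TRUE). For the five infection times it rises monotonically:

```r
Xi <- c(1.5, 2.0, 2.2, 3.8, 4.5)
n <- length(Xi)
# adjusted survival weight 1 - F_n(X_i), with F_n dividing by n + 1
surv_w <- 1 - sapply(Xi, function(t) sum(Xi <= t)) / (n + 1)
# multiplier applied to the kernel centred at each X_i
mult <- 1 / (n * surv_w)
round(mult, 2)  # 0.24 0.30 0.40 0.60 1.20
```

The kernel at the last infection time is scaled by a factor five times larger than the kernel at the first, which is why the dashed curves grow from left to right.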
Smoothing the hazard rate using the jump sizes of the cumulative hazard function
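An alternative approach derives kernel-smoothed estimates from the cumulative hazard function, $H(t) = \int_0^t h(u)\,du$, which can be estimated by the Nelson-Aalen estimator
$$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{n_i},$$
where $d_i$ is the number of events and $n_i$ is the number at risk at event time $t_i$.
Theoretically, the relationship above tells us that the hazard rate, $h(t)$, is the derivative of $H(t)$. For the step function $\hat{H}(t)$, the jump sizes $\Delta\hat{H}(t_i) = d_i/n_i$ therefore act as raw estimates of the hazard at each event time.
We can then smooth using:
$$\hat{h}(x) = \frac{1}{b}\sum_{i} K\left(\frac{x - t_i}{b}\right)\Delta\hat{H}(t_i),$$
where $K$ and $b$ are the kernel and bandwidth as before.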
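A base-R sketch of this estimator, assuming a Gaussian kernel and computing the jump sizes (events $d_i$ divided by the number at risk $n_i$ at each distinct time $t_i$) directly from the data. All times are treated as uncensored, as in this example, and `hazard_jump` is a hypothetical name.

```r
hazard_jump <- function(x, Xi, b) {
  times <- sort(unique(Xi))                          # distinct event times t_i
  d <- sapply(times, function(t) sum(Xi == t))       # events d_i at each t_i
  n_risk <- sapply(times, function(t) sum(Xi >= t))  # number at risk n_i
  dH <- d / n_risk                                   # Nelson-Aalen jump sizes
  K <- dnorm(outer(x, times, "-") / b) / b           # scaled Gaussian kernels
  list(dH = dH, hazard = as.vector(K %*% dH))        # smoothed hazard on x
}

Xi <- c(1.5, 2.0, 2.2, 3.8, 4.5)
res <- hazard_jump(seq(0, 6, 0.1), Xi, b = 0.5)
res$dH
```

Here `res$dH` reproduces `ht` from the survival package output below; the package becomes necessary once censoring enters the data.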
Using the same 5 HIV-infection times as before, I first calculate the hazard rates by dividing the number of infections by the number at risk at each infection time, using the survival package.
Xi <- c(1.5, 2.0, 2.2, 3.8, 4.5)
set_surv <- survival::Surv(Xi)
dat <- survival::survfit(set_surv ~ 1)
sdat <- summary(dat)
sdat
Call: survfit(formula = set_surv ~ 1)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
1.5 5 1 0.8 0.179 0.5161 1
2.0 4 1 0.6 0.219 0.2933 1
2.2 3 1 0.4 0.219 0.1367 1
3.8 2 1 0.2 0.179 0.0346 1
4.5 1 1 0.0 NaN NA NA
ht <- sdat$n.event / sdat$n.risk
ht
[1] 0.2000000 0.2500000 0.3333333 0.5000000 1.0000000
# alternative: jump sizes of the Nelson-Aalen cumulative hazard
ht2 <- diff(c(0, sdat$cumhaz))
ht2
[1] 0.2000000 0.2500000 0.3333333 0.5000000 1.0000000
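For distinct, uncensored times, the two weightings are closely related. The $i$-th ordered observation has $n - i + 1$ subjects at risk, so its jump size is $1/(n - i + 1)$, while the adjusted survival weight gives the multiplier $1/\{n(1 - i/(n+1))\} = (n+1)/\{n(n - i + 1)\}$. The survival-weighted multipliers are therefore the jump sizes scaled by $(n+1)/n$, here $6/5$. A quick numerical check:

```r
Xi <- sort(c(1.5, 2.0, 2.2, 3.8, 4.5))
n <- length(Xi)
i <- seq_len(n)
jump <- 1 / (n - i + 1)            # Nelson-Aalen jumps d_i / n_i
sw <- 1 / (n * (1 - i / (n + 1)))  # survival-weight multipliers
sw / jump                          # constant ratio (n + 1) / n
```

With the same kernel and bandwidth, the two estimators differ only by this constant factor in this example; with censored or tied times the relationship no longer holds exactly.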
I now plot the data and show the individual kernel estimates for each infection time, each scaled by its hazard rate in ht; their sum is the smoothed hazard rate.
x <- seq(0, 6, 0.1)
out <- matrix(NA, nrow = length(Xi), ncol = length(x))
# step plot of the unsmoothed hazard rates
plot(c(0, Xi, 6), c(0, ht, 1), type = "s", ylab = "Rate", xlab = "Time (x)",
     bty = "n", ylim = c(0, 1.0), xlim = c(0, 6), lty = 2)
# collect the kernel at each Xi, scaled by its hazard jump
for (j in seq_along(Xi)) {
  haz <- kern(Xi[j], x, b = 0.5) * ht[j]
  out[j, ] <- haz
  lines(x, haz, lty = 2, lwd = 2, col = set1[j])
}
hazard <- colSums(out)  # note that kern already divides by b
lines(x, hazard)
points(Xi, rep(0, length(Xi)), pch = 18, cex = 1.8,
       col = set1[seq_along(Xi)])
legend("topleft", c("Hazard rate", "Hazard rate smoothed"),
       lty = c(2, 1), col = "black", bty = "n")
