Simulating event times (Simulation series 1)

Introduction
In this post, I describe how to simulate HIV infection times from an exponential
distribution. I begin by simulating the unconditional infection times. Then, I show how to condition the infection times on covariate data using a Cox Proportional Hazards model. I specifically demonstrate this procedure with R
code using two covariates (age, sex).
Simulating unconditional infection times
Let
where
To simulate the infection times, I use the Inverse Transform
Theorem (ITT), which states that if
which gives the procedure for simulating the infection times.
Simulating conditional infection times
The choice of
It is common to model infection times using a Cox Proportional Hazards (PH) model:
where
Then we have (see the referenced post):
We can obtain
and then invoke ITT to isolate
We now have a generalized formula for generating infection times from a uniform
distribution. The next step is to derive
for
HIV cohort: testing and censoring
The final step of this simulation exercise is to generate the
Let
Let
R code
The R
code below shows the procedure for generating the covariate values, the censoring indicator, and the infection times.
gen_data <- function(N, lambda, beta_1, beta_2) {
# gen covariate data
x1 <- rnorm(N, 0, 4)
x2 <- rbinom(N, 1, 0.6)
X <- as.matrix(cbind(x1, x2))
betas <- matrix(c(beta_1, beta_2))
# gen uniforms
u <- runif(N)
# gen the infections times
itime <- -log(u)/(lambda * exp(X %*% betas))
# Gen censoring times
ctime <- rexp(N, rate = lambda - 0.1)
# get the right censored time
time <- pmin(itime, ctime)
# got HIV or not during obs period
hiv <- as.numeric(itime <= ctime)
dat <- data.frame(id = 1:N, time = time, hiv = hiv, x1 = x1, x2 = x2)
return(dat)
}
Next, I pass the generated data to the coxph
function. This function estimates
library(survival)
do_sim <- function(N, lambda = 1, beta_1 = 0.05, beta_2 = 1.5) {
# gen a dataset
dat <- gen_data(N, lambda, beta_1, beta_2)
# get the beta estimates
bhat <- coxph(Surv(time, hiv) ~ x1 + x2, data = dat)
return(bhat$coefficients)
}
Let coxph
function and extract the
estimated coefficients in a
set.seed(100400)
K <- 1000
# take the mean of the estimates
bhats <- sapply(seq(K), function(x) do_sim(200))
beta_mn <- apply(bhats, 1, mean)
beta_sd <- apply(bhats, 1, sd)
beta_mn; beta_sd
x1 x2
0.05120952 1.51963760
x1 x2
0.02239449 0.21762811
# get the average over K runs
b1 <- cumsum(bhats[1, ])/(1:K)
b2 <- cumsum(bhats[2, ])/(1:K)
library(RColorBrewer)
Blues <- brewer.pal(4, "Blues")
plot(1:K, b1, type = "l", bty = "l", lwd =2, ylim = c(0.02, 0.1),
xlab = "Iterations (K)", ylab = "Estimate",
main = "Age coefficient", col = Blues[3])
abline(h = 0.05, lty = 2)

plot(1:K, b2, type = "l", bty = "l", lwd = 2, ylim = c(1.0, 2.0),
xlab = "Iterations (K)", ylab = "Estimate",
main = "Sex coefficient", col = Blues[3])
abline(h = 1.5, lty = 2)

The two figures show the results of the simulation exercise, where the left
and right panels are the estimates for the Age
and Sex
coefficients, respectively. Both panels show that there is some variability in
the estimates for
Proof of Inverse Transform Theorem:
Let
which is the cdf of the