Simulating event times (Simulation series 2)

Introduction
I extend the work presented in the first
post of this series to
demonstrate how to simulate infections times from a Cox Proportional Hazards (CPH)
model conditional on one dichotomous time-dependent covariate. I show the
mathematical derivations and demonstrate a simulation example for the dichotomous
covariate case using R
code.
Simulating conditional infection times
Let
with cumulative hazard function (CHF):
Choose a dichotomous variable that can change its value at most once, for
example: from treated to untreated, single to married, high school to
college degree. Let
where
We therefore have:
The two ranges of the piece-wise function correspond to
When
For
if
Recalling that
and if
HIV cohort: testing and censoring
The next step of this simulation exercise is to generate the infection times
conditional on one time-independent covariate
Let
Let
R code
The main function gen_dich
generates a single
dataset for the
library(survival)
# Func to gen the data
gen_dich <- function(i) {
lambda <- 0.2
# Number of FU times
ntimes <- 10
# Generate covariate vals
x1 <- round(runif(1, 0, 1), 2)
time <- 1:ntimes
t0 <- sample(seq(ntimes), 1) * rbinom(1, 1, 0.3)
x2 <- as.numeric(time > t0)
u <- -log(runif(1))
b1x1 <- as.vector(x1 %*% 0.5)
# Generate infection time
t1 <- lambda * exp(b1x1) * t0
t2 <- lambda * exp(b1x1 + -0.5)
if (u < t1) {
itime <- u / (lambda * exp(b1x1))
} else if (u >= t1) {
itime <- (u - t1 + (t2 * t0)) / t2
}
# Make a censoring event
ctime <- rexp(1, rate = lambda - 0.01)
if (ctime < itime) {
hiv <- 0
} else {
hiv <- as.numeric(itime < time)
}
# Get the censoring time
mintime <- pmin(itime, ctime)
start <- time - 1
dat <- data.frame(id = i, start, time, ctime, itime, hiv, x1, x2)
dat <- dat[time <= ceiling(mintime), ]
return(dat)
}
An example of the dataset for the first participant is given below, which shows
that x2
, the marital status variable, changes over time for the
fourth participant.
set.seed(187900)
lapply(seq(10), gen_dich)[4]
[[1]]
id start time ctime itime hiv x1 x2
1 4 0 1 11.29535 11.38225 0 0.14 0
2 4 1 2 11.29535 11.38225 0 0.14 0
3 4 2 3 11.29535 11.38225 0 0.14 0
4 4 3 4 11.29535 11.38225 0 0.14 0
5 4 4 5 11.29535 11.38225 0 0.14 0
6 4 5 6 11.29535 11.38225 0 0.14 0
7 4 6 7 11.29535 11.38225 0 0.14 0
8 4 7 8 11.29535 11.38225 0 0.14 1
9 4 8 9 11.29535 11.38225 0 0.14 1
10 4 9 10 11.29535 11.38225 0 0.14 1
Next, I write a function to pass the data to the coxph
function. This
function estimates
# Function to run the simulations
do_sim <- function(n, f) {
dat <- do.call(rbind, lapply(seq(n), f))
bhat <- coxph(Surv(start, time, hiv) ~ x1 + x2 + cluster(id), data = dat)
return(bhat$coefficients)
}
Let
set.seed(100400)
K <- 400
# Take the mean of the estimates
bhats <- sapply(seq(K), function(x) do_sim(400, gen_dich))
beta_mn <- apply(bhats, 1, mean)
beta_sd <- apply(bhats, 1, sd)
beta_mn; beta_sd
The two figures show the results of the simulation exercise, where the top
and bottom panels are the estimates for the age and marital status
coefficients, respectively. Both panels show that there is some variability in
the estimates for

