Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Dynamic survival prediction for multivariate jo...

Graeme Hickey
December 17, 2017

Dynamic survival prediction for multivariate joint models using the R package joineRML

Presented at the 10th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2017), Senate House , University of London, UK, 16-18 December

Graeme Hickey

December 17, 2017
Tweet

More Decks by Graeme Hickey

Other Decks in Research

Transcript

  1. Introduction Model Estimation Software & Example revisited Prediction References Dynamic

    survival prediction for multivariate joint models using the R package joineRML Graeme L. Hickey1, Pete Philipson2, Andrea Jorgensen1, Ruwanthi Kolamunnage-Dona1 1Department of Biostatistics, University of Liverpool, UK 2Mathematics, Physics and Electrical Engineering, University of Northumbria, UK [email protected] 17th December 2017 GL. Hickey Joint modelling of multivariate data 1 / 28
  2. Introduction Model Estimation Software & Example revisited Prediction References Preamble

    Funding Grant MR/M013227/1 Slides Available from www.glhickey.com GL. Hickey Joint modelling of multivariate data 2 / 28
  3. Introduction Model Estimation Software & Example revisited Prediction References Joint

    modelling Covariate data Longitudinal outcomes data Random effects GL. Hickey Joint modelling of multivariate data 3 / 28
  4. Introduction Model Estimation Software & Example revisited Prediction References Joint

    modelling Covariate data Time-to-event outcomes data Frailties GL. Hickey Joint modelling of multivariate data 3 / 28
  5. Introduction Model Estimation Software & Example revisited Prediction References Joint

    modelling Covariate data Time-to-event outcomes data Frailties Longitudinal outcomes data Random effects GL. Hickey Joint modelling of multivariate data 3 / 28
  6. Introduction Model Estimation Software & Example revisited Prediction References Joint

    modelling Covariate data Time-to-event outcomes data Frailties Longitudinal outcomes data Random effects GL. Hickey Joint modelling of multivariate data 3 / 28
  7. Introduction Model Estimation Software & Example revisited Prediction References Joint

    modelling Typically used for inference, but growing interest in using for dynamic prediction Typically used with a single longitudinal outcome, but growing interest in using multiple (or multivariate) longitudinal outcomes We will describe a recent –package that resolves these two issues GL. Hickey Joint modelling of multivariate data 4 / 28
  8. Introduction Model Estimation Software & Example revisited Prediction References Clinical

    example Consider a subset of 154 patients with primary biliary cirrhosis (PBC) randomized to placebo treatment from Mayo Clinic trial (Murtaugh et al. 1994) Multiple biomarkers repeatedly measured at intermittent times, of which we consider 3 clinically relevant ones: 1 serum bilirubin (mg/dl) 2 serum albumin (mg/dl) 3 prothrombin time (seconds) GL. Hickey Joint modelling of multivariate data 5 / 28
  9. Introduction Model Estimation Software & Example revisited Prediction References Prothrombin

    time (0.1 x seconds)−4 Albumin (mg dL) Serum bilirubin (loge mg dL) 0 5 10 0 5 10 0 5 10 0.0 0.5 1.0 1.5 2 4 6 8 −2 0 2 4 Time from registration (years) Alive Dead The natural disease progression of untreated patients with PBC enrolled into the trial GL. Hickey Joint modelling of multivariate data 6 / 28
  10. Introduction Model Estimation Software & Example revisited Prediction References 0

    2 4 6 8 10 12 14 0.0 0.2 0.4 0.6 0.8 1.0 Time from registration (years) Survival probability nevents = 69, 44.8% GL. Hickey Joint modelling of multivariate data 7 / 28
  11. Introduction Model Estimation Software & Example revisited Prediction References Data

    For each subject i = 1, . . . , n, we observe yi = (yi1 , . . . , yiK ) is a K-variate continuous outcome vector, where each yik denotes an (nik × 1)-vector of observed longitudinal measurements for the k-th outcome type: yik = (yi1k, . . . , yinik k) Observation times tijk for j = 1, . . . , nik, which can differ between subjects and outcomes (Ti , δi ), where Ti = min(T∗ i , Ci ), where T∗ i is the true event time, Ci corresponds to a potential right-censoring time, and δi is the failure indicator equal to 1 if the failure is observed (T∗ i ≤ Ci ) and 0 otherwise GL. Hickey Joint modelling of multivariate data 8 / 28
  12. Introduction Model Estimation Software & Example revisited Prediction References Longitudinal

    sub-model We extend the univariate joint model from Henderson et al. (2000): yik(t) = xik (t)βk + zik (t)bik + εik(t), with k = 1, . . . , K where xik(t) is a pk-vector of (possibly) time-varying covariates with corresponding fixed effect terms βk bik ∼ N(0, Dkk) subject-and-marker specific random effects cov(bik, bil ) = Dkl for k = l εik(t) is the model error term, which is i.i.d. N(0, σ2 k ) and independent of bik GL. Hickey Joint modelling of multivariate data 9 / 28
  13. Introduction Model Estimation Software & Example revisited Prediction References Time-to-event

    sub-model λi (t) = lim dt→0 P(t ≤ Ti < t + dt | Ti ≥ t) dt = λ0(t) exp vi (t)γv + Wi (t) , where λ0(·) is an unspecified baseline hazard function vi (t) is a q-vector of (possibly) time-varying covariates with corresponding fixed effect terms γv Wi (t) = K k=1 γyk{zik (t)bik} independent of the censoring process GL. Hickey Joint modelling of multivariate data 10 / 28
  14. Introduction Model Estimation Software & Example revisited Prediction References Likelihood

    The observed data likelihood is given by n i=1 ∞ −∞ f (yi | bi , θ)f (Ti , δi | bi , θ)f (bi | θ)dbi where θ = (β , vech(D), σ2 1 , . . . , σ2 K , λ0(t), γv , γy ), β = (β1 , . . . , βK ), bi = (bi1 , . . . , biK ) ∼ N(0, D) GL. Hickey Joint modelling of multivariate data 11 / 28
  15. Introduction Model Estimation Software & Example revisited Prediction References Estimation

    Estimation of θ done using Monte Carlo Expectation-Maximisation (MCEM) algorithm (treating the random effects as missing data) M-step: closed-form updates for all parameters except γ = (γv , γy ), so use a one-step Newton-Raphson iteration E-step: requires calculating several multidimensional integrals of form E h(bi ) | Ti , δi , yi ; ˆ θ , which can be evaluated by: 1 Monte Carlo integration 2 antithetic variates for variance reduction 3 quasi-Monte Carlo (low-discrepancy deterministic sequences, e.g Sobol sequence) Dynamic update rule for MC size and convergence criterion SE estimation by bootstrap or empirical information matrix approximation GL. Hickey Joint modelling of multivariate data 12 / 28
  16. Introduction Model Estimation Software & Example revisited Prediction References joineRML

    + getVarCov() vcov() fixef() ranef()* AIC() BIC() confint() formula() sampleData() dynSurv()* dynLong()* print() summary() plot() sigma() coef() update() baseHaz() residuals() fitted() logLik() bootSE() Rich collection of associated methods * associated with additional plot methods mjoint() Version 0.4.0 available on CRAN https://cran.r-project.org/web/packages/joineRML/ Developmental version available on GitHub https://github.com/graemeleehickey/joineRML Parallel Computing + GL. Hickey Joint modelling of multivariate data 13 / 28
  17. Introduction Model Estimation Software & Example revisited Prediction References Proposed

    model for PBC data Longitudinal sub-model log(serBilir) = (β0,1 + b0i,1 ) + (β1,1 + b1i,1 )year + εij1, albumin = (β0,2 + b0i,2 ) + (β1,2 + b1i,2 )year + εij2, (0.1 × prothrombin)−4 = (β0,3 + b0i,3 ) + (β1,3 + b1i,3 )year + εij3, bi ∼ N6 (0, D), and εijk ∼ N(0, σ2 k ) for k = 1, 2, 3; Time-to-event sub-model λi (t) = λ0 (t) exp {γv age + W2i (t)} , W2i (t) = γbil (b0i,1 + b1i,1 t) + γalb (b0i,2 + b1i,2 t) + γpro (b0i,3 + b1i,3 t). GL. Hickey Joint modelling of multivariate data 14 / 28
  18. Introduction Model Estimation Software & Example revisited Prediction References Example

    code data(pbc2) placebo <- subset(pbc2, drug == "placebo") fit.pbc <- mjoint( formLongFixed = list( "bil" = log(serBilir) ˜ year, "alb" = albumin ˜ year, "pro" = (0.1 * prothrombin)ˆ-4 ˜ year), formLongRandom = list( "bil" = ˜ year | id, "alb" = ˜ year | id, "pro" = ˜ year | id), formSurv = Surv(years, status2) ˜ age, data = placebo, timeVar = "year", control = list(tol0 = 0.001, burin = 400)) GL. Hickey Joint modelling of multivariate data 15 / 28
  19. Introduction Model Estimation Software & Example revisited Prediction References Results

    Parameter Estimate SE 95% CI β0,1 0.5541 0.0858 (0.3859, 0.7223) β1,1 0.2009 0.0201 (0.1616, 0.2402) β0,2 3.5549 0.0356 (3.4850, 3.6248) β1,2 -0.1245 0.0101 (-0.1444, -0.1047) β0,3 0.8304 0.0212 (0.7888, 0.8719) β1,3 -0.0577 0.0062 (-0.0699, -0.0456) γv 0.0462 0.0151 (0.0166, 0.0759) γbil 0.8181 0.2046 (0.4171, 1.2191) γalb -1.7060 0.6181 (-2.9173, -0.4946) γpro -2.2085 1.6070 (-5.3582, 0.9412) GL. Hickey Joint modelling of multivariate data 16 / 28
  20. Introduction Model Estimation Software & Example revisited Prediction References Results

    Effect of multivariate inference over univariate joint model: Parameter Model Estimate 95% CI γbil UV 1.2182 (0.9789, 1.6130) γbil MV 0.8181 (0.4171, 1.2191) γalb UV -3.0770 (-4.4865, -2.3466) γalb MV -1.7060 (-2.9173, -0.4946) γpro UV -7.2078 (-10.5410, -5.3917) γpro MV -2.2085 (-5.3582, 0.9412) UV = univariate joint model (fitted with joineR package); MV = multivariate joint model GL. Hickey Joint modelling of multivariate data 17 / 28
  21. Introduction Model Estimation Software & Example revisited Prediction References Dynamic

    prediction So far we have only discussed inference from joint models Dynamic prediction Can we predict failure probability at time u > t given longitudinal data observed up until time t Utility of multivariate data Do we gain from using multiple longitudinal outcomes? GL. Hickey Joint modelling of multivariate data 18 / 28
  22. Introduction Model Estimation Software & Example revisited Prediction References Dynamic

    prediction For a new subject i = n + 1 with available longitudinal measurements yn+1{t} up to time t, we want to calculate P[T∗ n+1 ≥ u | T∗ n+1 > t, yn+1{t}; θ] = E Sn+1 (u | bn+1; θ) Sn+1 (t | bn+1; θ) , where the expectation is taken with respect to the distribution p(bn+1 | T∗ n+1 > t, yn+1{t}; θ), and Sn+1 (u | bn+1; θ) = exp − u 0 λ0(s) exp{vn+1 (s)γv + Wn+1(s)}ds GL. Hickey Joint modelling of multivariate data 19 / 28
  23. Introduction Model Estimation Software & Example revisited Prediction References Dynamic

    prediction Rizopoulos (2011) proposed two estimators for this: 1. First-order approximation P[T∗ n+1 ≥ u | T∗ n+1 > t, yn+1{t}; θ] ≈ Sn+1 u | ˆ bn+1 ; ˆ θ Sn+1 t | ˆ bn+1 ; ˆ θ , where ˆ bn+1 is the mode of p(bn+1 | T∗ n+1 > t, yn+1{t}; ˆ θ) 2. Simulated scheme 1 Draw θ(l) ∼ N(ˆ θ, V (ˆ θ)) 2 Draw b(l) n+1 ∼ p(bn+1 | T∗ n+1 > t, yn+1{t}; θ(l)) [Metropolis-Hastings] 3 Calculate Sn+1 u | b(l) n+1 ;θ(l) Sn+1 t | b(l) n+1 ;θ(l) 4 Repeat Steps 1–3 l = 2, . . . , L times GL. Hickey Joint modelling of multivariate data 20 / 28
  24. Introduction Model Estimation Software & Example revisited Prediction References Example

    code # New patient nd <- subset(placebo, id == "11") # patient 11 nd <- nd[1:4, ] # take first four follow-up times # First-order prediction (default) pred1 <- dynSurv(fit.pbc, nd) pred1 plot(pred1) # Simulated prediction pred2 <- dynSurv(fit.pbc, nd, type = "simulated", scale = 1.6) pred2 plot(pred2) GL. Hickey Joint modelling of multivariate data 21 / 28
  25. Introduction Model Estimation Software & Example revisited Prediction References Example

    code First order approximation 0.2 0.3 0.4 0.5 0.6 0.7 0.8 data.t$tk[[k]] log, serBilir 3.7 3.8 3.9 4.0 4.1 data.t$tk[[k]] albumin 0.45 0.50 0.55 0.60 0.65 data.t$tk[[k]] ^, (0.1 * prothrombin), −4 0 2 4 6 8 10 12 14 xpts ypts 0.0 0.2 0.4 0.6 0.8 1.0 Time Event−free probability GL. Hickey Joint modelling of multivariate data 22 / 28
  26. Introduction Model Estimation Software & Example revisited Prediction References Example

    code Simulated scheme 0.2 0.3 0.4 0.5 0.6 0.7 0.8 data.t$tk[[k]] log, serBilir 3.7 3.8 3.9 4.0 4.1 data.t$tk[[k]] albumin 0.45 0.50 0.55 0.60 0.65 data.t$tk[[k]] ^, (0.1 * prothrombin), −4 0 2 4 6 8 10 12 14 xpts ypts 0.0 0.2 0.4 0.6 0.8 1.0 Time Event−free probability GL. Hickey Joint modelling of multivariate data 22 / 28
  27. Introduction Model Estimation Software & Example revisited Prediction References Example

    code Simulated scheme + constrained B-spline smoother 0.2 0.3 0.4 0.5 0.6 0.7 0.8 data.t$tk[[k]] log, serBilir 3.7 3.8 3.9 4.0 4.1 data.t$tk[[k]] albumin 0.45 0.50 0.55 0.60 0.65 data.t$tk[[k]] ^, (0.1 * prothrombin), −4 0 2 4 6 8 10 12 14 xpts ypts 0.0 0.2 0.4 0.6 0.8 1.0 Time Event−free probability GL. Hickey Joint modelling of multivariate data 22 / 28
  28. Introduction Model Estimation Software & Example revisited Prediction References Truly

    dynamic prediction GL. Hickey Joint modelling of multivariate data 23 / 28 0.20 0.25 0.30 0.35 0.40 0.45 data.t$tk[[k]] log, serBilir 2.5 3.0 3.5 4.0 4.5 5.0 5.5 data.t$tk[[k]] albumin 0.3 0.4 0.5 0.6 data.t$tk[[k]] ^, (0.1 * prothrombin), −4 0 0 2 4 6 8 10 12 14 xpts ypts 0.0 0.2 0.4 0.6 0.8 1.0 Time Event−free probability
  29. Introduction Model Estimation Software & Example revisited Prediction References Other

    software for dynamic prediction R (JM): univariate joint models (spline model for λ0(t)) 0 2 4 6 8 10 12 14 0.0 0.2 0.4 0.6 0.8 1.0 Univariate joint model {B} Time Pr(Ti ≥ u | Ti > t, y ~ i (t)) R package JM joineRML 0 2 4 6 8 10 12 14 0.0 0.2 0.4 0.6 0.8 1.0 Univariate joint model {A} Time Pr(Ti ≥ u | Ti > t, y ~ i (t)) R package JM joineRML 0 2 4 6 8 10 12 14 0.0 0.2 0.4 0.6 0.8 1.0 Univariate joint model {P} Time Pr(Ti ≥ u | Ti > t, y ~ i (t)) R package JM joineRML R (JMbayes): uni- and multi-variate joint models Stata (stjm): univariate joint models GL. Hickey Joint modelling of multivariate data 24 / 28
  30. Introduction Model Estimation Software & Example revisited Prediction References Value

    of multiple biomarkers Build all 32 − 1 = 7 joint models: i.e. from trivariate ({B, A, P}) to univariate ({B}, {A}, {P}) For all patients still in the risk set at landmark time (t), predict their failure probability at horizon time (s), i.e. P[t < T∗ i ≤ s + t | T∗ i > t, yi {t}; θ] Calculate concordance statistic1 (Therneau and Watson 2015) between prediction and whether subject had event in (t, s + t] 1Using the survConcordance() function in the R survival package GL. Hickey Joint modelling of multivariate data 25 / 28
  31. Introduction Model Estimation Software & Example revisited Prediction References Ensemble

    of joint models (patient #11) GL. Hickey Joint modelling of multivariate data 26 / 28 0 2 4 6 8 10 12 14 0.0 0.2 0.4 0.6 0.8 1.0 Visit 1 (t = 0) Time (years) Predicted survival {B, A, P} {B, A} {B, P} {A, P} {B} {A} {P}
  32. Introduction Model Estimation Software & Example revisited Prediction References Discrimination

    (concordance) q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0 2 4 1 2 4 8 Landmark time (t; years) Horizon time (s; years) 0.6 0.7 0.8 0.9 1.0 0.6 0.7 0.8 0.9 1.0 0.6 0.7 0.8 0.9 1.0 0.6 0.7 0.8 0.9 1.0 Predicted survival Model q q q q q q q {B, A, P} {B, A} {B, P} {A, P} {B} {A} {P} GL. Hickey Joint modelling of multivariate data 27 / 28
  33. Introduction Model Estimation Software & Example revisited Prediction References References

    Questions? Murtaugh, Paul A et al. (1994). Primary biliary cirrhosis: prediction of short-term survival based on repeated patient visits. Hepatology 20(1), pp. 126–134. Henderson, R et al. (2000). Joint modelling of longitudinal measurements and event time data. Biostatistics 1(4), pp. 465–480. Rizopoulos, Dimitris (2011). Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics 67(3), pp. 819–829. Therneau, Terry M and Watson, David A (2015). The concordance statistic and the Cox model. Tech. rep. Rochester, Minnesota: Department of Health Science Research, Mayo Clinic, Technical Report Series No. 85. GL. Hickey Joint modelling of multivariate data 28 / 28