$30 off During Our Annual Pro Sale. View Details »

Dynamic survival prediction for multivariate joint models using the R package joineRML

Graeme Hickey
December 17, 2017

Dynamic survival prediction for multivariate joint models using the R package joineRML

Presented at the 10th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2017), Senate House , University of London, UK, 16-18 December

Graeme Hickey

December 17, 2017
Tweet

More Decks by Graeme Hickey

Other Decks in Research

Transcript

  1. Introduction Model Estimation Software & Example revisited Prediction References
    Dynamic survival prediction for multivariate joint
    models using the R package joineRML
    Graeme L. Hickey1, Pete Philipson2, Andrea Jorgensen1,
    Ruwanthi Kolamunnage-Dona1
    1Department of Biostatistics, University of Liverpool, UK
    2Mathematics, Physics and Electrical Engineering, University of Northumbria, UK
    [email protected]
    17th December 2017
    GL. Hickey Joint modelling of multivariate data 1 / 28

    View Slide

  2. Introduction Model Estimation Software & Example revisited Prediction References
    Preamble
    Funding
    Grant MR/M013227/1
    Slides
    Available from www.glhickey.com
    GL. Hickey Joint modelling of multivariate data 2 / 28

    View Slide

  3. Introduction Model Estimation Software & Example revisited Prediction References
    Joint modelling
    Covariate data
    Longitudinal
    outcomes data
    Random
    effects

    GL. Hickey Joint modelling of multivariate data 3 / 28

    View Slide

  4. Introduction Model Estimation Software & Example revisited Prediction References
    Joint modelling
    Covariate data
    Time-to-event
    outcomes data
    Frailties

    GL. Hickey Joint modelling of multivariate data 3 / 28

    View Slide

  5. Introduction Model Estimation Software & Example revisited Prediction References
    Joint modelling
    Covariate data
    Time-to-event
    outcomes data
    Frailties

    Longitudinal
    outcomes data
    Random
    effects

    GL. Hickey Joint modelling of multivariate data 3 / 28

    View Slide

  6. Introduction Model Estimation Software & Example revisited Prediction References
    Joint modelling
    Covariate data
    Time-to-event
    outcomes data
    Frailties

    Longitudinal
    outcomes data
    Random
    effects


    GL. Hickey Joint modelling of multivariate data 3 / 28

    View Slide

  7. Introduction Model Estimation Software & Example revisited Prediction References
    Joint modelling
    Typically used for inference, but growing interest in using for
    dynamic prediction
    Typically used with a single longitudinal outcome, but growing
    interest in using multiple (or multivariate) longitudinal outcomes
    We will describe a recent –package that resolves these two issues
    GL. Hickey Joint modelling of multivariate data 4 / 28

    View Slide

  8. Introduction Model Estimation Software & Example revisited Prediction References
    Clinical example
    Consider a subset of 154 patients with primary biliary cirrhosis
    (PBC) randomized to placebo treatment from Mayo Clinic trial
    (Murtaugh et al. 1994)
    Multiple biomarkers repeatedly measured at intermittent times,
    of which we consider 3 clinically relevant ones:
    1 serum bilirubin (mg/dl)
    2 serum albumin (mg/dl)
    3 prothrombin time (seconds)
    GL. Hickey Joint modelling of multivariate data 5 / 28

    View Slide

  9. Introduction Model Estimation Software & Example revisited Prediction References
    Prothrombin time (0.1 x seconds)−4
    Albumin (mg dL)
    Serum bilirubin (loge
    mg dL)
    0 5 10 0 5 10 0 5 10
    0.0
    0.5
    1.0
    1.5
    2
    4
    6
    8
    −2
    0
    2
    4
    Time from registration (years)
    Alive Dead
    The natural disease progression of untreated patients with PBC
    enrolled into the trial
    GL. Hickey Joint modelling of multivariate data 6 / 28

    View Slide

  10. Introduction Model Estimation Software & Example revisited Prediction References
    0 2 4 6 8 10 12 14
    0.0
    0.2
    0.4
    0.6
    0.8
    1.0
    Time from registration (years)
    Survival probability
    nevents = 69, 44.8%
    GL. Hickey Joint modelling of multivariate data 7 / 28

    View Slide

  11. Introduction Model Estimation Software & Example revisited Prediction References
    Data
    For each subject i = 1, . . . , n, we observe
    yi = (yi1
    , . . . , yiK
    ) is a K-variate continuous outcome vector,
    where each yik denotes an (nik × 1)-vector of observed
    longitudinal measurements for the k-th outcome type:
    yik = (yi1k, . . . , yinik k)
    Observation times tijk for j = 1, . . . , nik, which can differ
    between subjects and outcomes
    (Ti , δi ), where Ti = min(T∗
    i
    , Ci ), where T∗
    i
    is the true event
    time, Ci corresponds to a potential right-censoring time, and δi
    is the failure indicator equal to 1 if the failure is observed
    (T∗
    i
    ≤ Ci ) and 0 otherwise
    GL. Hickey Joint modelling of multivariate data 8 / 28

    View Slide

  12. Introduction Model Estimation Software & Example revisited Prediction References
    Longitudinal sub-model
    We extend the univariate joint model from Henderson et al. (2000):
    yik(t) = xik
    (t)βk + zik
    (t)bik + εik(t), with k = 1, . . . , K
    where
    xik(t) is a pk-vector of (possibly) time-varying covariates with
    corresponding fixed effect terms βk
    bik ∼ N(0, Dkk) subject-and-marker specific random effects
    cov(bik, bil ) = Dkl for k = l
    εik(t) is the model error term, which is i.i.d. N(0, σ2
    k
    ) and
    independent of bik
    GL. Hickey Joint modelling of multivariate data 9 / 28

    View Slide

  13. Introduction Model Estimation Software & Example revisited Prediction References
    Time-to-event sub-model
    λi (t) = lim
    dt→0
    P(t ≤ Ti < t + dt | Ti ≥ t)
    dt
    = λ0(t) exp vi
    (t)γv + Wi (t) ,
    where
    λ0(·) is an unspecified baseline hazard function
    vi (t) is a q-vector of (possibly) time-varying covariates with
    corresponding fixed effect terms γv
    Wi (t) = K
    k=1
    γyk{zik
    (t)bik} independent of the censoring
    process
    GL. Hickey Joint modelling of multivariate data 10 / 28

    View Slide

  14. Introduction Model Estimation Software & Example revisited Prediction References
    Likelihood
    The observed data likelihood is given by
    n
    i=1

    −∞
    f (yi | bi , θ)f (Ti , δi | bi , θ)f (bi | θ)dbi
    where
    θ = (β , vech(D), σ2
    1
    , . . . , σ2
    K
    , λ0(t), γv
    , γy
    ),
    β = (β1
    , . . . , βK
    ),
    bi = (bi1
    , . . . , biK
    ) ∼ N(0, D)
    GL. Hickey Joint modelling of multivariate data 11 / 28

    View Slide

  15. Introduction Model Estimation Software & Example revisited Prediction References
    Estimation
    Estimation of θ done using Monte Carlo Expectation-Maximisation
    (MCEM) algorithm (treating the random effects as missing data)
    M-step: closed-form updates for all parameters except
    γ = (γv
    , γy
    ), so use a one-step Newton-Raphson iteration
    E-step: requires calculating several multidimensional integrals of
    form E h(bi ) | Ti , δi , yi ; ˆ
    θ , which can be evaluated by:
    1 Monte Carlo integration
    2 antithetic variates for variance reduction
    3 quasi-Monte Carlo (low-discrepancy deterministic sequences, e.g
    Sobol sequence)
    Dynamic update rule for MC size and convergence criterion
    SE estimation by bootstrap or empirical information matrix
    approximation
    GL. Hickey Joint modelling of multivariate data 12 / 28

    View Slide

  16. Introduction Model Estimation Software & Example revisited Prediction References
    joineRML
    +
    getVarCov()
    vcov()
    fixef()
    ranef()*
    AIC()
    BIC()
    confint()
    formula()
    sampleData()
    dynSurv()*
    dynLong()*
    print()
    summary()
    plot()
    sigma()
    coef()
    update()
    baseHaz()
    residuals()
    fitted()
    logLik()
    bootSE()
    Rich collection of associated methods
    * associated with additional plot methods
    mjoint()
    Version 0.4.0 available on CRAN
    https://cran.r-project.org/web/packages/joineRML/
    Developmental version available on
    GitHub
    https://github.com/graemeleehickey/joineRML
    Parallel
    Computing
    +
    GL. Hickey Joint modelling of multivariate data 13 / 28

    View Slide

  17. Introduction Model Estimation Software & Example revisited Prediction References
    Proposed model for PBC data
    Longitudinal sub-model
    log(serBilir) = (β0,1
    + b0i,1
    ) + (β1,1
    + b1i,1
    )year + εij1,
    albumin = (β0,2
    + b0i,2
    ) + (β1,2
    + b1i,2
    )year + εij2,
    (0.1 × prothrombin)−4 = (β0,3
    + b0i,3
    ) + (β1,3
    + b1i,3
    )year + εij3,
    bi ∼ N6
    (0, D), and εijk
    ∼ N(0, σ2
    k
    ) for k = 1, 2, 3;
    Time-to-event sub-model
    λi
    (t) = λ0
    (t) exp {γv age + W2i
    (t)} ,
    W2i
    (t) = γbil
    (b0i,1
    + b1i,1
    t) + γalb
    (b0i,2
    + b1i,2
    t) + γpro
    (b0i,3
    + b1i,3
    t).
    GL. Hickey Joint modelling of multivariate data 14 / 28

    View Slide

  18. Introduction Model Estimation Software & Example revisited Prediction References
    Example code
    data(pbc2)
    placebo <- subset(pbc2, drug == "placebo")
    fit.pbc <- mjoint(
    formLongFixed = list(
    "bil" = log(serBilir) ˜ year,
    "alb" = albumin ˜ year,
    "pro" = (0.1 * prothrombin)ˆ-4 ˜ year),
    formLongRandom = list(
    "bil" = ˜ year | id,
    "alb" = ˜ year | id,
    "pro" = ˜ year | id),
    formSurv = Surv(years, status2) ˜ age,
    data = placebo,
    timeVar = "year",
    control = list(tol0 = 0.001, burin = 400))
    GL. Hickey Joint modelling of multivariate data 15 / 28

    View Slide

  19. Introduction Model Estimation Software & Example revisited Prediction References
    Results
    Parameter Estimate SE 95% CI
    β0,1 0.5541 0.0858 (0.3859, 0.7223)
    β1,1 0.2009 0.0201 (0.1616, 0.2402)
    β0,2 3.5549 0.0356 (3.4850, 3.6248)
    β1,2 -0.1245 0.0101 (-0.1444, -0.1047)
    β0,3 0.8304 0.0212 (0.7888, 0.8719)
    β1,3 -0.0577 0.0062 (-0.0699, -0.0456)
    γv 0.0462 0.0151 (0.0166, 0.0759)
    γbil
    0.8181 0.2046 (0.4171, 1.2191)
    γalb
    -1.7060 0.6181 (-2.9173, -0.4946)
    γpro
    -2.2085 1.6070 (-5.3582, 0.9412)
    GL. Hickey Joint modelling of multivariate data 16 / 28

    View Slide

  20. Introduction Model Estimation Software & Example revisited Prediction References
    Results
    Effect of multivariate inference over univariate joint model:
    Parameter Model Estimate 95% CI
    γbil
    UV 1.2182 (0.9789, 1.6130)
    γbil
    MV 0.8181 (0.4171, 1.2191)
    γalb
    UV -3.0770 (-4.4865, -2.3466)
    γalb
    MV -1.7060 (-2.9173, -0.4946)
    γpro
    UV -7.2078 (-10.5410, -5.3917)
    γpro
    MV -2.2085 (-5.3582, 0.9412)
    UV = univariate joint model (fitted with joineR package); MV =
    multivariate joint model
    GL. Hickey Joint modelling of multivariate data 17 / 28

    View Slide

  21. Introduction Model Estimation Software & Example revisited Prediction References
    Dynamic prediction
    So far we have only discussed inference from joint models
    Dynamic prediction
    Can we predict failure probability at time u > t given longitudinal
    data observed up until time t
    Utility of multivariate data
    Do we gain from using multiple longitudinal outcomes?
    GL. Hickey Joint modelling of multivariate data 18 / 28

    View Slide

  22. Introduction Model Estimation Software & Example revisited Prediction References
    Dynamic prediction
    For a new subject i = n + 1 with available longitudinal measurements
    yn+1{t} up to time t, we want to calculate
    P[T∗
    n+1
    ≥ u | T∗
    n+1
    > t, yn+1{t}; θ] = E
    Sn+1 (u | bn+1; θ)
    Sn+1 (t | bn+1; θ)
    ,
    where the expectation is taken with respect to the distribution
    p(bn+1 | T∗
    n+1
    > t, yn+1{t}; θ), and
    Sn+1 (u | bn+1; θ) = exp −
    u
    0
    λ0(s) exp{vn+1
    (s)γv + Wn+1(s)}ds
    GL. Hickey Joint modelling of multivariate data 19 / 28

    View Slide

  23. Introduction Model Estimation Software & Example revisited Prediction References
    Dynamic prediction
    Rizopoulos (2011) proposed two estimators for this:
    1. First-order approximation
    P[T∗
    n+1
    ≥ u | T∗
    n+1
    > t, yn+1{t}; θ] ≈
    Sn+1
    u | ˆ
    bn+1
    ; ˆ
    θ
    Sn+1
    t | ˆ
    bn+1
    ; ˆ
    θ
    ,
    where ˆ
    bn+1
    is the mode of p(bn+1 | T∗
    n+1
    > t, yn+1{t}; ˆ
    θ)
    2. Simulated scheme
    1 Draw θ(l) ∼ N(ˆ
    θ, V (ˆ
    θ))
    2 Draw b(l)
    n+1
    ∼ p(bn+1 | T∗
    n+1
    > t, yn+1{t}; θ(l)) [Metropolis-Hastings]
    3 Calculate
    Sn+1 u | b(l)
    n+1
    ;θ(l)
    Sn+1 t | b(l)
    n+1
    ;θ(l)
    4 Repeat Steps 1–3 l = 2, . . . , L times
    GL. Hickey Joint modelling of multivariate data 20 / 28

    View Slide

  24. Introduction Model Estimation Software & Example revisited Prediction References
    Example code
    # New patient
    nd <- subset(placebo, id == "11") # patient 11
    nd <- nd[1:4, ] # take first four follow-up times
    # First-order prediction (default)
    pred1 <- dynSurv(fit.pbc, nd)
    pred1
    plot(pred1)
    # Simulated prediction
    pred2 <- dynSurv(fit.pbc, nd, type = "simulated", scale = 1.6)
    pred2
    plot(pred2)
    GL. Hickey Joint modelling of multivariate data 21 / 28

    View Slide

  25. Introduction Model Estimation Software & Example revisited Prediction References
    Example code
    First order approximation
    0.2
    0.3
    0.4
    0.5
    0.6
    0.7
    0.8
    data.t$tk[[k]]
    log, serBilir
    3.7
    3.8
    3.9
    4.0
    4.1
    data.t$tk[[k]]
    albumin
    0.45
    0.50
    0.55
    0.60
    0.65
    data.t$tk[[k]]
    ^, (0.1 * prothrombin), −4
    0 2 4 6 8 10 12 14
    xpts
    ypts
    0.0
    0.2
    0.4
    0.6
    0.8
    1.0
    Time
    Event−free probability
    GL. Hickey Joint modelling of multivariate data 22 / 28

    View Slide

  26. Introduction Model Estimation Software & Example revisited Prediction References
    Example code
    Simulated scheme
    0.2
    0.3
    0.4
    0.5
    0.6
    0.7
    0.8
    data.t$tk[[k]]
    log, serBilir
    3.7
    3.8
    3.9
    4.0
    4.1
    data.t$tk[[k]]
    albumin
    0.45
    0.50
    0.55
    0.60
    0.65
    data.t$tk[[k]]
    ^, (0.1 * prothrombin), −4
    0 2 4 6 8 10 12 14
    xpts
    ypts
    0.0
    0.2
    0.4
    0.6
    0.8
    1.0
    Time
    Event−free probability
    GL. Hickey Joint modelling of multivariate data 22 / 28

    View Slide

  27. Introduction Model Estimation Software & Example revisited Prediction References
    Example code
    Simulated scheme + constrained B-spline smoother
    0.2
    0.3
    0.4
    0.5
    0.6
    0.7
    0.8
    data.t$tk[[k]]
    log, serBilir
    3.7
    3.8
    3.9
    4.0
    4.1
    data.t$tk[[k]]
    albumin
    0.45
    0.50
    0.55
    0.60
    0.65
    data.t$tk[[k]]
    ^, (0.1 * prothrombin), −4
    0 2 4 6 8 10 12 14
    xpts
    ypts
    0.0
    0.2
    0.4
    0.6
    0.8
    1.0
    Time
    Event−free probability
    GL. Hickey Joint modelling of multivariate data 22 / 28

    View Slide

  28. Introduction Model Estimation Software & Example revisited Prediction References
    Truly dynamic prediction
    GL. Hickey Joint modelling of multivariate data 23 / 28
    0.20
    0.25
    0.30
    0.35
    0.40
    0.45
    data.t$tk[[k]]
    log, serBilir
    2.5
    3.0
    3.5
    4.0
    4.5
    5.0
    5.5
    data.t$tk[[k]]
    albumin
    0.3
    0.4
    0.5
    0.6
    data.t$tk[[k]]
    ^, (0.1 * prothrombin), −4
    0 0 2 4 6 8 10 12 14
    xpts
    ypts
    0.0
    0.2
    0.4
    0.6
    0.8
    1.0
    Time
    Event−free probability

    View Slide

  29. Introduction Model Estimation Software & Example revisited Prediction References
    Other software for dynamic prediction
    R (JM): univariate joint models (spline model for λ0(t))
    0 2 4 6 8 10 12 14
    0.0 0.2 0.4 0.6 0.8 1.0
    Univariate joint model {B}
    Time
    Pr(Ti
    ≥ u | Ti
    > t, y
    ~
    i
    (t))
    R package
    JM
    joineRML
    0 2 4 6 8 10 12 14
    0.0 0.2 0.4 0.6 0.8 1.0
    Univariate joint model {A}
    Time
    Pr(Ti
    ≥ u | Ti
    > t, y
    ~
    i
    (t))
    R package
    JM
    joineRML
    0 2 4 6 8 10 12 14
    0.0 0.2 0.4 0.6 0.8 1.0
    Univariate joint model {P}
    Time
    Pr(Ti
    ≥ u | Ti
    > t, y
    ~
    i
    (t))
    R package
    JM
    joineRML
    R (JMbayes): uni- and multi-variate joint models
    Stata (stjm): univariate joint models
    GL. Hickey Joint modelling of multivariate data 24 / 28

    View Slide

  30. Introduction Model Estimation Software & Example revisited Prediction References
    Value of multiple biomarkers
    Build all 32 − 1 = 7 joint models: i.e. from trivariate ({B, A, P})
    to univariate ({B}, {A}, {P})
    For all patients still in the risk set at landmark time (t), predict
    their failure probability at horizon time (s), i.e.
    P[t < T∗
    i
    ≤ s + t | T∗
    i
    > t, yi {t}; θ]
    Calculate concordance statistic1 (Therneau and Watson 2015)
    between prediction and whether subject had event in (t, s + t]
    1Using the survConcordance() function in the R survival package
    GL. Hickey Joint modelling of multivariate data 25 / 28

    View Slide

  31. Introduction Model Estimation Software & Example revisited Prediction References
    Ensemble of joint models (patient #11)
    GL. Hickey Joint modelling of multivariate data 26 / 28
    0 2 4 6 8 10 12 14
    0.0
    0.2
    0.4
    0.6
    0.8
    1.0
    Visit 1 (t = 0)
    Time (years)
    Predicted survival
    {B, A, P}
    {B, A}
    {B, P}
    {A, P}
    {B}
    {A}
    {P}

    View Slide

  32. Introduction Model Estimation Software & Example revisited Prediction References
    Discrimination (concordance)
    q q q q
    q
    q
    q
    q q q q
    q
    q
    q
    q q q
    q
    q
    q
    q
    q q q
    q
    q
    q q
    q q q
    q
    q
    q q
    q q q
    q
    q
    q q
    q q q
    q
    q
    q
    q
    q q q
    q
    q
    q q
    q q q
    q
    q
    q q
    q q q
    q
    q
    q
    q
    q q q
    q
    q
    q q
    q q q
    q
    q
    q
    q
    0 2 4
    1 2 4 8
    Landmark time (t; years)
    Horizon time (s; years)
    0.6
    0.7
    0.8
    0.9
    1.0
    0.6
    0.7
    0.8
    0.9
    1.0
    0.6
    0.7
    0.8
    0.9
    1.0
    0.6
    0.7
    0.8
    0.9
    1.0
    Predicted survival
    Model
    q
    q
    q
    q
    q
    q
    q
    {B, A, P}
    {B, A}
    {B, P}
    {A, P}
    {B}
    {A}
    {P}
    GL. Hickey Joint modelling of multivariate data 27 / 28

    View Slide

  33. Introduction Model Estimation Software & Example revisited Prediction References
    References
    Questions?
    Murtaugh, Paul A et al. (1994). Primary biliary cirrhosis: prediction of
    short-term survival based on repeated patient visits. Hepatology
    20(1), pp. 126–134.
    Henderson, R et al. (2000). Joint modelling of longitudinal
    measurements and event time data. Biostatistics 1(4), pp. 465–480.
    Rizopoulos, Dimitris (2011). Dynamic predictions and prospective
    accuracy in joint models for longitudinal and time-to-event data.
    Biometrics 67(3), pp. 819–829.
    Therneau, Terry M and Watson, David A (2015). The concordance
    statistic and the Cox model. Tech. rep. Rochester, Minnesota:
    Department of Health Science Research, Mayo Clinic, Technical
    Report Series No. 85.
    GL. Hickey Joint modelling of multivariate data 28 / 28

    View Slide