
Vanda Carvalho

SAM Conference 2017

July 03, 2017

Transcript

  1. Nonparametric Bayesian Covariate-Adjusted Estimation of the Youden Index

    Vanda Inácio de Carvalho, University of Edinburgh, July 3, 2017
    Vanda Inácio (UoE) BNP Youden Index Regression July 3, 2017 1 / 21
  2. What is a diagnostic test and why is it important?

    General concepts
    → Before a marker is routinely used in practice, its ability to distinguish between diseased and non-diseased states must be rigorously assessed.
    → We assume the existence of a gold standard, that is, a marker/procedure/test that perfectly classifies individuals as diseased or non-diseased.
    → Comparing against this truth, one wants to know how well the marker under evaluation performs.
  3. What is a diagnostic test and why is it important?

    General concepts
    → Let $Y_0$ and $Y_1$ be two independent continuous random variables denoting the marker outcomes in the non-diseased and diseased populations, with CDFs $F_0$ and $F_1$, respectively.
    → Further, let $c$ be a cutoff for defining a positive marker result.
    → Without loss of generality, we assume that a subject is classified as diseased when the marker outcome is greater than or equal to $c$ and as non-diseased when it is below $c$.
    → Then, for each cutoff value $c$, the accuracy of the marker can be summarized by its sensitivity (Se) and specificity (Sp):
      $\mathrm{Se}(c) = \Pr(Y_1 \ge c) = 1 - F_1(c), \qquad \mathrm{Sp}(c) = \Pr(Y_0 < c) = F_0(c).$
    → Each cutoff value $c$ yields a different pair of sensitivity and specificity values.
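As a quick numerical illustration of these definitions, the Python sketch below evaluates Se(c) and Sp(c) for two hypothetical normal marker distributions, Y0 ~ N(0, 1) and Y1 ~ N(2, 1). These distributions are chosen only for illustration and are not taken from the talk; the standard library's `statistics.NormalDist` supplies the normal CDF.

```python
from statistics import NormalDist

# Hypothetical marker distributions (illustrative only):
# non-diseased Y0 ~ N(0, 1), diseased Y1 ~ N(2, 1).
F0 = NormalDist(mu=0.0, sigma=1.0)
F1 = NormalDist(mu=2.0, sigma=1.0)


def sensitivity(c: float) -> float:
    """Se(c) = Pr(Y1 >= c) = 1 - F1(c)."""
    return 1.0 - F1.cdf(c)


def specificity(c: float) -> float:
    """Sp(c) = Pr(Y0 < c) = F0(c)."""
    return F0.cdf(c)


# Moving the cutoff up trades sensitivity for specificity.
for c in (0.0, 1.0, 2.0):
    print(f"c={c:.1f}  Se={sensitivity(c):.3f}  Sp={specificity(c):.3f}")
```

Raising c lowers Se(c) and raises Sp(c); in this symmetric example the two coincide at c = 1, halfway between the two means.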
  4. Why are biomarkers important?

    General concepts
    [Figure: four panels of marker-outcome densities for the two populations with a cutoff c, labelled alternately Se and Sp.]
  5. What is a diagnostic test and why is it important?

    Youden index
    → A commonly used global summary measure of diagnostic accuracy is the Youden index (Youden, 1950), defined as
      $\mathrm{YI} = \max_{c \in \mathbb{R}} \{\mathrm{Se}(c) + \mathrm{Sp}(c) - 1\} = \max_{c \in \mathbb{R}} \{F_0(c) - F_1(c)\}.$
    → The YI ranges from 0 to 1:
      → YI = 0 corresponds to complete overlap of the data distributions for the D = 0 and D = 1 populations (i.e., $F_0(c) = F_1(c)$ for all $c$).
      → YI = 1 when the data distributions are completely separated.
      → Values of YI between 0 and 1 correspond to different levels of stochastic ordering between $F_0$ and $F_1$.
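The Youden index can be approximated numerically by maximizing F0(c) − F1(c) over a grid of cutoffs. The sketch below does this for a hypothetical equal-variance binormal example (Y0 ~ N(0, 1), Y1 ~ N(2, 1)) and checks the result against the closed form that holds in that special case: YI = 2Φ(δ/2) − 1 with δ = (μ1 − μ0)/σ, attained at c = (μ0 + μ1)/2. The helper name `youden_grid` and the grid settings are illustrative assumptions.

```python
from statistics import NormalDist


def youden_grid(F0: NormalDist, F1: NormalDist, lo: float, hi: float, n: int = 20001):
    """Approximate YI = max_c {F0(c) - F1(c)} and its maximizer on an even grid over [lo, hi]."""
    best_c, best_val = lo, float("-inf")
    for i in range(n):
        c = lo + (hi - lo) * i / (n - 1)
        val = F0.cdf(c) - F1.cdf(c)  # equals Se(c) + Sp(c) - 1
        if val > best_val:
            best_c, best_val = c, val
    return best_val, best_c


# Hypothetical equal-variance binormal example.
F0, F1 = NormalDist(0.0, 1.0), NormalDist(2.0, 1.0)
yi, c_star = youden_grid(F0, F1, -5.0, 7.0)

# Closed form for equal variances: YI = 2 * Phi(delta / 2) - 1, delta = (mu1 - mu0) / sigma.
delta = (2.0 - 0.0) / 1.0
yi_exact = 2.0 * NormalDist().cdf(delta / 2.0) - 1.0
print(yi, yi_exact, c_star)
```

A grid search is used here only because it is simple and works for any pair of CDFs; with unequal variances or non-normal distributions the closed form no longer applies, but the grid search does.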
  6. What is a diagnostic test and why is it important?

    Youden index
    → In addition to providing a global measure of marker accuracy, the YI provides a criterion for selecting an optimal threshold to screen subjects in clinical practice.
    → The criterion is to choose the cutoff value that maximizes sensitivity plus specificity, i.e.,
      $c^* = \arg\max_{c \in \mathbb{R}} \{F_0(c) - F_1(c)\}.$
    → The Youden index criterion for selecting the optimal threshold $c^*$ is thus based on maximizing the sum of the correct classification rates in the diseased and non-diseased populations.
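Without covariates, a simple nonparametric plug-in estimate of YI and c* replaces F0 and F1 by their empirical CDFs. This is not the Bayesian method of the talk, only a baseline sketch; `empirical_youden` is a hypothetical helper name and the samples are made up. Because the empirical Se and Sp change only at observed marker values, scanning the pooled observations as candidate cutoffs is enough to find the maximizer.

```python
from bisect import bisect_left


def empirical_youden(y0, y1):
    """Empirical Youden index and optimal cutoff from two samples.

    A subject is called positive when the marker is >= c, so
    Sp_hat(c) = #{y0 < c} / n0 and Se_hat(c) = #{y1 >= c} / n1.
    """
    s0, s1 = sorted(y0), sorted(y1)
    best_val, best_c = float("-inf"), None
    for c in sorted(set(s0) | set(s1)):
        sp = bisect_left(s0, c) / len(s0)        # fraction of y0 strictly below c
        se = 1.0 - bisect_left(s1, c) / len(s1)  # fraction of y1 at or above c
        val = se + sp - 1.0
        if val > best_val:                       # keep the smallest maximizing cutoff
            best_val, best_c = val, c
    return best_val, best_c


# Made-up marker values for the non-diseased (y0) and diseased (y1) groups.
yi, c = empirical_youden([1, 2, 3, 4], [3.5, 5, 6, 7])
print(yi, c)
```

For these toy samples, the cutoff 3.5 classifies three of the four non-diseased values and all four diseased values correctly, giving an empirical YI of 0.75.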
  7. What is a diagnostic test and why is it important?

    Youden index
    [Figure: marker densities and CDFs for the two populations in four scenarios, with YI = 0, 0.29, 0.62 and 1.]
  8. What is a diagnostic test and why is it important?

    Covariate information
    → Moreover, interest in the subject has recently moved beyond determining the basic accuracy of a marker.
    → It has been recognized that the discriminatory ability of a marker is often affected by patient-specific characteristics, such as age or gender.
    → In this setting, sensitivity and specificity depend on a covariate vector $x$, so that
      $\mathrm{Se}(c \mid x) = \Pr(Y_1 \ge c \mid x) = 1 - F_1(c \mid x), \qquad \mathrm{Sp}(c \mid x) = F_0(c \mid x).$
    → We thus define the covariate-dependent Youden index and the covariate-dependent optimal cutoff as
      $\mathrm{YI}(x) = \max_{c \in \mathbb{R}} \{F_0(c \mid x) - F_1(c \mid x)\}, \qquad c^*(x) = \arg\max_{c \in \mathbb{R}} \{F_0(c \mid x) - F_1(c \mid x)\}.$
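For the covariate-dependent quantities YI(x) and c*(x), the maximization over c is repeated at each covariate value of interest. The sketch below assumes, purely for illustration, normal linear models Y_h | x ~ N(a_h + b_h x, σ_h²) with made-up parameters. The helper names are hypothetical, and this simple parametric model is not the DDP model of the talk.

```python
from statistics import NormalDist


def conditional_cdf(intercept, slope, sigma):
    """Return (c, x) -> F(c | x) for the normal linear model Y | x ~ N(intercept + slope * x, sigma^2)."""
    def F(c, x):
        return NormalDist(intercept + slope * x, sigma).cdf(c)
    return F


def youden_at(F0, F1, x, lo, hi, n=4001):
    """Grid approximation of YI(x) = max_c {F0(c | x) - F1(c | x)} and of c*(x)."""
    best_val, best_c = float("-inf"), lo
    for i in range(n):
        c = lo + (hi - lo) * i / (n - 1)
        val = F0(c, x) - F1(c, x)
        if val > best_val:
            best_val, best_c = val, c
    return best_val, best_c


# Hypothetical parameters (illustrative only, not estimates from the talk):
# the diseased-group mean increases with x, the non-diseased mean does not.
F0 = conditional_cdf(intercept=0.0, slope=0.0, sigma=1.0)
F1 = conditional_cdf(intercept=1.0, slope=0.5, sigma=1.0)

for x in (0.0, 2.0):
    yi, c = youden_at(F0, F1, x, lo=-5.0, hi=10.0)
    print(f"x={x}: YI(x)={yi:.3f}, c*(x)={c:.3f}")
```

With these made-up parameters, the two group means drift apart as x grows, so both YI(x) and c*(x) increase with x.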
  9. Modelling approach

    DDP model
    → Our modelling approach is based on mixture models induced by a dependent Dirichlet process (DDP), which allows the entire distribution of the marker outcomes in each population to change smoothly as a function of covariates.
    → Let $\{(x_{01}, Y_{01}), \ldots, (x_{0n_0}, Y_{0n_0})\}$ and $\{(x_{11}, Y_{11}), \ldots, (x_{1n_1}, Y_{1n_1})\}$ be regression data for the non-diseased and diseased groups, respectively.
    → It is assumed that, given the covariates, the marker outcomes in each population are independent, with
      $Y_{0i} \mid x_{0i} \overset{\text{ind}}{\sim} F_0(\cdot \mid x_{0i}),\ i = 1, \ldots, n_0, \qquad Y_{1j} \mid x_{1j} \overset{\text{ind}}{\sim} F_1(\cdot \mid x_{1j}),\ j = 1, \ldots, n_1.$
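  10. Modelling approach

    DDP model
    → In a linear-dependent Dirichlet process, the conditional CDF is specified as
      $F_h(c \mid x) = \int \Phi(c \mid x^\top \beta, \sigma^2)\, dG_h(\beta, \sigma^2), \qquad G_h \sim \mathrm{DP}(\alpha_h, G_h^*), \qquad h \in \{0, 1\},$
      where $\Phi(\cdot \mid \mu, \sigma^2)$ denotes the normal CDF with mean $\mu$ and variance $\sigma^2$.
    → We take $G_h^*(\beta, \sigma^2) \equiv \mathrm{N}_q(\beta \mid m_{\beta h}, S_{\beta h})\, \mathrm{Gamma}(\sigma^{-2} \mid a_h, b_h)$.
    → Additionally, we consider $m_{\beta h} \sim \mathrm{N}_q(m_{h0}, S_{h0})$ and $S_{\beta h}^{-1} \sim \mathrm{Wishart}_q(\nu_h, (\nu_h \psi_h)^{-1})$.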
  11. Modelling approach

    DDP model
    → Under the stick-breaking representation of Sethuraman (1994), the conditional CDF can be written as
      $F_h(c \mid x) = \sum_{k=1}^{\infty} \omega_{hk}\, \Phi(c \mid x^\top \beta_{hk}, \sigma^2_{hk}),$
      with atoms $(\beta_{hk}, \sigma^2_{hk})$ drawn independently from $G_h^*$.
    → The weights follow the so-called stick-breaking construction
      $\omega_{hk} = v_{hk} \prod_{l<k} (1 - v_{hl}), \qquad v_{hk} \sim \mathrm{Beta}(1, \alpha_h), \qquad k = 1, \ldots, \infty.$
    → This representation expresses the conditional CDF as an infinite mixture of Gaussian linear models.
    → The model can also accommodate nonlinearities through the inclusion of B-splines.
    → Posterior inference is conducted through the blocked Gibbs sampler of Ishwaran and James (2001).
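The stick-breaking form suggests how a single realization of F_h(c | x) can be simulated once the infinite sum is truncated at K components, which is the approximation blocked Gibbs samplers work with (the last stick fraction is set to 1 so the weights sum to one). The sketch below is a prior draw only, not posterior inference. It assumes a simplified base measure: independent normal regression coefficients instead of the multivariate normal N_q(m_βh, S_βh), fixed hyperparameters instead of the normal and Wishart hyperpriors, and Gamma(a, b) read with b as a rate. All numeric values and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking weights w_k = v_k * prod_{l<k} (1 - v_l).

    v_k ~ Beta(1, alpha) for k < K, and v_K = 1 so that the K weights sum to one.
    """
    weights, remaining = [], 1.0
    for k in range(K):
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def draw_atoms(K, m, s, a, b, rng):
    """Draw K atoms (beta_k, sigma_k) from a simplified, hypothetical base measure.

    Coefficients: independent normals with means m and standard deviations s.
    Precision sigma^-2 ~ Gamma(shape=a, rate=b) (rate parameterization assumed).
    """
    atoms = []
    for _ in range(K):
        beta = [rng.gauss(mj, sj) for mj, sj in zip(m, s)]
        precision = rng.gammavariate(a, 1.0 / b)  # gammavariate expects a scale, so pass 1/rate
        atoms.append((beta, 1.0 / math.sqrt(precision)))
    return atoms


def mixture_cdf(c, x, weights, atoms):
    """F(c | x) = sum_k w_k * Phi(c | x'beta_k, sigma_k^2)."""
    total = 0.0
    for w_k, (beta, sigma) in zip(weights, atoms):
        mean = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor x'beta_k
        total += w_k * NormalDist(mean, sigma).cdf(c)
    return total


rng = random.Random(2017)
K = 20
w = stick_breaking_weights(alpha=1.0, K=K, rng=rng)
atoms = draw_atoms(K, m=[0.0, 0.5], s=[1.0, 0.1], a=2.0, b=2.0, rng=rng)
x = [1.0, 3.0]  # intercept term plus one hypothetical covariate value
print(sum(w), mixture_cdf(0.0, x, w, atoms))
```

Applying the definition of YI(x) to two such draws, one per population, would give one prior draw of the covariate-specific Youden index; posterior inference instead updates the weights and atoms from the data.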
  12. Diabetes data

    Data description
    → Our motivating dataset comes from a population-based survey in Cairo, Egypt (Smith and Thompson, 1996).
    → In this study, postprandial glucose measurements were obtained from a fingerstick on 286 subjects.
    → According to the gold standard, 88 subjects were classified as diabetic and 198 as non-diabetic.
    → Age is believed to play a key role, especially in the non-diabetic group, where older subjects tend to have higher glucose levels.
    → Our aim is two-fold:
      → Assess the accuracy of glucose levels as a marker for diabetes and determine the optimal glucose level for screening subjects in practice.
      → Assess how the accuracy and optimal cutoff change as a function of age.
  13. Diabetes data

    Analysis without age adjustment
    [Figure: estimated glucose-level densities for the non-diabetic and diabetic groups, and the corresponding CDFs.]
    $c^* = 127$ (118, 142), YI = 0.66 (0.56, 0.75)
  14. Diabetes data

    Age-adjusted analysis
    [Figure: estimated CDFs of glucose level at ages 36, 52 and 68.]

      Age   $c^*$            YI
      36    122 (113, 136)   0.760 (0.547, 0.920)
      52    127 (120, 138)   0.683 (0.571, 0.783)
      68    143 (134, 154)   0.644 (0.436, 0.824)
  15. Diabetes data

    Age-adjusted analysis
    [Figure: estimated optimal cutoff and Youden index as functions of age (40 to 70).]
  16. Diabetes data

    Age-adjusted analysis (top row: DDP model, bottom row: Normal model)
    [Figure: estimated optimal cutoff and Youden index as functions of age under each model.]
  17. Diabetes data

    Age-adjusted analysis
    [Figure: estimated optimal cutoff and Youden index as functions of age, comparing the BLDDP and Normal models.]
  18. Diabetes data

    Age-adjusted analysis
    To compare the two Bayesian models, we work on the posterior predictive space and compute the log-pseudo marginal likelihood (LPML) statistic (Geisser and Eddy, 1979). The higher the LPML, the better the model fit.

      Model    LPML0   LPML1
      DDP      −880    −529
      Normal   −934    −533

    LPML0 and LPML1 refer to the non-diabetic and diabetic groups, respectively. As can be observed, the LPML statistic indicates a better fit for the DDP model in both groups.
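LPML is usually computed from MCMC output through the conditional predictive ordinates CPO_i. A common Monte Carlo estimator, assumed here rather than taken from the talk, is the harmonic mean of the likelihood across posterior draws, CPO_i ≈ [M⁻¹ Σ_m 1/f(y_i | θ^(m))]⁻¹, with LPML = Σ_i log CPO_i. The sketch takes a hypothetical input layout `loglik[m][i]` of pointwise log-likelihoods and works on the log scale for numerical stability.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def lpml(loglik):
    """LPML from loglik[m][i] = log f(y_i | theta^(m)) over M posterior draws.

    log CPO_i = -( log sum_m exp(-loglik[m][i]) - log M ), i.e. the log of the
    harmonic mean of the likelihood, and LPML = sum_i log CPO_i.
    """
    M = len(loglik)
    n = len(loglik[0])
    total = 0.0
    for i in range(n):
        neg = [-loglik[m][i] for m in range(M)]
        total += -(log_sum_exp(neg) - math.log(M))
    return total


# Made-up log-likelihoods: 3 posterior draws (rows) for 2 observations (columns).
toy = [[-1.2, -0.8], [-1.0, -0.9], [-1.1, -0.7]]
print(lpml(toy))
```

Computed separately for each group and model, the model with the larger value is preferred, which is how the table compares the DDP and Normal models.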
  19. Conclusions

    → We developed a Bayesian nonparametric regression model to estimate the covariate-specific Youden index and the corresponding optimal cutoff.
    → The flexibility of our model arises from combining dependent Dirichlet process mixtures with B-spline regression.
    → Our investigation into the potential of glucose to serve as a biomarker of diabetes found that its classification accuracy decreases with age, whereas the optimal cutoff for screening subjects in practice increases with age.
    → The LPML criterion indicated a preference for our nonparametric model over a parametric linear regression, which gave much higher estimates of the optimal cutoff across age than the nonparametric model.
    → Our simulation study illustrated the ability of the model to respond dynamically to complex data distributions in a variety of scenarios, with little price paid in decreased posterior precision for the extra generality of our nonparametric estimator compared with parametric estimates (even when the parametric model holds).
  20. Main reference

    This paper is based on: Inácio de Carvalho, V., de Carvalho, M., and Branscum, A. J. Nonparametric Bayesian covariate-adjusted estimation of the Youden index. Biometrics. DOI: 10.1111/biom.12686