DOI: 10.1111/j.1541-0420.2011.01645.x

Dynamic Logistic Regression and Dynamic Model Averaging for Binary Classification

Tyler H. McCormick,1,∗ Adrian E. Raftery,2 David Madigan,1 and Randall S. Burd3

1Department of Statistics, Columbia University, 1255 Amsterdam Avenue, New York, New York 10025, U.S.A.
2Department of Statistics, University of Washington, Box 354322, Seattle, Washington 98195-4322, U.S.A.
3Children’s National Medical Center, 111 Michigan Avenue NW, Washington, District of Columbia 20010, U.S.A.
∗email: [email protected]

Summary. We propose an online binary classification procedure for cases in which there is uncertainty about the model to use and parameters within a model change over time. We account for model uncertainty through dynamic model averaging, a dynamic extension of Bayesian model averaging in which posterior model probabilities may also change with time. We apply a state-space model to the parameters of each candidate model and allow the data-generating model to change over time according to a Markov chain. Calibrating a “forgetting” factor accommodates different levels of change in the data-generating mechanism. We propose an algorithm that adjusts the level of forgetting in an online fashion using the posterior predictive distribution, and so accommodates various levels of change at different times. We apply our method to data from children with appendicitis who receive either a traditional (open) appendectomy or a laparoscopic procedure. Factors associated with which children receive a particular type of procedure changed substantially over the 7 years of data collection, a feature that is not captured using standard regression modeling. Because our procedure can be implemented completely online, future data collection for similar studies would require storing sensitive patient information only temporarily, reducing the risk of a breach of confidentiality.

Key words: Bayesian model averaging; Binary classification; Confidentiality; Hidden Markov model; Laparoscopic surgery; Markov chain.

1. Introduction

We describe a method suited for high-dimensional predictive modeling applications with streaming, massive data in which the data-generating process is itself changing over time. Specifically, we propose an online implementation of a dynamic binary classifier, one that accounts for model uncertainty and allows within-model parameters to change over time.
Our model contains three key statistical features that make it well suited for such applications. First, we propose an entirely online implementation that allows rapid updating of model parameters as new data arrive. Second, we adopt an ensemble approach in response to a potentially large space of features, which addresses overfitting. Specifically, we combine models using dynamic model averaging (DMA), an extension of Bayesian model averaging (BMA) that allows model weights to change over time. Third, our autotuning algorithm and Bayesian inference address the dynamic nature of the data-generating mechanism. Through the Bayesian paradigm, our adaptive algorithm incorporates more information from past time periods when the process is stable, and less during periods of volatility. This feature allows us to model local fluctuations without losing sight of overall trends.

In what follows we consider a finite set of candidate logistic regression models and assume that the data-generating model follows a (hidden) Markov chain. Within each candidate model, the parameters follow a state-space model. We present algorithms for recursively updating both the Markov chain and the state-space model in an online fashion. Each candidate model is updated independently because the definition of the state vector differs across candidate models. This alleviates much of the computational burden associated with hidden Markov models. We also update the posterior model probabilities dynamically, allowing the “correct” model to change over time. “Forgetting” eliminates the need for between-state transition matrices and makes online prediction computationally feasible.
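As a concrete illustration of how forgetting can replace an explicit model-transition matrix, the sketch below flattens the previous posterior model probabilities by raising them to a power α < 1 (the prediction step) and then reweights by each model's predictive likelihood of the new observation (the update step). The function name, the use of NumPy, and the value of α are illustrative choices, not the paper's implementation.

```python
import numpy as np

def dma_weight_update(prev_probs, pred_likelihoods, alpha=0.99):
    """One step of a forgetting-based model-probability update.

    prev_probs:       posterior model probabilities at time t-1
    pred_likelihoods: each model's predictive density of y_t
    alpha:            forgetting exponent (alpha = 1 recovers
                      standard recursive Bayesian model averaging)
    """
    # Prediction step: flatten yesterday's probabilities toward
    # uniform, standing in for a Markov-chain transition matrix.
    pred = prev_probs ** alpha
    pred /= pred.sum()
    # Update step: Bayes' rule with the new observation.
    post = pred * pred_likelihoods
    return post / post.sum()
```

For example, a model with low prior weight but high predictive likelihood can overtake the others within a few observations, which is how the "correct" model is allowed to change over time.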
The key idea within each candidate model is to center the prior for the unobserved state of the process at time t at the center of the posterior at the (t − 1)th observation, and to set the prior variance of the state at time t equal to the posterior variance at time t − 1 inflated by a forgetting factor. Forgetting is similar to applying weights to the sample, with temporally distant observations receiving smaller weight than more recent ones. Forgetting thus calibrates, or tunes, the influence of past observations. Adaptively calibrating the procedure allows the amount of change in the model parameters to itself change over time. Our procedure is online and requires no additional data storage, preserving our method’s applicability for large-scale problems and for cases where sensitive information should be discarded as soon as possible.

Our method combines components of several well-known dynamic modeling schemes (see Smith, 1979, or Smith, 1992,

[Figure: coefficient estimates for the intercept, 2002–2010, with 95% confidence intervals, comparing no updating, rolling 24-month windows (refit every 1 and 12 months), piecewise recalibration (every 12 and 24 months), and dynamic logistic regression.]
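The within-model predict/update cycle described above can be sketched as follows: carry the posterior mean forward as the new prior mean, inflate the posterior covariance by dividing by a forgetting factor λ, and then absorb the new binary observation with a single Newton (Laplace-approximation) step on the logistic log-likelihood. This is a minimal sketch under those assumptions; the paper's exact updating equations and the names `dlr_update` and `lam` are not taken from the source.

```python
import numpy as np

def dlr_update(theta, Sigma, x, y, lam=0.99):
    """One online step of dynamic logistic regression.

    theta, Sigma: posterior mean and covariance at time t-1
    x, y:         covariate vector and binary outcome at time t
    lam:          forgetting factor in (0, 1]; smaller values
                  discount the past more heavily
    """
    # Prediction step: mean carried forward, covariance inflated.
    R = Sigma / lam
    # Predicted success probability under the prior mean.
    p = 1.0 / (1.0 + np.exp(-(x @ theta)))
    # Update step: Gaussian approximation via one Newton step.
    H = np.linalg.inv(R) + p * (1.0 - p) * np.outer(x, x)  # negative Hessian
    Sigma_new = np.linalg.inv(H)
    theta_new = theta + Sigma_new @ (x * (y - p))  # score of logistic likelihood
    return theta_new, Sigma_new
```

Setting lam = 1 recovers a static Bayesian logistic regression approximation, while lam < 1 lets the coefficients drift, which is what allows the fitted relationships to track the changes over the 7 years of data described in the Summary.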