Unlocking a national adult cardiac surgery audit registry with R

Graeme Hickey
July 11, 2013

Presented at useR! 2013, The R User Conference, University of Castilla-La Mancha, Albacete, Spain

Transcript

  1. Unlocking a national adult cardiac surgery audit registry with R
    GL Hickey (1,2,3), SW Grant (2,3) & B Bridgewater (1,2,3)

    (1) Northwest Institute of BioHealth Informatics, University of Manchester
    (2) University Hospital of South Manchester
    (3) National Institute of Cardiovascular Outcomes Research, UCL
    The R User Conference 2013
    University of Castilla-La Mancha, Albacete, Spain

  2. BACKGROUND

  3. Bristol Inquiry
    Contributory factors that led to the failings included:
    1. Inadequate collection of data
    2. Inadequate monitoring of data

  4. National Adult Cardiac Surgery Audit registry
    • Up to 166 clinical variables collected on each patient: administrative, demographics, comorbidities, operative factors, outcomes
    • 15 years of data
    • 465,000 records
    • 44 hospitals + >400 consultant surgeons

  5. Flow of data
    [Flow diagram. Labels: NICOR, NIBHI, HOSPITALS, DATABASE, CLEANING, ANALYSES, AUDIT & GOVERNANCE TOOLS, CLINICAL RESEARCHERS, RESEARCH, NATIONAL DEATH REGISTER*]
    [Image: cover of the Sixth National Adult Cardiac Surgical Database Report 2008, "Demonstrating quality", prepared by Ben Bridgewater and Bruce Keogh on behalf of the Society for Cardiothoracic Surgery in Great Britain & Ireland, with Robin Kinsman and Peter Walton, Dendrite Clinical Systems]
    * Ability to link with many other national registries

  6. UNLOCKING THE REGISTRY
    MESSY DATA

  7. Cleaning the registry in R
    [Flow diagram: DATA EXTRACT → VARIABLE 1, VARIABLE 2, VARIABLE 3, … → EXCLUDE RECORDS → ADD VALUE → CLEANED DATA]
    • Variables: one script per variable; some dependencies
    • Exclude records: e.g. duplicates
    • Add value: scripts to add risk scores, combined variables, and ‘resolve’ conflicting variables
    • Rapidly reproducible
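
A minimal sketch of the "one script per variable" idea, written in base R. The toy data frame, the column names (`age`, `sex`) and every helper function are hypothetical stand-ins, not the registry's actual fields or cleaning scripts.

```r
# Toy extract standing in for the registry; all columns are hypothetical.
extract <- data.frame(
  id  = c(1, 2, 2, 3, 4),
  age = c(67, 154, 154, 72, NA),
  sex = c("M", "female", "female", "F", "m"),
  stringsAsFactors = FALSE
)

# One cleaning function per variable.
clean_age <- function(d) {
  # Implausible ages are set to missing rather than dropping the record.
  d$age[!is.na(d$age) & (d$age < 18 | d$age > 110)] <- NA
  d
}
clean_sex <- function(d) {
  first <- toupper(substr(d$sex, 1, 1))
  d$sex <- ifelse(first %in% c("M", "F"), first, NA)
  d
}

# Record-level exclusion step, e.g. duplicated records.
exclude_duplicates <- function(d) d[!duplicated(d$id), , drop = FALSE]

# Value-adding step: a derived (combined) variable.
add_age_group <- function(d) {
  d$age_group <- cut(d$age, breaks = c(18, 60, 70, 80, 110),
                     right = FALSE, include.lowest = TRUE)
  d
}

# Run the steps in a fixed order so the cleaned data can be rebuilt on demand.
steps <- list(clean_age, clean_sex, exclude_duplicates, add_age_group)
cleaned <- Reduce(function(d, step) step(d), steps, extract)
print(cleaned)
```

Keeping the steps in an ordered list makes dependencies between scripts explicit and lets the whole cleaned dataset be regenerated from a fresh extract.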

  8. > with(SCTS, table(X4.04.Discharge.Destination, X4.05.Status.at.Discharge))
                                                  X4.05.Status.at.Discharge
    X4.04.Discharge.Destination                          0. Alive 1. Dead
                                                         828    48296    2453
    . Another dept within the trust                    0       57       0
    0                                                  1        1       0
    0. Not applicable - patient deceased               0        0       1
    1 Home                                             0     4104       0
    1. Home                                          674   370763     374
    2 Convalescence                                    0       63       0
    2. Convalescence                                   8     7347       4
    2. Convalescence (Non acute Hospital)              2     2164       0
    3 Other hospital                                   0        1       0
    3 Other Hospital                                   0      151       0
    3 Other Hospital - wd 6                            0        1       0
    3 Other Hospital wd 2                              0        1       0
    3 Other ward                                       0        1       0
    3. Other Acute hospital                            1     7680       1
    3. Other hospital                                115    22935      37
    4 Patient deceased                                 0        0     173
    4. Not applicable - patient deceased              51      412   13286
    4. Patient Deceased                                0        0      19
    5                                                  0        7       0
    5. Transferred to different Consultant - NGH       0       42       0
    7                                                  0        2       0
    8                                                  0       38       4
    9                                                114     3820     518
    Second op                                          0        2       6
    Annotations: illegal options; transcriptional discrepancies; missing data; conflicts
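
A hedged sketch of how labels like those in the table above might be harmonised and conflicts flagged, using base R regular expressions. The toy data reuse a few of the raw labels; the target categories and the `harmonise_destination` helper are assumptions made for illustration, not the authors' rules.

```r
# Toy data using a few of the raw labels visible in the table above.
d <- data.frame(
  destination = c("1 Home", "1. Home", "2. Convalescence", "3 Other Hospital",
                  "4. Patient Deceased", "", "9", "1. Home"),
  status = c("0. Alive", "0. Alive", "0. Alive", "0. Alive",
             "1. Dead", "0. Alive", "0. Alive", "1. Dead"),
  stringsAsFactors = FALSE
)

# Map raw labels to a small set of categories with regular expressions.
# The target categories are an assumption made for this sketch.
harmonise_destination <- function(x) {
  out <- rep(NA_character_, length(x))   # blank or unrecognised -> missing
  out[grepl("^1\\.? ?home", x, ignore.case = TRUE)]  <- "Home"
  out[grepl("convalescence", x, ignore.case = TRUE)] <- "Convalescence"
  out[grepl("^3\\.? ?other", x, ignore.case = TRUE)] <- "Other hospital"
  out[grepl("deceased", x, ignore.case = TRUE)]      <- "Deceased"
  out
}

d$destination_clean <- harmonise_destination(d$destination)

# Flag conflicts: a live destination recorded for a dead patient, or vice versa.
dead <- d$status == "1. Dead"
d$conflict <- !is.na(d$destination_clean) &
  ((d$destination_clean == "Deceased") != dead)

print(table(d$destination_clean, d$status, useNA = "ifany"))
print(d[d$conflict, ])
```

Flagged records are kept rather than deleted, so a clinician can decide how each conflict should be resolved.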

  9. Cleaning the registry in R
    • Errors are difficult to find and not all can be resolved
    • Excluding all imperfect data is not an option
    • Balance between a ‘research ready’ dataset and robust audit capability
    • Needs to be reproducible

    • It is locked to clinicians & researchers without being cleaned

  10. Warning: cleaning clinical registries without experts is dangerous*
    * Applies to analysing healthcare data also
    [Graphic: image + image = DATA]

  11. UNLOCKING THE REGISTRY
    MONITORING

  12. Publication of named healthcare provider outcomes
    http://www.scts.org/patients/
    [Figure: mortality rate (0% to 15%) against number of procedures (0 to 800); panels: Crude, RAMR; legend: healthcare provider]

  13. Publication of named healthcare provider outcomes
    FILTER DATA: subset
    RISK ADJUSTMENT: glm, glmer {lme4}, mfp {mfp}, predict, auc {pROC}
    CLASSIFICATION & PRESENTATION: ggplot {ggplot2}, write.csv
    AGGREGATION: summaryBy {doBy}, merge, arrange {plyr}
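
The filter, risk-adjust, aggregate and present steps can be sketched in base R as follows. This is an illustration on simulated data with hypothetical variable names; it uses `aggregate` in place of `summaryBy {doBy}` to stay dependency-free, and it assumes one common definition of a risk-adjusted mortality rate (observed / expected × overall rate), which the slides do not spell out.

```r
set.seed(1)
n <- 5000
# Simulated records; variable names and effect sizes are hypothetical.
ops <- data.frame(
  provider = sample(paste0("H", 1:10), n, replace = TRUE),
  age      = round(rnorm(n, 68, 10)),
  urgent   = rbinom(n, 1, 0.3),
  year     = sample(2008:2011, n, replace = TRUE)
)
ops$died <- rbinom(n, 1, plogis(-6 + 0.04 * ops$age + 0.8 * ops$urgent))

# 1. Filter data
recent <- subset(ops, year >= 2009)

# 2. Risk adjustment: logistic regression, then predicted risk per patient
fit <- glm(died ~ age + urgent, family = binomial, data = recent)
recent$expected <- predict(fit, type = "response")

# 3. Aggregation: procedures, observed deaths and expected deaths per provider
recent$procedures <- 1
agg <- aggregate(cbind(procedures, died, expected) ~ provider,
                 data = recent, FUN = sum)

# Crude rate and risk-adjusted rate (observed / expected x overall rate)
overall   <- mean(recent$died)
agg$crude <- agg$died / agg$procedures
agg$ramr  <- agg$died / agg$expected * overall

# 4. Presentation: order by volume; write.csv(agg, "providers.csv") would export it
agg <- agg[order(agg$procedures), ]
print(agg)
```

Plotting `crude` and `ramr` against `procedures` gives the kind of provider-level display used for publication.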

  14. Exploratory analyses
    http://www.scts.org/DynamicCharts/
    summaryBy {doBy} + gvisMotionChart {googleVis}

  15. Monitoring medical devices
    • Currently does not happen in UK
    • Data: 200 valve types entered 13,000 ways (free text)
    • But R is good with regular expressions
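
As an illustration of mapping free-text device entries to a controlled list with regular expressions, here is a small base R sketch. The example strings, patterns and labels are invented for this sketch and are not taken from the registry.

```r
# Hypothetical free-text entries; the spellings are invented for illustration.
valve_text <- c("Carpentier Edwards Perimount 23mm", "CE perimount",
                "C-E PERIMOUNT magna", "St Jude mechanical",
                "St. Jude Medical Regent", "SJM", "unknown")

# Label -> pattern lookup; both the labels and the patterns are assumptions.
patterns <- c(
  "Perimount" = "perimount",
  "St Jude"   = "st\\.? ?jude|\\bsjm\\b"
)

classify_valve <- function(x, patterns) {
  out <- rep(NA_character_, length(x))
  for (label in names(patterns)) {
    # Only fill entries that no earlier pattern has already claimed.
    hit <- is.na(out) & grepl(patterns[[label]], x, ignore.case = TRUE, perl = TRUE)
    out[hit] <- label
  }
  out
}

valve <- classify_valve(valve_text, patterns)
print(data.frame(valve_text, valve))
```

Entries that match no pattern stay missing, which gives a worklist of spellings still to be reviewed.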

  16. UNLOCKING THE REGISTRY
    RESEARCH

  17. Evidence based medicine
    Octogenarians having Mitral Valve Surgery ± CABG ± TV repair over 10-year window
    [Figure: Kaplan-Meier curve, "All octogenarians having MV surgery"; x-axis: Time from procedure (years), 0 to 10; y-axis: Survival probability, 0.0 to 1.0]
    No. at risk (years 0 to 10): 1415, 991, 779, 559, 398, 276, 180, 114, 64, 23, 6
    Mean 4 patients per unit / year
    survfit + Surv {survival}
    kmplot {by Tatsuki Koyama}
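
A minimal sketch of the `survfit` + `Surv` step on simulated follow-up data. The data and hazard rate are invented; base `plot` stands in for the `kmplot` function named on the slide.

```r
library(survival)  # recommended package shipped with standard R installations

set.seed(42)
n <- 500
# Simulated follow-up: exponential survival times censored at 10 years.
true_time <- rexp(n, rate = 0.15)
d <- data.frame(
  time   = pmin(true_time, 10),
  status = as.integer(true_time <= 10)   # 1 = died, 0 = censored
)

# Kaplan-Meier estimate for the whole cohort
fit <- survfit(Surv(time, status) ~ 1, data = d)

# Survival probability and number at risk at each year from the procedure
s <- summary(fit, times = 0:10)
print(data.frame(year = s$time, n_risk = s$n.risk, survival = round(s$surv, 3)))

plot(fit, xlab = "Time from procedure (years)", ylab = "Survival probability")
```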

  18. Contemporary statistical methodology for retrospective data
    [Figure: distributions of the propensity score (probability of receiving a mechanical valve, 0.0 to 0.9) for mechanical valve and biological valve patients; panels: Unmatched, Matched]
    matchit {MatchIt}
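
To show the idea behind propensity score matching, the sketch below estimates the score with `glm` and then does a greedy 1:1 nearest-neighbour match in base R. It is a simplified stand-in so the example is self-contained, not the `matchit {MatchIt}` implementation named on the slide; the data, covariates and effect sizes are simulated.

```r
set.seed(7)
n <- 400
# Simulated patients; covariates and effect sizes are invented.
d <- data.frame(age = rnorm(n, 65, 8), female = rbinom(n, 1, 0.4))
# Younger patients are more likely to receive a mechanical valve.
d$mechanical <- rbinom(n, 1, plogis(6 - 0.1 * d$age))

# Propensity score: estimated probability of receiving a mechanical valve
ps_model <- glm(mechanical ~ age + female, family = binomial, data = d)
d$ps <- fitted(ps_model)

# Greedy 1:1 nearest-neighbour matching on the score, without replacement
treated  <- which(d$mechanical == 1)
controls <- which(d$mechanical == 0)
pairs <- data.frame(treated = integer(0), control = integer(0))
for (i in treated) {
  if (length(controls) == 0) break
  j <- controls[which.min(abs(d$ps[controls] - d$ps[i]))]
  pairs <- rbind(pairs, data.frame(treated = i, control = j))
  controls <- setdiff(controls, j)   # each control is used at most once
}
matched <- d[c(pairs$treated, pairs$control), ]

# Balance check: mean age by group before and after matching
print(tapply(d$age, d$mechanical, mean))
print(tapply(matched$age, matched$mechanical, mean))
```

Comparing covariate means (or score distributions) before and after matching is the check that the unmatched and matched panels of the figure illustrate.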

  19. Risk prediction: status quo
    [Figures: observed and expected mortality proportion (0.02 to 0.10) against time (2002 to 2010); mortality (2% to 10%) against date of surgery, with legend Actual, Overall average, Trend; annotations: Ratio = 0.37, Ratio = 0.73]

  20. Risk prediction: with R
    [Image: first page of McCormick TH, Raftery AE, Madigan D, Burd RS. Dynamic Logistic Regression and Dynamic Model Averaging for Binary Classification. Biometrics 68, 23–30, March 2012. DOI: 10.1111/j.1541-0420.2011.01645.x]
    [Figure: intercept coefficient (−6.00 to −5.25) against time (2002 to 2010), estimate with 95% CI; methods compared: No update; Rolling 24-month window (12-months); Rolling 24-month window (1-month); Piecewise recalibration (12-months); Piecewise recalibration (24-months); Dynamic logistic regression]
    logistic.dma {dma}
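
One of the simpler updating strategies listed, a rolling 24-month window moved forward 12 months at a time, can be sketched in base R. This is an illustration on simulated data, not the dynamic logistic regression of the `dma` package; the variable names, the downward trend and the Wald-type 95% interval are assumptions.

```r
set.seed(3)
n <- 20000
# Simulated operations over 108 months (2002 to 2010). All values are invented.
d <- data.frame(month = sample(1:108, n, replace = TRUE))
d$risk_score <- rnorm(n)   # stand-in for a risk model's linear predictor
# Baseline risk falls over time, so a model that is never updated over-predicts.
d$died <- rbinom(n, 1, plogis(-3 - 0.008 * d$month + 0.8 * d$risk_score))

# Refit on a 24-month window, moving the window forward 12 months at a time.
starts <- seq(1, 108 - 24 + 1, by = 12)
rolling <- do.call(rbind, lapply(starts, function(s) {
  w   <- d[d$month >= s & d$month < s + 24, ]
  fit <- glm(died ~ risk_score, family = binomial, data = w)
  est <- coef(summary(fit))["(Intercept)", ]
  data.frame(window_start = s,
             intercept    = est[["Estimate"]],
             lower        = est[["Estimate"]] - 1.96 * est[["Std. Error"]],
             upper        = est[["Estimate"]] + 1.96 * est[["Std. Error"]])
}))
print(rolling)
```

Plotting `intercept` with its interval against `window_start` gives a coefficient-over-time display comparable in spirit to the figure on this slide.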

  21. CONCLUSIONS

  22. Conclusions
    • We need to unlock healthcare registries to:
      § Monitor quality & avoid a repeat of Bristol
      § Revalidation of professional credentials
      § Facilitate patient choice
      § Develop & validate evidence based medicine
      § Increase in demand

    • We can do it all in R!

  23. Comments & suggestions
    [email protected]

    Acknowledgements
    • Funded by Heart Research UK [Grant Number RG2583]
    • Dr Norman Stein, North West e-Health