Cleaning and analysis of the SCTS database

Presented at the SCTS Annual Meeting 2012, Manchester, UK (18-20 April, 2012)

Graeme Hickey

April 18, 2012
Transcript

  1. Cleaning and analysis of the SCTS database
    Graeme L Hickey1,2; Stuart W Grant2; Kate McAllister1; Norman Stein1; Iain Buchan1; Ben Bridgewater2

    1 Northwest Institute of BioHealth Informatics
    2 University Hospital South Manchester

  2. Structure: 20 March 2012
    •  444,289 records pre-cleaning
    •  422,493 records post-cleaning
    •  181 fields made available
    •  45 hospitals in UK and Ireland
    •  Real-world data is messy and requires cleaning:
    – missingness
    – measurement error
    – conflicts / miscoding

  3. Cleaning schema
    Stages shown in the flow diagram:
    •  CCAD EXTRACT
    •  HOUSEKEEPING
    •  DATES
    •  NUMERICAL DATA
    •  STRING CLEANING
    •  MULTI-OPTION FIELDS
    •  MAPPING
    •  DUPLICATES
    •  ONS MERGE
    •  FLAGS
    •  POST-FLAG ONS LOGIC
    •  EUROSCORE
    •  CONSULTANT IDENTIFIERS
    •  AD HOC SHORTCUTS
    •  FINAL EXTRACT

  4. Implementation
    •  R: a language and environment for statistical computing and graphics
    •  Transparent (common S language and open source)
    •  Sharable (free software)
    •  Reproducible (tweak and re-run)
    •  Programmable reports (data organisation, cleaning, analysis, presentation)
    •  Seamless transition from cleaning to analysis

  5. Database in action

  6. Cleaning schema (diagram repeated from slide 3)

  7. Housekeeping
    •  Remove identifiable fields
    •  Delete free text and low-importance fields
    •  Tidy up field names (spelling, whitespace, etc.)
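
The housekeeping steps can be illustrated with a minimal Python sketch. The slides do not show the project's own scripts, so everything here is an assumption made for illustration: the field names, the sets of identifiable and free-text fields, and the naming rule. Spelling corrections would need a manual lookup and are not covered.

```python
import re

# Hypothetical field names, for illustration only.
IDENTIFIABLE = {"patient_name", "nhs_number"}
FREE_TEXT = {"comments"}

def tidy_name(name):
    """Normalise a raw column header: trim, lower-case, collapse separators to '_'."""
    name = name.strip().lower()
    return re.sub(r"[^a-z0-9]+", "_", name).strip("_")

def housekeeping(record):
    """Return a copy of the record with tidy names and unwanted fields dropped."""
    cleaned = {}
    for raw_name, value in record.items():
        name = tidy_name(raw_name)
        if name in IDENTIFIABLE or name in FREE_TEXT:
            continue  # identifiable or free-text field: do not carry forward
        cleaned[name] = value
    return cleaned

raw = {" Patient Name ": "J Smith", "Hospital  Code": "WYT",
       "Comments": "n/a", "Age (years)": 67.4}
print(housekeeping(raw))
```

Tidying the names before filtering means the drop lists only need one canonical spelling per field.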

  8. Cleaning schema (diagram repeated from slide 3)

  9. Dates
    •  Formatting – time discarded except for procedure
    •  Delete records < 1st Jan 1998
    •  Delete dates (pre-67 and future)
    •  Delete records not satisfying sensible logic: admission ≤ procedure ≤ discharge
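
A rough Python sketch of the date rules follows. Several details are assumptions rather than facts from the slides: the field names, reading "pre-67" as "before 1967", applying the 1998 cut-off to the procedure date, using the extract date as "now", and treating a missing date as unable to violate the ordering. Procedure times are ignored here for brevity.

```python
from datetime import date

CUTOFF = date(1998, 1, 1)    # from the slide: records before 1 Jan 1998 are deleted
EARLIEST = date(1967, 1, 1)  # assumption: "pre-67" read as before 1967
TODAY = date(2012, 3, 20)    # assumption: extract date used as "now"

def blank_implausible(d):
    """Set a date to None if it is missing, too early, or in the future."""
    if d is None or d < EARLIEST or d > TODAY:
        return None
    return d

def keep_record(rec):
    """Apply the date rules; return False if the record should be deleted."""
    adm, proc, dis = (blank_implausible(rec.get(k))
                      for k in ("admission", "procedure", "discharge"))
    if proc is not None and proc < CUTOFF:
        return False
    # Only compare dates that are present, in admission/procedure/discharge order.
    present = [d for d in (adm, proc, dis) if d is not None]
    return all(a <= b for a, b in zip(present, present[1:]))
```

Blanking single implausible dates before the ordering check keeps a record whose only fault is, say, a mistyped discharge year.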

  10. Cleaning schema (diagram repeated from slide 3)

  11. Numerical data
    •  Delete free text and symbols
    •  Delete impossible values (e.g. 5 valves operated on)
    •  Delete [clinically] unlikely values (e.g. > 11 grafts)
    •  Resolve ‘obvious’ serial imputation errors (e.g. height recorded in mm and not cm)
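
The numerical rules might look something like the Python sketch below. The valve limit (a heart has four valves) and the 11-graft threshold come from the slide; the height bounds and all function names are illustrative assumptions, not the project's actual values.

```python
import re

def to_number(raw):
    """Strip free text and symbols; return a float, or None if nothing numeric remains."""
    if raw is None:
        return None
    match = re.search(r"\d+(?:\.\d+)?", str(raw))
    return float(match.group()) if match else None

def clean_valves(raw):
    """More than four valves operated on is impossible, so such values are blanked."""
    n = to_number(raw)
    return n if n is not None and n <= 4 else None

def clean_grafts(raw, max_plausible=11):
    """More than 11 grafts is treated as clinically unlikely (threshold from the slide)."""
    n = to_number(raw)
    return n if n is not None and n <= max_plausible else None

def clean_height_cm(raw, low=100.0, high=250.0):
    """Rescale heights apparently entered in mm; the bounds are illustrative assumptions."""
    h = to_number(raw)
    if h is None:
        return None
    if low * 10 <= h <= high * 10:  # e.g. 1720 mm becomes 172 cm
        h /= 10
    return h if low <= h <= high else None
```

For example, `clean_height_cm("1720")` is rescaled to centimetres, while `clean_valves("5")` and `clean_grafts("12")` are blanked rather than guessed at.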

  12. Cleaning schema (diagram repeated from slide 3)

  13. String cleaning
    •  Transcriptional errors harmonized (e.g. ‘female’ → ‘2. Female’)
    – manual
    – automated macros
    •  Invalid inputs (e.g. free text) assigned to [clinically] appropriate options
    •  Multi-option fields (ordered + unordered) – structure retained
    •  Small number of conflicts and mappings handled
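
As a hedged illustration of harmonizing transcriptional variants, the Python sketch below maps raw strings onto canonical options. Only the ‘female’ → ‘2. Female’ example comes from the slide; the ‘1. Male’ option, the manual lookup entries and the function name are assumptions.

```python
# Canonical options for a hypothetical gender field. '2. Female' is the form shown
# on the slide; '1. Male' is assumed by analogy.
CANONICAL = {"male": "1. Male", "female": "2. Female"}

# Manually curated fixes for inputs the automatic rule cannot resolve (illustrative).
MANUAL = {"f": "2. Female", "m": "1. Male"}

def harmonise(value):
    """Map a raw string to its canonical option, or None if it cannot be resolved."""
    if value is None:
        return None
    key = value.strip().lower()
    # Drop a leading option number such as "2." or "2 -" before matching.
    key = key.lstrip("0123456789.- ").strip()
    return CANONICAL.get(key) or MANUAL.get(key)
```

Unresolved inputs come back as `None`, so they can be reviewed by hand instead of being silently assigned.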

  14. Cleaning schema (diagram repeated from slide 3)

  15. Mapping
    •  Partially fragmented about March 2010: Version 3 & 4.
    •  Scripts written to map V3.8 into V4.1.2
    •  Simultaneous pre- and post-mapping cleaning
    •  Retrospectively deleted isolated abdominal procedure records
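
One simple way to express a version mapping is a lookup table applied to each record, sketched below in Python. The field names are placeholders: the slides do not give the real V3.8 to V4.1.2 correspondences, and the actual scripts will have handled value recoding as well as renaming.

```python
# Hypothetical V3.8 -> V4.1.2 field correspondences; the real mapping is not
# given in the slides.
FIELD_MAP = {"v3_field_a": "v4_field_a", "v3_field_b": "v4_field_b"}

def map_record(record, version):
    """Return the record keyed by V4 field names, leaving V4 records untouched."""
    if version.startswith("4"):
        return dict(record)
    return {FIELD_MAP.get(name, name): value for name, value in record.items()}
```

Fields without an entry in the table pass through unchanged, which makes unmapped fields easy to spot afterwards.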

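  16. Example: major aortic fields
    [Diagram: mapping between field numbers for the major aortic fields; the layout is not recoverable from the transcript. Field numbers shown: 3.68, 3.70, 3.76, 3.72, 3.74, 3.68.1, 3.70.1, 3.76.1, 3.72.1, 3.74.1, 3.69.1, 3.71.1, 3.77.1, 3.73.1, 3.75.1, 3.69, 3.71, 3.77, 3.73, 3.75, 3.90, 2.07, 2.10, 3.67, 3.11.1, 3.11.2, 3.11.4, 3.11.3, 3.11, 2.35, 3.12, 3.13]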

  17. Cleaning schema (diagram repeated from slide 3)

  18. Duplicate records
    •  A record is classed as a duplicate if it matches on a subset of fields.
    •  The most recent record created is kept; others deleted
    •  Records inspected after removal to ‘confirm’ duplicates and not re-dos
    Match criteria
    ✓  hospital
    ✓  gender
    ✓  age (decimal precision)
    ✓  Apollo number (where available)
    ✓  number of previous heart operations
    ✓  procedure indicators (CABG, valve, major aortic, other)
    ✓  admission, procedure (incl. time) and discharge date
    ✓  elective (true/false)
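
The duplicate rule can be sketched in Python as grouping on the match criteria and keeping the newest record per group. The field names, including the `created` timestamp, are hypothetical. A missing Apollo number is simply compared as `None`, which is a simplification of "where available".

```python
# Match criteria from the slide, expressed as hypothetical field names.
MATCH_FIELDS = ("hospital", "gender", "age", "apollo_number", "previous_operations",
                "cabg", "valve", "major_aortic", "other", "admission_date",
                "procedure_datetime", "discharge_date", "elective")

def deduplicate(records, created_field="created"):
    """Keep the most recently created record in each group sharing all match fields.

    Returns (kept, removed) so the removed records can be inspected afterwards.
    """
    latest = {}
    for rec in records:
        key = tuple(rec.get(f) for f in MATCH_FIELDS)
        if key not in latest or rec[created_field] > latest[key][created_field]:
            latest[key] = rec
    kept = list(latest.values())
    removed = [r for r in records if not any(r is k for k in kept)]
    return kept, removed
```

Returning the removed records separately supports the manual check that they are true duplicates and not re-do operations.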

  19. Cleaning schema (diagram repeated from slide 3)

  20. ONS data linkage
    •  Life status data extracted from the Office for National Statistics (ONS)
    •  ONS data removed if precedes procedure date
    •  Records deleted if patient deceased prior to a first-time cardiac procedure
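
One possible reading of the two ONS rules is sketched below in Python. The way the rules combine is an interpretation, not something the slides state: a death date before the procedure deletes the record when the procedure is flagged as first-time, and otherwise only the ONS date is dropped. Field names are hypothetical.

```python
from datetime import date

def apply_ons(record):
    """Return the record with its ONS death date reconciled, or None to delete it."""
    death = record.get("ons_death_date")
    proc = record["procedure_date"]
    if death is None or death >= proc:
        return record                 # nothing to reconcile
    if record.get("first_time_cardiac"):
        return None                   # death before a first-time procedure: delete record
    cleaned = dict(record)
    cleaned["ons_death_date"] = None  # otherwise drop the implausible ONS date
    return cleaned
```

Copying the record before blanking the date leaves the input untouched, so the original values remain available for audit.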

  21. Cleaning schema (diagram repeated from slide 3)

  22. Flags
    •  Resolve conflicts
    – in-hospital mortality (e.g. deceased but sent home)
    – back-fill missing mortality from ONS
    •  Evidence-based indicators (incl. resolving conflicts):
    – (individual) valve procedures
    – first operation in a single admission spell
    – first-time cardiac surgery
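
Two of the flags lend themselves to a short Python sketch: back-filling in-hospital mortality from ONS, and marking the first operation in an admission spell. The field names and the definition of a spell as (patient, admission date) are assumptions, and the slides do not say how conflicts such as "deceased but sent home" were actually resolved.

```python
from datetime import date, datetime

def backfill_mortality(rec):
    """Fill a missing in-hospital death flag from the ONS death date, where possible."""
    if rec.get("died_in_hospital") is not None:
        return rec["died_in_hospital"]
    death = rec.get("ons_death_date")
    proc = rec.get("procedure_date")
    dis = rec.get("discharge_date")
    if None in (death, proc, dis):
        return None  # not enough information: leave as missing
    return proc <= death <= dis

def flag_first_in_spell(records):
    """Return one boolean per record: True for the earliest procedure in its spell."""
    earliest = {}
    for rec in records:
        spell = (rec["patient_id"], rec["admission_date"])
        if spell not in earliest or rec["procedure_datetime"] < earliest[spell]:
            earliest[spell] = rec["procedure_datetime"]
    return [rec["procedure_datetime"] == earliest[(rec["patient_id"], rec["admission_date"])]
            for rec in records]
```

Leaving the flag as `None` when dates are missing keeps true missingness visible instead of defaulting to "survived".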

  23. Cleaning schema (diagram repeated from slide 3)

  24. EuroSCORE
    •  3 predictions calculated: logistic, mEuroSCORE & EuroSCORE II
    •  Emphasis on identifying true missing values:
    – data quality measure
    – future analysis of consequences of SCTS imputation
    •  Database not developed with EuroSCORE II in mind
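
To illustrate keeping track of true missing values while still producing a prediction, here is a generic logistic-model sketch in Python. It is not the EuroSCORE implementation: the intercept and coefficients are supplied by the caller, and treating a missing factor as "absent" is an assumed convention used only for this example.

```python
import math

def logistic_risk(intercept, coefficients, patient):
    """Return (predicted risk, list of missing risk factors) for one patient.

    Missing factors are reported rather than silently imputed; in this sketch
    they contribute zero to the linear predictor, i.e. 'risk factor absent'.
    """
    missing = [name for name in coefficients if patient.get(name) is None]
    linear = intercept + sum(beta * (patient.get(name) or 0)
                             for name, beta in coefficients.items())
    return 1 / (1 + math.exp(-linear)), missing
```

Returning the list of missing factors alongside the prediction makes it possible to summarise missingness per hospital as a data quality measure, and later to study how much the imputation convention changes the predicted risk.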

  25. Cleaning schema (diagram repeated from slide 3)

  26. Additional modules
    •  Consultant identifiers coded to GMC numbers
    – GMC database; hospital webpage; Dr. Forster
    •  Records deleted for serious ONS date discrepancies
    •  Expanding list of shortcut fields (e.g. country, financial year)
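
A shortcut field such as financial year can be derived from a date, as in the Python sketch below. It assumes the April-to-March financial year and a "2010/11" label format; the slides do not say which date or label the project used.

```python
from datetime import date

def financial_year(d):
    """Label a date with its April-to-March financial year, e.g. '2010/11'."""
    start = d.year if d.month >= 4 else d.year - 1
    return f"{start}/{(start + 1) % 100:02d}"
```

Precomputing fields like this once during cleaning saves each later analysis from re-deriving them.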

  27. Future cleaning
    •  Trust-level publication of deleted records
    •  Tweaks based on validation feedback
    •  Revisit assumptions + ‘quick-fixes’ of numerical values
    •  Refinement of the aortic field mappings
    •  Centralized cleaning / mapping by NICOR

  28. Analyzing the data
    •  Governance
    •  Scientific

  29. Governance
    [Report cover: The Society for Cardiothoracic Surgery in Great Britain & Ireland, Sixth National Adult Cardiac Surgical Database Report 2008: Demonstrating quality. Prepared by Ben Bridgewater PhD FRCS and Bruce Keogh KBE DSc MD FRCS FRCP on behalf of the Society for Cardiothoracic Surgery in Great Britain & Ireland; Robin Kinsman BSc PhD and Peter Walton MA MB BChir MBA, Dendrite Clinical Systems.]
    [Figure: "EuroSCORE II: all cardiac surgery" (All Cardiac Surgery, 26.07.2010 - 31.03.2011); x-axis: number of cardiac procedures; y-axis: risk adjusted mortality rate.]

  30. Informing our members
    [Figure: example report for Mr Ben Bridgewater. Panels: EuroSCORE series (mEuroSCORE against date); Cumulative mortality (total number of deaths against date); VLAD (with date dispersion) (predicted − observed against date); Crude mortality funnel plot (mortality rate against number of cardiac procedures); Risk adjusted mortality funnel plot (mortality rate against number of cardiac procedures). A further figure shows cumulative mEuroSCORE, cumulative mortality and VLAD against date (2008-07 to 2011-01), with a "Unit of Interest" legend.]
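
The VLAD panel plots predicted minus observed, accumulated over time. A minimal Python sketch of that calculation, assuming procedures are already in date order and that each has a predicted risk and a death indicator (names are illustrative):

```python
from itertools import accumulate

def vlad(predicted_risks, died):
    """Cumulative sum of (predicted risk - observed outcome), in procedure order."""
    return list(accumulate(p - int(d) for p, d in zip(predicted_risks, died)))

# Three procedures: the curve rises with each survivor and drops after a death.
print(vlad([0.5, 0.25, 0.25], [False, True, False]))
```

Each survivor lifts the curve by the predicted risk and each death lowers it by one minus the predicted risk, so a curve drifting upwards indicates fewer deaths than predicted.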

  31. Responding to contemporary questions

  32. Measuring data quality
    [Figure: ranks plotted by hospital; axis labels "Rank" and "Hospitals". Hospital key:]
    Hospital
    BAL. Barts and The London
    BAS. Basildon Hospital
    BHL. Liverpool Heart and Chest Hospital
    BRI. Bristol Royal Infirmary
    CHH. Castle Hill Hospital
    CHN. Nottingham City Hospital
    ERI. Royal Infirmary of Edinburgh
    FRE. Freeman Hospital
    GEO. St George's Hospital
    GJH. Golden Jubilee Hospital
    GRL. Glenfield Hospital
    HAM. Hammersmith Hospital
    HH. Harefield Hospital
    HHW. Wellington Hospital North
    HSC. Harley Street Clinic
    KCH. King's College Hospital
    LBH. London Bridge Hospital
    LGI. Leeds General Infirmary
    MOR. Morriston Hospital
    MRI. Manchester Royal Infirmary
    NCR. New Cross Hospital
    NGS. Northern General Hospital
    NHB. Royal Brompton Hospital
    PAP. Papworth Hospital
    PLY. Derriford Hospital
    QEB. Queen Elizabeth Hospital
    RAD. John Radcliffe Hospital
    RIA. Aberdeen Royal Infirmary
    RSC. Royal Sussex County Hospital
    RVB. Royal Victoria Hospital
    SCM. James Cook University Hospital
    SGH. Southampton General Hospital
    STH. St Thomas Hospital
    STM. St Marys Hospital Paddington
    STO. University Hospital of North Staffordshire
    UCL. University College Hospital
    UHW. University Hospital of Wales
    VIC. Victoria Hospital
    WAL. University Hospital Coventry
    WYT. Wythenshawe Hospital
    Distribution of ranks of EuroSCORE risk factor prevalence might be expected to be homogeneous across hospitals.
    Further investigation required.
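
The idea of ranking hospitals by risk factor prevalence can be sketched in Python as below. The input structure, the tie-breaking rule and the function name are assumptions; the slides do not describe how the ranks were computed.

```python
def prevalence_ranks(prevalence):
    """Map {risk_factor: {hospital: proportion}} to {hospital: [rank per factor]}.

    Rank 1 is the lowest prevalence; ties are broken by hospital code for simplicity.
    Factors are processed in alphabetical order.
    """
    ranks = {}
    for factor in sorted(prevalence):
        by_hospital = prevalence[factor]
        ordered = sorted(by_hospital, key=lambda h: (by_hospital[h], h))
        for rank, hospital in enumerate(ordered, start=1):
            ranks.setdefault(hospital, []).append(rank)
    return ranks
```

A hospital whose ranks sit consistently at one extreme across many risk factors would be a candidate for the further investigation the slide calls for.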

  33. Scientific
    •  Mitral valve prosthesis: mechanical vs. biological
    •  Model validation (→ ensure current governance)
    •  Calibration drift detection methodology (→ inform future governance)

  34. Further information
    •  SCTS website
    – www.scts.org/
    •  SCTS-NIBHI project website (incl. contacts)
    – personalpages.manchester.ac.uk/staff/graeme.hickey/scts/
    •  NICOR website
    – www.ucl.ac.uk/nicor

  35. Acknowledgements
    •  Heart Research UK – funding
    •  Sue Manuel (NICOR) – database extracts
    •  All hospital audit leads and database managers – validating audit summaries
    •  UK cardiac surgeons – ensuring the validity and accuracy of the data inputted
    •  The SCTS and all its members – for supporting the audit project