Cleaning and analysis of the SCTS database

Presented at the SCTS Annual Meeting 2012, Manchester, UK (18-20 April, 2012)

Graeme Hickey

April 18, 2012

Transcript

  1. Cleaning and analysis of the SCTS database
     Graeme L Hickey (1,2); Stuart W Grant (2); Kate McAllister (1); Norman Stein (1); Iain Buchan (1); Ben Bridgewater (2)
     (1) Northwest Institute of BioHealth Informatics
     (2) University Hospital South Manchester
  2. Structure: 20 March 2012
     •  444,289 records pre-cleaning
     •  422,493 records post-cleaning
     •  181 fields made available
     •  45 hospitals in UK and Ireland
     Requires cleaning
     •  Real world data is messy:
        – missingness
        – measurement error
        – conflicts / miscoding
  3. Cleaning schema
     CCAD EXTRACT → HOUSEKEEPING → DATES → NUMERICAL DATA → STRING CLEANING → MULTI-OPTION FIELDS → MAPPING → DUPLICATES → ONS MERGE → FLAGS → POST-FLAG ONS LOGIC → EUROSCORE → CONSULTANT IDENTIFIERS → AD HOC SHORTCUTS → FINAL EXTRACT
  4. Implementation
     •  R: a language and environment for statistical computing and graphics
     •  Transparent (common S language and open source)
     •  Sharable (free software)
     •  Reproducible (tweak and re-run)
     •  Programmable reports (data organisation, cleaning, analysis, presentation)
     •  Seamless transition from cleaning to analysis
  6. Housekeeping
     •  Remove identifiable fields
     •  Delete free text and low-importance fields
     •  Tidy-up field names (spelling, whitespace, etc.)
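The housekeeping step can be pictured with a small Python sketch (the deck's own pipeline was written in R; this is not the authors' code). The field names and the `drop_fields` set are hypothetical, and spelling correction of field names is not attempted here.

```python
import re

def tidy_field_name(name):
    """Collapse runs of whitespace and trim the ends of a field name."""
    return re.sub(r"\s+", " ", name).strip()

def housekeeping(record, drop_fields):
    """Return a copy of `record` with tidied field names, minus dropped fields.

    `drop_fields` is a hypothetical set of identifiable, free-text or
    low-importance field names (given in their tidied form).
    """
    cleaned = {}
    for name, value in record.items():
        tidy = tidy_field_name(name)
        if tidy not in drop_fields:
            cleaned[tidy] = value
    return cleaned
```

For example, `housekeeping({" Patient  Name ": "X", "Age ": 64}, {"Patient Name"})` keeps only the tidied `"Age"` field.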
  8. Dates
     •  Formatting – time discarded except for procedure
     •  Delete records < 1st Jan 1998
     •  Delete dates (pre-67 and future)
     •  Delete records not satisfying sensible logic: admission ≤ procedure ≤ discharge
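A minimal Python sketch of these date rules follows (the original pipeline was in R; this is an illustration, not the authors' implementation). Several points are assumptions: "pre-67" is read as "before 1967", the 1998 cut-off is applied to the procedure date, "future" is judged against the extract date, and records with a missing procedure date are dropped. The field names are hypothetical.

```python
from datetime import date, datetime

EARLIEST_RECORD = date(1998, 1, 1)  # from the slide: drop records before 1 Jan 1998
EARLIEST_VALID = date(1967, 1, 1)   # assumption: "pre-67" means before 1967
EXTRACT_DATE = date(2012, 3, 20)    # assumption: "future" is relative to the extract

def as_date(value):
    """Discard the time component so dates and datetimes can be compared."""
    if isinstance(value, datetime):
        return value.date()
    return value

def clean_date(value, today=EXTRACT_DATE):
    """Return the value unchanged, or None if it is missing or implausible."""
    d = as_date(value)
    if d is None or d < EARLIEST_VALID or d > today:
        return None
    return value

def keep_record(rec, today=EXTRACT_DATE):
    """Apply the record-level rules: cut-off date and admission <= procedure <= discharge."""
    adm = as_date(clean_date(rec.get("admission"), today))
    proc = as_date(clean_date(rec.get("procedure"), today))
    dis = as_date(clean_date(rec.get("discharge"), today))
    if proc is None or proc < EARLIEST_RECORD:
        return False  # assumption: a usable procedure date is required
    if adm is not None and adm > proc:
        return False
    if dis is not None and dis < proc:
        return False
    return True
```

Missing admission or discharge dates simply skip the corresponding comparison here; a stricter pipeline might delete those records instead.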
  10. Numerical data
     •  Delete free text and symbols
     •  Delete impossible values (e.g. 5 valves operated on)
     •  Delete [clinically] unlikely values (e.g. > 11 grafts)
     •  Resolve ‘obvious’ serial imputation errors (e.g. height recorded in mm and not cm)
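The numerical rules might look like the following Python sketch (illustrative only; the deck's code was in R). The limits of 4 valves and 11 grafts come from the slide's examples. The height thresholds, and the choice to strip stray text around a number rather than delete the whole value, are assumptions.

```python
import re

def parse_number(raw):
    """Extract the first number from a raw entry, ignoring text and symbols."""
    if raw is None:
        return None
    match = re.search(r"-?\d+(?:\.\d+)?", str(raw))
    return float(match.group()) if match else None

def clean_count(raw, max_allowed):
    """Return a count, or None if it is missing, negative or above `max_allowed`."""
    value = parse_number(raw)
    if value is None or value < 0 or value > max_allowed:
        return None
    return value

def clean_height_cm(raw):
    """Return height in cm, converting values that look like millimetres."""
    value = parse_number(raw)
    if value is None:
        return None
    if 1000 <= value <= 2500:    # assumption: this range can only be mm
        value /= 10
    if not 100 <= value <= 250:  # assumption: plausible adult height in cm
        return None
    return value
```

Usage: `clean_count(raw, 4)` for the number of valves operated on, `clean_count(raw, 11)` for the number of grafts.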
  12. String cleaning
     •  Transcriptional errors harmonized (e.g. ‘female’ → ‘2. Female’)
        – manual
        – automated macros
     •  Invalid inputs (e.g. free text) assigned to [clinically] appropriate options
     •  Multi-option fields (ordered + unordered) – structure retained
     •  Small number of conflicts and mappings handled
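As a rough illustration of the automated harmonisation, here is a Python sketch (not the R macros used in the project). Only the ‘female’ → ‘2. Female’ pair comes from the slide; the ‘1. Male’ label and the normalisation rules are assumptions.

```python
import re

# '2. Female' is taken from the slide; '1. Male' is an assumed counterpart.
GENDER_OPTIONS = {"male": "1. Male", "female": "2. Female"}

def harmonise(raw, options):
    """Map a raw entry onto its canonical option, or None if unrecognised.

    Case, surrounding whitespace and any leading option number are ignored.
    Unrecognised entries return None so they can be reviewed manually.
    """
    if raw is None:
        return None
    key = re.sub(r"^\s*\d+\s*[.)]?\s*", "", str(raw)).strip().lower()
    return options.get(key)
```

Returning `None` for anything outside the lookup keeps the manual route open for free text that needs a clinical judgement.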
  14. Mapping
     •  Partially fragmented about March 2010: Version 3 & 4.
     •  Scripts written to map V3.8 into V4.1.2
     •  Simultaneous pre- and post-mapping cleaning
     •  Retrospectively deleted isolated abdominal procedure records
  15. Example: major aortic fields
     [Diagram: mapping between database fields 3.68 to 3.77 and their sub-fields 3.68.1 to 3.77.1, together with fields 3.90, 2.07, 2.10, 3.67, 3.11 (3.11.1 to 3.11.4), 2.35, 3.12 and 3.13]
  17. Duplicate records
     •  A record is classed as a duplicate if it matches on a subset of fields (the match criteria below).
     •  The most recent record created is kept; others deleted
     •  Records inspected after removal to ‘confirm’ duplicates and not re-dos
     Match criteria:
        ✓ hospital
        ✓ gender
        ✓ age (decimal precision)
        ✓ Apollo number (where available)
        ✓ number of previous heart operations
        ✓ procedure indicators (CABG, valve, major aortic, other)
        ✓ admission, procedure (incl. time) and discharge date
        ✓ elective (true/false)
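The duplicate rule can be sketched in Python as below (an illustration, not the project's R code). The field names and the `created` timestamp are hypothetical. A missing Apollo number is treated as just another value to match on, which is a simplification of "where available".

```python
# Hypothetical field names standing in for the match criteria on the slide.
MATCH_FIELDS = (
    "hospital", "gender", "age", "apollo_number", "previous_heart_operations",
    "cabg", "valve", "major_aortic", "other",
    "admission", "procedure", "discharge", "elective",
)

def deduplicate(records):
    """Keep the most recently created record within each group of matches."""
    latest = {}
    for rec in records:
        key = tuple(rec.get(field) for field in MATCH_FIELDS)
        kept = latest.get(key)
        if kept is None or rec["created"] > kept["created"]:
            latest[key] = rec
    return list(latest.values())
```

The removed records would still need the manual inspection the slide describes, since a genuine re-do on the same day could match on every field.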
  19. ONS data linkage
     •  Life status data extracted from the Office for National Statistics (ONS)
     •  ONS data removed if precedes procedure date
     •  Records deleted if patient deceased prior to a first-time cardiac procedure
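One possible reading of the two ONS rules, sketched in Python (the project used R, and the exact precedence of the rules is not stated on the slide). Here a death date before the procedure deletes the record when the procedure is a first-time cardiac operation, and otherwise only the ONS data is blanked. The field names are hypothetical.

```python
from datetime import date

def apply_ons_rules(rec):
    """Return the cleaned record, or None if the record should be deleted."""
    death = rec.get("ons_death_date")
    proc = rec.get("procedure_date")
    if death is not None and proc is not None and death < proc:
        if rec.get("first_time_cardiac"):
            return None                       # deceased before a first-time procedure
        return dict(rec, ons_death_date=None)  # implausible ONS data removed
    return rec
```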
  21. Flags
     •  Resolve conflicts
        – in-hospital mortality (e.g. deceased but sent home)
        – back-fill missing mortality from ONS
     •  Evidence based indicators (incl. resolving conflicts):
        – (individual) valve procedures
        – first operation in a single admission spell
        – first-time cardiac surgery
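The back-fill of missing mortality could be sketched as follows in Python (illustrative; not the authors' R code). The assumption is that an in-hospital death is inferred only when the ONS death date falls between procedure and discharge; the field names are hypothetical, and how conflicts such as "deceased but sent home" were resolved is not specified on the slide.

```python
from datetime import date

def backfill_in_hospital_death(rec):
    """Fill a missing in-hospital mortality flag from the ONS death date."""
    status = rec.get("in_hospital_death")  # True, False or None (missing)
    death = rec.get("ons_death_date")
    proc = rec.get("procedure_date")
    dis = rec.get("discharge_date")
    if status is None and None not in (death, proc, dis):
        status = proc <= death <= dis
    return dict(rec, in_hospital_death=status)
```

A recorded flag is never overwritten here, and a record with no ONS death date stays missing rather than being assumed alive.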
  23. EuroSCORE
     •  3 predictions calculated: logistic, mEuroSCORE & EuroSCORE II
     •  Emphasis on identifying true missing values:
        – data quality measure
        – future analysis of consequences of SCTS imputation
     •  Database not developed with EuroSCORE II in mind
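Logistic risk models of this kind convert a weighted sum of risk factors into a predicted probability of death. The Python sketch below shows that general form while keeping missing risk factors visible instead of silently imputing them. The intercept and coefficients are supplied by the caller; none of the published EuroSCORE values are reproduced here, and the factor names are hypothetical.

```python
import math

def logistic_risk(intercept, coefficients, factors):
    """Predicted mortality from a logistic model, or None if any factor is missing.

    `coefficients` maps risk-factor names to weights; `factors` maps the same
    names to the patient's values. A missing or None value is reported as
    None so that true missingness can be counted as a data quality measure.
    """
    if any(factors.get(name) is None for name in coefficients):
        return None
    linear = intercept + sum(beta * factors[name] for name, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-linear))
```

Counting how often this returns `None` per hospital gives a simple measure of how complete the risk-factor fields are.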
  25. Additional modules
     •  Consultant identifiers coded to GMC numbers
        – GMC database; hospital webpage; Dr. Forster
     •  Records deleted for serious ONS date discrepancies
     •  Expanding list of shortcut fields (e.g. country, financial year)
  26. Future cleaning
     •  Trust-level publication of deleted records
     •  Tweaks based on validation feedback
     •  Revisit assumptions + ‘quick-fixes’ of numerical values
     •  Refinement of the aortic field mappings
     •  Centralized cleaning / mapping by NICOR
  27. Governance
     [Report cover: The Society for Cardiothoracic Surgery in Great Britain & Ireland, Sixth National Adult Cardiac Surgical Database Report 2008, "Demonstrating quality". Prepared by Ben Bridgewater PhD FRCS and Bruce Keogh KBE DSc MD FRCS FRCP on behalf of the Society for Cardiothoracic Surgery in Great Britain & Ireland; Robin Kinsman BSc PhD and Peter Walton MA MB BChir MBA, Dendrite Clinical Systems]
     [Plot: EuroSCORE II: all cardiac surgery. All Cardiac Surgery (26.07.2010 - 31.03.2011); x-axis: number of cardiac procedures; y-axis: risk adjusted mortality rate]
  28. Informing our members
     [Example report for Mr Ben Bridgewater, with panels:
        – EuroSCORE series (mEuroSCORE by date)
        – Cumulative mortality (total number of deaths by date)
        – VLAD, with date dispersion (predicted − observed by date)
        – Crude mortality funnel plot (mortality rate by number of cardiac procedures)
        – Risk adjusted mortality funnel plot (mortality rate by number of cardiac procedures)]
     [Second figure: cumulative mEuroSCORE, cumulative mortality and VLAD by date, 2008-07 to 2011-01, with one line per unit of interest]
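A VLAD (variable life-adjusted display) plots the running total of predicted minus observed deaths, in procedure order, which matches the "Predicted − Observed" axis on the slide. A minimal Python sketch, assuming one predicted risk and one 0/1 outcome per procedure, already sorted by date:

```python
from itertools import accumulate

def vlad(predicted, observed):
    """Cumulative sum of (predicted risk - observed death) per procedure."""
    return list(accumulate(p - o for p, o in zip(predicted, observed)))
```

The curve rises slowly with each survivor and drops sharply with each death, so a sustained downward trend indicates more deaths than the risk model predicted.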
  29. Measuring data quality
     [Plot: rank (1 to 40) by hospital]
     Hospitals: BAL. Barts and The London; BAS. Basildon Hospital; BHL. Liverpool Heart and Chest Hospital; BRI. Bristol Royal Infirmary; CHH. Castle Hill Hospital; CHN. Nottingham City Hospital; ERI. Royal Infirmary of Edinburgh; FRE. Freeman Hospital; GEO. St George's Hospital; GJH. Golden Jubilee Hospital; GRL. Glenfield Hospital; HAM. Hammersmith Hospital; HH. Harefield Hospital; HHW. Wellington Hospital North; HSC. Harley Street Clinic; KCH. King's College Hospital; LBH. London Bridge Hospital; LGI. Leeds General Infirmary; MOR. Morriston Hospital; MRI. Manchester Royal Infirmary; NCR. New Cross Hospital; NGS. Northern General Hospital; NHB. Royal Brompton Hospital; PAP. Papworth Hospital; PLY. Derriford Hospital; QEB. Queen Elizabeth Hospital; RAD. John Radcliffe Hospital; RIA. Aberdeen Royal Infirmary; RSC. Royal Sussex County Hospital; RVB. Royal Victoria Hospital; SCM. James Cook University Hospital; SGH. Southampton General Hospital; STH. St Thomas Hospital; STM. St Marys Hospital Paddington; STO. University Hospital of North Staffordshire; UCL. University College Hospital; UHW. University Hospital of Wales; VIC. Victoria Hospital; WAL. University Hospital Coventry; WYT. Wythenshawe Hospital
     •  Distribution of ranks of EuroSCORE risk factor prevalence might be expected to be homogeneous across hospitals
     •  Further investigation required
  30. Scientific
     •  Mitral valve prosthesis: mechanical vs. biological
     •  Model validation (→ ensure current governance)
     •  Calibration drift detection methodology (→ inform future governance)
  31. Further information
     •  SCTS website
        – www.scts.org/
     •  SCTS-NIBHI project website (incl. contacts)
        – personalpages.manchester.ac.uk/staff/graeme.hickey/scts/
     •  NICOR website
        – www.ucl.ac.uk/nicor
  32. Acknowledgements
     •  Heart Research UK – funding
     •  Sue Manuel (NICOR) – database extracts
     •  All hospital audit leads and database managers – validating audit summaries
     •  UK cardiac surgeons – ensuring the validity and accuracy of the data inputted
     •  The SCTS and all its members – for supporting the audit project