Automatically Extracting Population Level Cause-of-death Information from Free-text Death Certificates

Presentation to the NSW Epidemiology Special Interest Group on our projects on automatically extracting population-level cause-of-death information from free-text death certificates.

Bevan Koopman

March 06, 2015

Transcript

  1. Automatically Extracting Population Level Cause-of-death Information from Free-text Death Certificates

    Bevan Koopman, Postdoctoral Research Fellow
    Australian e-Health Research Centre
    @bevan_koopman
  2. Extracting Cause-of-death Information from Free-text Death Certificates | Dr. Bevan Koopman

    Australian e-Health Research Centre
    • National e-Health research group in Australia
    • Joint venture between CSIRO and Qld Health
    • Currently 60-70 staff, students, visiting researchers
  3. Research Areas

    Health Data Semantics
    • Clinical language processing
    • Clinical search
    • Clinical terminology
    Health Services
    • Mobile/Tele Health
    • Forecasting
    Biomedical Informatics
    • Medical Imaging
    • Biostatistics
  4. Overview

    • Death certificates: a valuable source of cause-of-death information
    • The challenge: extracting accurate statistics from death certificates
    • The approach: natural language processing and machine learning
    • The evaluation: 10 years of NSW death certificates
      • Disease surveillance: Diabetes, Flu, HIV & Pneumonia
      • Cancer statistics
  5. Death Certificates

    http://en.wikipedia.org/wiki/Al_Capone
  11. Death Certificates

    http://en.wikipedia.org/wiki/Al_Capone
    Death certificates are a valuable source of mortality statistics.
    • Surveillance and warnings of increases in disease activity
    • Support the development and monitoring of prevention or response strategies.
  13. The Challenge

    • Extracting accurate, quantitative data from death certificates authored in unstructured free-text
    • Ambiguity of natural language
    • Variety in expressing the same meaning
      • stomach cancer vs. gastric carcinoma
      • AIDS, HIV, Human immunodeficiency virus
    • Errors; e.g., misspellings
    • Volume of death certificates

    Two Choices:
    1. Get people to structure their data so computers can understand it; or
    2. Get computers to better understand people's natural language.
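The variety and misspelling problems listed on this slide can be illustrated with a small sketch. It is not the system described in the talk: the lookup table, its labels and the `normalise` helper are hypothetical, and the fuzzy matching simply uses Python's standard `difflib` module as a stand-in for whatever spelling tolerance a real system would need.

```python
from difflib import get_close_matches

# Hypothetical lookup table: surface form -> canonical label.
# The labels are illustrative only and do not come from any terminology.
SYNONYMS = {
    "stomach cancer": "GASTRIC_CARCINOMA",
    "gastric carcinoma": "GASTRIC_CARCINOMA",
    "aids": "HIV",
    "hiv": "HIV",
    "human immunodeficiency virus": "HIV",
}


def normalise(phrase, cutoff=0.85):
    """Map a free-text phrase to a canonical label, or None if unknown."""
    key = " ".join(phrase.lower().split())  # case-fold and collapse whitespace
    if key in SYNONYMS:
        return SYNONYMS[key]
    # Tolerate small misspellings by taking the closest known surface form.
    close = get_close_matches(key, list(SYNONYMS), n=1, cutoff=cutoff)
    return SYNONYMS[close[0]] if close else None


print(normalise("Stomach cancer"))
print(normalise("GASTRIC CARCINMOA"))  # misspelt on purpose
print(normalise("Human Immunodeficiency Virus"))
print(normalise("atrial fibrillation"))  # not in the table
```

A hand-built table like this does not scale to the volume and ambiguity the slide describes, which is the motivation for the second of the two choices.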
  14. The solution: Clinical Natural Language Processing and Machine Learning
  15. Machine Learning for Disease Classification

    1. Extract natural language features from death certificates
      • terms, phrases and medical concepts.
    2. Train a supervised model (Support Vector Machine) to recognise different diseases based on the natural language features.
  21. Feature Extraction

    Example death certificate d1:
    A) HYPOXIC BRAIN INJURY
    B) GASTRIC CARCINOMA WITH GASTRECTOMY
    C) ATRIAL FIBRILLATION

    SNOMED CT Medical Concepts:
    126944002: HYPOXIC BRAIN INJURY
    255080008: GASTRIC CARCINOMA

    [Figure: d1 shown as a vector of 1s and 0s over term features (HYPOXIC, BRAIN, INJURY, ..., GASTRIC, CARCINOMA, ...) and SNOMED CT concept features]
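The feature extraction step on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the `CONCEPTS` dictionary is a hypothetical stand-in for a real SNOMED CT annotator (it contains only the two concept IDs shown on the slide), and the `term:` and `concept:` feature names and the small vocabulary are invented for the example.

```python
# Hypothetical concept lookup standing in for a real SNOMED CT annotator.
# The two concept IDs are the ones shown on the slide.
CONCEPTS = {
    "hypoxic brain injury": "126944002",
    "gastric carcinoma": "255080008",
}


def extract_features(text):
    """Return the set of term and concept features found in a certificate."""
    lowered = text.lower()
    features = {"term:" + token for token in lowered.split()}
    features |= {
        "concept:" + concept_id
        for phrase, concept_id in CONCEPTS.items()
        if phrase in lowered
    }
    return features


def vectorise(features, vocabulary):
    """Binary vector: 1 if the vocabulary feature is present, else 0."""
    return [1 if feature in features else 0 for feature in vocabulary]


d1 = "HYPOXIC BRAIN INJURY GASTRIC CARCINOMA WITH GASTRECTOMY ATRIAL FIBRILLATION"
# Illustrative vocabulary; a real one would be built from the training corpus.
vocabulary = [
    "term:hypoxic", "term:carcinoma", "term:pneumonia",
    "concept:126944002", "concept:255080008",
]
print(vectorise(extract_features(d1), vocabulary))  # [1, 1, 0, 1, 1]
```

Each position in the vector answers one yes/no question about the certificate, which is the form of input a classifier such as a Support Vector Machine expects.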
  24. Model Training & Classification

    [Figure: Support Vector Machine separating "Gastric Carcinoma" from "non-Gastric Carcinoma" certificates; "?" marks an un-classified death certificate]
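To make the picture concrete, here is a minimal sketch of a linear Support Vector Machine for one yes/no decision (gastric carcinoma or not). The slides name the model but give no training details, so everything here is an assumption: the subgradient-descent training on the regularised hinge loss, the hyperparameters, the five-feature layout and the toy training set are all illustrative and not the talk's actual setup.

```python
def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Fit weights w and bias b by subgradient descent on the regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # L2 shrinkage step
            if margin < 1:  # inside the margin or misclassified
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 (gastric carcinoma) or -1 (non-gastric carcinoma)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy binary features: [gastric, carcinoma, brain, fibrillation, pneumonia]
X = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
]
y = [1, -1, 1, -1, -1]

w, b = train_linear_svm(X, y)
unclassified = [1, 1, 0, 0, 1]  # the "?" certificate: not in the training set
print(predict(w, b, unclassified))
```

In practice one such binary classifier would be trained per disease of interest, using a tested SVM library rather than a hand-written loop.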
  29. System Workflow

    Real-time feed → Death certificate → Feature Extraction → Support Vector Machines Classification → Cause(s) of death: ICD codes
    Example death certificate: A) HYPOXIC BRAIN INJURY B) GASTRIC CARCINOMA WITH ... C) ATRIAL FIBRILLATION
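The workflow can be sketched as a small streaming pipeline. This is only an illustration of the data flow under stated assumptions: the `code_certificates` function, the whitespace tokeniser and the keyword rules in `CLASSIFIERS` are hypothetical placeholders for the real feature extraction and trained Support Vector Machines, and the two ICD-10 categories (C16, malignant neoplasm of stomach; I48, atrial fibrillation and flutter) were chosen for the example rather than taken from the talk.

```python
def extract_features(text):
    """Placeholder feature extraction: the set of lower-cased tokens."""
    return set(text.lower().split())


# Placeholder classifiers, one yes/no decision per ICD-10 category.
# Keyword rules stand in for trained models so the sketch stays runnable.
CLASSIFIERS = {
    "C16": lambda features: {"gastric", "carcinoma"} <= features,
    "I48": lambda features: {"atrial", "fibrillation"} <= features,
}


def code_certificates(feed):
    """Consume (certificate_id, text) pairs and yield (certificate_id, ICD codes)."""
    for cert_id, text in feed:
        features = extract_features(text)
        codes = sorted(code for code, clf in CLASSIFIERS.items() if clf(features))
        yield cert_id, codes


# A list stands in for the real-time feed of incoming certificates.
feed = [
    ("d1", "HYPOXIC BRAIN INJURY GASTRIC CARCINOMA WITH GASTRECTOMY ATRIAL FIBRILLATION"),
    ("d2", "PNEUMONIA"),
]
for cert_id, codes in code_certificates(feed):
    print(cert_id, codes)
```

Because `code_certificates` is a generator, certificates are coded one at a time as they arrive, which matches the real-time feed on the slide.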
  30. 1. Disease Surveillance

    • Project with NSW Ministry of Health
    • Aim: Extract cause-of-death stats for Diseases of Interest: Influenza, HIV, Pneumonia and Diabetes.
    • Data: ~7 years of NSW death certificates; ~340,142 certificates
    • Tasks:
      1. Identify if a certificate contains a Disease of Interest
      2. Identify the specific ICD-10 code pertaining to the Disease of Interest
        • e.g., Viral pneumonia vs. Bacterial pneumonia
        • Non-insulin-dependent vs. Insulin-dependent diabetes
    • Empirical evaluation on 'unseen' set of labelled certificates.
      • Precision, Recall (Sensitivity) and F-measure
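The three evaluation measures named on this slide can be computed from a set of labelled certificates as in the sketch below. The function name and the toy labels are illustrative, and the F-measure is assumed to be the balanced F1 (harmonic mean of precision and recall), since the slide does not say which weighting was used.

```python
def precision_recall_f(y_true, y_pred):
    """Precision (PPV), recall (sensitivity) and balanced F-measure for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)      # disease present, predicted present
    fp = sum(1 for t, p in pairs if not t and p)  # disease absent, predicted present
    fn = sum(1 for t, p in pairs if t and not p)  # disease present, predicted absent
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return precision, recall, f_measure


# Toy example: 1 = certificate mentions the disease of interest, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f_measure = precision_recall_f(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

With two true positives, one false positive and one false negative, all three measures come out at 2/3 for this toy input.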
  31. 1. Disease Surveillance

    [Figure: chart of Precision (PPV), Recall (Sensitivity) and F-measure, on a 0% to 100% axis, for Pneumonia, Diabetes, Influenza and HIV; sample sizes shown: n=43,947, n=29,791, n=192, n=777]
  32. 2. Cancer Classification for Cancer Registries

    • Project with Cancer Institute NSW
    • Aim: Extract cause-of-death stats for different types of cancer
    • Data: 10 years of NSW death certificates; ~447,336 certificates
    • Tasks:
      1. Classify cancer as underlying cause of death
      2. Classify ~80 different cancer classes, from very common (Lung) to very rare (Placenta)
    • Empirical evaluation on 'unseen' set of labelled certificates.
  33. 2. Cancer Classification for Cancer Registries

    [Figure: results chart with labels "Most common" and "Less common"]
  34. Queensland Cancer Control Analysis

    • Developed a system in collaboration with Queensland Cancer Control Analysis Team (QCCAT)
    • Real-time classification of pathology reports:
      • Identify notifiable cancers
      • Identify specific characteristics of the cancer
      • Produce structured report of cancer cases
  36. Radiology Search

    • In collaboration with Princess Alexandra Hospital, Brisbane
    • Customised Radiology Search Engine for ~2 million radiology reports
    • Summary statistics to aid research related activities
  38. Conclusions

    • Death certificates may provide a valuable insight into population cause-of-death information.
    • Need specific methods to overcome challenges of natural language.
    • Clinical Natural Language Processing and Machine Learning.
    • General approaches applied to different diseases and clinical reports.
    • Interested to hear more about YOUR problems managing clinical natural language (and how we might help).
  39. Thank you

    Australian e-Health Research Centre
    Bevan Koopman
    Postdoctoral Research Fellow
    @bevan_koopman
    e [email protected]
    w http://aehrc.com