Machine Learning for Social Good: A case study in the health-care sector

Jorge Saldivar

December 17, 2019

Transcript

  1. Machine Learning for Social Good. Dr. Jorge Saldivar, Barcelona Supercomputing Center (BSC). DataBeers BCN, 17-12-2019. Image source: https://sites.google.com/site/icml2016data4goodworkshop/
  2. Supporting Proactive Diabetes Screenings to Improve Health Outcomes. Team: Benjamin Ackerman, Kaleigh Clary, Jorge Saldivar, William Wang, Katy Dupre, Adolfo De Unánue, Rayid Ghani.
  3. Supporting Proactive Diabetes Screenings to Improve Health Outcomes | Data Science for Social Good Fellowship 2018 | dssgfellowship.org

    Partner: a non-profit national network of more than 40 community health centers serving the least-resourced members of their communities.
  4. Patient profile: 23 years old; obese (BMI 30); family history of diabetes; hypertension.
  5. Federal screening guidelines: 40-70 years old; BMI ≥ 25.
  6. How well do the screening guidelines do? A patient meets the criteria of the guidelines when both hold: 40-70 years old ✓; BMI ≥ 25 ✓. ~50%.
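The guideline criteria reduce to a two-condition rule, and "how well do they do" can be read as the share of T2D cases the rule would flag. A minimal Python sketch, assuming both criteria must hold (age 40-70 inclusive, BMI ≥ 25); `Patient`, `meets_guidelines`, `guideline_case_coverage`, and the toy records are hypothetical and are not study data:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Patient:
    age: float      # age in years at the visit
    bmi: float      # body-mass index at the visit
    has_t2d: bool   # whether the patient is (or becomes) a T2D case


def meets_guidelines(p: Patient) -> bool:
    """Hypothetical encoding of the federal criteria: 40-70 years old and BMI >= 25."""
    return 40 <= p.age <= 70 and p.bmi >= 25


def guideline_case_coverage(patients: list[Patient]) -> float:
    """Fraction of T2D cases that the guideline rule would flag for screening."""
    cases = [p for p in patients if p.has_t2d]
    if not cases:
        return 0.0
    return sum(meets_guidelines(p) for p in cases) / len(cases)


# Toy records (made up): the first mirrors the slide 4 profile and is missed by the rule.
toy = [
    Patient(age=23, bmi=30, has_t2d=True),
    Patient(age=55, bmi=31, has_t2d=True),
    Patient(age=45, bmi=22, has_t2d=False),
    Patient(age=62, bmi=27, has_t2d=False),
]
print(guideline_case_coverage(toy))  # 0.5
```

Because the rule is a fixed cutoff on age and BMI, a young patient with other risk factors (slide 4) is never flagged, which is the gap a learned risk score aims to close.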
  7. Our goal. Guideline criteria: 40-70 years old ✓; BMI ≥ 25 ✓.
  8. Data.
    - De-identified EHR: demographics, visits, medications, lab results, diagnoses (1.1 million patients; 24 health centers; ~8 million visits)
    - External: American Community Survey (ACS) data
  9. Defining Type II diabetes cases:
    - ICD diagnoses
    - Medication (metformin Rx)
    - 2 A1c tests > 6.4
    Result: 82,960 cases (7.2% of the patient population).
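The case definition can be written as a predicate over a patient's record. The slide lists the three signals but not how they are combined, so this sketch assumes (hypothetically) that any one of them marks a case; the ICD-10 prefix `E11` (type 2 diabetes mellitus), the `PatientRecord` layout, and `is_t2d_case` are also assumptions:

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: ICD-10 codes starting with "E11" denote type 2 diabetes mellitus.
T2D_ICD_PREFIXES = ("E11",)
A1C_THRESHOLD = 6.4      # from the slide: A1c tests > 6.4
MIN_HIGH_A1C_TESTS = 2   # from the slide: 2 such tests


@dataclass
class PatientRecord:
    icd_codes: list[str] = field(default_factory=list)
    medications: list[str] = field(default_factory=list)
    a1c_results: list[float] = field(default_factory=list)


def is_t2d_case(rec: PatientRecord) -> bool:
    """Hypothetical rule: ICD diagnosis OR metformin Rx OR >= 2 A1c tests above 6.4."""
    has_icd = any(code.startswith(T2D_ICD_PREFIXES) for code in rec.icd_codes)
    has_metformin = any("metformin" in med.lower() for med in rec.medications)
    high_a1c = sum(1 for v in rec.a1c_results if v > A1C_THRESHOLD)
    return has_icd or has_metformin or high_a1c >= MIN_HIGH_A1C_TESTS
```

OR-ing the signals is the most inclusive reading; a stricter variant could require two of the three, since metformin is also prescribed for conditions other than diabetes.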
  10. Prediction time: each patient visit. There is one row per visit, so a patient can appear more than once:

    Patient Id | Visit Date | Feature 1 | Feature 2 | ... | Feature N
    1          | 2015-01-01 | ...       | ...       | ... | ...
    2          | 2014-12-11 | ...       | ...       | ... | ...
    3          | 2013-05-10 | ...       | ...       | ... | ...
    2          | 2012-06-05 | ...       | ...       | ... | ...
    3          | 2011-07-05 | ...       | ...       | ... | ...
  11. Label: develop diabetes within the next 3 years. Data start date 01/03/2006; data end date 06/15/2018. Example: visit date 01/01/2014; diagnosis date 01/01/2016; window end date 12/31/2017 → Label = True.
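The labeling rule, together with the True/False/NULL examples on slides 21-24, can be sketched as a function. Assumptions (not confirmed by the slides): the window ends on December 31 of the visit year plus 3 years, which matches the 01/01/2014 → 12/31/2017 example and the yearly label spans on slide 26; a diagnosis on or before the visit gives NULL; and a window that runs past the data end date gives NULL unless a diagnosis was already observed inside it. `label_window_end` and `make_label` are hypothetical names:

```python
from __future__ import annotations

from datetime import date
from typing import Optional

HORIZON_YEARS = 3  # "develop diabetes within the next 3 years"


def label_window_end(visit: date, horizon_years: int = HORIZON_YEARS) -> date:
    """Assumed window end: Dec 31 of (visit year + horizon)."""
    return date(visit.year + horizon_years, 12, 31)


def make_label(visit: date, diagnosis: Optional[date], data_end: date) -> Optional[bool]:
    """True/False for the 3-year outcome, or None (NULL) when it is undefined."""
    if diagnosis is not None and diagnosis <= visit:
        return None  # already diagnosed at or before the visit: not at risk
    window_end = label_window_end(visit)
    if diagnosis is not None and diagnosis <= min(window_end, data_end):
        return True  # diagnosis observed inside the window
    if window_end > data_end:
        return None  # window not fully observed: outcome unknown
    return False


DATA_END = date(2018, 6, 15)
print(make_label(date(2014, 1, 1), date(2016, 1, 1), DATA_END))  # True
print(make_label(date(2014, 1, 1), date(2013, 1, 1), DATA_END))  # None
```

Returning `None` rather than `False` keeps censored and already-diabetic visits out of training instead of counting them as negatives.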
  12. Features:
    - Age at visit
    - BMI at visit
    - Sex
    - Race
    - Family history of T2D
    - De-identified health-center (HC) location
    - Smoking status
    - Blood pressure (SBP, DBP, categorical)
    - Hospitalization in prior visit
    - Diagnosis of comorbidities (e.g., sleep apnea)
    - Number of meds prescribed in the past 6 months and 1 year
    - Avg. BMI in the past 6 months and 1 year
    - Gini index
    - Median household income by zip code
    Feature sources: raw from Alliance; external (ACS); computed; aggregates.
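The look-back aggregates ("past 6 months, 1 year") can be computed per visit from earlier records. A minimal sketch, assuming 182- and 365-day windows and counting only records strictly before the visit date to avoid leaking same-day or future information; `WINDOWS`, `avg_bmi_features`, and `med_count_features` are hypothetical:

```python
from __future__ import annotations

from datetime import date, timedelta
from statistics import mean
from typing import Optional

# Assumed window lengths for "past 6 months" and "past 1 year".
WINDOWS = {"6m": timedelta(days=182), "1y": timedelta(days=365)}


def avg_bmi_features(visit: date, bmi_history: list[tuple[date, float]]) -> dict[str, Optional[float]]:
    """Average BMI in each look-back window, using only measurements before the visit."""
    feats: dict[str, Optional[float]] = {}
    for name, length in WINDOWS.items():
        values = [bmi for d, bmi in bmi_history if visit - length <= d < visit]
        feats[f"avg_bmi_{name}"] = mean(values) if values else None  # None: impute later
    return feats


def med_count_features(visit: date, rx_dates: list[date]) -> dict[str, int]:
    """Number of prescriptions written in each look-back window before the visit."""
    return {
        f"n_meds_{name}": sum(1 for d in rx_dates if visit - length <= d < visit)
        for name, length in WINDOWS.items()
    }


history = [(date(2013, 9, 1), 30.0), (date(2013, 3, 1), 28.0), (date(2014, 2, 1), 40.0)]
print(avg_bmi_features(date(2014, 1, 1), history))  # {'avg_bmi_6m': 30.0, 'avg_bmi_1y': 29.0}
```

The strict `< visit` bound is a design choice for these aggregates only; "BMI at visit" stays a separate point-in-time feature.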
  13. Technical solution: raw data and external data → cleaned data → staging (features and labels) → train/test → trained model → predictions → metrics. Predictions, metrics, and the model ID are stored.
  14. Technical solution, step by step (staged data → trained model → predictions → metrics; predictions, metrics, and model ID are stored):
    - Create a table of cross-validation time splits
    - Split the data into train/test sets (for one time split)
    - Impute, generate more features
    - Train the model
    - Generate predictions on the test set
    - Calculate metrics (precision and recall @k)
    - Store predictions and results
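The slides do not say how the "Impute" step fills missing values. As one common choice (an assumption, not necessarily the team's method), this sketch fills gaps with the training-split median and adds a 0/1 missingness flag, fitting the medians on the train set only so no test-period information leaks into training. `fit_medians`, `impute`, and the 0.0 fallback are hypothetical:

```python
from statistics import median
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]  # one feature row; None marks a missing value


def fit_medians(train_rows: List[Row], columns: List[str]) -> Dict[str, float]:
    """Per-column medians computed from the training split only."""
    medians: Dict[str, float] = {}
    for col in columns:
        observed = [r[col] for r in train_rows if r.get(col) is not None]
        # Hypothetical fallback: 0.0 when a column has no observed training values.
        medians[col] = float(median(observed)) if observed else 0.0
    return medians


def impute(rows: List[Row], medians: Dict[str, float]) -> List[Dict[str, float]]:
    """Fill gaps with the training medians and add a 0/1 missingness flag per column."""
    out: List[Dict[str, float]] = []
    for r in rows:
        filled: Dict[str, float] = {}
        for col, med in medians.items():
            value = r.get(col)
            filled[col] = med if value is None else value
            filled[f"{col}_missing"] = 1.0 if value is None else 0.0
        out.append(filled)
    return out
```

Usage: call `fit_medians` on each split's train rows, then `impute` both that split's train and test rows with the same medians.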
  15. How well do the guidelines do? 53% of T2D cases detected.
  16. How well does our model do? 63% of T2D cases detected. Model: Random Forest, 10,000 estimators, maximum depth 10.
  17. How well does our model do? 63% (model) vs. 53% (guidelines); 15. Model: Random Forest, 10,000 estimators, maximum depth 10.
  18. How well does our model do? 63% (model), 53% (guidelines), 74%; 15, 25. Model: Random Forest, 10,000 estimators, maximum depth 10.
  19. Predict risk of Type II diabetes → personalize screening decisions → connect to interventions and services → prevent diabetes and improve health.
  20. Thank you! Benjamin Ackerman, Kaleigh Clary, Jorge Saldivar, William Wang, Katy Dupre, Adolfo De Unánue, Rayid Ghani.
  21. Label: develop diabetes within the next 3 years. Start date 01/03/2006; end date 06/15/2018. Visit date 01/01/2014; no diagnosis → Label = False.
  22. Label: develop diabetes within the next 3 years. Data start date 01/03/2006. Visit date 01/01/2014; diagnosis date 02/01/2018, after the window end date 12/31/2017 → Label = False.
  23. Label: develop diabetes within the next 3 years. Start date 01/03/2006; end date 06/15/2018. Visit date 01/01/2014; diagnosis date 01/01/2013, before the visit → Label = NULL.
  24. Label: develop diabetes within the next 3 years. Start date 01/03/2006; data end date 06/15/2018. Visit date 01/01/2016; diagnosis date? 01/01/2019 (3 years after the visit, beyond the data end date) → Label = NULL.
  25. Total visits: the USPSTF screening recommendation covers 20% of visits and detects 53% of T2D cases; the DSSG model, flagging the same 20% of visits, detects 63% of T2D cases. Visits are ranked by prediction score and the k% most probable are flagged:

    Label | Prediction score
    1     | 0.93
    1     | 0.87
    0     | 0.81
    1     | 0.79
    0     | 0.77
    ...   | ...
    0     | 0.21
    0     | 0.15
    0     | 0.09
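"T2D cases detected" at a fixed screening budget is recall at k: rank visits by score, flag the top k fraction, and count the positives caught. A minimal sketch of precision and recall @k; `precision_recall_at_k` is a hypothetical name, the flagged count is rounded up with `math.ceil` (an assumption), and the toy input uses only the rows visible in the table above, not the full test set:

```python
from __future__ import annotations

import math


def precision_recall_at_k(labels: list[int], scores: list[float], k: float) -> tuple[float, float]:
    """Precision and recall when flagging the top k fraction (0 < k <= 1) of rows by score."""
    if not labels or len(labels) != len(scores):
        raise ValueError("labels and scores must be non-empty and of equal length")
    if not 0 < k <= 1:
        raise ValueError("k must be in (0, 1]")
    n_flagged = math.ceil(k * len(labels))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_flagged])  # positives among flagged rows
    total_pos = sum(labels)
    precision = hits / n_flagged
    recall = hits / total_pos if total_pos else 0.0
    return precision, recall


labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.93, 0.87, 0.81, 0.79, 0.77, 0.21, 0.15, 0.09]
print(precision_recall_at_k(labels, scores, 0.5))  # (0.75, 1.0)
```

Comparing the model and the guidelines at the same k (20% of visits) holds the screening workload fixed, so the difference in recall (63% vs. 53%) reflects ranking quality alone.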
  26. Train/test splits. Row span: timestamps of the first and last row; label span: timestamps of the first and last label.

    Train row span          | Train label span        | Test row span           | Test label span
    2006-01-01 - 2006-12-31 | 2006-01-01 - 2009-12-31 | 2010-01-01 - 2010-12-31 | 2010-01-01 - 2013-12-31
    2007-01-01 - 2007-12-31 | 2007-01-01 - 2010-12-31 | 2011-01-01 - 2011-12-31 | 2011-01-01 - 2014-12-31
    2008-01-01 - 2008-12-31 | 2008-01-01 - 2011-12-31 | 2012-01-01 - 2012-12-31 | 2012-01-01 - 2015-12-31
    2009-01-01 - 2009-12-31 | 2009-01-01 - 2012-12-31 | 2013-01-01 - 2013-12-31 | 2013-01-01 - 2016-12-31
    2010-01-01 - 2010-12-31 | 2010-01-01 - 2013-12-31 | 2014-01-01 - 2014-12-31 | 2014-01-01 - 2017-12-31
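The split table follows a regular pattern: train rows cover one year Y with labels through Y+3, and test rows cover year Y+4 with labels through Y+7. A sketch that regenerates it, assuming splits stop once the test label span would run past the data end date (06/15/2018), which yields exactly the five splits shown; `Split`, `year_span`, and `make_splits` are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

LABEL_YEARS = 3  # label span extends 3 years beyond the row span


@dataclass(frozen=True)
class Split:
    train_rows: tuple[date, date]
    train_labels: tuple[date, date]
    test_rows: tuple[date, date]
    test_labels: tuple[date, date]


def year_span(first_year: int, last_year: int) -> tuple[date, date]:
    """Jan 1 of first_year through Dec 31 of last_year."""
    return date(first_year, 1, 1), date(last_year, 12, 31)


def make_splits(first_train_year: int, data_end: date) -> list[Split]:
    """Yearly splits: train rows in year Y, test rows in year Y + LABEL_YEARS + 1."""
    splits = []
    year = first_train_year
    while True:
        test_year = year + LABEL_YEARS + 1
        test_labels = year_span(test_year, test_year + LABEL_YEARS)
        if test_labels[1] > data_end:
            break  # the test label span would run past the end of the data
        splits.append(Split(
            train_rows=year_span(year, year),
            train_labels=year_span(year, year + LABEL_YEARS),
            test_rows=year_span(test_year, test_year),
            test_labels=test_labels,
        ))
        year += 1
    return splits


for s in make_splits(2006, date(2018, 6, 15)):
    print(s.train_rows[0].year, s.test_rows[0].year)
```

Each train label span ends before its test row span begins, so no training label is drawn from the period the model is evaluated on.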
  28. Time from initial visit to T2D diagnosis: 42% of T2D cases!
  29. Pipeline with configuration: raw data (EHR) and external data (ACS, ICD) → cleaned data → staging (features and labels) → train/test → trained model → predictions → metrics. Predictions, metrics, and the model ID are stored. Configs: a config file (.yaml).
  30. Type 2 diabetes, a serious public health problem:
    - 410 million people worldwide [2]
    - 14% of adults in the US have the disease (45 million) [3]
    - Associated medical conditions [4]: heart disease, kidney failure, blindness, stroke
    References:
    [2] Whiting DR, Guariguata L, Weil C, Shaw J. IDF diabetes atlas: global estimates of the prevalence of diabetes for 2011 and 2030.
    [3] Menke A, Casagrande S, Geiss L, Cowie CC. Prevalence of and trends in diabetes among adults in the United States, 1988–2012.
    [4] Centers for Disease Control and Prevention. National diabetes statistics report: estimates of diabetes and its burden in the United States, 2014. Atlanta, GA: US Department of Health and Human Services, Centers for Disease Control and Prevention, 2014.
  31. Train/test splits. Start date: 01/01/2006; end date: 06/15/2018. Five successive splits, each with a train set and a test set, and each set followed by its own label span.