
Data Science Meetup, Feb 2017

Tristan Bergh
February 01, 2017


A talk I presented at the Cape Town Data Science Meetup, February 2017


Transcript

  1. DATA
     ▸ NHS service provider
     ▸ 60 000+ patient coverage
     ▸ Elderly patients (65 years old and older)
     ▸ 4 months’ data, 12 000 inpatient records
     ▸ Training on 8 000 records, testing on 4 000 records
     ▸ ICD-10 diagnosis codes, episode count, bed-day count
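The 8 000 / 4 000 split above can be sketched as follows. This is a minimal Python illustration, not the speaker's code (the talk used R), and it assumes a simple random split; the slides do not say how the records were actually partitioned. The function name `train_test_split` and the integer stand-in records are hypothetical.

```python
import random


def train_test_split(records, n_train, seed=0):
    """Shuffle a copy of the records and cut it into train and test parts.

    Assumption: a random split. The talk does not state the split method.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(records)   # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Integers stand in for the 12 000 inpatient records (toy data only).
records = list(range(12_000))
train, test = train_test_split(records, n_train=8_000)
print(len(train), len(test))
```

Fixing the seed keeps the same records in the test set between runs, which makes AUC figures comparable across experiments.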
  2. METHODOLOGY
     ▸ 4 months’ data, across all hospitals in the coverage area
     ▸ Using a gradient-boosted regression tree in R
     ▸ Removed low-frequency occurrences of diagnosis codes (droplevels function)
     ▸ Excluded inpatients who were treated but not admitted
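The step of removing low-frequency diagnosis codes might look roughly like this. The talk did this in R with `droplevels`; the sketch below is a stdlib-only Python analogue, and the field names (`diagnosis`, `bed_days`), the `min_count` cut-off and the toy records are all assumptions made for illustration.

```python
from collections import Counter


def drop_rare_codes(records, min_count):
    """Keep records whose diagnosis code occurs at least min_count times.

    Returns the kept records and the sorted list of codes still in use,
    which is roughly what R's droplevels leaves behind after subsetting.
    """
    counts = Counter(r["diagnosis"] for r in records)
    kept = [r for r in records if counts[r["diagnosis"]] >= min_count]
    levels = sorted({r["diagnosis"] for r in kept})
    return kept, levels


# Toy records, not data from the talk.
records = (
    [{"diagnosis": "I10", "bed_days": 3}] * 5
    + [{"diagnosis": "J18", "bed_days": 7}] * 4
    + [{"diagnosis": "Z99", "bed_days": 1}]  # occurs only once
)
kept, levels = drop_rare_codes(records, min_count=2)
print(len(kept), levels)
```

Counting the remaining levels after filtering is a quick way to confirm that a categorical predictor fits within whatever limit the modelling library imposes.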
  3. LEARNING POINTS
     ▸ Levels: gbm can only handle up to 1024 factor levels
     ▸ NHS diagnostic codes
     ▸ Mortality
     ▸ Variable rankings
     ▸ Consultants played a part too
  4. RESULTS
     ▸ AUC of roughly 85% to 89% on the outcome: likely to stay in bed > 2 days
     ▸ Model built on 12 000 records: trained on 8 000, tested on 4 000
     ▸ AUC fell as the model was run on bed-day thresholds of 3, 4 and more
     ▸ AUC of about 75% achieved at the 18-day threshold. The smaller proportion of positive outcomes at higher thresholds compromised modeling, as expected.
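To make the threshold results concrete, here is a small Python sketch of how a bed-day threshold turns into a binary outcome and how AUC can be computed from model scores. It is an illustration under assumptions, not the speaker's R code: the helper names, the toy `bed_days` and `scores` values are invented, and in the talk a model would presumably be fitted per threshold rather than reusing one score vector.

```python
def label_long_stay(bed_days, threshold):
    """Binary outcome: 1 if the stay exceeded `threshold` bed days, else 0."""
    return [1 if d > threshold else 0 for d in bed_days]


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values, not results from the talk.
bed_days = [1, 2, 3, 5, 8, 20]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]  # hypothetical model outputs

labels_2 = label_long_stay(bed_days, threshold=2)
labels_18 = label_long_stay(bed_days, threshold=18)

# Share of positive cases shrinks as the threshold rises.
rate_2 = sum(labels_2) / len(labels_2)
rate_18 = sum(labels_18) / len(labels_18)
print(rate_2, rate_18, auc(labels_2, scores))
```

The shrinking positive rate at the higher threshold is the class-imbalance effect the slide refers to: with few long-stay cases, there is less signal to learn from and the AUC estimate itself becomes noisier.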
  5. RESULTS
     ▸ Age, gender and days since last admission were not influential
     ▸ (Knowing what I know now, perhaps scaling age and days might help)
  6. DISCUSSION
     ▸ Mortality: how many of the patients vacated a bed because they died?
     ▸ There’s an odd twist in the data: between recovering enough to go home and dying, the survivors are the ones we are identifying.
     ▸ What operational procedures can we implement to decrease stays in bed?