Slide 1

Slide 1 text

Automating Sepsis Diagnosis
Stephen Thomas
BMED 6517 - Spring 2019

Slide 2

Slide 2 text

40,336 patients with an hourly sepsis label; 8 to 336 hourly measurements per patient
40 features: 8 vital signs, 26 lab results, 6 demographic values
70% missing data; 2,932 sepsis-positive patients

Slide 3

Slide 3 text

Data Preparation

Raw data: 40,336 patients with an hourly sepsis label; 40 features (8 vital signs, 26 lab results, 6 demographic values); 70% missing data; 2,932 sepsis-positive patients; 8 to 336 hourly measurements per patient

Pipeline:
• 30% holdout for testing: 12,237 test patients; 28,099 training patients (2,064 sepsis positive)
• Impute temporal data by linear interpolation/extrapolation
• Balance training data by undersampling
• Augment temporal data with rates of change: 34 temporal variables, velocity Δ(X) and acceleration Δ²(X)
• Normalize and zero-fill

Training set: 199,686 observations × 108 features
Test set: 469,332 observations × 108 features; 19,669 sepsis-positive labels

Slide 4

Slide 4 text

30% Holdout

Full data set: 40,336 patients (40 features: 8 vital signs, 26 lab results, 6 demographic values; hourly sepsis label; 70% missing data; 2,932 sepsis-positive patients; 8 to 336 hourly measurements per patient)
Split: 12,237 patients held out for testing; 28,099 patients (2,064 sepsis positive) retained for training
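A minimal sketch of this step, assuming the hourly records sit in one pandas DataFrame with a patient-identifier column (the column name and function are illustrative, not from the slides). Splitting by patient, rather than by row, keeps all of one patient's hours in the same set:

```python
# Patient-level 30% holdout: no patient's hours appear in both sets.
import numpy as np
import pandas as pd

def holdout_split(df: pd.DataFrame, test_frac: float = 0.3, seed: int = 0):
    rng = np.random.default_rng(seed)
    ids = df["patient_id"].unique()        # illustrative column name
    rng.shuffle(ids)
    n_test = int(round(test_frac * len(ids)))
    test_ids = set(ids[:n_test])
    test = df[df["patient_id"].isin(test_ids)]
    train = df[~df["patient_id"].isin(test_ids)]
    return train, test
```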

Slide 5

Slide 5 text

Impute Temporal Data

Fill missing values in each patient's time series by linear interpolation, extrapolating linearly at the ends
(28,099 training patients, 2,064 sepsis positive; 12,237 test patients)
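A sketch of per-patient imputation, assuming each patient's hourly values for one variable are a 1-D array with NaNs for missing entries (the helper name and edge-case handling are illustrative):

```python
# Linear interpolation between observed hours; linear extrapolation
# before the first and after the last observation.
import numpy as np
from scipy.interpolate import interp1d

def impute_series(values: np.ndarray) -> np.ndarray:
    t = np.arange(len(values))
    observed = ~np.isnan(values)
    if observed.sum() == 0:
        return values                    # nothing observed; zero-fill later
    if observed.sum() == 1:
        return np.full_like(values, values[observed][0])
    f = interp1d(t[observed], values[observed],
                 kind="linear", fill_value="extrapolate")
    return f(t)
```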

Slide 6

Slide 6 text

Balance Training Data

Undersample sepsis-negative patients: 26,035 → 2,064, matching the 2,064 sepsis-positive training patients
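A minimal sketch of the undersampling step: keep every sepsis-positive patient and draw an equal-sized random subset of negatives (array names are illustrative; labels here are per patient, not per hour):

```python
# Balance classes at the patient level: all positives, equal-count negatives.
import numpy as np

def undersample(patient_ids: np.ndarray, patient_labels: np.ndarray, seed: int = 0):
    rng = np.random.default_rng(seed)
    pos = patient_ids[patient_labels == 1]          # 2,064 kept
    neg = patient_ids[patient_labels == 0]          # 26,035 candidates
    keep_neg = rng.choice(neg, size=len(pos), replace=False)
    return np.concatenate([pos, keep_neg])
```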

Slide 7

Slide 7 text

Capture Temporal History

Augment the 34 temporal variables with rates of change: velocity Δ(X) and acceleration Δ²(X)
(34 temporal variables × 3 series each, plus the 6 demographic values, gives the 108 features per observation)
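A sketch of the augmentation for one patient, assuming an (hours × features) array (the convention of zero-padding the first difference is an assumption, not stated on the slides):

```python
# Append first differences (velocity, Δx) and second differences
# (acceleration, Δ²x) of each temporal column.
import numpy as np

def add_rates_of_change(x: np.ndarray) -> np.ndarray:
    """x: (hours, n_temporal). Returns (hours, 3 * n_temporal): [x, Δx, Δ²x]."""
    dx = np.diff(x, axis=0, prepend=x[:1])      # Δ(X); first row is 0
    d2x = np.diff(dx, axis=0, prepend=dx[:1])   # Δ²(X); first row is 0
    return np.concatenate([x, dx, d2x], axis=1)
```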

Slide 8

Slide 8 text

Normalize / Zero-Fill

Training set: 199,686 observations × 108 features
Test set: 469,332 observations × 108 features; 19,669 sepsis-positive labels
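A sketch of this step under one common reading: z-score each feature using training-set statistics, then zero-fill any values still missing, so a fully missing feature sits at the training mean after scaling (the slides state only "normalize and zero-fill"; the z-score choice is an assumption):

```python
# Normalize with training statistics, then replace remaining NaNs with 0.
import numpy as np

def normalize_zero_fill(train: np.ndarray, test: np.ndarray):
    mean = np.nanmean(train, axis=0)
    std = np.nanstd(train, axis=0)
    std[std == 0] = 1.0                          # guard constant columns
    train_z = np.nan_to_num((train - mean) / std)
    test_z = np.nan_to_num((test - mean) / std)  # test never refits stats
    return train_z, test_z
```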

Slide 9

Slide 9 text

Is Data Separable?

Slide 10

Slide 10 text

Is Data Separable?

Slide 11

Slide 11 text

LSTM Network Results

Confusion matrix over the 12,237 test patients:

                        Predicted Negative   Predicted Positive
True Sepsis Negative          10,197                1,172
True Sepsis Positive             571                  297

[ROC curve: true prediction rate vs. false prediction rate]
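Derived figures, assuming the matrix orientation above (which is consistent with the row totals: 297 + 571 = 868 sepsis-positive test patients): patient-level sensitivity ≈ 297/868 ≈ 34%, specificity ≈ 10,197/11,369 ≈ 90%.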

Slide 12

Slide 12 text

Results

• All models were very sensitive to overfitting
• The problem is worthy of a computational challenge

Model                                             Utility  Notes
Support Vector Machine                            0.00     Gaussian kernel; other hyper-parameters had minimal effect
Random Forest                                     0.02     Large leaf size and minimal splits to minimize overfitting
Long Short-Term Memory Recurrent Neural Network   0.35     Limited epochs and a relatively high learning rate to minimize overfitting
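As a sketch of the kind of model behind the 0.35 utility score: the slides name only the architecture family (an LSTM recurrent network over the 108-feature hourly sequences), limited training epochs, and a relatively high learning rate. The framework (Keras here), layer width, and exact learning rate below are illustrative assumptions:

```python
# Illustrative LSTM sequence classifier producing an hourly sepsis probability.
import tensorflow as tf

def build_lstm(n_features: int = 108) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(None, n_features)),       # variable-length stays
        tf.keras.layers.Masking(mask_value=0.0),        # skip zero-padded hours
        tf.keras.layers.LSTM(64, return_sequences=True),
        tf.keras.layers.Dense(1, activation="sigmoid"), # per-hour probability
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-2),
                  loss="binary_crossentropy")
    return model

# Per the overfitting note: few epochs, coarse learning rate.
# model = build_lstm(); model.fit(X_train, y_train, epochs=3, batch_size=32)
```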