
Machine Learning to Predict Mortality and Critical Events : Model Development and Validation

February 26, 2021




Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients With COVID-19 in New York City: Model Development and Validation

Published on 06.11.20
Vaid A, Somani S, Russak AJ, De Freitas JK, Chaudhry FF, Paranjpe I, Johnson KW, Lee SJ, Miotto R, Richter F, Zhao S, Beckmann ND, Naik N, Kia A, Timsina P, Lala A, Paranjpe M, Golden E, Danieletto M, Singh M, Meyer D, O'Reilly PF, Huckins L, Kovatch P, Finkelstein J, Freeman RM, Argulian E, Kasarskis A, Percha B, Aberg JA, Bagiella E, Horowitz CR, Murphy B, Nestler EJ, Schadt EE, Cho JH, Cordon-Cardo C, Fuster V, Charney DS, Reich DL, Bottinger EP, Levin MA, Narula J, Fayad ZA, Just AC, Charney AW, Nadkarni GN, Glicksberg BS




  1. Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients With COVID-19 in New York City: Model Development and Validation
    Developing and validating models that predict death and critical events in patients hospitalized with COVID-19 in New York City
  2. Paper
    Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients With COVID-19 in New York City: Model Development and Validation
    Published on 06.11.20, doi:10.2196/24018
    A wide range of details is properly reported, and the work is done rigorously.
  3. Contents • Introduction • Method • Result1 • Introduction:SHAP •

  4. Introduction
    • However, efforts have been limited by small sample sizes, lack of generalization to diverse populations, disparities in feature missingness, and potential for bias.
    → Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal, doi: https://doi.org/10.1136/bmj.m1328
  5. Introduction • Systematic review of covid-19 prediction models
    • 37 421 titles were screened, and 169 studies describing 232 prediction models were included.
    • This review indicates that almost all published prediction models are poorly reported, and at high risk of bias such that their reported predictive performance is probably optimistic.
  6. Introduction Systematic review of covid-19 prediction models

  7. Methods • Study Design

  8. Methods • Study Design
    MSH: Mount Sinai Hospital. OH: Other Hospitals.
    Retrospective (2020/3/15 to 2020/5/1): train MSH n=1514; val OH n=2201
    Prospective (2020/5/1 to 2020/5/22): val MSH n=175; val OH n=208

    Data                         Design         Site
    train                        Retrospective  MSH
    val 1 (internal validation)  Retrospective  MSH
    val 2                        Retrospective  OH
    val 3                        Prospective    MSH
    val 4                        Prospective    OH
  9. Methods • Study Population

  10. Methods • Study Data
    • The first laboratory value in a 36-hour window period was used as the representative laboratory value on admission.
    • Data below the 0.5th percentile and above the 99.5th percentile were removed.
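The two data-cleaning rules on slide 10 can be sketched in a few lines of standard-library Python. This is only an illustration under assumptions: the function names are hypothetical, the window is assumed to start at the admission time (the slide does not say where it starts), and the percentile uses simple linear interpolation, which may differ from what the authors used.

```python
from datetime import datetime, timedelta


def first_value_in_window(admit_time, measurements, window_hours=36):
    """Return the earliest value measured within the window, or None.

    measurements is a list of (timestamp, value) pairs.
    Assumption: the window runs from admission to admission + window_hours.
    """
    window_end = admit_time + timedelta(hours=window_hours)
    in_window = [(t, v) for t, v in measurements if admit_time <= t <= window_end]
    if not in_window:
        return None
    return min(in_window, key=lambda pair: pair[0])[1]


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def trim_outliers(values, lower_q=0.5, upper_q=99.5):
    """Drop values below the lower_q-th or above the upper_q-th percentile."""
    ordered = sorted(values)
    low = percentile(ordered, lower_q)
    high = percentile(ordered, upper_q)
    return [v for v in values if low <= v <= high]


admit = datetime(2020, 3, 15, 8, 0)
creatinine = [
    (admit + timedelta(hours=40), 9.0),  # outside the 36-hour window
    (admit + timedelta(hours=5), 1.2),
    (admit + timedelta(hours=2), 1.0),   # earliest value inside the window
]
admission_value = first_value_in_window(admit, creatinine)
trimmed = trim_outliers(list(range(1000)))
```

With the toy data, `admission_value` is the measurement taken two hours after admission, and `trimmed` loses the five smallest and five largest of the 1000 values.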
  11. Methods • Definition of Outcomes
    • Death versus survival or discharge through time horizons of 3, 5, 7, and 10 days.
    • Critical illness was defined as discharge to hospice, intubation ≤48 hours prior to intensive care unit (ICU) admission, ICU admission, or death.
  12. Methods • Model Development, Selection, and Experimentation
    • The primary model was Extreme Gradient Boosting (XGBoost).
    • Hyperparameter tuning was performed by randomized grid searching directed toward maximizing the F1 score metric over 5000 discrete grid options.
    • Ten-fold stratified cross-validation was performed.
    • To generate confidence intervals for the internal validation set, training and testing was performed for 500 bootstrap iterations with a unique randomly generated seed for the train-test data splits.
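The idea of a randomized grid search that maximizes F1 can be illustrated with a toy, standard-library sketch. It is not the authors' pipeline: `randomized_grid_search`, `evaluate_threshold`, and the single `threshold` parameter are hypothetical stand-ins, whereas the paper tuned XGBoost hyperparameters over 5000 grid options with cross-validation.

```python
import itertools
import random


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def randomized_grid_search(param_grid, evaluate, n_iter, seed=0):
    """Score n_iter randomly sampled grid points and return the best one."""
    rng = random.Random(seed)
    names = sorted(param_grid)
    combos = list(itertools.product(*(param_grid[name] for name in names)))
    sampled = rng.sample(combos, min(n_iter, len(combos)))
    best_params, best_score = None, float("-inf")
    for combo in sampled:
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy stand-in for "train a model and score it": a fixed set of risk
# scores is thresholded, and the threshold plays the role of a hyperparameter.
LABELS = [0, 0, 1, 1, 1, 0]
SCORES = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]


def evaluate_threshold(params):
    predictions = [1 if s >= params["threshold"] else 0 for s in SCORES]
    return f1_score(LABELS, predictions)


best_params, best_score = randomized_grid_search(
    {"threshold": [0.3, 0.5, 0.75]}, evaluate_threshold, n_iter=3
)
```

In a real setting `evaluate` would fit the model with the sampled parameters and return a cross-validated F1 score; F1 is a natural target here because the outcomes are imbalanced.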
  13. Methods • Model Development, Selection, and Experimentation
    • We generated two predictive models as a baseline, namely logistic regression (LR) and LR with L1 regularization (LASSO).
    • Features with >30% missingness were dropped, and k-nearest neighbors (kNN, k=5) was used to impute missing data.
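The preprocessing for the baseline models (drop features with more than 30% missingness, then kNN imputation) might look roughly like the sketch below. It is a simplified, standard-library illustration: the function names are hypothetical, the distance is a plain root-mean-square difference over shared non-missing features, and no feature scaling is applied, so it is not equivalent to any particular library's imputer or to the authors' exact procedure.

```python
import math


def drop_sparse_features(rows, max_missing=0.30):
    """Keep only features whose fraction of missing (None) values is <= max_missing."""
    features = list(rows[0].keys())
    keep = [
        f for f in features
        if sum(1 for r in rows if r[f] is None) / len(rows) <= max_missing
    ]
    return [{f: r[f] for f in keep} for r in rows]


def knn_impute(rows, k=5):
    """Fill each missing value with the mean of that feature over the k nearest rows."""
    features = list(rows[0].keys())

    def distance(a, b):
        # Compare two rows only on features that both of them have observed.
        shared = [f for f in features if a[f] is not None and b[f] is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in shared) / len(shared))

    imputed = [dict(r) for r in rows]
    for i, row in enumerate(rows):
        for f in features:
            if row[f] is not None:
                continue
            donors = [
                (distance(row, other), other[f])
                for j, other in enumerate(rows)
                if j != i and other[f] is not None
            ]
            donors = sorted((d for d in donors if d[0] < math.inf), key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][f] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical toy data: "c" is 80% missing and is dropped; "b" is imputed.
patients = [
    {"a": 1.0, "b": 10.0, "c": None},
    {"a": 2.0, "b": 20.0, "c": None},
    {"a": 3.0, "b": None, "c": None},
    {"a": 4.0, "b": 40.0, "c": 1.0},
    {"a": 10.0, "b": 100.0, "c": None},
]
filtered = drop_sparse_features(patients)
completed = knn_impute(filtered, k=2)
```

Here the row with `a = 3.0` borrows `b` from its two nearest neighbours (`a = 2.0` and `a = 4.0`), giving the mean of 20 and 40.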
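  14. Result • Features • Some features have missingness as high as 99%.

  15. Result • Outcome Proportion • mortality
    EXPERIMENT       N     POSITIVE OUTCOMES  OUTCOME PROPORTION
    MSH              1514  40~182             0.026~0.121
    OH               2201  135~494            0.061~0.224
    PROSPECTIVE MSH  175   2~8                0.011~0.054
    PROSPECTIVE OH   208   3~15               0.014~0.08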
  16. Result • Outcome Proportion • critical event
    EXPERIMENT       N     POSITIVE OUTCOMES  OUTCOME PROPORTION
    MSH              1514  322~496            0.213~0.329
    OH               2201  414~777            0.188~0.353
    PROSPECTIVE MSH  175   25~28              0.143~0.188
    PROSPECTIVE OH   208   34~41              0.163~0.219
  17. Result

  18. Result • unimputed XGBoost model • Mortality
    • Prospective validation at MSH presented a new set of challenges for all the models because of the generally lower number of outcomes and larger class imbalance for mortality prediction for the shorter time intervals.

    Validation                 AUC-ROC
    MSH > MSH                  0.84~0.90
    MSH > OH                   0.84~0.88
    MSH > PROSPECTIVE MSH      0.85~0.96
    MSH > PROSPECTIVE OH       0.68~0.88
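For readers unfamiliar with the AUC-ROC values reported on these result slides: AUC-ROC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that definition follows; `auc_roc` is a hypothetical helper written for illustration, not the evaluation code used in the paper.

```python
def auc_roc(y_true, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


example_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

With only a handful of positive outcomes, as in the short-horizon prospective cohorts, each positive case shifts this fraction noticeably, which is one reason the reported ranges are wide there.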
  19. Result • unimputed XGBoost model • critical event

    Validation                 AUC-ROC
    MSH > MSH                  0.79~0.81
    MSH > OH                   0.78~0.81
    MSH > PROSPECTIVE MSH      0.72~0.78
    MSH > PROSPECTIVE OH       0.74~0.77
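  20. Break • A natural stopping point. • It is nice to have a working prediction model. • It also holds up in the prospective validation, which is encouraging. • But what are the predictions actually based on?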

  21. Model Feature Importance SHAP SHapley Additive exPlanations

  22. SHAP • A method for computing how much each explanatory variable contributed to the output of a machine learning prediction

  23. SHAP • Please refer to the original sources… • https://logmi.jp/tech/articles/322738 • https://christophm.github.io/interpretable-ml-book/ • 高校生からのゲーム理論 (Game Theory for High School Students and Up)

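  24. The mean prediction and the effect of variables
    • The difference between "the mean of all predictions output by the model" and "the prediction for a given data point" can be decomposed into the effects of the individual explanatory variables.
    Income prediction model (mean = 4.5 million yen):

    Person  Predicted income  Difference from the mean (= effect of the explanatory variables)
    A       3 million yen     -1.5 million yen
    B       10 million yen    +5.5 million yen
    ...     ...               ...
    Z       5 million yen     +0.5 million yen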
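  25. What makes up the difference between the mean prediction and an individual prediction? Mean prediction = 4.5 million yen; B's income = 10 million yen. [Figure: axis marked 0, 4.5 million yen, and 10 million yen; labels: age, qualifications, job title, job change]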

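  26. SHAP & game theory
    • Cooperative game theory
    • In a game with several players, how should the score be distributed among them?
    • Applied to machine learning, it tells us how much each explanatory variable influenced the prediction.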
  27. Cooperative game theory: when A, B, and C work together, they earn a profit of 240,000 yen. How should it be divided?

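  28. Marginal contribution
    • They can work alone, in pairs, or all three together
    • The profit is not a simple sum of the individual profits

    Participants  Reward
    A             60,000 yen
    B             40,000 yen
    C             20,000 yen
    A+B           200,000 yen
    A+C           150,000 yen
    B+C           100,000 yen
    A+B+C         240,000 yen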
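  29. Marginal contribution
    • Compute how much the reward increases when a participant joins
    • The increase depends on who is already participating = the reward depends on the order of joining

    Joining pattern    Total reward  Increase from A joining (marginal contribution)
    Nobody → A         60,000 yen    60,000 yen
    B only → A+B       200,000 yen   160,000 yen
    C only → A+C       150,000 yen   130,000 yen
    B and C → A+B+C    240,000 yen   140,000 yen

    Rewards: A 60,000 yen; B 40,000 yen; C 20,000 yen; A+B 200,000 yen; A+C 150,000 yen; B+C 100,000 yen; A+B+C 240,000 yen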
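  30. SHAP Value
    • We want to cancel out the effect of the joining order
    → Compute the marginal contribution for every possible order and average them = SHAP Value

    Joining order  Marginal contribution of A  Marginal contribution of B  Marginal contribution of C
    A→B→C          60,000 yen                  140,000 yen                 40,000 yen
    A→C→B          60,000 yen                  90,000 yen                  90,000 yen
    B→A→C          160,000 yen                 40,000 yen                  40,000 yen
    B→C→A          140,000 yen                 40,000 yen                  60,000 yen
    C→A→B          130,000 yen                 90,000 yen                  20,000 yen
    C→B→A          140,000 yen                 80,000 yen                  20,000 yen
    SHAP Value     115,000 yen                 80,000 yen                  45,000 yen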
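The averaging on slide 30 can be checked with a short script. This is a minimal sketch: the reward table is copied from slide 28, expressed in units of 10,000 yen (万), and `shapley_values` is a hypothetical helper name.

```python
from itertools import permutations

# Reward of each coalition, in units of 10,000 yen (slide 28).
REWARD = {
    frozenset(): 0,
    frozenset("A"): 6,
    frozenset("B"): 4,
    frozenset("C"): 2,
    frozenset("AB"): 20,
    frozenset("AC"): 15,
    frozenset("BC"): 10,
    frozenset("ABC"): 24,
}


def shapley_values(players, reward):
    """Average each player's marginal contribution over all joining orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for player in order:
            joined = coalition | {player}
            # Marginal contribution: reward with the player minus reward without.
            totals[player] += reward[joined] - reward[coalition]
            coalition = joined
    return {p: totals[p] / len(orders) for p in players}


values = shapley_values(["A", "B", "C"], REWARD)
```

The three values add up to 24, the reward of the full coalition, so the whole profit is distributed with nothing left over.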
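  31. Applying game theory • We want the SHAP Value of each variable for a predicted value • How do we obtain each variable's marginal contribution to the prediction? • The marginal contribution is defined as the difference between the prediction with feature j included and the prediction without it •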

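  32. Application to machine learning
    • Add the variables one at a time and compute the marginal contribution of each
    • There are three features, $(X_1, X_2, X_3)$, and a given data point has the values $(x_1, x_2, x_3)$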
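  33. Computing the marginal contribution of each variable
    • Nothing is known $(X_1, X_2, X_3)$ → the expected value of the prediction
    • Substitute $x_1$ → increase in the prediction = marginal contribution of $X_1$ = $\Delta x_1$
    • Substitute $x_2$ → increase in the prediction = marginal contribution of $X_2$ = $\Delta x_2$
    • Substitute $x_3$ → increase in the prediction = marginal contribution of $X_3$ = $\Delta x_3$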
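  34. Computing the SHAP Value of each variable
    $$E[f(X_1, X_2, X_3)] \xrightarrow{\Delta x_1} E[f(x_1, X_2, X_3)] \xrightarrow{\Delta x_2} E[f(x_1, x_2, X_3)] \xrightarrow{\Delta x_3} E[f(x_1, x_2, x_3)]$$
    • The marginal contributions change with the order of substitution, so compute them for every order and average.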
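The same order-averaging can be written for a prediction model. The sketch below is a brute-force illustration under stated assumptions: the expectation over the "unknown" features is estimated by averaging over a small background data set (which treats features as independent), and `expected_prediction`, `shap_values`, and the toy linear model are hypothetical. Practical SHAP implementations for tree models use much faster algorithms than enumerating every order.

```python
from itertools import permutations


def expected_prediction(model, instance, background, known):
    """Estimate E[f] with the features in `known` fixed to the instance's values.

    The remaining features take their values from each background row in turn.
    """
    total = 0.0
    for row in background:
        mixed = [instance[j] if j in known else row[j] for j in range(len(instance))]
        total += model(mixed)
    return total / len(background)


def shap_values(model, instance, background):
    """Average marginal contribution of each feature over all substitution orders."""
    n = len(instance)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        known = set()
        previous = expected_prediction(model, instance, background, known)
        for j in order:
            known.add(j)
            current = expected_prediction(model, instance, background, known)
            phi[j] += current - previous  # marginal contribution of feature j
            previous = current
    return [value / len(orders) for value in phi]


def toy_model(x):
    # Hypothetical model used only to exercise the code.
    return 3 * x[0] + 2 * x[1] - x[2]


background_rows = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
patient = (4.0, 1.0, 0.0)
phi = shap_values(toy_model, patient, background_rows)
mean_prediction = expected_prediction(toy_model, patient, background_rows, set())
```

The contributions in `phi` sum to the difference between the prediction for `patient` and the mean prediction, which is the decomposition described on slide 24.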
  35. Result • Model Feature Importance
    • For mortality, both high and low values for age, anion gap, C-reactive protein, and LDH
    • For critical event prediction, the presence of acute kidney injury and both high and low levels of lactate dehydrogenase (LDH), respiratory rate, and glucose were strong drivers
    • It is encouraging that many of the features with high importance in the primary XGBoost model were also prioritized in the LASSO classifier, suggesting the robustness of the predictive ability of these features.
  36. Result • mortality

  37. Result • critical event

  38. Discussion
    • Along these lines, we found that our imputation strategy generally hindered the performance of the XGBoost model.
    • This corroboration of the features learned by XGBoost and highlighted by the SHAP analysis with the findings from pathophysiological principles and more recent correlative studies exploring patients with COVID-19 gives additional credibility to these findings.
  39. Limitations
    • Although the restriction of using data at admission encourages the use of this model in patient triage, events during a patient's hospital stay after admission may drive their clinical course away from the prior probability, which cannot be captured by baseline admission features.
    • Patients admitted to the hospital later in the crisis benefited from improved patient care protocols from experiential learning … which is demonstrated by the lower critical event and mortality rate in the prospective validation data set
  40. Limitations
    • All five hospitals operate in a single health system; system-wide protocols in laboratory order sets and management protocols were an additional source of bias that may lower external validity.
    • A notable drawback is its bias toward continuous features instead of categorical ones
  41. Others
    • Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement https://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-014-0241-z
    • Bias in random forest variable importance measures: Illustrations, sources and a solution https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-25