Machine Learning to Predict Mortality and Critical Events: Model Development and Validation

harunashi
February 26, 2021

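A walkthrough of a paper that builds XGBoost models to predict mortality and critical events in COVID-19 patients. What impressed me is that it goes all the way to multi-center prospective validation.

Every section is carefully written, and I learned a great deal from reading it.

I also introduce SHAP, a method for obtaining feature importance. (This part is a re-summary of several articles.)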
Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients With COVID-19 in New York City: Model Development and Validation

Published on 06.11.20
doi:10.2196/24018
Vaid A, Somani S, Russak AJ, De Freitas JK, Chaudhry FF, Paranjpe I, Johnson KW, Lee SJ, Miotto R, Richter F, Zhao S, Beckmann ND, Naik N, Kia A, Timsina P, Lala A, Paranjpe M, Golden E, Danieletto M, Singh M, Meyer D, O'Reilly PF, Huckins L, Kovatch P, Finkelstein J, Freeman RM, Argulian E, Kasarskis A, Percha B, Aberg JA, Bagiella E, Horowitz CR, Murphy B, Nestler EJ, Schadt EE, Cho JH, Cordon-Cardo C, Fuster V, Charney DS, Reich DL, Bottinger EP, Levin MA, Narula J, Fayad ZA, Just AC, Charney AW, Nadkarni GN, Glicksberg BS

Transcript

  1. Machine Learning to Predict Mortality and
    Critical Events in a Cohort of Patients With
    COVID-19 in New York City: Model
    Development and Validation
    (Development and validation of models predicting mortality and
    critical events in patients hospitalized with COVID-19 in New York City)

  2. Paper
    Machine Learning to Predict Mortality and Critical Events in a
    Cohort of Patients With COVID-19 in New York City: Model
    Development and Validation
    Published on 06.11.20
    doi:10.2196/24018
    Many details are properly reported, and the work is done rigorously

  3. Contents
    • Introduction
    • Methods
    • Result 1
    • Introduction: SHAP
    • Result 2

  4. Introduction
    • however, efforts have been limited by small sample sizes,
    lack of generalization to diverse populations, disparities in
    feature missingness, and potential for bias.
    →Prediction models for diagnosis and prognosis of covid-19:
    systematic review and critical appraisal
    doi: https://doi.org/10.1136/bmj.m1328

  5. Introduction
    Systematic review of covid-19 prediction models
    • 37,421 titles were screened, and 169 studies describing 232 prediction
    models were included.
    • This review indicates that almost all published prediction models
    are poorly reported, and at high risk of bias such that their reported
    predictive performance is probably optimistic.

  6. Introduction
    Systematic review of covid-19 prediction models

  7. Methods
    • Study Design

  8. Methods
    • Study Design
    Design (period)                        Data                                Site  n
    Retrospective (2020/3/15 to 2020/5/1)  train, val 1 (internal validation)  MSH   1514
    Retrospective (2020/3/15 to 2020/5/1)  val 2                               OH    2201
    Prospective (2020/5/1 to 2020/5/22)    val 3                               MSH   175
    Prospective (2020/5/1 to 2020/5/22)    val 4                               OH    208
    MSH: Mount Sinai Hospital
    OH: Other Hospitals

  9. Methods
    • Study Population

  10. Methods
    • Study Data
    • the first laboratory value in a 36-hour window period was used as
    the representative laboratory value on admission.
    • data below the 0.5th percentile and above the 99.5th percentile
    were removed
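
The percentile rule above can be sketched as follows. This is only an illustration, not the authors' code: the function name `drop_percentile_outliers`, the use of NumPy, and the choice to mark removed values as NaN (rather than dropping whole rows) are assumptions.

```python
import numpy as np

def drop_percentile_outliers(values, lower=0.5, upper=99.5):
    """Set values outside the [lower, upper] percentile range to NaN."""
    x = np.asarray(values, dtype=float)
    # nanpercentile ignores entries that are already missing
    lo, hi = np.nanpercentile(x, [lower, upper])
    cleaned = x.copy()
    cleaned[(x < lo) | (x > hi)] = np.nan
    return cleaned

# Synthetic lab values with two implausible measurements
rng = np.random.default_rng(0)
lab = rng.normal(100.0, 10.0, size=1000)
lab[0], lab[1] = 1e6, -1e6
cleaned = drop_percentile_outliers(lab)
```

Marking outliers as missing keeps the row available for models that can handle missing inputs; whether the paper did this or removed the records is not stated on the slide.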

  11. Methods
    • Definition of Outcomes
    • death versus survival or discharge through time horizons of 3, 5, 7,
    and 10 days.
    • Critical illness was defined as discharge to hospice, intubation ≤48
    hours prior to intensive care unit (ICU) admission, ICU admission, or
    death.

  12. Methods
    • Model Development, Selection, and Experimentation
    • primary model was the Extreme Gradient Boosting (XGBoost)
    • Hyperparameter tuning was performed by randomized grid
    searching directed toward maximizing the F1 score metric over
    5000 discrete grid options
    • Ten-fold stratified cross-validation was performed
    • To generate confidence intervals for the internal validation set,
    training and testing was performed for 500 bootstrap iterations
    with a unique randomly generated seed for the train-test data splits.
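
The confidence-interval procedure in the last bullet might look roughly like the sketch below: refit on a freshly seeded train/test split each iteration, score the held-out part, and take percentiles of the scores. Everything here is an assumption made for illustration: the paper used XGBoost, whereas `fit_mean_difference` is a toy one-feature stand-in, and the 80/20 split, the percentile interval, and all function names are hypothetical.

```python
import random

def auc_roc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def resampled_auc_interval(X, y, fit, n_iter=500, test_frac=0.2, alpha=0.05):
    """Percentile interval of test AUC over n_iter differently seeded splits."""
    aucs = []
    idx = list(range(len(y)))
    for seed in range(n_iter):
        rng = random.Random(seed)          # a distinct seed per iteration
        shuffled = idx[:]
        rng.shuffle(shuffled)
        n_test = int(len(idx) * test_frac)
        test, train = shuffled[:n_test], shuffled[n_test:]
        y_test = [y[i] for i in test]
        if len(set(y_test)) < 2:           # AUC is undefined without both classes
            continue
        predict = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc_roc(y_test, [predict(X[i]) for i in test]))
    aucs.sort()
    lo = aucs[int((alpha / 2) * (len(aucs) - 1))]
    hi = aucs[int((1 - alpha / 2) * (len(aucs) - 1))]
    return lo, hi

def fit_mean_difference(X_train, y_train):
    """Toy stand-in model: orient one feature so higher scores mean higher risk."""
    pos = [x for x, y in zip(X_train, y_train) if y == 1]
    neg = [x for x, y in zip(X_train, y_train) if y == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda x: sign * x

# Synthetic data: about 20% positives with a shifted feature distribution
data_rng = random.Random(42)
y = [1 if data_rng.random() < 0.2 else 0 for _ in range(300)]
X = [data_rng.gauss(1.5 if label else 0.0, 1.0) for label in y]
low, high = resampled_auc_interval(X, y, fit_mean_difference, n_iter=200)
```

The example runs 200 iterations only to stay fast; the slide reports 500.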

  13. Methods
    • Model Development, Selection, and Experimentation
    • we generated two predictive models as a baseline, namely logistic
    regression (LR) and LR with L1 regularization(LASSO)
    • Features with >30% missingness were dropped, and k-nearest
    neighbors (kNN, k=5) was used to impute missing data
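
A rough sketch of the two preprocessing steps in the last bullet: drop features with more than 30% missing values, then fill the remaining gaps from the k nearest rows. This is hand-written NumPy for illustration, not the implementation the authors used; the function names and the distance definition (mean squared difference over features observed in both rows) are assumptions.

```python
import numpy as np

def drop_sparse_features(X, max_missing=0.30):
    """Keep only columns whose fraction of NaN entries is at most max_missing."""
    X = np.asarray(X, dtype=float)
    keep = np.isnan(X).mean(axis=0) <= max_missing
    return X[:, keep], keep

def knn_impute(X, k=5):
    """Fill each NaN with the mean of that feature over the k nearest rows observing it."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    observed = ~np.isnan(X)
    for i in np.where(~observed.all(axis=1))[0]:
        # Compare row i with every row using only features observed in both
        shared = observed & observed[i]
        diff = np.where(shared, X - X[i], 0.0)
        n_shared = shared.sum(axis=1)
        dist = np.full(len(X), np.inf)
        ok = n_shared > 0
        dist[ok] = (diff[ok] ** 2).sum(axis=1) / n_shared[ok]
        dist[i] = np.inf                   # a row is not its own neighbor
        for j in np.where(~observed[i])[0]:
            donors = np.where(observed[:, j] & np.isfinite(dist))[0]
            if len(donors) == 0:
                continue                   # no donor available; leave as NaN
            nearest = donors[np.argsort(dist[donors])[:k]]
            filled[i, j] = X[nearest, j].mean()
    return filled

X = np.array([
    [1.0, 10.0, np.nan],
    [1.1, np.nan, np.nan],
    [0.9, 9.0, np.nan],
    [5.0, 50.0, 7.0],
    [5.2, 52.0, np.nan],
    [4.8, 48.0, np.nan],
])
X_kept, keep = drop_sparse_features(X)     # third column is mostly missing
X_imputed = knn_impute(X_kept, k=2)        # k=2 only because the toy data is tiny
```

In the toy data, the missing value in the second row is filled from its two closest rows (the first and third), which share a similar first feature.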

  14. Result
    • Features
    • Some features have as much as 99% missingness.

  15. Result
    Outcome Proportion: mortality
    EXPERIMENT       N     POSITIVE OUTCOMES  OUTCOME PROPORTION
    MSH              1514  40~182             0.026~0.121
    OH               2201  135~494            0.061~0.224
    PROSPECTIVE MSH  175   2~8                0.011~0.054
    PROSPECTIVE OH   208   3~15               0.014~0.08

  16. Result
    Outcome Proportion: critical event
    EXPERIMENT       N     POSITIVE OUTCOMES  OUTCOME PROPORTION
    MSH              1514  322~496            0.213~0.329
    OH               2201  414~777            0.188~0.353
    PROSPECTIVE MSH  175   25~28              0.143~0.188
    PROSPECTIVE OH   208   34~41              0.163~0.219

  17. Result
    • unimputed XGBoost model
    • Mortality
    • Prospective validation at MSH presented a new set of challenges
    for all the models because of the generally lower number of
    outcomes and larger class imbalance for mortality prediction for
    the shorter time intervals.
    Validation (train > validation site)  AUC-ROC
    MSH > MSH                             0.84~0.90
    MSH > OH                              0.84~0.88
    MSH > PROSPECTIVE MSH                 0.85~0.96
    MSH > PROSPECTIVE OH                  0.68~0.88

  18. Result
    • unimputed XGBoost model
    • critical event
    Validation (train > validation site)  AUC-ROC
    MSH > MSH                             0.79~0.81
    MSH > OH                              0.78~0.81
    MSH > PROSPECTIVE MSH                 0.72~0.78
    MSH > PROSPECTIVE OH                  0.74~0.77

  19. Break
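    • A good place to pause.
    • It is nice to have a working prediction model.
    • It also holds up in prospective validation, which is encouraging.
    • But what are the predictions actually based on?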

  20. Model Feature Importance
    SHAP
    SHapley Additive exPlanations

  21. SHAP
    • A method for computing how much each explanatory variable
    contributed to the output of a machine learning model's prediction

  22. SHAP
    • Please do read the original sources…
    • https://logmi.jp/tech/articles/322738
    • https://christophm.github.io/interpretable-ml-book/
    • 高校生からのゲーム理論 (a book on game theory)

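  23. Mean prediction and the effect of variables
    • The difference between "the mean of all predictions output by the
    model" and "the prediction for a given data point" can be decomposed
    into the effects of the individual explanatory variables.
    Annual income prediction model (amounts in units of 10,000 yen)
    Person                 A     B     ...  Z
    Prediction             300   1000  ...  500
    Prediction minus mean  -150  +550  ...  +50
    Mean of predictions = 450
    Prediction minus mean = effect of the explanatory variables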

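  24. What is the gap between the mean prediction and an individual prediction?
    Mean prediction = 450 (units of 10,000 yen)
    B's annual income = 1000
    [Figure: axis from 0 to 1000 with 450 marked; segments labeled age,
    qualifications, job title, and job change]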

  25. SHAP & game theory
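    • Cooperative game theory
    • In a game with multiple players, how should the score be
    divided among them?
    • Applied to machine learning, it tells us
    "how much did each explanatory variable influence the prediction?"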

  26. Cooperative game theory
    When A, B, and C work together, they earn a profit of 240,000 yen
    How should it be divided?

  27. Marginal contribution
    • They can work alone, in pairs, or all three together
    • Profit is not simply additive
    Participants  Payoff (10,000 yen)
    A             6
    B             4
    C             2
    A+B           20
    A+C           15
    B+C           10
    A+B+C         24

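  28. Marginal contribution
    • Compute how much the payoff increases when a participant joins
    • The increase depends on who is already participating
    = the payoff depends on the order of joining
    (amounts in units of 10,000 yen)
    Joining pattern  Total payoff  Increase from A joining = marginal contribution
    nobody → A       6             6
    B only → A+B     20            16
    C only → A+C     15            13
    B and C → A+B+C  24            14
    Participants  Payoff
    A             6
    B             4
    C             2
    A+B           20
    A+C           15
    B+C           10
    A+B+C         24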

  29. SHAP Value
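    • We want to cancel out the effect of the joining order
    → compute the marginal contributions for every ordering and average
    them = SHAP Value (the Shapley value from game theory)
    (amounts in units of 10,000 yen)
    Joining order  A's marginal contribution  B's marginal contribution  C's marginal contribution
    A→B→C          6                          14                         4
    A→C→B          6                          9                          9
    B→A→C          16                         4                          4
    B→C→A          14                         4                          6
    C→A→B          13                         9                          2
    C→B→A          14                         8                          2
    SHAP Value     11.5                       8                          4.5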
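
The averaging in the table above can be reproduced with a short script. It is a minimal sketch: the coalition payoffs come from the slides (in units of 10,000 yen), while the payoff of 0 for the empty coalition and the name `shapley_values` are assumptions added for the example.

```python
from itertools import permutations

# Payoff of every coalition, in units of 10,000 yen
payoff = {
    frozenset(): 0,  # assumed: nobody working earns nothing
    frozenset("A"): 6, frozenset("B"): 4, frozenset("C"): 2,
    frozenset("AB"): 20, frozenset("AC"): 15, frozenset("BC"): 10,
    frozenset("ABC"): 24,
}

def shapley_values(players, value):
    """Average each player's marginal contribution over all joining orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            # Marginal contribution of p given who has already joined
            totals[p] += value[joined] - value[coalition]
            coalition = joined
    return {p: t / len(orders) for p, t in totals.items()}

values = shapley_values("ABC", payoff)
print(values)  # {'A': 11.5, 'B': 8.0, 'C': 4.5}
```

The three values sum to 24, the payoff of the full coalition, so the whole profit is distributed.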

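  30. Applying game theory
    • We want the SHAP Value of each variable for a given prediction
    • How do we obtain each variable's marginal contribution to the prediction?
    • Take the difference between the prediction with feature j included
    and the prediction without it as the marginal contribution
    • "The prediction when feature j is unknown" is computed as an expectation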

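  31. Application to machine learning
    • Add the variables one at a time and compute each variable's marginal contribution
    • There are three features, $(X_1, X_2, X_3)$
    A given data point has the values $(x_1, x_2, x_3)$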

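  32. Computing the marginal contribution of each variable
    • Nothing is known, $(X_1, X_2, X_3)$ → the expected value of the prediction
    • Substitute $x_1$ → increase in the prediction = marginal contribution of $X_1$ = $\Delta x_1$
    • Substitute $x_2$ → increase in the prediction = marginal contribution of $X_2$ = $\Delta x_2$
    • Substitute $x_3$ → increase in the prediction = marginal contribution of $X_3$ = $\Delta x_3$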

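  33. Computing the SHAP Value of each variable
    $$\Delta x_1 = E[f(x_1, X_2, X_3)] - E[f(X_1, X_2, X_3)]$$
    $$\Delta x_2 = E[f(x_1, x_2, X_3)] - E[f(x_1, X_2, X_3)]$$
    $$\Delta x_3 = E[f(x_1, x_2, x_3)] - E[f(x_1, x_2, X_3)]$$
    • The marginal contributions change with the order of substitution,
    so compute them for every ordering and take the average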
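
The same averaging can be written out for a model rather than a profit game. The sketch below is illustrative only: the model `f`, the background rows, and the function names are hypothetical, and "the prediction when a feature is unknown" is approximated by averaging over background rows, which treats the unknown features as independent of the known ones. Enumerating every ordering is practical only for a handful of features.

```python
from itertools import permutations

def coalition_value(f, x, background, known):
    """Approximate E[f] with features in `known` fixed to x, the rest taken from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in known else row[j] for j in range(len(x))]
        total += f(mixed)
    return total / len(background)

def shap_values(f, x, background):
    """Average each feature's marginal contribution over all substitution orders."""
    n = len(x)
    orders = list(permutations(range(n)))
    phi = [0.0] * n
    for order in orders:
        known = set()
        before = coalition_value(f, x, background, known)
        for j in order:
            known.add(j)
            after = coalition_value(f, x, background, known)
            phi[j] += after - before   # marginal contribution of feature j in this order
            before = after
    return [p / len(orders) for p in phi]

def f(v):
    # Hypothetical model with an interaction between features 0 and 1
    return 2.0 * v[0] + v[0] * v[1] - v[2]

background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
x = [3.0, 2.0, 1.0]
phi = shap_values(f, x, background)
base = coalition_value(f, x, background, set())  # mean prediction over the background
```

By construction, `base + sum(phi)` equals `f(x)`: the gap between the mean prediction and this prediction is fully split among the features.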

  34. Result
    • Model Feature Importance
    • For mortality, both high and low values for age, anion gap,
    C-reactive protein, and LDH were strong drivers
    • For critical event prediction, the presence of acute kidney injury
    and both high and low levels of lactate dehydrogenase (LDH),
    respiratory rate, and glucose were strong drivers
    • It is encouraging that many of the features with high importance in
    the primary XGBoost model were also prioritized in the LASSO
    classifier, suggesting the robustness of the predictive ability of
    these features.

  35. Result
    • mortality

  36. Result
    • critical event

  37. Discussion
    • Along these lines, we found that our imputation strategy
    generally hindered the performance of the XGBoost model.
    • this corroboration of the features learned by XGBoost and
    highlighted by the SHAP analysis with the findings from
    pathophysiological principles and more recent correlative
    studies exploring patients with COVID-19 gives additional
    credibility to these findings.

  38. Limitations
    • Although the restriction of using data at admission
    encourages the use of this model in patient triage, events
    during a patient’s hospital stay after admission may drive
    their clinical course away from the prior probability, which
    cannot be captured by baseline admission features.
    • patients admitted to the hospital later in the crisis benefited
    from improved patient care protocols from experiential
    learning … which is demonstrated by the lower critical
    event and mortality rate in the prospective validation data
    set

  39. Limitations
    • all five hospitals operate in a single health system, system-
    wide protocols in laboratory order sets and management
    protocols were an additional source of bias that may lower
    external validity.
    • notable drawback is its bias toward continuous features
    instead of categorical ones

  40. Others
    • Transparent reporting of a multivariable prediction model for
    individual prognosis or diagnosis (TRIPOD): the TRIPOD
    Statement
    https://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-014-0241-z
    • Bias in random forest variable importance measures:
    Illustrations, sources and a solution
    https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-25
