Slide 1

Slide 1 text

Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients With COVID-19 in New York City: Model Development and Validation
(Development and validation of models that predict death and critical events in patients hospitalized with COVID-19 in New York City)

Slide 2

Slide 2 text

Paper
Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients With COVID-19 in New York City: Model Development and Validation
Published on 06.11.20
doi:10.2196/24018
Comment: a wide range of details is properly reported, and the work is done rigorously.

Slide 3

Slide 3 text

Contents
• Introduction
• Method
• Result 1
• Introduction: SHAP
• Result 2

Slide 4

Slide 4 text

Introduction
• However, efforts have been limited by small sample sizes, lack of generalization to diverse populations, disparities in feature missingness, and potential for bias.
→ Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. doi: https://doi.org/10.1136/bmj.m1328

Slide 5

Slide 5 text

Introduction
Systematic review of covid-19 prediction models
• 421 titles were screened, and 169 studies describing 232 prediction models were included.
• This review indicates that almost all published prediction models are poorly reported, and at high risk of bias such that their reported predictive performance is probably optimistic.

Slide 6

Slide 6 text

Introduction Systematic review of covid-19 prediction models

Slide 7

Slide 7 text

Methods • Study Design

Slide 8

Slide 8 text

Methods
• Study Design
Timeline: 2020/3/15 to 2020/5/1 Retrospective; 2020/5/1 to 2020/5/22 Prospective
Cohorts:
  Retrospective: train MSH n=1514, val OH n=2201
  Prospective: val MSH n=175, val OH n=208
MSH: Mount Sinai Hospital; OH: Other Hospitals

Split                          Data
train                          Retrospective MSH
val 1 (internal validation)    Retrospective MSH
val 2                          Retrospective OH
val 3                          Prospective MSH
val 4                          Prospective OH

Slide 9

Slide 9 text

Methods • Study Population

Slide 10

Slide 10 text

Methods
• Study Data
• The first laboratory value in a 36-hour window period was used as the representative laboratory value on admission.
• Data below the 0.5th percentile and above the 99.5th percentile were removed.
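
The two preprocessing rules above can be sketched in plain Python. This is only an illustration, not the authors' code: the record layout `(patient_id, hours_since_admission, value)`, the function names, and the reading of the window as 0 to 36 hours after admission are all assumptions.

```python
from statistics import quantiles

def first_value_in_window(records, window_hours=36.0):
    """Return {patient_id: value} using each patient's earliest result in the window.

    `records` is an iterable of (patient_id, hours_since_admission, value).
    Treating the window as 0-36 hours after admission is an assumption.
    """
    first = {}
    for pid, hours, value in records:
        if 0.0 <= hours <= window_hours:
            if pid not in first or hours < first[pid][0]:
                first[pid] = (hours, value)
    return {pid: value for pid, (hours, value) in first.items()}

def trim_extremes(values):
    """Drop values below the 0.5th or above the 99.5th percentile."""
    # n=200 yields 199 cut points spaced 0.5 percentage points apart.
    cuts = quantiles(values, n=200, method="inclusive")
    low, high = cuts[0], cuts[-1]
    return [v for v in values if low <= v <= high]
```

Patients with no result inside the window simply get no entry, which matches the idea that such a feature is treated as missing for that patient.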

Slide 11

Slide 11 text

Methods
• Definition of Outcomes
• Mortality: death versus survival or discharge through time horizons of 3, 5, 7, and 10 days.
• Critical illness was defined as discharge to hospice, intubation ≤48 hours prior to intensive care unit (ICU) admission, ICU admission, or death.

Slide 12

Slide 12 text

Methods
• Model Development, Selection, and Experimentation
• The primary model was Extreme Gradient Boosting (XGBoost).
• Hyperparameter tuning was performed by randomized grid searching directed toward maximizing the F1 score metric over 5000 discrete grid options.
• Ten-fold stratified cross-validation was performed.
• To generate confidence intervals for the internal validation set, training and testing was performed for 500 bootstrap iterations with a unique randomly generated seed for the train-test data splits.
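
The building blocks named above (stratified folds, the F1 score, and random sampling from a discrete grid) can be sketched without any ML library. This is a minimal illustration, not the paper's pipeline: the function names are hypothetical, the model fitting itself is left out, and any hyperparameter names passed to `sample_grid` are placeholders.

```python
import random
from itertools import product

def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def sample_grid(grid, n_iter, seed=0):
    """Randomly pick up to n_iter distinct combinations from a discrete grid."""
    keys = sorted(grid)
    combos = [dict(zip(keys, vals)) for vals in product(*(grid[key] for key in keys))]
    return random.Random(seed).sample(combos, min(n_iter, len(combos)))
```

A tuning loop would then fit the model once per sampled combination and per fold, and keep the combination with the highest mean F1. F1 is a sensible target here because the outcomes are imbalanced and plain accuracy would reward predicting "no event" for everyone.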

Slide 13

Slide 13 text

Methods
• Model Development, Selection, and Experimentation
• We generated two predictive models as a baseline, namely logistic regression (LR) and LR with L1 regularization (LASSO).
• Features with >30% missingness were dropped, and k-nearest neighbors (kNN, k=5) was used to impute missing data.
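
The missingness filter and kNN imputation can be sketched as follows. This is a simplified stand-in, not the authors' implementation: missing values are represented as `None`, the distance is a Euclidean distance over the features both rows have observed (rescaled by coverage), and all function names are hypothetical. In practice a library imputer such as scikit-learn's `KNNImputer` would normally be used instead.

```python
import math

def drop_sparse_features(rows, max_missing=0.30):
    """Keep only columns whose fraction of missing (None) values is <= max_missing."""
    n_rows, n_cols = len(rows), len(rows[0])
    keep = [j for j in range(n_cols)
            if sum(r[j] is None for r in rows) / n_rows <= max_missing]
    return [[r[j] for j in keep] for r in rows], keep

def _distance(a, b):
    """Euclidean distance over features observed in both rows, rescaled for coverage."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sq * len(a) / len(shared))

def knn_impute(rows, k=5):
    """Fill each missing cell with the mean of that feature over the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have feature j observed.
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            donors = [v for d, v in donors if d < math.inf]
            if donors:
                filled[i][j] = sum(donors) / len(donors)
    return filled
```

Features would usually be standardized before computing distances, so that a variable with a large numeric range does not dominate the choice of neighbors.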

Slide 14

Slide 14 text

Result
• Features
• Some features have missingness as high as 99%.

Slide 15

Slide 15 text

Result
Outcome Proportion: mortality

Experiment         N       Positive outcomes    Outcome proportion
MSH                1514    40 to 182            0.026 to 0.121
OH                 2201    135 to 494           0.061 to 0.224
Prospective MSH    175     2 to 8               0.011 to 0.054
Prospective OH     208     3 to 15              0.014 to 0.08

Slide 16

Slide 16 text

Result
Outcome Proportion: critical event

Experiment         N       Positive outcomes    Outcome proportion
MSH                1514    322 to 496           0.213 to 0.329
OH                 2201    414 to 777           0.188 to 0.353
Prospective MSH    175     25 to 28             0.143 to 0.188
Prospective OH     208     34 to 41             0.163 to 0.219

Slide 17

Slide 17 text

Result

Slide 18

Slide 18 text

Result
• Unimputed XGBoost model
• Mortality
• Prospective validation at MSH presented a new set of challenges for all the models because of the generally lower number of outcomes and larger class imbalance for mortality prediction for the shorter time intervals.

Validation                  AUC-ROC
MSH > MSH                   0.84 to 0.90
MSH > OH                    0.84 to 0.88
MSH > Prospective MSH       0.85 to 0.96
MSH > Prospective OH        0.68 to 0.88
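
As a reminder of what the metric measures, AUC-ROC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses a hypothetical function name and is written for clarity rather than speed.

```python
def auc_roc(y_true, scores):
    """AUC-ROC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because the estimate is an average over positive-negative pairs, a cohort with only a handful of positive outcomes gives each positive case a large influence on the result, which is one way to read the wider AUC-ROC ranges in the prospective cohorts.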

Slide 19

Slide 19 text

Result
• Unimputed XGBoost model
• Critical event

Validation                  AUC-ROC
MSH > MSH                   0.79 to 0.81
MSH > OH                    0.78 to 0.81
MSH > Prospective MSH       0.72 to 0.78
MSH > Prospective OH        0.74 to 0.77

Slide 20

Slide 20 text

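Break
• A natural pause.
• Glad that a prediction model has been built.
• It also holds up in the prospective validation, which looks good.
• But I wonder what the predictions are actually based on.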

Slide 21

Slide 21 text

Model Feature Importance
SHAP: SHapley Additive exPlanations

Slide 22

Slide 22 text

SHAP
• A method that computes, for a prediction made by a machine learning model, how much each explanatory variable influenced the output.

Slide 23

Slide 23 text

SHAP
• Please refer to the original sources:
• https://logmi.jp/tech/articles/322738
• https://christophm.github.io/interpretable-ml-book/
• 高校生からのゲーム理論 (book on game theory, written for readers from high-school level up)

Slide 24

Slide 24 text

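Mean prediction and the effect of each variable
• The difference between "the mean of all predictions output by the machine learning model" and "the prediction for a given data point" can be decomposed into the effects of the individual explanatory variables.

Annual income prediction model (amounts in units of 10,000)
Person                                             A       B       ...    Z
Prediction                                         300     1000    ...    500
Mean of predictions                                450
Prediction minus mean (= effect of variables)      -150    +550    ...    +50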
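
Written as an equation, the decomposition above says that the per-variable effects sum to the gap between one prediction and the mean prediction. The symbols are notation introduced here for illustration: $f$ is the model, $M$ the number of explanatory variables, and $\phi_j$ the effect attributed to variable $j$.

```latex
f(x) - E[f(X)] = \sum_{j=1}^{M} \phi_j
\qquad \text{e.g. for person B: } 1000 - 450 = 550 = \sum_{j=1}^{M} \phi_j
```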

Slide 25

Slide 25 text

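What makes up the difference between the mean prediction and an individual prediction?
[Figure: a bar on an axis starting at 0 that runs from the mean prediction (450) to B's annual income (1000), split into contributions from age, qualifications, job title, and job changes; amounts in units of 10,000]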

Slide 26

Slide 26 text

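SHAP & game theory
• Cooperative game theory
• In a game with multiple players, how should the score be distributed among them?
• Applied to machine learning, it tells us how much each explanatory variable influenced the prediction.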

Slide 27

Slide 27 text

Cooperative game theory
When A, B, and C work together, they generate a profit of 24 (in units of 10,000).
How should it be split?

Slide 28

Slide 28 text

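Marginal contribution
• The work can be done by one, two, or three people
• The profit is not a simple sum of individual profits

Participants    Payoff (units of 10,000)
A               6
B               4
C               2
A+B             20
A+C             15
B+C             10
A+B+C           24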

Slide 29

Slide 29 text

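Marginal contribution
• Compute the increase in payoff caused by a participant joining
• The increase depends on who is already participating
= the payoff increase depends on the order in which participants join

Joining pattern       Total payoff    Increase from A joining = marginal contribution
Nobody → A            6               6
B only → A+B          20              16
C only → A+C          15              13
B and C → A+B+C       24              14

Participants    Payoff (units of 10,000)
A               6
B               4
C               2
A+B             20
A+C             15
B+C             10
A+B+C           24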

Slide 30

Slide 30 text

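SHAP Value
• We want to cancel out the effect of the joining order
→ Compute the marginal contribution under every order and take the average = SHAP Value

Joining order    Marginal contribution of A    of B    of C    (units of 10,000)
A→B→C            6                             14      4
A→C→B            6                             9       9
B→A→C            16                            4       4
B→C→A            14                            4       6
C→A→B            13                            9       2
C→B→A            14                            8       2
SHAP Value       11.5                          8       4.5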
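
The averaging over joining orders can be reproduced with a short script. The payoffs are the ones from the slide (in units of 10,000); the function name is hypothetical, and assigning a payoff of 0 to the empty coalition is an assumption implied by the "Nobody → A" row.

```python
from itertools import permutations

# Payoff of every coalition, in units of 10,000 (values from the slide's table).
# The empty coalition earning 0 is an assumption.
PAYOFF = {
    frozenset(): 0,
    frozenset("A"): 6, frozenset("B"): 4, frozenset("C"): 2,
    frozenset("AB"): 20, frozenset("AC"): 15, frozenset("BC"): 10,
    frozenset("ABC"): 24,
}

def shapley_values(players, payoff):
    """Average each player's marginal contribution over all joining orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        joined = frozenset()
        for p in order:
            # Marginal contribution of p given who has already joined.
            totals[p] += payoff[joined | {p}] - payoff[joined]
            joined = joined | {p}
    return {p: t / len(orders) for p, t in totals.items()}

print(shapley_values("ABC", PAYOFF))  # {'A': 11.5, 'B': 8.0, 'C': 4.5}
```

The three values sum to 24, the payoff of the full coalition, so the whole profit is distributed with nothing left over.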

Slide 31

Slide 31 text

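Applying game theory
• We want the SHAP Value of each variable for a predicted value
• How do we obtain each variable's marginal contribution to the prediction?
• Define the marginal contribution as the difference between the prediction with feature j included and the prediction without it
• The "prediction when feature j is unknown" is computed as an expected value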
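
In symbols, the idea above can be written as follows. The notation is introduced here for illustration and does not appear on the slides: $S$ is the set of features whose values are known, $v(S)$ is the expected prediction with those features fixed and the remaining ones averaged out, $M$ is the number of features, and $P_j^{\pi}$ is the set of features that come before $j$ in the order $\pi$.

```latex
v(S) = E\left[ f(x_S, X_{\bar{S}}) \right]
\qquad
\Delta_j(S) = v(S \cup \{j\}) - v(S)
\qquad
\phi_j = \frac{1}{M!} \sum_{\pi} \left[ v\left(P_j^{\pi} \cup \{j\}\right) - v\left(P_j^{\pi}\right) \right]
```

The sum runs over all $M!$ orderings of the features, which is the same averaging used for the three-player game.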

Slide 32

Slide 32 text

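Application to machine learning
• Add the variables one at a time and compute each variable's marginal contribution
• There are three features, $(X_1, X_2, X_3)$
  A given data point has the values $(x_1, x_2, x_3)$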

Slide 33

Slide 33 text

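Computing the marginal contribution of each variable
• Nothing is known, $(X_1, X_2, X_3)$ → expected value of the prediction
• Substitute $x_1$ → increase in the prediction = marginal contribution of $X_1$ = $\Delta x_1$
• Substitute $x_2$ → increase in the prediction = marginal contribution of $X_2$ = $\Delta x_2$
• Substitute $x_3$ → increase in the prediction = marginal contribution of $X_3$ = $\Delta x_3$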

Slide 34

Slide 34 text

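Computing the SHAP Value of each variable
$\Delta x_1 = E[f(x_1, X_2, X_3)] - E[f(X_1, X_2, X_3)]$
$\Delta x_2 = E[f(x_1, x_2, X_3)] - E[f(x_1, X_2, X_3)]$
$\Delta x_3 = E[f(x_1, x_2, x_3)] - E[f(x_1, x_2, X_3)]$
• The marginal contribution changes with the order of substitution, so compute it for every order and take the average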
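
The same procedure can be sketched for a model with a few features. This is a brute-force illustration under stated assumptions, not how the SHAP library computes values for tree models: unknown features are averaged over a small set of background rows, `model` is any function taking a list of feature values, and all names are hypothetical. The cost grows factorially with the number of features, so it is only usable for toy cases.

```python
from itertools import permutations

def expected_prediction(model, x, known, background):
    """E[f] with features in `known` fixed to x and the rest taken from background rows."""
    preds = []
    for row in background:
        mixed = [x[j] if j in known else row[j] for j in range(len(x))]
        preds.append(model(mixed))
    return sum(preds) / len(preds)

def shap_values(model, x, background):
    """Average each feature's marginal contribution over all substitution orders."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        known = set()
        before = expected_prediction(model, x, known, background)
        for j in order:
            known.add(j)
            after = expected_prediction(model, x, known, background)
            phi[j] += after - before  # marginal contribution of feature j in this order
            before = after
    return [p / len(orders) for p in phi]
```

For any model, the returned values sum to the prediction for `x` minus the mean prediction over the background rows, which is the decomposition introduced with the income example.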

Slide 35

Slide 35 text

Result
• Model Feature Importance
• For mortality, both high and low values for age, anion gap, C-reactive protein, and LDH were strong drivers.
• For critical event prediction, the presence of acute kidney injury and both high and low levels of lactate dehydrogenase (LDH), respiratory rate, and glucose were strong drivers.
• It is encouraging that many of the features with high importance in the primary XGBoost model were also prioritized in the LASSO classifier, suggesting the robustness of the predictive ability of these features.

Slide 36

Slide 36 text

Result • mortality

Slide 37

Slide 37 text

Result • critical event

Slide 38

Slide 38 text

Discussion
• Along these lines, we found that our imputation strategy generally hindered the performance of the XGBoost model.
• This corroboration of the features learned by XGBoost and highlighted by the SHAP analysis with the findings from pathophysiological principles and more recent correlative studies exploring patients with COVID-19 gives additional credibility to these findings.

Slide 39

Slide 39 text

Limitations
• Although the restriction of using data at admission encourages the use of this model in patient triage, events during a patient’s hospital stay after admission may drive their clinical course away from the prior probability, which cannot be captured by baseline admission features.
• Patients admitted to the hospital later in the crisis benefited from improved patient care protocols from experiential learning … which is demonstrated by the lower critical event and mortality rate in the prospective validation data set.

Slide 40

Slide 40 text

Limitations
• Because all five hospitals operate in a single health system, system-wide protocols in laboratory order sets and management protocols were an additional source of bias that may lower external validity.
• A notable drawback of XGBoost is its bias toward continuous features instead of categorical ones.

Slide 41

Slide 41 text

Others
• Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement
  https://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-014-0241-z
• Bias in random forest variable importance measures: Illustrations, sources and a solution
  https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-25