MERC #3, 4: The PRIME Paper

MERC #3, 4 Proposed Requirements for Cardiovascular Imaging-Related Machine Learning Evaluation (PRIME): A Checklist @vin_tea01

Paper: Sengupta PP, Shrestha S, Berthon B, et al. Proposed Requirements for Cardiovascular Imaging-Related Machine Learning Evaluation (PRIME): A Checklist: Reviewed by the American College of Cardiology Healthcare Innovation Council. JACC: Cardiovascular Imaging. 2020;13(9):2017-2035. doi:10.1016/j.jcmg.2020.07.015

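What kind of paper is this?
• A checklist of requirements that studies applying machine learning to cardiovascular imaging should meet
• Broadly useful as a reference beyond the cardiovascular domain and for tasks other than image processing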

Chapter structure: excerpted from the CENTRAL ILLUSTRATION of the original paper

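Introduction
• Problems in ML-based research
  • Real-world data
  • Label imbalance
  • Introduction of bias, inappropriate measurement methods
  • Reproducibility
  • Generalizability (extrapolation)
  • Inadequate reporting
• Toward standardizing the application of AI and machine learning
  • Data preparation
  • Model selection
  • Performance evaluation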

1. Designing the study plan

Reporting checklist
1. Describe the need for the application of machine learning to the dataset
2. Describe the objectives of the machine learning analysis
3. Define the study plan
4. Describe the summary statistics of baseline data
5. Describe the overall steps of the machine learning workflow

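Should ML be used in the first place?
• Overfitting is likely when the sample size is small, or when it is large but many variables are uninformative
• Underfitting is likely when variables are missing or the model is too simple
• Conventional statistics, with its simple models, is less prone to overfitting but tends to underfit complex data
• ML helps with processing unstructured data, feature selection, and gaining insight through exploratory data analysis
• Complex ML models are hard to interpret, so caution is needed when applying them to causal inference
• Traditional methods such as PS (propensity score) matching can be better suited to causal inference, although ML methods for causal inference have recently been developed.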

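Understand and describe the data
• What is the data format?
  • Tabular
  • Images
  • Time-series
  • A combination of the above
• Can data representative of the population be obtained?
• Is bias built into the data collection method?
• Are the assumptions behind the model appropriate?
  • Learning curve
  • Bias and variance
  • Error analyses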

Designing the process
• Decide the formats of the input and output
• Image preprocessing such as cropping or CNN-based processing
• Annotation strategy

2. Data standardization, feature engineering, and learning

Reporting checklist
1. Describe how the data were processed in order to make it clean, uniform, and consistent
2. Describe whether variables were normalized and if so, how this was done
3. Provide details on the fraction of missing values (if any) and imputation methods
4. Describe any feature selection processes applied
5. Identify and describe the process to handle outliers if any
6. Describe whether class imbalance existed, and which method was applied to deal with it

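Data format
• Data in N-by-M matrix form, with one observation per row and one variable per column
  • Wide data when N << M
  • Tall data when N >> M
• For image data
  • Looking at features of the whole image tends to give N << M
  • Using features of part of the image, or a DL-based summary, tends to give N >> M
  • For 512×512-pixel images of N patients: N rows by 512^2 columns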

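Data preparation
• Cropping: cut out only the necessary region
• Resize: make the image sizes (= number of columns) uniform
• Alignment: align with the anatomical structures
• Noise removal
• Contrast enhancement: “histogram equalization”
• Dimensionality reduction: bullʼs eye (polar map display). For the bullʼs eye, see http://www.kanazawa-heart.or.jp/disease/kensa_13.html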
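As a rough illustration of the contrast enhancement step, the sketch below applies histogram equalization to a tiny grayscale image stored as nested lists. The function name `equalize_histogram`, the 256 gray levels, and the sample pixel values are assumptions made for this example; real pipelines would normally rely on an imaging library.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image (list of rows of ints in [0, levels))."""
    pixels = [p for row in image for p in row]

    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]

    # Map each gray level so that the cumulative distribution becomes roughly linear.
    lut = [
        round((c - cdf_min) * (levels - 1) / (n - cdf_min)) if c > 0 else 0
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in image]


# Hypothetical low-contrast 2x2 image: values are packed into a narrow range.
low_contrast = [[100, 101], [102, 103]]
equalized = equalize_histogram(low_contrast)
print(equalized)
```

After equalization the four gray levels are spread over the full 0 to 255 range, which is the effect the slide refers to as contrast enhancement.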

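Feature engineering
• Curse of dimensionality: predictor performance tends to drop when N << M
  • Countermeasure: provide at least 5 observations for each of the M variables, so that N/M is 5 to 10
• Correlated variables: a set of mutually unrelated variables is sufficient
  • Countermeasure: selection based on domain knowledge, or dimensionality reduction methods such as PCA
• For image data
  • Handcrafted methods: e.g., local binary patterns
  • Classical ML: e.g., PCA, ISOMAP
  • Deep learning methods
  • DL needs a large N, so use data augmentation or transfer learning when the sample size is small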
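The N/M rule of thumb can be checked with simple arithmetic. The sketch below assumes a hypothetical cohort of 1000 patients and a hypothetical reduced feature set of 100 variables; only the 512×512 image size and the "5 to 10 observations per variable" guideline come from the slides.

```python
def observations_per_feature(n_samples, n_features):
    """Return N/M, the number of observations available per variable."""
    return n_samples / n_features


def min_samples_needed(n_features, ratio=5):
    """Smallest N that gives at least `ratio` observations per variable."""
    return n_features * ratio


n_patients = 1000            # hypothetical cohort size
raw_pixels = 512 * 512       # one column per pixel of a 512x512 image
reduced_features = 100       # hypothetical size after PCA or similar

ratio_raw = observations_per_feature(n_patients, raw_pixels)
ratio_reduced = observations_per_feature(n_patients, reduced_features)

print(f"raw pixels: M={raw_pixels}, N/M={ratio_raw:.4f}")
print(f"reduced:    M={reduced_features}, N/M={ratio_reduced:.1f}")
print(f"N needed for raw pixels at N/M=5: {min_samples_needed(raw_pixels)}")
```

Under these assumptions the raw pixel representation is far into the wide (N << M) regime, while the reduced representation reaches the upper end of the suggested range.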

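Variable normalization
• Rescaling features to a common 0-to-1 range can improve predictor performance and speed up training
※ A point raised in the reading group
• Standardization: rescale so that the mean is 0 and the variance is 1.
• Which operation "normalization" refers to depends on the context, so it is worth checking what is meant.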
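The two rescaling operations mentioned above can be contrasted in a few lines. This is a minimal sketch using only the standard library; the function names and the sample ejection fraction values are assumptions for illustration.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to mean 0 and (population) variance 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical ejection fraction measurements (%).
ejection_fraction = [35.0, 45.0, 55.0, 65.0]
scaled = min_max_scale(ejection_fraction)
z_scores = standardize(ejection_fraction)

print(scaled)
print(z_scores)
```

In practice the scaling parameters (minimum, maximum, mean, standard deviation) would be estimated on the training data only and then reused for validation and test data, which ties in with the leakage concern raised elsewhere in the deck.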

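Handling missing data
• Drop the missing values, label them as missing, or impute them from the available data
• For images, GANs and similar methods are used
• Pay attention to the missingness mechanism: MAR, MCAR, MNAR
• Watch for underlying biases (selection bias, immortal time bias)
• There is no single established method, but missing data should be handled where possible.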
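The three basic options (drop, flag, impute) can be sketched as follows. Mean imputation is used here only as the simplest stand-in for "estimate from the available data"; it is an assumption of this example, not a method recommended by the paper, and it is only defensible when the missingness mechanism allows it. The variable `bnp` and its values are hypothetical.

```python
from statistics import mean


def drop_missing(values):
    """Option 1: discard observations with a missing value."""
    return [v for v in values if v is not None]


def flag_missing(values):
    """Option 2: keep a separate indicator marking which values are missing."""
    return [v is None for v in values]


def impute_mean(values):
    """Option 3: fill missing values with the mean of the observed ones."""
    fill = mean(drop_missing(values))
    return [fill if v is None else v for v in values]


# Hypothetical lab values with two missing entries.
bnp = [120.0, None, 80.0, None, 100.0]

print(drop_missing(bnp))
print(flag_missing(bnp))
print(impute_mean(bnp))

# The fraction of missing values is something the checklist asks to report.
missing_fraction = sum(flag_missing(bnp)) / len(bnp)
print(missing_fraction)
```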

Feature selection
• With classical ML methods, selection is needed when there are many features
• Probably less necessary with deep learning?

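Outliers
• Pay attention to why an outlier occurred (is there a measurement error?)
• Option of adopting models that are robust to outliers
  • Decision trees
  • kNN
• When excluding outliers, report the criterion used to judge them as outliers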
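One way to make the exclusion criterion explicit and reportable is to encode it as a function. The sketch below uses the common 1.5×IQR rule, which is an assumption chosen for illustration rather than a criterion taken from the paper; the heart rate values are hypothetical.

```python
from statistics import quantiles


def iqr_outlier_bounds(values, k=1.5):
    """Return (low, high) bounds; values outside them are treated as outliers."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """List the values that fall outside the IQR-based bounds."""
    low, high = iqr_outlier_bounds(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical resting heart rates; 200 may be a measurement error.
heart_rates = [60, 62, 65, 66, 68, 70, 72, 200]
low, high = iqr_outlier_bounds(heart_rates)
flagged = flag_outliers(heart_rates)

print(f"criterion: outside [{low}, {high}] (1.5 x IQR rule)")
print(flagged)
```

Printing the bounds alongside the flagged values keeps the reported criterion and the actual exclusion in sync.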

Class imbalance
• Labels are often imbalanced in medical data.
• Oversampling or undersampling
• Weighting
• Synthetic data generation methods, e.g., SMOTE
• GANs are also sometimes used
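Two of the simpler options, weighting and random oversampling, can be sketched with the standard library. The inverse-frequency weighting formula `n / (k * count)` is one common heuristic and is an assumption here; the labels and sample identifiers are hypothetical. SMOTE and GAN-based generation are not covered by this sketch.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Hypothetical dataset: 9 controls (0) and 1 case (1).
labels = [0] * 9 + [1]
samples = list(range(10))

weights = balanced_class_weights(labels)
new_samples, new_labels = random_oversample(samples, labels)

print(weights)
print(Counter(new_labels))
```

Resampling of this kind is usually applied to the training split only, so that the evaluation data keeps the original class distribution.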

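Data shift
• Occurs when the group used for training and the group used for testing differ in their properties
  • Covariate shift
  • Prior probability shift
  • Domain shift
• Before evaluating the model, examine whether data shift needs to be addressed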

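Leakage
• Information that must not be used for training gets mixed in
• For example, data from the same patient appears in both training and testing. This is especially common with time-series data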
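A common guard against the same-patient form of leakage is to split by patient rather than by record. The sketch below assumes each record is a dictionary with a `patient_id` key; that field name, the 20% test fraction, and the toy records are all hypothetical.

```python
import random


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records so that no patient appears in both train and test."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)

    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


# Hypothetical data: 10 patients with 3 scans each.
records = [{"patient_id": pid, "scan": s} for pid in range(10) for s in range(3)]
train, test = split_by_patient(records)

train_patients = {r["patient_id"] for r in train}
test_patients = {r["patient_id"] for r in test}
print(len(train), len(test), train_patients & test_patients)
```

The intersection of the two patient sets is empty by construction, which is the property a record-level random split cannot guarantee.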

3. Selection of machine learning models

Reporting checklist
1. Explicitly define the goal of the analysis e.g., regression, classification, clustering
2. Identify the proper learning method used (e.g., supervised, reinforcement learning etc.) to address the problem
3. Provide explicit details on the use of simpler, complex, or ensemble models
4. Provide the comparison of complex models against simpler models if possible
5. Define ensemble methods, if used
6. Provide details on whether the model is interpretable

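Balancing interpretability and predictive performance
• Linear regression models and simple decision trees are highly interpretable and have low variance (error on validation data), but tend to underfit
• Ensemble methods such as bagging, boosting, and stacking improve predictive performance at the cost of interpretability
• Complex models such as NNs likewise trade interpretability for predictive performance
  • Hyperparameter tuning is also required
• Balance interpretability and predictive performance according to the purpose of the analysis
• Methods that improve interpretability are emerging, such as research on the behavior of ML models and probabilistic DL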

4. Model assessment

Reporting checklist
1. Provide a clear description of data used for training, validation and testing
2. Describe how the model parameters were optimized (e.g., optimization technique, number of model parameters etc.)

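Splitting into train, validation, and test
• Train set: used to fit the model
• Validation set: used for hyperparameter tuning
• Test set: used to estimate the error on unseen data
• Ideally, data drawn from the same probability distribution is split at random
  • * Also called the hold-out method
• When data is insufficient → Cross-Validation, Bootstrapping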

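Cross-Validation
• A technique that partitions the data into multiple non-overlapping folds
• k-fold
  • Split the data into k folds; use 1 for evaluation and k-1 for training. Repeat this k times and average the scores
  • * Stratified k-fold, which accounts for label imbalance, is regarded as the gold standard on Kaggle and elsewhere
• Leave-one-out
  • Split data of size N into N folds; use just 1 for evaluation and the remaining N-1 for training. Repeat this N times and average the scores
  • * In other words, N-fold
  • * Also called the jackknife method.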
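The k-fold procedure described above can be sketched as an index generator plus an averaging loop. The names `k_fold_indices` and `cross_validate` are made up for this example, and `score_fn` is a hypothetical callback that would train on the `train` indices and return a score on the `val` indices; a toy scorer is used so the block runs on its own. Stratification is not implemented here.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, val) index lists for k non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def cross_validate(n_samples, k, score_fn, seed=0):
    """Average the per-fold scores returned by score_fn(train, val)."""
    scores = [score_fn(train, val) for train, val in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# 5-fold split of 10 samples: each fold holds out 2 samples.
splits = list(k_fold_indices(10, 5))
for train, val in splits:
    print(sorted(val), len(train))

# Leave-one-out is the special case k == N.
loo_splits = list(k_fold_indices(4, 4))

# Toy scorer standing in for "train a model, then evaluate on val".
mean_score = cross_validate(10, 5, lambda train, val: float(len(val)))
print(mean_score)
```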

Monte Carlo CV and Bootstrapping (figure quoted from the original paper)
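Since the figure is not reproduced here, the following sketch shows the usual textbook form of the two resampling schemes: Monte Carlo CV draws a fresh random train/test split each repetition, while bootstrapping samples the training set with replacement and evaluates on the samples that were never drawn (out-of-bag). Function names, the 25% test fraction, and the sample count are assumptions for illustration and may differ from the paper's figure.

```python
import random


def monte_carlo_split(n_samples, test_fraction, rng):
    """One random train/test split without replacement; repeat for Monte Carlo CV."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def bootstrap_split(n_samples, rng):
    """Sample n_samples indices with replacement; the rest are out-of-bag."""
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return train, out_of_bag


rng = random.Random(0)
mc_train, mc_test = monte_carlo_split(20, 0.25, rng)
bs_train, bs_oob = bootstrap_split(20, rng)

print(len(mc_train), len(mc_test))
print(len(bs_train), len(set(bs_train)), len(bs_oob))
```

Unlike k-fold, test sets from different Monte Carlo repetitions may overlap, and a bootstrap training set contains duplicates.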

5. Model evaluation

Reporting checklist
1. Provide the metric(s) used to evaluate the performance of the model
2. Define the prevalence of disease and the choice of the scoring rule used
3. Report any methods used to balance the numbers of subjects in each class
4. Discuss the risk associated to misclassification

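Requirements for reporting scores
• Taking accuracy on a classification task as an example
1. Clearly state the data used for training and evaluation
2. Specify the components of the model
   1. Model parameters (e.g., initial values, features, loss function)
   2. Regularization (e.g., smoothing, dropout)
   3. Hyperparameters (e.g., optimizer, learning rate, early stopping)
3. Also report scores from testing on data from a different site with similar statistical properties and distribution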

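Other recommendations
• Balance adjustment according to disease prevalence
• Compute the mean and variance of scores across multiple models (e.g., random initialization)
• Perform cross-validation
• Use metrics that are used in clinical practice
  • Sensitivity, specificity, odds ratio, and AUC rather than precision, recall, and F1 score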
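The clinically oriented metrics can be derived directly from the four cells of a binary confusion matrix. In this sketch "odds ratio" is interpreted as the diagnostic odds ratio (TP×TN)/(FP×FN), which is an assumption about what the slide intends; the labels and predictions are hypothetical. Note that sensitivity is the same quantity as recall.

```python
def binary_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def clinical_metrics(y_true, y_pred):
    """Sensitivity, specificity, and diagnostic odds ratio (None if undefined)."""
    tp, fp, tn, fn = binary_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn) if fp * fn else None,
    }


# Hypothetical test set: 4 diseased (1) and 6 healthy (0) subjects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = clinical_metrics(y_true, y_pred)
print(metrics)
```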

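Various metrics
• Confusion matrix: built for multi-label classification tasks
• Mean squared error: used in regression tasks and segmentation
• Dice: used in segmentation
• The Hausdorff distance metrics: object detection
• The Bland-Altman plot: measures the agreement between two measurements
• Comparison with experts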
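Two of the metrics listed above are easy to compute by hand. The sketch below shows the Dice coefficient for binary masks and the summary statistics behind a Bland-Altman plot (mean difference and 95% limits of agreement, taken as bias ± 1.96 SD). The masks and the paired measurements are hypothetical, and treating two empty masks as perfect agreement is a convention chosen for this example.

```python
from statistics import mean, stdev


def dice_coefficient(mask_a, mask_b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as nested lists."""
    a = [p for row in mask_a for p in row]
    b = [p for row in mask_b for p in row]
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 1.0 if size == 0 else 2 * intersection / size


def bland_altman_stats(method_a, method_b):
    """Return (bias, lower limit, upper limit) of agreement between paired measurements."""
    diffs = [x - y for x, y in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical segmentation masks (model vs. expert).
predicted = [[1, 1, 0], [0, 1, 0]]
reference = [[1, 0, 0], [0, 1, 1]]
dice = dice_coefficient(predicted, reference)

# Hypothetical paired measurements from an automated and a manual method.
automated = [50.0, 55.0, 60.0, 65.0]
manual = [52.0, 54.0, 61.0, 67.0]
bias, lower, upper = bland_altman_stats(automated, manual)

print(dice)
print(bias, lower, upper)
```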

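Investigating the harm of misclassification
• Investigate the harms of misclassification in advance, for example whether misclassification is more likely for particular rare diseases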

6. Model replicability

Reporting checklist
1. Consider sharing code or scripts on a public repository with appropriate copyright protection steps for further development and non-commercial use
2. Release a data dictionary with appropriate explanation of the variables
3. Document the version of all software and external libraries used

To improve reproducibility
• Publish the code
  • Licensing
• Publish the data
  • Anonymization
  • Codebook
• Publish the environment
  • Software versions
  • Docker

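Licensing
• State explicitly the scope in which the published code may be used
• MIT: free to use, modify, and publish as long as the original copyright and license notice is retained
• GPLv3: code that uses it must also be released under GPLv3
• See the original texts for details
  • https://opensource.org/licenses/MIT
  • https://www.gnu.org/licenses/gpl-3.0.en.html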

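Anonymization and codebook
• Publish the data after anonymization
• If the data cannot be anonymized, share it with researchers under the approval of the IRB (institutional review board)
• Synthesize pseudo-data (cf. differential privacy)
• Codebook: the documentation describing the data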

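Publishing the environment
• Share software versions and the computing environment
• Docker: a tool for sharing environments
• Sphinx: a documentation generation tool
• Jupyter: for sharing demos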
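Recording software versions can be automated so that the report always matches the environment that produced the results. This is a minimal sketch using only the standard library; the function name `environment_report` and the package names passed to it are examples, not a list taken from the paper.

```python
import platform
from importlib import metadata


def environment_report(packages):
    """Collect the Python version, platform string, and installed package versions."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Example package names; replace with the libraries the study actually used.
report = environment_report(["numpy", "scikit-learn"])
for key, value in report.items():
    print(f"{key}: {value}")
```

The resulting dictionary can be saved next to the model outputs or pasted into the supplementary material, complementing a Docker image rather than replacing it.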

7. Reporting limitations, biases, and alternatives

Reporting checklist
1. Identify and report the relevant model assumptions and findings
2. If well performing models were tested on a hold-out validation dataset, detail the data of that validation set with the same rigor as that of training dataset

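Reporting limitations, biases, and alternatives
• “All models are wrong, but some are useful”
• State explicitly the assumptions the machine learning model depends on
• Report a simple model alongside the complex model as a benchmark
• Using multiple datasets can increase the reliability of the results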

Summary and future directions
• Automated ML
• “multiomics” approach
• Fusion of clinical data with smart-device data
• Use of synthetic data
• The PRIME checklist will continue to be refined toward further standardization

References
• Sengupta PP, Shrestha S, Berthon B, et al. Proposed Requirements for Cardiovascular Imaging-Related Machine Learning Evaluation (PRIME): A Checklist: Reviewed by the American College of Cardiology Healthcare Innovation Council. JACC: Cardiovascular Imaging. 2020;13(9):2017-2035. doi:10.1016/j.jcmg.2020.07.015
• Thakur, “Approaching (Almost) Any Machine Learning Problem” https://www.amazon.co.jp/dp/B089P13QHT/ref=cm_sw_r_tw_dp_x_DiUHFb5 KBRT3E
  • A book aimed at Kaggle, but much of it agrees with this paper.
• Efron and Hastie, “Computer Age Statistical Inference” https://web.stanford.edu/~hastie/CASI/index.html
• MIT license https://opensource.org/licenses/MIT
• GPLv3 license https://www.gnu.org/licenses/gpl-3.0.en.html
