Experiment Management with MLflow Tracking / ayniy-with-mlflow

Slide 1

Slide 1 text

atmaCup On-site Data Competition #5 Review Meetup. Experiment Management with MLflow Tracking. u++ (@upura0), June 14, 2020

Slide 2

Slide 2 text

About this talk: experiment management with MLflow Tracking at atmaCup#5. Challenges of experiment management; an introduction to MLflow Tracking; integrating it into my own library, "Ayniy".

Slide 3

Slide 3 text

About me: Kaggle Master; 1st place (team) in PetFinder.my Adoption Prediction. https://www.amazon.co.jp/dp/4065190061 https://youtu.be/7-uZHFaQ2V0 https://www.getrevue.co/profile/upura

Slide 4

Slide 4 text

My result this time: 16th on the public leaderboard, 27th on the private leaderboard. https://upura.hatenablog.com/entry/2020/06/06/193944

Slide 5

Slide 5 text

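An aggressive final submission: if the public and private test sets contain different chip_id values, the public leaderboard says little about the private one, so "Trust CV". Large-scale stacking was possible only because I had a record of every experiment.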
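The deck does not show how runs were chosen or combined for the stacking. As a minimal sketch, assuming each logged run is available as a dict with a `cv_score` key (for example, rows exported from the MLflow runs table), the snippet below ranks runs by CV score in the spirit of "Trust CV" and then blends their predictions. The blend is a plain equal-weight average, a simpler stand-in for a learned stacker; `top_k_runs` and `average_predictions` are hypothetical helpers.

```python
from typing import Any, Dict, List, Sequence


def top_k_runs(
    runs: Sequence[Dict[str, Any]],
    k: int,
    metric: str = "cv_score",
    higher_is_better: bool = True,
) -> List[Dict[str, Any]]:
    """Return the k runs with the best CV score, ignoring runs without the metric."""
    with_metric = [run for run in runs if metric in run]
    ranked = sorted(with_metric, key=lambda run: run[metric], reverse=higher_is_better)
    return ranked[:k]


def average_predictions(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight blend: element-wise mean over each model's prediction vector."""
    if not predictions:
        raise ValueError("need at least one prediction vector")
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all prediction vectors must have the same length")
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]
```

Ranking on CV rather than on the public leaderboard is the point of "Trust CV" when the public split is not representative of the private one.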

Slide 6

Slide 6 text

Challenges of experiment management

Slide 7

Slide 7 text

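What we do during a competition: run experiment after experiment while adjusting elements such as the cross-validation strategy, the features, the machine learning algorithm, the training parameters, and the metric to optimize.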
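Each experiment is defined by the five elements listed above. One way to make that explicit is a small record type whose fields are exactly those elements, so two experiments can be compared field by field. This is a minimal sketch; `ExperimentConfig` and its field names are hypothetical and not part of MLflow or Ayniy.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical record of the elements varied between experiments."""
    cv_strategy: str   # e.g. "StratifiedKFold(n_splits=5)"
    feature_set: str   # ID of the feature set, e.g. "fe000"
    algorithm: str     # model name, e.g. "lightgbm"
    metric: str        # metric to optimize
    train_params: Dict[str, Any] = field(default_factory=dict)

    def diff(self, other: "ExperimentConfig") -> Dict[str, Tuple[Any, Any]]:
        """Return {field_name: (self_value, other_value)} for fields that differ."""
        mine, theirs = asdict(self), asdict(other)
        return {key: (mine[key], theirs[key]) for key in mine if mine[key] != theirs[key]}
```

For example, `base.diff(new)` answers "what did I change in this run?" without opening any notebook.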

Slide 8

Slide 8 text

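How I used to manage experiments: I created lots of ipynb notebooks, so I could not tell which one did what or how it turned out, and the diffs were hard to read in git/GitHub.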

Slide 9

Slide 9 text

Introduction to MLflow Tracking

Slide 10

Slide 10 text

A tool that helps with experiment management: pip install mlflow

Slide 11

Slide 11 text

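Usage:

    import mlflow
    from mlflow import log_metric, log_param

    # exp_name, run_name, model_name, ... are defined by the surrounding experiment code
    mlflow.set_experiment(exp_name)               # group runs under one experiment
    mlflow.start_run(run_name=run_name)           # open a run for this experiment
    log_param('model_name', model_name)           # machine learning algorithm
    log_param('fe_name', fe_name)                 # feature set ID
    log_param('train_params', params)             # training parameters (stored as a string)
    log_param('cv_strategy', cv)
    log_param('evaluation_metric', evaluation_metric)
    log_metric('cv_score', cv_score)              # numeric metric, sortable in the UI
    log_param('fold_scores', fold_scores)
    log_param('cols_definition', cols_definition)
    log_param('description', description)
    mlflow.end_run()                              # close the run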
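MLflow stores every parameter value as a string (with a length limit that depends on the MLflow version), so logging a whole `params` dict as one value yields a single long string. A common alternative is to flatten nested dicts into one key per setting and pass the result to `mlflow.log_params`, which gives each setting its own column in the UI. The helper `flatten_params` below is a hypothetical sketch using only the standard library.

```python
from collections import abc
from typing import Any, Dict, Mapping


def flatten_params(params: Mapping[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, str]:
    """Flatten nested dicts into {"outer.inner": "value"} with string values."""
    flat: Dict[str, str] = {}
    for key, value in params.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, abc.Mapping):
            # Recurse into nested settings, extending the dotted key
            flat.update(flatten_params(value, name, sep))
        else:
            # MLflow stores params as strings, so convert explicitly
            flat[name] = str(value)
    return flat
```

Usage with MLflow would look like `mlflow.log_params(flatten_params(params, prefix="train_params"))`.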

Slide 12

Slide 12 text

Dashboard: mlflow ui (serves the tracking UI locally, by default at http://127.0.0.1:5000)

Slide 13

Slide 13 text

References: https://mlflow.org/docs/latest/tracking.html. In Japanese: "A Case for Hyperparameter Management: Managing Hyperparameters with Hydra+MLflow"; "Notes after starting to use MLflow"; "Python: Trying out MLflow Tracking".

Slide 14

Slide 14 text

Integrating it into my own library, "Ayniy"

Slide 15

Slide 15 text

Ayniy: Documentation / GitHub / Slide (Japanese). Sadriddin Ayni was a Tajik intellectual who wrote poetry, fiction, journalism, history and lexicography. He is regarded as Tajikistan's national poet and one of the most important writers in the country's history. https://uz.wikipedia.org/wiki/Sadriddin_Ayniy

Slide 16

Slide 16 text

All You Need is YAML

    import yaml
    from sklearn.model_selection import StratifiedKFold
    from ayniy.preprocessing.runner import Tabular
    from ayniy.model.runner import Runner

    # Load the feature-engineering and training configs.
    # yaml.safe_load is used because yaml.load requires a Loader argument in PyYAML 6.
    with open('configs/fe000.yml') as f:
        fe_configs = yaml.safe_load(f)
    with open('configs/run000.yml') as g:
        run_configs = yaml.safe_load(g)

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)

    tabular = Tabular(fe_configs, cv)
    tabular.create()            # build the features described in fe000.yml

    runner = Runner(run_configs, cv)
    runner.run_train_cv()       # train across the CV folds
    runner.run_predict_cv()     # predict with the fold models
    runner.submission()         # write the submission file

https://upura.github.io/ayniy-docs/quick_start_guide.html
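The idea behind "All You Need is YAML" is that the code stays fixed and each experiment is just a new config file. The sketch below shows one generic way to get there, a registry that maps a model name in a config dict to a class; it is not Ayniy's internal design. The keys `model_name` and `params`, the registry, and the toy `MeanBaseline` model are all hypothetical; in practice the dict would come from `yaml.safe_load`.

```python
from typing import Any, Callable, Dict, List, Mapping, Sequence

# Hypothetical registry: "model_name" in a run config -> class that builds the model
MODEL_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that adds a model class to MODEL_REGISTRY under `name`."""
    def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        MODEL_REGISTRY[name] = factory
        return factory
    return decorator


@register("mean_baseline")
class MeanBaseline:
    """Toy model that always predicts the mean of the training target."""

    def __init__(self, **params: Any) -> None:
        self.params = params
        self.mean_ = 0.0

    def fit(self, X: Sequence[Any], y: Sequence[float]) -> "MeanBaseline":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X: Sequence[Any]) -> List[float]:
        return [self.mean_ for _ in X]


def build_model(run_config: Mapping[str, Any]) -> Any:
    """Instantiate the model named in the config with its training parameters."""
    name = run_config["model_name"]
    if name not in MODEL_REGISTRY:
        raise KeyError(f"unknown model_name: {name!r}")
    return MODEL_REGISTRY[name](**run_config.get("params", {}))
```

Adding a new algorithm then means registering one class; switching between algorithms is a one-line change in the YAML file, which keeps the git diff small.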

Slide 17

Slide 17 text

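Log everything as casually as calling print: machine learning algorithm name, feature set ID, training parameters, cross-validation strategy, metric to optimize, cross-validation score, per-fold scores, column definitions (categorical variables), and a description of the experiment.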
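In the usage code shown earlier, only `cv_score` goes to `log_metric` and every other item goes to `log_param`. A minimal sketch of a "print-like" helper that applies that split to one dict of items is below. The logging functions are passed in, so it can be called as `log_record(record, mlflow.log_param, mlflow.log_metric)` or tested with plain Python callables; `log_record` and the `metric_keys` default are hypothetical.

```python
from numbers import Real
from typing import Any, Callable, FrozenSet, Mapping


def log_record(
    record: Mapping[str, Any],
    log_param: Callable[[str, Any], Any],
    log_metric: Callable[[str, float], Any],
    metric_keys: FrozenSet[str] = frozenset({"cv_score"}),
) -> None:
    """Send numeric metric keys to log_metric and everything else to log_param."""
    for key, value in record.items():
        is_number = isinstance(value, Real) and not isinstance(value, bool)
        if key in metric_keys and is_number:
            log_metric(key, float(value))
        else:
            # Lists such as fold_scores are logged as params, as in the usage slide
            log_param(key, value)
```

Passing the loggers as arguments keeps experiment code free of MLflow-specific calls beyond a single line.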

Slide 18

Slide 18 text

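Workflow during the competition
1. If the current Ayniy cannot do something, implement it.
2. Write fe.yaml and run.yaml.
3. Run runner.py.
4. Check the CV score in mlflow ui.
5. Submit if it looks promising.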
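Steps 4 and 5 rely on judging the CV score by eye. One way to make "looks promising" explicit is sketched below, assuming the CV score is the mean of the per-fold scores (it could instead be computed on out-of-fold predictions) and using a hypothetical `min_gain` threshold. `summarize_folds` and `worth_submitting` are illustrative names, not part of Ayniy or MLflow.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence, Tuple


def summarize_folds(fold_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of the per-fold scores."""
    return mean(fold_scores), pstdev(fold_scores)


def worth_submitting(
    fold_scores: Sequence[float],
    best_cv: Optional[float],
    min_gain: float = 0.0,
    higher_is_better: bool = True,
) -> bool:
    """Hypothetical rule: submit only if CV beats the best so far by more than min_gain."""
    cv_score, _ = summarize_folds(fold_scores)
    if best_cv is None:  # first experiment: nothing to compare against
        return True
    gain = cv_score - best_cv if higher_is_better else best_cv - cv_score
    return gain > min_gain
```

The standard deviation is returned as well because a gain smaller than the fold-to-fold spread is weak evidence of a real improvement.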

Slide 19

Slide 19 text

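What went well: mlflow ui let me keep track of which run did what and how it turned out. Since each experiment only requires writing *.yaml files, diffs in git/GitHub became much easier to read. I could concentrate on the ideas themselves.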

Slide 20

Slide 20 text

Closing

Slide 21

Slide 21 text

Summary: experiment management with MLflow Tracking at atmaCup#5. Challenges of experiment management; an introduction to MLflow Tracking; integrating it into my own library, "Ayniy".