The Evolution and Future of Machine Learning OSS

Thinking with Yukihiro Matsumoto: How to Use Open Source to Become a Strong Engineer - 2021/4/13
https://techplay.jp/event/849756

CADDi AI Lab Tech Lead Shunsuke Kawai

vaaaaanquish

April 13, 2022
Transcript

  1. I AM • CADDi, inc. AI Lab Tech Lead; M3, inc. Engineering Fellow; Developers Guild Bolder’s Owner • OSS • XGBoost, LightGBM, Rust wrapper • gokart • xonsh • Shunsuke Kawai (@vaaaaanquish)
  2. MACHINE LEARNING OSS LAYER

    Package Management: conda, poetry, pipenv, …
    Application: Jupyter, Streamlit, FastAPI, …
    Data Management: dbt, Dagster, DVC, feast, hadoop, …
    Infrastructure & Deployment: k8s, TF Serving, OpenMPI, faiss, …
    Machine Learning Pipeline: MLflow, Airflow, gokart, …
    Data Representation: NumPy, Pandas, NetworkX, …
    Data Preprocessing: xfeat, spaCy, torchvision, Albumentations, …
    Analysis & Modeling: sklearn, LightGBM, PyTorch, TensorFlow, Optuna, …
    Visualization: Matplotlib, Seaborn, Plotly, Bokeh, …
    Verification: SHAP, AIF360, …
  3. MACHINE LEARNING OSS LAYER

    Package Management: conda, poetry, pipenv, …
    Application: Jupyter, Streamlit, FastAPI, …
    Data Management: dbt, Dagster, DVC, feast, hadoop, …
    Infrastructure & Deployment: k8s, TF Serving, OpenMPI, faiss, …
    Machine Learning Pipeline: MLflow, Airflow, gokart, …
    Data Representation: NumPy, Pandas, NetworkX, …
    Data Preprocessing: xfeat, spaCy, torchvision, Albumentations, …
    Analysis & Modeling: sklearn, LightGBM, PyTorch, TensorFlow, Optuna, …
    Visualization: Matplotlib, Seaborn, Plotly, Bokeh, …
    Verification: SHAP, AIF360, …
    Label on this slide: Today
  4. HISTORY OF MACHINE LEARNING OSS

    Timeline axis: 1991, 2000, 2010, 2020. Python milestones shown: Python 0.9, Python 3.0.
    NumPy (1995~), PIL (1995~), SciPy (2001~), NetworkX (2005~), MeCab (2006~), sklearn (2007~), Pandas (2008~), XGBoost (2014~), TensorFlow (2015~), spaCy (2015~), LightGBM (2016~), PyTorch (2016~), torchvision (2017~), CatBoost (2017~), Optuna (2018~), Albumentations (2018~), transformers (2018~), PyTorch Lightning (2019~), xfeat (2020~), PyCaret (2020~), autogluon (2020~), JAX (2020~), cudf (2020~)
  5. POPULAR MODELING TOOLS: State of ML & Data Science 2021 - Kaggle https://www.kaggle.com/kaggle-survey-2021
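  6. CASE 1 https://github.com/microsoft/LightGBM/pull/2620 • Implementation of XE NDCG MART • A differentiable loss function for ranking • Fast and accurate • Implemented by the paper's author shortly after the arXiv release • The skill of turning equations into code • The skill of surveying papers https://arxiv.org/abs/1911.09798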
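Case 1 describes a differentiable loss for ranking. As a rough illustration, assuming the formulation in the linked arXiv paper (a target distribution proportional to 2^label - gamma, compared against the softmax of the model scores by cross entropy), the loss and its gradient for a single query might be sketched as below. The function name `xe_ndcg_loss` and the fixed `gammas` values are illustrative assumptions; the paper samples gamma uniformly from [0, 1], and this sketch is not the code from the LightGBM pull request.

```python
import math


def xe_ndcg_loss(scores, labels, gammas):
    """Cross entropy between a label-derived distribution and softmax(scores).

    scores: model scores for the documents of one query
    labels: integer relevance labels for the same documents
    gammas: per-document offsets in [0, 1] (assumed here to be given, not sampled)
    Returns (loss, gradient of the loss with respect to each score).
    """
    # Target distribution: phi_i is proportional to 2^label_i - gamma_i.
    phi = [2.0 ** y - g for y, g in zip(labels, gammas)]
    phi_sum = sum(phi)
    if phi_sum <= 0:
        raise ValueError("target distribution has no mass")
    phi = [p / phi_sum for p in phi]

    # Numerically stable log-softmax over the scores.
    max_score = max(scores)
    log_z = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    log_rho = [s - log_z for s in scores]

    # Cross entropy, and its gradient softmax(scores)_i - phi_i.
    loss = -sum(p * lr for p, lr in zip(phi, log_rho))
    grad = [math.exp(lr) - p for p, lr in zip(phi, log_rho)]
    return loss, grad


labels = [2, 1, 0]
gammas = [0.5, 0.5, 0.5]
loss_aligned, grad_aligned = xe_ndcg_loss([2.0, 1.0, 0.0], labels, gammas)
loss_reversed, _ = xe_ndcg_loss([0.0, 1.0, 2.0], labels, gammas)
print(loss_aligned < loss_reversed)  # scores that follow the labels get the lower loss
```

Because the gradient has the simple closed form softmax minus target, it can be handed to a gradient boosting library as a custom objective, which is one way to read the "equations into code" point on the slide.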
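  7. CASE 2 https://github.com/vaaaaanquish/lightgbm-rs/pull/24 • Added feature importance extraction • Quantifies a criterion for interpreting the model • Often needed in the actual modeling process • A feature added with a competition in mind • Understanding of the use case • Understanding of the logic of the wrapped library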
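To make "quantifying a criterion for interpreting the model" in Case 2 concrete, here is a small sketch of the two importance types that LightGBM itself reports, "split" (how often a feature is used in a split) and "gain" (the total gain of the splits that use it). The list of `(feature_index, gain)` records and the function name `feature_importance` are hypothetical stand-ins for a trained model's trees; they do not reflect the lightgbm-rs API or the code in the pull request.

```python
def feature_importance(splits, num_features, importance_type="split"):
    """Aggregate per-feature importance from (feature_index, gain) split records.

    The record format is a hypothetical simplification of a tree ensemble.
    """
    if importance_type not in ("split", "gain"):
        raise ValueError("importance_type must be 'split' or 'gain'")
    totals = [0.0] * num_features
    for feature_index, gain in splits:
        # "split" counts each use of the feature; "gain" sums the split gains.
        totals[feature_index] += 1.0 if importance_type == "split" else gain
    return totals


# Three splits in total: feature 0 is used twice, feature 2 once, feature 1 never.
splits = [(0, 5.0), (2, 1.5), (0, 2.5)]
print(feature_importance(splits, num_features=3, importance_type="split"))  # [2.0, 0.0, 1.0]
print(feature_importance(splits, num_features=3, importance_type="gain"))   # [7.5, 0.0, 1.5]
```

A wrapper around an existing library would normally call the wrapped library's own importance routine rather than recompute it, which is why the slide stresses understanding the logic of the wrapped library.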
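  8. CASE 3 https://github.com/scikit-learn/scikit-learn/pull/16625 • Implementation of Top k Accuracy • Looks at the top N predictions and counts how many are correct • Accuracy Score was already implemented • The first issue dates from 2017 • It had simply gone unimplemented for a long time • Even large OSS projects have plenty of open issues • Issues you run into in your own daily use are good candidates too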
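The metric in Case 3 can be illustrated with a few lines of plain Python: a prediction counts as correct when the true class is among the k highest-scoring classes. This is a minimal sketch under the assumption that classes are the integer indices 0..n-1 of each score row; the function name `top_k_accuracy` is illustrative, and tie handling may differ from scikit-learn's `top_k_accuracy_score`.

```python
def top_k_accuracy(y_true, y_score, k=2):
    """Fraction of samples whose true class index is among the k highest scores."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not y_true:
        raise ValueError("y_true must not be empty")
    hits = 0
    for true_class, scores in zip(y_true, y_score):
        # Class indices ordered by descending score; ties keep index order.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        hits += true_class in ranked[:k]
    return hits / len(y_true)


y_true = [0, 1, 2, 2]
y_score = [
    [0.5, 0.2, 0.2],
    [0.3, 0.4, 0.2],
    [0.2, 0.4, 0.3],
    [0.7, 0.2, 0.1],
]
print(top_k_accuracy(y_true, y_score, k=2))  # 0.75
print(top_k_accuracy(y_true, y_score, k=1))  # 0.5
```

With k=1 the metric reduces to ordinary accuracy, which matches the slide's note that Accuracy Score already existed before the top-k variant was added.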