Machine Learning (ML) Tutorial - ML as a tool for Scientists

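These are the materials from a session I organized to teach machine learning to the members of my lab. Please note that there is very little content.
For the background, see https://link.medium.com/Xq3gXhwVd5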

oyoroco

March 16, 2020

Transcript

  1. Machine Learning (ML) Tutorial
    - ML as a tool for Scientists
    200316
    @oyoroco

  2. Machine Learning (ML)
    Output
    ● Prediction
    Input
    ● Data

  3. Machine Learning (ML)
    Output
    ● Prediction
    ● Score
    Input
    ● Data (features)
    ● Task
    ○ Target
    ○ Regression or Classification
    ○ Metrics

  4. ML as a tool
    Output
    ● Prediction
    ● Score
    Input
    ● Data (features)
    ● Task
    ○ Target
    ○ Regression or Classification
    ○ Metrics
    Auto ML
    1. Define a good task
    2. Validate results
    3. Accelerate your work!

  5. Today’s goal
    1. Make a simple ML model
    2. Help you decide whether machine learning can be used in your work

  6. ML flow chart
    Ref. https://gihyo.jp/book/2019/978-4-297-10843-4
    1. Exploratory Data Analysis (EDA)
    2. Task & Metrics
    3. Feature engineering
    4. Modeling
    5. Model validation
    6. (Model tuning)
    7. (Ensemble)

  7. 1. Exploratory Data Analysis (EDA)
    ● Contents of the data
    ● Prediction target

  8. 2. Task & Metrics
    Task
    ● Regression
    ● Classification
    ○ Binary
    ○ Multi-class
    Evaluation metrics
    ● Regression
    ○ Root mean square error: RMSE
    ○ Mean absolute error: MAE
    ● Classification
    ○ Confusion matrix
    ○ Log loss
    ○ AUC
    Objective function
    ● Regression
    ○ RMSE
    ● Classification
    ○ Log loss (cf. gradient descent; differentiable)
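
The evaluation metrics listed on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the session: the function names and the toy values are assumptions made up for this example, and in practice a library such as scikit-learn would normally be used instead.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy values, invented for illustration only
y_reg_true = [3.0, 5.0, 2.0]
y_reg_pred = [2.0, 5.0, 4.0]
print("RMSE:", rmse(y_reg_true, y_reg_pred))
print("MAE:", mae(y_reg_true, y_reg_pred))

y_cls_true = [0, 0, 1, 1]
y_cls_prob = [0.1, 0.4, 0.35, 0.8]
print("Log loss:", log_loss(y_cls_true, y_cls_prob))
print("AUC:", auc(y_cls_true, y_cls_prob))
```

RMSE and log loss are smooth functions of the predictions, which is why they also appear as objective functions, while AUC depends only on the ranking of the scores.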

  9. 3. Feature engineering
    ● Missing values (NaN)
    ● Standardization (for regularization)
    ● Categorical features
    ○ One-hot encoding
    ○ Label encoding
    ● Dimension reduction
    ○ Principal component analysis: PCA
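
The feature engineering steps above can be sketched with NumPy alone. This is only an illustrative sketch: the toy arrays are invented, mean imputation is one assumed way of handling missing values, and PCA is computed here through an SVD of the standardized data rather than through a dedicated library class.

```python
import numpy as np

# Toy numeric features with one missing value (values invented for illustration)
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [3.0, 30.0],
              [4.0, 40.0]])

# Missing values (NaN): replace each NaN with the mean of its column
col_mean = np.nanmean(X, axis=0)
X_filled = np.where(np.isnan(X), col_mean, X)

# Standardization: zero mean and unit variance per column
X_std = (X_filled - X_filled.mean(axis=0)) / X_filled.std(axis=0)

# Categorical feature (toy example)
colors = np.array(["red", "green", "red", "blue"])
categories, labels = np.unique(colors, return_inverse=True)  # label encoding
one_hot = np.eye(len(categories))[labels]                    # one-hot encoding

# Dimension reduction: PCA via SVD, keeping the first principal component
_, _, Vt = np.linalg.svd(X_std, full_matrices=False)
X_pca = X_std @ Vt[:1].T

print(X_filled)
print(labels)
print(one_hot)
print(X_pca.shape)
```

Label encoding maps each category to an integer, whereas one-hot encoding creates one 0/1 column per category; which one is preferable depends on the model.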

  10. 4. Modeling
    ● Choose model (& hyperparameters)
    ■ model = Model(params)
    ● Training
    ○ Data (Features), Target
    ■ model.fit(train_x, train_y)
    ● Prediction
    ○ Test data
    ■ pred = model.predict(test_x)
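
The three steps above map directly onto the scikit-learn estimator interface. The following is a minimal sketch assuming scikit-learn is installed; the synthetic regression data and the choice of a random forest with 50 trees are assumptions made for illustration, not the setup used in the session.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data standing in for real features and target (illustration only)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Choose model (& hyperparameters)
model = RandomForestRegressor(n_estimators=50, random_state=0)

# Training: data (features) and target
model.fit(train_x, train_y)

# Prediction on test data
pred = model.predict(test_x)
print(pred.shape)
```

Because most scikit-learn estimators share the same `fit` and `predict` methods, swapping in another model usually means changing only the line that constructs `model`.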

  11. 4. Modeling: Model
    ● Linear model
    ● k-nearest neighbor algorithm: kNN
    ● Random forest
    ○ Decision tree + bagging
    ● Neural network: NN
    ● Gradient boosting decision tree: GBDT
    ○ Decision tree + Gradient boosting

  12. 5. Model validation
    [Figure: split of all data into training data, validation data, and test data]
    https://scikit-learn.org/stable/modules/cross_validation.html
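
One possible way to set up this kind of validation with scikit-learn is sketched below. The synthetic data, the 80/20 hold-out split, and the 5-fold stratified cross-validation are assumptions chosen for illustration; the session may have used a different scheme.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic binary classification data (illustration only)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hold out test data that is never used while developing the model
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Cross-validation: each fold of the training data serves once as validation data
model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, train_x, train_y, cv=cv, scoring="roc_auc")
print("Validation AUC per fold:", scores)

# Final check on the held-out test data
model.fit(train_x, train_y)
test_auc = roc_auc_score(test_y, model.predict_proba(test_x)[:, 1])
print("Test AUC:", test_auc)
```

Comparing the per-fold validation scores with the test score gives a rough sense of how stable the estimate is.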

  13. Example:
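    ● On the day, from this point on I gave a demo based on real experimental data, running code live with the slides and a notebook.
    ● The task was binary classification, and the evaluation metric was AUC.
    ● Number of samples: ~1600; number of features: ~200
    ● The models were logistic regression and LGBM
    ● The flow was: EDA -> task definition (+ explanation of AUC) -> modeling (explanation of logistic regression) -> train/predict; then, based on the results, another round of EDA followed by feature engineering -> experiments.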
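
A rough stand-in for the modeling part of this demo is sketched below. The experimental data is not included in the deck, so the example generates synthetic data of a similar size (1600 samples, 200 features); all parameter values are assumptions. Only logistic regression is shown. A gradient boosting model such as `lightgbm.LGBMClassifier` could be substituted for it, assuming the lightgbm package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: ~1600 samples, ~200 features
X, y = make_classification(
    n_samples=1600,
    n_features=200,
    n_informative=20,
    n_clusters_per_class=1,
    random_state=0,
)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize the features, then fit logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(train_x, train_y)

# AUC is computed from predicted probabilities of the positive class
prob = model.predict_proba(test_x)[:, 1]
auc_value = roc_auc_score(test_y, prob)
print(f"AUC: {auc_value:.3f}")
```

AUC needs scores or probabilities rather than hard class labels, which is why `predict_proba` is used here instead of `predict`.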

  14. ML as a tool
    1. Define a good task
    2. Validate results
    3. Accelerate your work!
