Machine Learning (ML) Tutorial - ML as a tool for Scientists

Machine Learning (ML) Tutorial - ML as a tool for Scientists
200316 @oyoroco

Machine Learning (ML)
Output
• Prediction
Input
• Data

Machine Learning (ML)
Output
• Prediction
• Score
Input
• Data (features)
• Task
  ◦ Target
  ◦ Regression or Classification
  ◦ Metrics

ML as a tool
Output
• Prediction
• Score
Input
• Data (features)
• Task
  ◦ Target
  ◦ Regression or Classification
  ◦ Metrics
Auto ML
1. Define a good task
2. Validate results
3. Accelerate your work!

Today’s goal
1. Make a simple ML model
2. Help you decide whether machine learning can be used in your work

ML flow chart
1. Exploratory Data Analysis (EDA)
2. Task & Metrics
3. Feature engineering
4. Modeling
5. Model validation
6. (Model tuning)
7. (Ensemble)
Ref. https://gihyo.jp/book/2019/978-4-297-10843-4

1. Exploratory Data Analysis (EDA)
• Contents of the data
• Prediction target
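
A first pass at EDA can be as simple as counting rows, missing values per column, and how the prediction target is distributed. The sketch below uses only the Python standard library on a tiny made-up table; the column names (`temp`, `ph`, `label`), the values, and the helper `summarize` are hypothetical and are not the data or code from the talk.

```python
from collections import Counter

# Hypothetical toy table: one dict per sample; None marks a missing value.
rows = [
    {"temp": 20.1, "ph": 7.0, "label": 1},
    {"temp": 22.4, "ph": None, "label": 0},
    {"temp": None, "ph": 6.5, "label": 0},
    {"temp": 19.8, "ph": 7.2, "label": 0},
]


def summarize(rows, target):
    """Return row count, missing-value count per column, and target counts."""
    columns = list(rows[0].keys())
    missing = {c: sum(1 for r in rows if r[c] is None) for c in columns}
    target_counts = Counter(r[target] for r in rows)
    return {
        "n_rows": len(rows),
        "missing": missing,
        "target_counts": dict(target_counts),
    }


summary = summarize(rows, target="label")
print(summary)
```

An imbalanced target (here three 0s against one 1) is worth noticing early, because it affects which evaluation metric is meaningful.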

2. Task & Metrics
Task
• Regression
• Classification
  ◦ Binary
  ◦ Multi-class
Evaluation metrics
• Regression
  ◦ Root mean square error: RMSE
  ◦ Mean absolute error: MAE
• Classification
  ◦ Confusion matrix
  ◦ Log loss
  ◦ AUC
Objective function
• Regression
  ◦ RMSE
• Classification
  ◦ Log loss
cf. Gradient descent (the objective function must be differentiable)
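
The metrics named above are short enough to write out by hand, which makes their definitions concrete. This is a minimal standard-library sketch, not the code used in the talk: RMSE and MAE (mean absolute error) for regression, and log loss, AUC, and confusion-matrix counts for binary classification. The function names and the toy numbers are assumptions made for illustration; in practice a library such as scikit-learn provides tested versions of these metrics.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_counts(y_true, scores, threshold=0.5):
    """Count TN, FP, FN, TP after thresholding the scores."""
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for t, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        key = ("t" if predicted == t else "f") + ("p" if predicted == 1 else "n")
        counts[key] += 1
    return counts


# Hypothetical toy values.
reg_true, reg_pred = [3.0, 5.0, 2.0], [2.0, 5.0, 4.0]
cls_true, cls_score = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]

print("RMSE:", rmse(reg_true, reg_pred))
print("MAE:", mae(reg_true, reg_pred))            # 1.0
print("Log loss:", log_loss(cls_true, cls_score))
print("AUC:", auc(cls_true, cls_score))           # 0.75
print("Confusion:", confusion_counts(cls_true, cls_score))
```

AUC depends only on how the scores rank the samples, while log loss depends on the probability values themselves. Log loss is differentiable and AUC is not, which is why log loss can serve as an objective function for gradient descent.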

3. Feature engineering
• Missing values (NaN)
• Standardization (for regularization)
• Categorical features
  ◦ One-hot encoding
  ◦ Label encoding
• Dimension reduction
  ◦ Principal component analysis: PCA
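
Most of the steps listed above are simple column transforms. The following standard-library sketch shows one possible version of each: mean imputation for missing values, standardization, label encoding, and one-hot encoding. The function names and toy values are assumptions for illustration, mean imputation is only one of several ways to handle missing values, and PCA is left out because it needs linear algebra beyond a short example.

```python
import math


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def label_encode(categories):
    """Map each category to an integer, following sorted category order."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def one_hot_encode(categories):
    """Turn each category into a 0/1 vector with one column per level."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories], levels


# Hypothetical toy columns.
temps = [1.0, None, 3.0]
colors = ["red", "blue", "red", "green"]

filled = impute_mean(temps)            # [1.0, 2.0, 3.0]
scaled = standardize(filled)
labels, mapping = label_encode(colors)  # [2, 0, 2, 1]
onehot, levels = one_hot_encode(colors)

print(filled, scaled)
print(labels, mapping)
print(onehot, levels)
```

When a model is validated on held-out data, the mean, standard deviation, and category mapping should be computed from the training data only and then applied to the other splits.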

4. Modeling
• Choose model (& hyperparameters)
  ▪ model = Model(params)
• Training
  ◦ Data (features), Target
    ▪ model.fit(train_x, train_y)
• Prediction
  ◦ Test data
    ▪ pred = model.predict(test_x)
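
The construct / fit / predict pattern above can be made concrete with a small model written from scratch. The sketch below is a hypothetical `TinyLogisticRegression` class (not a library API and not the code from the talk) that minimizes log loss with batch gradient descent. The hyperparameters `lr` and `n_iter` and the toy one-feature data are assumptions chosen for illustration.

```python
import math


class TinyLogisticRegression:
    """Hypothetical minimal model: logistic regression fit by batch gradient descent."""

    def __init__(self, lr=0.1, n_iter=2000):
        self.lr = lr          # learning rate (hyperparameter)
        self.n_iter = n_iter  # number of gradient steps (hyperparameter)
        self.weights = None
        self.bias = 0.0

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def predict_proba(self, x):
        """Predicted probability of class 1 for each row."""
        return [
            self._sigmoid(sum(w * v for w, v in zip(self.weights, row)) + self.bias)
            for row in x
        ]

    def fit(self, train_x, train_y):
        n, d = len(train_x), len(train_x[0])
        self.weights = [0.0] * d
        self.bias = 0.0
        for _ in range(self.n_iter):
            probs = self.predict_proba(train_x)
            # Gradient of the mean log loss w.r.t. the linear output is (p - y).
            errors = [p - y for p, y in zip(probs, train_y)]
            for j in range(d):
                grad = sum(e * row[j] for e, row in zip(errors, train_x)) / n
                self.weights[j] -= self.lr * grad
            self.bias -= self.lr * sum(errors) / n
        return self

    def predict(self, x):
        """Class labels obtained by thresholding the probability at 0.5."""
        return [1 if p >= 0.5 else 0 for p in self.predict_proba(x)]


# Hypothetical toy data: one feature, label is 1 for larger values.
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0, 0, 1, 1]
test_x = [[0.5], [2.5]]

model = TinyLogisticRegression(lr=0.1, n_iter=2000)
model.fit(train_x, train_y)
pred = model.predict(test_x)
print(pred)
```

Library models follow the same three calls, so swapping in a different model usually changes only the line that constructs it.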

4. Modeling: Model
• Linear model
• k-nearest neighbor algorithm: kNN
• Random forest
  ◦ Decision tree + bagging
• Neural network: NN
• Gradient boosting decision tree: GBDT
  ◦ Decision tree + Gradient boosting
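
Of the models listed above, kNN is small enough to write out in full, and doing so shows that very different algorithms can share the same fit / predict interface. This is a minimal standard-library sketch with a hypothetical class name `KNNClassifier` and made-up 2-D points. It uses Euclidean distance and a majority vote, which is one common choice among several.

```python
import math
from collections import Counter


class KNNClassifier:
    """Hypothetical minimal k-nearest-neighbor classifier (Euclidean distance)."""

    def __init__(self, k=3):
        self.k = k  # number of neighbors that vote (hyperparameter)

    def fit(self, train_x, train_y):
        # kNN has no training step beyond storing the data.
        self.train_x = list(train_x)
        self.train_y = list(train_y)
        return self

    def predict(self, test_x):
        preds = []
        for query in test_x:
            # Sort training points by distance to the query point.
            neighbors = sorted(
                (math.dist(query, x), y) for x, y in zip(self.train_x, self.train_y)
            )
            votes = Counter(y for _, y in neighbors[: self.k])
            preds.append(votes.most_common(1)[0][0])
        return preds


# Hypothetical toy data: two well-separated clusters.
knn_train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
knn_train_y = [0, 0, 0, 1, 1, 1]

knn = KNNClassifier(k=3).fit(knn_train_x, knn_train_y)
knn_pred = knn.predict([[0.5, 0.5], [5.5, 5.5]])
print(knn_pred)  # [0, 1]
```

Because kNN relies on distances, features on very different scales should be standardized first, otherwise the largest-scale feature dominates the vote.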

5. Model validation
[Figure labels: All data, Training data, Validation data, Test data]
Ref. https://scikit-learn.org/stable/modules/cross_validation.html
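
One common way to obtain training and validation data from the same pool is k-fold cross-validation, the topic of the scikit-learn page linked above. The sketch below is a standard-library illustration of the index bookkeeping only; the function name `k_fold_indices` and its parameters are assumptions, and scikit-learn offers a tested equivalent in `sklearn.model_selection.KFold`.

```python
import random


def k_fold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs; each sample is in validation exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle reproducibly
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i in range(n_splits):
        valid_idx = folds[i]
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, valid_idx


splits = list(k_fold_indices(n_samples=10, n_splits=5))
for train_idx, valid_idx in splits:
    # A model would be fit on train_idx and scored on valid_idx here.
    print(len(train_idx), len(valid_idx))
```

The test data stays outside this loop entirely. It is used once, at the end, to check the model that was chosen using the validation scores.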

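Example:
• On the day of the talk, from this point on I gave a live demo based on real experimental data, running the code in a notebook alongside the slides.
• The task was binary classification, and the evaluation metric was AUC.
• Number of samples ~1600, number of features ~200
• The models were logistic regression and LGBM
• The flow was: EDA -> task definition (+ explanation of AUC) -> modeling (explanation of logistic regression) -> train/predict; then, based on the results, another round of EDA followed by feature engineering -> experiments.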

ML as a tool
1. Define a good task
2. Validate results
3. Accelerate your work!

