
Reading Group: Kaggleで勝つデータ分析の技術 (Data Analysis Techniques to Win Kaggle), Chapter 2

Yust0724
April 15, 2020


Summary notes for the reading group



Transcript

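  1. What is model evaluation? (slide 4)

    A model is evaluated in two ways, and the respective measures are called the "objective function" and the "evaluation metric".

    Diagram labels: train, fit, model, test, predict, evaluate, Leaderboard.

    fit: compute f(x), where f(x) is the objective function; update the parameters; loop until f(x) is minimized.
    evaluate: compute f(x), where f(x) is the evaluation metric …

    The objective function depends on the model; the evaluation metric depends on the competition. (p.87)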
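The fit loop and the separate evaluation step on this slide can be sketched as follows. This is a minimal illustration, not code from the book: the toy data, the linear model, the choice of mean squared error as the objective, and mean absolute error as the stand-in competition metric are all assumptions made for the example.

```python
# Hypothetical toy data following y = 2x + 1 (an assumption for illustration).
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]
test_x = [4.0, 5.0]
test_y = [9.0, 11.0]


def objective(w, b, xs, ys):
    """Objective function: mean squared error, minimized while fitting."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def evaluation_metric(preds, ys):
    """Evaluation metric: mean absolute error, standing in for a competition metric."""
    return sum(abs(p - y) for p, y in zip(preds, ys)) / len(ys)


def fit(xs, ys, lr=0.05, max_iter=10000, tol=1e-12):
    """Gradient descent: compute f(x), update parameters, loop until f(x) stops improving."""
    w, b = 0.0, 0.0
    n = len(xs)
    prev = objective(w, b, xs, ys)
    for _ in range(max_iter):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
        cur = objective(w, b, xs, ys)
        if abs(prev - cur) < tol:  # objective no longer decreasing
            break
        prev = cur
    return w, b


w, b = fit(train_x, train_y)                 # fit: minimizes the objective
preds = [w * x + b for x in test_x]          # predict
score = evaluation_metric(preds, test_y)     # evaluate: uses the metric
print(f"w={w:.3f}, b={b:.3f}, MAE on test={score:.5f}")
```

The point of the sketch is only that the quantity driving the parameter updates and the quantity reported at evaluation time are two different functions.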
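  2. Concrete functions (slide 5)

    The functions used differ depending on the type of task.

    Objective functions:
    ・Regression: RMSE
    ・Binary classification: logloss
    ・Multi-class classification: multi-class logloss

    Evaluation metrics:
    ・Regression: RMSE, RMSLE, MAE, R2 (coefficient of determination)
    ・Binary classification: accuracy, error rate, F1-score, Fβ-score, logloss, AUC
    ・Multi-class classification: multi-class accuracy, mean-F1, macro-F1, micro-F1, quadratic weighted kappa (QWK), multi-class logloss

    (p.62~83)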
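  3. Relationship between the objective function and the evaluation metric (slide 6)

    Broadly, there are the following four patterns.

    Pattern 1: The same function as the evaluation metric can be set as the objective function.
    ・Objective function: RMSE, logloss, multi-class logloss
    ・Evaluation metric (used as is): RMSE, logloss, multi-class logloss

    Pattern 2: The evaluation metric can be converted mathematically into an objective function.
    ・Objective function: RMSE on the log-transformed target
    ・Evaluation metric (after the inverse log transform): RMSLE

    Pattern 3: Classification tasks in which the evaluation metric scores 0/1 predictions.
    ・Objective function: RMSE, logloss, multi-class logloss
    ・Evaluation metric (after classifying by threshold): accuracy, error rate, F1-score, Fβ-score
    ・Example with threshold 0.60: 0.23, 0.88, 0.67, 0.12 → 0, 1, 1, 0

    Pattern 4: A function similar to the evaluation metric can be set as the objective function.
    ・Objective function: Fair, a continuous-function approximation of QWK
    ・Evaluation metric (used almost as is): MAE, QWK

    The slide marks two of the patterns as "explained in detail". (p.90)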
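Two of the patterns on this slide can be checked numerically. The sketch below assumes the usual definition of RMSLE with log(1 + y), uses made-up target and prediction values, and treats a probability equal to the threshold as class 1 (the slide does not say which side the boundary falls on). The function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + y)."""
    n = len(y_true)
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )


def classify_by_threshold(probs, threshold):
    """Turn predicted probabilities into 0/1 labels (>= threshold counts as 1)."""
    return [1 if p >= threshold else 0 for p in probs]


# Log transform: a model trained with RMSE on log1p(target) is scored with
# RMSLE once its outputs are mapped back with expm1 (the inverse log transform).
y_true = [10.0, 100.0, 1000.0]               # hypothetical targets
log_target = [math.log1p(t) for t in y_true]
log_pred = [2.5, 4.5, 7.0]                   # hypothetical model outputs in log space
y_pred = [math.expm1(p) for p in log_pred]   # inverse log transform
print(rmse(log_target, log_pred), rmsle(y_true, y_pred))  # the two values agree

# Threshold classification: the slide's example values with threshold 0.60.
print(classify_by_threshold([0.23, 0.88, 0.67, 0.12], 0.60))  # [0, 1, 1, 0]
```

The first check shows why minimizing RMSE in log space is equivalent to optimizing RMSLE in the original space; the second reproduces the 0/1 labels shown on the slide.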
  4. How is the threshold derived? (slide 7)

    Find the optimized thresholds on train, then use them to classify test.

    threshold:
    0.00 ~ 0.80 → 0
    0.81 ~ 1.80 → 1
    1.81 ~ 2.50 → 2
    2.51 ~ 3.00 → 3

    train: LGBM → OptimizedRounder
    prob_target   target
    0.45          1
    1.12          2
    2.90          2
    2.04          0

    test: LGBM → OptimizedRounder
    target 0.45, 1.12, 2.90, 2.04 → target 0, 1, 3, 2

    OptimizedRounder:
    ・derives the optimal thresholds
    ・performs classification from the regression values

    Arai-san has put the code together (*).
    (*) https://qiita.com/kaggle_master-arai-san/items/d59b2fb7142ec7e270a5

    (p.91,100,101)
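The train-then-apply flow on this slide can be sketched in plain Python. This is not the OptimizedRounder code from the linked article: it swaps in an exhaustive grid search over cut points and scores candidates with a hand-written quadratic weighted kappa. The training values, the grid, and all function names are assumptions made for illustration.

```python
from bisect import bisect_left
from itertools import combinations


def apply_thresholds(values, cuts):
    """Map regression outputs to classes; a value equal to a cut stays in the lower class."""
    return [bisect_left(cuts, v) for v in values]


def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """QWK = 1 - sum(w * observed) / sum(w * expected), with w = (i - j)^2."""
    n = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2
            num += weight * observed[i][j]
            den += weight * hist_true[i] * hist_pred[j] / n
    # Degenerate case (everything in one class): kappa is undefined, treated as 0 here.
    return 1.0 - num / den if den > 0 else 0.0


def find_thresholds(preds, targets, n_classes, grid):
    """Try every increasing tuple of cut points from the grid; keep the best QWK."""
    best_cuts, best_score = None, float("-inf")
    for cuts in combinations(grid, n_classes - 1):
        score = quadratic_weighted_kappa(targets, apply_thresholds(preds, cuts), n_classes)
        if score > best_score:
            best_cuts, best_score = list(cuts), score
    return best_cuts, best_score


# Hypothetical training predictions (e.g. regression outputs of a model) and labels.
train_pred = [0.2, 0.5, 0.7, 1.0, 1.5, 1.7, 1.9, 2.3, 2.4, 2.6, 2.8, 3.0]
train_target = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
grid = [i / 10 for i in range(31)]  # candidate cut points 0.0, 0.1, ..., 3.0

cuts, train_qwk = find_thresholds(train_pred, train_target, n_classes=4, grid=grid)

# Apply thresholds to the test values shown on the slide.
test_pred = [0.45, 1.12, 2.90, 2.04]
print(apply_thresholds(test_pred, [0.8, 1.8, 2.5]))  # slide's thresholds -> [0, 1, 3, 2]
print(cuts, train_qwk, apply_thresholds(test_pred, cuts))
```

A grid search keeps the example dependency-free but scales poorly with the number of classes and grid resolution; a numerical optimizer would be the more practical choice on real data.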