Reading group: Kaggleで勝つデータ分析の技術 (Data Analysis Techniques to Win Kaggle), Chapter 2
Yust0724
April 15, 2020
Summary notes for the reading group.
Transcript
Chapter 2: Tasks and Evaluation Metrics. 2020/04/15, Yu Sato (@Yust). Reading group, session 4.
Self-introduction
▪ Joined Kaggle around February 2019.
▪ Likes: [illegible], Kaggle, sauna, eel
▪ Curious about: [illegible]
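Kaggleで勝つデータ分析の技術: table of contents
Chapter 1: What is a data analysis competition?
Chapter 2: Tasks and evaluation metrics (← we are here)
Chapter 3: Creating features
Chapter 4: Building models
Chapter 5: Evaluating models
Chapter 6: Tuning models
Chapter 7: Ensembles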
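What is model evaluation?
There are two kinds of model evaluation, and the measure used in each is called the "objective function" and the "evaluation metric" respectively (p.87).
Diagram: train → fit → model → predict (test) → evaluate → Leaderboard
・fit: compute f(x), where f(x) is the objective function; update the parameters; loop until f(x) is minimized.
・evaluate: compute f(x), where f(x) is the evaluation metric.
The objective function depends on the model. The evaluation metric depends on the competition.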
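The split between the two functions can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy data, the one-parameter model `y = w * x`, the choice of mean squared error as the objective, and mean absolute error as the evaluation metric are all made up here and do not come from the slides. A fixed number of gradient steps also stands in for "loop until f(x) is minimized".

```python
# Toy data, made up for illustration: y is roughly 2 * x.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]
test_x = [5.0, 6.0]
test_y = [10.0, 12.0]


def objective(w, xs, ys):
    """Objective function: mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def objective_gradient(w, xs, ys):
    """Derivative of the objective with respect to the parameter w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, learning_rate=0.01, n_steps=2000):
    """Update the parameter repeatedly so that the objective decreases."""
    w = 0.0
    for _ in range(n_steps):
        w -= learning_rate * objective_gradient(w, xs, ys)
    return w


def predict(w, xs):
    return [w * x for x in xs]


def evaluation_metric(y_true, y_pred):
    """Evaluation metric: mean absolute error (chosen by the competition)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


w = fit(train_x, train_y)                               # driven by the objective
score = evaluation_metric(test_y, predict(w, test_x))   # what a leaderboard would report
print(f"w = {w:.4f}, objective = {objective(w, train_x, train_y):.4f}, MAE = {score:.4f}")
```

The model never sees the evaluation metric during `fit`; it only minimizes the objective, which is why the two can differ.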
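Concrete functions
The functions used differ depending on the type of task (p.62~83).
Objective functions
・Regression: RMSE
・Binary classification: logloss
・Multi-class classification: multi-class logloss
Evaluation metrics
・Regression: RMSE, RMSLE, MAE, R2 (coefficient of determination)
・Binary classification: accuracy, error rate, F1-score, Fβ-score, logloss, AUC
・Multi-class classification: multi-class accuracy, mean-F1, macro-F1, micro-F1, quadratic weighted kappa (QWK), multi-class logloss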
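A few of the metrics listed above can be written out directly. The following is a minimal sketch using only the standard library; the function names and the clipping constant `eps` are choices made for this example, not something specified in the slides.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + y)."""
    return math.sqrt(
        sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred))
        / len(y_true)
    )


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_label):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_label)) / len(y_true)


def logloss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return -total / len(y_true)


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))
print(logloss([1, 0], [0.5, 0.5]))
```

Note that accuracy takes hard labels while logloss takes probabilities, which is the distinction the later slides on thresholds build on.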
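Relationship between the objective function and the evaluation metric
Broadly, there are the following four patterns (p.90).
1. The same function as the evaluation metric can be set as the objective function.
   Evaluation metric: RMSE, logloss, multi-class logloss
   Objective function: RMSE, logloss, multi-class logloss (used as is)
2. The evaluation metric can be mathematically converted into an objective function.
   Evaluation metric: RMSLE
   Objective function: RMSE on the log-transformed target; predictions are converted back with the inverse log transform
3. Classification tasks in which the evaluation metric is computed on predicted 0/1 labels (explained in detail).
   Evaluation metric: accuracy, error rate, F1-score, Fβ-score
   Objective function: RMSE, logloss, multi-class logloss; predictions are then classified with a threshold, e.g. 0.23, 0.88, 0.67, 0.12 → 0, 1, 1, 0 with a threshold of 0.60
4. A function similar to the evaluation metric can be set as the objective function (explained in detail).
   Evaluation metric: MAE, QWK
   Objective function: Fair, or a continuous-function approximation of QWK (used almost as is)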
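Patterns 2 and 3 are easy to check numerically. The sketch below assumes the log transform is `log(1 + y)` and uses made-up target values and hypothetical model outputs (`log_pred`); only the probabilities and the 0.60 threshold are taken from the slide.

```python
import math


def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def rmsle(y_true, y_pred):
    return rmse([math.log1p(t) for t in y_true], [math.log1p(p) for p in y_pred])


# Pattern 2: train with RMSE on log1p(y), then invert predictions with expm1.
y_true = [10.0, 100.0, 1000.0]                  # made-up targets
log_true = [math.log1p(y) for y in y_true]      # what the model is trained on
log_pred = [2.5, 4.5, 7.0]                      # hypothetical model outputs in log space
y_pred = [math.expm1(p) for p in log_pred]      # inverse log transform

# RMSE in log space equals RMSLE in the original space (up to rounding).
print(rmse(log_true, log_pred), rmsle(y_true, y_pred))

# Pattern 3: turn predicted probabilities into 0/1 labels with a threshold.
probs = [0.23, 0.88, 0.67, 0.12]
threshold = 0.60
labels = [1 if p >= threshold else 0 for p in probs]
print(labels)
```

Minimizing RMSE on the transformed target is therefore the same as minimizing RMSLE on the original target, which is why no custom objective is needed for pattern 2.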
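How are the thresholds derived?
Find the optimized thresholds on train, then use them to classify test (p.91, 100, 101).
Thresholds: 0.00 ~ 0.80 → 0, 0.81 ~ 1.80 → 1, 1.81 ~ 2.50 → 2, 2.51 ~ 3.00 → 3
train: LGBM → OptimizedRounder
  prob_target  target
  0.45         1
  1.12         2
  2.90         2
  2.04         0
test: LGBM → OptimizedRounder
  predicted value: 0.45, 1.12, 2.90, 2.04
  target after thresholding: 0, 1, 3, 2
OptimizedRounder carries out two steps:
・deriving the optimal thresholds
・converting the regression output into classes
Arai-san has compiled the code (*).
(*) https://qiita.com/kaggle_master-arai-san/items/d59b2fb7142ec7e270a5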
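The two steps can be illustrated with a small self-contained sketch. It is not the code from the linked article: a simple coordinate-wise grid search is swapped in for whatever optimizer that code uses, and the helper names (`to_classes`, `quadratic_weighted_kappa`, `optimize_thresholds`) and the training values are hypothetical. Only the thresholds 0.8, 1.8, 2.5 and the four test predictions come from the slide.

```python
def to_classes(values, thresholds):
    """Class index = number of thresholds that the value exceeds."""
    return [sum(v > t for t in thresholds) for v in values]


def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """QWK computed from the observed and expected confusion matrices."""
    n = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * hist_true[i] * hist_pred[j] / n
    # Kappa is undefined when den == 0; this sketch returns 0.0 in that case.
    return 1.0 - num / den if den > 0 else 0.0


def optimize_thresholds(values, labels, n_classes, grid_step=0.05, n_sweeps=3):
    """Coordinate-wise grid search for thresholds that maximize QWK on train."""
    thresholds = [k + 0.5 for k in range(n_classes - 1)]  # naive rounding as a start
    best = quadratic_weighted_kappa(labels, to_classes(values, thresholds), n_classes)
    lo, hi = min(values), max(values)
    candidates = [lo + i * grid_step for i in range(int((hi - lo) / grid_step) + 1)]
    for _ in range(n_sweeps):
        for k in range(len(thresholds)):
            for c in candidates:
                trial = thresholds[:k] + [c] + thresholds[k + 1:]
                score = quadratic_weighted_kappa(labels, to_classes(values, trial), n_classes)
                if score > best:
                    best, thresholds = score, trial
    return sorted(thresholds), best


# Made-up regression outputs and true classes for the training set.
train_values = [0.2, 0.5, 0.7, 1.0, 1.3, 1.7, 2.0, 2.3, 2.6, 2.9]
train_labels = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
thresholds, train_qwk = optimize_thresholds(train_values, train_labels, n_classes=4)
print(thresholds, train_qwk)

# Applying the thresholds shown on the slide to the slide's test predictions.
print(to_classes([0.45, 1.12, 2.90, 2.04], [0.8, 1.8, 2.5]))
```

In practice the thresholds found on train (ideally on out-of-fold predictions) are reused unchanged on test, as the slide describes.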
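What is a similar function?
Understand the characteristics of each function and choose a function that approximates it.
・Approximation of QWK
・Approximation of MAE
(p.101~103; p.103, Fig. 2.23)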
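As one example of a function that approximates MAE, the Fair loss mentioned earlier can be sketched as follows. The formula used here, c² (|x|/c − ln(1 + |x|/c)) for a residual x, is the commonly cited definition; the parameter name `c` and its default value of 1.0 are assumptions for this example. Unlike the absolute error, it has a well-defined gradient and second derivative at zero, which gradient-boosting libraries need.

```python
import math


def fair_loss(residual, c=1.0):
    """Fair loss: behaves like x**2 / 2 near zero and grows roughly like c * |x| far from it."""
    a = abs(residual) / c
    return c * c * (a - math.log1p(a))


def fair_gradient(residual, c=1.0):
    """First derivative with respect to the residual; bounded in (-c, c)."""
    return c * residual / (abs(residual) + c)


def fair_hessian(residual, c=1.0):
    """Second derivative with respect to the residual; always positive."""
    return c * c / (abs(residual) + c) ** 2


for x in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(x, abs(x), round(fair_loss(x), 4), round(fair_gradient(x), 4))
```

The bounded gradient is what makes the loss behave like MAE for large errors, while the smooth region around zero keeps second-order optimization stable.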
Thank you. ▪ Kaggle: @Yust ▪ Twitter: @yust_kaggle ▪ e-mail:
[email protected]