2019 Data Science Bowl competition solution
This deck describes what our team did in the 2019 Data Science Bowl competition, held from October 2019 to January 2020.
Yust0724
May 20, 2020
Transcript
DSB Competition Participation Report
2020/05/20 Yu Sato @Yust, Kaggle LT meetup
Self-introduction
▪ Joined Kaggle around February 2019.
▪ Likes: [illegible], Kaggle, sauna, eel
▪ Curious about: Hotcook
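What is the 2019 Data Science Bowl competition?
From the logs of a children's app, predict how well the child will answer the assessment tasks it is given.
Data (train, test) columns: installation_id, title, event_code, event_data, accuracy_group
Figure (flattened in extraction): sample logs for installation_id 0006a69f and 015776b4 with the titles Cart Balancer, Bird Measurer, and Mushroom Sorter, event_code 4010, attempt results NG/OK, and accuracy_group values 2, 0, 3; the accuracy_group of the last title is marked "?".
Target variable (accuracy_group):
3: OK on the first attempt
2: OK on the second attempt
1: OK on the third attempt or later
0: all NG
Goal: predict the accuracy_group of the last title.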
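The target definition above can be sketched as a small function. This is a minimal illustration that assumes each assessment is available as an ordered list of pass/fail attempts; the function name `accuracy_group` and the boolean encoding are assumptions for the example, not the competition's data format.

```python
def accuracy_group(attempt_results):
    """Map the ordered attempts of one assessment to an accuracy_group.

    attempt_results: list of booleans, True meaning the attempt was OK.
    """
    if True not in attempt_results:
        return 0  # all NG: the assessment was never solved
    attempts_needed = attempt_results.index(True) + 1
    if attempts_needed == 1:
        return 3  # OK on the first attempt
    if attempts_needed == 2:
        return 2  # OK on the second attempt
    return 1      # OK on the third attempt or later


first_try = accuracy_group([True])
second_try = accuracy_group([False, True])
fourth_try = accuracy_group([False, False, False, True])
never = accuracy_group([False, False, False, False])
```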
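Strategy
With only two weeks available, we set a strategy for proceeding efficiently.
・FE
 - Use the features from public kernels. Check that none of them leak.
 - Select effective features with null importance.
・model
 - Ensemble LGBM models by seed averaging.
 - Optimize the thresholds for qwk.
・final submit
 - Select one submission with the highest publicLB and one with the highest cv.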
My Team Solution
Pipeline: train, test → Feature Engineering → Feature Selection → LGBM (3 models) → Threshold (one per model) → Ensemble → submit
・Feature Engineering: kernel + my team
・Feature Selection: Null importance
・LGBM: KFold, Seed Averaging
・Threshold: Optimized Rounder, Optimized coefficients
・submit: public:0.532 / private:0.545
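Seed averaging, one step of the pipeline, can be sketched as follows. To stay self-contained, the sketch replaces LightGBM with a toy seeded model that predicts the mean of a bootstrap sample; `fit_bootstrap_mean`, `seed_average`, and the data are hypothetical names and values for illustration, not the team's code.

```python
import random
from statistics import mean


def fit_bootstrap_mean(y_train, seed):
    """Toy stand-in for a seeded model: predicts the mean of a bootstrap sample."""
    rng = random.Random(seed)
    sample = [rng.choice(y_train) for _ in y_train]
    level = mean(sample)
    return lambda rows: [level for _ in rows]


def seed_average(fit, y_train, test_rows, seeds):
    """Train one model per seed and average the predictions row by row."""
    per_seed = [fit(y_train, seed)(test_rows) for seed in seeds]
    return [mean(column) for column in zip(*per_seed)]


y_train = [0, 1, 3, 2, 0, 1, 3, 3]   # made-up targets
test_rows = ["row_a", "row_b", "row_c"]
averaged = seed_average(fit_bootstrap_mean, y_train, test_rows, seeds=[0, 1, 2, 3, 4])
```

Each seed changes only the randomness inside training, so averaging the per-seed predictions reduces the variance that comes from that randomness.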
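FeatureEngineering (kernel)
Checked whether any feature has an extremely high correlation with the target (= a possible overfit).
→ No feature with an extremely high correlation was found.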
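A check of this kind could look like the sketch below. It computes the Pearson correlation of each feature with the target and flags anything above a cutoff; the cutoff of 0.95, the function names, and the sample data are assumptions for illustration, since the slide does not state which measure or limit was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def suspicious_features(features, target, limit=0.95):
    """Names of features whose absolute correlation with the target exceeds limit."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) > limit]


target = [0, 1, 3, 2, 0, 1]
features = {
    "leaky": [0.1, 1.1, 3.1, 2.1, 0.1, 1.1],  # target plus a constant: a leak
    "noisy": [5, 1, 2, 7, 3, 3],              # unrelated values
}
flagged = suspicious_features(features, target)
```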
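FeatureEngineering (my team)
For each installation_id, aggregated the number of assessments taken and how often they were answered correctly.
Example 1: 1st → OK on the 4th attempt, 2nd → OK on the 2nd attempt, 3rd → OK on the 1st attempt. The next one looks likely to be OK on the 1st attempt.
Example 2: 1st → all 4 attempts NG. Will the next one also be all NG, or at best OK after several attempts?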
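A history aggregation of this kind can be sketched in plain Python. The record layout and the feature names (`past_assessments`, `past_mean_group`, `past_last_group`) are hypothetical; the one point the sketch is meant to show is that each row only sees assessments that happened before it.

```python
from collections import defaultdict


def add_history_features(records):
    """Build per-row history features from chronologically ordered records.

    records: list of dicts with 'installation_id' and 'accuracy_group'.
    """
    history = defaultdict(list)
    rows = []
    for record in records:
        past = history[record["installation_id"]]
        rows.append({
            "installation_id": record["installation_id"],
            "past_assessments": len(past),
            "past_mean_group": sum(past) / len(past) if past else -1.0,
            "past_last_group": past[-1] if past else -1,
        })
        # The current result only becomes visible to later rows.
        past.append(record["accuracy_group"])
    return rows


records = [
    {"installation_id": "a", "accuracy_group": 1},
    {"installation_id": "a", "accuracy_group": 2},
    {"installation_id": "b", "accuracy_group": 0},
    {"installation_id": "a", "accuracy_group": 3},
]
rows = add_history_features(records)
```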
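FeatureSelection
Compare the FeatureImportance obtained with a shuffled target against the one obtained with the actual target.
→ Of the 1000 features, 200 were used.
Reference: https://www.kaggle.com/ogrellier/feature-selection-with-null-importances
・Importance is high only with the actual data → important feature
・Importance does not change after shuffling → noise feature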
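The comparison can be sketched without a model library. Here the absolute Pearson correlation stands in for the LightGBM feature importance, and a feature is kept only when its actual importance beats every importance seen under shuffled targets; both the stand-in importance and that selection rule are assumptions for this sketch, not the rule used in the referenced notebook or by the team.

```python
import random
from math import sqrt


def abs_corr(xs, ys):
    """Stand-in importance: absolute Pearson correlation (0.0 for constant columns)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def select_by_null_importance(features, target, n_shuffles=50, seed=0):
    """Keep features whose actual importance exceeds all null importances."""
    rng = random.Random(seed)
    actual = {name: abs_corr(col, target) for name, col in features.items()}
    null_max = {name: 0.0 for name in features}
    for _ in range(n_shuffles):
        shuffled = list(target)
        rng.shuffle(shuffled)  # break any real link between features and target
        for name, col in features.items():
            null_max[name] = max(null_max[name], abs_corr(col, shuffled))
    return [name for name in features if actual[name] > null_max[name]]


target = [0, 1, 2, 3] * 5
features = {
    "signal": [2 * t + 1 for t in target],  # fully determined by the target
    "constant": [7] * len(target),          # carries no information
}
selected = select_by_null_importance(features, target)
```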
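Classifying with thresholds (Threshold)
For qwk, the target was first estimated by regression and the value was then classified with thresholds (see the Kaggle book, p.100). We examined how to decide those thresholds.
Features f_1, f_2, f_3, f_4 → predict by regression → classify with thresholds

target (regression)   target (class)
0.45                  0
1.12                  1
2.90                  3
2.04                  2
0.80                  0
1.77                  1

threshold
0.00 ~ 0.80 → 0
0.81 ~ 1.80 → 1
1.81 ~ 2.50 → 2
2.51 ~ 3.00 → 3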
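The threshold table above can be applied with a sorted list of cut points. This is a minimal sketch using the example cut points from the slide (0.80, 1.80, 2.50); the function name `to_class` is an assumption.

```python
from bisect import bisect_left


def to_class(prediction, cuts=(0.80, 1.80, 2.50)):
    """Convert a regression output to a class; each cut is an inclusive upper bound."""
    # bisect_left counts how many cut points lie strictly below the prediction.
    return bisect_left(cuts, prediction)


predictions = [0.45, 1.12, 2.90, 2.04, 0.80, 1.77]
classes = [to_class(p) for p in predictions]
```

With these cut points the six regression outputs from the slide map to the classes 0, 1, 3, 2, 0, 1, matching the table.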
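Threshold
Find the thresholds optimized on train, and use them to classify test.

threshold
0.00 ~ 0.80 → 0
0.81 ~ 1.80 → 1
1.81 ~ 2.50 → 2
2.51 ~ 3.00 → 3

train → LGBM → OptimizedRounder
test → LGBM → OptimizedRounder

train (prob_target, target): (0.45, 1), (1.12, 2), (2.90, 2), (2.04, 0)
test (regression target → class): 0.45 → 0, 1.12 → 1, 2.90 → 3, 2.04 → 2

OptimizedRounder performs:
・deriving the optimal thresholds
・converting the regression output to classes
The code has been summarized by Arai-san (*).
(*)https://qiita.com/kaggle_master-arai-san/items/d59b2fb7142ec7e270a5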
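The idea of fitting thresholds on train can be sketched as below. This is not the OptimizedRounder code from the linked article: it swaps in a simple coordinate search over a 0.01 grid, with a hand-written QWK, so that it runs with the standard library only. The starting cut points and the toy data are made up for illustration.

```python
from bisect import bisect_left


def qwk(y_true, y_pred, n_classes=4):
    """Quadratic weighted kappa computed from the confusion matrix."""
    n = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(col) for col in zip(*observed)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2  # the (N-1)^2 normaliser cancels in the ratio
            num += weight * observed[i][j]
            den += weight * true_hist[i] * pred_hist[j] / n
    return 1.0 - num / den if den else 0.0


def classify(preds, cuts):
    return [bisect_left(cuts, p) for p in preds]


def optimize_cuts(preds, y_true, start=(0.5, 1.5, 2.5), passes=3):
    """Coordinate search: move one cut at a time along a 0.01 grid, keep improvements."""
    grid = [k / 100 for k in range(301)]  # candidate cut points 0.00 .. 3.00
    cuts = list(start)
    best = qwk(y_true, classify(preds, cuts))
    for _ in range(passes):
        for idx in range(len(cuts)):
            low = cuts[idx - 1] if idx > 0 else grid[0]
            high = cuts[idx + 1] if idx < len(cuts) - 1 else grid[-1]
            for candidate in grid:
                if not low <= candidate <= high:
                    continue  # keep the cut points sorted
                trial = cuts[:idx] + [candidate] + cuts[idx + 1:]
                score = qwk(y_true, classify(preds, trial))
                if score > best:
                    best, cuts = score, trial
    return cuts, best


# Made-up train predictions and labels.
train_preds = [0.2, 0.4, 0.9, 1.1, 1.3, 1.9, 2.1, 2.2, 2.6, 2.8]
train_true = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
cuts, score = optimize_cuts(train_preds, train_true)
test_classes = classify([0.45, 1.12, 2.90, 2.04], cuts)
```

The cut points are fitted on train only and then reused unchanged on the test predictions, which mirrors the train/test flow on the slide.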
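submit
Selected two submissions: the one with the highest PublicLB and the one with the highest cv.
→ #5, which had the highest cv, also had the highest PrivateLB.

source  #  PublicLB  cv       final sub  PrivateLB
kernel  1  0.550     -        x          0.520
kernel  2  0.542     -                   0.520
kernel  3  0.537     -                   0.534
myTeam  4  0.533     0.60357             0.544
myTeam  5  0.532     0.60630  x          0.545
myTeam  6  0.532     0.60333             0.544
myTeam  7  0.527     0.60583             0.543

Silver medal earned!!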
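The selection rule can be written down directly from the table. The sketch below uses the table's values; the dictionary layout and key names are assumptions, and `None` marks the kernel submissions for which no cv score is listed.

```python
submissions = [
    {"id": 1, "source": "kernel", "public_lb": 0.550, "cv": None},
    {"id": 2, "source": "kernel", "public_lb": 0.542, "cv": None},
    {"id": 3, "source": "kernel", "public_lb": 0.537, "cv": None},
    {"id": 4, "source": "myTeam", "public_lb": 0.533, "cv": 0.60357},
    {"id": 5, "source": "myTeam", "public_lb": 0.532, "cv": 0.60630},
    {"id": 6, "source": "myTeam", "public_lb": 0.532, "cv": 0.60333},
    {"id": 7, "source": "myTeam", "public_lb": 0.527, "cv": 0.60583},
]

# One pick by public leaderboard score, one pick by local cross-validation score.
best_public = max(submissions, key=lambda s: s["public_lb"])
best_cv = max((s for s in submissions if s["cv"] is not None), key=lambda s: s["cv"])
final_ids = sorted({best_public["id"], best_cv["id"]})
```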
A saying long passed down among Kagglers.
bestfitting, who is ranked 1st, puts it as follows:
A good CV is half of success.
I won’t go to the next step if I can’t find a good way to evaluate my model.
Trust your CV.
Summary
▪ Even in a short challenge, do at least the minimum that has to be done.
▪ Check similar past competitions.
▪ Trust your CV
Thank you very much.
▪ Kaggle: @Yust
▪ Twitter: @yust_kaggle
▪ e-mail:
[email protected]
[Supplement] Evaluation metric (QWK)
・Quadratic Weighted Kappa (weighted kappa coefficient)
・The loss is large when the prediction misses by a distant class.
Reference: https://bellcurve.jp/statistics/blog/14200.html
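For reference, the standard definition of the metric can be written as follows, where $N$ is the number of classes (4 here), $O_{ij}$ counts samples with true class $i$ and predicted class $j$, and $E_{ij}$ is the count expected if the true and predicted class distributions were independent. The symbols are chosen for this note and do not come from the slides.

```latex
\kappa = 1 - \frac{\sum_{i,j} w_{ij}\, O_{ij}}{\sum_{i,j} w_{ij}\, E_{ij}},
\qquad
w_{ij} = \frac{(i - j)^2}{(N - 1)^2},
\qquad
E_{ij} = \frac{\bigl(\sum_{k} O_{ik}\bigr)\bigl(\sum_{k} O_{kj}\bigr)}{\sum_{k,l} O_{kl}}
```

With $N = 4$, predicting 0 when the truth is 3 gets weight $9/9 = 1$, while missing by one class gets weight $1/9$, which is why distant misses cost so much more.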