model_pipeline_final.pdf
Maxwell
September 18, 2018
Model pipeline and other material from the Home Credit Default Risk competition.
Thanks to my teammates.
Transcript
Slide 1: ikiri_DS Model PipeLine (Home Credit Default Risk), 29.Aug.2018
[Pipeline diagram flattened by text extraction; the connections between boxes are not recoverable. The recoverable labels follow.]
Names on the slide: ONODERA, Maxwell, Nejumi, Tereka, RK, tosh, Branden, takuoko, Angus, Tam, Ireko, Giba.
Inputs numbered 1 to 7 (the order of the numbering is not recoverable): 600+1 FEATURES (LB804), 1000+1 FEATURES (LB803), meta app, meta bur, Kernel GP, Nejumi features, Tereka features.
Inputs numbered 8 to 13: Branden features (8), takuoko features (9), Angus features (10), Tam features (11), RK features (12), Ireko DAE (13). Also shown: takuoko nejumi feature, Nejumi prediction.
Model boxes: LGBM, CatBoost, CNN, RNN, NN, ExtTree, RGF, each annotated with the numbers of the inputs it uses (for example "+ LGBM 1 6", "+ CNN 7", "+ RNN 7 1", "+ NN 1 13"). Some boxes are marked "partial".
Residual boxes: Residual 1 (corrected with residual regression), Residual 2, Residual 3; Res1, Res2 and Res3 appear as inputs to LGBM, some together with "hidden".
Notes on the slide: "* model drawn in next page"; "* using hidden layer as additional features to correct residuals."
Blending scores: Blending CV 0.8094; Adversarial Stochastic Blending CV 0.8096; Adversarial Stochastic Blending CV 0.81050; Adversarial Stochastic Blending CV 0.8061.
Leaderboard scores shown along the pipeline:
Public 0.8085 (17th) / Private 0.8017 (18th)
Public 0.8093 (10th) / Private 0.8016 (18th)
Public 0.8080 (23rd) / Private 0.8028 (14th)
Public 0.8110 (3rd) / Private 0.8042 (5th)
Final box: Giba Post Processing; Public 2nd 0.81241, Private 2nd 0.80561.
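The slide mentions predictions "corrected with residual regression" but gives no details. As a minimal sketch of the general idea only, assuming a single extra feature and a plain least-squares fit, one can regress the residual (target minus base prediction) on that feature and add the fitted correction back. The function names `fit_line` and `residual_correct` are hypothetical, and the team used LGBM and hidden-layer features for this step rather than a one-variable line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x > 0 else 0.0
    return mean_y - slope * mean_x, slope


def residual_correct(base_preds, targets, extra_feature):
    """Fit a model on the residuals of base_preds and add its output back.

    Assumption: the residual model is a straight line on one extra feature.
    In a real pipeline the residual model should be fitted out-of-fold so the
    correction does not leak the target.
    """
    residuals = [y - p for y, p in zip(targets, base_preds)]
    intercept, slope = fit_line(extra_feature, residuals)
    corrected = [p + intercept + slope * x
                 for p, x in zip(base_preds, extra_feature)]
    # Keep the corrected values inside the probability range.
    return [min(1.0, max(0.0, c)) for c in corrected]


if __name__ == "__main__":
    base = [0.5, 0.5, 0.5, 0.5]        # an uninformative base model
    y = [0, 0, 1, 1]
    feature = [0.0, 0.0, 1.0, 1.0]     # carries the signal the base model missed
    print(residual_correct(base, y, feature))
```

The corrected predictions can then be fed to a later model as an additional input, which is how the "Res" boxes appear to be used on the slide.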
Slide 2: ikiri_DS Model PipeLine (Home Credit Default Risk), 29.Aug.2018
[Second version of the pipeline diagram, flattened by text extraction; the connections between boxes are not recoverable. The recoverable labels follow.]
Inputs and names: as on slide 1 (inputs 1 to 7; Branden features 8, takuoko features 9, Angus features 10, Tam features 11, RK features 12, Ireko DAE 13).
Model boxes: LGBM, CNN, RNN, NN, ExtTree, RGF, XGB (for example "+ LGBM 4 3 2", "+ XGB 4 3 1", "+ ExtTree 4 3 1", "+ NN 1 3", "+ RGF 1").
Residual boxes: Residual 1 (corrected with residual regression), Residual 2, Residual 3; Res1, Res2, Res3 and Res4 appear as inputs to LGBM, some together with "hidden".
Notes on the slide: "* model drawn in next page"; "* using hidden layer as additional features to correct residuals."
Blending scores: Blending CV 0.8085; Adversarial Stochastic Blending CV 0.8085; Adversarial Stochastic Blending CV 0.8097; Adversarial Stochastic Blending CV 0.8061.
Stacking: stacking with LGBM, CV 0.8080, Public 0.8070 / Private 0.8015; its output is labelled "Stacking prediction".
Leaderboard scores shown along the pipeline:
Public 0.8071 (26th) / Private 0.8009 (37th)
Public 0.8082 (23rd) / Private 0.8022 (18th)
Public 0.8080 (23rd) / Private 0.8028 (14th)
Public 0.8099 (7th) / Private 0.8040 (6th)
Final box: Giba Post Processing.
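The slides name "Blending" and "Adversarial Stochastic Blending" without describing them. The sketch below is not that method: it swaps in a plain random search over blend weights, scored by AUC (the metric quoted on the slides), to illustrate what blending several models' out-of-fold predictions involves. All function names and the toy data are hypothetical.

```python
import random


def average_ranks(scores):
    """1-based ranks; tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney) formula; labels are 0 or 1."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def blend(preds, weights):
    """Weighted average of several prediction lists."""
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, preds)) / total
            for i in range(len(preds[0]))]


def random_weight_search(labels, preds, n_trials=200, seed=71):
    """Try random weight vectors and keep the one with the best AUC."""
    rng = random.Random(seed)
    best_w = [1.0] * len(preds)          # start from the equal-weight blend
    best_auc = auc(labels, blend(preds, best_w))
    for _ in range(n_trials):
        w = [rng.random() for _ in preds]
        if sum(w) == 0:
            continue
        score = auc(labels, blend(preds, w))
        if score > best_auc:
            best_w, best_auc = w, score
    return best_w, best_auc


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    model_a = [0.1, 0.4, 0.35, 0.8]
    model_b = [0.3, 0.2, 0.6, 0.5]
    print(random_weight_search(y, [model_a, model_b]))
```

Weights chosen this way are fitted to the out-of-fold predictions themselves, so the resulting CV score is optimistic; that is one reason to compare it against the leaderboard scores, as the slides do.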
Slide 3: Stacking-like Light GBM (Maxwell)
[Diagram flattened by text extraction. The recoverable labels follow.]
Source tables: application; bureau, bureau balance; previous, inst, POS_CASH, credit.
Meta features: XGBoost app meta feature, XGBoost bureau meta feature, XGBoost prev meta feature.
AUC values shown for the meta-feature models: 0.683 (SEED71) / 0.683 (SEEDs avg); 0.772 (SEED71) / 0.773 (SEEDs avg); 0.710 (SEED71) / 0.712 (SEEDs avg). The feature counts shown next to them are "229 features", "300 features" and "all data". Which value belongs to which table is not recoverable from the extraction.
Base inputs: ONODERA BASIC FEATURES (600 features); NEJUMI FEATURES (interest rate) (1 feature).
Light GBM settings:
5 stratified folds (shuffle = True)
5 / 8 SEEDs, rank averaged
SEED 71 for model fit
SEEDs 710, 711, 712, 713, 714 (715, 716, 717) for OOF prediction
Hyperparameters tuned for 603 features (reflected on meta features)
Results:
603 (604) features, selected based on ONODERA criteria: Local CV 0.80641; Public LB / Private LB 0.80569 / 0.79853 (100th / 105th)
952 features, w/o feature selection: Local CV 0.80646; LB 0.804 (~0.805)
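The slide lists shuffled stratified 5-fold out-of-fold (OOF) prediction with fold seeds 710 to 714 and rank averaging across seeds. The sketch below shows how those pieces could fit together in plain Python. It is an illustration under assumptions, not the author's code: `fit_predict` is a hypothetical callable standing in for the XGBoost or LightGBM model, and `mean_by_bucket` is a toy stand-in so the example runs without external libraries.

```python
import random


def stratified_folds(labels, n_splits=5, seed=710):
    """Assign a fold id to every row so each fold keeps roughly the class ratio."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)                      # corresponds to shuffle = True
        for pos, i in enumerate(idx):
            folds[i] = pos % n_splits
    return folds


def oof_predictions(rows, labels, fit_predict, n_splits=5, seed=710):
    """Predict every row with a model that never saw that row during fitting."""
    folds = stratified_folds(labels, n_splits, seed)
    oof = [0.0] * len(labels)
    for k in range(n_splits):
        train = [i for i, f in enumerate(folds) if f != k]
        valid = [i for i, f in enumerate(folds) if f == k]
        preds = fit_predict([rows[i] for i in train],
                            [labels[i] for i in train],
                            [rows[i] for i in valid])
        for i, p in zip(valid, preds):
            oof[i] = p
    return oof


def rank_scale(scores):
    """Replace scores by their ranks divided by n; ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = ((i + j) / 2.0 + 1.0) / len(scores)
        i = j + 1
    return ranks


def seed_rank_average(rows, labels, fit_predict,
                      seeds=(710, 711, 712, 713, 714)):
    """Rank-average the OOF predictions obtained with different fold seeds."""
    per_seed = [rank_scale(oof_predictions(rows, labels, fit_predict, seed=s))
                for s in seeds]
    return [sum(col) / len(per_seed) for col in zip(*per_seed)]


def mean_by_bucket(train_rows, train_labels, valid_rows):
    """Toy stand-in model: mean target of training rows with the same value."""
    totals = {}
    for x, y in zip(train_rows, train_labels):
        s, n = totals.get(x, (0.0, 0))
        totals[x] = (s + y, n + 1)
    prior = sum(train_labels) / len(train_labels)
    return [totals[x][0] / totals[x][1] if x in totals else prior
            for x in valid_rows]


if __name__ == "__main__":
    rows = ["a"] * 10 + ["b"] * 10
    labels = [0] * 10 + [1] * 10
    print(seed_rank_average(rows, labels, mean_by_bucket))
```

Rank averaging only uses the ordering of each seed's predictions, which suits an AUC-scored competition because AUC depends on ordering alone. The averaged OOF column can then serve as a meta feature for the next-level model.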