
About the Kaggle Diary (Kaggle日記について)

fkubota
June 02, 2021


A talk about why I created the Kaggle Diary, among other things.

More details on the Kaggle Diary:

Kaggle日記という戦い方 (The Kaggle Diary as a way to fight): https://zenn.dev/fkubota/articles/3d8afb0e919b555ef068


Transcript

 1. About the Kaggle Diary (Kaggle日記について)
  History, background, and so on
  fkubota


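 2. About me
  fkubota (Twitter, Kaggle)
  - Machine learning engineer at Kanmu (カンム), the company behind Bandle Card (バンドルカード)
  - Kaggle Expert
  - From Okinawa (moved to Tokyo three and a half years ago)
  - Physics major (strongly correlated electron systems; I used to dunk magnetic materials into 4.2 K liquid helium)
  - About two and a half years of programming experience
  - Hobbies
    - Getting up early (up at 4:30)
    - Kaggle
    - Coffee, beer, whisky
    - Reading, physics, philosophy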


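 3. What I'll talk about today
  - What is the Kaggle Diary?
    A quick introduction to what it is. There's an article about it. (url)
  - The early days, when competitions were painful
    When I started doing competitions I couldn't get motivated at all, so I'll briefly sort out the problems I had at the time.
  - So I started the Kaggle Diary
    I think it will be easier to use if I explain why I started it. It's also about how it doesn't even have to be the Kaggle Diary.
  Mainly aimed at beginners!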


 4. What is the Kaggle Diary?


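 5. Something like this!
  Kaggle Diary
  - Experiment plans
  - Experiment results and discussion
  - Papers / references
  - Discussion
  - KaggleCode
  - Ideas
  All information generated while taking part in a Kaggle competition goes into the Kaggle Diary.
  Example) My Kaggle Diary for the bird competition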


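 6. What makes Kaggle hard
  - difficult
  - toooooo long
    - I forgot what I was doing
    - Naming conventions break down
    - Motivation doesn't last...
    - Running out of ideas
  Countermeasures for the "too long" problem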


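 7. What the Kaggle Diary works on
  - Keeping motivation up: "It's long and almost nothing works? That's hell."
  - Not managing things by file name: "It's going to break down anyway, right? Then let's give up on managing it that way from the start."
  - Not managing things in my head: "A competition lasts three months, you know? There's no way I can remember it all."
  - Traceability: "I just wanted to say the word."
  For details, please see "Kaggle日記という戦い方" m(_ _)m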
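The rules on this slide (don't encode an experiment's meaning in its file name, don't rely on memory, keep things traceable) can be illustrated with a small Python sketch. It is a hypothetical illustration, not the author's actual tooling: the `experiments/` layout, the `diary.jsonl` log, and the `start_experiment` helper are all assumptions. Each run gets a plain sequential ID, and what the run tried is appended to a log instead of being packed into the name.

```python
import json
import tempfile
import time
from pathlib import Path


def start_experiment(root: Path, description: str, params: dict) -> Path:
    """Create the next numbered experiment directory and log it (hypothetical helper).

    Directories are named exp001, exp002, ...; what each run tried lives in
    diary.jsonl, not in the directory or file name.
    """
    root.mkdir(parents=True, exist_ok=True)
    # Highest existing expNNN number, or 0 if there are none yet.
    existing = [
        int(p.name[3:])
        for p in root.glob("exp[0-9][0-9][0-9]")
        if p.is_dir()
    ]
    exp_id = f"exp{max(existing, default=0) + 1:03d}"
    exp_dir = root / exp_id
    exp_dir.mkdir()
    # One JSON object per line: easy to append to and easy to grep later.
    entry = {
        "id": exp_id,
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "description": description,
        "params": params,
    }
    with open(root / "diary.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return exp_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "experiments"
        first = start_experiment(root, "5-fold baseline", {"n_folds": 5, "seed": 42})
        second = start_experiment(root, "NN with postprocessing", {"n_folds": 5})
        print(first.name, second.name)  # exp001 exp002
        print((root / "diary.jsonl").read_text(encoding="utf-8"))
```

Because an ID never carries meaning, a name like `exp_5fold_nn_postproc_v2` is never needed; searching the log for "postproc" finds the run instead.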


 8. The early days, when competitions were painful


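 9. When I first got interested in Kaggle
  Interest
    "You can learn AI while competing with other people?! I have to do this!"
  Complete understanding
    "I understand decision trees and SVMs. Wait, I can use them right away with sklearn? Piece of cake lol"
  Silence
    "I've tried decision trees and SVMs, so there's nothing left to do, right? Done after one submission! I can't even follow what the discussions are saying, so going any further is too painful!"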


 10. Me, enlightened
  Not yet... the time has not come.


 11. Two blank months
  I studied while waiting for that time to come.


 12. Me, truly enlightened
  That time will never come.


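 13. I guess real practice is different after all
  Studying vs. practice
  There are things you can only learn in practice, right?
  - EDA
  - Leaks
  - How to split the CV
  - Feature selection
  - and so on...
  I get it, but tough is tough.
  Maybe I'm just not cut out for this?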


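 14. Amount of commitment
  [Figure: fun (y-axis) against amount of commitment (x-axis), captioned "whenever I learn something, it's always a curve like this", with several points marked "What is this?"]
  It's hard to put into words, but...
  - Knowledge starts to become organized?
  - The dots start to connect?
  - You start to know what you don't know?
  - You understand the rules of the game?
  - ...something like that?
  This is just an example from my own experience.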


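 15. What I'm getting at
  [Figure: the same fun-vs.-commitment curve, with its early dip labeled the "What is this?" phase]
  For now, I decided to get over this part somehow.
  It would be a waste to decide here that you're not cut out for it.
  That's just what I happened to think, though!!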


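 16. The problem became a bit clearer
  The "competitions are tough, what do I do?" problem
  became
  the "how do I get through the 'What is this?' phase?" problem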


 17. So I started the Kaggle Diary


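 18. Things likely to cause stress
  - Naming conventions break down
    exp_5fold.ipynb
    exp_6fold_3.ipynb
    exp_5fold_seed42.ipynb
    exp_5fold_nn_v5.ipynb
    exp_4fold_nn_prepro.ipynb
    exp_5fold_nn_postproc_v2.ipynb
    exp_5fold_fix_bug.ipynb
  - Cost of resuming work: "I left it for two days and meant to pick it back up, but it's a hassle, so I'll do it tomorrow."
  - Running out of ideas: "I'm out of ideas."
  - Managing references: "I saved papers and articles that looked useful, but I can't tell which ones I've read and which I haven't."
  - Managing output files: "So how exactly was this ultra_super_feature.csv made?"
  - No reproducibility: "Run A.py to get a.csv, then run B.py on it, and it should reproduce. Well, it didn't, but whatever :)"
  [Illustration: a Todo / Doing / Done board labeled "infinite loop"]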
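One way to attack the output-file and reproducibility problems on this slide is to store, next to every output, a small record of how it was produced. The sketch below is hypothetical: the `.meta.json` sidecar convention, the helper names, and the `train.csv` input are assumptions, not something from the talk. Each output gets a sidecar listing the producing script and a SHA-256 hash of every input, so "how was ultra_super_feature.csv made?" has a recorded answer, and a silently regenerated `a.csv` can be detected before trusting the output of `B.py`.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def write_with_provenance(out_path: Path, text: str, script: str, inputs: List[Path]) -> Path:
    """Write out_path plus a sidecar <name>.meta.json (hypothetical convention).

    The sidecar records which script produced the file and the hash of every
    input it was built from.
    """
    out_path.write_text(text, encoding="utf-8")
    meta = {
        "script": script,
        "inputs": {str(p): sha256_of(p) for p in inputs},
        "output_sha256": sha256_of(out_path),
    }
    meta_path = out_path.with_name(out_path.name + ".meta.json")
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return meta_path


def inputs_unchanged(meta_path: Path) -> bool:
    """True if every recorded input still exists with the same hash."""
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    return all(
        Path(p).exists() and sha256_of(Path(p)) == digest
        for p, digest in meta["inputs"].items()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        raw = d / "train.csv"
        raw.write_text("x\n1\n2\n", encoding="utf-8")
        # "A.py" turns train.csv into a.csv; "B.py" turns a.csv into the feature file.
        write_with_provenance(d / "a.csv", "x\n2\n4\n", "A.py", [raw])
        meta = write_with_provenance(d / "ultra_super_feature.csv", "f\n3\n", "B.py", [d / "a.csv"])
        print(inputs_unchanged(meta))  # True
        # a.csv gets regenerated with different contents behind our back.
        (d / "a.csv").write_text("x\n2\n5\n", encoding="utf-8")
        print(inputs_unchanged(meta))  # False
```

Hashing contents rather than comparing timestamps means a file that was rewritten with identical bytes still counts as unchanged.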


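 19. Anything works as long as it deals with the problems
  I adopted the Kaggle Diary as a way to deal with the problems on the previous page as painlessly as possible.
  One caveat, though:
  - It doesn't have to be the Kaggle Diary.
  - You don't even have to force yourself to deal with every small problem.
  - If it's that tough, not doing Kaggle at all is also an option.
  So please take my view as just one option among many.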


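 20. Memories of the molecule competition: my first competition!!
  Kaggle Diary debut!!
  [Figure: public LB score (y-axis) over time (x-axis) across the three months of the competition, with the bronze medal line marked]
  - Early confusion: "pandas? What's that?", groupby, "What is oof?!", "lightGBM???", "Why are there so many tables?!"
  - Just as the questions were piling up, I happened to come across カレーさん's "Kaggleのチュートリアル" (A Kaggle Tutorial) and understood everything completely.
  - In this period I looked at all the code and discussions and simply used whatever I found, which raised my score. It was still below the public notebooks, though.
  - Then I started implementing the original ideas I had been saving up. My implementation speed was really slow. I tried about 15 ideas and about 3 of them worked.
  - In the last 7 days I studied ensembling and tried it.
  - With about two days left, I made it into the bronze medal zone for the first time.
  - Finished with a bronze medal (9%)!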
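The "What is oof?!" question on this slide has a short answer: out-of-fold (OOF) predictions are predictions for each training row made by a model trained on the other folds, so every row is predicted by a model that never saw it. The sketch below replaces LightGBM with a deliberately trivial mean-value model so that it runs on the standard library alone; `kfold_indices` and `oof_predictions` are illustrative names, not a library API.

```python
from typing import List


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split row indices 0..n-1 into k contiguous folds.

    No shuffling, to keep the example readable; real pipelines usually
    shuffle or split by group to avoid leaks.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def oof_predictions(y: List[float], k: int) -> List[float]:
    """Out-of-fold predictions using a trivial stand-in model.

    The "model" predicts the mean target of its training rows; a real pipeline
    would train something like LightGBM at this step. Each row is predicted by
    a model that did not see that row during training.
    """
    oof = [0.0] * len(y)
    for valid_idx in kfold_indices(len(y), k):
        held_out = set(valid_idx)
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        mean = sum(train_y) / len(train_y)  # "fit" on the other folds
        for i in valid_idx:
            oof[i] = mean  # "predict" only the held-out fold
    return oof


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    oof = oof_predictions(y, k=2)
    print(oof)  # [3.5, 3.5, 1.5, 1.5]
    # A CV score is computed on the OOF vector, e.g. mean absolute error:
    print(sum(abs(a - b) for a, b in zip(y, oof)) / len(y))  # 2.0
```

Scoring the OOF vector against the true targets gives a cross-validation estimate, and OOF vectors from several models can serve as inputs when ensembling.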


 21. Thanks :)
