

機械学習を用いた長期戦・短期決戦の差異分析と成績補正 / Analyze Differences between Long-Term and Short-Term on Baseball using Machine Learning

Note: Speaker Deck does not appear to support PDFs whose page size changes partway through. Please download the file to view it.

ひろしまQuest2020#stayhome (Idea Division)
https://signate.jp/competitions/277/summary

Source code
https://github.com/upura/signate-hiroshima-quest-idea

Shotaro Ishihara

October 02, 2020

Transcript

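  1. Background and purpose
     • The Carp struggle in short series (record during the three consecutive Central League titles, 2016 to 2018)
       • Regular season: 259 wins, 162 losses, 8 ties (win rate 0.615)
       • Of which, non-interleague games: 229 wins, 139 losses, 7 ties (win rate 0.622)
       • Of which, interleague games: 30 wins, 23 losses, 1 tie (win rate 0.566)
       • Postseason: 9 wins, 12 losses, 1 tie (win rate 0.428)
     • The key is to manage games with the differences between the long season and short series in mind
     • After the 2018 Japan Series, Noma also spoke about the differences between the two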
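  2. Analysis framework
     Explanatory variables X: the situation at each pitch. Target variable y: interleague game or not.

     Pitcher ID  Batter ID  Pitch type  Course       ...  y
     00000001    0000000a   Straight    Middle            1
     00000001    0000000a   Fork        Lower right       1
     00000001    0000000b   Slider      Left              0

     LightGBM
     • Train a classifier for "interleague game or not"
     • Achieved an AUC of 0.801 on the 2017 data
     • Use the feature importances and the predicted values
     Note: it may seem that the target variable should be "postseason or not", but analyzing with information available before the postseason makes it possible to apply the findings to in-game decisions during the postseason.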
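  3. Batters who look suited to short series
     • Aggregate the per-pitch predicted values by batter
       • The larger the predicted value, the more long-season-like
       • The smaller the predicted value, the more short-series-like
     • In the actual Climax Series, the batters with an on-base percentage above .300 were Arai, Tanaka, Nishikawa, Batista, Matsuyama, and Maru, and they cluster in the lower part of the graph
     • Shoji, Amaya, Iwamoto, and Kokubo, in the upper part of the graph, appeared but never reached base
     https://npb.jp/bis/2017/stats/idb1s2_c.html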
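The per-batter aggregation on this slide can be sketched as follows. This is a minimal illustration, assuming the median as the aggregate; the function name, the batter labels, and the numbers are hypothetical placeholders and are not taken from the dataset or from the author's repository.

```python
from collections import defaultdict
from statistics import median


def median_prediction_by_batter(records):
    """Group per-pitch predictions by batter and return each batter's median.

    `records` is an iterable of (batter_id, predicted_value) pairs, where the
    predicted value is assumed to express how long-season-like the pitch is.
    """
    grouped = defaultdict(list)
    for batter_id, predicted_value in records:
        grouped[batter_id].append(predicted_value)
    return {batter_id: median(values) for batter_id, values in grouped.items()}


# Placeholder predictions, not real data.
records = [
    ("batter_a", 0.9), ("batter_a", 0.7), ("batter_a", 0.8),
    ("batter_b", 0.2), ("batter_b", 0.4),
]
scores = median_prediction_by_batter(records)

# Smaller values are read as more short-series-like, so sort ascending.
ranking = sorted(scores, key=scores.get)
```

With these placeholder values, `ranking` lists `batter_b` before `batter_a`, which corresponds to a position lower in the graph.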
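  4. Performance adjustment
     • Multiply the result of each plate appearance by the short-series degree
       For example, multiply whether the batter reached base (0 or 1) by the probability
     • Iwamoto's adjusted on-base percentage for short series comes out relatively low
     • Might it have been interesting to use Shimozuru? (He made no Climax Series appearance.)
     Note: the on-base percentages were computed in a simplified way from the data provided for this competition, so they contain some error.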
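  5. Summary
     • To make the Carp Japan Series champions, we used machine learning to analyze what distinguishes short series, in which they struggle, from the long season
     • Based on the results, we proposed a way to identify players likely to perform well in short series and a method for adjusting performance statistics
     • Contributions of this study
       • A problem design that builds a classifier for "short series or not"
       • Data-driven identification of the characteristics of short series
       • Findings that can be applied to managerial decisions in short series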
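  6. 機械学習を用いた長期戦・短期決戦の差異分析と成績補正 (Difference analysis of the long season and short series using machine learning, with performance adjustment)
     u++ / Shotaro Ishihara

     Abstract
     To make the Carp Japan Series champions, this study analyzes what distinguishes short series, in which they struggle, from the long season. Specifically, we build a machine learning classifier whose target variable is "interleague game or not" and obtain feature importances and predicted values. Based on the results, we propose a way to identify players likely to perform well in short series and a method for adjusting performance statistics.

     1 Introduction
     To become Japan Series champions, a team must finish in the top three of its league in the regular season, which is played over a long period, and then win through the postseason (the Climax Series and the Japan Series). The Hiroshima Toyo Carp won three consecutive league titles from 2016 to 2018, but lost in the Japan Series in 2016 and 2018 and in the Climax Series in 2017, and so missed the championship. The Carp's record for 2016 to 2018 is as follows.
     • Regular season: 259 wins, 162 losses, 8 ties (win rate 0.615)
     • Of which, non-interleague games: 229 wins, 139 losses, 7 ties (win rate 0.622)
     • Of which, interleague games: 30 wins, 23 losses, 1 tie (win rate 0.566)
     • Postseason: 9 wins, 12 losses, 1 tie (win rate 0.428)
     Over the three regular seasons the Carp posted a win rate of 0.615, but in the postseason it was only 0.428. The regular season also includes interleague play, in which a team plays three games against each of the six teams of the other league. Because interleague play also means playing only a few games against unfamiliar opponents, it can be regarded as a kind of short series. Splitting the Carp's three regular seasons by interleague or not confirms a certain gap in win rate. To win through short series, managerial decisions that recognize the differences from the long season are likely to be key. After the 2018 Japan Series, the Carp's Takayoshi Noma also spoke about the differences between the two [1].

     2 Proposed method
     This study identifies the characteristics of short series from data, and works on identifying players likely to perform well in short series and on proposing a performance adjustment method. Figure 1 gives an overview of the proposed method.
     Figure 1: Overview of the proposed method
     The proposed method builds a classifier for "interleague game or not" using the dataset of the SIGNATE competition "ひろしまQuest2020#stayhome: pitch prediction using professional baseball data" [2]. The explanatory variables (features) X are the situation at each pitch, and the target variable y is "interleague game or not". Because we take the Carp's point of view, games between Pacific League teams were excluded. The data period is the single year 2017, for which pitch type and course information is available.
     We chose "interleague game or not" rather than "postseason or not" as the target variable with the actual use case in mind. At first glance the latter seems more directly tied to the problem, but postseason results are not known before the postseason begins, so they cannot be used for decisions during the postseason. We therefore regard interleague play as a short series and propose a framework that yields useful findings from regular-season results ahead of the postseason.
     As the machine learning algorithm we adopt LightGBM [3], a gradient boosting decision tree method known to perform well on tabular data. LightGBM also provides a way to compute feature importances, which suits this use.
     The contributions of this study are the following three points.
     • A problem design that builds a classifier for "short series or not"
     • Data-driven identification of the characteristics of short series
     • Findings, derived from feature importances and predicted values, that can be applied to managerial decisions in short series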
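As a rough illustration of the problem framing described above, the sketch below turns pitch-level rows into features X and a binary target y ("interleague game or not"). The field names (`pitcher_id`, `pitch_type`, `is_interleague`, and so on) and the `to_xy` helper are hypothetical; they are not the column names of the SIGNATE dataset or of the author's repository, and the rows are illustrative.

```python
from dataclasses import dataclass, asdict

TARGET = "is_interleague"  # hypothetical name for y: 1 = interleague game, 0 = otherwise


@dataclass
class Pitch:
    """One row per pitch; every field name here is an illustrative assumption."""
    pitcher_id: str
    batter_id: str
    pitch_type: str
    course: str
    is_interleague: int


def to_xy(pitches):
    """Split pitch rows into feature dicts X and the binary target list y."""
    X, y = [], []
    for pitch in pitches:
        row = asdict(pitch)
        y.append(row.pop(TARGET))  # remove the target so it cannot end up in X
        X.append(row)
    return X, y


pitches = [
    Pitch("00000001", "0000000a", "straight", "middle", 1),
    Pitch("00000001", "0000000a", "fork", "lower right", 1),
    Pitch("00000001", "0000000b", "slider", "left", 0),
]
X, y = to_xy(pitches)
```

A gradient boosting classifier such as LightGBM would then be trained on X and y; that step is left out here to keep the sketch free of third-party dependencies.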
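  7. 3 Experiments

     3.1 Scrutinizing features
     When building the model, features must be scrutinized to deal with the problem known as data leakage [4]. Data leakage refers to the phenomenon in which information that should not be known at training time is used, reducing the ability to generalize to unseen data.
     In fact, when every column in the provided dataset was used as a feature, the classifier reached the maximum AUC [5] of 1. The dataset was split 4:1 into training and test sets, and cross-validation with k = 5 was used during training. The mean of the five classifiers was taken as the final prediction for the test set. Figure 2 shows the feature importances in this case.
     Figure 2: Examples of features that leak information
     IDs that identify teams rank near the top. This is because, for example, when the home team is a Pacific League team, the game is trivially an interleague game. Date information also ranks near the top, because interleague games are played together within a specific period.
     Such features are useful if the only goal is to raise performance, but they do not suit the present purpose. Checking the importances, we removed features that could cause this kind of leakage from the dataset.

     3.2 Final features and results
     The 22 features finally adopted are listed below. Training with these features and checking performance on the test set gave an AUC of 0.801. Details such as hyperparameters can be found in the separately submitted source code.
     • Pitch type
     • Pitch location zone
     • Plate appearances in the inning
     • Pitches in the plate appearance
     • Pitcher: throwing hand
     • Pitcher: role
     • Pitcher: order of appearance
     • Pitcher: batters faced in the game
     • Pitcher: pitches thrown in the game
     • Pitcher: pitches thrown in the inning
     • Batter: side of the plate
     • Batter: plate appearances in the game
     • Home team score before the play
     • Away team score before the play
     • Outs before the play
     • Balls before the play
     • Strikes before the play
     • Runners on base before the play
     • Pitcher's throwing hand (left or right)
     • Pitcher's batting side (left or right)
     • Batter's throwing hand (left or right)
     • Batter's batting side (left or right)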
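The evaluation protocol in Section 3.1 (a 4:1 train/test split, k = 5 folds, and a test prediction averaged over the five fold models, scored by AUC) can be sketched with the standard library alone. This is a sketch under stated assumptions: the per-category rate model is a stand-in and is not LightGBM, the split takes every fifth row instead of shuffling, the data are synthetic, and all function names are hypothetical.

```python
def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_rate_model(xs, ys):
    """Stand-in classifier (NOT LightGBM): positive rate per category value."""
    totals, positives = {}, {}
    for x, y in zip(xs, ys):
        totals[x] = totals.get(x, 0) + 1
        positives[x] = positives.get(x, 0) + y
    prior = sum(ys) / len(ys)  # fallback for categories unseen in training
    return lambda x: positives[x] / totals[x] if x in totals else prior


def evaluate(xs, ys, k=5):
    """4:1 train/test split, one model per fold, test prediction = mean over folds."""
    indices = range(len(xs))
    # Every fifth row goes to the test set; assumes rows are already in random order.
    test_idx = [i for i in indices if i % 5 == 4]
    train_idx = [i for i in indices if i % 5 != 4]
    folds = [train_idx[f::k] for f in range(k)]
    models = []
    for held_out in range(k):
        # Fit on the k - 1 folds other than the held-out one.
        fit_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        models.append(fit_rate_model([xs[i] for i in fit_idx], [ys[i] for i in fit_idx]))
    test_pred = [sum(model(xs[i]) for model in models) / k for i in test_idx]
    return auc([ys[i] for i in test_idx], test_pred), test_pred


# Synthetic data: category "a" is mostly positive, "b" mostly negative.
xs = ["a" if i % 2 == 0 else "b" for i in range(100)]
ys = [int((x == "a") != (i % 5 == 0)) for i, x in enumerate(xs)]
test_auc, test_pred = evaluate(xs, ys)
```

In a setup like the one described, each held-out fold would also supply validation data and out-of-fold predictions for the training rows; the sketch leaves that part out.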
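  8. Figure 3: Feature importances

     4 Discussion

     4.1 Feature importances
     Figure 3 shows the feature importances. For example, the pitcher's pitch count, batters faced, and order of appearance suggest that pitching changes are frequent in short series. The score before the play and the pitch location zone and pitch type can be read as scoring and pitching tendencies specific to short series. The batter's number of plate appearances may reflect, among other things, frequent pinch hitting in short series. In this way, the proposed method makes it possible to identify quantitatively the characteristics of short series that had so far been discussed only qualitatively.

     4.2 Identifying players likely to perform well in short series
     In the course of cross-validation, the model's predicted value is obtained for each row of the training dataset. Figure 4 shows these per-pitch predictions aggregated by Carp batter. The predicted value indicates the long-season degree, so the smaller it is, the more the player can be said to have played in situations resembling a short series. Batters in the lower part of the graph play in situations close to a short series even within the long season, and can be expected to perform as usual.
     Figure 4: Median predicted value by batter
     In 2017 the Carp were eliminated by DeNA in the Climax Series. Over the five games, the batters with an on-base percentage above .300 were Takahiro Arai, Kosuke Tanaka, Ryoma Nishikawa, Batista, Ryuhei Matsuyama, and Yoshihiro Maru. These six players cluster in the lower part of the graph. In contrast, Hayato Shoji, Soichiro Amaya, Takahiro Iwamoto, and Tetsuya Kokubo, in the upper part of the graph, appeared but finished with an on-base percentage of 0 [6].

     4.3 Proposed performance adjustment method
     The previous section identified players likely to perform well in short series, but it discussed only the magnitude of the classifier's predictions and did not take regular-season performance into account. This section proposes a method for adjusting regular-season performance.
     We take on-base percentage as the example statistic and multiply the outcome of each plate appearance (reached base or not) by the probability representing the short-series degree. The classifier's prediction indicates the long-season degree, so we subtract it from 1 and treat the result as the short-series degree.
     For example, suppose a player reaches base in the first two of five plate appearances, with short-series degrees of 0.8 and 0.7 respectively. The on-base percentage is then 0.4, and the adjusted on-base percentage is 0.3:
     (1 × 0.8 + 1 × 0.7 + 0 + 0 + 0) / 5 = 0.3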
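The worked example (1 × 0.8 + 1 × 0.7 + 0 + 0 + 0) / 5 = 0.3 can be reproduced with a short function. This is a minimal sketch: the function and variable names are hypothetical, and the predictions for the three plate appearances without a times-on-base are placeholders, since they are multiplied by 0 and do not affect the result.

```python
def adjusted_on_base_percentage(reached_base, long_season_pred):
    """Weight each plate appearance outcome (0 or 1) by its short-series degree.

    The short-series degree is taken as 1 minus the classifier's prediction,
    which is assumed to express how long-season-like the situation was.
    """
    if not reached_base or len(reached_base) != len(long_season_pred):
        raise ValueError("need one prediction per plate appearance")
    weighted = sum(r * (1.0 - p) for r, p in zip(reached_base, long_season_pred))
    return weighted / len(reached_base)


reached_base = [1, 1, 0, 0, 0]
# Predictions 0.2 and 0.3 give short-series degrees 0.8 and 0.7;
# the last three values are placeholders.
long_season_pred = [0.2, 0.3, 0.5, 0.5, 0.5]

raw_obp = sum(reached_base) / len(reached_base)
adjusted_obp = adjusted_on_base_percentage(reached_base, long_season_pred)
```

Here `raw_obp` is 0.4 and `adjusted_obp` is 0.3 up to floating-point rounding, matching the example.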
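  9. Starting from the original regular-season performance, this metric yields larger values for players with a larger short-series degree. Figure 5 shows the regular-season on-base percentage and the adjusted on-base percentage of Carp batters in 2017.
     Figure 5: Regular-season on-base percentage and adjusted on-base percentage, 2017
     In the upper part of the graph, Iwamoto's adjusted on-base percentage comes out relatively low. In Game 4 of the 2017 Climax Series against DeNA, trailing by one run in the sixth inning with the bases loaded and no outs, the Carp sent Iwamoto in as a pinch hitter. He struck out swinging, the team ended up without a run and lost the game, and at 2 wins and 3 losses (including the advantage win) the Carp were one loss from elimination. Some articles point to this scene as the one that decided the series [7].
     In the lower part of the graph, Ko Shimozuru's adjusted on-base percentage comes out relatively high. He did not appear in the 2017 Climax Series, but the graph suggests that using him was also an option.
     Note that this on-base percentage was computed in a simplified way from the provided dataset, using the condition of whether the out count increased after the plate appearance. It cannot account for the final part of games, such as walk-off situations, and therefore contains some error.
     The same idea of performance adjustment can be applied not only to on-base percentage but also to metrics such as OPS (On-base plus slugging) [8] and WHIP (Walks plus Hits per Inning Pitched) [9].

     5 Conclusion
     To make the Carp Japan Series champions, this study used machine learning to analyze what distinguishes short series, in which they struggle, from the long season. Specifically, we built a machine learning classifier whose target variable is "interleague game or not" and obtained feature importances and predicted values. Based on the results, we proposed a way to identify players likely to perform well in short series and a method for adjusting performance statistics.

     6 About the author
     u++ / Shotaro Ishihara: works on data analysis at an operating company. Has won prize money in data analysis competitions multiple times and published "PythonではじめるKaggleスタートブック" with Kodansha. Runs an event titled Sports Analyst Meetup. https://upura.github.io/

     References
     [1] Article on Hiroshima's autumn camp after the team felt the gap in ability in the Japan Series, Full-Count (baseball and MLB column site) (2018), https://full-count.jp/2018/11/20/post251994.
     [2] ひろしまQuest2020#stayhome: pitch prediction using professional baseball data, SIGNATE - Data Science Competition (2020), https://signate.jp/competitions/274.
     [3] Guolin Ke et al. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree". In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 3149-3157 (2017).
     [4] Shachar Kaufman et al. "Leakage in data mining: formulation, detection, and avoidance". In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 556-563 (2011).
     [5] sklearn.metrics.roc_auc_score, scikit-learn 0.23.1 documentation (2020), https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html.
     [6] 2017 Hiroshima Toyo Carp individual statistics (Climax Series Final Stage), NPB.jp, Nippon Professional Baseball Organization (2017), https://npb.jp/bis/2017/stats/idb1s2_c.html.
     [7] Hiroshima: three factors behind the Climax Series exit. Why did the Central League champions allow an upset?, BaseBall Channel (2017), https://www.baseballchannel.jp/npb/40685.
     [8] What is a On-base Plus Slugging (OPS)?, Glossary (2020), http://m.mlb.com/glossary/standard-stats/on-base-plus-slugging.
     [9] Walks And Hits Per Inning Pitched (WHIP), Glossary (2020), http://m.mlb.com/glossary/standard-stats/walks-and-hits-per-inning-pitched.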