
Analyzing the Differences between the Long Season and Short Series in Baseball, and Adjusting Statistics, Using Machine Learning

Note: Speaker Deck does not appear to support PDFs whose page size changes partway through. Please download the file to view it.

Hiroshima Quest 2020 #stayhome (Idea Division)
https://signate.jp/competitions/277/summary

Source code
https://github.com/upura/signate-hiroshima-quest-idea

Shotaro Ishihara

October 02, 2020

Transcript

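  1. Analyzing the Differences between the Long Season and Short Series, and Adjusting Statistics, Using Machine Learning
    Hiroshima Quest, Idea Division
    u++ / Shotaro Ishihara
    [date: digits not recoverable]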


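  2. Background and purpose
    • The Carp struggle in short series
    * Record over 2016-2018, the years of consecutive Central League titles
    • Regular season: 259 wins, 162 losses, 8 ties (winning percentage 0.615)
    • Of which non-interleague: 229 wins, 139 losses, 7 ties (winning percentage 0.622)
    • Of which interleague: 30 wins, 23 losses, 1 tie (winning percentage 0.566)
    • Postseason: 9 wins, 12 losses, 1 tie (winning percentage 0.428)
    • Playing with an awareness of how the long season and short series differ is the key
    • After the 2018 Japan Series, the Carp's Noma also spoke about the difference between the two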


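  3. Analysis framework
    Explanatory variables X: the situation at each pitch
    Pitcher ID  Batter ID  Pitch type  Location     ...  y
    00000001    0000000a   Fastball    Middle       ...  1
    00000001    0000000a   Forkball    Lower right  ...  1
    00000001    0000000b   Slider      Left         ...  0
    Target variable y: interleague game or not
    LightGBM
    Train a classifier for "interleague game or not"
    Achieved an AUC of 0.801 on the 2017 data
    Make use of the feature importances and predicted values
    * "Postseason or not" may seem the more natural target variable, but analyzing with information that is known before the postseason makes it possible to apply the results to managerial decisions during the postseason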


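  4. Perspectives for analysis and use
    What creates the difference?
    Which players play in short-series-like situations?
    What happens when long-season statistics are converted into a short-series version?
    • The key points of the analysis results follow
    • For details, see the technical report attached at the end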


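  5. Feature importance
    The importances reveal the characteristics of short series
    • Pitcher's pitch count, batters faced, and appearance order
    Are pitching changes more frequent in short series?
    • Score before the play
    A scoring tendency specific to short series?
    • Pitch location zone and pitch type
    Do pitching tendencies differ as well?
    • Batter's number of plate appearances
    Are pinch hitters more common in short series?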


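  6. Short-series-like batters
    • Aggregate the per-pitch predicted values by batter
    • The larger the predicted value, the more long-season-like
    • The smaller the predicted value, the more short-series-like
    • In the actual Climax Series, the batters whose on-base percentage exceeded .300 were Arai, Tanaka, Nishikawa, Batista, Matsuyama, and Maru, who cluster in the lower part of the graph
    • Shoji, Amaya, Iwamoto, and Kokubo, in the upper part of the graph, appeared in games but never reached base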


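  7. Adjusting statistics
    • Multiply the result of each plate appearance by the short-series score
    For example, multiply whether the batter reached base by the probability
    Iwamoto's short-series adjusted on-base percentage comes out relatively low
    Might it have been interesting to use Shimozuru? (He did not appear in the Climax Series.)
    * Because it was derived in a simplified way from the data provided, the on-base percentage contains some error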


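  8. Summary
    • To make the Carp Japan Series champions, used machine learning to analyze how short series, their weak point, differ from the long season
    • Examined the results and proposed a way to identify players likely to perform well in short series and a method for adjusting statistics
    • Contributions of this study
    Proposing the problem design of building a classifier for "short series or not"
    Clarifying the characteristics of short series based on data
    Deriving insights that can be applied to managerial decisions in short series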


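  9. Analyzing the Differences between the Long Season and Short Series, and Adjusting Statistics, Using Machine Learning
    u++ / Shotaro Ishihara
    Abstract: This study analyzes how short series, a weak point of the Carp, differ from the long season, with the aim of making the Carp Japan Series champions. Specifically, we trained a machine learning classifier whose target variable is "interleague game or not" and obtained feature importances and predicted values. We then examined the results and proposed a way to identify players likely to perform well in short series and a method for adjusting statistics.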
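    1 Introduction
    To become Japan Series champions, a team must finish among the top three teams of its league in the regular season, which is played over a long period, and then win through the postseason (the Climax Series and the Japan Series). The Hiroshima Toyo Carp won three consecutive league titles from 2016 to 2018 but missed the championship each time, losing in the Japan Series in 2016 and 2018 and in the Climax Series in 2017.
    The Carp's record for 2016-18 is as follows.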
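    • Regular season: 259 wins, 162 losses, 8 ties (winning percentage 0.615)
    • Of which non-interleague: 229 wins, 139 losses, 7 ties (winning percentage 0.622)
    • Of which interleague: 30 wins, 23 losses, 1 tie (winning percentage 0.566)
    • Postseason: 9 wins, 12 losses, 1 tie (winning percentage 0.428)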
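    The winning percentages in this list can be reproduced with a few lines of arithmetic. The sketch below assumes two conventions that the report does not state: ties are excluded from the denominator, and the result is truncated rather than rounded to three decimals (9/21 = 0.4285..., which matches the listed 0.428 only under truncation).

```python
import math

# Carp records for 2016-18 as listed above: (wins, losses, ties)
RECORDS = {
    "regular season": (259, 162, 8),
    "non-interleague": (229, 139, 7),
    "interleague": (30, 23, 1),
    "postseason": (9, 12, 1),
}


def winning_percentage(wins, losses):
    """Wins over decided games; ties are left out (assumed convention)."""
    return wins / (wins + losses)


def truncate3(value):
    """Cut to three decimals without rounding (assumed, to match 0.428)."""
    return math.floor(value * 1000) / 1000


for name, (wins, losses, _ties) in RECORDS.items():
    print(f"{name}: {truncate3(winning_percentage(wins, losses)):.3f}")
```

    The non-interleague and interleague rows also add up to the regular-season row (229 + 30 = 259 wins, 139 + 23 = 162 losses, 7 + 1 = 8 ties), a quick consistency check on the split.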
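    Over the three regular seasons the Carp posted a winning percentage of 0.615, but in the postseason it was only 0.428. The regular season also includes interleague play, in which a team plays three games against each of the six teams of the other league. Because a team plays only a few games against opponents it does not usually face, interleague play can be regarded as a kind of short series. Splitting the Carp's three regular seasons by whether a game was an interleague game confirmed a certain gap in winning percentage.
    To win through short series, managerial decisions that recognize how they differ from the long season are thought to be the key. After the 2018 Japan Series, the Carp's Takayoshi Noma also spoke about the difference between the two [1].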
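    2 Proposed method
    Based on data, this study identifies the characteristics of short series and works on identifying players likely to perform well in them and on proposing a method for adjusting statistics. Figure 1 gives an overview of the proposed method.
    Figure 1: Overview of the proposed method
    The proposed method trains an "interleague game or not" classifier on the dataset of "Hiroshima Quest 2020 #stayhome: Pitch prediction using professional baseball data" [2], a competition held on SIGNATE. The explanatory variables (features) X are the situation at each pitch, and the target variable y is "interleague game or not". Because we take the Carp's point of view, games between two Pacific League teams were excluded. The data period is the single year 2017, for which pitch type and location information is available.
    The target variable is "interleague game or not" rather than "postseason or not" with the actual use case in mind. "Postseason or not" may at first seem to fit the purpose better, but postseason results are not known before the postseason begins, so they cannot inform managerial decisions during the postseason. We therefore treat interleague play as a short series and propose a framework that yields useful insights from regular-season results ahead of the postseason.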
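    As the machine learning algorithm we adopt LightGBM [3], a gradient boosting decision tree method known to perform well on tabular data. LightGBM also provides a function for computing feature importances, which suits the present purpose.
    The contributions of this study are the following three points.
    • Proposing the problem design of building a classifier for "short series or not"
    • Clarifying the characteristics of short series based on data
    • Using feature importances and predicted values to derive insights that can be applied to managerial decisions in short series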


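  10. 3 Experiments
    3.1 Vetting the features
    When training the classifier, the features must be vetted to deal with a problem known as data leakage [4]. Data leakage means using, at training time, information that would not normally be available, which lowers the model's ability to generalize to unseen data.
    In fact, when every column in the provided dataset was used as a feature, the classifier reached 1, the maximum value of AUC [5], a classification metric. The dataset was split 4:1 into training and test sets, and k = 5 cross-validation was performed during training. The mean of the five classifiers was used as the final predicted value for the test set.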
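    The evaluation step described here, averaging the test-set predictions of the five fold models and scoring the average with AUC, can be sketched without any third-party library. The report itself cites `sklearn.metrics.roc_auc_score` [5]; the `roc_auc` function below is a plain-Python stand-in that uses the pairwise-ranking definition of AUC, and the labels and predictions are invented toy numbers, not values from the dataset.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_folds(fold_predictions):
    """Element-wise mean of the test-set predictions of each fold's model."""
    n_models = len(fold_predictions)
    return [sum(column) / n_models for column in zip(*fold_predictions)]


# Toy data: 1 = interleague game, 0 = other game (invented for illustration)
y_test = [1, 0, 1, 1, 0]
fold_predictions = [  # one row per fold model (k = 5)
    [0.9, 0.2, 0.3, 0.7, 0.4],
    [0.8, 0.3, 0.2, 0.9, 0.6],
    [0.7, 0.1, 0.4, 0.8, 0.5],
    [0.9, 0.2, 0.3, 0.6, 0.3],
    [0.7, 0.2, 0.3, 0.5, 0.2],
]

final_prediction = average_folds(fold_predictions)
# 5 of the 6 positive/negative pairs are ordered correctly in this toy case
print(round(roc_auc(y_test, final_prediction), 3))
```

    A feature that identifies the label outright, such as the leaking IDs discussed next, would push this score to 1, which is why a perfect AUC is treated here as a warning sign rather than a success.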
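    Figure 2 shows the feature importances in this case.
    Figure 2: Examples of features that leak
    IDs that can identify a team rank near the top. This is because, for example, when the home team is a Pacific League team, the game can easily be determined to be an interleague game. Date information also ranks near the top, because interleague games are played together within a fixed period.
    These features are useful if the only aim is to raise performance, but they do not suit the present purpose. While checking the importances, we removed from the dataset the features that could cause this kind of data leakage.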
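    3.2 Final features and results
    The 22 features finally adopted are listed below. Training the classifier on these features and checking its performance on the test set gave an AUC of 0.801. Details such as hyperparameters can be found in the separately submitted source code.
    • Pitch type
    • Pitch location zone
    • Plate appearances within the inning
    • Pitches within the plate appearance
    • Pitcher pitching side (left/right)
    • Pitcher role
    • Pitcher appearance order
    • Batters faced by the pitcher in the game
    • Pitches thrown by the pitcher in the game
    • Pitches thrown by the pitcher in the inning
    • Batter plate-appearance side (left/right)
    • Batter's plate appearances in the game
    • Home team score before the play
    • Away team score before the play
    • Outs before the play
    • Balls before the play
    • Strikes before the play
    • Runner situation before the play
    • Pitcher's throwing hand (left/right)
    • Pitcher's batting side (left/right)
    • Batter's throwing hand (left/right)
    • Batter's batting side (left/right)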


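  11. Figure 3: Feature importance
    4 Discussion
    4.1 Feature importance
    Figure 3 shows the feature importances.
    For example, "pitcher's pitch count, batters faced, and appearance order" suggests that pitching changes are frequent in short series. Likewise, "score before the play" and "pitch location zone and pitch type" can be read as scoring and pitching tendencies specific to short series. "Batter's number of plate appearances" may reflect, among other things, frequent use of pinch hitters in short series.
    In this way, the proposed method made it possible to identify quantitatively the characteristics of short series that had previously been discussed only qualitatively.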
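    4.2 Identifying players likely to perform well in short series
    Through cross-validation, the classifier's predicted values have been obtained. Figure 4 shows these per-pitch predicted values aggregated by Carp batter. The predicted value represents the long-season score, so the smaller it is, the more a player resembles short-series play.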
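    The per-batter aggregation can be sketched as a group-by followed by a median, the statistic named in the caption of Figure 4. The batter IDs and predicted values below are invented for illustration, and the function name is hypothetical rather than taken from the author's source code.

```python
from collections import defaultdict
from statistics import median


def median_prediction_by_batter(rows):
    """rows: iterable of (batter_id, predicted_value) pairs, one per pitch."""
    by_batter = defaultdict(list)
    for batter_id, predicted in rows:
        by_batter[batter_id].append(predicted)
    return {batter: median(values) for batter, values in by_batter.items()}


# Invented per-pitch predictions (long-season score) for two batters
rows = [
    ("batter_a", 0.9), ("batter_a", 0.7), ("batter_a", 0.8),
    ("batter_b", 0.4), ("batter_b", 0.6),
]

scores = median_prediction_by_batter(rows)
# Smaller median = more short-series-like, so sort ascending
ranking = sorted(scores, key=scores.get)
print(ranking)
```

    Sorting in ascending order puts the most short-series-like batters first, which corresponds to the lower part of the graph in the discussion.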
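    Batters in the lower part of the graph have been playing, even within the long season, in situations close to a short series, and are therefore considered likely to perform as usual. In 2017 the Carp were eliminated by DeNA in the Climax Series; the batters whose on-base percentage exceeded .300 over all five games were Takahiro Arai, Kosuke Tanaka, Ryoma Nishikawa, Batista, Ryuhei Matsuyama, and Yoshihiro Maru. These six players cluster in the lower part of the graph. In contrast, Hayato Shoji, Soichiro Amaya, Takahiro Iwamoto, and Tetsuya Kokubo, in the upper part of the graph, appeared in games but finished with an on-base percentage of 0 [6].
    Figure 4: Median of the predicted values per batter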
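    4.3 Proposed method for adjusting statistics
    The previous section identified players likely to perform well in short series, but it considered only the magnitude of the classifier's predicted values and did not take regular-season statistics into account. This section proposes a method for adjusting regular-season statistics.
    We use on-base percentage as the example statistic and multiply the outcome of each plate appearance (whether the batter reached base) by the probability that represents the short-series score. The classifier's predicted value represents the long-season score, so we take 1 minus the predicted value as the short-series score.
    For example, suppose a player reaches base in two of five plate appearances, with short-series scores of 0.8 and 0.7 respectively. The on-base percentage is then 0.4 and the "adjusted on-base percentage" is 0.3.
    (1 × 0.8 + 1 × 0.7 + 0 + 0 + 0)/5 = 0.3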
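    The worked example can be checked with a short function. This is a minimal sketch of the calculation as described in this section, not the author's implementation; the function name and the predictions for the three plate appearances without a hit (0.5 each) are placeholders, and the latter do not affect the result because those outcomes are 0.

```python
def adjusted_obp(reached_base, long_season_scores):
    """Mean over plate appearances of outcome x (1 - classifier prediction).

    reached_base: 1 if the batter reached base in that plate appearance, else 0
    long_season_scores: classifier predictions (long-season score) per appearance
    """
    if not reached_base or len(reached_base) != len(long_season_scores):
        raise ValueError("inputs must be non-empty and of equal length")
    weighted = sum(
        outcome * (1.0 - prediction)
        for outcome, prediction in zip(reached_base, long_season_scores)
    )
    return weighted / len(reached_base)


outcomes = [1, 1, 0, 0, 0]
# Short-series scores of 0.8 and 0.7 correspond to predictions of 0.2 and 0.3
predictions = [0.2, 0.3, 0.5, 0.5, 0.5]

plain = sum(outcomes) / len(outcomes)
adjusted = adjusted_obp(outcomes, predictions)
print(round(plain, 3), round(adjusted, 3))
```

    Because every weight lies between 0 and 1, the adjusted value can never exceed the plain on-base percentage; it shrinks less for batters whose times on base came in short-series-like situations.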


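  12. Starting from the original regular-season statistic, this metric gives larger values to players with larger short-series scores.
    Figure 5 shows the regular-season on-base percentage and the adjusted on-base percentage of the Carp's batters in 2017.
    Figure 5: On-base percentage and adjusted on-base percentage in the 2017 regular season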
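    In the upper part of the graph, Iwamoto's adjusted on-base percentage comes out relatively low. In Game 4 of the 2017 Climax Series against DeNA, the Carp sent Iwamoto in as a pinch hitter in the sixth inning, trailing by one run with the bases loaded and no outs. He struck out swinging, the Carp ended up without a run, and by losing this game they fell to 2 wins and 3 losses (including the advantage win), one loss from elimination. One article argues that this scene decided the series [7].
    In the lower part of the graph, Ko Shimozuru's adjusted on-base percentage comes out relatively high. He did not appear in the 2017 Climax Series, but the graph suggests that using him was also an option.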
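    Note that this on-base percentage was derived in a simplified way from the provided dataset, using the condition of whether the number of outs had increased after the plate appearance. It cannot account for the final part of a game, such as a walk-off, and therefore contains some error.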
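    The simplified rule described here can be written down directly. The sketch below is an assumption-laden illustration: it takes the out counts before and after each plate appearance as given inputs (with 3 recorded for an inning-ending out), whereas the report does not say how the out count after a plate appearance was obtained or how inning boundaries were handled.

```python
def reached_base(outs_before, outs_after):
    """Simplified rule: on base if the out count did not increase."""
    return 0 if outs_after > outs_before else 1


def simple_obp(plate_appearances):
    """plate_appearances: list of (outs_before, outs_after) tuples."""
    if not plate_appearances:
        raise ValueError("no plate appearances")
    outcomes = [reached_base(before, after) for before, after in plate_appearances]
    return sum(outcomes) / len(outcomes)


# Invented example: two of five plate appearances leave the out count unchanged
sample = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 3)]
print(simple_obp(sample))
```

    A rule of this kind has no following out count to compare against at the very end of a game, which is consistent with the source of error noted above.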
    This approach to adjusting statistics can be applied not only to on-base percentage but also to metrics such as OPS (On-base plus slugging) [8] and WHIP (Walks plus Hits per Inning Pitched) [9].
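    5 Conclusion
    With the aim of making the Carp Japan Series champions, this study used machine learning to analyze how short series, a weak point of the team, differ from the long season. Specifically, we trained a machine learning classifier whose target variable is "interleague game or not" and obtained feature importances and predicted values. We then examined the results and proposed a way to identify players likely to perform well in short series and a method for adjusting statistics.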
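    6 About the author
    u++ / Shotaro Ishihara: Works on data analysis at an operating company. Has won prize money in data analysis competitions multiple times and published "PythonではじめるKaggleスタートブック" with Kodansha. Runs events under the name "Sports Analyst Meetup". https://upura.github.io/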
    References
    [1] "A complete defeat, we were overpowered...": Hiroshima, having felt the gap in ability in the Japan Series, spend a desperate autumn camp, Full-Count (baseball and MLB column site) (2018), https://full-count.jp/2018/11/20/post251994
    [2] Hiroshima Quest 2020 #stayhome: Pitch prediction using professional baseball data, SIGNATE - Data Science Competition (2020), https://signate.jp/competitions/274
    [3] Guolin Ke et al. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree". In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 3149-3157 (2017).
    [4] Shachar Kaufman et al. "Leakage in data mining: formulation, detection, and avoidance". In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 556-563 (2011).
    [5] sklearn.metrics.roc_auc_score, scikit-learn 0.23.1 documentation (2020), https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html
    [6] 2017 Hiroshima Toyo Carp individual statistics (Climax Series Final Stage), NPB.jp (Nippon Professional Baseball Organization) (2017), https://npb.jp/bis/2017/stats/idb1s2_c.html
    [7] Three factors behind Hiroshima's Climax Series exit: why did the Central League champions allow an upset?, Baseball Channel (2017), https://www.baseballchannel.jp/npb/40685
    [8] What is a On-base Plus Slugging (OPS)?, Glossary (2020), http://m.mlb.com/glossary/standard-stats/on-base-plus-slugging
    [9] Walks And Hits Per Inning Pitched (WHIP), Glossary (2020), http://m.mlb.com/glossary/standard-stats/walks-and-hits-per-inning-pitched
