Slide 1

Slide 1 text

Baseball Player Performance Prediction Using Feature Engineering with Python & R
Shinichi Nakagawa (@shinyorke)
PyCon JP 2020 Online, 8/28
Feature engineering with sports data and predicting baseball players' performance: going back and forth between Python and R

Slide 2

Slide 2 text

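This talk is about:
• Understanding feature engineering correctly
• Python, R, and SQL each have strengths and weaknesses, so use each where it fits
• Enjoying ⚾ with feature engineering and machine learning (important)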

Slide 3

Slide 3 text

Who am I? (Who are you, anyway?)
• Shinichi Nakagawa
• I go by "shinyorke" on most social networks
• Senior Engineer, JX Press Corporation
• Baseball Engineer, Data Scientist (a baseball engineer and data scientist in the wild)
• #Python #DataScience #Baseball⚾ #SABRmetrics #DataPlatform #TechnicalAdvisor

Slide 4

Slide 4 text

[Ad] JX Press Corporation and the Python Mokumoku Self-Study Room

Slide 5

Slide 5 text

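What is JX Press Corporation?
• A news venture whose mission is to be "a news organization that uses technology to reveal what is happening right now"
• Over 3 million downloads! The breaking-news app "NewsDigest" (download count as of 2020/8/28)
• B2B products: "FASTALERT" and the "JX Press opinion polling" service
• https://jxpress.net/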

Slide 6

Slide 6 text

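JX Press Corporation and Python
• We have long used Python for server-side work, machine learning, SRE, and more
• We have sponsored PyCon JP several times: 2016, 2017, 2019, 2020 (New!)
• This year we also have two speakers! (@YAMITZKY, @shinyorke)
• We work hard on our tech blog, please read it: https://tech.jxpress.net/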

Slide 7

Slide 7 text

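Python Mokumoku Self-Study Room #jisyupy
• A meetup for self-directed study while enjoying "technology and good food".
• Started in 2017 as #rettypy; I have been an organizer the whole time.
• From 2020 the flavor changed slightly, partly because the organizing body changed.
• Currently held online at irregular intervals; I would like to do in-person events again eventually.
• Next session: 9/19 https://jisyupy.connpass.com/event/186611/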

Slide 8

Slide 8 text

A sudden quiz!!!

Slide 9

Slide 9 text

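Shohei Ohtani: how many home runs will he hit in a season three years from now?
1. 30
2. 32
3. 334 (???) * The major-league single-season record is 70.
[Reference] Last year (age 25) he hit 18; the year before (age 24) he hit 20.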

Slide 10

Slide 10 text

"Hey, that has nothing to do with it, lol"
I built something for exactly this kind of quiz.
* The answer comes in the second half of the talk!

Slide 11

Slide 11 text

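Today's game ⚾
• Qualifier: "What is feature engineering?"
• Semifinal: "Machine learning starting with ⚾: predicting MLB batter performance"
• Final: "How many home runs will Shohei Ohtani hit three years from now?"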

Slide 12

Slide 12 text

What is feature engineering?
The story of grinding through raw data until it can be used for machine learning or statistics.
I will also touch briefly on feature engineering in ⚾.

Slide 13

Slide 13 text

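What is feature engineering?
• A feature is a numeric representation of raw data.
• Feature engineering is the process of building the features best suited to the data, model, and task at hand.
• It is the mixed martial arts of engineering: you get it done by making full use of languages such as Python, R, and SQL.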

Slide 14

Slide 14 text

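Feature engineering and ⚾
• Numeric -> numeric
  • Many values can be used as they are: hits, walks, strikeouts, and so on.
  • Normalize and scale them: sabermetrics indicators such as RC, wRAA, and wOBA.
• Non-numeric data -> numeric
  • Throwing hand, batting side, etc.
  • Turn data that usefully characterizes a player into numbers in some form.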

Slide 15

Slide 15 text

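[Example] Turning batting side and throwing hand into features
• In SQL (BigQuery here), a single CASE expression does the job.
• Records can have missing values, so do not forget to handle them with ELSE!
• Birthplace and similar fields could be encoded too, along the lines of USA: 1, South America: 2, Japan: 25, and so on.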
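The SQL shown on this slide did not survive in the transcript. The following is a minimal sketch of the same idea, using Python's built-in `sqlite3` as a stand-in for BigQuery. The table, column names, and numeric codes are all hypothetical.

```python
import sqlite3

# Hypothetical player rows: (name, bats, throws). None stands for a missing record.
rows = [
    ("Player A", "R", "R"),
    ("Player B", "L", "R"),
    ("Player C", "B", "L"),   # B = switch hitter
    ("Player D", None, "R"),  # batting side was never recorded
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, bats TEXT, throws TEXT)")
conn.executemany("INSERT INTO players VALUES (?, ?, ?)", rows)

# One CASE expression per feature; ELSE catches missing or unexpected codes.
query = """
SELECT
  name,
  CASE bats WHEN 'R' THEN 1 WHEN 'L' THEN 2 WHEN 'B' THEN 3 ELSE 0 END AS bats_code,
  CASE throws WHEN 'R' THEN 1 WHEN 'L' THEN 2 ELSE 0 END AS throws_code
FROM players
ORDER BY name
"""
features = conn.execute(query).fetchall()
conn.close()
```

The ELSE branch is what keeps Player D's missing batting side from turning into a NULL feature.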

Slide 16

Slide 16 text

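How do you find the best features?
• You do not so much find them as build them yourself to fit the purpose.
• Depending on why you are doing data analysis or machine learning, you look for models and data, build them, and go through a lot of trial and error.
• "We have tons of data, so let's do AI! DX! Victory!!" -> This is the classic losing-game pattern. Data is worthless if it cannot be turned into features.
  * A fictional story (at least, not about my company)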

Slide 17

Slide 17 text

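How to think about features in ⚾: DIPS, RC, LWTS
• DIPS: classify pitching and batting plays as "own responsibility" or "others' responsibility" (see the table below)
• RC: explain run-scoring ability with the model "(on-base ability × advancement ability) / opportunities"
• LWTS: convert each individual play into runs and use that to evaluate pitching and batting
• These belong to "sabermetrics", a statistical-model way of thinking about baseball. For details, see my blog: https://shinyorke.hatenablog.com/entry/sabr-metrics-batting-stats

Category | Characteristics | Main indicators
Own responsibility | Depends on individual ability: power, speed, plate discipline, etc. | Home runs, strikeouts, walks and hit-by-pitches
Others' responsibility | Many explanatory variables: team, ballpark, umpires, etc. | Earned runs (ERA), errors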
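As a worked instance of the RC idea, here is a minimal sketch of Bill James's basic Runs Created formula, which multiplies an on-base factor by an advancement factor and divides by opportunities. The season line is hypothetical, and this is the basic version only, not any of the later refinements.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (on-base factor * advancement factor) / opportunities."""
    on_base = hits + walks            # A: times on base
    advancement = total_bases         # B: bases gained
    opportunities = at_bats + walks   # C: chances
    if opportunities == 0:
        return 0.0
    return on_base * advancement / opportunities


# Hypothetical season line, not a real player.
rc = runs_created(hits=150, walks=60, total_bases=250, at_bats=520)
```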

Slide 18

Slide 18 text

Feature engineering with Python, R, and SQL
In short, it is the work of cooking the data. Pick the right language for each job and things go well.

Slide 19

Slide 19 text

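Feature "cooking" engineering cheat sheet
• There is no big difference in what Python, R, and SQL can do (each has strengths and weaknesses)
• Choose by amount of code, purpose of the functions, and processing speed (whether distributed processing is available)
Reference: https://shinyorke.hatenablog.com/entry/r-to-python

Comparison | Python | R | SQL
Amount of code (for the same task) | Neither good nor bad | Relatively simple when it comes to handling data | Somewhat verbose compared with Python and R
Functions (computation) | OOP-style approaches and more; programmable | Mathematical models can be written as the formulas they are | Chains of functions, like a Rube Goldberg machine; hell if you use them wrong
Processing speed, distributed processing | Strong overall tuning through parallel and distributed processing | At heart a calculation and statistics tool; speed and distribution are moderate | Approach to speed and distribution depends on the DB engine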

Slide 20

Slide 20 text

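There is no big difference in what Python, R, and SQL can do.
In the end, choosing the right tool for each job matters most!
#SayingItTwiceBecauseItMatters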

Slide 21

Slide 21 text

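Machine learning starting with ⚾: predicting MLB batter performance
Assuming you now understand ⚾ and feature engineering with Python, R, and SQL, this time I took on predicting MLB (Major League Baseball) batter performance as a case study.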

Slide 22

Slide 22 text

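Machine learning starting with baseball ⚾
1. Planning - research and planning
2. Data Engineering - data acquisition
3. Feature Engineering - feature extraction
4. Clustering - grouping by clustering
5. Predict - performance prediction
A book with a similar title exists, you say? You must be imagining it (whispers).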

Slide 23

Slide 23 text

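Planning - research and planning
• Survey of prior work and analyses
• Porting R sample code to Python by hand
• Turning what I wanted to do into a spec with an inception deck
In other words, the project-planning phase.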

Slide 24

Slide 24 text

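Survey of prior work and analyses
• The United States, a baseball-data superpower, has an enormous number of examples, to the point that there are projection sites aimed at fans and fantasy/social-game users.
• Among them, I researched and adopted approaches that were convincing (to shinyorke as well) and well founded:
  • PECOTA
  • Analyzing Baseball Data with R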

Slide 25

Slide 25 text

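PECOTA - the most statistical performance projection model
• An MLB performance projection model released in 2003
• Computes projected performance from "the performance of similar players in the past". The concrete methods and formulas are not public (the ideas are described here and there).
• Developed by the statistician Nate Silver, who later correctly called the winner in 49 of 50 states in the 2008 US presidential election
* If you are curious, read the book "The Signal and the Noise".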

Slide 26

Slide 26 text

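Analyzing Baseball Data with R
• The standard book on baseball data analysis with sabermetrics
• From the United States; the Second Edition came out in 2018. In English, of course.
• As the name says, it is a book on ⚾ data analysis, and all the code is in R
To understand the contents, I read the R code and ported it to Python by hand.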

Slide 27

Slide 27 text

[Example] Linear regression in R and in Python
The R code on the left is the original; the Python code on the right is my own port.
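The side-by-side code from this slide is not in the transcript. As a rough stand-in, here is a dependency-free sketch of ordinary least squares for a single predictor, which is what a one-variable `lm(y ~ x)` fit in R computes. The variable pair (run differential vs. winning percentage) and the data are assumptions chosen for illustration, not the slide's actual example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical team seasons: run differential and winning percentage.
run_diff = [-100, -50, 0, 50, 100]
win_pct = [0.44, 0.47, 0.50, 0.53, 0.56]
intercept, slope = fit_line(run_diff, win_pct)
```

Whichever language is used, the fitted coefficients come out the same, which is the point the next slide makes.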

Slide 28

Slide 28 text

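What I learned from porting R to Python
• I came to understand that the result is the same whether you do it in R or in Python
• Considering the libraries I would use later (and my own proficiency), Python it is
• Still, R helped me understand the formulas really well, so thank you, R
The work log from that time (done in 2019/11) is written up on my blog: https://shinyorke.hatenablog.com/entry/r-to-python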

Slide 29

Slide 29 text

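And so the approach was decided
• The prediction model imitates Nate Silver's PECOTA model:
  1. Find close players with a nearest-neighbor-search type of algorithm
  2. Produce projected stats with a (sort of) probability-distribution method
• Adjust the model output slightly using the method for "estimating the age of peak performance" from Analyzing Baseball Data with R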

Slide 30

Slide 30 text

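Putting it into words, and the start of the project
• Summarized the spec in an inception deck, so as not to get lost
• Managed the project with Jira, to track progress and keep working notes
• As it turned out, I ran the whole way to the finish with this approach
The project was named "Zobrist" after the major-league super-utility player.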

Slide 31

Slide 31 text

Data Engineering - data acquisition
• Acquiring ⚾ data
• Shaping it nicely and loading it into Google BigQuery
Data that came as CSV and other formats was unified in BigQuery.

Slide 32

Slide 32 text

Where is the ⚾ data?
• Lahman's Baseball Database
  • Career stats and birthplace data for all MLB players, in CSV.
  • http://www.seanlahman.com/baseball-archive/statistics/
  • https://github.com/chadwickbureau/baseballdatabank
• Retrosheet
  • A dataset that records game information at the plate-appearance level.
  • https://www.retrosheet.org/
  • https://github.com/chadwickbureau/retrosheet
shinyorke has relied on this data ever since PyCon JP 2014. Lately it is also on GitHub, which is convenient!

Slide 33

Slide 33 text

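Gathering everything in BigQuery
• Preprocess and shape the Lahman's Database and Retrosheet data until it is usable from SQL, then save it as CSV
• Adopted BigQuery as the DWH and imported all the data in one go
• The importer was implemented quickly with GCP's official BigQuery client
pip install google-cloud-bigquery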

Slide 34

Slide 34 text

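[Reference] What about BigQuery costs?
• This is only my own experience, but at the data volumes of a personal project you can stay within the free tier.
  * Roughly tens of GB in total and a few hundred MB per query
• If you stick to the basics, it is efficient and extremely easy to use even at company scale. The official GCP documentation and many other people cover this.
  • Do not fetch unnecessary columns; insert data in bulk
  • Use partition keys for efficient access
  • Monitor costs, and send alerts to Slack or similar if something happens
• JX Press Corporation also makes heavy use of BigQuery: https://tech.jxpress.net/entry/kowakunai-bigquery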

Slide 35

Slide 35 text

Here is the finished environment.
Data in BigQuery is worked on from JupyterLab or from Python scripts.
Nothing special: it ended up as a textbook environment for doing PyData work.

Slide 36

Slide 36 text

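Feature Engineering - feature extraction
• Extracting and generating values that can serve as features
• Writing SQL so the values can be used comfortably in the model
• Preparation for clustering and prediction
I produced features to match the analysis model and methods adopted this time.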

Slide 37

Slide 37 text

[Figure] How the performance projection model comes together
A simple flow: feature extraction -> clustering -> prediction.

Slide 38

Slide 38 text

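Extracting and generating values that can serve as features
• All of the base datasets are in BigQuery
• The features each task needs are produced by splitting the work between SQL and Python
• Anything that can be done entirely in SQL is created and used as a BigQuery view
• Anything that is hard in SQL is turned into intermediate data with Python and then loaded into BigQuery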

Slide 39

Slide 39 text

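Feature extraction and generation from SQL
• Things like batting average, on-base percentage, and OPS can be computed in SQL.
• So can somewhat more complex indicators, such as wOBA.
• All of the above was done entirely in BigQuery.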
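The rate stats named above are simple row-level arithmetic, which is why they fit in SQL. A minimal sketch of the standard formulas follows, written in Python for consistency with the other examples. The argument names and the season line are hypothetical.

```python
def batting_rates(ab, h, doubles, triples, hr, bb, hbp, sf):
    """Return (AVG, OBP, SLG, OPS) from a batter's counting stats."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    avg = h / ab if ab else 0.0
    obp_denominator = ab + bb + hbp + sf
    obp = (h + bb + hbp) / obp_denominator if obp_denominator else 0.0
    slg = total_bases / ab if ab else 0.0
    return avg, obp, slg, obp + slg


# Hypothetical season line.
avg, obp, slg, ops = batting_rates(
    ab=500, h=150, doubles=30, triples=2, hr=20, bb=50, hbp=5, sf=5
)
```

The zero-denominator guards play the same role as the ELSE branch in the earlier CASE example: players with no at-bats should not break the feature table.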

Slide 40

Slide 40 text

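Using Python for what SQL is not suited to
• Row-level calculations are fine in SQL.
• Python and R excel when the processing or calculation gets complicated, or when working on data as a whole matrix.
• This time I used Pandas and imported the results straight into BigQuery.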

Slide 41

Slide 41 text

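Working with BigQuery from JupyterLab
• An environment based on JupyterLab
  • Pandas
  • scikit-learn
  • plotly
• Results from the BigQuery client go straight into a DataFrame for processing
• The clustering and everything else after this is all done this way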

Slide 42

Slide 42 text

Clustering
• The task of grouping "similar players"
• Choosing an algorithm -> ANN in the end
• Blazing-fast ANN with Annoy
I built the grouping task, wrote tests, and made it re-runnable in CI.

Slide 43

Slide 43 text

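Finding and grouping similar players
• Projected stats are "derived from the age-by-age stats of similar players", so finding the "similar players" is actually the important part
• The method: compute the Euclidean distance between a given player and every other player, then run the prediction task on the stats of the top X closest players
• PECOTA works on this idea, so I imitate it as is.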
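The distance step can be sketched as an exact, brute-force search. Approximate nearest-neighbor libraries speed up this same lookup on larger data. The player names and two-element feature vectors are hypothetical; in practice the features would be scaled first so that no single stat dominates the distance.

```python
import math


def nearest_players(target, others, k):
    """Return the k (name, distance) pairs closest to target by Euclidean distance."""
    scored = [(name, math.dist(target, vector)) for name, vector in others.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical, already-scaled feature vectors.
others = {
    "Player A": (0.30, 0.50),
    "Player B": (0.25, 0.40),
    "Player C": (0.28, 0.52),
}
similar = nearest_players((0.29, 0.50), others, k=2)
```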

Slide 44

Slide 44 text

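Computing distances with ANN (approximate nearest neighbor search)
• Representative stats such as hits and home runs. The details are a secret.
• Unglamorous stats such as games played. Also a secret.
• Using the above as features, I used ANN (approximate nearest neighbor search) to compute Euclidean distances and gather close players.
• I had tried this on a topic other than batter projection, and the results were good, so I adopted it as is: https://shinyorke.hatenablog.com/entry/feature-faridyu-san
• The implementation uses Annoy, an extremely convenient library.
• Breaking things while touching experimental code would be bad, so tests run automatically on GitHub Actions.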

Slide 45

Slide 45 text

Distance calculation by ANN using Annoy.
From training to saving the model, this is all it takes. The data is not large, so it finished in seconds.

Slide 46

Slide 46 text

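Results of a quick test.
Players close to Matt Chapman, MLB's young star third baseman.
Strong active third basemen and great players of the past came up, all of them comparable third basemen.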

Slide 47

Slide 47 text

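Predict - performance prediction
• Compute plausible stat-like numbers using a probability distribution
• Tuning that reflects growth (and decline) with age
• Presentation with Streamlit
The goal is reached once stat-like numbers come out.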

Slide 48

Slide 48 text

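How the projected stats are produced
• "Player X's stats at ages 34 to 37 will resemble the stats at ages 34 to 37 of the group Y of players similar to X." That is the idea behind PECOTA.
• This project, "zobrist", adopts the same idea as PECOTA.
• Based on the Euclidean distances from ANN, pick the players close to X and summarize their stats by age to get the projection.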

Slide 49

Slide 49 text

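How was it actually done?
1. Collect the age-25 stats of the players Y (several of them) who are similar to player X.
2. Based on the stats from step 1, attach labels along the lines of "strong / average / weak".
3. Run a classification task with step 1 as the training data and step 2 as the labels.
4. Predict using player X's age-25 stats.
5. Build the projected stats from the stats of the players who carry the same label as the one returned.
Classification is naive Bayes, implemented quickly with scikit-learn (code omitted).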
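Since the talk omits the code, here is a small from-scratch sketch of the five steps. It uses a Gaussian naive Bayes classifier written in plain Python rather than scikit-learn, and the Gaussian variant is an assumption, since the talk only says "naive Bayes". The features (OPS, HR), the labels, and every number are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Per-label prior plus per-feature mean and variance."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        # The small constant keeps the variance positive for tiny groups.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(members) / len(rows), means, variances)
    return model


def predict_label(model, row):
    """Pick the label with the highest log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for value, mean, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Steps 1-2: age-25 stats (OPS, HR) of similar players, with hand-made labels.
age25_stats = [(0.900, 35), (0.880, 32), (0.760, 20), (0.740, 18), (0.650, 8), (0.630, 6)]
labels = ["strong", "strong", "average", "average", "weak", "weak"]
age30_hr = [38, 34, 22, 19, 9, 5]  # the same players' home runs at age 30

# Steps 3-4: train, then classify player X from his own age-25 stats.
model = fit_gaussian_nb(age25_stats, labels)
predicted = predict_label(model, (0.870, 30))

# Step 5: average the later stats of players who share the predicted label.
peers = [hr for hr, label in zip(age30_hr, labels) if label == predicted]
projected_hr = sum(peers) / len(peers)
```

In this toy data player X lands in the "strong" group, so his age-30 projection is the mean of that group's age-30 home runs.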

Slide 50

Slide 50 text

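The final secret ingredient: growth and decline with age
• Young players are likely to keep improving (though not necessarily)
• Players over 30 are likely to decline (though not necessarily)
• Nothing can be said for certain, but there does seem to be a tendency?
• Working from that hypothesis, I added the mean OPS at each age to the probability distribution as a secret ingredient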
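The talk does not say exactly how the age-by-age OPS means enter the model. One plausible reading, sketched below purely as an assumption, is to average OPS by age across the similar players and use the ratio to a base age as a multiplier on the raw projection. All numbers are hypothetical.

```python
from collections import defaultdict


def mean_ops_by_age(seasons):
    """seasons: iterable of (age, ops) pairs taken from similar players."""
    buckets = defaultdict(list)
    for age, ops in seasons:
        buckets[age].append(ops)
    return {age: sum(values) / len(values) for age, values in sorted(buckets.items())}


def age_factors(curve, base_age):
    """Ratio of each age's mean OPS to the mean OPS at base_age."""
    base = curve[base_age]
    return {age: ops / base for age, ops in curve.items()}


# Hypothetical (age, OPS) seasons of similar players.
seasons = [(25, 0.800), (25, 0.760), (26, 0.820), (26, 0.800), (27, 0.850), (27, 0.830)]
curve = mean_ops_by_age(seasons)
factors = age_factors(curve, base_age=25)
```

A factor above 1.0 would nudge the projection for that age upward, and a factor below 1.0 would pull it down.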

Slide 51

Slide 51 text

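[Example] OPS trend for a certain third baseman (that is, for players similar to him)
As in the earlier ANN example, this is the mean OPS of players close to Matt Chapman, MLB's young star third baseman.
Can you see that ages 29 to 30 form a valley?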

Slide 52

Slide 52 text

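Blazing-fast demo development with Streamlit
• I built a demo with Streamlit for the presentation. https://www.streamlit.io/
• It is basically what I wrote in Jupyter, lightly refactored.
• Pandas DataFrames and plotly graphs both work nicely when copied over from Jupyter!
• For machine learning projects, it looks useful for presentations and also as a starting draft for API design.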

Slide 53

Slide 53 text

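[Done] Projected stats for Matt Chapman (3B)
Age 27 onward is projected. Note that 2020 (age 27) assumes a 162-game season (the actual season is 60 games).
Doesn't this look like a player who could really exist???
↑ Projected batting average
↓ Projected hits, home runs, and RBI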

Slide 54

Slide 54 text

Whether things turn out as projected can only be checked in 3 to 5 years, but plausible-looking numbers came out, so it seems fine.

Slide 55

Slide 55 text

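Speaking of which, what about the major leaguer who represents Japan?
As a pitcher it is a pity given the effects of his surgery, but at the plate he is in terrific form, isn't he?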

Slide 56

Slide 56 text

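Of course I looked into it ⚾
• Using last year's (2019) stats as the input data, I projected his batting stats for ages 26 to 29.
• Batting average, home runs, and RBI are visualized, and the answer to the earlier quiz is here too.
• Note that the age-26 (this year) figure is an estimate that assumes a 162-game season.
  * This year's major-league season is about 60 games, so the real figure should be roughly one third.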
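The "roughly one third" remark is plain proration. A tiny sketch, with a hypothetical 30-home-run full-season pace:

```python
def prorate(full_season_value, games, full_season_games=162):
    """Scale a full-season projection down to a shorter schedule."""
    return full_season_value * games / full_season_games


# Hypothetical 30-home-run pace over 162 games, prorated to 60 games.
hr_in_60_games = prorate(30, games=60)
schedule_ratio = 60 / 162
```

The schedule ratio is about 0.37, slightly more than one third.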

Slide 57

Slide 57 text

Ohtani-san's batting average trend (projected)
He appears set to post a career-high average of close to .300 at age 30?

Slide 58

Slide 58 text

Ohtani-san's projected hits, home runs, and RBI
Rising steadily from age 26, with a career high as a position player at age 30!

Slide 59

Slide 59 text

Ohtani-san's projected home runs
The answer to the quiz was 32.

Slide 60

Slide 60 text

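To give a serious ⚾ opinion...
• The projected stats, including RBI and hits, are fairly believable numbers.
• To get there, though, he will probably need to reach the qualifying number of plate appearances.
  * Shohei Ohtani has never once reached the qualifying number of plate appearances (including his time in Japan)
• His physical measurables (exit velocity and so on) are excellent, so he may well beat the projected numbers.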

Slide 61

Slide 61 text

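What zobrist does not do yet
• Adjusting stats for the home ballpark: the so-called "park factor".
• Adjusting with tracking data, for example by exit velocity.
• Professional baseball leagues other than the major leagues (you can guess which)
There still seem to be several places left to tinker with.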

Slide 62

Slide 62 text

Closing
• Feature engineering and programming
• Machine learning projects and personal development
• A look a little further ahead

Slide 63

Slide 63 text

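Feature engineering and programming
Frankly, anything works, but the language changes with the problem you want to solve.
• For a bit of basic arithmetic, a spreadsheet is fine without any programming
• If you are comfortable with databases, SQL will get you most of the way
• If statistics, mathematical models, or machine learning are involved, use Python or R
Understand what you want to do first, then program!
And of course, languages other than Python and R are perfectly fine too.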

Slide 64

Slide 64 text

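⚾ performance prediction as a "personal development" machine learning project
The real start was in March 2020, but counting the research it began in October of last year.
It was a "personal development" "machine learning project" that took almost a year.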

Slide 65

Slide 65 text

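The difficulty and harshness of machine learning projects
• There was a clear problem statement, "projecting major leaguers' performance", and seeing it through without wavering was the key to success
• I personally had the ⚾ domain knowledge that feature engineering requires
• In real work... I think it is fairly rare for things to go this smoothly. Domain knowledge, the difficulty of framing the problem, many stakeholders, etc.
I cannot guarantee that the content of this talk, taken as is, will pay off in your own job!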

Slide 66

Slide 66 text

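Data scientists, of all people, should do personal development
• I strongly recommend "machine learning as personal development" as a way to sharpen statistics and machine learning skills and to grow engineering skills!
• The work for this talk was mostly completed during the stay-at-home period under the state of emergency. The Stay Home period let me practice "Done is better than perfect".
• By the way, I wrote a while ago about keeping personal development going without giving up: https://shinyorke.hatenablog.com/entry/botti-development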

Slide 67

Slide 67 text

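A look a little further ahead
[Diagram labels: up to today -> the ideal picture]
"I managed to project Shohei Ohtani's home runs!", so the "goma-gyo" (ascetic fire-ritual training) prototyping phase is over.
There is more to come, both for ⚾ and for me at work and personally.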

Slide 68

Slide 68 text

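Turning these results into business and personal value
• Nate Silver, who built PECOTA, shot to fame with his US presidential election forecasts
• shinyorke, who built zobrist, will:
  • Put this output to use in business (work)
  • Continue the research as an individual (a baseball researcher), and may release a product
• So the concept for the next work already exists. Stay tuned ⚾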

Slide 69

Slide 69 text

Game set ⚾
Thank you for listening.
Shinichi Nakagawa (Twitter/Facebook/etc… @shinyorke)