Slide 1
Baseball Player Performance Prediction Using Feature Engineering with Python & R
Shinichi Nakagawa (@shinyorke)
PyCon JP 2020 Online, 8/28
Feature engineering with sports data and predicting baseball player performance: going back and forth between Python and R
Slide 2
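About this talk
• Understand feature engineering properly
• Python, R, and SQL each have strengths and weaknesses, so use each where it fits
• Enjoy ⚾ with feature engineering and machine learning (the important part)
That's what this talk is about.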
Slide 3
Who am I? (Who are you, anyway?)
• Shinichi Nakagawa
• I go by "shinyorke" on most social networks
• Senior Engineer, JX Press Corporation
• Baseball Engineer, Data Scientist (a self-taught, "wild" baseball engineer and data scientist)
• #Python #DataScience #Baseball⚾ #SABRmetrics #DataPlatform #TechnicalAdvisor
Slide 4
[Ad] JX Press Corporation and the Python Mokumoku Self-Study Room
Slide 5
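What is JX Press Corporation?
• A news-tech startup whose mission is to be "a news organization that uses technology to reveal what is happening right now"
• Our breaking-news app "NewsDigest" has passed 3 million downloads! (download count as of 2020/8/28)
• B2B products "FASTALERT" and "JX Press Opinion Polls"
• https://jxpress.net/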
Slide 6
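JX Press Corporation and Python
• We have long used Python for server-side development, machine learning, SRE, and more
• We have sponsored PyCon JP several times: 2016, 2017, 2019, 2020 (new!)
• This year we also have two speakers! (@YAMITZKY, @shinyorke)
• We work hard on our tech blog, please read it: https://tech.jxpress.net/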
Slide 7
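Python Mokumoku Self-Study Room #jisyupy
• A meetup for studying on your own while enjoying "tech and good food".
• Started in 2017 as #rettypy; I have organized it ever since.
• From 2020 the organizing setup changed, and the focus shifted slightly.
• Currently held online at irregular intervals; I'd like to meet in person again eventually.
• Next session: 9/19 https://jisyupy.connpass.com/event/186611/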
Slide 8
＿人人人人人人人＿
＞　Pop ⚾ quiz!!!　＜
￣Y^Y^Y^Y^Y^Y^Y^￣
Slide 9
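Shohei Ohtani: how many home runs will he hit in a season three years from now?
1. 30
2. 32
3. 334 (???)
※ The MLB single-season record is 73.
[For reference] Last year (age 25) he hit 18; the year before (age 24), 22.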
Slide 10
"What does that have to do with anything?", you ask, lol. I built something for exactly this kind of quiz.
※ The answer comes later in the talk!
Slide 11
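Today's games ⚾
• Qualifier: "What is feature engineering?"
• Semifinal: "Machine learning, starting with ⚾: predicting MLB hitters"
• Final: "How many home runs will Shohei Ohtani hit three years from now?"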
Slide 12
What is feature engineering?
The story of processing raw data just enough to make it usable for machine learning or statistics.
I'll also touch briefly on feature engineering for ⚾.
Slide 13
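What is feature engineering?
• A feature is a numeric representation of raw data;
• feature engineering is the process of building the features best suited to the data, model, and task at hand;
• and it is the mixed martial art of engineering, carried through by making full use of languages such as Python, R, and SQL.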
Slide 14
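Feature engineering and ⚾
• Numeric -> numeric
  • Many stats can be used as-is, e.g. hits, walks, strikeouts.
  • Normalize or scale them, as sabermetric metrics such as RC, wRAA, and wOBA do.
• Non-numeric data -> numeric
  • Throwing arm, batting side, etc.
  • Turn data that captures what makes a player distinctive into numbers in some form.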
Slide 15
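[Example] Turning batting side and throwing arm into features
• In SQL (BigQuery here), a single CASE expression does the job.
• Records can have missing values, so don't forget to handle them with ELSE!
• Birthplace could also be a feature, e.g. USA: 1, South America: 2, Japan: 25, and so on.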
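A minimal sketch of the CASE-with-ELSE idea. To stay self-contained it runs on an in-memory SQLite database from Python's standard library rather than BigQuery. The `players` table and the numeric codes (1 = right, 2 = left, 3 = both/switch, 0 = missing or unknown) are hypothetical; the R/L/B letters are loosely modeled on Lahman's `bats` column.

```python
import sqlite3

# Hypothetical player table (codes assumed, loosely following Lahman's bats/throws letters).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (player_id TEXT, bats TEXT, throws TEXT)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("p1", "R", "R"), ("p2", "L", "L"), ("p3", "B", "R"), ("p4", None, "R")],
)

# One CASE expression per column turns a category into a number.
# ELSE catches missing (NULL) or unexpected values so they get an explicit code.
query = """
SELECT
  player_id,
  CASE bats WHEN 'R' THEN 1 WHEN 'L' THEN 2 WHEN 'B' THEN 3 ELSE 0 END AS bats_code,
  CASE throws WHEN 'R' THEN 1 WHEN 'L' THEN 2 ELSE 0 END AS throws_code
FROM players
ORDER BY player_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The CASE/ELSE expression itself is standard SQL, so the same SELECT should carry over to BigQuery with only the client changed.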
Slide 16
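How do you find the best features?
• Rather than "finding" them, you build them yourself to fit the goal.
• Look for models and data that fit the goal of your analysis or machine learning work, build them, and iterate through plenty of trial and error.
• "We have tons of data, let's do AI! DX!! We win!!"
→ This is the classic losing-game pattern. Data you can't turn into features is meaningless!
※ A fictional story (at least, not at my company)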
Slide 17
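Thinking about features in ⚾: DIPS, RC, LWTS
• DIPS: split a hitter's outcomes into those the player controls and those others influence (see the figure below)
• RC: explain run-scoring ability with the model "(on-base ability × advancement ability) / opportunities"
• LWTS: convert each individual play into runs and evaluate the hitter's plays on that scale
• These are statistical-model ways of thinking about baseball, known as "sabermetrics". Details on my blog: https://shinyorke.hatenablog.com/entry/sabr-metrics-batting-stats
[Figure] Outcome categories, what they depend on, and main metrics:
• Player-controlled: depends on individual ability (power, speed, batting eye, etc.); home runs, strikeouts, walks and hit-by-pitches
• Influenced by others: many explanatory variables (team, ballpark, umpires, etc.)
• Player-controlled (defense): errors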
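A small sketch of the RC and LWTS ideas, using Bill James' basic Runs Created formula, RC = (H + BB) × TB / (AB + BB). The stat line and the run values in `WEIGHTS` are hypothetical, chosen only to show the mechanics; real linear weights are estimated from play-by-play data for each season.

```python
def total_bases(h, doubles, triples, hr):
    """Total bases: singles count 1, doubles 2, triples 3, home runs 4."""
    singles = h - doubles - triples - hr
    return singles + 2 * doubles + 3 * triples + 4 * hr


def runs_created_basic(h, bb, ab, tb):
    """Bill James' basic Runs Created: (A * B) / C."""
    on_base = h + bb          # A: on-base factor
    advancement = tb          # B: advancement factor (total bases)
    opportunities = ab + bb   # C: opportunity factor
    return on_base * advancement / opportunities


def linear_weights_runs(counts, weights):
    """LWTS idea: each event is worth a fixed number of runs; sum them up."""
    return sum(weights[event] * n for event, n in counts.items())


# Illustrative run values only, not any official season's weights.
WEIGHTS = {"1B": 0.47, "2B": 0.77, "3B": 1.04, "HR": 1.40, "BB": 0.33, "OUT": -0.27}

# Hypothetical season: 550 AB, 150 H (93 1B, 30 2B, 2 3B, 25 HR), 60 BB, so 400 outs.
tb = total_bases(h=150, doubles=30, triples=2, hr=25)
rc = runs_created_basic(h=150, bb=60, ab=550, tb=tb)
lw = linear_weights_runs({"1B": 93, "2B": 30, "3B": 2, "HR": 25, "BB": 60, "OUT": 400}, WEIGHTS)
print(tb, round(rc, 1), round(lw, 2))
```

RC multiplies on-base and advancement because a hitter needs both to produce runs; LWTS instead adds up per-event values, which makes it easy to credit each play separately.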
Slide 18
Feature engineering with Python, R, and SQL
In short, it's the work of cooking data. Use each language where it fits and things turn out well.
Slide 19
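Feature engineering (data cooking) cheat sheet
• There's no big difference in what Python, R, and SQL can do (each has strengths and weaknesses)
• Choose based on amount of code, what the functions are for, and processing (whether distributed processing is involved)
Reference: https://shinyorke.hatenablog.com/entry/r-to-python
Comparison: Python / R / SQL
• Amount of code (for the same task): Python, average; R, relatively concise for data handling; SQL, longer than Python or R
• Functions (computation): Python, programmable, with OOP-style and other approaches; R, mathematical models can be written directly as formulas; SQL, a Rube Goldberg machine of functions, and painful if used wrongly
• Processing (distributed processing): Python, strong at overall tuning through parallel and distributed processing; R, fundamentally a computation and statistics tool, so processing and distribution are modest; SQL, the approach to parallelism and distribution depends on the DB engine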
Slide 20
There's no big difference in what Python, R, and SQL can do
In the end, choosing the right tool for each job is what matters most! #SoImportantISaidItTwice
Slide 21
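Machine learning, starting with ⚾: predicting MLB hitters
Now that we understand feature engineering for ⚾ and with Python, R, and SQL, here is a case study: I took on predicting MLB (Major League Baseball) hitters.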
Slide 22
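Machine learning, starting with baseball ⚾
1. Planning: research and planning
2. Data Engineering: data acquisition
3. Feature Engineering: feature extraction
4. Clustering: classification by clustering
5. Predict: prediction
There's a book with a similar title, you say? Must be your imagination, just your imagination (whispers)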
Slide 23
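Planning: research and planning
• Survey prior cases and analyses
• Port R sample code to Python by hand
• Turn what I wanted to do into a spec with an Inception Deck
In other words, the project-planning phase.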
Slide 24
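Surveying prior cases and analyses
• The US, a baseball-data superpower, has a huge number of examples, to the point that there are projection sites for fantasy-game players.
• From among them, I researched and adopted the methods that (to me, shinyorke) were convincing and well-founded:
• PECOTA
• Analyzing Baseball Data with R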
Slide 25
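PECOTA: a statistical projection model
• An MLB player-performance projection model released in 2003
• Projects performance from "the past performance of similar players"
The specific methods and formulas are not public (the general idea is described here and there)
• Developed by statistician Nate Silver, who later correctly called the winner in 49 of 50 states in the 2008 US presidential election
※ If you're curious, read his book "The Signal and the Noise"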
Slide 26
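Analyzing Baseball Data with R
• The standard book on baseball data analysis using sabermetrics
• Published in the US, with a second edition released in 2018; in English, of course
• As the title says, it's a book on ⚾ data analysis, and all the code is R
To understand the content, I ported the R code to Python by hand as I read.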
Slide 27
[Example] Linear regression in R and in Python
The R code on the left is the original; the Python on the right is my hand-ported version.
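The slide's code is an image and isn't reproduced here. As a stand-in, a sketch of how an R fit of the form `lm(y ~ x)` can be reproduced in Python with NumPy's least-squares solver. The run-differential and winning-percentage values are made up so that the fit is exact.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x, the same model as R's lm(y ~ x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept column + predictor column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # array([intercept, slope])


# Hypothetical data: run differential vs. winning percentage.
run_diff = [-100, -50, 0, 50, 100]
win_pct = [0.40, 0.45, 0.50, 0.55, 0.60]
intercept, slope = fit_line(run_diff, win_pct)
print(f"intercept={intercept:.3f}, slope={slope:.4f}")  # intercept=0.500, slope=0.0010
```

R's `lm` adds the intercept column automatically; in NumPy it has to be built explicitly, which is most of the porting work for a simple model like this.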
Slide 28
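Result of porting R to Python
• Whether you use R or Python, the results are the same: lesson learned
• Considering the libraries I'll use going forward (and my own proficiency), Python it is
• Still, R helped me really understand the formulas, so thank you, R
By the way, I blogged my work log from that time (done in 2019/11): https://shinyorke.hatenablog.com/entry/r-to-python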
Slide 29
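And so the approach is decided
• The projection model imitates Nate Silver's PECOTA:
1. Find similar players with a nearest-neighbor search algorithm
2. Build projections with a probabilistic(-ish) method
• Adjust the model's results slightly with the "find the peak age" method from Analyzing Baseball Data with R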
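The "find the peak age" idea fits a career trajectory as a quadratic in age and reads the peak off the vertex. A minimal NumPy sketch under that assumption; centering age at 30 is a convenience choice here, and the OPS values are synthetic (a curve built to peak at 28 with an OPS of .850).

```python
import numpy as np


def peak_age(ages, ops, center=30.0):
    """Fit OPS = A + B*(age - center) + C*(age - center)**2; return (peak age, peak OPS)."""
    t = np.asarray(ages, dtype=float) - center
    C, B, A = np.polyfit(t, np.asarray(ops, dtype=float), deg=2)  # highest power first
    if C >= 0:
        raise ValueError("the fitted curve has no maximum")
    # Vertex of the parabola: t* = -B / (2C), value A - B^2 / (4C).
    return center - B / (2 * C), A - B ** 2 / (4 * C)


# Synthetic career curve for illustration only.
ages = list(range(23, 36))
ops = [0.850 - 0.002 * (a - 28) ** 2 for a in ages]
age_at_peak, ops_at_peak = peak_age(ages, ops)
print(round(age_at_peak, 2), round(ops_at_peak, 3))
```

Centering the age variable keeps the coefficients interpretable (A is the OPS at the centering age) and does not change where the peak lands.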
Slide 30
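Putting it into words, and the start of the project
• Summarize the spec with an Inception Deck, so I don't get lost
• Manage the project with Jira, to track progress and keep work notes
• This approach carried me through to the end
The project was named "Zobrist", after the MLB utility player.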
Slide 31
Data Engineering: data acquisition
• Acquire the ⚾ data
• Shape it and load it into Google BigQuery
Data that was in CSV and other formats was unified in BigQuery.
Slide 32
Where's the ⚾ data?
• Lahman's Baseball Database
• Statistics for every MLB player (career totals and more). CSV.
• http://www.seanlahman.com/baseball-archive/statistics/
• https://github.com/chadwickbureau/baseballdatabank
• Retrosheet
• A dataset that records game information plate appearance by plate appearance.
• https://www.retrosheet.org/
• https://github.com/chadwickbureau/retrosheet
I (shinyorke) have relied on this data since PyCon JP 2014, and it's now on GitHub, which is handy!
Slide 33
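Gathering everything into BigQuery
• Preprocess and reshape the Lahman's Database and Retrosheet data until it's usable from SQL, and save it as CSV
• Adopt BigQuery as the DWH and import all the data wholesale
• The importer was implemented quickly with GCP's official BigQuery client
pip install google-cloud-bigquery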
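A minimal sketch of a CSV importer built on the official `google-cloud-bigquery` client; it is not the talk's actual importer. The destination table ID (e.g. `"my-project.lahman.batting"`) is hypothetical, and running it needs the package installed plus GCP credentials. The import is done inside the function so the module can be loaded without the library.

```python
from pathlib import Path


def load_csv_to_bigquery(csv_path, table_id):
    """Load a local CSV file into a BigQuery table and return the table's row count.

    table_id has the form "project.dataset.table" (the names used are up to you).
    Requires `pip install google-cloud-bigquery` and application default credentials.
    """
    from google.cloud import bigquery  # imported lazily

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        autodetect=True,      # let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # replace on re-run
    )
    with Path(csv_path).open("rb") as f:
        job = client.load_table_from_file(f, table_id, job_config=job_config)
    job.result()  # block until the load job finishes (raises on failure)
    return client.get_table(table_id).num_rows
```

Usage would look like `load_csv_to_bigquery("Batting.csv", "my-project.lahman.batting")`. `WRITE_TRUNCATE` makes re-running the import idempotent, which suits a pipeline that re-cleans the CSVs and loads them again.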
Slide 34
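[For reference] What about BigQuery costs?
• Purely from my own experience: at the data volumes of a personal project, you can stay within the free tier.
※ Around 10 GB of data, roughly 100 MB per query
• If you follow the basics, it's efficient and easy even at enterprise scale. GCP's official documentation and many others say so.
• Don't select unnecessary columns; insert data in bulk
• Use partition keys for efficient access
• Monitor costs and send a Slack alert if anything happens
• JX Press Corporation also uses BigQuery heavily: https://tech.jxpress.net/entry/kowakunai-bigquery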
Slide 35
Here's the finished environment: work with the data on BigQuery from JupyterLab or Python scripts.
Nothing fancy; it turned out to be a textbook environment for PyData work.
Slide 36
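Feature Engineering: feature extraction
• Extract and generate candidate features
• Write SQL that the models can use easily
• Prepare for clustering and prediction
I produced features to fit the analysis models and methods chosen for this project.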
Slide 37
[Figure] How the projection model comes together
Feature extraction -> clustering -> prediction: a simple pipeline.
Slide 38
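Extracting and generating candidate features
• All the base datasets live in BigQuery
• Produce the features each task needs by splitting the work between SQL and Python
• Anything that can be done entirely in SQL is created and used as a BigQuery view
• Anything hard to do in SQL: build intermediate data in Python, then load it into BigQuery at the end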
Slide 39
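Extracting and generating features in SQL
• Batting average, on-base percentage, OPS and the like can be computed in SQL.
• So can somewhat more complex metrics, such as wOBA.
• All of the above was done entirely in BigQuery.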
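A sketch of rate stats kept in a SQL view, run here against in-memory SQLite instead of BigQuery so it is self-contained. The table and column names are hypothetical (Lahman's `Batting` table has similar columns but names doubles and triples `2B`/`3B`). wOBA follows the same pattern with season-specific weights, which aren't shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical season table.
conn.execute(
    "CREATE TABLE batting (player_id TEXT, year INTEGER, ab INTEGER, h INTEGER, doubles INTEGER,"
    " triples INTEGER, hr INTEGER, bb INTEGER, hbp INTEGER, sf INTEGER)"
)
conn.execute("INSERT INTO batting VALUES ('x', 2019, 500, 150, 30, 2, 25, 60, 5, 5)")

# Keep the formulas in a view so later queries reuse them, as with a BigQuery view.
# 1.0 * forces float division; NULLIF avoids dividing by zero for players with no at-bats.
conn.execute("""
CREATE VIEW batting_rates AS
SELECT
  player_id,
  year,
  1.0 * h / NULLIF(ab, 0) AS batting_avg,
  1.0 * (h + bb + hbp) / NULLIF(ab + bb + hbp + sf, 0) AS obp,
  1.0 * (h + doubles + 2 * triples + 3 * hr) / NULLIF(ab, 0) AS slg  -- total bases / AB
FROM batting
""")
batting_avg, obp, slg, ops = conn.execute(
    "SELECT batting_avg, obp, slg, obp + slg FROM batting_rates WHERE player_id = 'x'"
).fetchone()
print(round(batting_avg, 3), round(obp, 3), round(slg, 3), round(ops, 3))
```

In BigQuery, `SAFE_DIVIDE` is the usual alternative to the `NULLIF` trick.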
Slide 40
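Doing in Python what SQL can't do well
• Row-level calculations are fine in SQL.
• Complex processing and calculations, or operations on matrix-shaped data, are where Python and R really shine.
• This time I used Pandas and imported the results straight into BigQuery.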
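An example of the matrix-shaped work that is awkward in SQL but short in Pandas: pivot player-seasons into a player × age table, then compute the year-over-year change in OPS. The data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical long-format player-seasons.
seasons = pd.DataFrame({
    "player_id": ["a", "a", "a", "b", "b", "c"],
    "age":       [25, 26, 27, 25, 27, 26],
    "ops":       [0.780, 0.820, 0.850, 0.700, 0.760, 0.910],
})

# One row per player, one column per age; missing seasons become NaN.
ops_by_age = seasons.pivot_table(index="player_id", columns="age", values="ops")

# Change in OPS from each age to the next, per player (NaN where a season is missing).
ops_delta = ops_by_age.diff(axis=1)
print(ops_delta)
```

Before loading a result like this into BigQuery, it would typically be melted back to long format (for example with `DataFrame.melt`), since integer column names don't make good table columns.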
Slide 41
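Working with BigQuery from JupyterLab
• An environment based on JupyterLab, with:
• Pandas
• scikit-learn
• plotly
• Pull query results from the BigQuery client straight into a DataFrame and process them there
• The clustering and everything after it also ran in this environment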
Slide 42
Clustering
• The task: grouping "similar players"
• Choosing an algorithm → ended up with ANN
• Blazing-fast ANN with Annoy
I built the classification task, wrote tests, and made it quick to re-run in CI.
Slide 43
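Finding and grouping similar players
• Projections are derived from "the age-by-age performance of similar players", so finding those similar players is actually the key step
• The approach: compute the Euclidean distance between a given player and every other player, then run the prediction task on the stats of the X closest
• PECOTA works on the same idea, so I copied it as is.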
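An exact, brute-force version of this search in NumPy; it is fine for a few thousand players and is what an ANN library approximates for speed. Standardizing each column is an assumption (the slides don't say how features were scaled), and the players and numbers are made up.

```python
import numpy as np


def most_similar(features, names, target, k=3):
    """Exact k nearest neighbours of `target` by Euclidean distance on standardized features."""
    X = np.asarray(features, dtype=float)
    # Standardize each column (z-scores) so no single stat dominates the distance.
    # Assumes no column is constant (its standard deviation would be zero).
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    i = names.index(target)
    dist = np.linalg.norm(X - X[i], axis=1)
    ranked = [j for j in np.argsort(dist) if j != i]  # nearest first, target excluded
    return [(names[j], float(dist[j])) for j in ranked[:k]]


# Hypothetical players: [home runs, games played]
names = ["X", "A", "B", "C"]
features = [[30, 150], [29, 148], [10, 150], [31, 100]]
print(most_similar(features, names, "X", k=2))
```

Without the scaling step, games played (in the hundreds) would swamp home runs in the distance, so the choice of scaling matters as much as the distance metric.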
Slide 44
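Computing distances with ANN (approximate nearest neighbor search)
• Representative stats such as hits, home runs, etc. The details are secret!
• Less flashy stats such as games played. Also secret!
• Using the above as features, I computed Euclidean distances with ANN and collected the nearest players.
• I had tried this on a separate topic unrelated to hitter projection, and the results were good, so I adopted it as is: https://shinyorke.hatenablog.com/entry/feature-faridyu-san
• The implementation uses a handy library called Annoy.
• Breaking the experiment code while tinkering would be bad, so GitHub Actions runs the tests automatically.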
Slide 45
Computing distances via ANN with Annoy. From training to saving the model, this is all it takes. The data isn't large, so it finished in a few seconds.
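The slide's code is an image, so here is a sketch of the same flow using Annoy's documented API (`AnnoyIndex`, `add_item`, `build`, `save`, `get_nns_by_item`). The function name, its parameters, and the default of 10 trees are assumptions for illustration; the import sits inside the function so the module loads even where Annoy isn't installed.

```python
def nearest_players(vectors, target, k=5, n_trees=10, index_path=None):
    """Approximate k nearest neighbours of item `target` with Annoy (Euclidean metric).

    vectors: list of equal-length feature lists, one per player (row i becomes item i).
    Returns [(item_id, distance), ...] excluding the target itself.
    """
    from annoy import AnnoyIndex  # pip install annoy

    index = AnnoyIndex(len(vectors[0]), "euclidean")
    for i, vec in enumerate(vectors):
        index.add_item(i, vec)
    index.build(n_trees)  # more trees: better recall, larger index
    if index_path is not None:
        index.save(index_path)  # reload later with AnnoyIndex(dim, "euclidean").load(path)
    # Ask for k + 1 because the target is usually its own nearest neighbour.
    ids, dists = index.get_nns_by_item(target, k + 1, include_distances=True)
    return [(i, d) for i, d in zip(ids, dists) if i != target][:k]
```

Annoy's index is built once and then memory-mapped on load, which is why saving it right after training is convenient.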
Slide 46
A quick test result: players close to Matt Chapman, MLB's young star third baseman. It returned strong active third basemen and great third basemen of the past, all of them close comparisons.
Slide 47
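Predict: prediction
• Compute plausible(-looking) stats using probabilities
• Tune for age-related growth (and decline)
• Present it nicely with Streamlit
The goal is reached once numbers that look the part come out.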
Slide 48
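How the projections are made
• "Player X's performance at ages 34–37 will resemble the performance at ages 34–37 of Y, the group of players similar to X": that is the PECOTA idea.
• This project, "zobrist", takes the same approach as PECOTA.
• Using the Euclidean distances from the ANN, it picks the players closest to X and summarizes their stats by age.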
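The slides don't say how the comparables' stats are summarized. One simple possibility, shown here as an assumption rather than zobrist's method, is an inverse-distance weighted average at each age, so that closer comparables count more. The players and distances are hypothetical.

```python
def weighted_projection(comparables, age, stat, eps=1e-6):
    """Inverse-distance weighted mean of `stat` at `age` across similar players.

    comparables: list of (distance, seasons) pairs, where seasons maps age -> {stat: value}.
    Players with no season at `age` are skipped; returns None if nobody qualifies.
    """
    num = den = 0.0
    for dist, seasons in comparables:
        season = seasons.get(age)
        if season is None or stat not in season:
            continue
        weight = 1.0 / (dist + eps)  # closer comparables count more; eps guards dist == 0
        num += weight * season[stat]
        den += weight
    return num / den if den else None


# Hypothetical comparables for player X, with their ANN distances.
comps = [
    (1.0, {30: {"hr": 30}}),
    (3.0, {30: {"hr": 18}}),
    (2.0, {29: {"hr": 20}}),  # no age-30 season, so ignored for age 30
]
print(weighted_projection(comps, 30, "hr"))  # close to (30 * 1 + 18 / 3) / (1 + 1 / 3) = 27
```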
Slide 49
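How did it actually work?
1. Collect the age-25 stats of the players Y (several of them) who are similar to player X.
2. Based on the data from step 1, attach labels along the lines of "strong / average / weak".
3. Run a classification task with step 1 as the training data and step 2 as the labels.
4. Predict using player X's age-25 data.
5. Build the projection from the stats of the players who carry the same label as the one returned.
The classifier is naive Bayes, done quickly with scikit-learn (code omitted).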
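Since the talk omits this code, here is a hedged sketch of steps 1 to 5. The choices are assumptions: `GaussianNB` as the naive Bayes variant, labels from tertiles of age-25 OPS via `pd.qcut`, and a later-age OPS as the projected stat. All data is hypothetical.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB


def project_with_naive_bayes(comps, target, feature_cols, rank_col, future_col):
    """Label similar players, classify player X, and average the matching group's outcome."""
    # Step 2: three labels by tertile of rank_col (here: age-25 OPS).
    labels = pd.qcut(comps[rank_col], q=3, labels=["weak", "average", "strong"]).astype(str)
    # Step 3: train naive Bayes on the comparables' age-25 features (step 1 data).
    model = GaussianNB().fit(comps[feature_cols], labels)
    # Step 4: predict player X's label from X's age-25 features.
    label = model.predict(pd.DataFrame([target])[feature_cols])[0]
    # Step 5: project from the comparables that share that label.
    return label, comps.loc[labels == label, future_col].mean()


# Hypothetical comparables: age-25 stats plus an outcome at a later age.
comps = pd.DataFrame({
    "hr_25":  [5, 6, 7, 15, 16, 17, 30, 31, 32],
    "bb_25":  [20, 22, 21, 40, 42, 41, 70, 72, 71],
    "ops_25": [0.65, 0.66, 0.67, 0.75, 0.76, 0.77, 0.90, 0.91, 0.92],
    "ops_30": [0.64, 0.66, 0.68, 0.74, 0.76, 0.78, 0.88, 0.90, 0.92],
})
label, projected_ops = project_with_naive_bayes(
    comps, {"hr_25": 31, "bb_25": 71}, ["hr_25", "bb_25"], "ops_25", "ops_30"
)
print(label, round(projected_ops, 3))
```

Naive Bayes trains instantly on a handful of comparables, which fits a setup where a fresh small model is fitted for every target player.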
Slide 50
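The final secret ingredient: growth and decline with age
• Young players are likely to improve (though not always)
• Players past 30 are likely to decline (though not always)
• You can't say it for certain, but there seems to be a tendency?
• On that hypothesis, I added the average OPS at each age as a secret ingredient in the probabilities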
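The slide says only that per-age OPS averages were folded into the probabilities. One simple way to express the idea, shown as an assumption rather than how zobrist does it, is a multiplicative age factor: the ratio of the pool's average OPS at the target age to the average at the starting age. The data is made up.

```python
import pandas as pd


def age_factor(history, from_age, to_age):
    """Ratio of average OPS at to_age to average OPS at from_age across a player pool."""
    curve = history.groupby("age")["ops"].mean()  # average OPS at each age
    return curve.loc[to_age] / curve.loc[from_age]


# Hypothetical pool of player-seasons.
history = pd.DataFrame({
    "age": [26, 26, 30, 30, 33, 33],
    "ops": [0.70, 0.80, 0.76, 0.80, 0.66, 0.74],
})
factor = age_factor(history, from_age=26, to_age=33)  # 0.70 / 0.75
adjusted = 0.850 * factor  # scale a hypothetical .850 OPS at age 26 to age 33
print(round(factor, 4), round(adjusted, 4))
```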
Slide 51
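[Example] OPS trajectory of a certain third baseman (and similar players)
As in the earlier ANN example, this is the average OPS of players close to Matt Chapman, MLB's young star third baseman. Can you see the dip at ages 29–30?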
Slide 52
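Rapid demo development with Streamlit
• I built a demo with Streamlit for the presentation. https://www.streamlit.io/
• Mostly, I took what I had written in Jupyter and refactored it a little.
• Pandas DataFrames and plotly charts work fine when copied and pasted from Jupyter!
• For machine learning projects, it also looks useful for presentations and as a first draft for API design.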
Slide 53
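Matt Chapman (3B) projection
Ages 27 and later are projections. Note: 2020 (age 27) assumes a 162-game season (it was actually 60 games).
Doesn't this look like a player who could really exist???
↑ Batting average projection
↓ Hits, home runs, and RBI projections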
Slide 54
Whether the projections come true, we'll find out in 3–5 years, but the numbers look plausible, so I think it's working.
Slide 55
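Speaking of which, there's a major leaguer who represents Japan!
As a pitcher, sadly, he's been set back by the effects of surgery,
but at the plate he's absolutely on fire, isn't he?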
Slide 56
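Of course I looked it up ⚾
• Using last year's (2019) stats as input, I projected his batting from ages 26 to 29.
• Batting average, home runs, and RBI are visualized; the answer to the earlier quiz is in there.
• Note: the age-26 projection (this year) assumes a 162-game season.
※ This year's MLB season is only about 60 games, so the real numbers should be roughly 1/3 of that.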
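The "roughly 1/3" note is simple pro-rating: a counting stat projected over 162 games is scaled by games played. A small sketch; the 30-HR pace is a hypothetical input.

```python
def prorate(count_per_162, games, season_games=162):
    """Scale a counting stat projected over a 162-game season to a shorter season."""
    return count_per_162 * games / season_games


share = 60 / 162                   # ≈ 0.37, a bit more than a third
short_season_hr = prorate(30, 60)  # a hypothetical 30-HR pace over 60 games ≈ 11.1
print(round(share, 3), round(short_season_hr, 1))
```

Only counting stats (hits, home runs, RBI) scale this way; rate stats such as batting average don't change with season length, though they get noisier over fewer games.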
Slide 57
Ohtani San's batting average trajectory (projected)
A career-high average near .300 at age 30?
Slide 58
Ohtani San's projected hits, home runs, and RBI
Rising steadily from age 26, reaching a career high as a player at age 30!
Slide 59
Ohtani San's projected home runs
The quiz answer was 32.
Slide 60
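My serious ⚾ take…
• The projections, including batting average and hits, are reasonably plausible numbers.
• To actually reach them, though, he would probably need to reach the qualifying number of plate appearances.
※ Shohei Ohtani has never once reached the qualifying number of plate appearances (as of today)
• His physical tools (exit velocity and so on) are excellent, so he might even beat the projections?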
Slide 61
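What zobrist doesn't do yet
• Adjusting for the home ballpark, i.e. "park factors".
• Adjusting with tracking data, e.g. exit velocity.
• Professional baseball leagues other than MLB
There seem to be quite a few things left to tinker with.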
Slide 62
Closing
• Feature engineering and programming
• Machine learning projects and personal development
• A look a bit further ahead
Slide 63
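Feature engineering and programming
Honestly, anything works, but the right language depends on the problem you want to solve.
• For a bit of arithmetic, no programming needed: a spreadsheet is fine
• If you're a database person, SQL will probably get you most of the way
• If statistical or mathematical models or machine learning are involved, Python or R
Program with a clear understanding of "what do I actually want to do?"!
And of course, languages other than Python or R are fine too.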
Slide 64
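⚾ projection: a "personal" machine learning project
Full-scale work started in March 2020, but counting the research it began last October,
so this was a "personal" "machine learning project" that took almost a year.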
Slide 65
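The fun and the hardship of machine learning projects
• Having a clear problem, "projecting major leaguers' performance", and seeing it through without wavering was the key to success
• I had the ⚾ domain knowledge needed for feature engineering
• In real work… I think it's quite hard for things to go this smoothly: domain knowledge, the difficulty of framing the problem, lots of stakeholders, etc.
I can't guarantee that this talk will apply directly to your work!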
Slide 66
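Data scientists, of all people, should do personal projects
• To sharpen statistics and machine learning skills and grow engineering skills, I strongly recommend "machine learning as a personal project"!
• I did almost all of this talk's work during the stay-at-home period under the state of emergency.
The Stay Home period let me put "Done is better than perfect" into practice.
• By the way, a while ago I wrote about tips for keeping personal projects going without giving up: https://shinyorke.hatenablog.com/entry/botti-development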
Slide 67
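A look a bit further ahead
[Figure: so far -> ideal]
Now that "I can predict Shohei Ohtani's home runs!", the playing-around and prototyping phase is over.
There's more to come, both for ⚾ and for me, professionally and personally.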
Slide 68
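Turning this work into business and personal value
• Nate Silver, who built PECOTA, became famous overnight for forecasting US presidential elections
• shinyorke, who built zobrist, will:
• apply this output in business (at work)
• and/or keep researching and shipping products as an individual (a baseball researcher)
• So, I already have ideas for the next project. Stay tuned ⚾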
Slide 69
Game set ⚾
Thank you for listening.
Shinichi Nakagawa (Twitter/Facebook/etc… @shinyorke)