Building an Analytics Platform with (Almost) Nothing but Python #kwskrb #51
by
Shinichi Nakagawa
Slide 1
How I Built an Analytics Platform with (Almost) Nothing but Python ~ PyCon JP 2017 Warm-up Edition ~
Shinichi Nakagawa (Retty.Inc Engineer / Baseball Analyst)
kawasaki.rb #051, 5th Anniversary LT Event
Slide 2
Formal attire = baseball uniform!
※ Please share, photos included!
Slide 3
Who am I? (Who are you, anyway?)
• I'm "the baseball guy" of the Python community
• Shinichi Nakagawa (@shinyorke)
• Retty.Inc Engineering Manager, also in charge of fish dishes
• Baseball Scientist / baseball data analyst
• #Python #SABRmetrics #BaseballStatistics #Agile #Scrum
Slide 4
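Starting lineup (today's menu)
• Congratulations on Kawasaki.rb's 5th anniversary
• "My very own baseball analytics platform," built with Python
• Scrapy (data collection)
• sabr + SQLAlchemy (preprocessing)
• Airflow (task control)
• [Case study] Proving with wRAA why the Carp are so ridiculously strong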
Slide 5
Happy 5th anniversary!
Slide 6
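Me and #kwskrb
• 1st time (2014/8): PyCon JP 2014 warm-up LT
• 2nd time (2015/9): PyCon JP 2015 warm-up LT (2 years running)
• 3rd time (2016/8): PyCon JP 2016 warm-up LT (3 years running) ※ an LT at Kawasaki RubyKaigi 01
• 4th time (2016/12): came along purely to drink and hang out
• 5th time (2017/8): PyCon JP 2017 warm-up LT (4 years running) ← you are here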
Slide 7
(Wait, could this actually be) #kwskpy?
Slide 8
So today, once again, it's a Python talk
Slide 9
Techniques for Doing Science on Baseball: Building a Statistics Library and an Analytics Platform with Python
9/8 (Fri) 10:55 a.m.-11:25 a.m.
https://pycon.jp/2017/ja/schedule/presentation/15/
Slide 10
Today I'll show off the analytics platform, plus a little analysis & visualization
Slide 11
My homebrew baseball analytics platform (overview)
Slide 15
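My homebrew baseball analytics platform (overview)
① Scrapy / scraping: find players and save them
② Preprocessing / SABRmetrics: compute metrics, update data
③ Analysis & visualization / Jupyter: wrangle the data and visualize it
Airflow / JOB management: controls execution of scraping and preprocessing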
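The overview splits the work into collection, preprocessing, and analysis stages that share one database. As a minimal sketch of that flow (not the author's code), the example uses the standard library's sqlite3 in place of SQLAlchemy; the table, its columns, the strikeout-rate metric, and all function names are illustrative assumptions.

```python
import sqlite3

# Sketch of three stages sharing one database.
# Assumption: stdlib sqlite3 stands in for SQLAlchemy; table/column names are hypothetical.


def store_scraped(conn, rows):
    """Stage 1 (collection): persist raw batting lines as a scraper might emit them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batting ("
        "player TEXT PRIMARY KEY, pa INTEGER, so INTEGER, so_rate REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO batting (player, pa, so) VALUES (?, ?, ?)", rows
    )


def preprocess(conn):
    """Stage 2 (preprocessing): derive a metric column, strikeout rate = SO / PA."""
    conn.execute("UPDATE batting SET so_rate = CAST(so AS REAL) / pa WHERE pa > 0")


def analyze(conn):
    """Stage 3 (analysis): read the derived data back, highest strikeout rate first."""
    return conn.execute(
        "SELECT player, so_rate FROM batting ORDER BY so_rate DESC"
    ).fetchall()


def run_pipeline(rows):
    """Run all three stages against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        store_scraped(conn, rows)
        preprocess(conn)
        conn.commit()
        return analyze(conn)
    finally:
        conn.close()


# Made-up sample rows: (player, plate appearances, strikeouts).
print(run_pipeline([("Player A", 100, 20), ("Player B", 200, 30)]))
# [('Player A', 0.2), ('Player B', 0.15)]
```

Keeping each stage as its own function mirrors the diagram: a scheduler can run collection and preprocessing separately, while analysis only reads what preprocessing wrote.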
Slide 16
Core technologies (mostly Python)
• Scrapy
• Preprocessing (sabr + SQLAlchemy)
• Airflow
※ Jupyter is skipped, since I assume everyone already knows it
Slide 17
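Scrapy: a crawler framework
• A crawler framework that handles website crawling, scraping, data storage, and more, end to end
• Fair to call it the Django / Ruby on Rails of the crawler world
• What you end up needing (scheduler, User-Agent, HTTP headers, download timing, caching, etc.) comes built in and is easy to configure through settings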
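Scrapy reads options like these from a project's settings.py. Here is a hedged sketch of how the knobs just listed (User-Agent, HTTP headers, download timing, caching) might be set. The setting names are Scrapy's own; the project name, contact URL, and every value are illustrative assumptions, not the author's configuration.

```python
# settings.py: hypothetical Scrapy project settings.
# Setting names are standard Scrapy settings; all values are illustrative.

BOT_NAME = "baseball_stats"  # hypothetical project name

# Identify the crawler and respect robots.txt.
USER_AGENT = "baseball_stats (+https://example.com/contact)"
ROBOTSTXT_OBEY = True

# Download timing: pause between requests and keep load on one site low.
DOWNLOAD_DELAY = 3  # seconds
RANDOMIZE_DOWNLOAD_DELAY = True
CONCURRENT_REQUESTS_PER_DOMAIN = 1
AUTOTHROTTLE_ENABLED = True

# Extra HTTP headers sent with every request.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "ja",
}

# Cache responses on disk so re-running a crawl during development does not refetch pages.
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 60 * 60 * 24  # one day
HTTPCACHE_DIR = "httpcache"
```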
Slide 18
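Preprocessing (sabr + SQLAlchemy)
• [Problem] The same sabermetrics calculations come up again and again, and rewriting the code every time gets old
• So I packaged classes that compute OPS, RC, wOBA, wRAA, the Adam Dunn rate, etc., and published them
• https://github.com/Shinichi-Nakagawa/sabr
• Metrics are computed from the scraped results; the DB layer was built quickly with SQLAlchemy (an O/R mapper)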
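The sabr package wraps formulas like these in classes. Without assuming anything about its actual API, here is a standalone sketch of two of the metrics named on this slide, using textbook definitions: OPS = OBP + SLG, and the Adam Dunn rate taken here as (HR + BB + SO) / PA. `BattingLine` and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BattingLine:
    """Hypothetical season batting line; field names are illustrative."""
    pa: int       # plate appearances
    ab: int       # at-bats
    h: int        # hits of all kinds
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies
    so: int       # strikeouts


def obp(b: BattingLine) -> float:
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denom = b.ab + b.bb + b.hbp + b.sf
    return (b.h + b.bb + b.hbp) / denom if denom else 0.0


def slg(b: BattingLine) -> float:
    """Slugging: total bases / AB, with TB = H + 2B + 2*3B + 3*HR."""
    total_bases = b.h + b.doubles + 2 * b.triples + 3 * b.hr
    return total_bases / b.ab if b.ab else 0.0


def ops(b: BattingLine) -> float:
    """OPS = OBP + SLG."""
    return obp(b) + slg(b)


def adam_dunn_rate(b: BattingLine) -> float:
    """Share of PAs ending in a home run, walk, or strikeout: (HR + BB + SO) / PA."""
    return (b.hr + b.bb + b.so) / b.pa if b.pa else 0.0


line = BattingLine(pa=100, ab=80, h=20, doubles=5, triples=1, hr=4,
                   bb=15, hbp=2, sf=3, so=25)  # made-up numbers
print(round(ops(line), 4), round(adam_dunn_rate(line), 2))  # 0.8575 0.44
```

Guarding each denominator returns 0.0 for an empty line instead of raising, which is convenient when batch-processing scraped rows that include players with no plate appearances yet.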
Slide 19
SABR (Example)
$ pip install sabr
$ python
>>> import sabr
>>> from sabr.stats import Stats
>>> Stats.hr9(26, 209.7)  # Yu Darvish (2013) HR/9
1.1
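HR/9 is home runs allowed scaled to nine innings: 26 × 9 / 209.7 ≈ 1.116, which rounds to the 1.1 shown. A standalone sketch of that arithmetic (not sabr's implementation; function names are hypothetical). Box scores write 209 2/3 innings as "209.2", so the sketch also includes a small converter, assuming the 209.7 in the session is a decimal approximation of 209 2/3.

```python
def innings_from_notation(ip: str) -> float:
    """Convert box-score notation to real innings: "209.2" means 209 and 2/3 innings."""
    whole, _, outs = ip.partition(".")
    extra_outs = int(outs) if outs else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + extra_outs / 3


def hr_per_nine(hr: int, innings: float) -> float:
    """HR/9 = home runs allowed * 9 / innings pitched, rounded to one decimal place."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return round(hr * 9 / innings, 1)


print(hr_per_nine(26, 209.7))                           # 1.1
print(hr_per_nine(26, innings_from_notation("209.2")))  # 1.1
```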
Slide 20
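Airflow: JOB management
• If you crawl and scrape data every day, you need job management, right?
• So I use Airbnb's "Airflow" https://airflow.incubator.apache.org/
• Supposedly your jobs flow along smoothly, like air... but for me it turned out to be "Turbulence" lol ※ What the turbulence was: hear it at the PyCon JP talk itself or at tonight's after-party!
• Setup and operation were a huge pain, so I packaged it as a Docker image (still in development) https://hub.docker.com/r/shinyorke/airflow/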
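Airflow models a workflow as a DAG of tasks and runs each task only after its upstream tasks finish. This is not Airflow's API; it is a stdlib sketch of that ordering idea using `graphlib` (Python 3.9+). The task names are hypothetical, loosely following the scrape-then-preprocess flow of the platform.

```python
from graphlib import TopologicalSorter

# Hypothetical daily job graph: task -> set of upstream tasks it waits for.
DAILY_JOBS = {
    "scrape_batting": set(),
    "scrape_pitching": set(),
    "update_db": {"scrape_batting", "scrape_pitching"},
    "compute_metrics": {"update_db"},
}


def run_order(graph):
    """Return one execution order in which every task follows all of its upstream tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(graph).static_order())


print(run_order(DAILY_JOBS))
```

A real scheduler adds what this sketch leaves out: timed triggers, retries, and running independent tasks (here the two scrapes) in parallel.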
Slide 21
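[Example] Comparing the Hiroshima and Giants lineups
• Compares Hiroshima and the Giants using data as of 2017/8/20
• Evaluates batters with (roughly) qualifying plate appearances by comparing their wRAA
※ wRAA (batting contribution): +10 or more is excellent, negative is... (you get the idea)
• In honor of #kwskrb's 51st meetup, I'll also evaluate the performance of Hiroshima's #51 while I'm at it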
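wRAA converts a batter's wOBA edge over the league into runs: wRAA = (wOBA − league wOBA) / wOBA scale × PA, so 0 means exactly league average. A minimal sketch, with a label function mirroring the slide's rule of thumb (+10 or more is excellent, negative is below average). The league wOBA and scale in the usage line are placeholder values, not 2017 NPB constants.

```python
def wraa(woba: float, league_woba: float, woba_scale: float, pa: int) -> float:
    """Weighted Runs Above Average: batting runs contributed above a league-average hitter."""
    return (woba - league_woba) / woba_scale * pa


def describe_wraa(value: float) -> str:
    """Rough reading of wRAA following the slide's rule of thumb."""
    if value >= 10:
        return "excellent"
    if value > 0:
        return "above average"
    if value < 0:
        return "below average"
    return "average"


# Placeholder inputs: a .400 wOBA hitter over 500 PA, league wOBA .320, wOBA scale 1.25.
value = wraa(0.400, 0.320, 1.25, 500)
print(round(value, 1), describe_wraa(value))  # 32.0 excellent
```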
Slide 22
Hiroshima vs. Giants (qualified batters only)
Hiroshima comes out on top; as for the Giants, well...
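The comparison is limited to qualified batters, and in NPB (as in MLB) the qualifying threshold is 3.1 plate appearances per team game. A hypothetical sketch of that filter plus a per-team wRAA total; the tuple layout and the sample data are invented for illustration, and the talk's "roughly qualifying" cut may have been looser than this strict check.

```python
from collections import defaultdict

QUALIFYING_PA_PER_GAME = 3.1  # qualifying plate appearances per team game


def is_qualified(pa: int, team_games: int) -> bool:
    """True if a batter has at least 3.1 PA per team game played."""
    return pa >= QUALIFYING_PA_PER_GAME * team_games


def team_wraa_totals(batters, team_games):
    """Sum wRAA per team over qualified batters only.

    batters: iterable of (team, player, pa, wraa) tuples (hypothetical layout).
    team_games: dict mapping team -> games played so far.
    Teams with no qualified batter are absent from the result.
    """
    totals = defaultdict(float)
    for team, _player, pa, value in batters:
        if is_qualified(pa, team_games[team]):
            totals[team] += value
    return dict(totals)


# Invented sample: after 100 games, the cutoff is 310 PA.
sample = [
    ("Team A", "a1", 400, 20.0),
    ("Team A", "a2", 300, 5.0),   # below 310 PA, excluded
    ("Team B", "b1", 350, -3.0),
    ("Team B", "b2", 320, 4.0),
]
print(team_wraa_totals(sample, {"Team A": 100, "Team B": 100}))
# {'Team A': 20.0, 'Team B': 1.0}
```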
Slide 23
Hiroshima vs. Giants: wRAA (batting contribution) charted
Left: Hiroshima, right: Giants... the gap is just too big
Slide 24
Suzuki's wOBA (weighted on-base average) and wRAA (batting contribution)
From 8/15 to 8/20 he's in a slump, hitless in 19 at-bats
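wOBA weights each way of reaching base by its average run value: wOBA = (wBB·uBB + wHBP·HBP + w1B·1B + w2B·2B + w3B·3B + wHR·HR) / (AB + BB − IBB + SF + HBP), where uBB = BB − IBB. A sketch of that formula; the weights are placeholders chosen only to show the calculation, since real weights are recomputed for each league and season. The function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Linear weights per event. Placeholder values, not official 2017 NPB weights."""
    ubb: float = 0.69
    hbp: float = 0.72
    single: float = 0.88
    double: float = 1.25
    triple: float = 1.58
    hr: float = 2.03


def woba(ab, bb, ibb, hbp, sf, singles, doubles, triples, hr, w=Weights()):
    """Weighted on-base average for one batting line."""
    numerator = (w.ubb * (bb - ibb) + w.hbp * hbp + w.single * singles
                 + w.double * doubles + w.triple * triples + w.hr * hr)
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator if denominator else 0.0


# Made-up batting line.
print(round(woba(ab=80, bb=15, ibb=0, hbp=2, sf=3,
                 singles=10, doubles=5, triples=1, hr=4), 4))  # 0.3654
```

The frozen dataclass makes the shared default `Weights()` safe to reuse across calls, since it cannot be mutated.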
Slide 25
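Summary
• (That you can do it all in Python is) just as you've seen
• Building collection → preprocessing → visualization end to end in a single language is easy
• Scrapy plus Airflow handles data collection & storage pretty well (though Airflow's dark side runs deep)
• Hiroshima's lineup is brutal, hang in there Giants, Suzuki is amazing
• Anyway, happy 5th anniversary, #kwskrb!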
Slide 26
Game set!!!
Thank you for listening, and see you at PyCon JP 2017!
Shinichi Nakagawa (Twitter/Facebook/hatena: @shinyorke)