Building an analytics platform with (almost) only Python #kwskrb #51
by
Shinichi Nakagawa
Slide 1
Slide 1 text
How I built an analytics platform with (almost) only Python: PyCon JP 2017 rehearsal edition. Shinichi Nakagawa (Retty.Inc Engineer/Baseball Analyst). kawasaki.rb #051, "entering year 5" LT party
Slide 2
Slide 2 text
Formal attire = baseball uniform! Note: please share with photos!
Slide 3
Slide 3 text
Who am I? • I am "the baseball guy" of the Python community • Shinichi Nakagawa (@shinyorke) • Retty.Inc Engineering Manager, also in charge of fish cooking • Baseball Scientist / baseball data analyst • #Python #SABRmetrics #BaseballStatistics #Agile #Scrum
Slide 4
Slide 4 text
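Starting member (today's menu) • Congratulations to Kawasaki.rb on entering year 5 • A homegrown baseball analytics platform built with Python • Scrapy (data collection) • sabr + SQLAlchemy (preprocessing) • Airflow (task control) • [Worked example] Using wRAA to show why the Carp are so strong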
Slide 5
Slide 5 text
Congratulations on entering year 5!
Slide 6
Slide 6 text
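Me and #kwskrb • First visit (2014/8): PyCon JP 2014 rehearsal LT • Second (2015/9): PyCon JP 2015 rehearsal LT (2nd year running) • Third (2016/8): PyCon JP 2016 rehearsal LT (3rd year running). Note: Kawasaki Ruby Kaigi 01 LT • Fourth (2016/12): joined the year-end party because I wanted to drink and chat • Fifth (2017/8): PyCon JP 2017 rehearsal LT (4th year running) ← you are here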
Slide 7
Slide 7 text
(Could it be) #kwskpy?
Slide 8
Slide 8 text
So, today I will talk about Python
Slide 9
Slide 9 text
The technology of doing science on baseball: building a statistics library and an analytics platform with Python. Fri 9/8, 10:55 a.m.-11:25 a.m. https://pycon.jp/2017/ja/schedule/presentation/15/
Slide 10
Slide 10 text
Today I will show the analytics platform, plus a little analysis & visualization
Slide 11
Slide 11 text
Homegrown baseball analytics platform (overview)
Slide 12
Slide 12 text
Homegrown baseball analytics platform (overview) (1) Scrapy, scraping: find players and save them
Slide 13
Slide 13 text
Homegrown baseball analytics platform (overview) (1) Scrapy, scraping: find players and save them (2) Preprocessing, SABRmetrics: compute metrics, update data
Slide 14
Slide 14 text
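Homegrown baseball analytics platform (overview) (1) Scrapy, scraping: find players and save them (2) Preprocessing, SABRmetrics: compute metrics, update data (3) Analysis & visualization, Jupyter: fiddle with the data and visualize it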
Slide 15
Slide 15 text
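Homegrown baseball analytics platform (overview) (1) Scrapy, scraping: find players and save them (2) Preprocessing, SABRmetrics: compute metrics, update data (3) Analysis & visualization, Jupyter: fiddle with the data and visualize it. Airflow, job management: controls execution of scraping and preprocessing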
Slide 16
Slide 16 text
Core technologies (mostly Python) • Scrapy • Preprocessing (sabr + SQLAlchemy) • Airflow. Note: Jupyter is omitted because I assume everyone is already familiar with it
Slide 17
Slide 17 text
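Scrapy: a crawler framework • A crawler framework that handles crawling and scraping websites, saving the data, and so on, end to end • You could call it the Django/Ruby on Rails of the crawler world • Scheduler, User-Agent, HTTP headers, download timing, caching, etc.: the things you end up needing are provided out of the box and can be configured easily through parameters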
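As a rough illustration of the "configure it through parameters" point, a Scrapy project's settings.py might look something like the sketch below. The setting names are standard Scrapy settings; the project name, contact URL, and every value are assumptions made up for illustration and are not taken from the talk.

```python
# settings.py (sketch). All values are illustrative assumptions,
# not the speaker's actual configuration.

BOT_NAME = "baseball_crawler"  # hypothetical project name

# How the crawler identifies itself to the sites it visits
USER_AGENT = "baseball_crawler (+https://example.com/contact)"  # hypothetical

# Extra HTTP headers sent with every request
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "ja",
}

# Download timing: pause between requests and limit parallelism per domain
DOWNLOAD_DELAY = 3.0
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Honor robots.txt rules
ROBOTSTXT_OBEY = True

# Cache responses on disk so repeated runs do not hit the site again
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 60 * 60 * 24  # one day
```

Because these are plain module-level constants, tuning the crawler's politeness or caching is a matter of editing values rather than writing code.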
Slide 18
Slide 18 text
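Preprocessing (sabr + SQLAlchemy) • [Problem] Sabermetrics calculations come up again and again, and writing the code every time is a pain • So I packaged up classes that compute OPS, RC, wOBA, wRAA, Adam Dunn, etc. and published them • https://github.com/Shinichi-Nakagawa/sabr • Metrics are now computed from the scraped results; DB access was developed quickly with SQLAlchemy (O/R mapper)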
Slide 19
Slide 19 text
SABR (Example)
    $ pip install sabr
    $ python
    >>> import sabr
    >>> from sabr.stats import Stats
    >>> Stats.hr9(26, 209.7)  # Yu Darvish (2013) HR/9
    1.1
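To make the arithmetic behind such metrics concrete, here is a minimal standalone sketch of two of them using the standard textbook formulas. The function names and signatures are my own assumptions for illustration and are not the sabr package's API; the batting line passed to `ops` is invented.

```python
def hr9(home_runs: int, innings: float) -> float:
    """Home runs allowed per nine innings, rounded to one decimal place."""
    return round(home_runs * 9 / innings, 1)


def ops(hits: int, walks: int, hbp: int, at_bats: int,
        sac_flies: int, total_bases: int) -> float:
    """On-base percentage plus slugging percentage, rounded to three decimals."""
    obp = (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)
    slg = total_bases / at_bats
    return round(obp + slg, 3)


# Same inputs as the slide's example: 26 home runs over 209.7 innings
darvish_hr9 = hr9(26, 209.7)

# Hypothetical batting line, invented purely to exercise the formula
sample_ops = ops(hits=150, walks=50, hbp=5, at_bats=500,
                 sac_flies=5, total_bases=250)
```

With the slide's inputs, `hr9` gives 1.1, matching the output shown above.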
Slide 20
Slide 20 text
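Airflow: job management • Crawling & scraping data every day means you need job management, right? • So I use "Airflow", made by Airbnb https://airflow.incubator.apache.org/ • Apparently it lets your jobs flow like an air current... but for me it was "turbulence" lol. Note: for what the turbulence was, see the PyCon JP talk itself or ask at today's after-party! • Setup and operation are a huge pain, so I turned it into a Docker image (still in development) https://hub.docker.com/r/shinyorke/airflow/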
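The core idea of job management here is running scraping before preprocessing, every time. The sketch below illustrates only that dependency ordering, using the standard library's graphlib (Python 3.9+) as a stand-in. It is not Airflow and does not use Airflow's API; the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it may start.
DEPENDENCIES = {
    "scrape": set(),
    "preprocess": {"scrape"},
}


def run_pipeline(tasks, dependencies):
    """Run every task once, in an order that respects the dependencies."""
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        executed.append(name)
    return executed


log = []
tasks = {
    # Placeholder callables standing in for the real scraping and preprocessing jobs
    "scrape": lambda: log.append("scraped"),
    "preprocess": lambda: log.append("preprocessed"),
}
executed = run_pipeline(tasks, DEPENDENCIES)
```

A real scheduler adds what this sketch leaves out, such as daily scheduling, retries, and run history.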
Slide 21
Slide 21 text
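[Example] Comparing the Hiroshima lineup with the Giants lineup • Compare Hiroshima and the Giants using data as of 2017/8/20 • Compare and evaluate the wRAA of batters who have roughly reached the qualifying number of plate appearances. Note: wRAA (batting contribution): +10 or more is excellent, negative is (you get the idea) • In honor of this being #kwskrb #51, Hiroshima's #51 also gets a performance evaluation while we are at it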
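For readers unfamiliar with wRAA, a minimal sketch of the commonly used formula follows: wRAA = (wOBA - league wOBA) / wOBA scale × plate appearances. The league constants and both batters are invented for illustration; they are not the 2017 values or real players, and the function is not the sabr package's API.

```python
def wraa(woba: float, league_woba: float, woba_scale: float,
         plate_appearances: int) -> float:
    """Weighted runs above average: runs added relative to a league-average batter."""
    return (woba - league_woba) / woba_scale * plate_appearances


# Hypothetical league constants (assumptions, not real 2017 figures)
LEAGUE_WOBA = 0.320
WOBA_SCALE = 1.25

# Two invented batters with 500 plate appearances each
strong_batter = round(wraa(0.400, LEAGUE_WOBA, WOBA_SCALE, 500), 1)
weak_batter = round(wraa(0.300, LEAGUE_WOBA, WOBA_SCALE, 500), 1)
```

Under these made-up inputs the first batter comes out at 32.0 runs above average, well past the slide's "+10 or more is excellent" threshold, and the second at -8.0.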
Slide 22
Slide 22 text
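Hiroshima vs. Giants (batters at or above the qualifying plate appearances). Hiroshima's numbers are flashy; as for the Giants...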
Slide 23
Slide 23 text
Hiroshima vs. Giants, wRAA (batting contribution) as a graph. Left: Hiroshima, right: Giants... the gap is way too wide
Slide 24
Slide 24 text
Suzuki's wOBA (weighted on-base average) and wRAA (batting contribution). 8/15 through 8/20; the dip is the hitless 4-at-bat game on the 19th
Slide 25
Slide 25 text
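Summary • (That Python can do it all) is as you have just seen • It is convenient to build collection → preprocessing → visualization end to end in a single language • The Scrapy + Airflow combination works quite well for collecting & storing data (though Airflow's darkness runs deep) • Hiroshima's lineup is brutal; hang in there, Giants; Suzuki is amazing • In any case, congratulations to #kwskrb on entering year 5!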
Slide 26
Slide 26 text
Game set!!! Thank you for listening & see you at PyCon JP 2017! Shinichi Nakagawa (Twitter/Facebook/hatena: @shinyorke)