The Technology of Doing Baseball Science: Python, Statistics Libraries, and an Analytics Platform #pyconjp

Shinichi Nakagawa
September 08, 2017

Slides for my PyConJP 2017 talk

https://pycon.jp/2017/ja/schedule/presentation/15/

#Python #野球統計学 (baseball statistics) #セイバーメトリクス (sabermetrics) #Airflow #Scrapy


Transcript

  1. 3.

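    Who am I? (self-introduction)
    • I'm "the baseball guy" (as far as the Python community is concerned)
    • Shinichi Nakagawa (@shinyorke)
    • Retty.Inc Engineer / in charge of fish dishes
    • Baseball Scientist (personal project)
    • Python mokumoku self-study room #rettypy
    • #Python #SABRmetrics #野球統計学 #Agile #Scrum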
  2. 4.

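    Starting member (today's menu)
    • I built "bradford", my homegrown baseball analytics platform, with Python and friends
    • How the analytics platform works (Scrapy, Airflow, statistics libraries, etc.)
    • The technology of doing baseball science
    • Baseball science basics (Linear Weights, run expectancy, run value)
    • Following up on "the team whose cleanup hitter disappeared"
    • Summary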
  3. 6.

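    A history of PyConJP and baseball
    • PyConJP 2014 "Getting Started with Baseball Programming in Python"
      Infrastructure as Code and a web application (Django), batting average on balls in play (BABIP)
    • PyConJP 2015 "Baseball Hack! Data Analysis and Visualization with Python"
      PyData and a homegrown analytics platform (Jupyter, matplotlib, Docker), Adam Dunn rate
    • PyConJP 2016 "Getting Started with Baseball Statistical Analysis Using Big Data and Python"
      Scraping and packaging (Beautifulsoup4), WHIP, pitch analysis
    • PyConJP 2017 "The Technology of Doing Baseball Science" ← New! (4th appearance in 4 consecutive years)
      Homegrown analytics platform (Scrapy, Airflow, Docker, sabr, Redash), Linear Weights, wRAA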
  4. 18.

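    My homegrown baseball analytics platform (overview)
    • ① Scrapy (scraping): find and save player stats
    • Baseball data is collected here
    • ② Preprocessing (SABRmetrics): compute metric values, update the data
    • ③ Airflow (job management): run Scrapy (①) and preprocessing (②) on a schedule
    • ④ Analysis and visualization (Redash): view established metrics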
  5. 19.

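    My homegrown baseball analytics platform (overview)
    • ① Scrapy (scraping): find and save player stats
    • Baseball data is collected here
    • ② Preprocessing (SABRmetrics): compute metric values, update the data
    • ③ Airflow (job management): run Scrapy (①) and preprocessing (②) on a schedule
    • ④ Analysis and visualization (Redash): view established metrics
    • ⑤ Analysis and visualization (Jupyter): pick a hypothesis and run exploratory analysis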
  6. 21.

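    Scrapy: a crawler framework
    • A crawler framework that handles crawling websites, scraping, and saving the data end to end
    • Fair to call it the Django/Ruby on Rails of the crawler world
    • Scheduler, User-Agent, HTTP headers, download timing, caching, etc.: what you need is provided out of the box and is easy to configure through settings parameters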
  7. 26.

    Scrapy settings for the baseball analytics platform (excerpt)
    • Obey robots.txt
      • ROBOTSTXT_OBEY=True
    • Concurrent requests: 4
      • CONCURRENT_REQUESTS=4
    • Request interval: 60 seconds
      • DOWNLOAD_DELAY=60
    • Cache: kept for half a day (so that runs can be retried)
      • HTTPCACHE_ENABLED = True
      • HTTPCACHE_EXPIRATION_SECS = 60 * 60 * 12
    • Reference: https://doc.scrapy.org/en/latest/topics/settings.html?highlight=settings.py#settings
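As a sketch, the settings on this slide could be collected in a Scrapy project's `settings.py` as follows. The file layout and comments are assumptions; only the five setting names and their values come from the slide.

```python
# settings.py (sketch): names and values as listed on the slide.

# Respect each site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Upper bound on requests in flight across all domains.
CONCURRENT_REQUESTS = 4

# Seconds to wait between consecutive requests to the same site.
DOWNLOAD_DELAY = 60

# Cache responses so a failed run can be retried without refetching.
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 60 * 60 * 12  # half a day, in seconds
```

Note that with Scrapy's default `RANDOMIZE_DOWNLOAD_DELAY = True`, the actual wait varies between 0.5 and 1.5 times `DOWNLOAD_DELAY`, so 60 seconds is an average rather than a fixed interval.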
  8. 30.

    SABR (Example)
        $ pip install sabr
        $ python
        >>> import sabr
        >>> from sabr.stats import Stats
        >>> Stats.hr9(26, 209.7)  # Yu Darvish (2013) HR/9
        1.1
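HR/9 is home runs allowed per nine innings pitched. The slide does not show how `sabr` computes it, so the standalone function below is an assumption about the calculation, not the library's code; it does reproduce the value on the slide.

```python
def hr9(hr, ip):
    """Home runs allowed per nine innings, rounded to one decimal place.

    hr: home runs allowed, ip: innings pitched (as passed on the slide).
    """
    return round(hr * 9 / ip, 1)


# Same inputs as the slide: 26 home runs over 209.7 innings.
print(hr9(26, 209.7))  # 1.1
```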
  9. 36.

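    Escaping the pain of Airflow
    • Published it as a Docker image (docker pull shinyorke/airflow)
      If you are going to build it and tear it down over and over, Docker is the obvious choice!
      So, while trying various things, I turned a stable setup into a Docker image
      https://hub.docker.com/r/shinyorke/airflow/

    • As a result, I have an environment I can casually build, tear down, and reconfigure
      The turbulence in the airflow has eased somewhat (though a few things still look like bugs)
    • Details are on my blog, and contributions are welcome
      http://shinyorke.hatenablog.com/entry/airflow-docker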
  10. 42.

    The technology of doing baseball science
    • Linear Weights (LWTS)
    • Run expectancy
    • Run value
    • Sabermetrics metrics based on run value
    • wOBA (Weighted On-Base Average)
    • wRAA (Weighted Runs Above Average, a measure of batting contribution)
  11. 46.

    wOBA formula and implementation (from sabr)
        def woba_npb(cls, bb, hbp, _1b, _2b, _3b, hr, ab, sf, ibb=0, e_bb=0):
            """
            Weighted on-base average for NPB(wOBA)
            http://1point02.jp/
            :param bb: base on ball
            :param hbp: hit by pitch
            :param _1b: single
            :param _2b: double
            :param _3b: triple
            :param hr: home run
            :param ab: at bat
            :param sf: sacrifice fly
            :param ibb: intentional base on balls(default:0)
            :param e_bb: base on ball for error(default:0)
            :return: (float) wOBA
            """
            u_bb = round(0.692 * float(bb-ibb), 3)
            u_hbp = round(float(0.73 * hbp), 3)
            u_e_bb = round(0.966 * float(e_bb), 3)
            u_h = round(0.865 * float(_1b), 3) + round(1.334 * float(_2b), 3)\
                + round(1.725 * (_3b), 3) + round(2.065 * float(hr), 3)
            u_pa = round(float(ab + bb - ibb + hbp + sf), 3)
            return round((u_bb + u_hbp + u_e_bb + u_h) / u_pa, 3)
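Read off the implementation on this slide, the formula can be written as below. The coefficients are the ones hard-coded in `woba_npb`; the symbol $E$ is my shorthand for the `e_bb` argument, which the docstring describes as a base awarded on an error.

```latex
\mathrm{wOBA} =
\frac{0.692\,(BB - IBB) + 0.73\,HBP + 0.966\,E
      + 0.865 \cdot 1B + 1.334 \cdot 2B + 1.725 \cdot 3B + 2.065\,HR}
     {AB + BB - IBB + HBP + SF}
```

The code also rounds each term to three decimal places before summing, which the formula omits.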
  12. 47.

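    What is wRAA?
    • A metric that scales wOBA to runs
    • Short for Weighted Runs Above Average (a batter's batting contribution)
    • Think of it as "the runs a batter added compared with an average batter given the same number of plate appearances"
    • wRAA = (batter's wOBA - league wOBA) / wOBAscale * plate appearances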
  13. 48.

    wRAA formula and implementation (from sabr)
        def wraa(cls, woba, lg_woba, pa, woba_scale=1.24):
            """
            Weighted Runs Above Average(wRAA)
            http://1point02.jp/
            :param woba: weighted on-base average
            :param lg_woba: weighted on-base average(league average)
            :param pa: plate appearance
            :param woba_scale: weighted on-base average scale(default:1.24)
            :return: (float) wRAA
            """
            return round(((woba - lg_woba) / woba_scale) * float(pa), 1)
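To get a feel for the scale of wRAA, here is a standalone version of the same calculation applied to a hypothetical batter. The function mirrors the slide's formula without the class wrapper; the stat line (.371 wOBA, .320 league wOBA, 555 plate appearances) is invented for illustration and is not data from the talk.

```python
def wraa(woba, lg_woba, pa, woba_scale=1.24):
    """Weighted Runs Above Average, following the formula on the slide."""
    return round(((woba - lg_woba) / woba_scale) * float(pa), 1)


# Hypothetical batter: wOBA .371 over 555 plate appearances, league wOBA .320.
print(wraa(0.371, 0.320, 555))  # 22.8

# A batter exactly at the league average adds zero runs above average.
print(wraa(0.320, 0.320, 555))  # 0.0
```

A below-average wOBA yields a negative wRAA, which is how a hitter can end up "in the minus" over a stretch of games.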
  14. 66.

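    Conclusion
    • In total, the team's wRAA turned out to have dropped by about 30%
    • Negative factors
    • The cleanup hitter disappeared #そらそうよ (well, obviously)
    • Noma is steadily posting a negative wRAA #そらそうよ (well, obviously)
    • For this visualization, Redash was the easier tool to work with
    • I want to get good at Holoviews, more so than Jupyter itself #お気持ち (just a wish)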
  15. 68.

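    How Hiroshima is actually doing... (as of 9/7)
    • Record for 8/23-9/7
    • 9 wins, 5 losses (currently on a 6-game winning streak)
    • Run differential +20 (70 runs scored, 50 runs allowed)
    • Winning percentage .642, Pythagorean winning percentage .662
    • It looks like some batters have offset the 30% drop in run production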
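The two percentages on this slide can be checked with a few lines of Python. The slide does not say which form of the Pythagorean formula was used; the sketch assumes the classic version with exponent 2, which agrees with the .662 shown.

```python
def win_pct(wins, losses):
    """Plain winning percentage."""
    return wins / (wins + losses)


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2):
    """Pythagorean expectation; exponent=2 is an assumption (classic form)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


print(f"{win_pct(9, 5):.3f}")                # 0.643
print(f"{pythagorean_win_pct(70, 50):.3f}")  # 0.662
```

The actual winning percentage rounds to .643; the slide's .642 appears to be truncated rather than rounded.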
  16. 71.