Long trips are tiring, but I love baseball and Python / PyLadiesTokyo-5-years-LT

#SABRmetrics #Baseball #Python #GIS

https://pyladies-tokyo.connpass.com/event/145046/

Shinichi Nakagawa

October 19, 2019

Transcript

  1. Long trips are tiring ✈ So what about ⚾? Shinichi Nakagawa a.k.a. @shinyorke, PyLadies Tokyo 5th Anniversary Party

  2. Congratulations on 5 years of #PyLadiesTokyo! I'm happy to be celebrating with you again this year (5 years running, 5th time).

  3. Today's talk
    • Travel and Major League Baseball (MLB)
    • Handling GIS (location data) with GeoPy
    • Travel distance and team performance

  4. ＿人人人人人人人人人人＿ ＞ Sudden baseball quiz ＜ ￣Y^Y^Y^Y^Y^Y^Y^Y^Y￣

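  5. Question (answer by gut feeling): How many miles did the MLB team with the longest total travel distance cover?
    * 2018 season, totaled over 162 games, in principle within the North American continent
    1. Under 30,000 miles
    2. 30,000 miles or more
    3. Exactly 33,400 miles (???)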

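  6. Question (answer by gut feeling): How many miles did the MLB team with the longest total travel distance cover?
    * 2018 season, totaled over 162 games, in principle within the North American continent
    1. Under 30,000 miles
    2. 30,000 miles or more
    3. Exactly 33,400 miles (???)
    ???: "Why?! Hanshin has nothing to do with this!!"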

  7. Before the answer… how to get location data and compute distances in Python.

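  8. It's quite simple.
    1. Geocode to get each ballpark's location. Concretely, get latitude and longitude from the ballpark name.
    2. Using the data from step 1, compute the distances between ballparks.
    3. JOIN the result of step 2 with the schedule, aggregate per team, and write out a CSV.
    Today covers how to do steps 1 and 2 nicely in Python.
    * Step 3 is grubby Pandas work, so I won't explain it this time.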
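
Step 3 is not shown in the talk. As a rough, stdlib-only sketch of the same join-and-aggregate idea (the author used Pandas; this is not their code), the example below sums the miles of each consecutive park-to-park move per team. The park IDs, team names, distances, and the helper names `lookup_miles` and `total_miles_by_team` are all hypothetical, and the CSV output step is omitted.

```python
from collections import defaultdict

# Hypothetical distance table: miles between park pairs (made-up numbers).
distances = {
    ("PARK_A", "PARK_B"): 350.0,
    ("PARK_B", "PARK_C"): 1200.0,
    ("PARK_C", "PARK_A"): 900.0,
}

# Hypothetical schedule: for each team, the parks it plays at, in date order.
schedule = {
    "TEAM_X": ["PARK_A", "PARK_B", "PARK_C", "PARK_A"],
    "TEAM_Y": ["PARK_B", "PARK_B", "PARK_C"],
}


def lookup_miles(origin: str, destination: str) -> float:
    """Return the miles between two parks; staying at the same park costs 0."""
    if origin == destination:
        return 0.0
    # The table stores each pair once, so try both orderings.
    if (origin, destination) in distances:
        return distances[(origin, destination)]
    return distances[(destination, origin)]


def total_miles_by_team(schedule: dict) -> dict:
    """Sum the miles of every consecutive park-to-park move for each team."""
    totals = defaultdict(float)
    for team, parks in schedule.items():
        for origin, destination in zip(parks, parks[1:]):
            totals[team] += lookup_miles(origin, destination)
    return dict(totals)


totals = total_miles_by_team(schedule)
```

With real data, `distances` would come from the park-to-park distance table and `schedule` from the season's game log.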

  9. Mastering GeoPy to get both locations and distances

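  10. GeoPy
    • The go-to library for geocoding in Python
    • Lets you use the APIs of multiple internet map services (Google, Azure, OSM, etc…) from Python with much the same code
    • The official documentation is detailed, so imitating it gets you most of the way
    • https://geopy.readthedocs.io/en/stable/#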
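
  11. Geocoding from ballpark names with GeoPy
    • MLB's Sean Lahman Database includes ballpark data (and it is open data)
    • It has each ballpark's name and city, so you can simply geocode from those
    • About 70% of the parks worked this way; the rest was manual work (ry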
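
  12. [Rough idea] Geocoding with GeoPy

        import csv
        import time
        from geopy.geocoders import Nominatim  # choose the geocoder (which service to use)
        from geopy.exc import GeocoderTimedOut
        from retry import retry

        # This time we use the OSM-based geocoder
        geoLocator = Nominatim(user_agent='Baseball Radar24 / 0.1 shinyorke@example.com')

        # The geocoding itself; retry on timeout as seems appropriate
        @retry((GeocoderTimedOut, ), delay=5, backoff=2, max_delay=4)
        def get_location(name, alias):
            loc = geoLocator.geocode(name)
            if not loc:
                # fall back to the park's alias when the name is not found
                loc = geoLocator.geocode(alias)
            return loc

        # Light cleansing so the ballpark name can be geocoded.
        # Caution: replace('I', '') strips every capital "I", not just numeral suffixes,
        # so the later replace() calls never match.
        def park_name(name):
            return name.replace('I', '').replace('II', '').replace('III', '').replace('IV', '').strip()

    Geocoding via Nominatim, an API over OSM data. Ballpark names get a light cleansing so the geocoders will (hopefully?) take a liking to them.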
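
The cleansing step appears intended to drop Roman-numeral suffixes from park names. A minimal sketch of one way to do only that, assuming the numerals always appear as a trailing " I" to " IV"; the helper name `strip_roman_suffix` and the sample names in the usage note are illustrative, not from the talk:

```python
import re

# A trailing Roman numeral I-IV preceded by whitespace, e.g. "Some Park II".
_ROMAN_SUFFIX = re.compile(r"\s+(?:IV|III|II|I)$")


def strip_roman_suffix(name: str) -> str:
    """Drop a trailing ' I' .. ' IV' and leave every other capital I untouched."""
    return _ROMAN_SUFFIX.sub("", name.strip())
```

For example, `strip_roman_suffix("Example Park II")` gives `"Example Park"`, while a name such as `"Island Field"` is returned unchanged.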

  13. [Rough idea] Geocoding with GeoPy

        # Execution starts here
        # Ballpark list
        values = []
        with open('./datasets/baseballdatabank/Parks.csv', 'r', newline='') as f:
            reader = csv.DictReader(f)
            for r in reader:
                values.append(r)

        # Geocode every park, one after another
        locations = []
        for park in values:
            loc = get_location(park_name(park['park.name']), park_name(park['park.alias']))
            if loc:
                locations.append(
                    {
                        'id': park['park.key'],
                        'name': park['park.name'],
                        'lat': loc.latitude,
                        'lng': loc.longitude,
                        'address': loc.address,
                        'state': park['state'],
                        'country': park['country']
                    }
                )
            else:
                print('geo not found: ', park['park.name'], park['park.key'])

        # Write to CSV
        fields = ['id', 'name', 'lat', 'lng', 'address', 'state', 'country']
        with open('./datasets/parklist.csv', 'w', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()
            for loc in locations:
                writer.writerow(loc)

    Read the CSV and geocode every row (with the functions from the previous slide). This part is a fairly ordinary script, so it may be intuitive.

  14. [Rough idea] Distance between two points with GeoPy

        # Compute distances
        from geopy.distance import great_circle, geodesic

        # Excerpt of a method (hence `self`)
        def park2park_distance_datasets(self, park_datasets: dict) -> list:
            values = []
            for id1, park1 in park_datasets.items():
                for id2, park2 in park_datasets.items():
                    if id1 == id2:
                        continue
                    park1_geo = (park1.get('lat'), park1.get('lng'))
                    park2_geo = (park2.get('lat'), park2.get('lng'))
                    # geodesic is the geodesic distance, great_circle the great-circle distance
                    values.append(
                        {
                            'id': f"{id1}_{id2}",
                            # geopy's Distance exposes miles as `.miles` (or `.mi`)
                            'miles_geo': geodesic(park1_geo, park2_geo).miles,
                            'miles_circle': great_circle(park1_geo, park2_geo).miles
                        }
                    )
            return values

    Use the functions in geopy.distance; there are several methods, such as geodesic and great-circle distance. The arguments are tuples holding the lat/lng of the two points to measure between.
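
To give a feel for what a great-circle distance is, here is a small haversine sketch on a spherical Earth. It is an illustration only, not geopy's implementation; the function name `haversine_miles` and the radius constant are assumptions made for this example.

```python
import math

# Approximate mean Earth radius in miles (assumed value for this sketch).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(point1, point2):
    """Great-circle distance in miles between two (lat, lng) tuples given in degrees."""
    lat1, lng1 = map(math.radians, point1)
    lat2, lng2 = map(math.radians, point2)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    # Haversine formula: a is the squared half-chord length between the points.
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

Like the geopy functions above, it takes two `(lat, lng)` tuples. A geodesic distance on an ellipsoid will differ slightly from this spherical result.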
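
  15. Question (repeated; it's really a two-way choice): How many miles did the MLB team with the longest total travel distance cover?
    * 2018 season, totaled over 162 games, in principle within the North American continent
    1. Under 30,000 miles
    2. 30,000 miles or more
    3. Exactly 33,400 miles (???)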

  16. [Answer] 2. "30,000 miles or more". The 1st-place team traveled 40,000 miles; the 30th-place team a little over 20,000 miles.

  17. By the way, the top 5 teams: 5 out of 5 are in the West division of their league. Houston plays a lot of games against far, far away Seattle and Oakland.

  18. ???: "Aren't long trips tiring?" Major League teams each travel on their own dedicated plane (a charter). Even so, isn't traveling 40,000+ miles a year rough??

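  19. How to verify
    • Build a matrix of annual travel distance against the main metrics
      • Winning percentage
      • Run differential
      • Expected winning percentage (Pythagorean winning percentage) * derives a winning percentage from runs scored and allowed
    • If anything looks related, that's lucky
    • So what actually happened…!?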
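
For reference, the classic form of the Pythagorean winning percentage is runs scored squared over the sum of runs scored squared and runs allowed squared. A minimal sketch follows; the function name and the default exponent of 2 are choices made for this example, and the talk does not say which exponent was used.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)
```

A team that scores exactly as many runs as it allows comes out at 0.5; a hypothetical team with 800 runs scored and 600 allowed comes out at 0.64.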
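
  20. [Figure] Matrix of travel distance and each metric. The metrics are winning percentage and Pythagorean winning percentage; if distance mattered, there should be a correlation.
    * Pythagorean winning percentage: an expected-winning-percentage model based on run differential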
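
If one did want a number for "is there a correlation", Pearson's correlation coefficient is the usual choice. The sketch below is a plain-Python version; the function name `pearson_r` is an assumption, and the figures in the usage note are made up for illustration, not results from the talk.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

With made-up inputs such as `pearson_r([20000, 30000, 40000], [0.4, 0.5, 0.6])` the result is close to 1 (perfectly linear), while values near 0 would indicate no linear relationship between travel distance and the metric.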

  21. ???: "The thing that's off is your travel distance." The result didn't even call for computing a correlation coefficient (shudder).

  22. Distance has nothing to do with a team being weak or struggling. Well, I did kind of have a feeling it would turn out that way ()

  23. By the way, visualizing by region: places where lots of home runs are hit, doubles (ditto). Feed a CSV to Kepler.gl and you get a cool visualization!

  24. Oops, I forgot to introduce myself :ukkari:

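  25. Who am I? (Who are you, anyway?)
    • Shinichi Nakagawa (@shinyorke)
    • Someone who used to work as a baseball engineer
    • Until last month: a "pro" baseball engineer
    • From this month: (back to being) a "wild" baseball engineer
    • Organizer of Pythonもくもく自習室 (a Python self-study meetup, #rettypy)
    • Someone who does Web, data science, Ops, and also ⚾ in Python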
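
  26. JX通信社 (I've changed jobs and joined them)
    • Since this month, Senior Engineer at JX通信社
    • Building a data platform from scratch (plus various Python-related things, recruiting PR, and so on)
    • The story behind the job change, and some musings, are on my blog
    https://shinyorke.hatenablog.com/entry/it-really-could-happen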

  27. What is JX通信社? If you're curious, let's talk later!
    Corp: https://jxpress.net/ Twitter: @jxpress_corp

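  28. #WeAreHiring
    • Server-side, front-end, machine learning. Details: https://jobs.jxpress.net/
    • To put it mildly, it's extremely Python (just my humble impression)
    • You get to take on lots of things like Serverless and Big Data
    • Books, IDEs, and study-meetup fees are covered by the company; #PyConJP sponsor, and more
    • If you're interested, please come and talk to me!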

  29. Well then, have a good trip ✈ Wishing PyLadies Tokyo ever more success! Shinichi Nakagawa (Twitter/Facebook/etc… @shinyorke)

  30. [Appendix] Things I used
    • Data analysis
      • Jupyter notebook / Jupyter Lab https://jupyter.org/
      • Pandas https://pandas.pydata.org/
      • Plotly https://plot.ly/python/
    • GIS
      • GeoPy (Geocoding) https://geopy.readthedocs.io/en/stable/
      • Folium (maps inside Jupyter notebook) https://python-visualization.github.io/folium/
      • Kepler.gl (Visualization) https://kepler.gl/
    • ⚾ Baseball * all of it is MLB
      • Baseball Databank https://github.com/chadwickbureau/baseballdatabank
      • Retrosheet https://github.com/chadwickbureau/retrosheet
      • Analyzing Baseball Data with R (book, in English) https://www.amazon.co.jp/dp/B07KRNP2BB