Long trips are tiring, but I love baseball and Python (長旅は疲れるけど野球とPythonは好きだ) / PyLadiesTokyo-5-years-LT

#SABRmetrics #Baseball #Python #GIS

https://pyladies-tokyo.connpass.com/event/145046/

Shinichi Nakagawa

October 19, 2019

Transcript

  1. [Rough sketch] Geocoding with GeoPy

        import csv
        import re
        import time

        from geopy.exc import GeocoderTimedOut
        from geopy.geocoders import Nominatim  # choose which geocoding service to use
        from retry import retry

        # This time, use the OSM-based geocoder.
        geoLocator = Nominatim(user_agent='Baseball Radar24 / 0.1 [email protected]')

        # The geocoding call itself; retry on timeouts.
        @retry((GeocoderTimedOut, ), delay=5, backoff=2, max_delay=4)
        def get_location(name, alias):
            loc = geoLocator.geocode(name)
            # Fall back to the alias only when there is one.
            if not loc and alias:
                loc = geoLocator.geocode(alias)
            return loc

        # Cleanse the ballpark name a little so that it can be geocoded:
        # strip a trailing Roman-numeral suffix (I, II, III, IV) and
        # leave the rest of the name untouched.
        def park_name(name):
            return re.sub(r'\s+(?:IV|III|II|I)$', '', name.strip())

    Geocoding goes through Nominatim, an API over OSM data.
    Ballpark names get a little cleansing so that the geocoders will (hopefully?) accept them.
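The retry settings on this slide (`delay=5, backoff=2, max_delay=4`) are easier to reason about as a wait schedule. The sketch below is a standalone model of capped exponential backoff, not the `retry` package's own code: `backoff_delays` is a hypothetical helper, and it assumes the cap is applied after each multiplication. Under that assumption, a `max_delay` smaller than `delay` flattens the schedule right after the first wait.

```python
def backoff_delays(delay, backoff, max_delay, attempts):
    """Return the waits (in seconds) before each of `attempts` retries.

    Hypothetical helper: models capped exponential backoff, assuming the
    cap is applied after each multiplication.
    """
    waits = []
    current = delay
    for _ in range(attempts):
        waits.append(current)
        current *= backoff
        if max_delay is not None:
            current = min(current, max_delay)
    return waits


# Settings as written on the slide.
slide_schedule = backoff_delays(delay=5, backoff=2, max_delay=4, attempts=4)
# Same settings with a larger, made-up cap for comparison.
wider_schedule = backoff_delays(delay=5, backoff=2, max_delay=40, attempts=4)

print(slide_schedule)  # [5, 4, 4, 4]
print(wider_schedule)  # [5, 10, 20, 40]
```

If growing waits are wanted, the cap probably needs to be larger than the initial delay; the actual behavior is worth checking against the `retry` package itself.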
  2. [Rough sketch] Geocoding with GeoPy

        # Execution starts here.
        # List of ballparks
        values = []
        with open('./datasets/baseballdatabank/Parks.csv', 'r') as f:
            reader = csv.DictReader(f)
            for r in reader:
                values.append(r)

        # Geocode every park, one after another.
        locations = []
        for park in values:
            loc = get_location(park_name(park['park.name']), park_name(park['park.alias']))
            if loc:
                locations.append(
                    {
                        'id': park['park.key'],
                        'name': park['park.name'],
                        'lat': loc.latitude,
                        'lng': loc.longitude,
                        'address': loc.address,
                        'state': park['state'],
                        'country': park['country']
                    }
                )
            else:
                print('geo not found: ', park['park.name'], park['park.key'])

        # Write the results to CSV.
        fields = ['id', 'name', 'lat', 'lng', 'address', 'state', 'country']
        with open('./datasets/parklist.csv', 'w', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()
            for loc in locations:
                writer.writerow(loc)

    Read the CSV and geocode every row (using the functions from the previous slide).
    This part is a fairly ordinary script, so it is probably intuitive.
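Because this loop calls a live geocoding service, it is awkward to try out offline. One way to exercise the same read, geocode, write flow is to pass the lookup in as a function. In this sketch `geocode_rows`, `fake_lookup`, and the two sample rows are all hypothetical; only the column names (`park.key`, `park.name`, `park.alias`, `state`, `country`) come from the slide.

```python
import csv
import io

FIELDS = ['id', 'name', 'lat', 'lng', 'state', 'country']


def geocode_rows(rows, lookup):
    """Split rows into (found, missing).

    `lookup(name, alias)` is assumed to return a (lat, lng) tuple or None.
    """
    found, missing = [], []
    for row in rows:
        coords = lookup(row['park.name'], row['park.alias'])
        if coords is None:
            missing.append(row['park.key'])
            continue
        lat, lng = coords
        found.append({
            'id': row['park.key'], 'name': row['park.name'],
            'lat': lat, 'lng': lng,
            'state': row['state'], 'country': row['country'],
        })
    return found, missing


# Made-up sample input standing in for Parks.csv (not real data).
SAMPLE = (
    "park.key,park.name,park.alias,state,country\n"
    "AAA01,Example Park,,CA,US\n"
    "BBB02,Nowhere Field,,NY,US\n"
)


def fake_lookup(name, alias):
    # Stand-in for a real geocoder: it knows exactly one made-up park.
    return {'Example Park': (10.0, 20.0)}.get(name)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
found, missing = geocode_rows(rows, fake_lookup)

# Write the successful rows to an in-memory CSV instead of a file.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(found)
csv_text = out.getvalue()
```

Swapping `fake_lookup` for a wrapper around a real geocoder would keep the rest of the flow unchanged, and collecting the misses in a list makes them easier to review than printed output.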
  3. [Rough sketch] Distance between two points with GeoPy

        # Compute distances
        from geopy.distance import great_circle, geodesic

        # `self` suggests this is a method; the enclosing class is not shown on the slide.
        def park2park_distance_datasets(self, park_datasets: dict) -> list:
            values = []
            for id1, park1 in park_datasets.items():
                for id2, park2 in park_datasets.items():
                    if id1 == id2:
                        continue
                    park1_geo = (park1.get('lat'), park1.get('lng'))
                    park2_geo = (park2.get('lat'), park2.get('lng'))
                    # geodesic is the geodesic distance, great_circle the great-circle distance
                    values.append(
                        {
                            'id': f"{id1}_{id2}",
                            'miles_geo': geodesic(park1_geo, park2_geo).miles,
                            'miles_circle': great_circle(park1_geo, park2_geo).miles,
                        }
                    )
            return values

    Use the functions in geopy.distance; several methods are available, including geodesic and great-circle distance.
    The arguments are tuples holding the lat/lng of the two points to measure between.
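For intuition about what a great-circle distance is, the haversine formula fits in a few lines of plain Python. This is a sketch, not geopy's implementation: `haversine_miles` and `pairwise_miles` are hypothetical names, the Earth is assumed to be a sphere of radius 6371.009 km, and the coordinates in the usage lines are arbitrary test points rather than real ballparks. It uses `itertools.combinations` because the distance is symmetric, so each unordered pair only needs to be computed once.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.009  # assumed mean Earth radius (spherical model)
KM_PER_MILE = 1.609344


def haversine_miles(p1, p2):
    """Great-circle distance in miles between two (lat, lng) tuples in degrees."""
    lat1, lng1 = map(math.radians, p1)
    lat2, lng2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    central_angle = 2 * math.asin(min(1.0, math.sqrt(a)))
    return EARTH_RADIUS_KM * central_angle / KM_PER_MILE


def pairwise_miles(parks):
    """Map 'id1_id2' to miles for every unordered pair of parks.

    `parks` is assumed to look like {id: {'lat': ..., 'lng': ...}}.
    """
    return {
        f"{id1}_{id2}": haversine_miles(
            (park_a['lat'], park_a['lng']), (park_b['lat'], park_b['lng'])
        )
        for (id1, park_a), (id2, park_b) in combinations(parks.items(), 2)
    }


# Arbitrary test points, each a quarter of the globe apart.
parks = {
    'P1': {'lat': 0.0, 'lng': 0.0},
    'P2': {'lat': 0.0, 'lng': 90.0},
    'P3': {'lat': 90.0, 'lng': 0.0},
}
distances = pairwise_miles(parks)
```

A geodesic distance on an ellipsoid will differ slightly from this spherical result, which is why it can be useful to keep both values side by side.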
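  4. Who am I?
    • Shinichi Nakagawa (@shinyorke)
    • Someone who used to work as a baseball engineer
    • Until last month: a "professional" baseball engineer
    • From this month: (back to being) a "wild" baseball engineer
    • Organizer of the Python mokumoku self-study room (#rettypy)
    • Does Web, data science, and Ops in Python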
  5. [Appendix] List of tools used
    • Data analysis
      • Jupyter notebook / Jupyter Lab https://jupyter.org/
      • Pandas https://pandas.pydata.org/
      • Plotly https://plot.ly/python/
    • GIS
      • GeoPy (geocoding) https://geopy.readthedocs.io/en/stable/
      • Folium (maps inside Jupyter notebook) https://python-visualization.github.io/folium/
      • Kepler.gl (visualization) https://kepler.gl/
    • ⚾ Baseball (note: all MLB)
      • Baseball Databank https://github.com/chadwickbureau/baseballdatabank
      • Retrosheet https://github.com/chadwickbureau/retrosheet
      • Analyzing Baseball Data with R (book, English-language) https://www.amazon.co.jp/dp/B07KRNP2BB