Long trips are tiring, but I love baseball and Python / PyLadiesTokyo-5-years-LT
Shinichi Nakagawa
October 19, 2019
#SABRmetrics #Baseball #Python #GIS
https://pyladies-tokyo.connpass.com/event/145046/
Transcript
Travel is tiring: ⚾ by ✈? Shinichi Nakagawa a.k.a. @shinyorke PyLadies Tokyo 5th Anniversary Party
Congratulations on 5 years, #PyLadiesTokyo! I'm glad to be celebrating with you again this year (5th time in 5 consecutive years).
Today's talk
• Travel and Major League Baseball (MLB)
• Handling GIS (location data) with GeoPy
• Travel distance and team performance
＿人人人人人人人人人人＿
＞　Sudden baseball quiz　＜
￣Y^Y^Y^Y^Y^Y^Y^Y^Y￣
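※ Answer on gut feeling.
How many miles did the MLB team with the most total travel cover?
※ Totals for the 2018 season (162 games); in principle within North America.
1. Under 30,000 miles
2. 30,000 miles or more
3. Exactly 33,400 miles (???)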
???: "Why?! Hanshin has nothing to do with this!!"
Before the answer... how to get location data and compute distances in Python.
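It is quite simple.
1. Geocode to get each ballpark's location. Concretely, derive latitude and longitude from the ballpark name.
2. Using the data from step 1, compute the distance between ballparks.
3. JOIN the result of step 2 with the schedule, aggregate per team, and write a CSV.
Today covers a nice way to do steps 1 and 2 in Python.
※ Step 3 is gritty Pandas work, so it is not explained this time.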
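Step 3 is not shown in the talk. As a rough, standard-library-only sketch of the idea (not the author's Pandas code), the function below sums per-team travel by walking a schedule in chronological order and looking up the distance between consecutive venues. The schedule shape, the function name `total_travel_by_team`, and the park ids in the example are all hypothetical; only the `"{id1}_{id2}"` key format follows the distance slide.

```python
def total_travel_by_team(schedule, distances):
    """Sum the miles each team travels between consecutive game venues.

    schedule:  iterable of (team, park_id) pairs in chronological order
               (hypothetical shape, chosen for this sketch).
    distances: dict mapping "parkA_parkB" to the distance in miles.
    """
    totals = {}
    last_park = {}
    for team, park in schedule:
        totals.setdefault(team, 0.0)
        prev = last_park.get(team)
        # Only a change of venue counts as travel.
        if prev is not None and prev != park:
            totals[team] += distances[f"{prev}_{park}"]
        last_park[team] = park
    return totals


# Placeholder ids and distances, for illustration only.
example_schedule = [("A", "P1"), ("A", "P2"), ("A", "P2"), ("A", "P1"), ("B", "P2")]
example_distances = {"P1_P2": 100.0, "P2_P1": 100.0}
print(total_travel_by_team(example_schedule, example_distances))
```

A real schedule would also need a starting location for each team (for example its home park), which this sketch leaves out.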
Master GeoPy to get both locations and distances
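GeoPy
• The go-to library for geocoding in Python
• Lets you call the APIs of several web map services (Google, Azure, OSM, etc…) from Python with similar code
• The official documentation is detailed, so following its examples gets you most of the way
• https://geopy.readthedocs.io/en/stable/#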
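Geocoding from ballpark names with GeoPy
• The Sean Lahman Database for MLB includes ballpark data (and it is open data)
• It has each ballpark's name and city, so you can geocode from those
• About 70% of the parks resolved this way; the rest were done by hand (ry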
[Rough idea] Geocoding with GeoPy

import csv
import re
import time

from geopy.exc import GeocoderTimedOut
from geopy.geocoders import Nominatim  # choose the geocoder (which service to use)
from retry import retry

# Use the OSM-based service this time
geoLocator = Nominatim(user_agent='Baseball Radar24 / 0.1 shinyorke@example.com')

# The geocoding itself; retry when it times out
@retry((GeocoderTimedOut, ), delay=5, backoff=2, max_delay=4)
def get_location(name, alias):
    loc = geoLocator.geocode(name)
    if not loc:
        # Fall back to the alias when the primary name is not found
        loc = geoLocator.geocode(alias)
    return loc

# Light cleansing so that ballpark names can be geocoded
def park_name(name):
    # Strip a trailing Roman-numeral suffix (I to IV) from the park name
    return re.sub(r'\s+(?:III|II|IV|I)$', '', name.strip())

Geocoding goes through Nominatim, an API over OSM data.
Ballpark names get a little cleansing so that the geocoders will (hopefully) accept them.
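To make the retry idea concrete, here is a minimal standard-library sketch of retrying with a growing delay. It is an illustration of the general technique, not the `retry` package's implementation; the function name `retry_with_backoff`, the `tries` default, and the injectable `sleep` parameter are assumptions made for this example.

```python
import time


def retry_with_backoff(func, exceptions, delay=5, backoff=2, max_delay=None,
                       tries=3, sleep=time.sleep):
    """Call func() until it succeeds or `tries` attempts have been used.

    After each failure, wait `delay` seconds, then multiply the delay by
    `backoff` (capped at `max_delay` when given) before the next attempt.
    """
    wait = delay
    for attempt in range(1, tries + 1):
        try:
            return func()
        except exceptions:
            if attempt == tries:
                raise  # out of attempts: surface the last error
            sleep(wait)
            wait *= backoff
            if max_delay is not None:
                wait = min(wait, max_delay)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller would normally leave it at `time.sleep`.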
[Rough idea] Geocoding with GeoPy

# Execution starts here
# Ballpark list
values = []
with open('./datasets/baseballdatabank/Parks.csv', 'r') as f:
    reader = csv.DictReader(f)
    for r in reader:
        values.append(r)

# Geocode every park
locations = []
for park in values:
    loc = get_location(park_name(park['park.name']), park_name(park['park.alias']))
    if loc:
        locations.append(
            {
                'id': park['park.key'],
                'name': park['park.name'],
                'lat': loc.latitude,
                'lng': loc.longitude,
                'address': loc.address,
                'state': park['state'],
                'country': park['country']
            }
        )
    else:
        print('geo not found: ', park['park.name'], park['park.key'])

# Write to CSV
fields = ['id', 'name', 'lat', 'lng', 'address', 'state', 'country']
with open('./datasets/parklist.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    for loc in locations:
        writer.writerow(loc)

Read the CSV and geocode everything (using the functions from the previous slide).
This part is a fairly ordinary script, so it may feel intuitive.
[Rough idea] Distance between two points with GeoPy

# Compute distances
from geopy.distance import great_circle, geodesic

def park2park_distance_datasets(self, park_datasets: dict) -> list:  # method excerpt from a class
    values = []
    for id1, park1 in park_datasets.items():
        for id2, park2 in park_datasets.items():
            if id1 == id2:
                continue
            park1_geo = (park1.get('lat'), park1.get('lng'))
            park2_geo = (park2.get('lat'), park2.get('lng'))
            # geodesic gives the geodesic distance, great_circle the great-circle distance
            values.append(
                {
                    'id': f"{id1}_{id2}",
                    'miles_geo': geodesic(park1_geo, park2_geo).miles,
                    'miles_circle': great_circle(park1_geo, park2_geo).miles,
                }
            )
    return values

Use the functions in geopy.distance; there are several methods, such as geodesic and great-circle distance.
The arguments are tuples holding the lat/lng of the points to measure between.
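For intuition about what a great-circle distance is, the haversine formula can be written in a few lines of standard-library Python. This is a sketch on a spherical Earth, not GeoPy's implementation; the function name `haversine_miles` and the radius constant are assumptions for this example, and results will differ slightly from an ellipsoidal geodesic.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumed constant)


def haversine_miles(p1, p2):
    """Great-circle distance in miles between two (lat, lng) tuples in degrees."""
    lat1, lng1 = (math.radians(v) for v in p1)
    lat2, lng2 = (math.radians(v) for v in p2)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


# A quarter of the way around the equator.
print(haversine_miles((0.0, 0.0), (0.0, 90.0)))
```

The input shape matches the slide: each point is a `(lat, lng)` tuple.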
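※ Repeated: it is now a two-way choice.
How many miles did the MLB team with the most total travel cover?
※ Totals for the 2018 season (162 games); in principle within North America.
1. Under 30,000 miles
2. 30,000 miles or more
3. Exactly 33,400 miles (???)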
[Answer] 2. "30,000 miles or more"
1st place: 40,000 miles; 30th place: a little over 20,000 miles
By the way, the top 5 teams
All 5 of the 5 teams are in their league's West division.
Houston plays many games against far, far away Seattle and Oakland.
???: "Isn't all that travel tiring?"
Every MLB team travels on its own (charter) plane, but even so, isn't 40,000+ miles of travel a year rough??
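How to check
• Build a matrix of annual travel distance against the main metrics
  • Winning percentage
  • Run differential
  • Expected winning percentage (Pythagorean winning percentage) ※ derives a winning percentage from runs scored and allowed
• If something looks related, lucky
• So what actually happened…!?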
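For reference, the Pythagorean winning percentage in its classic form (Bill James's formula with exponent 2) is runs scored squared divided by the sum of runs scored squared and runs allowed squared. The sketch below is a minimal illustration; the function name and the adjustable `exponent` parameter are assumptions, and the talk does not say which exponent was used.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    Classic form: RS^2 / (RS^2 + RA^2). Both inputs must not be zero
    at the same time, or the division below fails.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# A team that scores exactly as many runs as it allows is expected to win half its games.
print(pythagorean_win_pct(700, 700))
```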
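[Figure] Matrix of travel distance against each metric
The metrics are winning percentage and Pythagorean winning percentage; if distance mattered, there should be a correlation.
※ Pythagorean winning percentage: a model that predicts winning percentage from runs scored and allowed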
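If one did want a number to go with the scatter matrix, a Pearson correlation coefficient is the usual first check. The following is a standard-library sketch; the function name `pearson_r` is an assumption, and the inputs in the example are made-up values for illustration only, not data from the talk.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values, for illustration only (not results from the talk).
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1]))
```

A value near 0 indicates no linear relationship; values near 1 or -1 indicate a strong one.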
???: "What's off is your travel distance."
The result was so clear that there was no need to even compute a correlation (shudder)
Distance has nothing to do with being weak or having a hard time
Well, I did kind of have a feeling it would turn out that way ()
By the way, visualizing by region
Where lots of home runs are hit, doubles (same as above)
Feed a CSV to Kepler.gl and you get a cool visualization!
Oops, I forgot to introduce myself :ukkari:
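Who am I?
• Shinichi Nakagawa (@shinyorke)
• Someone who used to work as a baseball engineer
• Until last month: a "professional" baseball engineer
• From this month: (back to being) a "wild" baseball engineer
• Organizer of the Python mokumoku self-study room (#rettypy)
• Someone who does Web, data science, Ops, and ⚾ in Python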
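JX Press Corporation (JX通信社): I changed jobs
• Senior Engineer at JX Press Corporation since this month
• Building the data platform from scratch (plus various Python-related things, recruiting PR, and so on)
• The story behind the job change (and some musings) is on my blog
  https://shinyorke.hatenablog.com/entry/it-really-could-happen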
What is JX Press Corporation (JX通信社)?
If you are curious, find me later during the social time!
Corp: https://jxpress.net/
Twitter: @jxpress_corp
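#We're hiring
• Server-side, front-end, machine learning. Details: https://jobs.jxpress.net/
• To put it mildly, it is very much a Python shop (just my simple impression)
• Plenty of room to take on Serverless, Big Data, and the like
• The company covers books, IDEs, and study-group fees; it is also a #PyConJP sponsor, and more
• If you are interested, please come and talk to me!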
Well then, have a good trip ✈
Wishing PyLadies Tokyo ever greater success!
Shinichi Nakagawa (Twitter/Facebook/etc… @shinyorke)
[Appendix] What I used
• Data analysis
  • Jupyter notebook / Jupyter Lab https://jupyter.org/
  • Pandas https://pandas.pydata.org/
  • Plotly https://plot.ly/python/
• GIS
  • GeoPy (Geocoding) https://geopy.readthedocs.io/en/stable/
  • Folium (maps in Jupyter notebook) https://python-visualization.github.io/folium/
  • Kepler.gl (Visualization) https://kepler.gl/
• ⚾ Baseball ※ all MLB
  • Baseball Databank https://github.com/chadwickbureau/baseballdatabank
  • Retrosheet https://github.com/chadwickbureau/retrosheet
  • Analyzing Baseball Data with R (book, English-language) https://www.amazon.co.jp/dp/B07KRNP2BB