Slide 1

Slide 1 text

Long trips are tiring ✈ But what about ⚾?
Shinichi Nakagawa a.k.a. @shinyorke
PyLadies Tokyo 5th Anniversary Party

Slide 2

Slide 2 text

#PyLadiesTokyo Congratulations on your 5th anniversary!
I'm happy to be celebrating with you again this year (5th time in 5 consecutive years).

Slide 3

Slide 3 text

Today's topics
• Travel and Major League Baseball (MLB)
• Handling GIS (location data) with GeoPy
• Travel distance and team performance

Slide 4

Slide 4 text

＿人人人人人人人人人人＿
＞　Sudden baseball quiz　＜
￣Y^Y^Y^Y^Y^Y^Y^Y^Y￣

Slide 5

Slide 5 text

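Question (answer by gut feeling)
How many miles did the MLB team with the longest total travel distance cover?
*Aggregated over the 162 games of the 2018 season; in principle, within the North American continent.
1. Under 30,000 miles
2. 30,000 miles or more
3. Exactly 33,400 miles (???)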

Slide 6

Slide 6 text

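Question (answer by gut feeling)
How many miles did the MLB team with the longest total travel distance cover?
*Aggregated over the 162 games of the 2018 season; in principle, within the North American continent.
1. Under 30,000 miles
2. 30,000 miles or more
3. Exactly 33,400 miles (???)
???: "Why?! Hanshin has nothing to do with this!!"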

Slide 7

Slide 7 text

Before the answer...
How to get location data and compute distances in Python.

Slide 8

Slide 8 text

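It is quite simple.
1. Geocode to get each ballpark's location. Specifically, get the latitude and longitude from the ballpark name.
2. Using the data from step 1, compute the distances between ballparks.
3. JOIN the result of step 2 with the schedule, aggregate per team, and write a CSV.
Today covers how to do steps 1 and 2 nicely in Python.
*Step 3 is grungy Pandas work, so it is not explained this time.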
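Step 3 is skipped in the talk. Purely as an illustration, the aggregation could look like the standard-library sketch below (the talk itself used Pandas). The row layout `(date, team, park_id)`, the function name `total_miles_by_team`, and the toy park IDs and mileages are all assumptions made for this example, not the author's data or code.

```python
from collections import defaultdict


def total_miles_by_team(schedule, park_distances):
    """Sum the miles each team travels between consecutive games.

    schedule: iterable of (date, team, park_id) rows, one row per team per
        game (a hypothetical layout chosen for this sketch).
    park_distances: dict mapping "parkA_parkB" to miles, in the same
        "id1_id2" key style as the step 2 output.
    """
    # Build each team's list of parks in date order
    itinerary = defaultdict(list)
    for date, team, park_id in sorted(schedule):
        itinerary[team].append(park_id)

    totals = {}
    for team, parks in itinerary.items():
        miles = 0.0
        for prev, cur in zip(parks, parks[1:]):
            if prev != cur:  # staying at the same park means no travel
                miles += park_distances[f"{prev}_{cur}"]
        totals[team] = miles
    return totals


if __name__ == "__main__":
    # Toy data only, not real MLB figures
    distances = {"SEA01_HOU03": 1890.0, "HOU03_SEA01": 1890.0}
    games = [
        ("2018-04-01", "SEA", "SEA01"),
        ("2018-04-02", "SEA", "HOU03"),
        ("2018-04-03", "SEA", "HOU03"),
        ("2018-04-04", "SEA", "SEA01"),
        ("2018-04-02", "HOU", "HOU03"),
        ("2018-04-03", "HOU", "HOU03"),
    ]
    print(total_miles_by_team(games, distances))
```

Dates are ISO strings so that plain sorting puts games in chronological order; a real schedule would also need each team's home park as the starting point of the season.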

Slide 9

Slide 9 text

Mastering GeoPy to get both locations and distances

Slide 10

Slide 10 text

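GeoPy
• The go-to library for geocoding in Python
• Lets you call the APIs of several web map services (Google, Azure, OSM, etc.) from Python with much the same code
• The official documentation is detailed, so copying its examples gets you most of the way
• https://geopy.readthedocs.io/en/stable/#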

Slide 11

Slide 11 text

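Geocoding from ballpark names with GeoPy
• The Sean Lahman Database for MLB includes ballpark data (and it is open data)
• It holds each ballpark's name and city, so you can simply geocode from those
• This worked for about 70% of the parks; the rest was manual work (details omitted)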

Slide 12

Slide 12 text

[Rough sketch] Geocoding with GeoPy

    import csv
    import re
    import time

    # Choose the geocoder (i.e. which service to use)
    from geopy.geocoders import Nominatim
    from geopy.exc import GeocoderTimedOut
    from retry import retry

    # This time, use the OSM-based geocoder
    geoLocator = Nominatim(user_agent='Baseball Radar24 / 0.1 [email protected]')

    # The geocoding itself; retry on timeouts, tuned by feel
    @retry((GeocoderTimedOut, ), delay=5, backoff=2, max_delay=4)
    def get_location(name, alias):
        loc = geoLocator.geocode(name)
        # Fall back to the alias only when there is one
        if not loc and alias:
            loc = geoLocator.geocode(alias)
        return loc

    # Cleanse ballpark names a little so they can be geocoded:
    # drop a trailing Roman-numeral suffix (I to IV), e.g. "League Park II"
    def park_name(name):
        return re.sub(r'\s+(?:IV|III|II|I)$', '', name.strip())

Geocoding goes through Nominatim, an API over OSM data.
Ballpark names get a little cleansing so that the geocoders will (hopefully?) like them.

Slide 13

Slide 13 text

[Rough sketch] Geocoding with GeoPy

    # Execution starts here
    # Ballpark list
    values = []
    with open('./datasets/baseballdatabank/Parks.csv', 'r', newline='') as f:
        reader = csv.DictReader(f)
        for r in reader:
            values.append(r)

    # Geocode every ballpark, one after another
    locations = []
    for park in values:
        loc = get_location(park_name(park['park.name']), park_name(park['park.alias']))
        if loc:
            locations.append(
                {
                    'id': park['park.key'],
                    'name': park['park.name'],
                    'lat': loc.latitude,
                    'lng': loc.longitude,
                    'address': loc.address,
                    'state': park['state'],
                    'country': park['country']
                }
            )
        else:
            print('geo not found: ', park['park.name'], park['park.key'])

    # Write to CSV
    fields = ['id', 'name', 'lat', 'lng', 'address', 'state', 'country']
    with open('./datasets/parklist.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for loc in locations:
            writer.writerow(loc)

Read the CSV and geocode everything (using the functions from the previous slide).
This part is a fairly ordinary script, so it may be intuitive.

Slide 14

Slide 14 text

[Rough sketch] Distance between two points with GeoPy

    # Compute distances
    from geopy.distance import great_circle, geodesic

    # Excerpt of a method (hence the self parameter)
    def park2park_distance_datasets(self, park_datasets: dict) -> list:
        values = []
        for id1, park1 in park_datasets.items():
            for id2, park2 in park_datasets.items():
                if id1 == id2:
                    continue
                park1_geo = (park1.get('lat'), park1.get('lng'))
                park2_geo = (park2.get('lat'), park2.get('lng'))
                # geodesic is the geodesic distance, great_circle the great-circle distance
                values.append(
                    {
                        'id': f"{id1}_{id2}",
                        'miles_geo': geodesic(park1_geo, park2_geo).miles,
                        'miles_circle': great_circle(park1_geo, park2_geo).miles,
                    }
                )
        return values

Use the functions in geopy.distance; several methods are available, such as geodesic and great-circle distance.
The arguments are tuples holding the lat/lng of the two points you want to measure between.
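For intuition about what a great-circle distance is, here is a minimal standard-library sketch using the haversine formula. It is not GeoPy's implementation: the function name `haversine_miles` and the mean Earth radius constant are assumptions for this example, and because it treats the Earth as a sphere its results will differ slightly from an ellipsoid-based geodesic distance.

```python
import math

EARTH_RADIUS_KM = 6371.009  # assumed mean Earth radius for this sketch
KM_PER_MILE = 1.609344      # exact definition of the international mile


def haversine_miles(p1, p2):
    """Great-circle distance in miles between two (lat, lng) tuples in degrees."""
    lat1, lng1 = map(math.radians, p1)
    lat2, lng2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    # Haversine formula: a is the squared half-chord length between the points
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    central_angle = 2 * math.asin(math.sqrt(a))
    return EARTH_RADIUS_KM * central_angle / KM_PER_MILE


if __name__ == "__main__":
    # Approximate coordinates for Seattle and Houston, for illustration only
    seattle = (47.59, -122.33)
    houston = (29.76, -95.36)
    print(round(haversine_miles(seattle, houston)), "miles")
```

The function takes the same `(lat, lng)` tuple shape as the slide code, so it can be compared side by side with the `great_circle` column.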

Slide 15

Slide 15 text

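Question (repeated, now with two choices)
How many miles did the MLB team with the longest total travel distance cover?
*Aggregated over the 162 games of the 2018 season; in principle, within the North American continent.
1. Under 30,000 miles
2. 30,000 miles or more
3. Exactly 33,400 miles (???)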

Slide 16

Slide 16 text

[Answer] 2. "30,000 miles or more"
The 1st-place team traveled 40,000 miles; the 30th-place team a little over 20,000 miles.

Slide 17

Slide 17 text

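Incidentally, the top 5 teams
All 5 of the 5 teams are in the West division of their league.
Houston plays many games against far, far away Seattle and Oakland.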

Slide 18

Slide 18 text

???: "Aren't long trips tiring?"
In the major leagues, each team travels on its own dedicated (charter) plane.
Even so, isn't traveling more than 40,000 miles a year rough??

Slide 19

Slide 19 text

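Verification method
• Produce a matrix of annual travel distance against key metrics
  • Winning percentage
  • Run differential
  • Expected winning percentage (Pythagorean winning percentage) *derives a winning percentage from runs scored and allowed
• If something looks related, we got lucky
• So what actually happened...!?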
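The three metrics above can be sketched as plain functions. The Pythagorean winning percentage here uses the classic form RS^2 / (RS^2 + RA^2); the exponent of 2 is an assumption, since the slides do not say which variant was used, and the function names are made up for this example.

```python
def winning_percentage(wins, losses):
    """Share of games won."""
    return wins / (wins + losses)


def run_differential(runs_scored, runs_allowed):
    """Runs scored minus runs allowed over the season."""
    return runs_scored - runs_allowed


def pythagorean_winning_percentage(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored (RS) and allowed (RA).

    Classic form: RS**exponent / (RS**exponent + RA**exponent).
    exponent=2.0 is an assumed default for this sketch.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


if __name__ == "__main__":
    # Made-up season totals, not real team data
    print(winning_percentage(90, 72))
    print(run_differential(800, 700))
    print(pythagorean_winning_percentage(800, 700))
```

A team that scores exactly as many runs as it allows comes out at 0.5, which is a quick sanity check on the formula.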

Slide 20

Slide 20 text

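[Figure] Matrix of travel distance and each metric
The metrics are winning percentage and Pythagorean winning percentage; if there were a relationship, there should be a correlation.
*Pythagorean winning percentage: a model that predicts winning percentage from run differential (runs scored and allowed)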
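If one did want to put a number on "is there a correlation", a Pearson correlation coefficient between travel distance and a metric is one option. The following is a minimal standard-library sketch; the function name `pearson_r` and the sample values are assumptions for illustration and are not the talk's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values: miles traveled vs. winning percentage
    miles = [40000, 35000, 30000, 25000, 20000]
    win_pct = [0.52, 0.48, 0.61, 0.44, 0.55]
    print(pearson_r(miles, win_pct))
```

Values near 0 indicate no linear relationship, while values near 1 or -1 indicate a strong one.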

Slide 21

Slide 21 text

???: "What's off here is your travel distance."
The result made it pointless to even compute a correlation coefficient (shudder).

Slide 22

Slide 22 text

Being weak or having a rough season has nothing to do with distance.
Well, I did somehow have a feeling it would turn out that way ()

Slide 23

Slide 23 text

Incidentally, visualizing by region
Where lots of home runs are hit, doubles (and so on for the rest).
Feed a CSV to Kepler.gl and you get a cool visualization!

Slide 24

Slide 24 text

Oops, I forgot to introduce myself :ukkari:

Slide 25

Slide 25 text

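Who am I?
• Shinichi Nakagawa (@shinyorke)
• Someone who used to work as a baseball engineer
  • Until last month: a "professional" baseball engineer
  • From this month: (back to being) a "wild" baseball engineer
• Organizer of the Python mokumoku self-study room (#rettypy)
• Someone who does Web, data science, Ops, and ⚾ in Python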

Slide 26

Slide 26 text

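JX Press Corporation (I changed jobs)
• Since this month, Senior Engineer at JX Press Corporation
• My job is building a data platform from scratch (plus various Python-related things, recruiting PR, and so on)
• For the story behind the job change, musings, etc., see my blog
  https://shinyorke.hatenablog.com/entry/it-really-could-happen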

Slide 27

Slide 27 text

What is JX Press Corporation?
If you are curious, catch me later on!
Corp: https://jxpress.net/
Twitter: @jxpress_corp

Slide 28

Slide 28 text

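We are hiring (#仲間募集中)
• Server-side, front-end, machine learning. Details: https://jobs.jxpress.net/
• To put it mildly, it is a whole lot of Python (just my simple impression)
• Plenty of chances to take on Serverless, Big Data, and the like
• Books, IDEs, and study-group fees are paid by the company; #PyConJP sponsor, and more
• If you are interested, please come and talk to me!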

Slide 29

Slide 29 text

Well then, have a good trip ✈
Wishing PyLadies Tokyo ever greater success!
Shinichi Nakagawa (Twitter/Facebook/etc… @shinyorke)

Slide 30

Slide 30 text

[Appendix] What I used
• Data analysis
  • Jupyter notebook / Jupyter Lab https://jupyter.org/
  • Pandas https://pandas.pydata.org/
  • Plotly https://plot.ly/python/
• GIS
  • GeoPy (geocoding) https://geopy.readthedocs.io/en/stable/
  • Folium (maps inside Jupyter notebook) https://python-visualization.github.io/folium/
  • Kepler.gl (visualization) https://kepler.gl/
• ⚾ Baseball *all MLB
  • Baseball Databank https://github.com/chadwickbureau/baseballdatabank
  • Retrosheet https://github.com/chadwickbureau/retrosheet
  • Analyzing Baseball Data with R (book, in English) https://www.amazon.co.jp/dp/B07KRNP2BB