Long trips are tiring, but I love baseball and Python / PyLadiesTokyo-5-years-LT
Shinichi Nakagawa
October 19, 2019
#SABRmetrics #Baseball #Python #GIS
https://pyladies-tokyo.connpass.com/event/145046/
Transcript
Long trips are tiring ✈, so what about ⚾? Shinichi Nakagawa a.k.a. @shinyorke PyLadies Tokyo 5th Anniversary Party
#PyLadiesTokyo Congratulations on your 5th anniversary! I'm happy to be celebrating with you again this year (5 years running, 5th time).
Today's talk
• Travel and Major League Baseball (MLB)
• Handling GIS (location data) with GeoPy
• Travel distance and team performance
＿人人人人人人人人人人＿
＞ Sudden baseball quiz ＜
￣Y^Y^Y^Y^Y^Y^Y^Y^Y￣
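Note: please answer by gut feeling.
How many miles did the MLB team with the greatest total travel distance cover?
Note: 2018 season, totaled over 162 games, basically within continental North America.
1. Under 30,000 miles
2. 30,000 miles or more
3. Exactly 33,400 miles (???)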
???: "Why?! Hanshin?! That has nothing to do with it!!"
Before the answer: how to get location data and compute distances in Python.
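It is quite simple.
1. Geocode to get each ballpark's location. Concretely, get the latitude and longitude from the ballpark name.
2. Using the data from step 1, compute the distances between ballparks.
3. JOIN the result of step 2 with the schedule, aggregate per team, and output a CSV.
Today covers how to do steps 1 and 2 nicely in Python.
Note: step 3 is gritty Pandas work, so it is not explained this time.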
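The talk skips step 3, so the following is only a minimal sketch of the idea in plain Python rather than the Pandas code the speaker used. The `schedule` layout, the function name, and the toy park IDs are assumptions for illustration; the `"PARKA_PARKB"` key format mirrors the `id` built in the distance slide.

```python
from collections import defaultdict


def total_travel_miles(schedule, distances):
    """Sum the miles each team travels between consecutive game venues.

    schedule:  list of (team, park_id) tuples in chronological order
               (hypothetical layout, not the speaker's actual schema)
    distances: dict mapping "PARKA_PARKB" to miles, as produced in step 2
    """
    totals = defaultdict(float)
    last_park = {}
    for team, park_id in schedule:
        prev = last_park.get(team)
        # Only add distance when the team actually changes venue.
        if prev is not None and prev != park_id:
            totals[team] += distances[f"{prev}_{park_id}"]
        last_park[team] = park_id
    return dict(totals)


# Made-up toy data: two parks that are 100 miles apart.
distances = {"AAA01_BBB01": 100.0, "BBB01_AAA01": 100.0}
schedule = [
    ("HOME", "AAA01"), ("AWAY", "AAA01"),
    ("HOME", "AAA01"), ("AWAY", "AAA01"),
    ("HOME", "BBB01"), ("AWAY", "BBB01"),
]
print(total_travel_miles(schedule, distances))
```

A team that never changes venue does not appear in the result, which is fine for a season-long schedule but worth handling if the input is partial.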
Getting locations and distances with GeoPy
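GeoPy
• The standard library for geocoding in Python
• Lets you use the APIs of several internet map services (Google, Azure, OSM, etc…) from Python with much the same code
• The official documentation is detailed, so imitating it gets you most of the way
• https://geopy.readthedocs.io/en/stable/#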
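Geocoding from ballpark names with GeoPy
• The MLB Sean Lahman Database includes ballpark data (it is open data, by the way)
• It has each ballpark's name and city name, so geocoding from those is enough
• About 70% of the parks resolved this way; the rest were done by hand (etc.)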
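One way to picture "geocode from the name and city" is to try a few query strings in order and keep the first hit. This is a hedged sketch, not the speaker's code: `first_hit`, `park_queries`, the query ordering, and the stand-in lookup table are all hypothetical, and any callable that maps a query string to a result (such as a GeoPy geocoder's `geocode` method) could be passed in.

```python
def first_hit(geocode, candidates):
    """Return the first non-empty geocoding result among candidate queries.

    geocode:    any callable taking a query string and returning a result or None
    candidates: query strings to try, in order
    """
    for query in candidates:
        if not query:
            continue  # skip empty names or aliases
        result = geocode(query)
        if result:
            return result
    return None


def park_queries(name, city):
    """Build queries from most to least specific (assumed ordering)."""
    return [f"{name}, {city}", name]


# Stand-in geocoder so the sketch runs offline; the coordinates are made up.
fake_index = {"Example Park": (35.0, 139.0)}
print(first_hit(fake_index.get, park_queries("Example Park", "Example City")))
```

Parks for which every candidate misses return `None`, which corresponds to the remainder that had to be fixed by hand.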
[Rough idea] Geocoding with GeoPy

    import csv
    import re
    import time

    from geopy.exc import GeocoderTimedOut
    from geopy.geocoders import Nominatim  # choose the geocoder (which service to use)
    from retry import retry

    # This time, use the OSM-based one
    geoLocator = Nominatim(user_agent='Baseball Radar24 / 0.1 [email protected]')

    # The geocoding itself; retry when the service times out
    @retry((GeocoderTimedOut, ), delay=5, backoff=2, max_delay=4)
    def get_location(name, alias):
        loc = geoLocator.geocode(name)
        if not loc:
            # Fall back to the alias when the primary name is not found
            loc = geoLocator.geocode(alias)
        return loc

    # Light cleansing so the ballpark name can be geocoded:
    # drop a trailing Roman numeral (I to IV) and surrounding whitespace
    def park_name(name):
        return re.sub(r'\s+(?:IV|III|II|I)$', '', name.strip())

Geocoding goes through Nominatim, an API over OSM data. Ballpark names get a little cleansing so that the geocoders will accept them.
[Rough idea] Geocoding with GeoPy

    # Execution starts here
    # Ballpark list
    values = []
    with open('./datasets/baseballdatabank/Parks.csv', 'r') as f:
        reader = csv.DictReader(f)
        for r in reader:
            values.append(r)

    # Geocode every ballpark
    locations = []
    for park in values:
        loc = get_location(park_name(park['park.name']), park_name(park['park.alias']))
        if loc:
            locations.append(
                {
                    'id': park['park.key'],
                    'name': park['park.name'],
                    'lat': loc.latitude,
                    'lng': loc.longitude,
                    'address': loc.address,
                    'state': park['state'],
                    'country': park['country']
                }
            )
        else:
            print('geo not found: ', park['park.name'], park['park.key'])

    # Write to CSV
    fields = ['id', 'name', 'lat', 'lng', 'address', 'state', 'country']
    with open('./datasets/parklist.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for loc in locations:
            writer.writerow(loc)

Read the CSV and geocode every row (using the functions from the previous slide). This part is a fairly ordinary script, so it should be intuitive.
[Rough idea] Distance between two points with GeoPy

    # Compute distances
    from geopy.distance import great_circle, geodesic

    def park2park_distance_datasets(park_datasets: dict) -> list:
        values = []
        for id1, park1 in park_datasets.items():
            for id2, park2 in park_datasets.items():
                if id1 == id2:
                    continue
                park1_geo = (park1.get('lat'), park1.get('lng'))
                park2_geo = (park2.get('lat'), park2.get('lng'))
                # geodesic is the geodesic distance, great_circle is the great-circle distance
                values.append(
                    {
                        'id': f"{id1}_{id2}",
                        'miles_geo': geodesic(park1_geo, park2_geo).miles,
                        'miles_circle': great_circle(park1_geo, park2_geo).miles,
                    }
                )
        return values

Use the functions in geopy.distance; several methods are available, such as geodesic and great-circle distance. The arguments are tuples holding the lat/lng of the two points to measure between.
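To show what a great-circle distance is, here is a minimal standard-library sketch using the haversine formula on a spherical Earth. It is an illustration, not GeoPy's implementation: the function name, the radius constant, and the sample points are assumptions. A geodesic distance is computed on an ellipsoid instead, so its values differ slightly from a spherical result like this one.

```python
import math

EARTH_RADIUS_KM = 6371.009  # assumed mean Earth radius for a spherical model
KM_PER_MILE = 1.609344


def haversine_miles(p1, p2):
    """Great-circle distance in miles between two (lat, lng) tuples in degrees."""
    lat1, lng1 = map(math.radians, p1)
    lat2, lng2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    central_angle = 2 * math.asin(math.sqrt(a))
    return EARTH_RADIUS_KM * central_angle / KM_PER_MILE


# A quarter of the equator: the central angle is pi / 2.
print(round(haversine_miles((0.0, 0.0), (0.0, 90.0)), 1))
```

As in the slide, the arguments are plain `(lat, lng)` tuples, so the same park records can be fed to either approach.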
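Note: repeated from earlier, and now it is a two-way choice.
How many miles did the MLB team with the greatest total travel distance cover?
Note: 2018 season, totaled over 162 games, basically within continental North America.
1. Under 30,000 miles
2. 30,000 miles or more
3. Exactly 33,400 miles (???)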
[Answer] 2. "30,000 miles or more". The 1st-place team traveled around 40,000 miles; the 30th-place team a little over 20,000 miles.
By the way, the top 5 teams: 5 out of 5 are in the West division of their league. Houston plays many games against far, far away Seattle and Oakland.
???: "Aren't long trips tiring?" In MLB, each team travels on its own dedicated (charter) plane. Even so, isn't traveling 40,000 miles or more a year rough??
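Verification method
• Produce a matrix of annual travel distance against the main metrics
• Winning percentage
• Run differential
• Expected winning percentage (Pythagorean winning percentage). Note: derives a winning percentage from runs scored and runs allowed.
• If anything looks related, that's lucky
• So what actually happened…!?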
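As a reminder of how a Pythagorean winning percentage is computed, here is a minimal sketch of the classic form, RS² / (RS² + RA²). The talk does not say which exponent was used, so the default of 2 is an assumption, and the run totals in the example are made up.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    exponent=2.0 is the classic form; other exponents are sometimes used.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Made-up season totals: 800 runs scored, 600 allowed.
print(pythagorean_win_pct(800, 600))
```

A team that scores exactly as many runs as it allows comes out at 0.5, which is a quick sanity check for any variant of the formula.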
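[Figure] Matrix of travel distance against each metric. The metrics are winning percentage and Pythagorean winning percentage; if distance matters, a correlation should show up. Note: Pythagorean winning percentage is a model that predicts winning percentage from runs scored and allowed.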
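If one did want a number for "is there a correlation", a Pearson coefficient between travel miles and winning percentage is a common choice. The sketch below is an assumption about how such a check could look, written with the standard library only; the sample values are made up for illustration and are not the 2018 results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values: travel miles and winning percentage for four teams.
miles = [20000.0, 27000.0, 33000.0, 40000.0]
win_pct = [0.55, 0.45, 0.45, 0.55]
print(pearson_r(miles, win_pct))
```

A value near 0 means no linear relationship, while values near 1 or -1 would suggest that distance and results move together.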
???: "What's off is your travel distance." The result was so clear that computing a correlation was not even necessary (shudder).
Distance has nothing to do with being weak or having a hard time. Yes, I did kind of have that hunch ().
By the way, visualizing by region: places where lots of home runs are hit, doubles (same idea). Feed a CSV to Kepler.gl and you get a cool visualization!
Oops, I forgot to introduce myself :ukkari:
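Who am I?
• Shinichi Nakagawa (@shinyorke)
• Someone who used to work as a baseball engineer
• Until last month: a "pro" baseball engineer
• From this month: (back to being) a "wild" baseball engineer
• Organizer of the Python mokumoku self-study room (#rettypy)
• Someone who does Web, data science, Ops, and ⚾ in Python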
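JX Press (I changed jobs)
• Senior Engineer at JX Press Corporation since this month
• My job is building a data platform from scratch (plus various Python-related things, recruiting PR, etc.)
• The story behind the job change, and some musings, are on my blog: https://shinyorke.hatenablog.com/entry/it-really-could-happen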
What is JX Press? If you're curious, find me later! Corp: https://jxpress.net/ Twitter: @jxpress_corp
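#We're hiring
• Server-side, front-end, machine learning. Details: https://jobs.jxpress.net/
• To put it mildly, it is very much a Python shop
• Plenty of room to take on Serverless, Big Data, and the like
• Books, IDEs, and study-group fees are covered by the company; #PyConJP sponsor, and more
• If you're curious, please come and talk to me!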
Well then, have a good trip ✈ Wishing PyLadies Tokyo ever greater success! Shinichi Nakagawa (Twitter/Facebook/etc… @shinyorke)
[Appendix] List of things used
Data analysis
• Jupyter notebook / Jupyter Lab https://jupyter.org/
• Pandas https://pandas.pydata.org/
• Plotly https://plot.ly/python/
GIS
• GeoPy (Geocoding) https://geopy.readthedocs.io/en/stable/
• Folium (maps in Jupyter notebook) https://python-visualization.github.io/folium/
• Kepler.gl (Visualization) https://kepler.gl/
⚾ Baseball (all MLB)
• Baseball Databank https://github.com/chadwickbureau/baseballdatabank
• Retrosheet https://github.com/chadwickbureau/retrosheet
• Analyzing Baseball Data with R (book, English-language) https://www.amazon.co.jp/dp/B07KRNP2BB