Weather Data Scraping

Keiichiro
October 27, 2017

Slides presented at the "Python Scraping Study Group (Collecting and Using Data via APIs)" held on October 27, 2017.

Transcript

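  1. Self-introduction
     - 吉村圭一郎 (Keiichiro Yoshimura), けーいち@SQ
     - I build disaster-prevention systems at a company called Gehirn (ゲヒルン)
       → using Java, Python, JavaScript, Node.js, and so on
     - My nominal specialty, though, is electrical and electronic engineering; I am a jack-of-all-trades
     - I run an electronics DIY and IoT study group every month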
  2. Getting precipitation data

     CSV header (column names translated; positions are zero-based):
       0-3    station number, prefecture, station, international station number
       4-8    current time (year), (month), (day), (hour), (minute)
       9-18   for each of 1-, 3-, 24-, 48- and 72-hour precipitation: record update, record update (less than 10 years)
       19-38  for each of 1-, 3-, 24-, 48- and 72-hour precipitation: current value (mm), quality information for the current value, today's maximum (mm), quality information for today's maximum

     Sample rows:
       11001,北海道 宗谷地方,宗谷岬,,2017,10,27,19,30,,,,,,,,,,,0.0,8,0.0,5,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4
       11016,北海道 宗谷地方,稚内,47401,2017,10,27,19,30,,,,,,,,,,,0.0,8,0.0,5,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4
       11046,北海道 宗谷地方,礼文,,2017,10,27,19,30,,,,,,,,,,,0.0,8,0.0,5,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4
       11061,北海道 宗谷地方,声問,,2017,10,27,19,30,,,,,,,,,,,,1,,1,,1,,1,,1,,1,,1,,1,,1,,1
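The column layout on this slide can be read by position. The following is a minimal sketch, not the author's code: `parse_precip_row` and `CURRENT_VALUE_COLUMNS` are hypothetical names, the zero-based indices 19, 23, 27, 31 and 35 are the ones the script later in the deck uses for the "current value (mm)" fields, and the two sample rows are the ones shown on the slide.

```python
import csv
import io

# Zero-based positions of the "current value (mm)" columns
# (assumption: same indices as the script later in the deck).
CURRENT_VALUE_COLUMNS = {"1h": 19, "3h": 23, "24h": 27, "48h": 31, "72h": 35}


def parse_precip_row(row):
    """Hypothetical helper: pick station info and current values from one CSV row."""
    values = {}
    for label, index in CURRENT_VALUE_COLUMNS.items():
        cell = row[index] if index < len(row) else ""
        # An empty cell means no value was reported; keep it as None.
        values[label] = float(cell) if cell else None
    return {"code": row[0], "pref": row[1], "name": row[2], "values": values}


# Columns 9-18 (record-update flags) are empty in the sample rows.
EMPTY_FLAGS = "," * 10
SAMPLE = (
    "11001,北海道 宗谷地方,宗谷岬,,2017,10,27,19,30," + EMPTY_FLAGS
    + "0.0,8,0.0,5,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4\n"
    "11061,北海道 宗谷地方,声問,,2017,10,27,19,30," + EMPTY_FLAGS
    + ",1,,1,,1,,1,,1,,1,,1,,1,,1,,1\n"
)

records = [parse_precip_row(r) for r in csv.reader(io.StringIO(SAMPLE))]
for record in records:
    print(record["code"], record["name"], record["values"])
```

Mapping empty cells to `None` rather than `0.0` keeps "no observation" distinguishable from "no rain" when the values are later written to JSON.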
  3. Extract only the necessary information

     Table (region, station, kana reading, address):
       宗谷  宗谷岬  ソウヤミサキ  稚内市宗谷岬
       宗谷  稚内  ワッカナイ  稚内市開運 稚内地方気象台
       宗谷  礼文  レブン  礼文郡礼文町大字香深村字トンナイ
       宗谷  声問  コエトイ  稚内市大字声問村字声問 稚内航空気象観測所
       宗谷  浜鬼志別  ハマオニシベツ  宗谷郡猿払村浜鬼志別
       宗谷  本泊  モトドマリ  利尻郡利尻富士町鴛泊字本泊 利尻航空気象観測所
       宗谷  沼川  ヌマカワ  稚内市声問村字沼川
       宗谷  沓形  クツガタ  利尻郡利尻町沓形字泉町
       宗谷  豊富  トヨトミ  天塩郡豊富町字上サロベツ
       宗谷  浜頓別  ハマトンベツ  枝幸郡浜頓別町クッチャロ湖畔
       宗谷  中頓別  ナカトンベツ  枝幸郡中頓別町上駒
       宗谷  北見枝幸  キタミエサシ  枝幸郡枝幸町本町 北見枝幸特別地域気象観測所
       宗谷  歌登  ウタノボリ  枝幸郡枝幸町歌登東町
       宗谷  幌延  ホロノベ  天塩郡幌延町字上幌延
       上川  中川  ナカガワ  中川郡中川町中川
       上川  音威子府  オトイネップ  中川郡音威子府村音威子府
       上川  小車  オグルマ  中川郡美深町字小車
       上川  美深  ビフカ  中川郡美深町西町
       上川  名寄  ナヨロ  名寄市大橋
       上川  西風連  ニシフウレン  名寄市風連町西風連
  4. import sys, shutil
     import csv, json
     import urllib.request
     import sqlite3
     import codecs

     argvs = sys.argv
     argc = len(argvs)
     if (argc > 1):
         datetime = argvs[1]
     else:
         datetime = 'rct'

     # Load the AMeDAS observation station table into SQLite
     connection = sqlite3.connect(":memory:")
     cursor = connection.cursor()
     cursor.execute("CREATE TABLE amedas (code TEXT PRIMARY KEY, pref TEXT, name TEXT, kana TEXT, address TEXT, lat_d INTEGER, lat_m REAL, lon_d INTEGER, lon_m REAL);")
     with open("amedas_point.csv", 'r') as fin:
         dr = csv.DictReader(fin, fieldnames=('code', 'pref', 'name', 'kana', 'address', 'lat_d', 'lat_m', 'lon_d', 'lon_m'))
         to_db = [(c['code'], c['pref'], c['name'], c['kana'], c['address'], c['lat_d'], c['lat_m'], c['lon_d'], c['lon_m']) for c in dr]
     cursor.executemany("INSERT INTO amedas(code, pref, name, kana, address, lat_d, lat_m, lon_d, lon_m) VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?);", to_db)
     connection.commit()
  5. # Fetch the CSV from the Japan Meteorological Agency (JMA) website
     data_url = 'http://www.data.jma.go.jp/obd/stats/data/mdrr/pre_rct/alltable/preall00_' + datetime + '.csv'
     data_req = urllib.request.Request(data_url)
     with urllib.request.urlopen(data_req) as response:
         # The file is Shift_JIS encoded; decode it before handing the lines to csv.reader
         reader = csv.reader(response.read().decode('shift-jis').splitlines())
         header = next(reader)  # skip the header row
  6. features = []  # GeoJSON features; this initialization does not appear on the slides
     for r in reader:
         # Look up the station by code, using a bound parameter instead of string concatenation
         cursor.execute("SELECT lat_d, lat_m, lon_d, lon_m FROM amedas WHERE code = ?", (r[0],))
         row = cursor.fetchone()
         if row is not None:
             # Convert degrees + minutes to decimal degrees
             lat = float(row[0]) + float(row[1]) / 60
             lon = float(row[2]) + float(row[3]) / 60
             # Empty cells mean no value was reported
             value1h = float(r[19]) if r[19] else None
             value3h = float(r[23]) if r[23] else None
             value24h = float(r[27]) if r[27] else None
             value48h = float(r[31]) if r[31] else None
             value72h = float(r[35]) if r[35] else None
             feature = {
                 "geometry": {
                     "type": "Point",
                     "coordinates": [float(lon), float(lat)]
                 },
                 "type": "Feature",
                 "properties": {
                     "code": r[0],
                     "pref": r[1],
                     "name": r[2],
                     "value1h": value1h,
                     "value3h": value3h,
                     "value24h": value24h,
                     "value48h": value48h,
                     "value72h": value72h
                 }
             }
             features.append(feature)
     featurecollection = {"type": "FeatureCollection", "features": features}
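The station lookup and the degrees-plus-minutes conversion in this loop can be isolated and checked without any network access. The sketch below is an illustration under stated assumptions: `dm_to_decimal` and `lookup_point` are hypothetical helper names, the table is reduced to the columns needed here, and the station code and coordinates are placeholders rather than real AMeDAS data.

```python
import sqlite3


def dm_to_decimal(degrees, minutes):
    """Convert degrees plus decimal minutes to decimal degrees."""
    return float(degrees) + float(minutes) / 60.0


def lookup_point(cursor, code):
    """Hypothetical helper: return [lon, lat] for a station code, or None if unknown."""
    # A bound parameter avoids building SQL by string concatenation.
    cursor.execute(
        "SELECT lat_d, lat_m, lon_d, lon_m FROM amedas WHERE code = ?", (code,)
    )
    row = cursor.fetchone()
    if row is None:
        return None
    lat = dm_to_decimal(row[0], row[1])
    lon = dm_to_decimal(row[2], row[3])
    return [lon, lat]  # GeoJSON order is longitude first


connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute(
    "CREATE TABLE amedas (code TEXT PRIMARY KEY, lat_d INTEGER, lat_m REAL, "
    "lon_d INTEGER, lon_m REAL)"
)
# Placeholder station: code and coordinates are made up for the example.
cursor.execute("INSERT INTO amedas VALUES (?, ?, ?, ?, ?)", ("99999", 45, 30.0, 141, 45.0))

point = lookup_point(cursor, "99999")
missing = lookup_point(cursor, "00000")
print(point, missing)
```

Returning `None` for an unknown code mirrors the `row is not None` check in the loop, so rows for stations that are absent from the table are simply skipped.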
  7. import sys, shutil
     import json
     import urllib.request
     import xml.etree.ElementTree as et

     argvs = sys.argv
     argc = len(argvs)
     if (argc > 1):
         datetime = argvs[1]
     else:
         # No timestamp given: ask the server for the latest base time
         basetime_url = 'http://www.jma.go.jp/jp/highresorad/highresorad_tile/tile_basetime.xml'
         basetime_req = urllib.request.Request(basetime_url)
         with urllib.request.urlopen(basetime_req) as response:
             basetime_xml = response.read()
         basetime = et.fromstring(basetime_xml)
         datetime = basetime[0].text

     data_url = 'http://www.jma.go.jp/jp/highresorad/highresorad_tile/LIDEN/' + datetime + '/' + datetime + '/none/data.xml'
     data_req = urllib.request.Request(data_url)
     with urllib.request.urlopen(data_req) as response:
         data_xml = response.read()
     data = et.fromstring(data_xml)
     features = []
  8. for i, child in enumerate(data):
         # Skip the first child element; compare with != (identity checks on ints are unreliable)
         if i != 0:
             feature = {
                 "geometry": {
                     "type": "Point",
                     "coordinates": [float(child.attrib["lon"]), float(child.attrib["lat"])]
                 },
                 "type": "Feature",
                 "properties": {
                     "type": int(child.attrib["type"])
                 }
             }
             features.append(feature)
     featurecollection = {"type": "FeatureCollection", "features": features}

     # Write the timestamped file, then copy it to recent.json
     with open(datetime + ".json", "w") as f:
         f.write(json.dumps(featurecollection))
     shutil.copy(datetime + ".json", "recent.json")
     print("save to " + datetime + ".json")
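The XML-to-GeoJSON step can be tried offline against a small inline document. This is a sketch under assumptions: only the attribute names `lon`, `lat` and `type` and the "skip the first child" rule come from the slides; the element names, the sample values and the function name `xml_to_feature_collection` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real data.xml layout is not shown in the slides.
SAMPLE_XML = """<data>
  <info time="201710271930"/>
  <point lon="141.75" lat="45.5" type="1"/>
  <point lon="139.5" lat="35.25" type="2"/>
</data>"""


def xml_to_feature_collection(xml_text):
    """Build a GeoJSON FeatureCollection from every child element except the first."""
    root = ET.fromstring(xml_text)
    features = []
    for child in list(root)[1:]:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON positions are [longitude, latitude]
                "coordinates": [float(child.attrib["lon"]), float(child.attrib["lat"])],
            },
            "properties": {"type": int(child.attrib["type"])},
        })
    return {"type": "FeatureCollection", "features": features}


collection = xml_to_feature_collection(SAMPLE_XML)
print(json.dumps(collection, indent=2))
```

Slicing with `list(root)[1:]` expresses the same "skip the first element" rule as the `enumerate` check, without needing an index comparison.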