Weather Data Scraping

Keiichiro
October 27, 2017

Slides presented at the "Python Scraping Study Group (Collecting and Using Data via APIs)" held on October 27, 2017.

Transcript

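  1. Self-introduction
     - Keiichiro Yoshimura (吉村圭一郎, "けーいち")
     - I build disaster-prevention systems at a company called Gehirn
       → using Java, Python, JavaScript, Node.js, and others
     - That said, my background is nominally in electrical and electronic engineering; I am a jack of all trades
     - I run a monthly electronics and IoT study group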
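  2. Fetching precipitation data

     CSV header (translated; 39 columns, indexes 0 to 38):
     station number, prefecture, station, international station number,
     current time (year), current time (month), current time (day), current time (hour), current time (minute),
     1-hour precipitation extreme value updated, 1-hour precipitation extreme value updated (under 10 years),
     3-hour precipitation extreme value updated, 3-hour precipitation extreme value updated (under 10 years),
     24-hour precipitation extreme value updated, 24-hour precipitation extreme value updated (under 10 years),
     48-hour precipitation extreme value updated, 48-hour precipitation extreme value updated (under 10 years),
     72-hour precipitation extreme value updated, 72-hour precipitation extreme value updated (under 10 years),
     1-hour precipitation current value (mm), 1-hour precipitation current value quality information,
     1-hour precipitation today's maximum (mm), 1-hour precipitation today's maximum quality information,
     3-hour precipitation current value (mm), 3-hour precipitation current value quality information,
     3-hour precipitation today's maximum (mm), 3-hour precipitation today's maximum quality information,
     24-hour precipitation current value (mm), 24-hour precipitation current value quality information,
     24-hour precipitation today's maximum (mm), 24-hour precipitation today's maximum quality information,
     48-hour precipitation current value (mm), 48-hour precipitation current value quality information,
     48-hour precipitation today's maximum (mm), 48-hour precipitation today's maximum quality information,
     72-hour precipitation current value (mm), 72-hour precipitation current value quality information,
     72-hour precipitation today's maximum (mm), 72-hour precipitation today's maximum quality information

     Sample rows:
     11001,北海道 宗谷地方,宗谷岬,,2017,10,27,19,30,,,,,,,,,,,0.0,8,0.0,5,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4
     11016,北海道 宗谷地方,稚内,47401,2017,10,27,19,30,,,,,,,,,,,0.0,8,0.0,5,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4
     11046,北海道 宗谷地方,礼文,,2017,10,27,19,30,,,,,,,,,,,0.0,8,0.0,5,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4
     11061,北海道 宗谷地方,声問,,2017,10,27,19,30,,,,,,,,,,,,1,,1,,1,,1,,1,,1,,1,,1,,1,,1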
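The column layout above can be checked with a short sketch. It rebuilds one of the sample rows from the slide (station 11016) and reads the "current value (mm)" columns by position. The index constants follow the header order shown above; the helper name `current_values` is an assumption made for illustration and does not appear in the slides.

```python
import csv

# Positions of the "current value (mm)" columns, counted from the header order.
CURRENT_VALUE_COLUMNS = {"1h": 19, "3h": 23, "24h": 27, "48h": 31, "72h": 35}


def current_values(row):
    """Return {period: float or None} for one parsed CSV row (hypothetical helper)."""
    result = {}
    for period, index in CURRENT_VALUE_COLUMNS.items():
        cell = row[index] if index < len(row) else ""
        # Empty cells mean "no value", so map them to None instead of failing.
        result[period] = float(cell) if cell else None
    return result


# Rebuild the sample row for station 11016 field by field to keep the count exact.
fields = ["11016", "北海道 宗谷地方", "稚内", "47401", "2017", "10", "27", "19", "30"]
fields += [""] * 10                      # ten "extreme value updated" flags, empty here
fields += ["0.0", "8", "0.0", "5"]       # 1-hour block
fields += ["0.0", "8", "0.0", "4"] * 4   # 3-, 24-, 48-, 72-hour blocks
sample = ",".join(fields)

row = next(csv.reader([sample]))
print(len(row), row[0], current_values(row))
```

Reading by position is fragile if the publisher adds or reorders columns, so a production script might instead look the indexes up from the header row.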
  3. Table: extract only the information needed

     Region  Station  Reading  Address
     宗谷  宗谷岬  ソウヤミサキ  稚内市宗谷岬
     宗谷  稚内  ワッカナイ  稚内市開運 稚内地方気象台
     宗谷  礼文  レブン  礼文郡礼文町大字香深村字トンナイ
     宗谷  声問  コエトイ  稚内市大字声問村字声問 稚内航空気象観測所
     宗谷  浜鬼志別  ハマオニシベツ  宗谷郡猿払村浜鬼志別
     宗谷  本泊  モトドマリ  利尻郡利尻富士町鴛泊字本泊 利尻航空気象観測所
     宗谷  沼川  ヌマカワ  稚内市声問村字沼川
     宗谷  沓形  クツガタ  利尻郡利尻町沓形字泉町
     宗谷  豊富  トヨトミ  天塩郡豊富町字上サロベツ
     宗谷  浜頓別  ハマトンベツ  枝幸郡浜頓別町クッチャロ湖畔
     宗谷  中頓別  ナカトンベツ  枝幸郡中頓別町上駒
     宗谷  北見枝幸  キタミエサシ  枝幸郡枝幸町本町 北見枝幸特別地域気象観測所
     宗谷  歌登  ウタノボリ  枝幸郡枝幸町歌登東町
     宗谷  幌延  ホロノベ  天塩郡幌延町字上幌延
     上川  中川  ナカガワ  中川郡中川町中川
     上川  音威子府  オトイネップ  中川郡音威子府村音威子府
     上川  小車  オグルマ  中川郡美深町字小車
     上川  美深  ビフカ  中川郡美深町西町
     上川  名寄  ナヨロ  名寄市大橋
     上川  西風連  ニシフウレン  名寄市風連町西風連
  4. import sys, shutil
     import csv, json
     import urllib.request
     import sqlite3
     import codecs

     # Optional first argument selects the data file; 'rct' means the most recent one
     argvs = sys.argv
     argc = len(argvs)
     if argc > 1:
         datetime = argvs[1]
     else:
         datetime = 'rct'

     # Load the AMeDAS observation-station table into SQLite
     connection = sqlite3.connect(":memory:")
     cursor = connection.cursor()
     cursor.execute("CREATE TABLE amedas (code TEXT PRIMARY KEY, pref TEXT, name TEXT, kana TEXT, address TEXT, lat_d INTEGER, lat_m REAL, lon_d INTEGER, lon_m REAL);")
     with open("amedas_point.csv", 'r') as fin:
         dr = csv.DictReader(fin, fieldnames=('code', 'pref', 'name', 'kana', 'address', 'lat_d', 'lat_m', 'lon_d', 'lon_m'))
         to_db = [(c['code'], c['pref'], c['name'], c['kana'], c['address'], c['lat_d'], c['lat_m'], c['lon_d'], c['lon_m']) for c in dr]
     cursor.executemany("INSERT INTO amedas(code, pref, name, kana, address, lat_d, lat_m, lon_d, lon_m) VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?);", to_db)
     connection.commit()
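The in-memory table above exists so that each station code can be resolved to coordinates. A minimal sketch of that round trip follows, using `io.StringIO` in place of `amedas_point.csv` so that it runs without the file. The helper names `load_stations` and `lookup` and the sample row are assumptions for illustration; the coordinates in the sample are placeholders, not official values.

```python
import csv
import io
import sqlite3

FIELDS = ("code", "pref", "name", "kana", "address",
          "lat_d", "lat_m", "lon_d", "lon_m")


def load_stations(csv_text):
    """Load station rows into an in-memory SQLite table and return the connection."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE amedas (code TEXT PRIMARY KEY, pref TEXT, name TEXT, "
        "kana TEXT, address TEXT, lat_d INTEGER, lat_m REAL, "
        "lon_d INTEGER, lon_m REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=FIELDS)
    connection.executemany(
        "INSERT INTO amedas VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        [tuple(row[name] for name in FIELDS) for row in reader],
    )
    connection.commit()
    return connection


def lookup(connection, code):
    """Return (lat_d, lat_m, lon_d, lon_m) for a station code, or None if unknown."""
    return connection.execute(
        "SELECT lat_d, lat_m, lon_d, lon_m FROM amedas WHERE code = ?", (code,)
    ).fetchone()


# Placeholder row: the code, names, and coordinates are made up for this sketch.
sample_csv = "99999,Example,Sample,サンプル,Somewhere,45,30.0,141,15.0\n"
conn = load_stations(sample_csv)
print(lookup(conn, "99999"))
print(lookup(conn, "00000"))
```

Passing the code as a `?` parameter keeps it a string, which matches the `TEXT` primary key and avoids building SQL by concatenation.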
  5. # Fetch the CSV from the Japan Meteorological Agency (JMA) website
     data_url = 'http://www.data.jma.go.jp/obd/stats/data/mdrr/pre_rct/alltable/preall00_' + datetime + '.csv'
     data_req = urllib.request.Request(data_url)
     with urllib.request.urlopen(data_req) as response:
         # The file is Shift_JIS encoded; decode it before handing lines to csv.reader
         reader = csv.reader(response.read().decode('shift-jis').splitlines())
         header = next(reader)
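The decoding step can be tried without network access. The sketch below encodes a two-line excerpt as Shift_JIS to stand in for `response.read()`, then decodes and parses it the same way as the slide. The function name `parse_shift_jis_csv` is an assumption for illustration, and the excerpt is abbreviated to three columns.

```python
import csv


def parse_shift_jis_csv(payload):
    """Decode Shift_JIS bytes and return (header, rows) as lists of strings."""
    lines = payload.decode("shift_jis").splitlines()
    reader = csv.reader(lines)
    header = next(reader)   # first line holds the column names
    return header, list(reader)


# Stand-in for response.read(): an abbreviated excerpt, not the full JMA file.
text = "観測所番号,都道府県,地点\r\n11016,北海道 宗谷地方,稚内\r\n"
payload = text.encode("shift_jis")

header, rows = parse_shift_jis_csv(payload)
print(header)
print(rows)
```

Decoding the whole body before parsing matters because `csv.reader` expects text lines, while `response.read()` returns bytes.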
  6. features = []
     for r in reader:
         # Look up the station coordinates (degrees and minutes) by station code
         cursor.execute("SELECT lat_d, lat_m, lon_d, lon_m FROM amedas WHERE code = ?", (r[0],))
         row = cursor.fetchone()
         if row is not None:
             # Convert degrees and minutes to decimal degrees
             lat = float(row[0]) + float(row[1]) / 60
             lon = float(row[2]) + float(row[3]) / 60
             # Empty cells mean no value was reported
             value1h = float(r[19]) if r[19] else None
             value3h = float(r[23]) if r[23] else None
             value24h = float(r[27]) if r[27] else None
             value48h = float(r[31]) if r[31] else None
             value72h = float(r[35]) if r[35] else None
             feature = {
                 "geometry": {
                     "type": "Point",
                     "coordinates": [lon, lat]
                 },
                 "type": "Feature",
                 "properties": {
                     "code": r[0],
                     "pref": r[1],
                     "name": r[2],
                     "value1h": value1h,
                     "value3h": value3h,
                     "value24h": value24h,
                     "value48h": value48h,
                     "value72h": value72h
                 }
             }
             features.append(feature)
     featurecollection = {"type": "FeatureCollection", "features": features}
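The coordinate arithmetic in the loop above is degrees plus minutes divided by 60, and the result goes into GeoJSON in longitude-first order. A small sketch with placeholder numbers shows both steps; `to_decimal_degrees` and `make_feature` are hypothetical helper names, and the station values are made up.

```python
import json


def to_decimal_degrees(degrees, minutes):
    """Convert degrees and decimal minutes to decimal degrees."""
    return float(degrees) + float(minutes) / 60


def make_feature(code, name, lat, lon, value1h):
    """Build one GeoJSON Feature; GeoJSON positions are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"code": code, "name": name, "value1h": value1h},
    }


# Placeholder station: 45 deg 30.0 min N, 141 deg 15.0 min E.
lat = to_decimal_degrees(45, 30.0)
lon = to_decimal_degrees(141, 15.0)
feature = make_feature("99999", "Sample", lat, lon, 0.0)
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection, ensure_ascii=False))
```

A missing precipitation value stays `None` in the dictionary and is serialized by `json.dumps` as `null`, which most GeoJSON consumers accept as a property value.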
  7. import sys, shutil
     import json
     import urllib.request
     import xml.etree.ElementTree as et

     argvs = sys.argv
     argc = len(argvs)
     if argc > 1:
         datetime = argvs[1]
     else:
         # No time given: ask the server for the base time list and use its first entry
         basetime_url = 'http://www.jma.go.jp/jp/highresorad/highresorad_tile/tile_basetime.xml'
         basetime_req = urllib.request.Request(basetime_url)
         with urllib.request.urlopen(basetime_req) as response:
             basetime_xml = response.read()
         basetime = et.fromstring(basetime_xml)
         datetime = basetime[0].text

     # Fetch the LIDEN (lightning) data for that time
     data_url = 'http://www.jma.go.jp/jp/highresorad/highresorad_tile/LIDEN/' + datetime + '/' + datetime + '/none/data.xml'
     data_req = urllib.request.Request(data_url)
     with urllib.request.urlopen(data_req) as response:
         data_xml = response.read()
     data = et.fromstring(data_xml)
     features = []
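The `basetime[0].text` lookup above takes the text of the first child of the XML root. The sketch below shows that access pattern on a made-up document. The element names in the sample are assumptions, since the slides do not show the actual layout of `tile_basetime.xml`, and `first_basetime` is a hypothetical helper name.

```python
import xml.etree.ElementTree as et


def first_basetime(xml_bytes):
    """Return the text of the root's first child element (hypothetical helper)."""
    root = et.fromstring(xml_bytes)
    if len(root) == 0:
        raise ValueError("no base time entries in document")
    return root[0].text


# Made-up document: the real element names and ordering are not shown in the slides.
sample = (
    b"<basetimes>"
    b"<basetime>201710271930</basetime>"
    b"<basetime>201710271925</basetime>"
    b"</basetimes>"
)
print(first_basetime(sample))
```

Checking for an empty root gives a clearer error than the `IndexError` that `root[0]` would raise if the server returned a document without entries.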
  8. for i, child in enumerate(data):
         # The first child element is skipped
         if i != 0:
             feature = {
                 "geometry": {
                     "type": "Point",
                     "coordinates": [
                         float(child.attrib["lon"]),
                         float(child.attrib["lat"])
                     ]
                 },
                 "type": "Feature",
                 "properties": {
                     "type": int(child.attrib["type"])
                 }
             }
             features.append(feature)
     featurecollection = {"type": "FeatureCollection", "features": features}

     # Write the result, then copy it to recent.json as the latest snapshot
     with open(datetime + ".json", "w") as f:
         f.write(json.dumps(featurecollection))
     shutil.copy(datetime + ".json", "recent.json")
     print("save to " + datetime + ".json")
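The conversion above can be exercised end to end on a small made-up document. The sketch assumes only what the script relies on: a root whose first child is skipped and whose remaining children carry `lat`, `lon`, and `type` attributes. The element names, the sample values, and the helper name `lightning_features` are assumptions for illustration.

```python
import json
import os
import tempfile
import xml.etree.ElementTree as et


def lightning_features(xml_bytes):
    """Turn child elements (after the first) into a GeoJSON FeatureCollection."""
    root = et.fromstring(xml_bytes)
    features = []
    for child in list(root)[1:]:   # skip the first child, as the script does
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [float(child.attrib["lon"]),
                                float(child.attrib["lat"])],
            },
            "properties": {"type": int(child.attrib["type"])},
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up document: element names and values are placeholders.
sample = (
    b'<data><info/>'
    b'<point lat="35.5" lon="139.25" type="1"/>'
    b'<point lat="34.0" lon="135.5" type="2"/>'
    b'</data>'
)
collection = lightning_features(sample)

# Write to a temporary directory so the sketch leaves the working directory alone.
out_path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(out_path, "w") as f:
    json.dump(collection, f)
print(len(collection["features"]), "features written to", out_path)
```

Slicing with `[1:]` expresses "skip the first element" without an index comparison inside the loop, and the `with` block closes the file even if serialization fails.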