
Weather Data Scraping

Keiichiro
October 27, 2017


Slides presented at the "Python Scraping Study Group (collecting and using data via APIs)" held on October 27, 2017.


Transcript

  1. Scraping and playing with weather data from Japan — Kei YOSHIMURA

  2. Self-introduction
 - Keiichiro Yoshimura (@SQ)
 - Builds disaster-prevention systems at a company called Gehirn
   → uses Java, Python, JavaScript, Node.js, and so on
 - Specialty is actually electrical and electronic engineering
 - Runs a regular electronics / IoT study group
  3. None
  4. None
  5. Weather data as a scraping target

  6. Various information sources
 - JMA (Japan Meteorological Agency) related:
   ├ JMA website
   ├ JMA Disaster Prevention Information XML telegrams
   ├ L-Alert (Public Information Commons)
   ├ Japan Meteorological Business Support Center
   etc…
 - Yahoo API
  7. Various information sources
 - JMA (Japan Meteorological Agency) related:
   ├ JMA website
   ├ JMA Disaster Prevention Information XML telegrams
   ├ L-Alert (Public Information Commons)
   ├ Japan Meteorological Business Support Center
   etc…
 - Yahoo API
  8. Fetching precipitation (AMeDAS) data

  9. Flow
 - Fetch from the JMA "latest weather data" CSV download
 - Convert it to GeoJSON
 - Map it (QGIS heatmap)

  10. Fetching precipitation
 - About the JMA "latest weather data" CSV download:
   http://www.data.jma.go.jp/obd/stats/data/mdrr/docs/csv_dl_readme.html

  11. Fetching precipitation
 - All elements, latest:
   http://www.data.jma.go.jp/obd/stats/data/mdrr/pre_rct/alltable/preall00_rct.csv
 - All elements, at a specified time (year/month/day/hour/minute):
   http://www.data.jma.go.jp/obd/stats/data/mdrr/pre_rct/alltable/preall00_<YYYYMMDDhhmm>.csv
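The two URL patterns above differ only in the filename suffix: 'rct' for the latest observations versus a timestamp for a specified time. A minimal sketch of building either URL; the YYYYMMDDhhmm filename format is an assumption inferred from the script later in this deck:

```python
from datetime import datetime as dt

def preall_url(when=None):
    """Build the JMA 'preall00' CSV URL; None means 'latest' (rct)."""
    base = 'http://www.data.jma.go.jp/obd/stats/data/mdrr/pre_rct/alltable/preall00_'
    # 'rct' is the literal suffix for the latest file; otherwise format a timestamp
    stamp = 'rct' if when is None else when.strftime('%Y%m%d%H%M')
    return base + stamp + '.csv'

print(preall_url())                          # latest observations
print(preall_url(dt(2017, 10, 27, 19, 30)))  # a specific time
```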

  12. Fetching precipitation — the CSV layout:
 Header: station number, prefecture, station name, international station number, current time (year, month, day, hour, minute), extreme-value-updated flags for 1/3/24/48/72-hour precipitation (each with an "under 10 years of records" variant), then for each of the 1/3/24/48/72-hour windows: current value (mm), quality flag for the current value, today's maximum (mm), quality flag for the maximum.
 Sample rows (Sōya region, Hokkaidō):
 11001,Hokkaido Soya,Cape Soya,,2017,10,27,19,30,,,,,,,,,,,0.0,8,0.0,5,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4
 11016,Hokkaido Soya,Wakkanai,47401,2017,10,27,19,30,,,,,,,,,,,0.0,8,0.0,5,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4
 11046,Hokkaido Soya,Rebun,,2017,10,27,19,30,,,,,,,,,,,0.0,8,0.0,5,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4,0.0,8,0.0,4
 11061,Hokkaido Soya,Koetoi,,2017,10,27,19,30,,,,,,,,,,,,1,,1,,1,,1,,1,,1,,1,,1,,1,,1
  13. Fetching precipitation
 - Not enough information to overlay it on a map
   → we have the station code and station name, but no coordinates

  14. Preparing station location data
 - Overview of AMeDAS (the regional automated weather observation system):
   http://www.jma.go.jp/jma/kishou/know/amedas/kaisetsu.html
   → CSV-format list of AMeDAS observation stations [ZIP-compressed]
  15. None
  16. Table of AMeDAS stations for the Sōya and Kamikawa regions (station name, kana, address): Cape Sōya, Wakkanai, Rebun, Koetoi, … → extract only the fields we need
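The station list gives each point's latitude and longitude as separate degree and minute fields (the lat_d/lat_m and lon_d/lon_m columns used by the script that follows). A minimal sketch of the conversion to decimal degrees, with illustrative values rather than ones taken from the real station list:

```python
def to_decimal_degrees(deg, minutes):
    # AMeDAS coordinates are published as whole degrees plus decimal minutes
    return float(deg) + float(minutes) / 60.0

# illustrative values only: 45 degrees 24.9 minutes
print(to_decimal_degrees(45, 24.9))
```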
  17.
  import sys, shutil
  import csv, json
  import urllib.request
  import sqlite3
  import codecs

  argvs = sys.argv
  argc = len(argvs)
  if (argc > 1):
      datetime = argvs[1]
  else:
      datetime = 'rct'

  # Load the AMeDAS station list into an in-memory SQLite table
  connection = sqlite3.connect(":memory:")
  cursor = connection.cursor()
  cursor.execute("CREATE TABLE amedas (code TEXT PRIMARY KEY, pref TEXT, name TEXT, kana TEXT, address TEXT, lat_d INTEGER, lat_m REAL, lon_d INTEGER, lon_m REAL);")
  with open("amedas_point.csv", 'r') as fin:
      dr = csv.DictReader(fin, fieldnames=('code', 'pref', 'name', 'kana', 'address', 'lat_d', 'lat_m', 'lon_d', 'lon_m'))
      to_db = [(c['code'], c['pref'], c['name'], c['kana'], c['address'], c['lat_d'], c['lat_m'], c['lon_d'], c['lon_m']) for c in dr]
  cursor.executemany("INSERT INTO amedas(code, pref, name, kana, address, lat_d, lat_m, lon_d, lon_m) VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?);", to_db)
  connection.commit()
  18.
  # Fetch the CSV from the JMA website
  data_url = 'http://www.data.jma.go.jp/obd/stats/data/mdrr/pre_rct/alltable/preall00_' + datetime + '.csv'
  data_req = urllib.request.Request(data_url)
  with urllib.request.urlopen(data_req) as response:
      reader = csv.reader(response.read().decode('shift-jis').splitlines())
      header = next(reader)
  features = []  # GeoJSON features collected in the loop on the next slide
  19.
  for r in reader:
      # Look up the station's coordinates by its code (parameterized query)
      cursor.execute("SELECT lat_d, lat_m, lon_d, lon_m FROM amedas WHERE code=?", (r[0],))
      row = cursor.fetchone()
      if (row is not None):
          # Convert degrees + minutes to decimal degrees
          lat = float(row[0]) + float(row[1]) / 60
          lon = float(row[2]) + float(row[3]) / 60
          value1h = float(r[19]) if r[19] else None
          value3h = float(r[23]) if r[23] else None
          value24h = float(r[27]) if r[27] else None
          value48h = float(r[31]) if r[31] else None
          value72h = float(r[35]) if r[35] else None
          feature = {
              "geometry": {"type": "Point", "coordinates": [lon, lat]},
              "type": "Feature",
              "properties": {
                  "code": r[0], "pref": r[1], "name": r[2],
                  "value1h": value1h, "value3h": value3h,
                  "value24h": value24h, "value48h": value48h,
                  "value72h": value72h
              }
          }
          features.append(feature)
  featurecollection = {"type": "FeatureCollection", "features": features}
  20.
  f = open(datetime + ".json", "w")
  f.write(json.dumps(featurecollection, ensure_ascii=False))
  f.close()
  shutil.copy(datetime + ".json", "recent.json")
  print("save to " + datetime + ".json")
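For reference, the file written above holds a FeatureCollection whose features look roughly like this; the station values shown here are illustrative, not real observations:

```python
import json

# One feature in the same shape the script emits (illustrative values)
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [141.678, 45.415]},
    "properties": {"code": "11016", "pref": "Hokkaido Soya", "name": "Wakkanai",
                   "value1h": 0.0, "value3h": 0.0, "value24h": 0.0,
                   "value48h": 0.0, "value72h": 0.0},
}
featurecollection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(featurecollection, ensure_ascii=False))
```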
  21. Demo: actually fetch the data and inspect the JSON

  22. None
  23. None
  24. Rendering the GeoJSON

  25. Demo: load the GeoJSON into QGIS and overlay it on a map

  26. None
  27. Fetching lightning-strike (LIDEN) data

  28. Flow
 - Get lightning-strike locations and types from the high-resolution precipitation nowcast
 - Convert to GeoJSON
 - Map it (OpenLayers)

  29. Fetching lightning data

  30. Fetching lightning data
 - Data list:
   http://www.jma.go.jp/jp/highresorad/highresorad_tile/tile_basetime.xml
 - Individual data:
   https://www.jma.go.jp/jp/highresorad/highresorad_tile/LIDEN/<basetime>/<validtime>/none/data.xml

  31.
  import sys, shutil
  import json
  import urllib.request
  import xml.etree.ElementTree as et

  argvs = sys.argv
  argc = len(argvs)
  if (argc > 1):
      datetime = argvs[1]
  else:
      # No argument: look up the latest base time
      basetime_url = 'http://www.jma.go.jp/jp/highresorad/highresorad_tile/tile_basetime.xml'
      basetime_req = urllib.request.Request(basetime_url)
      with urllib.request.urlopen(basetime_req) as response:
          basetime_xml = response.read()
      basetime = et.fromstring(basetime_xml)
      datetime = basetime[0].text

  data_url = 'http://www.jma.go.jp/jp/highresorad/highresorad_tile/LIDEN/' + datetime + '/' + datetime + '/none/data.xml'
  data_req = urllib.request.Request(data_url)
  with urllib.request.urlopen(data_req) as response:
      data_xml = response.read()
  data = et.fromstring(data_xml)
  features = []
  32.
  for i, child in enumerate(data):
      if i != 0:  # skip the first element
          feature = {
              "geometry": {"type": "Point",
                           "coordinates": [float(child.attrib["lon"]), float(child.attrib["lat"])]},
              "type": "Feature",
              "properties": {"type": int(child.attrib["type"])}
          }
          features.append(feature)
  featurecollection = {"type": "FeatureCollection", "features": features}
  f = open(datetime + ".json", "w")
  f.write(json.dumps(featurecollection))
  f.close()
  shutil.copy(datetime + ".json", "recent.json")
  print("save to " + datetime + ".json")
  33. Demo: actually fetch the data and inspect the JSON

  34. None
  35. None
  36. Rendering the GeoJSON

  37. Demo: render it in the browser using OpenLayers

  38. Source code
 - jma-liden-geojson
   https://github.com/SQ/jma-liden-geojson

  39. The end