A Small-Start Approach to Data Utilization with GCP (GCPではじめるスモールスタートなデータ活用)


2016-09-06
Slides presented at bq_sushi #4.


Takashi Nishibayashi


Transcript

  1. 2.

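    Takashi Nishibayashi (@hagino3000)
    Software Engineer, Data analysis team, Zucks AdNetwork, Zucks Inc.
    Currently working on delivery-efficiency optimization: developing the automatic bid price adjustment logic and the ad selection logic for the delivery servers.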
  2. 7.
  3. 10.

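    Phase 1: First, make the data available to data scientists
    •  Loaded the ad delivery logs into BigQuery
    •  Synced the MySQL master data to BigQuery as well
    •  Accessed through the Web UI, Pandas, and BigQuery Python
    •  Subsampled in BigQuery, then trained models on a local machine
    •  Retired AWS EMR
    •  Retired Elastic Search
    •  Jumped on the Cloud Datalab beta and got burned (January 2016)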
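The subsample-in-BigQuery, train-locally step can be sketched as follows. This is a minimal illustration, not the speaker's code: the table name `ads.delivery_log`, the column `request_id`, and the 1% rate are hypothetical, and it assumes BigQuery standard SQL, where `FARM_FINGERPRINT` gives a deterministic hash so the same rows are selected on every run. `load_subsample` additionally assumes the `google-cloud-bigquery` package and default credentials, so it is defined but not called here.

```python
def build_subsample_query(table: str, key_column: str, percent: int) -> str:
    """Return a standard-SQL query that keeps roughly `percent`% of rows.

    Hashing the key (instead of calling RAND()) makes the sample reproducible.
    The double MOD maps the signed hash onto 0..99 before comparing.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE MOD(MOD(FARM_FINGERPRINT(CAST({key_column} AS STRING)), 100) + 100, 100) "
        f"< {percent}"
    )


def load_subsample(table: str, key_column: str, percent: int):
    """Run the query and return a pandas DataFrame for local training."""
    from google.cloud import bigquery  # imported lazily; needs GCP credentials

    client = bigquery.Client()
    sql = build_subsample_query(table, key_column, percent)
    return client.query(sql).to_dataframe()


# Hypothetical table and key column, for illustration only.
query = build_subsample_query("ads.delivery_log", "request_id", 1)
print(query)
```

Sampling by hash rather than by a random number keeps the training set stable across reruns, which makes experiments easier to compare.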
  4. 12.

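    Phase 2: Make the data available to batch jobs
    •  Exported the BigQuery audit logs to BigQuery through the Cloud Logging settings
    •  Issued a separate service account per feature to track usage
    •  Send a notification when costs spike
    •  The automatic bid price adjustment batch and the click fraud detection batch are in production
    •  Rule-based and anomaly-detection-based classification tasks can be written in SQL
    •  Experiment results are saved to Cloud Storage/BigQuery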
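The "notify when costs spike" rule can be sketched as a simple threshold check on per-service-account usage. This is an assumption-laden illustration, not the speaker's implementation: the service account names, the usage figures, and the mean-plus-three-standard-deviations rule are all hypothetical. The same rule could be written in SQL over the exported audit logs; Python is used here only to keep the example self-contained.

```python
from statistics import mean, stdev


def is_cost_spike(history, today, k=3.0):
    """Flag `today` when it exceeds mean + k * stdev of the trailing history."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    threshold = mean(history) + k * stdev(history)
    return today > threshold


# Hypothetical daily gigabytes billed per service account: (past days, today).
usage_gb = {
    "bid-adjust-batch": ([120.0, 130.0, 125.0, 118.0, 127.0], 131.0),
    "click-fraud-batch": ([40.0, 42.0, 39.0, 41.0, 40.0], 95.0),
}

alerts = [
    name
    for name, (past, today) in usage_gb.items()
    if is_cost_spike(past, today)
]
print(alerts)
```

Because each feature has its own service account, an alert points directly at the batch responsible for the increase.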
  5. 13.

  6. 14.

  7. 18.

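    The required level of data quality also changes
    •  As use cases multiply, data quality becomes an issue
    •  "I want to run a job right after the logs for the 23:00 hour have finished loading..."
    •  A retry mechanism is mandatory for stream inserts, batch inserts, and queries alike
    •  About once a month BigQuery has a bad day
    •  Run batches that check for missed loads and duplicate loads
    •  Provide a way to check the data load status from outside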
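A retry mechanism of the kind described above is often built as a small wrapper with exponential backoff. The sketch below is one possible shape, not the speaker's code: `with_retry`, its parameters, and `flaky_insert` are hypothetical names, and the flaky function stands in for a real insert or query call.

```python
import random
import time


def with_retry(func, max_attempts=5, base_delay=1.0,
               retriable=(Exception,), sleep=time.sleep):
    """Call func(); on a retriable error, wait and try again.

    The wait doubles on each attempt (exponential backoff) plus random jitter.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retriable:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


# Stand-in for a flaky insert or query call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_insert():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary backend error")
    return "ok"


# The demo passes a no-op sleep so it finishes immediately.
result = with_retry(flaky_insert, sleep=lambda seconds: None)
print(result, calls["count"])
```

Retrying an insert can itself write the same rows twice when the first attempt actually succeeded, which is one reason a duplicate-load check batch is a useful companion to the retry logic.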
  8. 20.

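    Other notes
    •  BigQuery is not suited to processing that looks up data on every online request
    •  Better to make the data retrievable by key and use Bigtable
    •  Some teams put a cache layer in front of BigQuery
    •  Cloud Dataproc or Cloud Dataflow……
    •  Spotify reportedly found Spark too complex to use and uses Dataflow from Scala
    •  https://github.com/spotify/scio
    •  Cloud Datalab has reportedly been renewed, so expectations are high
    •  A cloud version of Jupyter Notebook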
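The idea of a cache layer in front of a slow analytical backend can be illustrated with a small time-to-live cache. This is a generic sketch under stated assumptions, not a description of any system mentioned in the talk: `TTLCache`, `slow_lookup`, and the key `campaign-1` are hypothetical, and `slow_lookup` stands in for a real query.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: skip the slow backend
        value = loader(key)
        self._store[key] = (now + self.ttl, value)
        return value


backend_calls = []


def slow_lookup(key):
    """Stand-in for a query against the analytical backend."""
    backend_calls.append(key)
    return f"stats-for-{key}"


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("campaign-1", slow_lookup)
second = cache.get_or_load("campaign-1", slow_lookup)
print(first == second, len(backend_calls))
```

Only the first lookup reaches the backend; repeated requests within the TTL are served from memory, which is what makes per-request access tolerable.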
  9. 21.

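    Summary
    •  Aiming straight at the hard problems means a long wait before results appear, so we are advancing data utilization while laying the groundwork
    •  Rule-based and anomaly-detection-based processing that can be written in SQL delivers results faster than machine learning
    •  Tighter integration with Cloud Storage, Cloud Logging, and Cloud Dataproc has expanded BigQuery's use cases
    •  If you do not need response times in the hundreds of milliseconds, high query concurrency, or stability, BigQuery can be used at a reasonable cost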