BigQuery速習会 / Quick Learning about BigQuery

6115584ec4554ed71f17effd3e0b6988?s=47 Altech
September 21, 2016

BigQuery速習会 / Quick Learning about BigQuery

BigQuery速習会@Wantedly
http://wantedly.connpass.com/event/40480

6115584ec4554ed71f17effd3e0b6988?s=128

Altech

September 21, 2016
Tweet

Transcript

  1. BigQuery ଎शձ 2016/09/20 @ Wantedly

  2. ࣗݾ঺հ • ஛໺ ૑ฏ • Wantedly • 2015೥2݄- Πϯλʔϯ •

    2016೥4݄- ৽ଔೖࣾ • Ϣʔβʔάϩʔεɺ΢Σϒ • @Altech_2015
  3. Contents • Overview • Data Stores @Wantedly • How to

    Import • Query Language • Query Cost • Using Result
  4. Overview

  5. BigQuery • Google͕ࣾ಺Ͱ࢖͍ͬͯͨDremelͱ͍͏ΫΤϦαʔϏ εΛ֎෦αʔϏεͱͯ͠ެ։ͨ͠΋ͷ • Dremel : 2006೥~ • ΠϯλϥΫςΟϒͳղੳʹదͨ͠πʔϧ

    • େྔͷσʔλͰ΋਺ඵͰΫΤϦͷ݁Ռ͕ฦͬͯ͘Δ
  6. Q. େྔͷσʔλʹର͢ΔղੳΫΤϦΛ
 ΠϯλϥΫςΟϒʹฦ͢ʹ͸Ͳ͏͢Ε͹͍͍ʁ - - - What if a director

    suddenly asks, “Hey, can you give me yesterday’s number of impressions for AdWords display ads – but only in the Tokyo region?” https://cloud.google.com/files/BigQueryTechnicalWP.pdf
  7. OLAP - Online Analytical Processing • ैདྷͷιϦϡʔγϣϯ • ΠϯσοΫεΛͨ͘͞ΜషͬͨΓɺதؒσʔλΛू ܭ͓ͯ͘͠

    • ໰୊఺ɿΫΤϦͷ੾Γޱͷ໢ཏ͕େมɺଟ਺ͷΠϯ σοΫε͕ݩσʔλΑΓ΋େ͖͘ͳΔ͜ͱɺதؒσʔ λ͕εΩʔϚมߋͰյΕΔɺͳͲ… • BigQueryͷιϦϡʔγϣϯ • ࣄલܭࢉͤͣͦͷ৔ͰϑϧεΩϟϯ͢Δ
  8. ϑϧεΩϟϯ͢ΔͨΊͷٕज़ • ߦࢦ޲σʔλϕʔε • ௒ฒྻԽ • ਺ઍ୆Ͱಉ࣌ॲཧ͢Δ • ʢʴಉ࣌ॲཧͨ͠ͱ͖ʹͪΌΜͱૣ͘ͳΔͨΊͷ͢ ͍ٕ͝ज़ɾ͍͢͝Πϯϑϥʣ

  9. ྻࢦ޲σʔλϕʔε • Relational Database : ߦ͝ͱʹσʔλΛ֨ೲ • BigQuery : ྻ͝ͱʹσʔλΛ֨ೲ

    • Pros: • ѹॖޮ཰ • εΩϟϯޮ཰ • Cons: • ߋ৽͠ʹ͍͘ SELECT * ☓
  10. Data Stores

  11. BigQuery ಋೖલ • σʔλϕʔε : Postgres • σʔλ΢ΣΞϋ΢εɿTreasureData • σʔλՄࢹԽɿDOMO

  12. 2015-09-03

  13. BigQuery ಋೖޙ • σʔλϕʔε : Postgres • σʔλ΢ΣΞϋ΢εɿTreasureData … batch

    query • σʔλ΢ΣΞϋ΢εɿBigQuery … ad-hoc query • σʔλՄࢹԽɿDOMO
 
 
 
 … batch query ΋ BigQuery ʹҠߦ͢Δ༨஍͸͋Δɻ
  14. How to Import

  15. TreasureData to BigQuery Huntr::TreasureDataToBigQuery.export_table!(“log”, day: date) Huntr::TreasureDataToBigQuery::AUTO_EXPORT_TABLES SELECT time, controller,

    action, ... -- TD ͷεΩʔϚ৘ใ͔ΒΫΤϦΛੜ੒ FROM log WHERE TD_TIME_RANGE(time, '2016-09-20', '2016-09-21', 'JST') ORDER BY time Command:
  16. Postgres to BigQuery Huntr::RdbToBigQuery.export_table!(“projects”) Huntr::RdbToBigQuery::AUTO_EXPORT_TABLES Command:

  17. Partitioning • σʔλΛ෼͚͓͖ͯɺεΩϟϯ͢ΔൣғΛͬ͘͟Γߜ ΕΔΑ͏ʹ͢Δ • From TreasureData: daily partitioning •

    td.log20160921, td.log20160922, 
 td.log20160923, … • From Postgres: no partitioning • rdb.projects, rdb.companies, …
  18. Query Language

  19. Console : https://bq.wantedly.com

  20. Exercise: DAUΛग़͢ • td.log ςʔϒϧΛ࢖ͬͯग़͢ • %Y%m%d ͷ೔෇͕ suffix ʹͳΔ

    SELECT COUNT(DISTINCT user) FROM td.log20160922
  21. StandardSQL, LegacySQL • 2016೥9݄ݱࡏɺೋͭͷΫΤϦݴޠ͕࢖͑Δ • Legacy SQL : FROM۟ʹؔ਺͕ॻ͚ΔͳͲಠࣗͷSQL. •

    Standard SQL : 6݄ʹβϦϦʔε. SQL2011ʹ४ڌ. • ࣾ಺ਪ঑: StandardSQL • PostgresSQL ΋ඪ४ʹൺֱత४ڌ͍ͯ͠Δ • ؔ਺ͷΠϯλʔϑΣʔε͕ྑ͍ɺUDF͕ѻ͍΍͍͢
  22. https://cloud.google.com/bigquery/release-notes

  23. Exercise: MAUΛग़͢ • _TABLE_SUFFIX ͱ͍͏ΧϥϜʹςʔϒϧͷ೔෇෦෼͕ จࣈྻͰೖ͍ͬͯΔ • ϫΠϧυΧʔυͰෳ਺ςʔϒϧΛऔ͖ͬͯͯɺ _TABLE_SUFFIX ͰߜΓࠐΉ

    SELECT COUNT(DISTINCT user) FROM `td.log*` WHERE _TABLE_SUFFIX 
 BETWEEN ‘20160826’ AND ‘20160922’
  24. Data Types • INT64 (INTEGER) • FLOAT64 (FLOAT) • BOOL

    (BOOLEAN) • STRING • BYTES • DATE • TIMESTAMP • Array • Struct CAST("123" AS INT64) Type Casting:
  25. Timezone • TIMESTAMP('2016-9-14 +9’) • TIMESTAMP('2016-9-14 00:12:06 +9’) • TIMESTAMP('2016-9-14

    00:12:06 Asia/Tokyo’)
  26. Syntax, Functions, Operators • Syntax: WITH, PARTITION BY etc. •

    https://cloud.google.com/bigquery/sql-reference/ query-syntax • Functions: • https://cloud.google.com/bigquery/sql-reference/ functions-and-operators
  27. Example: ਖ਼نදݱͰ໊લ෦෼Λऔಘ WITH emails AS ( SELECT "james@gmail.com" AS email

    UNION ALL SELECT "bond@wantedly.com" AS email ) SELECT 
 REGEXP_EXTRACT(email, r"^[a-zA-Z0-9_.+-]+") 
 AS user_name FROM emails; https://cloud.google.com/bigquery/sql-reference/functions-and-operators#regexp_extract
  28. User Define Function • JSON_EXTRACT Λࣗલ࣮૷ͯ͠࢖͏ྫɿ • υΩϡϝϯτʹಋઢ͕ͳ͍ͷͰ࢓༷͸গ͠มΘΔ͔΋ CREATE TEMPORARY

    FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING) RETURNS STRING LANGUAGE js AS """ try { var parsed = JSON.parse(json); } catch (e) { return null } return eval(json_path.replace("$", "parsed")); “""; SELECT CUSTOM_JSON_EXTRACT(“{a: 1}”, “$.a”);
  29. Query Cost

  30. ྉۚ https://cloud.google.com/bigquery/pricing?hl=ja εΩϟϯྔ →

  31. Using Result

  32. with Excel • CSVμ΢ϯϩʔυ • BQ ͩͱ10000ߦ·Ͱ

  33. with Ruby • Ruby gems • google-api-client • Ұ൪ϓϦϛςΟϒͰɺgenerate ͞Εͨ΋ͷɻAPIͰ

    Ͱ͖Δ͜ͱ͸શͯͰ͖Δ͕ɺҰ෦൚༻త͗ͯ͢࢖ ͍ʹ͍͘ͱ͜Ζ΋ɻ • google-cloud • google-api-client ΛਓखͰϥοϓͨ͠΋ͷɻͰ͖ ͳ͍͜ͱ΋͋Δ͕ɺΠϯλʔϑΣʔε͕ྑ͍ɻ
  34. Example: δϣϒͷ݁ՌΛRubyͰ࢖͏ • APIΩʔͷઃఆ͕ඞཁ client = BQ.client job = client.job('job_BzUm359VIComAIBYqrjfVLiPnX8')

    job.query_results
  35. with DOMO • ίωΫλ͕͋ΔͷͰ௚઀ୟ͚Δ • AnalyticsDB͸ܦ༝͠ͳͯ͘ྑ͍

  36. with Service • ΦϑϥΠϯॲཧͳΒ௚઀ୟ͍ͯ଴ͭͷָ͕ • ΫΤϦΩϟογϡػೳΛ্ख͘࢖͏ͱίετΛ࡟ݮͰ ͖ΔʢΒ͍͠ʣ • ΦϯϥΠϯॲཧͰ࢖͏ͨΊʹͲ͏͍͏ߏ੒͕ྑ͍͔͸ ໛ࡧத

    BQ.client.query(“SELECT 1”, legacy_sql: false)
  37. End. • Overview • Data Stores @Wantedly • How to

    Import • Query Language • Query Cost • Using Result
  38. ࢀߟࢿྉ • An Inside Look at Google BigQuery • https://cloud.google.com/files/

    BigQueryTechnicalWP.pdf • SQL Reference | BigQuery Documentation • https://cloud.google.com/bigquery/sql-reference/
  39. Exercise: ϝʔϧ͔ΒԠԉͨ͠ϢʔβʔΛݟ͚ͭΔ • mail_log ςʔϒϧΛ༻͍ͯɺઌिҰिؒͰԠԉΛ͓ئ ͍͢Δϝʔϧܦ༝ͰߦΘΕͨԠԉΛݟ͚ͭΔ • mailer_name: "unsubscribable_user_mailer" •

    action_name: "suggest_support_mail" • mail_action: "clicked" • ϝʔϧܦ༝ͰͷԠԉʹΫϦοΫͨ͠ืूΛ10෼Ҏ಺ʹ Ԡԉ • mail_log ͱ project_supports ςʔϒϧΛར༻