BigQuery速習会 / Quick Learning about BigQuery

6115584ec4554ed71f17effd3e0b6988?s=47 Altech
September 21, 2016

BigQuery速習会 / Quick Learning about BigQuery

BigQuery速習会@Wantedly
http://wantedly.connpass.com/event/40480

6115584ec4554ed71f17effd3e0b6988?s=128

Altech

September 21, 2016
Tweet

Transcript

  1. 2.

    ࣗݾ঺հ • ஛໺ ૑ฏ • Wantedly • 2015೥2݄- Πϯλʔϯ •

    2016೥4݄- ৽ଔೖࣾ • Ϣʔβʔάϩʔεɺ΢Σϒ • @Altech_2015
  2. 3.

    Contents • Overview • Data Stores @Wantedly • How to

    Import • Query Language • Query Cost • Using Result
  3. 6.

    Q. େྔͷσʔλʹର͢ΔղੳΫΤϦΛ
 ΠϯλϥΫςΟϒʹฦ͢ʹ͸Ͳ͏͢Ε͹͍͍ʁ - - - What if a director

    suddenly asks, “Hey, can you give me yesterday’s number of impressions for AdWords display ads – but only in the Tokyo region?” https://cloud.google.com/files/BigQueryTechnicalWP.pdf
  4. 7.

    OLAP - Online Analytical Processing • ैདྷͷιϦϡʔγϣϯ • ΠϯσοΫεΛͨ͘͞ΜషͬͨΓɺதؒσʔλΛू ܭ͓ͯ͘͠

    • ໰୊఺ɿΫΤϦͷ੾Γޱͷ໢ཏ͕େมɺଟ਺ͷΠϯ σοΫε͕ݩσʔλΑΓ΋େ͖͘ͳΔ͜ͱɺதؒσʔ λ͕εΩʔϚมߋͰյΕΔɺͳͲ… • BigQueryͷιϦϡʔγϣϯ • ࣄલܭࢉͤͣͦͷ৔ͰϑϧεΩϟϯ͢Δ
  5. 9.

    ྻࢦ޲σʔλϕʔε • Relational Database : ߦ͝ͱʹσʔλΛ֨ೲ • BigQuery : ྻ͝ͱʹσʔλΛ֨ೲ

    • Pros: • ѹॖޮ཰ • εΩϟϯޮ཰ • Cons: • ߋ৽͠ʹ͍͘ SELECT * ☓
  6. 13.

    BigQuery ಋೖޙ • σʔλϕʔε : Postgres • σʔλ΢ΣΞϋ΢εɿTreasureData … batch

    query • σʔλ΢ΣΞϋ΢εɿBigQuery … ad-hoc query • σʔλՄࢹԽɿDOMO
 
 
 
 … batch query ΋ BigQuery ʹҠߦ͢Δ༨஍͸͋Δɻ
  7. 15.

    TreasureData to BigQuery Huntr::TreasureDataToBigQuery.export_table!(“log”, day: date) Huntr::TreasureDataToBigQuery::AUTO_EXPORT_TABLES SELECT time, controller,

    action, ... -- TD ͷεΩʔϚ৘ใ͔ΒΫΤϦΛੜ੒ FROM log WHERE TD_TIME_RANGE(time, '2016-09-20', '2016-09-21', 'JST') ORDER BY time Command:
  8. 17.

    Partitioning • σʔλΛ෼͚͓͖ͯɺεΩϟϯ͢ΔൣғΛͬ͘͟Γߜ ΕΔΑ͏ʹ͢Δ • From TreasureData: daily partitioning •

    td.log20160921, td.log20160922, 
 td.log20160923, … • From Postgres: no partitioning • rdb.projects, rdb.companies, …
  9. 21.

    StandardSQL, LegacySQL • 2016೥9݄ݱࡏɺೋͭͷΫΤϦݴޠ͕࢖͑Δ • Legacy SQL : FROM۟ʹؔ਺͕ॻ͚ΔͳͲಠࣗͷSQL. •

    Standard SQL : 6݄ʹβϦϦʔε. SQL2011ʹ४ڌ. • ࣾ಺ਪ঑: StandardSQL • PostgresSQL ΋ඪ४ʹൺֱత४ڌ͍ͯ͠Δ • ؔ਺ͷΠϯλʔϑΣʔε͕ྑ͍ɺUDF͕ѻ͍΍͍͢
  10. 24.

    Data Types • INT64 (INTEGER) • FLOAT64 (FLOAT) • BOOL

    (BOOLEAN) • STRING • BYTES • DATE • TIMESTAMP • Array • Struct CAST("123" AS INT64) Type Casting:
  11. 26.

    Syntax, Functions, Operators • Syntax: WITH, PARTITION BY etc. •

    https://cloud.google.com/bigquery/sql-reference/ query-syntax • Functions: • https://cloud.google.com/bigquery/sql-reference/ functions-and-operators
  12. 27.

    Example: ਖ਼نදݱͰ໊લ෦෼Λऔಘ WITH emails AS ( SELECT "james@gmail.com" AS email

    UNION ALL SELECT "bond@wantedly.com" AS email ) SELECT 
 REGEXP_EXTRACT(email, r"^[a-zA-Z0-9_.+-]+") 
 AS user_name FROM emails; https://cloud.google.com/bigquery/sql-reference/functions-and-operators#regexp_extract
  13. 28.

    User Define Function • JSON_EXTRACT Λࣗલ࣮૷ͯ͠࢖͏ྫɿ • υΩϡϝϯτʹಋઢ͕ͳ͍ͷͰ࢓༷͸গ͠มΘΔ͔΋ CREATE TEMPORARY

    FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING) RETURNS STRING LANGUAGE js AS """ try { var parsed = JSON.parse(json); } catch (e) { return null } return eval(json_path.replace("$", "parsed")); “""; SELECT CUSTOM_JSON_EXTRACT(“{a: 1}”, “$.a”);
  14. 33.

    with Ruby • Ruby gems • google-api-client • Ұ൪ϓϦϛςΟϒͰɺgenerate ͞Εͨ΋ͷɻAPIͰ

    Ͱ͖Δ͜ͱ͸શͯͰ͖Δ͕ɺҰ෦൚༻త͗ͯ͢࢖ ͍ʹ͍͘ͱ͜Ζ΋ɻ • google-cloud • google-api-client ΛਓखͰϥοϓͨ͠΋ͷɻͰ͖ ͳ͍͜ͱ΋͋Δ͕ɺΠϯλʔϑΣʔε͕ྑ͍ɻ
  15. 37.

    End. • Overview • Data Stores @Wantedly • How to

    Import • Query Language • Query Cost • Using Result
  16. 38.

    ࢀߟࢿྉ • An Inside Look at Google BigQuery • https://cloud.google.com/files/

    BigQueryTechnicalWP.pdf • SQL Reference | BigQuery Documentation • https://cloud.google.com/bigquery/sql-reference/
  17. 39.

    Exercise: ϝʔϧ͔ΒԠԉͨ͠ϢʔβʔΛݟ͚ͭΔ • mail_log ςʔϒϧΛ༻͍ͯɺઌिҰिؒͰԠԉΛ͓ئ ͍͢Δϝʔϧܦ༝ͰߦΘΕͨԠԉΛݟ͚ͭΔ • mailer_name: "unsubscribable_user_mailer" •

    action_name: "suggest_support_mail" • mail_action: "clicked" • ϝʔϧܦ༝ͰͷԠԉʹΫϦοΫͨ͠ืूΛ10෼Ҏ಺ʹ Ԡԉ • mail_log ͱ project_supports ςʔϒϧΛར༻