Upgrade to Pro — share decks privately, control downloads, hide ads and more …

BigQuery速習会 / Quick Learning about BigQuery

Altech
September 21, 2016

BigQuery速習会 / Quick Learning about BigQuery

BigQuery速習会@Wantedly
http://wantedly.connpass.com/event/40480

Altech

September 21, 2016
Tweet

More Decks by Altech

Other Decks in Technology

Transcript

  1. BigQuery ଎शձ
    2016/09/20 @ Wantedly

    View Slide

  2. ࣗݾ঺հ
    • ஛໺ ૑ฏ
    • Wantedly
    • 2015೥2݄- Πϯλʔϯ
    • 2016೥4݄- ৽ଔೖࣾ
    • Ϣʔβʔάϩʔεɺ΢Σϒ
    • @Altech_2015

    View Slide

  3. Contents
    • Overview
    • Data Stores @Wantedly
    • How to Import
    • Query Language
    • Query Cost
    • Using Result

    View Slide

  4. Overview

    View Slide

  5. BigQuery
    • Google͕ࣾ಺Ͱ࢖͍ͬͯͨDremelͱ͍͏ΫΤϦαʔϏ
    εΛ֎෦αʔϏεͱͯ͠ެ։ͨ͠΋ͷ
    • Dremel : 2006೥~
    • ΠϯλϥΫςΟϒͳղੳʹదͨ͠πʔϧ
    • େྔͷσʔλͰ΋਺ඵͰΫΤϦͷ݁Ռ͕ฦͬͯ͘Δ

    View Slide

  6. Q. େྔͷσʔλʹର͢ΔղੳΫΤϦΛ

    ΠϯλϥΫςΟϒʹฦ͢ʹ͸Ͳ͏͢Ε͹͍͍ʁ
    - - -
    What if a director suddenly asks, “Hey, can
    you give me yesterday’s number of
    impressions for AdWords display ads – but
    only in the Tokyo region?”
    https://cloud.google.com/files/BigQueryTechnicalWP.pdf

    View Slide

  7. OLAP - Online Analytical Processing
    • ैདྷͷιϦϡʔγϣϯ
    • ΠϯσοΫεΛͨ͘͞ΜషͬͨΓɺதؒσʔλΛू
    ܭ͓ͯ͘͠
    • ໰୊఺ɿΫΤϦͷ੾Γޱͷ໢ཏ͕େมɺଟ਺ͷΠϯ
    σοΫε͕ݩσʔλΑΓ΋େ͖͘ͳΔ͜ͱɺதؒσʔ
    λ͕εΩʔϚมߋͰյΕΔɺͳͲ…
    • BigQueryͷιϦϡʔγϣϯ
    • ࣄલܭࢉͤͣͦͷ৔ͰϑϧεΩϟϯ͢Δ

    View Slide

  8. ϑϧεΩϟϯ͢ΔͨΊͷٕज़
    • ߦࢦ޲σʔλϕʔε
    • ௒ฒྻԽ
    • ਺ઍ୆Ͱಉ࣌ॲཧ͢Δ
    • ʢʴಉ࣌ॲཧͨ͠ͱ͖ʹͪΌΜͱૣ͘ͳΔͨΊͷ͢
    ͍ٕ͝ज़ɾ͍͢͝Πϯϑϥʣ

    View Slide

  9. ྻࢦ޲σʔλϕʔε
    • Relational Database : ߦ͝ͱʹσʔλΛ֨ೲ
    • BigQuery : ྻ͝ͱʹσʔλΛ֨ೲ
    • Pros:
    • ѹॖޮ཰
    • εΩϟϯޮ཰
    • Cons:
    • ߋ৽͠ʹ͍͘
    SELECT *

    View Slide

  10. Data Stores

    View Slide

  11. BigQuery ಋೖલ
    • σʔλϕʔε : Postgres
    • σʔλ΢ΣΞϋ΢εɿTreasureData
    • σʔλՄࢹԽɿDOMO

    View Slide

  12. 2015-09-03

    View Slide

  13. BigQuery ಋೖޙ
    • σʔλϕʔε : Postgres
    • σʔλ΢ΣΞϋ΢εɿTreasureData … batch query
    • σʔλ΢ΣΞϋ΢εɿBigQuery … ad-hoc query
    • σʔλՄࢹԽɿDOMO




    … batch query ΋ BigQuery ʹҠߦ͢Δ༨஍͸͋Δɻ

    View Slide

  14. How to Import

    View Slide

  15. TreasureData to BigQuery
    Huntr::TreasureDataToBigQuery.export_table!(“log”, day: date)
    Huntr::TreasureDataToBigQuery::AUTO_EXPORT_TABLES
    SELECT
    time, controller, action, ... -- TD ͷεΩʔϚ৘ใ͔ΒΫΤϦΛੜ੒
    FROM log
    WHERE TD_TIME_RANGE(time, '2016-09-20', '2016-09-21', 'JST')
    ORDER BY time
    Command:

    View Slide

  16. Postgres to BigQuery
    Huntr::RdbToBigQuery.export_table!(“projects”)
    Huntr::RdbToBigQuery::AUTO_EXPORT_TABLES
    Command:

    View Slide

  17. Partitioning
    • σʔλΛ෼͚͓͖ͯɺεΩϟϯ͢ΔൣғΛͬ͘͟Γߜ
    ΕΔΑ͏ʹ͢Δ
    • From TreasureData: daily partitioning
    • td.log20160921, td.log20160922, 

    td.log20160923, …
    • From Postgres: no partitioning
    • rdb.projects, rdb.companies, …

    View Slide

  18. Query Language

    View Slide

  19. Console : https://bq.wantedly.com

    View Slide

  20. Exercise: DAUΛग़͢
    • td.log ςʔϒϧΛ࢖ͬͯग़͢
    • %Y%m%d ͷ೔෇͕ suffix ʹͳΔ
    SELECT COUNT(DISTINCT user) FROM td.log20160922

    View Slide

  21. StandardSQL, LegacySQL
    • 2016೥9݄ݱࡏɺೋͭͷΫΤϦݴޠ͕࢖͑Δ
    • Legacy SQL : FROM۟ʹؔ਺͕ॻ͚ΔͳͲಠࣗͷSQL.
    • Standard SQL : 6݄ʹβϦϦʔε. SQL2011ʹ४ڌ.
    • ࣾ಺ਪ঑: StandardSQL
    • PostgresSQL ΋ඪ४ʹൺֱత४ڌ͍ͯ͠Δ
    • ؔ਺ͷΠϯλʔϑΣʔε͕ྑ͍ɺUDF͕ѻ͍΍͍͢

    View Slide

  22. https://cloud.google.com/bigquery/release-notes

    View Slide

  23. Exercise: MAUΛग़͢
    • _TABLE_SUFFIX ͱ͍͏ΧϥϜʹςʔϒϧͷ೔෇෦෼͕
    จࣈྻͰೖ͍ͬͯΔ
    • ϫΠϧυΧʔυͰෳ਺ςʔϒϧΛऔ͖ͬͯͯɺ
    _TABLE_SUFFIX ͰߜΓࠐΉ
    SELECT COUNT(DISTINCT user) FROM `td.log*`
    WHERE
    _TABLE_SUFFIX 

    BETWEEN ‘20160826’ AND ‘20160922’

    View Slide

  24. Data Types
    • INT64 (INTEGER)
    • FLOAT64 (FLOAT)
    • BOOL (BOOLEAN)
    • STRING
    • BYTES
    • DATE
    • TIMESTAMP
    • Array
    • Struct CAST("123" AS INT64)
    Type Casting:

    View Slide

  25. Timezone
    • TIMESTAMP('2016-9-14 +9’)
    • TIMESTAMP('2016-9-14 00:12:06 +9’)
    • TIMESTAMP('2016-9-14 00:12:06 Asia/Tokyo’)

    View Slide

  26. Syntax, Functions, Operators
    • Syntax: WITH, PARTITION BY etc.
    • https://cloud.google.com/bigquery/sql-reference/
    query-syntax
    • Functions:
    • https://cloud.google.com/bigquery/sql-reference/
    functions-and-operators

    View Slide

  27. Example: ਖ਼نදݱͰ໊લ෦෼Λऔಘ
    WITH emails AS (
    SELECT "[email protected]" AS email
    UNION ALL
    SELECT "[email protected]" AS email
    )
    SELECT 

    REGEXP_EXTRACT(email, r"^[a-zA-Z0-9_.+-]+") 

    AS user_name
    FROM emails;
    https://cloud.google.com/bigquery/sql-reference/functions-and-operators#regexp_extract

    View Slide

  28. User Define Function
    • JSON_EXTRACT Λࣗલ࣮૷ͯ͠࢖͏ྫɿ
    • υΩϡϝϯτʹಋઢ͕ͳ͍ͷͰ࢓༷͸গ͠มΘΔ͔΋
    CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING,
    json_path STRING)
    RETURNS STRING
    LANGUAGE js AS """
    try { var parsed = JSON.parse(json);
    } catch (e) { return null }
    return eval(json_path.replace("$", "parsed"));
    “"";
    SELECT CUSTOM_JSON_EXTRACT(“{a: 1}”, “$.a”);

    View Slide

  29. Query Cost

    View Slide

  30. ྉۚ
    https://cloud.google.com/bigquery/pricing?hl=ja
    εΩϟϯྔ →

    View Slide

  31. Using Result

    View Slide

  32. with Excel
    • CSVμ΢ϯϩʔυ
    • BQ ͩͱ10000ߦ·Ͱ

    View Slide

  33. with Ruby
    • Ruby gems
    • google-api-client
    • Ұ൪ϓϦϛςΟϒͰɺgenerate ͞Εͨ΋ͷɻAPIͰ
    Ͱ͖Δ͜ͱ͸શͯͰ͖Δ͕ɺҰ෦൚༻త͗ͯ͢࢖
    ͍ʹ͍͘ͱ͜Ζ΋ɻ
    • google-cloud
    • google-api-client ΛਓखͰϥοϓͨ͠΋ͷɻͰ͖
    ͳ͍͜ͱ΋͋Δ͕ɺΠϯλʔϑΣʔε͕ྑ͍ɻ

    View Slide

  34. Example: δϣϒͷ݁ՌΛRubyͰ࢖͏
    • APIΩʔͷઃఆ͕ඞཁ
    client = BQ.client
    job = client.job('job_BzUm359VIComAIBYqrjfVLiPnX8')
    job.query_results

    View Slide

  35. with DOMO
    • ίωΫλ͕͋ΔͷͰ௚઀ୟ͚Δ
    • AnalyticsDB͸ܦ༝͠ͳͯ͘ྑ͍

    View Slide

  36. with Service
    • ΦϑϥΠϯॲཧͳΒ௚઀ୟ͍ͯ଴ͭͷָ͕
    • ΫΤϦΩϟογϡػೳΛ্ख͘࢖͏ͱίετΛ࡟ݮͰ
    ͖ΔʢΒ͍͠ʣ
    • ΦϯϥΠϯॲཧͰ࢖͏ͨΊʹͲ͏͍͏ߏ੒͕ྑ͍͔͸
    ໛ࡧத
    BQ.client.query(“SELECT 1”, legacy_sql: false)

    View Slide

  37. End.
    • Overview
    • Data Stores @Wantedly
    • How to Import
    • Query Language
    • Query Cost
    • Using Result

    View Slide

  38. ࢀߟࢿྉ
    • An Inside Look at Google BigQuery
    • https://cloud.google.com/files/
    BigQueryTechnicalWP.pdf
    • SQL Reference | BigQuery Documentation
    • https://cloud.google.com/bigquery/sql-reference/

    View Slide

  39. Exercise: ϝʔϧ͔ΒԠԉͨ͠ϢʔβʔΛݟ͚ͭΔ
    • mail_log ςʔϒϧΛ༻͍ͯɺઌिҰिؒͰԠԉΛ͓ئ
    ͍͢Δϝʔϧܦ༝ͰߦΘΕͨԠԉΛݟ͚ͭΔ
    • mailer_name: "unsubscribable_user_mailer"
    • action_name: "suggest_support_mail"
    • mail_action: "clicked"
    • ϝʔϧܦ༝ͰͷԠԉʹΫϦοΫͨ͠ืूΛ10෼Ҏ಺ʹ
    Ԡԉ
    • mail_log ͱ project_supports ςʔϒϧΛར༻

    View Slide