Slide 1

Slide 1 text

BigQuery ଎शձ 2016/09/20 @ Wantedly

Slide 2

Slide 2 text

ࣗݾ঺հ • ஛໺ ૑ฏ • Wantedly • 2015೥2݄- Πϯλʔϯ • 2016೥4݄- ৽ଔೖࣾ • Ϣʔβʔάϩʔεɺ΢Σϒ • @Altech_2015

Slide 3

Slide 3 text

Contents • Overview • Data Stores @Wantedly • How to Import • Query Language • Query Cost • Using Result

Slide 4

Slide 4 text

Overview

Slide 5

Slide 5 text

BigQuery • Google͕ࣾ಺Ͱ࢖͍ͬͯͨDremelͱ͍͏ΫΤϦαʔϏ εΛ֎෦αʔϏεͱͯ͠ެ։ͨ͠΋ͷ • Dremel : 2006೥~ • ΠϯλϥΫςΟϒͳղੳʹదͨ͠πʔϧ • େྔͷσʔλͰ΋਺ඵͰΫΤϦͷ݁Ռ͕ฦͬͯ͘Δ

Slide 6

Slide 6 text

Q. େྔͷσʔλʹର͢ΔղੳΫΤϦΛ
 ΠϯλϥΫςΟϒʹฦ͢ʹ͸Ͳ͏͢Ε͹͍͍ʁ - - - What if a director suddenly asks, “Hey, can you give me yesterday’s number of impressions for AdWords display ads – but only in the Tokyo region?” https://cloud.google.com/files/BigQueryTechnicalWP.pdf

Slide 7

Slide 7 text

OLAP - Online Analytical Processing • ैདྷͷιϦϡʔγϣϯ • ΠϯσοΫεΛͨ͘͞ΜషͬͨΓɺதؒσʔλΛू ܭ͓ͯ͘͠ • ໰୊఺ɿΫΤϦͷ੾Γޱͷ໢ཏ͕େมɺଟ਺ͷΠϯ σοΫε͕ݩσʔλΑΓ΋େ͖͘ͳΔ͜ͱɺதؒσʔ λ͕εΩʔϚมߋͰյΕΔɺͳͲ… • BigQueryͷιϦϡʔγϣϯ • ࣄલܭࢉͤͣͦͷ৔ͰϑϧεΩϟϯ͢Δ

Slide 8

Slide 8 text

ϑϧεΩϟϯ͢ΔͨΊͷٕज़ • ߦࢦ޲σʔλϕʔε • ௒ฒྻԽ • ਺ઍ୆Ͱಉ࣌ॲཧ͢Δ • ʢʴಉ࣌ॲཧͨ͠ͱ͖ʹͪΌΜͱૣ͘ͳΔͨΊͷ͢ ͍ٕ͝ज़ɾ͍͢͝Πϯϑϥʣ

Slide 9

Slide 9 text

ྻࢦ޲σʔλϕʔε • Relational Database : ߦ͝ͱʹσʔλΛ֨ೲ • BigQuery : ྻ͝ͱʹσʔλΛ֨ೲ • Pros: • ѹॖޮ཰ • εΩϟϯޮ཰ • Cons: • ߋ৽͠ʹ͍͘ SELECT * ☓

Slide 10

Slide 10 text

Data Stores

Slide 11

Slide 11 text

BigQuery ಋೖલ • σʔλϕʔε : Postgres • σʔλ΢ΣΞϋ΢εɿTreasureData • σʔλՄࢹԽɿDOMO

Slide 12

Slide 12 text

2015-09-03

Slide 13

Slide 13 text

BigQuery ಋೖޙ • σʔλϕʔε : Postgres • σʔλ΢ΣΞϋ΢εɿTreasureData … batch query • σʔλ΢ΣΞϋ΢εɿBigQuery … ad-hoc query • σʔλՄࢹԽɿDOMO
 
 
 
 … batch query ΋ BigQuery ʹҠߦ͢Δ༨஍͸͋Δɻ

Slide 14

Slide 14 text

How to Import

Slide 15

Slide 15 text

TreasureData to BigQuery Huntr::TreasureDataToBigQuery.export_table!(“log”, day: date) Huntr::TreasureDataToBigQuery::AUTO_EXPORT_TABLES SELECT time, controller, action, ... -- TD ͷεΩʔϚ৘ใ͔ΒΫΤϦΛੜ੒ FROM log WHERE TD_TIME_RANGE(time, '2016-09-20', '2016-09-21', 'JST') ORDER BY time Command:

Slide 16

Slide 16 text

Postgres to BigQuery Huntr::RdbToBigQuery.export_table!(“projects”) Huntr::RdbToBigQuery::AUTO_EXPORT_TABLES Command:

Slide 17

Slide 17 text

Partitioning • σʔλΛ෼͚͓͖ͯɺεΩϟϯ͢ΔൣғΛͬ͘͟Γߜ ΕΔΑ͏ʹ͢Δ • From TreasureData: daily partitioning • td.log20160921, td.log20160922, 
 td.log20160923, … • From Postgres: no partitioning • rdb.projects, rdb.companies, …

Slide 18

Slide 18 text

Query Language

Slide 19

Slide 19 text

Console : https://bq.wantedly.com

Slide 20

Slide 20 text

Exercise: DAUΛग़͢ • td.log ςʔϒϧΛ࢖ͬͯग़͢ • %Y%m%d ͷ೔෇͕ suffix ʹͳΔ SELECT COUNT(DISTINCT user) FROM td.log20160922

Slide 21

Slide 21 text

StandardSQL, LegacySQL • 2016೥9݄ݱࡏɺೋͭͷΫΤϦݴޠ͕࢖͑Δ • Legacy SQL : FROM۟ʹؔ਺͕ॻ͚ΔͳͲಠࣗͷSQL. • Standard SQL : 6݄ʹβϦϦʔε. SQL2011ʹ४ڌ. • ࣾ಺ਪ঑: StandardSQL • PostgresSQL ΋ඪ४ʹൺֱత४ڌ͍ͯ͠Δ • ؔ਺ͷΠϯλʔϑΣʔε͕ྑ͍ɺUDF͕ѻ͍΍͍͢

Slide 22

Slide 22 text

https://cloud.google.com/bigquery/release-notes

Slide 23

Slide 23 text

Exercise: MAUΛग़͢ • _TABLE_SUFFIX ͱ͍͏ΧϥϜʹςʔϒϧͷ೔෇෦෼͕ จࣈྻͰೖ͍ͬͯΔ • ϫΠϧυΧʔυͰෳ਺ςʔϒϧΛऔ͖ͬͯͯɺ _TABLE_SUFFIX ͰߜΓࠐΉ SELECT COUNT(DISTINCT user) FROM `td.log*` WHERE _TABLE_SUFFIX 
 BETWEEN ‘20160826’ AND ‘20160922’

Slide 24

Slide 24 text

Data Types • INT64 (INTEGER) • FLOAT64 (FLOAT) • BOOL (BOOLEAN) • STRING • BYTES • DATE • TIMESTAMP • Array • Struct CAST("123" AS INT64) Type Casting:

Slide 25

Slide 25 text

Timezone • TIMESTAMP('2016-9-14 +9’) • TIMESTAMP('2016-9-14 00:12:06 +9’) • TIMESTAMP('2016-9-14 00:12:06 Asia/Tokyo’)

Slide 26

Slide 26 text

Syntax, Functions, Operators • Syntax: WITH, PARTITION BY etc. • https://cloud.google.com/bigquery/sql-reference/ query-syntax • Functions: • https://cloud.google.com/bigquery/sql-reference/ functions-and-operators

Slide 27

Slide 27 text

Example: ਖ਼نදݱͰ໊લ෦෼Λऔಘ WITH emails AS ( SELECT "[email protected]" AS email UNION ALL SELECT "[email protected]" AS email ) SELECT 
 REGEXP_EXTRACT(email, r"^[a-zA-Z0-9_.+-]+") 
 AS user_name FROM emails; https://cloud.google.com/bigquery/sql-reference/functions-and-operators#regexp_extract

Slide 28

Slide 28 text

User Define Function • JSON_EXTRACT Λࣗલ࣮૷ͯ͠࢖͏ྫɿ • υΩϡϝϯτʹಋઢ͕ͳ͍ͷͰ࢓༷͸গ͠มΘΔ͔΋ CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING) RETURNS STRING LANGUAGE js AS """ try { var parsed = JSON.parse(json); } catch (e) { return null } return eval(json_path.replace("$", "parsed")); “""; SELECT CUSTOM_JSON_EXTRACT(“{a: 1}”, “$.a”);

Slide 29

Slide 29 text

Query Cost

Slide 30

Slide 30 text

ྉۚ https://cloud.google.com/bigquery/pricing?hl=ja εΩϟϯྔ →

Slide 31

Slide 31 text

Using Result

Slide 32

Slide 32 text

with Excel • CSVμ΢ϯϩʔυ • BQ ͩͱ10000ߦ·Ͱ

Slide 33

Slide 33 text

with Ruby • Ruby gems • google-api-client • Ұ൪ϓϦϛςΟϒͰɺgenerate ͞Εͨ΋ͷɻAPIͰ Ͱ͖Δ͜ͱ͸શͯͰ͖Δ͕ɺҰ෦൚༻త͗ͯ͢࢖ ͍ʹ͍͘ͱ͜Ζ΋ɻ • google-cloud • google-api-client ΛਓखͰϥοϓͨ͠΋ͷɻͰ͖ ͳ͍͜ͱ΋͋Δ͕ɺΠϯλʔϑΣʔε͕ྑ͍ɻ

Slide 34

Slide 34 text

Example: δϣϒͷ݁ՌΛRubyͰ࢖͏ • APIΩʔͷઃఆ͕ඞཁ client = BQ.client job = client.job('job_BzUm359VIComAIBYqrjfVLiPnX8') job.query_results

Slide 35

Slide 35 text

with DOMO • ίωΫλ͕͋ΔͷͰ௚઀ୟ͚Δ • AnalyticsDB͸ܦ༝͠ͳͯ͘ྑ͍

Slide 36

Slide 36 text

with Service • ΦϑϥΠϯॲཧͳΒ௚઀ୟ͍ͯ଴ͭͷָ͕ • ΫΤϦΩϟογϡػೳΛ্ख͘࢖͏ͱίετΛ࡟ݮͰ ͖ΔʢΒ͍͠ʣ • ΦϯϥΠϯॲཧͰ࢖͏ͨΊʹͲ͏͍͏ߏ੒͕ྑ͍͔͸ ໛ࡧத BQ.client.query(“SELECT 1”, legacy_sql: false)

Slide 37

Slide 37 text

End. • Overview • Data Stores @Wantedly • How to Import • Query Language • Query Cost • Using Result

Slide 38

Slide 38 text

ࢀߟࢿྉ • An Inside Look at Google BigQuery • https://cloud.google.com/files/ BigQueryTechnicalWP.pdf • SQL Reference | BigQuery Documentation • https://cloud.google.com/bigquery/sql-reference/

Slide 39

Slide 39 text

Exercise: ϝʔϧ͔ΒԠԉͨ͠ϢʔβʔΛݟ͚ͭΔ • mail_log ςʔϒϧΛ༻͍ͯɺઌिҰिؒͰԠԉΛ͓ئ ͍͢Δϝʔϧܦ༝ͰߦΘΕͨԠԉΛݟ͚ͭΔ • mailer_name: "unsubscribable_user_mailer" • action_name: "suggest_support_mail" • mail_action: "clicked" • ϝʔϧܦ༝ͰͷԠԉʹΫϦοΫͨ͠ืूΛ10෼Ҏ಺ʹ Ԡԉ • mail_log ͱ project_supports ςʔϒϧΛར༻