Slide 1

BigQuery and the world after MapReduce: Handling Big Data in the Googley way

Slide 2

+Kazunori Sato (@kazunori_279)
Solutions Architect, Cloud Platform GBU, Google Inc.
- GCP solutions design
- Professional services for GCP
- Docker/GCP meetups support

Slide 3

“Big Data” at Google: how do we handle it?

Slide 4

At Google, we have “big” big data everywhere. What if a Googler is asked: “Can you give me the list of the top 20 Android apps installed in 2012?”

Slide 5

At Google, we don’t use MapReduce for this. We use Dremel (= Google BigQuery).

SELECT top(appId, 20) AS app, count(*) AS count
FROM installlog.2012
ORDER BY count DESC

It scans 68B rows in ~20 sec, with no index used.

Slide 6

Demo

Slide 7

BigQuery is low cost
- Storage: $0.026 per GB per month
- Queries: $5 per TB scanned (e.g. a query that scans 1.5 TB costs about $7.50)

Slide 8

Why is BigQuery so fast?

Slide 9

Column-Oriented Storage
Record-oriented storage keeps all fields of a row together; column-oriented storage keeps all values of a column together.
Less bandwidth, more compression.
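
To make the difference concrete, here is a minimal sketch (not from the slides) of the two layouts in Java; the record type and field names are made up:

class ColumnVsRecord {
  // Record-oriented: all fields of one record sit together, so a query
  // that needs only appId still reads every field of every record.
  static class InstallRecord {
    String appId;
    int year;
  }
  static InstallRecord[] recordOriented = new InstallRecord[1000];

  // Column-oriented: all values of one column sit together, so a query like
  // COUNT(*) GROUP BY appId scans only appIdColumn, and a run of similar
  // values in one column compresses much better than mixed-field records.
  static String[] appIdColumn = new String[1000];
  static int[] yearColumn = new int[1000];
}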

Slide 10

Inside BQ: ColumnIO on Colossus
Colossus File System: the next-generation Google File System
- Chunk servers store ColumnIO files on CFS
- Replicated to multiple DCs asynchronously
- Mitigates tail latency, provides 99.9% availability

Slide 11

Massively Parallel Processing

select top(title), count(*) from publicdata:samples.wikipedia

Scanning 1 TB in 1 sec takes 5,000 disks (roughly 200 MB/s per disk); each query runs on thousands of servers.

Slide 12

Fast aggregation by tree structure
[Diagram] Mixer 0 → Mixer 1s → Shards → ColumnIO on Colossus; the partial aggregation COUNT(*) … GROUP BY state runs at each level of the tree.

SELECT state, COUNT(*) AS count_babies
WHERE year >= 1980 AND year < 1990
GROUP BY state
ORDER BY count_babies DESC
LIMIT 10

Slide 13

Inside BQ: Query Processing
1. Mixer receives a query
   - Interprets the query
   - Maps tables to CFS files, sends requests to the Shards
2. Shards process the partial queries
   - Open the ColumnIO files
   - Read the tables, apply WHERE and GROUP BY
3. Mixer aggregates the results
   - Applies ORDER BY, LIMIT
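
As a toy illustration of this flow (a sketch, not Dremel’s actual code): each shard computes a partial COUNT(*) … GROUP BY over its own slice, and the mixer merges the partial maps and applies ORDER BY / LIMIT:

import java.util.*;
import java.util.stream.*;

class TreeAggregationSketch {
  // Step 2: each shard computes a partial COUNT(*) ... GROUP BY state
  // over the rows in its own ColumnIO file.
  static Map<String, Long> shardPartial(List<String> states) {
    return states.stream()
        .collect(Collectors.groupingBy(s -> s, Collectors.counting()));
  }

  // Step 3: the mixer merges the shards' partial counts, then applies
  // ORDER BY count DESC LIMIT 10.
  static List<Map.Entry<String, Long>> mixerMerge(List<Map<String, Long>> partials) {
    Map<String, Long> merged = new HashMap<>();
    for (Map<String, Long> p : partials) {
      p.forEach((state, count) -> merged.merge(state, count, Long::sum));
    }
    return merged.entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
        .limit(10)
        .collect(Collectors.toList());
  }
}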

Slide 14

Inside BQ: Small JOIN

SELECT wiki.title
FROM [publicdata:samples.wikipedia] AS wiki
JOIN (
  SELECT word
  FROM [publicdata:samples.shakespeare]
  GROUP BY word
) AS shakes
ON wiki.title = shakes.word;

Slide 15

Inside BQ: Small JOIN
Small JOIN: executed as a broadcast JOIN. One table should be < 8MB; it is sent to every shard.
From: Google BigQuery Analytics
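
A toy sketch of the idea (illustrative names, not BigQuery internals): the small side of the join is broadcast to every shard and loaded into a hash set, and each shard probes it while scanning its own slice of the big table:

import java.util.*;

class BroadcastJoinSketch {
  // Runs independently on every shard. smallTable holds the broadcast side
  // (e.g. the distinct Shakespeare words, < 8MB); bigSlice is this shard's
  // slice of the wikipedia table.
  static List<String> probe(Set<String> smallTable, List<String> bigSlice) {
    List<String> joined = new ArrayList<>();
    for (String title : bigSlice) {
      if (smallTable.contains(title)) {  // ON wiki.title = shakes.word
        joined.add(title);
      }
    }
    return joined;
  }
}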

Slide 16

Inside BQ: Big JOIN

SELECT wiki.title
FROM [publicdata:samples.wikipedia] AS wiki
JOIN EACH (
  SELECT word
  FROM [publicdata:samples.shakespeare]
  GROUP EACH BY word
) AS shakes
ON wiki.title = shakes.word;

Slide 17

Inside BQ: Big JOIN
Big JOIN: executed with shuffling
- Both tables can be > 8MB
- The BQ shuffler doesn’t sort; it just hash-partitions
From: Google BigQuery Analytics
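
A toy sketch of the hash partitioning (illustrative, not BigQuery internals): both tables route each row to a shard by hashing the join key, so rows with the same key always meet on the same shard, with no sorting anywhere:

class ShuffleJoinSketch {
  static final int NUM_SHARDS = 1000;  // made-up number

  // Shuffle step: route a row from either table by hashing its join key.
  // Rows with equal keys land on the same shard, which then runs a local
  // join (e.g. the same hash-probe as in the small-JOIN case).
  static int shardFor(String joinKey) {
    return Math.floorMod(joinKey.hashCode(), NUM_SHARDS);
  }
}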

Slide 18

Demo

Slide 19

Inside BQ: Big JOIN
Big JOIN performance: a JOIN of 608M records x 38M records → 60-80 sec

SELECT COUNT(*)
FROM [bigquery-samples:wikimedia_pageviews.200801] AS a
JOIN EACH [bigquery-samples:wikimedia_pageviews.200802] AS b
ON a.title = b.title
WHERE b.language = "ja"

From: Google BigQuery Analytics

Slide 20

BigQuery + Hadoop

Slide 21

Connectors for Hadoop
[Diagram] A bdutil-orchestrated Hadoop cluster: a master node (with an optional HDFS name node) and worker nodes, HDFS optional, backed by Local SSD, PD SSD, or PD standard, plus the GCS Connector, BigQuery Connector, and Datastore Connector.

Slide 22

BigQuery Connector for Hadoop

// Set the job-level projectId.
conf.set(BigQueryConfiguration.PROJECT_ID_KEY, projectId);

// Configure input parameters.
BigQueryConfiguration.configureBigQueryInput(conf, "publicdata:samples.shakespeare");

// Set InputFormat.
job.setInputFormatClass(BigQueryInputFormat.class);

Slide 23

BigQuery Connector for Hadoop

// Configure output parameters.
String outSchema =
    "[{'name': 'Word','type': 'STRING'},{'name': 'Number','type': 'INTEGER'}]";
BigQueryConfiguration.configureBigQueryOutput(conf, "gcp_samples.output", outSchema);

// Set OutputFormat.
job.setOutputFormatClass(BigQueryOutputFormat.class);
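
For context, a minimal sketch of the Hadoop driver these two snippets would live in; the job name is arbitrary and the mapper/reducer classes are left as placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class BigQueryJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // ... the BigQueryConfiguration input/output setup from the two
    // snippets above goes here, before the Job is created ...
    Job job = Job.getInstance(conf, "bigquery-connector-sample");
    job.setJarByClass(BigQueryJobDriver.class);
    // job.setMapperClass(...); job.setReducerClass(...);  // your own classes
    // job.setInputFormatClass(...); job.setOutputFormatClass(...);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}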

Slide 24

Demo

Slide 25

BigQuery + Fluentd

Slide 26

BigQuery Streaming
- Low cost: $0.01 per 100,000 rows
- Real-time availability of data
- 100,000 rows per second per table (scales with the number of tables)
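
A minimal sketch of one streaming insert with the google-api-services-bigquery Java client; it assumes an already-authenticated Bigquery instance, and the project/dataset/table/field names are placeholders:

import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.TableDataInsertAllRequest;
import com.google.api.services.bigquery.model.TableDataInsertAllResponse;

class StreamingInsertSketch {
  static void insertOneRow(Bigquery bigquery) throws IOException {
    Map<String, Object> fields = new HashMap<>();
    fields.put("appId", "com.example.app");  // placeholder row matching your schema
    TableDataInsertAllRequest.Rows row =
        new TableDataInsertAllRequest.Rows().setJson(fields);
    TableDataInsertAllRequest body = new TableDataInsertAllRequest()
        .setRows(Collections.singletonList(row));
    TableDataInsertAllResponse resp = bigquery.tabledata()
        .insertAll("my-project", "my_dataset", "my_table", body)
        .execute();
    // Row-level problems come back in the response, not as exceptions.
    System.out.println(resp.getInsertErrors());
  }
}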

Slide 27

Slideshare uses Fluentd for collecting logs from >500 servers. "We take full advantage of its extendable plugin architecture and use it as a message bus that collects data from hundreds of servers into multiple backend systems." Sylvain Kalache, Operations Engineer

Slide 28

Why Fluentd? Because it’s super easy to use and has extensive plugins written by an active community.

Slide 29

Now Fluentd logs can be imported into BigQuery really easily, at ~1M rows/s.

Slide 30

Demo

Slide 31

BigQuery + User Defined Function

Slide 32

User Defined Function with JavaScript (Coming Soon!)
BigQuery User Defined Functions in JavaScript:
- take input JSON as arguments
- emit output JSON as results
JS in SQL: a UDF can be written in place of tables/subqueries

Slide 33

User Defined Function with JavaScript

for (var i = 0; i < row.hits.length - 1; i++) {
  if (row.hits[i].page.pagepath.indexOf('github_nested') != -1 &&
      row.hits[i + 1].page.pagepath.indexOf('github_timeline') != -1) {
    result.push({
      visitid: row.visitid,
      duration: row.hits[i + 1].time - row.hits[i].time
    });
  }
}

Slide 34

User Defined Function with JavaScript: search GitHub repositories for updates that change a 'Home' page

SELECT actor, repository_name, page_name
FROM js(
  -- Base query.
  (SELECT actor, payload.pages.page_name, repository.name, type
   FROM [publicdata:samples.github_nested]
   WHERE payload.pages.page_name IS NOT NULL),
  -- Input fields from base query.
  actor, payload.pages.page_name, repository.name, type,
  -- Output schema of JS function.
  "[{name: 'actor', type: 'string'},
    {name: 'repository_name', type: 'string'},
    {name: 'page_name', type: 'string'}]",
  -- JS function.
  "function(r, emit) {
    var foundHome = false;
    var foundNotHome = null;
    for (var i = 0; i < r.payload.pages.length; i++) {
      if (r.payload.pages[i].page_name == 'Home') {
        foundHome = true;
      } else if (r.payload.pages[i].page_name) {
        foundNotHome = r.payload.pages[i].page_name;
      }
    }
    if (foundHome && foundNotHome) {
      emit({actor: r.actor,
            repository_name: r.repository.name,
            page_name: foundNotHome});
    }
  }")
ORDER BY actor, repository_name, page_name

Slide 35

The World after MapReduce

Slide 36

From: Wired

Slide 37

[Architecture diagram] App events are published to Cloud Pub/Sub topics (Topic 1, Topic 2, each with ACLs) and processed by Cloud Dataflow in streaming and batch modes (abuse detection, user interactions, user engagement analytics). Results land in the BigQuery engine and in storage services (Cloud Storage, Cloud Datastore, Cloud SQL), with open-source connectors and bdutil orchestration; business dashboards and data science tools serve users, devs, data scientists, and the business.

Slide 38

Cloud Pub/Sub (Coming Soon!)
[Diagram] Publishers publish messages to topics (Topic 1, Topic 2, each with ACLs); multiple consumers subscribe to each topic.

Slide 39

Cloud Dataflow (Coming Soon!)
[Diagram] Google Cloud Dataflow optimizes and schedules pipelines, covering both streaming and batch (e.g. abuse detection, user interactions, user engagement analytics).

Slide 40

Cloud Dataflow = Managed Flume + MillWheel on GCE

Slide 41

Example: auto-completing hashtags

Prefix  Suggestions
#ar     #argentina, #arugularocks, #argylesocks
#arg    #argentina, #argylesocks, #argonauts
#arge   #argentina, #argentum, #argentine

Slide 42

Pipeline stages: Read → ExtractTags → Count → ExpandPrefixes → Top(3) → Write

Tweets (Read): {#argentina scores!, watching #armenia vs #argentina, my #art project, …}
After ExtractTags: {argentina, armenia, argentina, art, ...}
After Count: {argentina->5M, armenia->2M, art->90M, ...}
After ExpandPrefixes: {a->(argentina, 5M), a->(armenia, 2M), …, ar->(argentina, 5M), ar->(armenia, 2M), ...}
Predictions (after Top(3) and Write): {a->[apple, art, argentina], ar->[art, argentina, armenia], ...}

Slide 43

The same pipeline as code:

Pipeline p = Pipeline.create();
p.begin()
  .apply(TextIO.Read.from("gs://…"))
  .apply(ParDo.of(new ExtractTags()))
  .apply(Count.create())
  .apply(ParDo.of(new ExpandPrefixes()))
  .apply(Top.largestPerKey(3))
  .apply(TextIO.Write.to("gs://…"));
p.run();
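
For a feel of the user-written steps, here is a hedged sketch of an ExpandPrefixes DoFn against the Dataflow SDK of that era; the exact types are assumptions, not from the talk. For each (tag, count) pair it emits one (prefix, (tag, count)) pair per prefix:

import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.values.KV;

class ExpandPrefixes extends DoFn<KV<String, Long>, KV<String, KV<String, Long>>> {
  @Override
  public void processElement(ProcessContext c) {
    String tag = c.element().getKey();
    Long count = c.element().getValue();
    for (int i = 1; i <= tag.length(); i++) {
      // e.g. "argentina" -> (a, (argentina, 5M)), (ar, (argentina, 5M)), ...
      c.output(KV.of(tag.substring(0, i), KV.of(tag, count)));
    }
  }
}

Top.largestPerKey(3) then keeps the top three entries for each prefix.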

Slide 44

Automatically optimizes the execution plan (DAG)

Slide 45

Let’s stream it! Google Cloud Pub/Sub for reads and writes.
[Diagram] The #ar* rankings change over time as events stream in (game begins, #argyle, #armeniarocks, armenia wins!, #argentinagoal); old data is aged out.

Slide 46

The same code can be used for stream processing:

Pipeline p = Pipeline.create();
p.begin()
  .apply(PubsubIO.Read.from("input_topic"))
  .apply(Bucket.by(SlidingWindows.of(60, MINUTES)))
  .apply(ParDo.of(new ExtractTags()))
  .apply(Count.create())
  .apply(ParDo.of(new ExpandPrefixes()))
  .apply(Top.largestPerKey(3))
  .apply(PubsubIO.Write.to("output_topic"));
p.run();

Slide 47

Demo at YouTube

Slide 48

Questions?

Slide 49

Backup

Slide 50

How to analyze big data in real-time?

Slide 51

Lambda Architecture:
A complementary pair of
- in-memory real-time processing: fast, but small and volatile
- large HDD/SSD batch processing: slow, but large and persistent
Proposed by Nathan Marz; e.g. Twitter’s Summingbird

Slide 52

Norikra: an open-source Complex Event Processing (CEP) engine. In production use at LINE, the largest Asian SNS with 400M users, for massive log analysis.

Slide 53

Real-time analysis on streaming data with in-memory continuous queries

Slide 54

Lambda Architecture with BigQuery + Norikra

Slide 55

Demo