BigQuery and the world after MapReduce

BigQuery and the world after MapReduce


Kazunori Sato

July 09, 2014


  1. BigQuery and the world after MapReduce Handling Big Data in

    the Googley way
  2. +Kazunori Sato @kazunori_279 Solutions Architect, Cloud Platform GBU, Google Inc

    - GCP solutions design - Professional services for GCP - Docker/GCP meetups support
  3. “Big Data” at Google: How do we handle it

  4. At Google, we have “big” big data everywhere What if

    a Googler is asked: “Can you give me the list of top 20 Android apps installed in 2012?”
  5. At Google, we don’t use MapReduce for this We use

    Dremel = Google BigQuery SELECT top(appId, 20) AS app, count(*) AS count FROM installlog.2012 ORDER BY count DESC It scans 68B rows in ~20 sec, No index used.
  6. Demo

  7. BigQuery is low cost Storage: $0.026 per GB per month

    Queries: $5 per TB
  8. Why is BigQuery so fast?

  9. Column Oriented Storage Record Oriented Storage Column Oriented Storage Less

    bandwidth, More compression
  10. Inside BQ: ColumnIO on Colossus Colossus File System - The

    next generation Google File System Chunk Servers - ColumnIO on CFS - Replicated to multiple DCs asynchronously - Mitigates Tail Latency, provides 99.9% availability
  11. select top(title), count(*) from publicdata:samples.wikipedia Massively Parallel Processing Scanning 1

    TB in 1 sec takes 5,000 disks Each query runs on thousands of servers
  12. Fast aggregation by tree structure Mixer 0 Mixer 1 Mixer

    1 Shard Shard Shard Shard ColumnIO on Colossus SELECT state, year COUNT(*) GROUP BY state WHERE year >= 1980 and year < 1990 ORDER BY count_babies DESC LIMIT 10 COUNT(*) GROUP BY state
  13. Inside BQ: Query Processing 1. Mixer receives a query -

    Interprets the query - Maps tables to CFS files, requests to Shards 2. Shards process the partial queries - Open the ColumnIO files - Reads the table, applies WHERE, GROUP BY 3. Mixer aggregates the results - Applies ORDER BY, LIMIT
  14. Inside BQ: Small JOIN SELECT wiki.title FROM [publicdata:samples.wikipedia] AS wiki

    JOIN ( SELECT word FROM [publicdata:samples.shakespeare] GROUP BY word ) AS shakes ON wiki.title = shakes.word;
  15. Small JOIN: executed with Broadcast JOIN - One table should

    be < 8MB, sent to every shard From: Google BigQuery Analytics Inside BQ: Small JOIN
  16. Inside BQ: Big JOIN SELECT wiki.title FROM [publicdata:samples.wikipedia] AS wiki

    JOIN EACH ( SELECT word FROM [publicdata:samples.shakespeare] GROUP EACH BY word ) AS shakes ON wiki.title = shakes.word;
  17. Inside BQ: Big JOIN Big JOIN: executed with shuffling -

    Both tables can be > 8MB - BQ shuffler doesn’t sort; just hash partitioning From: Google BigQuery Analytics
  18. Demo

  19. Inside BQ: Big JOIN Big JOIN performance - JOIN 608M

    records x 38M records → 60 - 80 sec SELECT COUNT(*) FROM [bigquery-samples:wikimedia_pageviews. 200801] as a JOIN EACH [bigquery-samples:wikimedia_pageviews.200802] as b ON a.title = b.title WHERE b.language = "ja" From: Google BigQuery Analytics
  20. BigQuery + Hadoop

  21. HDFS (optional) Work Nodes Work Nodes Master Node Work Nodes

    HDFS (optional) Name Node (optional) Local SSD PD SSD PD standard GCS Connector BigQuery Connector Datastore Connector Connectors bdutil orchestration Connectors for Hadoop
  22. BigQuery Connector for Hadoop // Set the job-level projectId. conf.set(BigQueryConfiguration.PROJECT_ID_KEY,

    projectId); // Configure input parameters. BigQueryConfiguration.configureBigQueryInput(conf, ”publicdata:samples.shakespeare”); // Set InputFormat. job.setInputFormatClass(BigQueryInputFormat.class);
  23. BigQuery Connector for Hadoop // Configure output parameters String outSchema

    = "[{'name': 'Word','type': 'STRING'},{'name': 'Number','type': 'INTEGER'}]"; BigQueryConfiguration.configureBigQueryOutput( conf, ”gcp_samples.output”, outSchema); // Set OutputFormat. job.setOutputFormatClass(BigQueryOutputFormat.class);
  24. Demo

  25. BigQuery + Fluentd

  26. BigQuery Streaming Low cost: $0.01 per 100,000 rows Real time

    availability of data 100,000 rows per second x tables
  27. Slideshare uses Fluentd for collecting logs from >500 servers. "We

    take full advantage of its extendable plugin architecture and use it as a message bus that collects data from hundreds of servers into multiple backend systems." Sylvain Kalache, Operations Engineer
  28. Why Fluentd? Because it’s super easy to use, and has

    extensive plugins written by active community.
  29. Now Fluentd logs can be imported to BigQuery really easy,

    ~1M rows/s
  30. Demo

  31. BigQuery + User Defined Function

  32. User Defined Function with JavaScript BigQuery User Defined Function by

    JavaScript: - takes input JSON as arguments - emits output JSON as results JS in SQL - can be written in place of tables/subqueries Coming Soon!
  33. for (var i = 0; i < row.hits.length - 1;

    i++) { if (row.hits[i].page.pagepath.indexOf('github_nested') != -1 && row.hits[i + 1].page.pagepath.indexOf('github_timeline') != -1) { result.push({ visitid: row.visitid, duration: row.hits[i + 1].time - row.hits[i].time }); } } User Defined Function with JavaScript
  34. SELECT actor, repository_name, page_name FROM js( -- Base query. (SELECT

    actor, payload.pages.page_name,, type FROM [publicdata:samples.github_nested] WHERE payload.pages.page_name IS NOT NULL), -- Input fields from base query. actor, payload.pages.page_name,, type, -- Output schema of JS function. "[{name: 'actor', type: 'string'}, {name: 'repository_name', type: 'string'}, {name: 'page_name', type: 'string'}]", -- JS function. "function(r, emit) { var foundHome = false; var foundNotHome = null; for (var i = 0; i < r.payload.pages.length; i++) { if (r.payload.pages[i].page_name == 'Home') { foundHome = true; } else if (r.payload.pages[i].page_name) { foundNotHome = r.payload.pages[i].page_name; } } if (foundHome && foundNotHome) { emit({actor:, repository_name:, page_name: foundNotHome}); } }") ORDER BY actor, repository_name, page_name User Defined Function with JavaScript Search GitHub repository for updates that change a 'Home' page
  35. The World after MapReduce

  36. From: Wired

  37. BigQuery engine BigQuery Abuse detection User interactions Streaming Batch User

    engagement analytics Cloud Pub/Sub ACL ACL Topic 2 Business dashboard Data science tools Users Devs Data scientists Business App events ACL ACL Topic 1 Storage Services Cloud Storage Cloud Datastore Cloud SQL Open Source bdutil orchestration Connectors Cloud Dataflow
  38. Cloud Pub/Sub ACL ACL Topic 2 Consumer Consumer Consumer Publisher

    Publisher Publisher Publisher ACL ACL Topic 1 Cloud Pub/Sub Coming Soon!
  39. Abuse detection User interactions Cloud Dataflow Streaming Batch User engagement

    analytics Google Cloud Dataflow Optimize Schedule Cloud Dataflow Coming Soon!
  40. Cloud Dataflow = Managed Flume + MillWheel on GCE

  41. Prefix Suggestions #ar #argentina, #arugularocks, #argylesocks #arg #argentina, #argylesocks, #argonauts

    #arge #argentina, #argentum, #argentine Example: Auto completing hashtags
  42. {a->[apple, art, argentina], ar->[art, argentina, armenia],...} Count ExpandPrefixes Top(3) Write

    Read ExtractTags {a->(argentina, 5M), a->(armenia, 2M), …, ar-> (argentina, 5M), ar->(armenia, 2M), ...} {#argentina scores!, watching #armenia vs #argentina, my #art project, …} {argentina, armenia, argentina, art, ...} {argentina->5M, armenia->2M, art->90M, ...} Tweets Predictions
  43. Count ExpandPrefixes Top(3) Write Read ExtractTags Tweets Predictions Pipeline p

    = Pipeline.create(); p.begin(); .apply(ParDo.of(new ExtractTags())) .apply(Top.largestPerKey(3)) .apply(Count.create()) .apply(ParDo.of(new ExpandPrefixes()) .apply(“gs://…”)); .apply(TextIO.Read.from(“gs://…”))
  44. Automatically optimizes the execution plan DAG

  45. Let’s stream it! Google Cloud Pub/Sub for reads and writes.

    time #ar* rank game begins armenia wins! #argyle #armeniarocks Age out old data #argentinagoal
  46. Pipeline p = Pipeline.create(); p.begin() .apply(PubsubIO.Read.from(“input_topic”)) .apply(, MINUTES)) .apply(ParDo.of(new ExtractTags()))

    .apply(Count.create()) .apply(ParDo.of(new ExpandPrefixes()) .apply(Top.largestPerKey(3)) .apply(“output_topic”));; The same code can be used for stream processing
  47. Demo at YouTube

  48. Questions?

  49. Backup

  50. How to analyze big data in real-time?

  51. Lambda Architecture is: A complementary pair of: - in-memory real-time

    processing - large HDD/SSD batch processing Proposed by Nathan Marz ex. Twitter Summingbird Slow, but large and persistent. Fast, but small and volatile.
  52. Norikra: an open source Complex Event Processing (CEP) Production use

    at LINE, the largest asian SNS with 400M users, for massive log analysis
  53. Real-time analysis on streaming data with in-memory continuous query

  54. Lambda Arch by BQ+Norikra

  55. Demo