
BigQuery and the world after MapReduce


Kazunori Sato

July 09, 2014

Transcript

  1. +Kazunori Sato (@kazunori_279), Solutions Architect, Cloud Platform GBU, Google Inc.
    - GCP solutions design - Professional services for GCP - Docker/GCP meetups support
  2. At Google, we have “big” big data everywhere. What if a Googler is asked:
    “Can you give me the list of the top 20 Android apps installed in 2012?”
  3. At Google, we don’t use MapReduce for this. We use Dremel = Google BigQuery:
    SELECT TOP(appId, 20) AS app, COUNT(*) AS count FROM installlog.2012 ORDER BY count DESC
    It scans 68B rows in ~20 sec, with no index used.
  4. Inside BQ: ColumnIO on Colossus. Colossus File System is the next-generation
    Google File System. Chunk servers hold ColumnIO files on CFS, replicated to
    multiple DCs asynchronously, which mitigates tail latency and provides 99.9% availability.
  5. SELECT TOP(title), COUNT(*) FROM publicdata:samples.wikipedia
    Massively Parallel Processing: scanning 1 TB in 1 sec takes 5,000 disks.
    Each query runs on thousands of servers.
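The 5,000-disk figure is simple throughput arithmetic. A sketch, assuming roughly 200 MB/s of sequential read per disk (the per-disk rate is not stated on the slide):

```java
public class ScanMath {
    // Disks needed to scan totalBytes in the given number of seconds,
    // assuming each disk sustains bytesPerSecPerDisk of sequential read.
    public static long disksNeeded(long totalBytes, long bytesPerSecPerDisk, long seconds) {
        return totalBytes / (bytesPerSecPerDisk * seconds);
    }

    public static void main(String[] args) {
        long oneTB = 1_000_000_000_000L;
        long diskThroughput = 200_000_000L; // ~200 MB/s per disk (assumption)
        System.out.println(disksNeeded(oneTB, diskThroughput, 1)); // 5000
    }
}
```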
  6. Fast aggregation by tree structure: Mixer 0 at the root, Mixer 1 nodes below it,
    and Shards at the leaves reading ColumnIO on Colossus. Example query:
    SELECT state, COUNT(*) AS count_babies WHERE year >= 1980 AND year < 1990
    GROUP BY state ORDER BY count_babies DESC LIMIT 10
    Each shard runs the partial COUNT(*) ... GROUP BY state aggregation.
  7. Inside BQ: Query Processing. 1. Mixer receives a query: it interprets the query,
    maps tables to CFS files, and sends requests to Shards. 2. Shards process the
    partial queries: they open the ColumnIO files, read the table, and apply WHERE
    and GROUP BY. 3. Mixer aggregates the results and applies ORDER BY and LIMIT.
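The scatter-gather flow above can be sketched in plain Java. This is illustrative only, not BigQuery code; shardCount and mixerMerge are hypothetical helpers standing in for the shard and mixer steps:

```java
import java.util.*;
import java.util.stream.*;

public class TreeAggregation {
    // Shard step: partial COUNT(*) GROUP BY key over this shard's rows
    // (WHERE is assumed to have been applied already).
    public static Map<String, Long> shardCount(List<String> rows) {
        return rows.stream().collect(Collectors.groupingBy(r -> r, Collectors.counting()));
    }

    // Mixer step: merge the shards' partial counts, then ORDER BY count DESC LIMIT n.
    public static List<Map.Entry<String, Long>> mixerMerge(
            List<Map<String, Long>> partials, int limit) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> p : partials)
            p.forEach((k, v) -> merged.merge(k, v, Long::sum));
        return merged.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> shard1 = shardCount(Arrays.asList("CA", "CA", "NY"));
        Map<String, Long> shard2 = shardCount(Arrays.asList("CA", "TX"));
        // The mixer sees only small partial aggregates, never the raw rows.
        System.out.println(mixerMerge(Arrays.asList(shard1, shard2), 2));
    }
}
```

The point of the tree is that only compact partial aggregates travel up to the mixers, so the heavy scanning stays parallel at the leaves.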
  8. Inside BQ: Small JOIN
    SELECT wiki.title FROM [publicdata:samples.wikipedia] AS wiki
    JOIN (SELECT word FROM [publicdata:samples.shakespeare] GROUP BY word) AS shakes
    ON wiki.title = shakes.word;
  9. Inside BQ: Small JOIN. A small JOIN is executed as a broadcast JOIN: one table
    must be < 8 MB, and it is sent to every shard. (From: Google BigQuery Analytics)
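A minimal sketch of the broadcast-join idea (not BigQuery's actual implementation): the small table is turned into an in-memory hash set and shipped to every shard, and each shard probes it with its own rows.

```java
import java.util.*;

public class BroadcastJoin {
    // Each shard receives a full copy of the small table (the broadcast)
    // and joins its local slice of the big table against it.
    public static List<String> shardJoin(List<String> shardRows, Set<String> smallTable) {
        List<String> joined = new ArrayList<>();
        for (String key : shardRows)
            if (smallTable.contains(key)) joined.add(key);
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical data: shard rows are wikipedia titles,
        // the broadcast set is distinct Shakespeare words.
        Set<String> shakespeareWords = new HashSet<>(Arrays.asList("king", "love"));
        System.out.println(shardJoin(Arrays.asList("king", "Linux", "love"), shakespeareWords));
    }
}
```

This is why the 8 MB bound matters: every shard must hold the whole small table in memory.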
  10. Inside BQ: Big JOIN
    SELECT wiki.title FROM [publicdata:samples.wikipedia] AS wiki
    JOIN EACH (SELECT word FROM [publicdata:samples.shakespeare] GROUP EACH BY word) AS shakes
    ON wiki.title = shakes.word;
  11. Inside BQ: Big JOIN. A big JOIN is executed with shuffling: both tables can be
    > 8 MB, and the BQ shuffler doesn’t sort; it just hash-partitions.
    (From: Google BigQuery Analytics)
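The shuffle can be sketched as plain hash partitioning (illustrative only): rows from both tables are routed by the hash of the join key, so matching keys always land on the same shard, which can then join them locally.

```java
import java.util.*;

public class ShuffleJoin {
    // Route each join key to one of numShards partitions by hash.
    // No sorting is involved, matching the slide's point about the BQ shuffler.
    public static List<List<String>> partition(List<String> keys, int numShards) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) shards.add(new ArrayList<>());
        for (String key : keys)
            shards.get(Math.floorMod(key.hashCode(), numShards)).add(key);
        return shards;
    }

    public static void main(String[] args) {
        // Hypothetical join keys from two large tables.
        List<List<String>> left = partition(Arrays.asList("argentina", "art"), 4);
        List<List<String>> right = partition(Arrays.asList("art", "armenia"), 4);
        // "art" from both tables lands at the same partition index,
        // so one shard can join the two sides without seeing other keys.
        System.out.println(left + " / " + right);
    }
}
```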
  12. Inside BQ: Big JOIN performance. Joining 608M records x 38M records takes 60-80 sec:
    SELECT COUNT(*) FROM [bigquery-samples:wikimedia_pageviews.200801] AS a
    JOIN EACH [bigquery-samples:wikimedia_pageviews.200802] AS b
    ON a.title = b.title WHERE b.language = "ja"
    (From: Google BigQuery Analytics)
  13. Connectors for Hadoop: a Master Node (with an optional Name Node) and Worker
    Nodes, each with Local SSD, PD SSD, or PD standard disks and optional HDFS;
    the GCS Connector, BigQuery Connector, and Datastore Connector; and bdutil
    orchestration.
  14. BigQuery Connector for Hadoop:
    // Set the job-level projectId.
    conf.set(BigQueryConfiguration.PROJECT_ID_KEY, projectId);
    // Configure input parameters.
    BigQueryConfiguration.configureBigQueryInput(conf, "publicdata:samples.shakespeare");
    // Set InputFormat.
    job.setInputFormatClass(BigQueryInputFormat.class);
  15. BigQuery Connector for Hadoop:
    // Configure output parameters.
    String outSchema = "[{'name': 'Word', 'type': 'STRING'}, {'name': 'Number', 'type': 'INTEGER'}]";
    BigQueryConfiguration.configureBigQueryOutput(conf, "gcp_samples.output", outSchema);
    // Set OutputFormat.
    job.setOutputFormatClass(BigQueryOutputFormat.class);
  16. BigQuery Streaming: low cost ($0.01 per 100,000 rows), real-time availability
    of data, and a limit of 100,000 rows per second per table.
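A hedged sketch of what a streaming insert looks like on the wire: the BigQuery v2 API's tabledata.insertAll method takes a JSON body with a "rows" array, each row carrying an "insertId" (used for best-effort deduplication) and a "json" record. The word/count fields here are invented for illustration, and the HTTP endpoint and auth plumbing are omitted.

```java
public class StreamingInsert {
    // Build the request body for a tabledata.insertAll call.
    // Field names "rows", "insertId", and "json" follow the BigQuery v2 API;
    // the record fields (word, count) are a hypothetical table schema.
    public static String insertAllBody(String insertId, String word, long count) {
        return "{\"rows\": [{\"insertId\": \"" + insertId + "\", "
             + "\"json\": {\"word\": \"" + word + "\", \"count\": " + count + "}}]}";
    }

    public static void main(String[] args) {
        System.out.println(insertAllBody("row-1", "argentina", 5));
    }
}
```

In practice you would POST this body to the table's insertAll endpoint via an authenticated client rather than building JSON by hand.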
  17. SlideShare uses Fluentd for collecting logs from >500 servers. "We take full
    advantage of its extendable plugin architecture and use it as a message bus that
    collects data from hundreds of servers into multiple backend systems."
    (Sylvain Kalache, Operations Engineer)
  18. Why Fluentd? Because it’s super easy to use and has extensive plugins written
    by an active community.
  19. User Defined Function with JavaScript. A BigQuery user-defined function in
    JavaScript takes input JSON as arguments and emits output JSON as results.
    JS in SQL: a UDF can be written in place of tables/subqueries. Coming soon!
  20. User Defined Function with JavaScript:
    for (var i = 0; i < row.hits.length - 1; i++) {
      if (row.hits[i].page.pagepath.indexOf('github_nested') != -1 &&
          row.hits[i + 1].page.pagepath.indexOf('github_timeline') != -1) {
        result.push({
          visitid: row.visitid,
          duration: row.hits[i + 1].time - row.hits[i].time
        });
      }
    }
  21. User Defined Function with JavaScript: search GitHub repositories for updates
    that change a 'Home' page.
    SELECT actor, repository_name, page_name
    FROM js(
      -- Base query.
      (SELECT actor, payload.pages.page_name, repository.name, type
       FROM [publicdata:samples.github_nested]
       WHERE payload.pages.page_name IS NOT NULL),
      -- Input fields from base query.
      actor, payload.pages.page_name, repository.name, type,
      -- Output schema of JS function.
      "[{name: 'actor', type: 'string'},
        {name: 'repository_name', type: 'string'},
        {name: 'page_name', type: 'string'}]",
      -- JS function.
      "function(r, emit) {
         var foundHome = false;
         var foundNotHome = null;
         for (var i = 0; i < r.payload.pages.length; i++) {
           if (r.payload.pages[i].page_name == 'Home') {
             foundHome = true;
           } else if (r.payload.pages[i].page_name) {
             foundNotHome = r.payload.pages[i].page_name;
           }
         }
         if (foundHome && foundNotHome) {
           emit({actor: r.actor, repository_name: r.repository.name, page_name: foundNotHome});
         }
       }")
    ORDER BY actor, repository_name, page_name
  22. End-to-end on GCP: app events from users and devs flow into Cloud Pub/Sub
    (Topic 1 and Topic 2, each with ACLs), through Cloud Dataflow (streaming and
    batch) into the BigQuery engine, alongside storage services (Cloud Storage,
    Cloud Datastore, Cloud SQL) and open-source connectors with bdutil orchestration.
    Use cases: abuse detection, user interactions, user engagement analytics,
    business dashboards, and data science tools for data scientists and business users.
  23. Cloud Pub/Sub: publishers send messages to topics (Topic 1 and Topic 2, each
    with ACLs) and consumers receive them. Coming soon!
  24. Google Cloud Dataflow: streaming and batch processing (e.g. abuse detection,
    user interactions, user engagement analytics) with automatic optimization and
    scheduling. Coming soon!
  25. Example: auto-completing hashtags.
    Prefix -> Suggestions
    #ar   -> #argentina, #arugularocks, #argylesocks
    #arg  -> #argentina, #argylesocks, #argonauts
    #arge -> #argentina, #argentum, #argentine
  26. Read -> ExtractTags -> Count -> ExpandPrefixes -> Top(3) -> Write
    Tweets: {#argentina scores!, watching #armenia vs #argentina, my #art project, …}
    ExtractTags: {argentina, armenia, argentina, art, ...}
    Count: {argentina->5M, armenia->2M, art->90M, ...}
    ExpandPrefixes: {a->(argentina, 5M), a->(armenia, 2M), …, ar->(argentina, 5M), ar->(armenia, 2M), ...}
    Top(3): {a->[apple, art, argentina], ar->[art, argentina, armenia], ...}
    Write: Predictions
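The ExpandPrefixes and Top(3) stages above can be sketched in plain Java (a hypothetical suggest helper, not Dataflow code): every counted tag is expanded into all of its prefixes, then the top-N tags are kept per prefix.

```java
import java.util.*;
import java.util.stream.*;

public class PrefixSuggest {
    // ExpandPrefixes: emit (prefix -> tag, count) for every prefix of each tag.
    // Top(N): keep the N highest-count tags per prefix.
    public static Map<String, List<String>> suggest(Map<String, Long> tagCounts, int topN) {
        Map<String, Map<String, Long>> byPrefix = new HashMap<>();
        tagCounts.forEach((tag, count) -> {
            for (int i = 1; i <= tag.length(); i++)
                byPrefix.computeIfAbsent(tag.substring(0, i), p -> new HashMap<>())
                        .put(tag, count);
        });
        Map<String, List<String>> suggestions = new TreeMap<>();
        byPrefix.forEach((prefix, tags) -> suggestions.put(prefix, tags.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList())));
        return suggestions;
    }

    public static void main(String[] args) {
        // Counts taken from the slide's example data.
        Map<String, Long> counts = new HashMap<>();
        counts.put("argentina", 5_000_000L);
        counts.put("armenia", 2_000_000L);
        counts.put("art", 90_000_000L);
        System.out.println(suggest(counts, 3).get("ar")); // [art, argentina, armenia]
    }
}
```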
  27. Read -> ExtractTags -> Count -> ExpandPrefixes -> Top(3) -> Write, as a Dataflow pipeline:
    Pipeline p = Pipeline.create();
    p.begin()
      .apply(TextIO.Read.from("gs://…"))
      .apply(ParDo.of(new ExtractTags()))
      .apply(Count.create())
      .apply(ParDo.of(new ExpandPrefixes()))
      .apply(Top.largestPerKey(3))
      .apply(TextIO.Write.to("gs://…"));
    p.run();
  28. Let’s stream it! Google Cloud Pub/Sub for reads and writes.
    (Slide shows the top #ar* rankings evolving over time: game begins,
    #argentinagoal, armenia wins!, #argyle, #armeniarocks; old data is aged out.)
  29. The same code can be used for stream processing:
    Pipeline p = Pipeline.create();
    p.begin()
      .apply(PubsubIO.Read.from("input_topic"))
      .apply(Bucket.by(SlidingWindows.of(60, MINUTES)))
      .apply(ParDo.of(new ExtractTags()))
      .apply(Count.create())
      .apply(ParDo.of(new ExpandPrefixes()))
      .apply(Top.largestPerKey(3))
      .apply(PubsubIO.Write.to("output_topic"));
    p.run();
  30. Lambda Architecture is a complementary pair of in-memory real-time processing
    (fast, but small and volatile) and large HDD/SSD batch processing (slow, but
    large and persistent). Proposed by Nathan Marz; e.g. Twitter’s Summingbird.
  31. Norikra: an open-source Complex Event Processing (CEP) engine, in production
    use at LINE, the largest Asian SNS with 400M users, for massive log analysis.