Building a Lambda Architecture in 10 minutes with BigQuery, CEP and Docker

Kazunori Sato

July 09, 2014

Transcript

  1. None
  2. Building a Lambda Architecture in 10 minutes with BigQuery, CEP and Docker
  3. +Kazunori Sato @kazunori_279 Solutions Architect, Cloud Platform GBU, Google Inc

    - GCP solutions design
    - Professional services for GCP
    - Docker/GCP meetups support
  4. The Problem: Analyze big data in real-time

  5. “I want a real-time dashboard for my 200 web servers.”

    - a customer with 200 Google Compute Engine instances
  6. The Solution

  7. The Solution: Lambda Architecture

    Diagram labels: BigQuery; BQ Stream + Fluentd; Norikra CEP; Lambda Arch
  8. How to analyze big data?

  9. How to analyze big data?

    Diagram labels: BigQuery; BQ Stream + Fluentd; Norikra CEP; Lambda Arch
  10. At Google, we have “big” big data everywhere

    What if a Googler is asked: “Can you give me the list of top 20 Android apps installed in 2012?”
  11. At Google, we run SQL on Dremel (= Google BigQuery)

    SELECT top(appId, 20) AS app, count(*) AS count FROM installlog.2012 ORDER BY count DESC;

    It scans 68B rows in ~30 sec. No index used.
  12. Column Oriented Storage

    Diagram: Record Oriented Storage vs. Column Oriented Storage

    Less bandwidth, more compression
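
The "less bandwidth" point can be illustrated with a toy sketch. The rows, field names, and the "values read" cost measure below are made up for illustration and are not how Dremel stores data; the sketch only shows why a query that touches one field reads less from a column-oriented layout.

```python
# Toy comparison of record-oriented and column-oriented layouts.
# The rows and field names are hypothetical, chosen only for illustration.
rows = [
    {"appId": "app.a", "country": "JP", "year": 2012},
    {"appId": "app.b", "country": "US", "year": 2012},
    {"appId": "app.a", "country": "US", "year": 2012},
]

# Record-oriented: every record keeps all of its fields together.
record_store = [tuple(r.values()) for r in rows]

# Column-oriented: every field is kept as its own contiguous list.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def values_read_record(store):
    """Reading one field still pulls in every value of every record."""
    return sum(len(rec) for rec in store)


def values_read_column(store, name):
    """Reading one field touches only that field's column."""
    return len(store[name])


record_cost = values_read_record(record_store)
column_cost = values_read_column(column_store, "appId")
print(record_cost, column_cost)
```

With three fields per row, the record layout reads three times as many values as the column layout for a single-field query. Values within one column also tend to resemble each other, which is where the "more compression" claim comes from.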
  13. Massively Parallel Processing

    select top(title), count(*) from publicdata:samples.wikipedia

    - Scanning 1 TB in 1 sec takes 5,000 disks
    - Each query runs on thousands of servers
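
The "5,000 disks" figure implies a per-disk read rate that can be checked with simple arithmetic. The sketch below assumes a decimal terabyte (10^12 bytes) and an even spread of data across disks; both are assumptions, not statements from the slide.

```python
TB = 10**12  # assumption: decimal terabyte
MB = 10**6


def per_disk_throughput_mb_s(total_bytes, seconds, disks):
    """Sequential read rate each disk must sustain, in MB/s."""
    return total_bytes / seconds / disks / MB


per_disk = per_disk_throughput_mb_s(TB, seconds=1, disks=5000)
print(per_disk)
```

Under those assumptions each disk needs to deliver about 200 MB/s, which is why a one-second scan has to be spread over thousands of machines.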
  14. Fast aggregation by tree structure

    Diagram: Mixer 0, Mixer 1, Mixer 1, Leaf, Leaf, Leaf, Leaf, Distributed Storage

    Query fragments shown on the diagram: SELECT state, year; COUNT(*) GROUP BY state; WHERE year >= 1980 and year < 1990; ORDER BY count_babies DESC LIMIT 10; COUNT(*) GROUP BY state
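
The tree aggregation on this slide can be sketched in a few lines. This is a minimal illustration, not Dremel's implementation: the shard contents and the function names `leaf_count` and `mixer_merge` are hypothetical, and the assignment of work to levels (filter and partial count at the leaves, merge at the mixers, sort and limit at the root) is an assumption based on the query fragments in the diagram.

```python
from collections import Counter


def leaf_count(rows):
    """Leaf: apply the WHERE filter and count rows per state for one shard."""
    return Counter(state for state, year in rows if 1980 <= year < 1990)


def mixer_merge(partials):
    """Mixer: add up the partial per-state counts from its children."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical shards of (state, year) rows, one shard per leaf.
shards = [
    [("CA", 1985), ("TX", 1981), ("CA", 1995)],
    [("CA", 1989), ("NY", 1980)],
    [("TX", 1979), ("TX", 1983), ("NY", 1990)],
    [("CA", 1982)],
]

leaf_results = [leaf_count(shard) for shard in shards]
# Two intermediate mixers, each merging two leaves.
mixer1_results = [mixer_merge(leaf_results[:2]), mixer_merge(leaf_results[2:])]
# Root mixer merges, then applies ORDER BY ... DESC LIMIT 10.
top_states = mixer_merge(mixer1_results).most_common(10)
print(top_states)
```

Because counts are additive, each level only passes small per-state totals upward instead of raw rows.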
  15. How to collect big data?

  16. How to collect big data?

    Diagram labels: BigQuery; BQ Stream + Fluentd; Norikra CEP; Lambda Arch
  17. BigQuery Streaming

    - Low cost: $0.01 per 100,000 rows
    - Real time availability of data
    - 100,000 rows per second x tables
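
The streaming price on this slide can be turned into a rough daily cost. The sketch below uses only the slide's 2014 figure of $0.01 per 100,000 rows; the sustained ingest rate of 100,000 rows per second for a full day is an illustrative assumption, and current pricing may differ.

```python
PRICE_PER_100K_ROWS_USD = 0.01  # figure quoted on the slide (2014)
SECONDS_PER_DAY = 86_400


def streaming_cost_usd(rows):
    """Streaming insert cost for a given number of rows, rounded to cents."""
    return round(rows / 100_000 * PRICE_PER_100K_ROWS_USD, 2)


# Assumption: one table ingesting at the slide's 100,000 rows/s all day.
rows_per_day = 100_000 * SECONDS_PER_DAY
daily_cost = streaming_cost_usd(rows_per_day)
print(daily_cost)
```

At that rate the day's 8.64 billion rows come to roughly $864.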
  18. Slideshare uses Fluentd for collecting logs from >500 servers.

    "We take full advantage of its extendable plugin architecture and use it as a message bus that collects data from hundreds of servers into multiple backend systems." Sylvain Kalache, Operations Engineer
  19. Why Fluentd? Because it’s super easy to use, and has an extensive set of plugins written by an active community.
  20. Now Fluentd logs can be imported to BigQuery really easily

  21. How to analyze in real-time?

  22. How to analyze in real-time?

    Diagram labels: BigQuery; BQ Stream + Fluentd; Norikra CEP; Lambda Arch
  23. Norikra: an open source Complex Event Processing (CEP)

    Production use at LINE, the largest Asian SNS with 400M users, for massive log analysis
  24. Real-time analysis on streaming data with in-memory continuous query
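
The idea of an in-memory continuous query can be sketched as a sliding time window that is updated as each event arrives. This is a generic illustration, not Norikra's API or query language: the class name `SlidingWindowCounter` is hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds` seconds, in memory."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, ts):
        """Record one event; assumes timestamps are non-decreasing."""
        self.timestamps.append(ts)
        self._expire(ts)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()

    def count(self, now):
        """Current answer of the continuous query at time `now`."""
        self._expire(now)
        return len(self.timestamps)


counter = SlidingWindowCounter(window_seconds=10)
for ts in [0, 3, 7, 12, 14]:
    counter.add(ts)
current = counter.count(now=15)
print(current)
```

Only the events inside the window are kept, which is what makes this approach fast but small and volatile compared with a batch store.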

  25. How to analyze big data in real-time?

  26. How to analyze big data in real-time?

    Diagram labels: BigQuery; BQ Stream + Fluentd; Norikra CEP; Lambda Arch
  27. Lambda Architecture is:

    A complementary pair of:
    - in-memory real-time processing: fast, but small and volatile.
    - large HDD/SSD batch processing: slow, but large and persistent.

    Proposed by Nathan Marz. ex. Twitter Summingbird
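
One common way to use the complementary pair is to combine a batch view with a real-time view when serving a query. The sketch below is a generic illustration of that pattern, not the deck's implementation (the deck shows the two views in a spreadsheet dashboard). The function name `merged_view`, the keys, and the counts are hypothetical, and it assumes the batch view covers events up to some cutoff while the real-time view covers only events after it.

```python
def merged_view(batch_view, realtime_view):
    """Add recent in-memory counts on top of the persistent batch counts."""
    merged = dict(batch_view)
    for key, recent in realtime_view.items():
        merged[key] = merged.get(key, 0) + recent
    return merged


# Hypothetical page-hit counts.
batch_view = {"/index": 1_000_000, "/login": 250_000}  # slow, large, persistent
realtime_view = {"/index": 42, "/signup": 7}            # fast, small, volatile

dashboard = merged_view(batch_view, realtime_view)
print(dashboard)
```

If the real-time layer loses its state, the batch layer still holds the full history, so the merged answer degrades only for the most recent interval.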
  28. A Recipe for a Lambda Architecture in 10 minutes

    1. Fluentd: event log collection from various event sources
    2. Norikra: scalable real time Complex Event Processing (CEP)
    3. BigQuery: scalable query engine for large datasets
    4. Google Spreadsheet: flexible dashboard with a variety of charts
    5. Docker: repeatable deployment in 10 minutes
  29. Lambda Arch by BQ+Norikra

  30. Google Spreadsheet as a dashboard for real-time and big data views
  31. Everything Dockerized

  32. None
  33. Demo

  34. Applications

    Real-time KPI Dashboard
    - Gaming: How many new users have purchased their first item in the last 10 minutes?
    - Media: How many people hit the vote button during the live TV program?
    - Retail: What is the current total revenue of all stores nationwide?
    - Ads: What is the conversion rate of impressions/clicks to purchase?

    Real-time Monitoring and Alerting
    - Correlate system resource usage with access/application logs
    - Real-time DoS or cheating detection
    - Send e-mail notification from Apps Script triggered by CEP query
  35. Summary

  36. Solution Benefits

    - Real-time analytics by Norikra CEP with 10 sec latency
    - Big data collection and analytics by BigQuery + Fluentd at ~1M rows/s
    - Real-time dashboard with Google Spreadsheet
    - Deployable within 10 min with Docker

    Available on GitHub: GoogleCloudPlatform/lambda-dashboard
  37. Questions?

  38. None