The New Standard for Mobile KPI Analytics: Fluentd + Google BigQuery #gcpライブ #gcpja



February 25, 2015


  1. Confidential and proprietary. The New Standard for Mobile KPI Analytics: Fluentd + Google BigQuery

    Kazunori Sato, Developer Advocate, Cloud Platform team #gcpライブ
  2. +Kazunori Sato @kazunori_279

    Developer Advocate, Cloud Platform, Google Inc
    - GCP developer community support
    - GCP product launch support
  3. Agenda

    - Big Data in Google and Google BigQuery
    - Why BigQuery is so fast?
    - Real-time Streaming Import by Fluentd + BigQuery
    - Real-time KPI analytics by Lambda Architecture
  4. Big Data in Google and Google BigQuery

  5. Big Data in Google

    100 hours/min, 100 petabytes, 500+ million users, 900+ million devices
  6. Cloud Technology Innovations

    [Timeline figure. Years marked: 2003, 2004, 2006, 2007, 2010, 2011, 2012, 2013. Technologies shown: GFS, MapReduce, Big Table, Paxos impl., Dremel, Colossus, Omega, Spanner/F1, Cloud Storage, BigQuery, Cloud Datastore]
  7. At Google, we have “big” big data everywhere

    What if a Googler is asked: “Can you give me the list of top 20 Android apps installed in 2012?”
  8. At Google, we don’t use MapReduce for this

    We use Dremel = Google BigQuery

    SELECT top(appId, 20) AS app, count(*) AS count FROM installlog.2012 ORDER BY count DESC

    It scans 100B rows in ~30 sec, with no index used.
  9. Google BigQuery: Massively Parallel Query Service

  10. Cost of BigQuery

    - Storage: $0.020 per GB per month
    - Queries: $5 per TB
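
The two listed prices are enough for a back-of-the-envelope estimate. The sketch below assumes decimal units (1 TB = 1,000 GB) and ignores free tiers or any pricing rule not shown on the slide; the function name and the sample workload are hypothetical.

```python
STORAGE_USD_PER_GB_MONTH = 0.020  # storage price from the slide
QUERY_USD_PER_TB = 5.0            # query price from the slide


def monthly_cost_usd(stored_gb: float, scanned_tb: float) -> float:
    """Estimate one month's cost from data stored and data scanned by queries."""
    return stored_gb * STORAGE_USD_PER_GB_MONTH + scanned_tb * QUERY_USD_PER_TB


# Hypothetical workload: 10 TB stored, 20 TB scanned by queries in a month.
estimate = monthly_cost_usd(stored_gb=10_000, scanned_tb=20)
print(f"${estimate:.2f}")  # prints $300.00
```
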
  11. Applications

    Gaming, Social, Mobile Ads, Digital Marketing, DMP, Media Monitoring, Alerting and Security, Retail, Internet of Things (IoT)
  12. Import, Analyze and Export

    [Diagram labels: BigQuery (Analytic Service in the Cloud); Import, Analyze, Export; Adwords, DoubleClick, Google Analytics; Event Logs, Databases, IoT Devices; BI Tools, R and Pandas, Microsoft Excel, Google Spreadsheet; Hadoop/Hive, Spark]
  13. BigQuery + BI

    Tableau Demo, BIME Demo

  14. Why BigQuery is so fast?

  15. Column Oriented Storage

    [Figure: Record Oriented Storage vs. Column Oriented Storage]

    Less bandwidth, more compression
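
To make "less bandwidth, more compression" concrete, here is a toy sketch in Python. The table contents, column names and helper functions are all hypothetical, and run-length encoding stands in for whatever compression BigQuery actually applies, which the slide does not specify.

```python
from itertools import groupby

# Record-oriented layout: one dict per record (hypothetical sample data).
rows = [
    {"title": "Tokyo", "language": "en", "num_characters": 1200},
    {"title": "Osaka", "language": "en", "num_characters": 800},
    {"title": "Kyoto", "language": "en", "num_characters": 950},
    {"title": "Nara", "language": "ja", "num_characters": 400},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def values_read_from_records(rows):
    """A record store reads whole records, even if the query needs one field."""
    return sum(len(row) for row in rows)


def values_read_from_columns(columns, wanted):
    """A column store reads only the columns the query references."""
    return sum(len(columns[name]) for name in wanted)


def run_length_encode(values):
    """Similar values sit next to each other in a column, so they compress well."""
    return [(value, len(list(group))) for value, group in groupby(values)]


print(values_read_from_records(rows))                 # 12 values touched
print(values_read_from_columns(columns, ["language"]))  # 4 values touched
print(run_length_encode(columns["language"]))         # [('en', 3), ('ja', 1)]
```
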
  16. Massively Parallel Processing

    select top(title), count(*) from publicdata:samples.wikipedia

    - Scanning 1 TB in 1 sec takes 5,000 disks
    - Each query runs on thousands of servers
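
The disk count on this slide implies a per-disk read rate, which the short calculation below works out. It assumes decimal units (1 TB = 10^12 bytes) and perfectly even parallelism; both are simplifications, and the helper names are hypothetical.

```python
import math

TB = 10**12  # decimal terabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)


def per_disk_mb_per_sec(total_bytes: int, seconds: float, disks: int) -> float:
    """Read rate each disk must sustain when the scan is spread evenly."""
    return total_bytes / seconds / disks / MB


def disks_needed(total_bytes: int, seconds: float, disk_mb_per_sec: float) -> int:
    """Inverse view: how many disks a scan needs at a given per-disk rate."""
    return math.ceil(total_bytes / seconds / (disk_mb_per_sec * MB))


rate = per_disk_mb_per_sec(1 * TB, 1, 5_000)
print(rate)                           # 200.0 MB/s per disk
print(disks_needed(1 * TB, 1, 200))   # 5000
```
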
  17. Fast aggregation by tree structure

    [Diagram: a tree of Mixer 0, two Mixer 1 nodes and four Shards, on top of ColumnIO on Colossus. Annotations: "SELECT state, year"; "COUNT(*) GROUP BY state WHERE year >= 1980 and year < 1990"; "ORDER BY count_babies DESC LIMIT 10"; "COUNT(*) GROUP BY state"]
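
The tree on this slide can be imitated in a few lines of Python. This is only a sketch of the idea: the assignment of work to each level (shards filter and count, intermediate mixers merge, the root orders and limits) is an assumption about how the diagram's annotations map to the tree, and the sample rows are made up.

```python
from collections import Counter


def shard_count(rows):
    """Leaf: filter by year and compute a partial COUNT(*) GROUP BY state."""
    return Counter(state for state, year in rows if 1980 <= year < 1990)


def mix(partials):
    """Mixer: merge partial counts coming from the level below."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Four shards, each holding (state, year) rows (hypothetical data).
shards = [
    [("CA", 1985), ("TX", 1981), ("CA", 1979)],
    [("CA", 1989), ("NY", 1990)],
    [("TX", 1980), ("CA", 1982)],
    [("NY", 1984), ("TX", 1995)],
]

# Two intermediate mixers, each merging two shards.
level1 = [
    mix(shard_count(s) for s in shards[:2]),
    mix(shard_count(s) for s in shards[2:]),
]

# Root mixer: final merge, then ORDER BY count DESC LIMIT 10.
top_states = mix(level1).most_common(10)
print(top_states)  # [('CA', 3), ('TX', 2), ('NY', 1)]
```

Because every level only exchanges small per-state counts, the amount of data moving up the tree stays small no matter how many rows the shards scan.
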
  18. Inside BQ: Big JOIN

    Big JOIN: executed with shuffling
    - Both tables can be > 8MB
    - BQ shuffler doesn’t sort; just hash partitioning

    From: Google BigQuery Analytics
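
A hash-partitioned join can be sketched as follows. This illustrates the general technique the slide names (partition both sides by a hash of the join key, then join each partition independently, with no sorting), not BigQuery's actual implementation; the function names, partition count and sample tables are hypothetical.

```python
from collections import defaultdict


def partition(rows, key, num_partitions):
    """Shuffle step: route each row to a partition by hashing its join key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts


def shuffle_join(left, right, key, num_partitions=4):
    """Join matching partitions independently; rows are never sorted."""
    joined = []
    left_parts = partition(left, key, num_partitions)
    right_parts = partition(right, key, num_partitions)
    for left_part, right_part in zip(left_parts, right_parts):
        # Build a hash index on one side, probe it with the other.
        index = defaultdict(list)
        for row in right_part:
            index[row[key]].append(row)
        for row in left_part:
            for match in index.get(row[key], []):
                joined.append({**row, **match})
    return joined


users = [
    {"user_id": 1, "name": "a"},
    {"user_id": 2, "name": "b"},
    {"user_id": 3, "name": "c"},
]
purchases = [
    {"user_id": 1, "item": "x"},
    {"user_id": 1, "item": "y"},
    {"user_id": 3, "item": "z"},
    {"user_id": 4, "item": "w"},
]

result = shuffle_join(users, purchases, "user_id")
print(sorted((r["user_id"], r["name"], r["item"]) for r in result))
```

Rows with the same key always land in the same partition, so each partition could be joined on a different worker.
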
  19. Real-time Streaming Import with Fluentd + BigQuery

  20. “I want a real-time dashboard for collecting the votes and system stats from 200 servers”
  21. BigQuery Streaming

    - Low cost: $0.01 per 100,000 rows
    - Real time availability of data
    - 100,000 rows per second x tables
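
As a worked example of the streaming price, the sketch below estimates a day of streaming for the 200-server scenario from the previous slide. The per-server rate of 10 rows per second is a made-up figure for illustration, and the function name is hypothetical.

```python
STREAMING_USD_PER_100K_ROWS = 0.01  # price from the slide
SECONDS_PER_DAY = 86_400


def daily_streaming_cost_usd(servers: int, rows_per_sec_per_server: float) -> float:
    """Cost of one day of streaming inserts at a constant row rate."""
    rows_per_day = servers * rows_per_sec_per_server * SECONDS_PER_DAY
    return rows_per_day / 100_000 * STREAMING_USD_PER_100K_ROWS


# 200 servers (from the deck), 10 rows/sec each (hypothetical rate).
cost = daily_streaming_cost_usd(servers=200, rows_per_sec_per_server=10)
print(f"${cost:.2f} per day")  # prints $17.28 per day
```
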
  22. Slideshare uses Fluentd for collecting logs from >500 servers.

    "We take full advantage of its extendable plugin architecture and use it as a message bus that collects data from hundreds of servers into multiple backend systems."
    Sylvain Kalache, Operations Engineer
  23. Why Fluentd?

    Because it’s super easy to use, and has extensive plugins written by an active community.
  24. Now Fluentd logs can be imported to BigQuery really easily, ~1M rows/s
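
For a sense of what a streaming import sends, the sketch below batches log records into request bodies shaped like those of BigQuery's `tabledata.insertAll` REST method (`rows`, each with an `insertId` and a `json` payload). It only builds the bodies and sends nothing. The batch size, the record fields and the function name are assumptions, and this is not a description of how any Fluentd plugin is implemented.

```python
import json
import uuid


def build_insert_all_bodies(records, batch_size=500):
    """Split records into batches and wrap each batch as an insertAll body."""
    bodies = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        bodies.append({
            "kind": "bigquery#tableDataInsertAllRequest",
            "rows": [
                # insertId lets the service de-duplicate retried rows.
                {"insertId": str(uuid.uuid4()), "json": record}
                for record in batch
            ],
        })
    return bodies


# Hypothetical access-log records.
records = [{"host": f"web{i % 200}", "status": 200} for i in range(1200)]
bodies = build_insert_all_bodies(records)
print([len(body["rows"]) for body in bodies])  # [500, 500, 200]
print(json.dumps(bodies[0]["rows"][0]["json"]))
```
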
  25. Search “fluentd bigquery” on GitHub

  26. IoT Example: RasPi > BigQuery > Spreadsheet

    [Figure: Google Spreadsheet]
  27. Real-time KPI Analytics with Lambda Architecture

  28. Lambda Architecture is:

    A complementary pair of:
    - in-memory real-time processing: fast, but small and volatile
    - large HDD/SSD batch processing: slow, but large and persistent

    Proposed by Nathan Marz
    ex. Twitter Summingbird
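
The complementary pair can be sketched as two views that are merged at query time. This is a deliberately simplified toy, assuming a single process and no events arriving while the batch job runs; the class and method names are hypothetical.

```python
from collections import Counter


class LambdaCounter:
    """Toy event counter with a batch view and a real-time view."""

    def __init__(self):
        self.master_log = []            # persistent, append-only event log
        self.batch_view = Counter()     # slow to rebuild, but large and durable
        self.realtime_view = Counter()  # fast, but small and volatile

    def ingest(self, event_key):
        """Every event goes to the log and to the in-memory real-time view."""
        self.master_log.append(event_key)
        self.realtime_view[event_key] += 1

    def run_batch(self):
        """Recompute the batch view from the full log, then reset the real-time view."""
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, event_key):
        """Answer = batch result + whatever arrived since the last batch run."""
        return self.batch_view[event_key] + self.realtime_view[event_key]


counter = LambdaCounter()
for key in ["vote", "vote", "purchase"]:
    counter.ingest(key)
counter.run_batch()
counter.ingest("vote")
print(counter.query("vote"))      # 3
print(counter.query("purchase"))  # 1
```
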
  29. Norikra: an open source stream processing tool

    - Production use at LINE, the largest Asian SNS with 500M users, for massive log analysis
    - Super easy to use: requires no heavyweight cluster setup
  30. Real-time KPI analysis with SQL-based in-memory continuous query
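
The idea behind an in-memory continuous query is that the answer is kept up to date over a moving time window as events arrive. The Python sketch below shows a sliding-window count; it is not Norikra's API or query language, it assumes timestamps arrive in order, and the class name is hypothetical.

```python
from collections import deque


class SlidingWindowCount:
    """Count events seen within the last `window_sec` seconds, in memory."""

    def __init__(self, window_sec: float):
        self.window_sec = window_sec
        self.timestamps = deque()  # event times, oldest first

    def add(self, timestamp: float) -> None:
        """Record one event (timestamps are assumed to be non-decreasing)."""
        self.timestamps.append(timestamp)

    def count(self, now: float) -> int:
        """Evict events that fell out of the window, then count the rest."""
        cutoff = now - self.window_sec
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)


# "How many purchases in the last 10 minutes?" with a 600-second window.
purchases = SlidingWindowCount(window_sec=600)
for t in [0, 100, 500, 700]:
    purchases.add(t)
print(purchases.count(now=700))   # 2 (events at 500 and 700)
print(purchases.count(now=1200))  # 1 (event at 700)
```
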
  31. Proposed Solution: Lambda Architecture

  32. Proposed Solution: Lambda Architecture

    1. Fluentd: event log collection from various event sources
    2. Norikra: easy, scalable real-time stream processing
    3. BigQuery: scalable query engine for large datasets
    4. Google Spreadsheet: flexible dashboard with charts
    5. Docker: repeatable deployment in 10 minutes
  33. Demo

  34. Applications

    Real-time KPI Dashboard
    • Gaming: How many new users have purchased their first item in the last 10 minutes?
    • Media: How many people hit the vote button during the live TV program?
    • Retail: What is the current total revenue of all stores nationwide?
    • Ads: What is the conversion rate of impressions/clicks to purchase?

    Real-time Monitoring and Alerting
    • Correlate system resource usage with access/application logs
    • Real-time DoS or cheating detection
    • Send e-mail notification from Apps Script triggered by Norikra
  35. Solution Benefits

    - Easy real-time SQL-based KPI analytics at 1M+ rows/sec by Norikra
    - Easy real-time streaming import at 1M+ rows/sec by BigQuery + Fluentd
    - Real-time dashboard with Google Spreadsheet
    - Deployable within 10 min with Docker

    Search “lambda dashboard” on GitHub