The New Standard for Mobile KPI Analytics: Fluentd + Google BigQuery #gcpライブ #gcpja

GoogleCloudPlatformJapan

February 25, 2015

Transcript

  1. Confidential and proprietary. The New Standard for Mobile KPI Analytics: Fluentd + Google BigQuery

    Kazunori Sato, Developer Advocate, Cloud Platform team #gcpライブ
  2. +Kazunori Sato @kazunori_279

    Developer Advocate, Cloud Platform, Google Inc
    - GCP developer community support
    - GCP product launch support
  3. Agenda

    - Big Data at Google and Google BigQuery
    - Why is BigQuery so fast?
    - Real-time Streaming Import by Fluentd + BigQuery
    - Real-time KPI analytics by Lambda Architecture
  4. Cloud Technology Innovations

    [Timeline figure. Years shown: 2003, 2004, 2006, 2007, 2010, 2011, 2012, 2013. Technologies shown: GFS, MapReduce, Bigtable, Paxos impl., Colossus, Dremel, Spanner/F1, Omega, Cloud Storage, BigQuery, Cloud Datastore.]
  5. At Google, we have “big” big data everywhere

    What if a Googler is asked: “Can you give me the list of top 20 Android apps installed in 2012?”
  6. At Google, we don’t use MapReduce for this

    We use Dremel = Google BigQuery

    SELECT top(appId, 20) AS app, count(*) AS count FROM installlog.2012 ORDER BY count DESC

    It scans 100B rows in ~30 sec, with no index used.
  7. Applications

    Gaming, Social, Mobile Ads, Digital Marketing, DMP, Media Monitoring, Alerting and Security; Retail; Internet of Things (IoT)
  8. BigQuery: Analytic Service in the Cloud

    Import, Analyze and Export

    [Diagram. Flows shown around BigQuery: Import, Analyze, Export. Tools and data sources shown: R and Pandas, Microsoft Excel, Google Spreadsheet, Hadoop/Hive, Spark, AdWords, DoubleClick, Google Analytics, Event Logs, Databases, IoT Devices, BI Tools.]
  9. Massively Parallel Processing

    select top(title), count(*) from publicdata:samples.wikipedia

    - Scanning 1 TB in 1 sec takes 5,000 disks
    - Each query runs on thousands of servers
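
The disk count on slide 9 follows from simple throughput arithmetic. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and perfectly even parallel reads; the function name is illustrative and not from the deck.

```python
def per_disk_throughput_mb_s(total_bytes: int, seconds: float, disks: int) -> float:
    """Throughput each disk must sustain, in MB/s, for a fully parallel scan."""
    return total_bytes / seconds / disks / 1e6

TB = 10**12  # assumption: decimal terabyte
required = per_disk_throughput_mb_s(TB, seconds=1.0, disks=5_000)
print(f"{required:.0f} MB/s per disk")
```

Under these assumptions each disk has to deliver 200 MB/s, which is why a one-second scan of 1 TB has to be spread over thousands of disks.
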
  10. Fast aggregation by tree structure

    [Diagram. Tree levels: Mixer 0 (root), Mixer 1 × 2, Shard × 4, ColumnIO on Colossus. Query fragments shown on the tree: SELECT state, year; COUNT(*) GROUP BY state (shown twice); WHERE year >= 1980 and year < 1990; ORDER BY count_babies DESC LIMIT 10.]
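
The tree on slide 10 can be sketched roughly as follows: each shard filters its own rows and computes a partial count per state, and each mixer only sums the partial counts of its children. This is an illustrative toy, not BigQuery's implementation; the row shape, sample data, and function names are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (state, year)

def shard_scan(rows: Iterable[Row], lo: int, hi: int) -> Counter:
    """Leaf: apply the WHERE filter locally, then a partial COUNT(*) per state."""
    return Counter(state for state, year in rows if lo <= year < hi)

def mixer_merge(partials: Iterable[Counter]) -> Counter:
    """Mixer: sum the partial counts received from its children."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Four shards of made-up rows.
shards: List[List[Row]] = [
    [("CA", 1981), ("TX", 1985), ("CA", 1979)],
    [("CA", 1989), ("NY", 1983)],
    [("TX", 1982), ("TX", 1990)],
    [("NY", 1984), ("CA", 1986)],
]

# Two intermediate mixers, each over two shards, then the root mixer.
mixer1_a = mixer_merge(shard_scan(s, 1980, 1990) for s in shards[:2])
mixer1_b = mixer_merge(shard_scan(s, 1980, 1990) for s in shards[2:])
root = mixer_merge([mixer1_a, mixer1_b])

top10 = root.most_common(10)  # ORDER BY count DESC LIMIT 10 at the root
print(top10)
```

Because only small per-state counters travel up the tree, the root never sees raw rows, which is the point of aggregating in a tree.
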
  11. Inside BQ: Big JOIN

    Big JOIN: executed with shuffling
    - Both tables can be > 8MB
    - BQ shuffler doesn’t sort; just hash partitioning

    From: Google BigQuery Analytics
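
The idea of a shuffle that hash-partitions without sorting can be sketched as below. This is a toy single-process illustration of the general technique, not BigQuery's shuffler; the table contents, partition count, and function names are assumptions.

```python
from collections import defaultdict

def partition(rows, n):
    """Shuffle step: route each (key, value) row to a bucket by hash of its key. No sorting."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[hash(row[0]) % n].append(row)
    return buckets

def local_hash_join(left, right):
    """Join one pair of co-partitioned buckets using an in-memory hash table."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lval, rval) for key, lval in left for rval in index.get(key, [])]

def shuffle_join(left, right, n=4):
    """Rows with equal keys land in the same bucket number on both sides."""
    out = []
    for lbucket, rbucket in zip(partition(left, n), partition(right, n)):
        out.extend(local_hash_join(lbucket, rbucket))
    return out

users = [(1, "alice"), (2, "bob"), (3, "carol")]              # made-up data
purchases = [(1, "sword"), (3, "shield"), (3, "potion"), (4, "map")]
joined = sorted(shuffle_join(users, purchases))
print(joined)
```

Each bucket pair could be joined on a different worker, so neither table has to fit on one machine.
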
  12. BigQuery Streaming

    - Low cost: $0.01 per 100,000 rows
    - Real time availability of data
    - 100,000 rows per second x tables
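
To put the streaming price on slide 12 in perspective, here is a small worked calculation. It assumes the price quoted on the slide and a single table streamed at the full 100,000 rows per second all day; the function and constant names are illustrative.

```python
PRICE_PER_100K_ROWS_USD = 0.01  # figure quoted on the slide

def streaming_cost_usd(rows: int) -> float:
    """Cost of streaming `rows` rows at the quoted per-100,000-row price."""
    return rows / 100_000 * PRICE_PER_100K_ROWS_USD

# Assumption: one table, 100,000 rows/sec sustained for 24 hours.
rows_per_day = 100_000 * 60 * 60 * 24
daily_cost = streaming_cost_usd(rows_per_day)
print(f"{rows_per_day:,} rows/day -> ${daily_cost:,.2f}/day")
```

At that sustained rate the quoted price works out to about $864 per day per table; real workloads with lower average rates would cost proportionally less.
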
  13. SlideShare uses Fluentd for collecting logs from >500 servers.

    "We take full advantage of its extendable plugin architecture and use it as a message bus that collects data from hundreds of servers into multiple backend systems."
    Sylvain Kalache, Operations Engineer
  14. Why Fluentd?

    Because it’s super easy to use and has extensive plugins written by an active community.
  15. Lambda Architecture is:

    A complementary pair of:
    - in-memory real-time processing: fast, but small and volatile.
    - large HDD/SSD batch processing: slow, but large and persistent.

    Proposed by Nathan Marz. Example: Twitter Summingbird
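
The complementary pair on slide 15 can be illustrated with a toy counter that keeps a persistent log for batch recomputation and a small in-memory view for recent events, and merges the two at query time. This is a simplified sketch of the general pattern, not the design of any specific system; all names are hypothetical.

```python
from collections import Counter

class LambdaCounter:
    """Toy Lambda Architecture: a persistent batch view plus a volatile real-time view."""

    def __init__(self):
        self.master_log = []         # batch layer input: append-only, large, persistent
        self.batch_view = Counter()  # slow to rebuild, but covers all history
        self.speed_view = Counter()  # fast, in-memory, only events since the last batch

    def ingest(self, key):
        """Every event goes to both layers."""
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from the full log, then reset the speed view."""
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, key):
        """Serving step: combine the batch result with the real-time delta."""
        return self.batch_view[key] + self.speed_view[key]

kpi = LambdaCounter()
for event in ("purchase", "purchase", "login"):
    kpi.ingest(event)
before_batch = kpi.query("purchase")  # answered from the speed view only
kpi.run_batch()
kpi.ingest("purchase")
after_batch = kpi.query("purchase")   # batch view plus one recent event
print(before_batch, after_batch)
```

If the in-memory view is lost, the counts can still be rebuilt from the log, which is the trade-off the slide describes.
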
  16. Norikra: an open source stream processing tool

    - Production use at LINE, the largest Asian SNS with 500M users, for massive log analysis
    - Super easy to use: requires no heavyweight cluster setup
  17. Proposed Solution: Lambda Architecture

    (1) Fluentd: event log collection from various event sources
    (2) Norikra: easy, scalable real-time stream processing
    (3) BigQuery: scalable query engine for large datasets
    (4) Google Spreadsheet: flexible dashboard with charts
    (5) Docker: repeatable deployment in 10 minutes
  18. Applications

    Real-time KPI Dashboard
    • Gaming: How many new users have purchased their first item in the last 10 minutes?
    • Media: How many people hit the vote button during the live TV program?
    • Retail: What is the current total revenue of all stores nationwide?
    • Ads: What is the conversion rate of impressions/clicks to purchase?

    Real-time Monitoring and Alerting
    • Correlate system resource usage with access/application logs
    • Real-time DoS or cheating detection
    • Send e-mail notification from Apps Script triggered by Norikra
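
A question such as "how many first purchases in the last 10 minutes?" is a sliding time-window count. The sketch below shows that idea in plain Python; it is not Norikra's API or query language, it assumes timestamps arrive in non-decreasing order, and the class name and sample timestamps are made up.

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events whose timestamp falls within the last `window_seconds`."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()  # timestamps, assumed to arrive in order

    def add(self, ts: float) -> None:
        self.events.append(ts)

    def count(self, now: float) -> int:
        # Drop events that are a full window old or older, then count the rest.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events)

first_purchases = SlidingWindowCounter(window_seconds=600)  # 10 minutes
for ts in (0, 120, 450, 700):  # seconds since start, made-up data
    first_purchases.add(ts)
count_at_720 = first_purchases.count(now=720)
print(count_at_720)
```

A dashboard would poll `count` periodically, while the long-term totals come from the batch side.
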
  19. Solution Benefits

    - Easy real-time SQL-based KPI analytics at 1M+ rows/sec by Norikra
    - Easy real-time streaming import at 1M+ rows/sec by BigQuery + Fluentd
    - Real-time dashboard with Google Spreadsheet
    - Deployable within 10 min with Docker

    Search “lambda dashboard” on GitHub