Druid @ Branch

Imply
March 01, 2017

Branch (http://branch.io/) shares examples of how they use Druid in production.

Transcript

  1. Druid @ Branch: Enhancing the Data Platform for Better Business Decisions
    • Sub-second aggregate queries
    • Real-time analytics dashboard
    • Live queries for uniques
    • Instant exploratory analytics

    Technology powering the Data Platform
    Performance & Scale Considerations
    Opportunity for new Apps
    Monitoring
    Provisioning & deployment
    Future Plan
    Demo

    Biswajit Das, Data Team, @biswajit @branch.io
    Muwon Lum, Infra Team, @muwon @branch.io
  2. Agenda
    The Business Problem
    Technology Gap
    Data Platform Features
    Performance and Scale
    Opportunity for new Apps
    Monitoring
    Provisioning & deployment
    Future Plan
  3. The Business Problem
    • Cannot perform live complex queries
    • Lack of instant access to aggregate data
    • Gathering unique impressions is time-consuming
    • No single pane of glass to view all data
    • Ad hoc queries require pre-aggregation

    Instant access to information at scale was a problem
  4. Technology Gap
    Key/Value Store (Aerospike)
    • Pre-compute all permutations of possible user queries.
    • Range scans on event data.
    • Pre-computing all permutations of all ad hoc queries can lead to result sets that grow exponentially with the number of columns in a data set and can require hours of pre-processing time.
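
The exponential growth described on this slide can be illustrated with a small sketch. It counts only the distinct sets of columns a user might group by, assuming every subset of columns is a possible ad hoc query; the column names and both helper functions are hypothetical and are not taken from the deck. The real pre-computed result size would also depend on the cardinality of each column.

```python
from itertools import combinations


def groupby_subsets(columns):
    """Yield every subset of columns a query could group by (including none)."""
    for k in range(len(columns) + 1):
        yield from combinations(columns, k)


def count_groupby_subsets(n_columns):
    """Number of column subsets for n columns: 2 ** n."""
    return 2 ** n_columns


# Hypothetical dimension columns, for illustration only.
cols = ["country", "platform", "campaign"]
subsets = list(groupby_subsets(cols))
print(len(subsets))  # 8

# The count doubles with every added column.
for n in (3, 10, 30):
    print(n, count_groupby_subsets(n))
```

With the 30 dimensions mentioned on the performance slide, this gives 2^30 (about 1.07 billion) column subsets before any values are materialized, which is why pre-computing every permutation in a key/value store does not scale.
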
  5. Performance and Scale
    • 25-node production cluster (Druid only)
    • Several hundred terabytes of raw data indexed
    • Typical complex datasource with 30 dimensions and 2 metrics
    • Real-time indexer handling ~30k events per second, peaking at 50k
    • Hourly bucketed data to support different timezones
    • Sustained 2B+ events per day
    • Thousands of queries per second for online dashboard applications
    • Serving 11 million queries every day
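
One way to see why hourly buckets help with time zones is the sketch below: counts stored per UTC hour can be re-grouped into the calendar days of another zone at query time, which a single daily UTC bucket would not allow. This is a plain-Python illustration under stated assumptions, not Branch's implementation or a Druid API; the `daily_totals` helper, the sample data, and the fixed UTC-8 offset are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def daily_totals(hourly_counts, utc_offset_hours):
    """Roll hourly UTC buckets up into calendar days of a fixed-offset zone."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    totals = defaultdict(int)
    for bucket_start_utc, count in hourly_counts.items():
        # Convert the bucket start to local time and group by local date.
        local_day = bucket_start_utc.astimezone(tz).date()
        totals[local_day] += count
    return dict(totals)


# Hypothetical data: 48 hourly buckets of 100 events, from 2017-03-01 00:00 UTC.
start = datetime(2017, 3, 1, tzinfo=timezone.utc)
hourly = {start + timedelta(hours=h): 100 for h in range(48)}

utc_days = daily_totals(hourly, 0)    # two full days of 2400 events each
pst_days = daily_totals(hourly, -8)   # the same events split across three local days

for day, total in sorted(pst_days.items()):
    print(day, total)
```

Hourly buckets line up only with zones whose offset is a whole number of hours; zones with half-hour offsets would need finer buckets for exact daily totals.
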
  6. Opportunity for new Apps
    • Druid helped us to support new analytics easily.
    • Ad hoc reporting.
    • Visualizing data.
    • Exploratory analytics.
  7. Future Plan
    • More robust Query Service.
    • Migrate Hadoop indexer to Spark.
    • Actively working to migrate streaming pipeline to Flink.
    • Evaluating a move of the whole Druid stack to Mesos/Docker.