Druid @ Branch

Druid @ Branch: Enhancing the Data Platform for Better Business Decisions
• Sub-second aggregate queries
• Real-time analytics dashboard
• Live queries for uniques
• Instant exploratory analytics
• Technology powering the Data Platform
• Performance & Scale Considerations
• Opportunity for new Apps
• Monitoring
• Provisioning & deployment
• Future Plan
• Demo
Biswajit Das, Data Team (biswajit@branch.io)
Muwon Lum, Infra Team (muwon@branch.io)

Agenda
• The Business Problem
• Technology Gap
• Data Platform Features
• Performance and Scale
• Opportunity for new Apps
• Monitoring
• Provisioning & deployment
• Future Plan

The Business Problem
• Cannot perform live complex queries
• Lack of instant access to aggregate data
• Gathering unique impressions is time-consuming
• No single pane of glass to view all data
• Ad hoc queries require pre-aggregation
Instant access to information at scale was a problem.

Technology Gap: Key/Value Store (Aerospike)
• Pre-compute all permutations of possible user queries.
• Range scans on event data.
• Pre-computing all permutations of all ad hoc queries can produce result sets that grow exponentially with the number of columns in a dataset, and can require hours of pre-processing time.
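To see why pre-computing every permutation blows up, here is a small sketch, assuming each ad hoc query is a GROUP BY over some subset of the dimensions. With n dimensions there are 2^n possible groupings, so a 30-dimension datasource like the one under Performance and Scale has over a billion. The dimension names and cardinalities are made up for illustration.

```python
from itertools import combinations

def groupings(dimensions):
    """Yield every subset of dimensions an ad hoc query might group by (the power set)."""
    for k in range(len(dimensions) + 1):
        yield from combinations(dimensions, k)

def full_cube_rows(cardinalities):
    """Maximum number of pre-aggregated rows in a full cube: every dimension is
    either one of its values or rolled up to "all", giving (c + 1) choices each."""
    rows = 1
    for c in cardinalities:
        rows *= c + 1
    return rows

dims = ["country", "os", "channel"]          # hypothetical dimensions
print(len(list(groupings(dims))))            # 8, i.e. 2 ** 3
print(2 ** 30)                               # 1073741824 groupings for 30 dimensions
print(full_cube_rows([200, 5, 50]))          # 201 * 6 * 51 = 61506
```

Both numbers multiply with every added column, which is why a store that only serves pre-computed answers runs into hours of pre-processing.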

Druid to the rescue…

High-Level Data Pipeline Flow (diagram: SECOR, Tranquility, Parquet)

Batch System

Query path
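The slides do not show the queries themselves. The following is a hedged sketch of what a "live query for uniques" can look like against Druid's native JSON query API, which brokers accept via HTTP POST at `/druid/v2/`. The datasource name `events`, the hyperUnique metric column `user_unique`, the `count` metric, the interval and the broker URL are all hypothetical, and the helper function names are illustrative only.

```python
import json
import urllib.request

def hourly_uniques_query(datasource, interval, unique_metric):
    """Build a Druid native timeseries query returning, per hour, an approximate
    distinct-user count from a hyperUnique column plus a summed event count."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "hour",
        "intervals": [interval],
        "aggregations": [
            {"type": "hyperUnique", "name": "unique_users", "fieldName": unique_metric},
            # Assumes a "count" metric was defined at ingestion time.
            {"type": "longSum", "name": "events", "fieldName": "count"},
        ],
    }

def post_query(broker_url, query):
    """POST a native query to the broker's /druid/v2/ endpoint and decode the JSON reply."""
    req = urllib.request.Request(
        broker_url.rstrip("/") + "/druid/v2/",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

query = hourly_uniques_query("events", "2017-01-01T00:00/2017-01-02T00:00", "user_unique")
print(json.dumps(query, indent=2))
# Against a live cluster: post_query("http://broker.example:8082", query)
```

Because hyperUnique values are HyperLogLog sketches merged at query time, unique counts come back approximately but without any pre-aggregation per query shape.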

Performance and Scale
• 25-node production cluster (Druid only)
• Several hundred terabytes of raw data indexed
• Typical complex datasource with 30 dimensions and 2 metrics
• Real-time indexer ingesting ~30k events per second, peaking at 50k
• Hourly bucketed data to support different timezones
• Sustained 2B+ events per day
• Thousands of queries per second for online dashboard applications
• Serving 11 million queries every day
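A minimal sketch of why hourly buckets help with timezones, assuming events are rolled up at hour granularity in UTC: a calendar day in any timezone with a whole-hour UTC offset is exactly 24 consecutive hourly buckets, so one hourly datasource can answer daily questions for every such timezone. Half-hour offsets such as UTC+5:30 would need finer buckets. The function names and sample events are hypothetical.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone

def hour_bucket(ts):
    """Truncate a timezone-aware timestamp to the start of its UTC hour."""
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)

def bucket_events(timestamps):
    """Count events per UTC hour, as an hourly-granularity rollup would."""
    return Counter(hour_bucket(ts) for ts in timestamps)

def local_day_total(buckets, day, utc_offset_hours):
    """Sum the 24 hourly buckets forming one calendar day in a timezone
    with a whole-hour UTC offset (e.g. -8 for PST)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime(day.year, day.month, day.day, tzinfo=tz).astimezone(timezone.utc)
    return sum(buckets.get(start + timedelta(hours=h), 0) for h in range(24))

utc = timezone.utc
events = [
    datetime(2017, 1, 1, 3, 0, tzinfo=utc),    # Dec 31, 19:00 PST
    datetime(2017, 1, 1, 7, 30, tzinfo=utc),   # Dec 31, 23:30 PST
    datetime(2017, 1, 1, 8, 15, tzinfo=utc),   # Jan 1, 00:15 PST
    datetime(2017, 1, 1, 20, 0, tzinfo=utc),   # Jan 1, 12:00 PST
    datetime(2017, 1, 2, 2, 0, tzinfo=utc),    # Jan 1, 18:00 PST (already Jan 2 in UTC)
]
buckets = bucket_events(events)
print(local_day_total(buckets, date(2017, 1, 1), 0))    # 4 events on Jan 1 in UTC
print(local_day_total(buckets, date(2017, 1, 1), -8))   # 3 events on Jan 1 in PST
```

As a rough cross-check on the throughput figures, 2B events per day averages about 23k events per second (2×10^9 / 86,400 ≈ 23,148), consistent with the ~30k typical ingest rate.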

Opportunity for new Apps
Druid made it easy to support new analytics:
• Ad hoc reporting
• Data visualization
• Exploratory analytics

Provisioning & Deployment: SaltStack

Rolling Updates (diagram: steps 1, 2, 3)
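The slide only shows the rollout as numbered steps, so what follows is an assumption rather than Branch's actual procedure: a generic rolling update that upgrades nodes in small batches and checks health before moving on, so the rest of the cluster keeps serving queries. `upgrade` and `is_healthy` are hypothetical hooks (for example, wrappers around a SaltStack run and a node status probe); they are not SaltStack or Druid APIs.

```python
from typing import Callable, List

def rolling_update(
    nodes: List[str],
    batch_size: int,
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[List[str]]:
    """Upgrade nodes a few at a time; halt before the next batch if any node
    in the current batch fails its health check. Returns completed batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    done = []
    for i in range(0, len(nodes), batch_size):
        batch = nodes[i:i + batch_size]
        for node in batch:
            upgrade(node)
        unhealthy = [n for n in batch if not is_healthy(n)]
        if unhealthy:
            raise RuntimeError(f"halting rollout; unhealthy after upgrade: {unhealthy}")
        done.append(batch)
    return done

upgraded = []
nodes = [f"historical-{i}" for i in range(1, 6)]   # hypothetical host names
print(rolling_update(nodes, 2, upgraded.append, lambda n: True))
# [['historical-1', 'historical-2'], ['historical-3', 'historical-4'], ['historical-5']]
```

Keeping the batch small bounds how much capacity is offline at once, and stopping on the first unhealthy batch limits the blast radius of a bad release.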

Future Plan
• A more robust Query Service
• Migrate the Hadoop indexer to Spark
• Actively working to migrate the streaming pipeline to Flink
• Evaluating a move of the whole Druid stack to Mesos/Docker

Thank you! We are hiring: https://branch.io/careers
