measuring api performance using druid

Measuring Slack API performance using Druid
Ananth Packkildurai
November 28, 2017

About Slack
• Public launch: 2014
• 800+ employees across 7 countries worldwide
• HQ in San Francisco
• Diverse set of industries including software/technology, retail, media, telecom and professional services

An unprecedented adoption rate

Agenda
1. A bit of history
2. Druid infrastructure & use cases
3. Challenges

A bit of history

March 2016
• 5 data engineers
• 350+ Slack employees
• 2M active users

October 2017
• 10 data engineers
• 800+ Slack employees
• 6M active users

Data usage
• 1 in 3 access the data warehouse per week
• 500+ tables
• 400k events per sec

It is all about Slogs

Well, not exactly

Slog

Druid infrastructure & use cases

What can go wrong?

We want more...

Performance & Experimentation
• Engineering & CE teams should be able to detect performance bottlenecks proactively.
• Engineers should be able to see the performance of their experiments in near real-time.
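
As a rough illustration of what detecting a bottleneck could look like in practice, the sketch below builds a Druid native timeseries query that returns per-minute mean latency for one API method. The slides do not show the actual schema or queries, so the data source name (`api_calls`), the dimension (`api_method`) and the metric (`duration_ms`) are all assumptions made for this example.

```python
import json


def build_latency_query(data_source, api_method, interval):
    """Build a Druid native timeseries query for per-minute mean latency.

    All field names here are hypothetical; adapt them to the real schema.
    """
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "minute",
        "intervals": [interval],
        # Restrict the query to a single API method (assumed dimension name).
        "filter": {
            "type": "selector",
            "dimension": "api_method",
            "value": api_method,
        },
        "aggregations": [
            # "count" counts Druid rows; with rollup enabled you would
            # instead sum a count metric written at ingestion time.
            {"type": "count", "name": "rows"},
            {"type": "longSum", "name": "total_ms", "fieldName": "duration_ms"},
        ],
        "postAggregations": [
            {
                # Mean latency = total latency / number of rows.
                "type": "arithmetic",
                "name": "avg_ms",
                "fn": "/",
                "fields": [
                    {"type": "fieldAccess", "fieldName": "total_ms"},
                    {"type": "fieldAccess", "fieldName": "rows"},
                ],
            }
        ],
    }


query = build_latency_query(
    "api_calls",  # hypothetical data source
    "chat.postMessage",  # example API method
    "2017-11-28T00:00:00Z/2017-11-28T01:00:00Z",
)
query_json = json.dumps(query, indent=2)
print(query_json)
```

The resulting JSON would be posted to a Druid broker's native query endpoint. A mean hides tail latency, so a real dashboard would likely add percentile aggregations as well.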

Near real-time pipeline

Why Analytics Kafka
• Keep the load on the DW Kafka cluster predictable.
• More comfortable to upgrade and verify newer Kafka versions.
• A smaller Kafka cluster is relatively more straightforward to operate.

Druid Architecture

• Middle managers autoscale based on the number of running tasks.
• Historical nodes autoscale based on segment size.
• Fault-tolerant deployment for the Overlord & Coordinator.
• Brokers autoscale and are load balanced by ELB.
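
The slides name the autoscaling signals (running tasks for middle managers, segment size for historical nodes) but not the exact rules. The following is a minimal sketch of how such sizing decisions could be computed, assuming hypothetical parameters for task slots per node, usable disk per node and replication factor.

```python
import math


def desired_middle_managers(running_tasks, task_slots_per_node, min_nodes=1):
    """Nodes needed so that every running task has a slot (assumed rule)."""
    return max(min_nodes, math.ceil(running_tasks / task_slots_per_node))


def desired_historicals(total_segment_bytes, usable_bytes_per_node,
                        replication=2, min_nodes=1):
    """Nodes needed to hold all segment replicas (assumed rule)."""
    replicated_bytes = total_segment_bytes * replication
    return max(min_nodes, math.ceil(replicated_bytes / usable_bytes_per_node))


# Example: 10 running tasks with 4 slots per middle manager.
middle_managers = desired_middle_managers(10, 4)

# Example: 3 TB of segments, 2 replicas, 1 TB usable per historical node.
historicals = desired_historicals(3 * 10**12, 10**12, replication=2)

print(middle_managers, historicals)
```

In practice the computed targets would feed an autoscaling group, with headroom and cooldown periods added to avoid flapping.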

Challenges

Cascading failures

Forward Index fields

SQL

Bridge the gap between batch and realtime tables.

Thank You!
