Ananth Packkildurai
November 28, 2017
Measuring Slack API performance using Druid
Slide 2
About Slack
Public launch: 2014
800+ employees across 7 countries worldwide
HQ in San Francisco
Diverse set of industries, including software/technology, retail, media, telecom, and professional services
Slide 3
An unprecedented adoption rate
Slide 4
Agenda
1. A bit of history
2. Druid infrastructure & use cases
3. Challenges
Slide 5
A bit of history
Slide 6
March 2016
5 data engineers
350+ Slack employees
2M active users
Slide 7
October 2017
10 data engineers
800+ Slack employees
6M active users
Slide 8
Data usage
1 in 3 access the data warehouse per week
500+ tables
400k events per sec
Slide 9
It is all about Slogs
Slide 10
Well, not exactly
Slide 11
Slog
Slide 12
Slog
Slide 13
Druid infrastructure & use cases
Slide 14
What can go wrong?
Slide 15
We want more...
Slide 16
Performance & Experimentation
● Engineering & CE teams should be able to detect performance bottlenecks proactively.
● Engineers should be able to see the performance of their experiments in near real time.
Slide 17
Near Real-Time Pipeline
Slide 18
Why Analytics Kafka
Keeps the load on the DW Kafka predictable.
Makes it more comfortable to upgrade to and verify newer Kafka versions.
A smaller Kafka cluster is relatively more straightforward to operate.
Slide 19
Druid Architecture
Slide 20
Druid Architecture
Middle managers autoscale based on the number of running tasks.
Historical nodes autoscale based on segment size.
Fault-tolerant deployment for the Overlord and Coordinator.
Brokers autoscale and are load balanced by an ELB.
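
The two autoscaling rules on this slide (middle managers by running tasks, historical nodes by segment size) could be sketched roughly as below. This is only an illustration under stated assumptions, not Slack's implementation: the function names, the headroom parameter, and the target fill percentage are hypothetical. The per-node task slots and per-node storage limit are assumed to correspond to Druid's `druid.worker.capacity` and `druid.server.maxSize` settings.

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-numerator // denominator)


def desired_middle_managers(running_tasks: int, worker_capacity: int,
                            headroom_tasks: int = 0, min_nodes: int = 1) -> int:
    """Hypothetical rule: enough middle managers to hold all running tasks.

    worker_capacity is the number of task slots per middle manager
    (assumed to match druid.worker.capacity).
    """
    if worker_capacity <= 0:
        raise ValueError("worker_capacity must be positive")
    needed_slots = running_tasks + headroom_tasks
    return max(min_nodes, ceil_div(needed_slots, worker_capacity))


def desired_historicals(total_segment_bytes: int, replicas: int,
                        node_max_bytes: int, target_fill_percent: int = 80,
                        min_nodes: int = 1) -> int:
    """Hypothetical rule: enough historical nodes to store every replica.

    node_max_bytes is the segment storage per node (assumed to match
    druid.server.maxSize); target_fill_percent leaves spare room per node.
    """
    usable_bytes = node_max_bytes * target_fill_percent // 100
    if usable_bytes <= 0:
        raise ValueError("usable bytes per node must be positive")
    return max(min_nodes, ceil_div(total_segment_bytes * replicas, usable_bytes))
```

A scaling loop would presumably compare these targets with the current node counts and adjust the corresponding instance groups. Where the task count and segment sizes come from is left out, since the slides do not describe it.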