Measuring API performance using Druid

Slide 1

Slide 1 text

Measuring Slack API performance using Druid
Ananth Packkildurai
November 28, 2017

Slide 2

Slide 2 text

About Slack
● Public launch: 2014
● 800+ employees across 7 countries worldwide
● HQ in San Francisco
● Diverse set of industries, including software/technology, retail, media, telecom, and professional services

Slide 3

Slide 3 text

An unprecedented adoption rate

Slide 4

Slide 4 text

Agenda
1. A bit of history
2. Druid infrastructure & use cases
3. Challenges

Slide 5

Slide 5 text

A bit of history

Slide 6

Slide 6 text

March 2016
● 5 data engineers
● 350+ Slack employees
● 2M active users

Slide 7

Slide 7 text

October 2017
● 10 data engineers
● 800+ Slack employees
● 6M active users

Slide 8

Slide 8 text

Data usage
● 1 in 3 access the data warehouse per week
● 500+ tables
● 400k events per sec

Slide 9

Slide 9 text

It is all about Slogs

Slide 10

Slide 10 text

Well, not exactly

Slide 11

Slide 11 text

Slog

Slide 12

Slide 12 text

Slog

Slide 13

Slide 13 text

Druid infrastructure & use cases

Slide 14

Slide 14 text

What can go wrong?

Slide 15

Slide 15 text

We want more...

Slide 16

Slide 16 text

Performance & Experimentation
● Engineering & CE teams should be able to detect performance bottlenecks proactively.
● Engineers should be able to see the performance of their experiments in near real time.
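To make the goal above concrete, here is a minimal sketch of a Druid native timeseries query that reports average API latency per minute. The deck does not show the actual schema or queries, so the data source name (`api_calls`), the dimension (`api_method`), and the metric columns (`count`, `duration_ms`) are all hypothetical.

```python
import json


def build_latency_query(datasource, interval, api_method):
    """Build a Druid native timeseries query for per-minute average latency.

    The column names used here ("api_method", "count", "duration_ms") are
    assumptions for illustration, not the schema from the talk.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "minute",
        "intervals": [interval],
        # Restrict the query to a single API method.
        "filter": {
            "type": "selector",
            "dimension": "api_method",
            "value": api_method,
        },
        # Sum the pre-aggregated call count and total duration.
        "aggregations": [
            {"type": "longSum", "name": "calls", "fieldName": "count"},
            {"type": "doubleSum", "name": "total_ms", "fieldName": "duration_ms"},
        ],
        # Average latency = total duration / number of calls.
        "postAggregations": [
            {
                "type": "arithmetic",
                "name": "avg_ms",
                "fn": "/",
                "fields": [
                    {"type": "fieldAccess", "fieldName": "total_ms"},
                    {"type": "fieldAccess", "fieldName": "calls"},
                ],
            }
        ],
    }


if __name__ == "__main__":
    query = build_latency_query(
        "api_calls",  # hypothetical data source
        "2017-11-28T00:00:00Z/2017-11-28T01:00:00Z",
        "chat.postMessage",
    )
    # The JSON body would be POSTed to a broker's /druid/v2/ endpoint.
    print(json.dumps(query, indent=2))
```

Comparing two such time series, one filtered to an experiment group and one to a control group, is one way an engineer could watch an experiment's latency in near real time.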

Slide 17

Slide 17 text

Near real-time pipeline

Slide 18

Slide 18 text

Why Analytics Kafka
● Keeps the load on the data warehouse (DW) Kafka predictable.
● Makes it more comfortable to upgrade to and verify newer Kafka versions.
● A smaller Kafka cluster is relatively more straightforward to operate.

Slide 19

Slide 19 text

Druid Architecture

Slide 20

Slide 20 text

Druid Architecture
● Middle managers autoscale based on the number of running tasks.
● Historical nodes autoscale based on segment size.
● Fault-tolerant deployment for the Overlord & Coordinator.
● Brokers autoscale and are load balanced by ELB.
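The two autoscaling signals named above can be sketched as simple sizing rules. The slide states only which signal drives each tier; the formulas, parameter names, and default values below are assumptions for illustration, not Slack's actual policy.

```python
def ceil_div(numerator, denominator):
    """Integer ceiling division for non-negative numerators."""
    return -(-numerator // denominator)


def desired_middle_managers(running_tasks, task_slots_per_node, min_nodes=1):
    """Nodes needed so every running task has a slot (hypothetical rule)."""
    if task_slots_per_node <= 0:
        raise ValueError("task_slots_per_node must be positive")
    return max(min_nodes, ceil_div(running_tasks, task_slots_per_node))


def desired_historicals(segment_bytes, node_capacity_bytes,
                        replicas=2, target_fill_pct=80, min_nodes=1):
    """Nodes needed to hold all segment replicas (hypothetical rule).

    target_fill_pct leaves headroom: each node is planned to be at most
    that percentage full.
    """
    if node_capacity_bytes <= 0 or not 0 < target_fill_pct <= 100:
        raise ValueError("capacity must be positive and fill within 1..100")
    required = segment_bytes * replicas * 100
    usable_per_node = node_capacity_bytes * target_fill_pct
    return max(min_nodes, ceil_div(required, usable_per_node))


if __name__ == "__main__":
    # 10 running tasks with 4 task slots per middle manager.
    print(desired_middle_managers(10, 4))
    # 1000 bytes of segments, 2 replicas, 500-byte nodes at 80% target fill.
    print(desired_historicals(1000, 500))
```

Integer arithmetic is used throughout so the node counts do not depend on floating-point rounding.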