Sometimes, Druid is not the best solution for a business use case

Yulia Trakhtenberg

Real Time Dashboard
• Optimizing campaigns based on user performance analytics
• 8 billion events daily
• 12-15 dimensions
• Data starting 2011
• Extendable

Time based data is mutable! - Life Time Value
[Diagram: Install date; Session started - date1; Purchase - date2; Uninstall - date3]
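A minimal sketch of why Life Time Value makes time based data mutable, assuming LTV is reported per install date while purchases keep arriving on later dates. The event fields, dates, and amounts below are hypothetical and only illustrate the idea; they are not taken from the talk.

```python
from collections import defaultdict
from datetime import date

# Hypothetical events: each carries its own event date plus the install
# date of the user who produced it.
events = [
    {"type": "install", "event_date": date(2016, 1, 1),
     "install_date": date(2016, 1, 1), "revenue": 0.0},
    {"type": "purchase", "event_date": date(2016, 1, 5),
     "install_date": date(2016, 1, 1), "revenue": 4.99},
    {"type": "purchase", "event_date": date(2016, 2, 1),
     "install_date": date(2016, 1, 1), "revenue": 9.99},
]


def ltv_by_install_date(seen_events):
    """Sum revenue per install date (the cohort key used for LTV)."""
    totals = defaultdict(float)
    for event in seen_events:
        totals[event["install_date"]] += event["revenue"]
    return dict(totals)


# The row for the 2016-01-01 cohort changes every time a later event arrives.
after_install = ltv_by_install_date(events[:1])
after_all_events = ltv_by_install_date(events)
```

The aggregate keyed by install date is rewritten weeks after that date has passed, which is what an append-only, event-time layout does not handle well.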

Previous Solution - Toku (Mongo)
[Diagram: KAFKA, Toku writers, Toku master, Toku slaves, Dashboard]

Toku Problems
• Failures on weekly basis
  – master lost
  – no ability to select a new master
• Bad modeling - slow writes
• Data loss was possible
• No ability to recover

Dashboard - DB abstraction level
[Diagram: KAFKA, Toku writers, Toku master, Toku slaves, Dashboard, Middleware (Vishnu)]
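A DB abstraction level of this kind can be sketched as a small interface that the dashboard calls, with the storage backend chosen behind it. This is only an illustration under assumptions: the class names, the `query` signature, and the per-customer override are hypothetical and do not describe the actual Vishnu middleware.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class DashboardStore(ABC):
    """Hypothetical interface every storage backend implements."""

    @abstractmethod
    def query(self, app_id: str, start: str, end: str) -> List[dict]:
        ...


class InMemoryStore(DashboardStore):
    """Stand-in backend; a real one would talk to a database."""

    def __init__(self, rows: List[dict]):
        self.rows = rows

    def query(self, app_id: str, start: str, end: str) -> List[dict]:
        # ISO dates compare correctly as strings.
        return [r for r in self.rows
                if r["app_id"] == app_id and start <= r["date"] <= end]


class Middleware:
    """Routes each customer to a backend; the dashboard never sees which."""

    def __init__(self, default: DashboardStore,
                 overrides: Optional[Dict[str, DashboardStore]] = None):
        self.default = default
        self.overrides = overrides or {}

    def query(self, app_id: str, start: str, end: str) -> List[dict]:
        backend = self.overrides.get(app_id, self.default)
        return backend.query(app_id, start, end)


old_db = InMemoryStore([{"app_id": "small", "date": "2016-01-02", "installs": 3}])
new_db = InMemoryStore([{"app_id": "big", "date": "2016-01-02", "installs": 900}])
middleware = Middleware(default=old_db, overrides={"big": new_db})
```

With a layer like this, moving one customer at a time to a new database is a routing change rather than a dashboard change.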

We tried...
1. Cassandra
2. Mongo 3
3. Proprietary DB
4. Redis Labs + indices DB
5. Pinot
6. Redshift
7. Druid

Druid DB
• Storage optimized for analytics
• Lambda architecture inside
• JSON-based query language
• Developed by analytics SaaS company
• Free and open source
• Scalable to petabytes...
• Optimized for time based data

Druid - take 1 - Batch and Realtime combination
[Diagram: KAFKA, Druid sink, Dashboard, Middleware (Vishnu), Daily Batch process, In Memory, Historical node, Daily backup, S3 files, Druid]

Druid - take 1 - Going to production
• Biggest customers cannot open a dashboard
• Busy weekend to give Druid a shot
• On Sunday morning - dashboard opens and works well!!!
• Customer was happy!!!
• We moved more and more customers to Druid

Druid - problems
• Not time based data - LTV
• Solution:
  – Timeseries on event time
  – Secondary index on event install date
• Realtime data - in memory, no secondary indices - too slow
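The cost of querying by install date when data is laid out by event time can be shown with a toy model. This is not Druid's storage engine or API: the "segments" here are plain Python lists keyed by event date, and the rows and index structure are hypothetical. It only illustrates why a lookup on a non-time column touches every time partition unless a secondary index narrows it down.

```python
from collections import defaultdict

# Hypothetical rows: (event_date, install_date, revenue), ISO date strings.
rows = [
    ("2016-01-01", "2016-01-01", 0.0),
    ("2016-01-05", "2016-01-01", 4.99),
    ("2016-01-05", "2016-01-04", 1.99),
    ("2016-02-01", "2016-01-01", 9.99),
]

# Primary layout: one partition per event date.
segments = defaultdict(list)
for row in rows:
    segments[row[0]].append(row)

# Secondary index: install date -> event dates whose partition has matches.
install_index = defaultdict(set)
for event_date, install_date, _ in rows:
    install_index[install_date].add(event_date)


def revenue_scan(install_date):
    """No secondary index: every partition is scanned."""
    total, scanned = 0.0, 0
    for segment in segments.values():
        scanned += 1
        total += sum(r[2] for r in segment if r[1] == install_date)
    return total, scanned


def revenue_indexed(install_date):
    """With the index: only partitions known to contain matches are read."""
    total, scanned = 0.0, 0
    for event_date in install_index.get(install_date, ()):
        scanned += 1
        total += sum(r[2] for r in segments[event_date] if r[1] == install_date)
    return total, scanned
```

In this toy data, `revenue_scan("2016-01-04")` reads all three partitions while `revenue_indexed("2016-01-04")` reads one; both return the same total.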

Current solution - MemSQL
• In Memory DB
• Rowstore and Columnstore
• Aggregators and Leaves

MemSQL Architecture
[Diagram: KAFKA, MemSQL writers, MemSQL Cluster, Dashboard, Middleware (Vishnu), MemSQL writers, MemSQL Cluster (Slave)]

MemSQL - why is it a solution
• Fast
• Extendable
• Better modeling
• Recoverable solution
• Possibility to return to 0 point

MemSQL - why is it a solution
• Data - 450 GB x 2 clusters
• Query Latency - 1-3 seconds
• Machines x 2 clusters
  – 2 aggregators - m4.4xlarge
  – 4 leaves - r3.4xlarge
• Cost reduction - $20K monthly

Recovery
[Diagram: KAFKA (24h), MemSQL writers, Master MemSQL Cluster, Dashboard, Middleware (Vishnu), Yesterday snapshot, Recovery MemSQL Cluster, MemSQL writers - only current day]
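The recovery idea on this slide, restoring yesterday's snapshot and then writing only the current day from Kafka, can be sketched as follows. This is a simplified illustration under assumptions: the state is a plain counter per key, the Kafka log is a Python list, and the function and field names are hypothetical.

```python
from collections import Counter


def recover(snapshot, kafka_log, snapshot_date):
    """Rebuild state from yesterday's snapshot plus newer Kafka events."""
    state = Counter(snapshot)  # copy, so the snapshot itself is untouched
    for event in kafka_log:
        # Writers replay only events newer than the snapshot (current day).
        if event["date"] > snapshot_date:
            state[event["key"]] += event["count"]
    return state


# Hypothetical data: Kafka still holds the last 24 hours of events, some of
# which are already included in the snapshot.
yesterday_snapshot = {"app1": 100, "app2": 40}
kafka_last_24h = [
    {"date": "2016-06-29", "key": "app1", "count": 7},   # already in snapshot
    {"date": "2016-06-30", "key": "app1", "count": 5},
    {"date": "2016-06-30", "key": "app3", "count": 2},
]
recovered = recover(yesterday_snapshot, kafka_last_24h, "2016-06-29")
```

Filtering on the snapshot date keeps events that the snapshot already contains from being counted twice during the replay.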

Daily Incremental Batch - Yesterday Snapshot
[Diagram: Yesterday S3 files, Daily backup, S3 files, KAFKA, Daily Batch Incremental process, Past S3 files]

Next steps - Architecture
[Diagram: KAFKA, writers - only new data, MemSQL Rowstore Cluster (1-2 weeks), Dashboard, Middleware (Vishnu), Daily Batch process, S3 files, MemSQL Columnstore History Cluster, Daily]
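One way the middleware could serve a dashboard query across a recent rowstore cluster and a columnstore history cluster is to split the requested date range at a cutoff. The talk does not describe how routing would work, so this is an assumption-heavy sketch: the 14-day window (the upper end of "1-2 weeks"), the function name, and the plan format are all hypothetical.

```python
from datetime import date, timedelta

# Assumption: the rowstore cluster keeps the most recent 14 days.
ROWSTORE_WINDOW = timedelta(days=14)


def plan_query(start, end, today):
    """Split [start, end] into the sub-ranges each cluster would serve."""
    cutoff = today - ROWSTORE_WINDOW  # first day held in the rowstore
    plan = {}
    if start < cutoff:
        plan["columnstore"] = (start, min(end, cutoff - timedelta(days=1)))
    if end >= cutoff:
        plan["rowstore"] = (max(start, cutoff), end)
    return plan


full_month = plan_query(date(2016, 6, 1), date(2016, 6, 30), today=date(2016, 6, 30))
recent_only = plan_query(date(2016, 6, 20), date(2016, 6, 30), today=date(2016, 6, 30))
```

A query that falls entirely inside the recent window never touches the history cluster, and the two sub-ranges of a split query do not overlap.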

When Druid?

Thank you
Join the Data Team! [email protected]

AppsFlyer