ACM/SIGMOD 2014

DRUID: A REAL-TIME ANALYTICAL DATA STORE
Eric Tschetter · Fangjin Yang · Xavier Léauté
Druid committers · druid.io · @druidio

MOTIVATION

THIS PRESENTATION
‣ Description of a production system
‣ Benchmarks and numbers
‣ Many hard problems looking for solutions

BUT FIRST, A DEMO

DATA EXPLORATION

WHAT IS DRUID?
‣ Open-source
‣ Column-oriented
‣ Distributed
‣ Approximate or exact
‣ Real-time
‣ Highly available
‣ Data store

THAT IS, IT IS SIMILAR/COMPLEMENTARY TO
‣ Apache Hadoop
‣ Apache Storm
‣ Elasticsearch
‣ Apache Spark
‣ Facebook Presto
‣ Google Dremel
‣ Google PowerDrill
‣ etc.

HARD PROBLEMS
‣ Multi-tenancy
‣ Selecting approximation algorithms
‣ Fast, distributed joins

ARCHITECTURE

ARCHITECTURE
[Diagram: real-time nodes, each exposing a query API]

REAL-TIME NODES
‣ Log-structured merge-tree
‣ Ingest data and buffer events in memory in a write-optimized data structure
‣ Periodically persist collected events to disk (converting them to a read-optimized format)
‣ Query data as soon as it is ingested
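The write path above can be modeled in a few lines. This is a minimal sketch under simplifying assumptions, not Druid's code: events are pre-aggregated per key in a dict (the write-optimized buffer), a hypothetical `max_rows` threshold triggers a persist, and "disk" is a list of sorted, immutable runs (the read-optimized format). Queries combine the live buffer with every persisted run, so an event is visible the moment it is ingested. Merging runs and handing them off are omitted.

```python
import bisect

class RealtimeBuffer:
    """Toy real-time node: in-memory write buffer plus immutable sorted runs."""

    def __init__(self, max_rows):
        self.max_rows = max_rows  # hypothetical persist threshold
        self.in_memory = {}       # write-optimized: key -> summed metric
        self.persisted = []       # read-optimized: sorted [(key, value), ...] runs

    def ingest(self, key, value):
        # Cheap in-place update; nothing is sorted on the write path.
        self.in_memory[key] = self.in_memory.get(key, 0) + value
        if len(self.in_memory) >= self.max_rows:
            self.persist()

    def persist(self):
        # Freeze the buffer into a sorted, immutable run ("write to disk").
        run = sorted(self.in_memory.items())
        if run:
            self.persisted.append(run)
        self.in_memory = {}

    def query(self, key):
        # Read the live buffer and binary-search every persisted run.
        total = self.in_memory.get(key, 0)
        for run in self.persisted:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                total += run[i][1]
        return total

buf = RealtimeBuffer(max_rows=2)
buf.ingest("Justin Bieber", 10)
buf.ingest("Ke$ha", 17)            # buffer is full: persisted as a sorted run
buf.ingest("Justin Bieber", 15)    # lands in the fresh in-memory buffer
print(buf.query("Justin Bieber"))  # 25: 15 from memory + 10 from the run
```

The split mirrors the slide: the dict favors fast writes, while the sorted runs favor fast lookups and never change once written.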

ARCHITECTURE
[Diagram: real-time nodes hand off data to historical nodes; both expose a query API]

HISTORICAL NODES
‣ Main workhorses of a Druid cluster
‣ Shared-nothing architecture
‣ Load immutable, read-optimized data
‣ Respond to queries

ARCHITECTURE
[Diagram: broker nodes expose the query API, rewrite queries, and scatter/gather them across real-time and historical nodes; real-time nodes hand off data to historical nodes]

BROKER NODES
‣ Query scatter/gather (send requests to nodes and merge results)
‣ (Distributed) caching
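To make scatter/gather and caching concrete, here is a minimal sketch with hypothetical names. Each "node" is a dict of immutable segments, as historical nodes serve; the broker fans a sum query out to the nodes in parallel threads, merges the partial results, and caches them per segment. Per-segment caching is safe here only because the segments never change. The sketch ignores replication, mutable real-time segments, and the distributed cache the slide mentions; the rows come from the sample table on the dictionary slide.

```python
from concurrent.futures import ThreadPoolExecutor

def query_node(segments, page, segment_ids):
    """Runs 'on a node': sum the added metric for `page` in each segment."""
    return {seg_id: sum(added for p, added in segments[seg_id] if p == page)
            for seg_id in segment_ids}

class Broker:
    def __init__(self, nodes):
        self.nodes = nodes  # node name -> {segment id: [(page, added), ...]}
        self.cache = {}     # (segment id, page) -> partial result

    def query(self, page):
        total = 0
        todo = {}           # node name -> segment ids missing from the cache
        for name, segments in self.nodes.items():
            for seg_id in segments:
                if (seg_id, page) in self.cache:
                    total += self.cache[(seg_id, page)]
                else:
                    todo.setdefault(name, []).append(seg_id)
        # Scatter: ask every node that owns uncached segments, in parallel.
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(query_node, self.nodes[name], page, ids)
                       for name, ids in todo.items()]
            # Gather: merge the partial results and remember them.
            for future in futures:
                for seg_id, partial in future.result().items():
                    self.cache[(seg_id, page)] = partial
                    total += partial
        return total

nodes = {
    "historical-1": {"seg-00h": [("Justin Bieber", 10), ("Justin Bieber", 15),
                                 ("Justin Bieber", 32)]},
    "historical-2": {"seg-01h": [("Ke$ha", 17)],
                     "seg-02h": [("Ke$ha", 43), ("Ke$ha", 12)]},
}
broker = Broker(nodes)
print(broker.query("Ke$ha"))  # 72, computed by both nodes
print(broker.query("Ke$ha"))  # 72 again, served entirely from the cache
```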

COLUMN COMPRESSION · DICTIONARIES
‣ Create ids
  • Justin Bieber -> 0, Ke$ha -> 1
‣ Store
  • page -> [0 0 0 1 1 1]
  • language -> [0 0 0 0 0 0]

timestamp             page           language  city     country  ...  added  deleted
2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA           10     65
2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA           15     62
2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA           32     45
2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA            17     87
2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            43     99
2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            12     53
...
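Dictionary encoding replaces each string with a small integer id and stores the column as an integer array. A minimal sketch, assuming ids are handed out in order of first appearance (for this sample that coincides with alphabetical order, so it reproduces the ids on the slide); the function name is hypothetical.

```python
def dictionary_encode(values):
    """Map each distinct string to an integer id; return (dictionary, column)."""
    ids = {}
    encoded = []
    for v in values:
        if v not in ids:
            ids[v] = len(ids)  # next unused id
        encoded.append(ids[v])
    return ids, encoded

page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
language = ["en"] * 6
page_ids, page_col = dictionary_encode(page)
lang_ids, lang_col = dictionary_encode(language)
print(page_ids)  # {'Justin Bieber': 0, 'Ke$ha': 1}
print(page_col)  # [0, 0, 0, 1, 1, 1]
print(lang_col)  # [0, 0, 0, 0, 0, 0]
```

The column now holds small integers instead of repeated strings, and each distinct string is stored once in the dictionary.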

BITMAP INDICES
‣ Justin Bieber -> rows [0, 1, 2] -> [111000]
‣ Ke$ha -> rows [3, 4, 5] -> [000111]
‣ Compressed with CONCISE
(Sample data: the same six rows as in the dictionary example above.)
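Building the bitmaps takes one pass over the dictionary-encoded page column. The sketch below uses plain lists of 0/1 for readability. It does not implement CONCISE (a word-aligned bitmap compression scheme); a toy run-length encoding stands in only to show why long runs of identical bits compress well. Function names are hypothetical.

```python
def build_bitmaps(encoded, num_ids):
    """One bitmap per dictionary id: bit `row` is 1 if that row holds the id."""
    bitmaps = [[0] * len(encoded) for _ in range(num_ids)]
    for row, value_id in enumerate(encoded):
        bitmaps[value_id][row] = 1
    return bitmaps

def run_length(bits):
    """Toy run-length encoding as (bit, run length) pairs. NOT CONCISE."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs

page_col = [0, 0, 0, 1, 1, 1]  # Justin Bieber -> 0, Ke$ha -> 1
bieber, kesha = build_bitmaps(page_col, 2)
print([row for row, bit in enumerate(bieber) if bit])  # [0, 1, 2]
print("".join(map(str, bieber)))                       # 111000
print(run_length(kesha))                               # [(0, 3), (1, 3)]
```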

FAST AND FLEXIBLE QUERIES
row  page
0    Justin Bieber
1    Justin Bieber
2    Ke$ha
3    Ke$ha

‣ Justin Bieber -> [1, 1, 0, 0]
‣ Ke$ha -> [0, 0, 1, 1]
‣ Justin Bieber OR Ke$ha -> [1, 1, 1, 1]
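Filters over several values become bitwise operations on bitmaps, so the raw rows are never scanned. A minimal sketch, assuming Python integers as uncompressed bitsets (bit i stands for row i) over the four-row table above; helper names are hypothetical.

```python
def to_bitset(rows):
    """Pack row ids into an int: bit r is set <=> row r matches."""
    bits = 0
    for r in rows:
        bits |= 1 << r
    return bits

def to_rows(bits, num_rows):
    """Unpack an int bitset back into the list of matching row ids."""
    return [r for r in range(num_rows) if bits >> r & 1]

NUM_ROWS = 4
index = {"Justin Bieber": to_bitset([0, 1]), "Ke$ha": to_bitset([2, 3])}

either = index["Justin Bieber"] | index["Ke$ha"]        # page = A OR page = B
both = index["Justin Bieber"] & index["Ke$ha"]          # empty: one page per row
not_bieber = ~index["Justin Bieber"] & ((1 << NUM_ROWS) - 1)  # mask to 4 rows
print(to_rows(either, NUM_ROWS))      # [0, 1, 2, 3]
print(to_rows(both, NUM_ROWS))        # []
print(to_rows(not_bieber, NUM_ROWS))  # [2, 3]
```

NOT needs the mask because Python integers are unbounded; a fixed-length bitmap gets the same effect from its length.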

DRUID IN PRACTICE

DRUID IN PRODUCTION: REAL-TIME INGESTION
‣ >200K events/second sustained
‣ 10K–100K events/second/core

DRUID IN PRODUCTION: CLUSTER SIZE
‣ 120TB of data (~8 trillion raw events)
‣ >2000 cores (>100 nodes, 15TB RAM)
‣ It's cheap: most cost-effective at this scale

DRUID IN PRODUCTION: QUERY LATENCY
[Figure: query latency percentiles (90th, 95th, 99th), query time in seconds, Feb 03 to Feb 24, by datasource a to h]
‣ 500ms average
‣ 90% < 1s
‣ 95% < 5s
‣ 99% < 10s

DRUID IN PRODUCTION: QUERY VOLUME
‣ >1000 queries/minute
‣ Variety of group-by & top-K queries
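For readers unfamiliar with the two query shapes, here is a minimal exact sketch over the sample rows from the dictionary slide: a group-by sums a metric per dimension value, and a top-K keeps only the K largest groups. This is plain Python for illustration, not Druid's query engine or query syntax; the function names are hypothetical.

```python
import heapq
from collections import defaultdict

# (page, city, added) from the sample rows on the dictionary slide.
rows = [
    ("Justin Bieber", "SF", 10), ("Justin Bieber", "SF", 15),
    ("Justin Bieber", "SF", 32), ("Ke$ha", "Calgary", 17),
    ("Ke$ha", "Calgary", 43), ("Ke$ha", "Calgary", 12),
]

def group_by(rows, dim):
    """Sum the `added` metric (last field) per value of dimension `dim`."""
    sums = defaultdict(int)
    for row in rows:
        sums[row[dim]] += row[-1]
    return dict(sums)

def top_k(rows, dim, k):
    """Exact top-K: the k dimension values with the largest sums."""
    return heapq.nlargest(k, group_by(rows, dim).items(), key=lambda kv: kv[1])

print(group_by(rows, 1))  # {'SF': 57, 'Calgary': 72}
print(top_k(rows, 0, 1))  # [('Ke$ha', 72)]
```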

MULTI-TENANCY IS HARD
‣ Prioritize
  • Recent data over old data, short queries over long queries
‣ Make it stop
  • Ability to time out / cancel queries
‣ Small units of computation
  • Keep shards small
‣ Cost/performance trade-offs
  • Put redundant copies on cheaper hardware
  • Tweak the ratio of data in memory vs. on disk
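The first three bullets fit together in a toy scheduler. In this minimal sketch, which is not Druid's scheduler, queries are ordered by data age and then by estimated cost. Each query is a list of per-shard costs, and a hypothetical work `budget` stands in for a timeout. Because the budget is checked between shards, smaller shards let a query be stopped closer to its limit. The cost/performance bullet is not modeled.

```python
import heapq
import itertools

class QueryScheduler:
    """Toy multi-tenant scheduler: newer data first, then cheaper queries,
    and every query gets a work budget so it can be cut off."""

    def __init__(self, budget):
        self.budget = budget           # hypothetical stand-in for a timeout
        self._queue = []
        self._seq = itertools.count()  # tie-breaker: FIFO among equals

    def submit(self, name, data_age_days, shard_costs):
        # heapq pops the smallest key: youngest data, then lowest total cost.
        key = (data_age_days, sum(shard_costs), next(self._seq))
        heapq.heappush(self._queue, (key, name, shard_costs))

    def run(self):
        log = []
        while self._queue:
            _, name, shard_costs = heapq.heappop(self._queue)
            used, status = 0, "done"
            for cost in shard_costs:   # one small unit of work per shard
                if used + cost > self.budget:
                    status = "cancelled"  # stop instead of hogging the cluster
                    break
                used += cost
            log.append((name, status, used))
        return log

sched = QueryScheduler(budget=10)
sched.submit("year-over-year report", data_age_days=365, shard_costs=[3, 3, 3])
sched.submit("huge scan of today", data_age_days=0, shard_costs=[5, 5, 5])
sched.submit("live dashboard", data_age_days=0, shard_costs=[1, 1])
for entry in sched.run():
    print(entry)
# ('live dashboard', 'done', 2)
# ('huge scan of today', 'cancelled', 10)
# ('year-over-year report', 'done', 9)
```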

DRUID AS A PLATFORM
[Diagram: Druid at the center, connected to batch ingestion (Hadoop), streaming ingestion (Storm), approximate algorithms, machine learning (SciPy, R, ScalaNLP), and visualizations]

LOOKING FOR COLLABORATION
‣ Druid is open source!
‣ Multi-tenancy
  • Tied requests, better resource management, query prioritization
‣ Approximate algorithms
  • Must be memory-bounded and allow distributed folding
  • e.g. HyperLogLog, approximate histograms, t-digest
‣ Memory-bounded, fast, distributed joins
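"Distributed folding" means partial sketches built on different nodes must combine into the sketch of the union without rereading raw data, all in fixed memory. A minimal textbook HyperLogLog (after Flajolet et al.) shows both properties: it always uses 2**p registers, and merging is a register-wise max. This is an illustration, not Druid's implementation; the precision `p=12` and the SHA-1-based 64-bit hash are arbitrary choices made here.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog: fixed memory (2**p registers), mergeable."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value):
        x = self._hash64(value)
        width = 64 - self.p
        idx = x >> width                # first p bits choose a register
        rest = x & ((1 << width) - 1)   # remaining bits
        rank = width - rest.bit_length() + 1  # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Folding: register-wise max equals the sketch of the union.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # small-range (linear counting) correction
        return raw

a, b, union = HyperLogLog(), HyperLogLog(), HyperLogLog()
for i in range(6000):
    a.add(f"user-{i}")     # e.g. partial sketch from node A
for i in range(3000, 9000):
    b.add(f"user-{i}")     # overlapping partial sketch from node B
for i in range(9000):
    union.add(f"user-{i}")
folded = a.merge(b)
print(folded.registers == union.registers)  # True: folding is exact
print(round(folded.estimate()))             # close to 9000 distinct users
```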

THANK YOU
