DRUID
A REAL-TIME ANALYTICAL DATA STORE
ERIC TSCHETTER · FANGJIN YANG · XAVIER LÉAUTÉ
DRUID COMMITTERS
DRUID.IO
@DRUIDIO
MOTIVATION
2013
THIS PRESENTATION
‣ Description of a production system
‣ Benchmarks and numbers
‣ Many hard problems looking for solutions
But First, a DEMO
DATA EXPLORATION
WHAT IS DRUID?
‣ Open-source
‣ Column-oriented
‣ Distributed
‣ Approximate or Exact
‣ Real-time
‣ Highly-Available
‣ Data store
I.E., IT IS SIMILAR/COMPLEMENTARY TO
‣ Apache Hadoop
‣ Apache Storm
‣ Elasticsearch
‣ Apache Spark
‣ Facebook Presto
‣ Google Dremel
‣ Google PowerDrill
‣ etc.
REAL-TIME NODES
‣ Log-structured merge-tree
‣ Ingest data and buffer events in memory in a write-optimized data structure
‣ Periodically persist collected events to disk (converting to a read-optimized format)
‣ Query data as soon as it is ingested
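The ingest / persist / query cycle of a real-time node can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Druid's implementation: the class name `RealtimeNodeSketch`, the count-based `max_buffered` persist trigger (the slide only says "periodically"), and the use of in-memory tuples in place of on-disk files are all hypothetical.

```python
from bisect import bisect_left


class RealtimeNodeSketch:
    """Toy model: buffer events in memory, persist them as sorted immutable runs."""

    def __init__(self, max_buffered=3):
        self.max_buffered = max_buffered  # hypothetical persist threshold
        self.buffer = []                  # write-optimized: append only
        self.persisted = []               # read-optimized: time-sorted tuples

    def ingest(self, timestamp, value):
        self.buffer.append((timestamp, value))
        if len(self.buffer) >= self.max_buffered:
            self.persist()

    def persist(self):
        # Convert the buffer into an immutable, time-sorted run.
        # (A real node would write this to disk; here it stays in memory.)
        if self.buffer:
            self.persisted.append(tuple(sorted(self.buffer)))
            self.buffer = []

    def query_sum(self, start, end):
        """Sum values with start <= timestamp < end over buffer and runs."""
        total = sum(v for t, v in self.buffer if start <= t < end)
        for run in self.persisted:
            times = [t for t, _ in run]
            lo = bisect_left(times, start)   # binary search works because
            hi = bisect_left(times, end)     # each run is sorted by time
            total += sum(v for _, v in run[lo:hi])
        return total


node = RealtimeNodeSketch(max_buffered=3)
for ts, added in [(1, 10), (2, 15), (3, 32), (4, 17)]:
    node.ingest(ts, added)

# The fourth event is still in the buffer, yet it is already queryable.
total = node.query_sum(1, 5)
```

The point of the split is that appends stay cheap while range scans over persisted runs can use binary search.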
ARCHITECTURE
[Diagram: Realtime Nodes and Historical Nodes, each exposing a Query API; Realtime Nodes hand off data to Historical Nodes]
HISTORICAL NODES
‣ Main workhorses of a Druid cluster
‣ Shared-nothing architecture
‣ Load immutable read-optimized data
‣ Respond to queries
ARCHITECTURE
[Diagram: Broker Nodes (Query API, query rewrite, scatter/gather) in front of Realtime Nodes and Historical Nodes, each with its own Query API; Realtime Nodes hand off data to Historical Nodes]
BROKER NODES
‣ Query scatter/gather (send requests to nodes and merge results)
‣ (Distributed) Caching
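Scatter/gather with a result cache might look roughly like the sketch below. All names (`NodeSketch`, `BrokerSketch`) are hypothetical, the cache is a plain in-process dictionary rather than a distributed one, and caching per-node partial results is only safe here because the toy nodes hold data that never changes.

```python
from concurrent.futures import ThreadPoolExecutor


class NodeSketch:
    """Hypothetical stand-in for a node that holds part of the data."""

    def __init__(self, rows):
        self.rows = rows  # list of (page, added) pairs held by this node

    def query(self, page):
        # Partial aggregate over this node's local data only.
        return sum(added for p, added in self.rows if p == page)


class BrokerSketch:
    def __init__(self, nodes):
        self.nodes = nodes
        self.cache = {}  # (node index, page) -> partial result

    def _partial(self, index, page):
        key = (index, page)
        if key not in self.cache:
            self.cache[key] = self.nodes[index].query(page)
        return self.cache[key]

    def query(self, page):
        # Scatter: one request per node. Gather: merge the partial sums.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = list(
                pool.map(lambda i: self._partial(i, page), range(len(self.nodes)))
            )
        return sum(partials)


broker = BrokerSketch([
    NodeSketch([("Justin Bieber", 10), ("Ke$ha", 17)]),
    NodeSketch([("Justin Bieber", 15), ("Justin Bieber", 32)]),
])
first = broker.query("Justin Bieber")   # computed on the nodes
second = broker.query("Justin Bieber")  # served from the broker cache
```

Merging works because a sum of per-node sums equals the sum over all rows; aggregates without that property need a different merge step.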
COLUMN COMPRESSION · DICTIONARIES
‣ Create ids
• Justin Bieber -> 0, Ke$ha -> 1
‣ Store
• page -> [0 0 0 1 1 1]
• language -> [0 0 0 0 0 0]
timestamp             page           language  city     country  ...  added  deleted
2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA           10     65
2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA           15     62
2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA           32     45
2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA            17     87
2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            43     99
2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            12     53
...
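Dictionary encoding of the page and language columns can be reproduced with a short Python sketch. The function name `dictionary_encode` is hypothetical, and assigning ids in order of first appearance is an assumption chosen to match the ids shown on the slide.

```python
def dictionary_encode(values):
    """Map each distinct string to a small integer id and encode the column."""
    ids = {}
    encoded = []
    for value in values:
        if value not in ids:
            ids[value] = len(ids)  # next unused id, in order of first appearance
        encoded.append(ids[value])
    return ids, encoded


page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
language = ["en"] * 6

page_ids, page_column = dictionary_encode(page)
language_ids, language_column = dictionary_encode(language)
```

The encoded columns hold only small integers, which are cheaper to store and compress than repeated strings.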
BITMAP INDICES
‣ Justin Bieber -> [0, 1, 2] -> [111000]
‣ Ke$ha -> [3, 4, 5] -> [000111]
‣ Compressed with CONCISE
timestamp             page           language  city     country  ...  added  deleted
2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA           10     65
2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA           15     62
2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA           32     45
2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA            17     87
2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            43     99
2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            12     53
...
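A minimal sketch of building and using the bitmaps above, with plain Python lists standing in for bitmaps. The helper names are hypothetical, and CONCISE compression is left out entirely.

```python
def build_bitmaps(column):
    """Return {value: bitmap}, where bitmap[row] is 1 if the row has that value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[row] = 1
    return bitmaps


def bitmap_or(a, b):
    return [x | y for x, y in zip(a, b)]


def as_text(bitmap):
    return "".join(str(bit) for bit in bitmap)


page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
added = [10, 15, 32, 17, 43, 12]

bitmaps = build_bitmaps(page)

# Filter "page = Ke$ha" by reading the bitmap instead of scanning strings.
kesha_added = sum(a for a, bit in zip(added, bitmaps["Ke$ha"]) if bit)

# Boolean filters become bitwise operations on the bitmaps.
either = bitmap_or(bitmaps["Justin Bieber"], bitmaps["Ke$ha"])
```

Only the rows whose bit is set need to be read from the metric columns.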
2014
DRUID IN PRODUCTION
REALTIME INGESTION
>200K EVENTS / SECOND SUSTAINED
10 – 100K EVENTS / SECOND / CORE
DRUID IN PRODUCTION
CLUSTER SIZE
120TB OF DATA (~8 TRILLION RAW EVENTS)
>2000 CORES (>100 NODES, 15TB RAM)
IT’S CHEAP
MOST COST EFFECTIVE AT THIS SCALE
DRUID IN PRODUCTION
QUERY LATENCY (500MS AVERAGE)
90% < 1S 95% < 5S 99% < 10S
[Figure: Query latency percentiles. Panels: 90%ile, 95%ile, 99%ile. x-axis: time (Feb 03 to Feb 24); y-axis: query time (seconds); one series per datasource (a to h)]
DRUID IN PRODUCTION
QUERY VOLUME
>1000 QUERIES / MINUTE
VARIETY OF GROUP BY & TOP-K QUERIES
MULTI-TENANCY IS HARD
‣ Prioritize
• Recent data over old data, short queries over long queries
‣ Make it Stop
• Ability to time-out / cancel queries
‣ Small units of computation
• Keep shards small
‣ Cost / performance trade-offs
• Put redundant copies on cheaper hardware
• Tweak ratio of data in memory / on-disk
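The "prioritize" and "make it stop" points can be illustrated with a small scheduler sketch. Everything here is an assumption for illustration: the class name `QuerySchedulerSketch`, the lexicographic priority (data age first, then estimated cost), the simulated clock, and the rule that a query is cancelled when it cannot finish before its timeout.

```python
import heapq
import itertools


class QuerySchedulerSketch:
    """Toy single-worker scheduler: recent data and short queries go first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps submission order

    def submit(self, name, data_age_days, estimated_cost, timeout):
        # Lower tuples pop first: recent data before old, short before long.
        priority = (data_age_days, estimated_cost)
        entry = (priority, next(self._counter), name, estimated_cost, timeout)
        heapq.heappush(self._heap, entry)

    def run_all(self):
        """Run queued queries on a simulated clock; cancel those that would time out."""
        clock = 0
        ran, cancelled = [], []
        while self._heap:
            _, _, name, cost, timeout = heapq.heappop(self._heap)
            if clock + cost > timeout:
                cancelled.append(name)  # would finish after its timeout
            else:
                clock += cost
                ran.append(name)
        return ran, cancelled


scheduler = QuerySchedulerSketch()
scheduler.submit("old-long", data_age_days=30, estimated_cost=5, timeout=6)
scheduler.submit("recent-short", data_age_days=0, estimated_cost=1, timeout=10)
scheduler.submit("recent-long", data_age_days=0, estimated_cost=4, timeout=10)
scheduler.submit("old-short", data_age_days=30, estimated_cost=1, timeout=10)
ran, cancelled = scheduler.run_all()
```

Keeping shards small matters for a scheme like this because a worker can only re-prioritize between units of work, not in the middle of one.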
DRUID AS A PLATFORM
[Diagram: Druid at the center, connected to Batch Ingestion (Hadoop), Streaming Ingestion (Storm), Approximate Algorithms, Machine Learning (SciPy, R, ScalaNLP), and Visualizations]
LOOKING FOR COLLABORATION
‣ Druid is open source
‣ Multi-tenancy
• Tied requests, better resource management, query prioritization
‣ Approximate Algorithms
• Must be memory-bounded and allow distributed folding
• e.g. HyperLogLog, approximate histograms, t-digest
‣ Memory-bounded, fast, distributed joins
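To make "memory bounded" and "distributed folding" concrete, here is a minimal HyperLogLog sketch in Python. It is a textbook-style illustration, not Druid's implementation: the class name, the choice of SHA-1 as the hash, and the default of 2^12 registers are assumptions.

```python
import hashlib
import math


class HyperLogLogSketch:
    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m  # fixed size, whatever the input size

    def add(self, item):
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        width = 64 - self.p
        index = h >> width                         # first p bits pick a register
        rest = h & ((1 << width) - 1)              # remaining bits
        rank = width - rest.bit_length() + 1       # position of leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        # Folding: the register-wise max of two sketches is exactly the
        # sketch of the union of their inputs.
        merged = HyperLogLogSketch(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


# Two "nodes" each see part of the data, with some overlap.
node_a, node_b, direct = HyperLogLogSketch(), HyperLogLogSketch(), HyperLogLogSketch()
for i in range(0, 15000):
    node_a.add(i)
for i in range(10000, 25000):
    node_b.add(i)
for i in range(0, 25000):
    direct.add(i)

merged = node_a.merge(node_b)
approx_distinct = merged.estimate()  # true distinct count is 25000
```

Each node ships only its fixed-size register array, and the merged sketch is identical to one built over all the data in a single place.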