Druid - A Realtime Analytical Data Store

DRUID
A REAL-TIME ANALYTICAL DATA STORE
NISHANT BANGARWA
DRUID COMMITTER
DRUID.IO @DRUIDIO

2014
THIS PRESENTATION
‣ Demo
‣ Problems
‣ Druid Architecture
‣ Benchmarks and numbers
‣ Many hard problems looking for solutions

DEMO
IN CASE THE INTERNET DIDN’T WORK, PRETEND YOU SAW SOMETHING COOL

PROBLEMS

WE HAVE A LOT OF EVENT DATA

timestamp             page           language  city     country  ...  added  deleted
2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA           10     65
2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA           15     62
2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA           32     45
2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA            17     87
2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            43     99
2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            12     53
...

DATA EXPLORATION

DATA INGESTION

AVAILABILITY

WHAT WE TRIED

WHAT WE TRIED
I. RDBMS - Relational Database

I. RDBMS - THE SETUP
‣ Star Schema
‣ Aggregate Tables
‣ Query Caching

I. RDBMS - THE RESULTS
‣ Queries that were cached
  • fast
‣ Queries against aggregate tables
  • fast to acceptable
‣ Queries against base fact table
  • generally unacceptable

I. RDBMS - PERFORMANCE

Naive benchmark scan rate                       ~5.5M rows / second / core
1 day of summarized aggregates                  60M+ rows
1 query over 1 week, 16 cores                   ~5 seconds
Page load with 20 queries over a week of data   long time
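
The "~5 seconds" figure follows from the other numbers on the slide. A minimal back-of-the-envelope sketch, assuming every query scans all rows for the week, the 16 cores parallelize perfectly, and the 20 page queries run one after another (these three assumptions are not stated on the slide):

```python
# Figures taken from the slide.
ROWS_PER_DAY = 60_000_000      # 1 day of summarized aggregates
SCAN_RATE = 5_500_000          # rows / second / core
CORES = 16
DAYS = 7
QUERIES_PER_PAGE = 20

# Assumption: a query is a full scan of one week of rows.
rows_scanned = ROWS_PER_DAY * DAYS
seconds_per_query = rows_scanned / (SCAN_RATE * CORES)

# Assumption: the page's queries are executed sequentially.
seconds_per_page = seconds_per_query * QUERIES_PER_PAGE

print(f"{seconds_per_query:.1f} s per query")
print(f"{seconds_per_page:.0f} s per page load")
```

Under those assumptions a single dashboard page takes on the order of a minute and a half, which is what "long time" means here.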

WHAT WE TRIED
I. RDBMS - Relational Database
II. NoSQL - Key/Value Store

II. NOSQL - THE SETUP
‣ Pre-aggregate all dimensional combinations
‣ Store results in a NoSQL store

ts  gender  age  revenue
1   M       18   US$ 0.15
1   F       25   US$ 1.03
1   F       18   US$ 0.01

Key      Value
1        revenue=$1.19
1,M      revenue=$0.15
1,F      revenue=$1.04
1,18     revenue=$0.16
1,25     revenue=$1.03
1,M,18   revenue=$0.15
1,F,18   revenue=$0.01
1,F,25   revenue=$1.03
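
The key/value table above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `pre_aggregate` is hypothetical, revenue is held in integer cents to avoid floating-point rounding, and keys are built from dimension values alone (as on the slide), which would be ambiguous if two dimensions shared values.

```python
from collections import defaultdict
from itertools import combinations

# (ts, gender, age, revenue in cents), copied from the slide.
events = [
    (1, "M", 18, 15),
    (1, "F", 25, 103),
    (1, "F", 18, 1),
]

def pre_aggregate(rows):
    """Sum revenue for every combination of dimension values per timestamp."""
    totals = defaultdict(int)
    for ts, gender, age, cents in rows:
        dims = (gender, age)
        # Every subset of the dimensions, from none up to all of them.
        for depth in range(len(dims) + 1):
            for combo in combinations(dims, depth):
                totals[(ts,) + combo] += cents
    return dict(totals)

table = pre_aggregate(events)
for key, cents in sorted(table.items(), key=lambda kv: (len(kv[0]), str(kv[0]))):
    print(",".join(map(str, key)), f"revenue=${cents / 100:.2f}")
```

Three input rows with two dimensions already yield eight keys; each extra dimension multiplies the number of combinations written per row.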

II. NOSQL - THE RESULTS
‣ Queries were fast
  • range scan on primary key
‣ Inflexible
  • not aggregated, not available
‣ Not continuously updated
  • aggregate first, then display
‣ Processing scales exponentially

II. NOSQL - PERFORMANCE
‣ Dimensional combinations => exponential increase
‣ Tried limiting dimensional depth
  • still expands exponentially
‣ Example: ~500k records
  • 11 dimensions, 5-deep
    • 4.5 hours on a 15-node Hadoop cluster
  • 14 dimensions, 5-deep
    • 9 hours on a 25-node Hadoop cluster
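
To get a feel for the growth, one can count how many distinct sets of dimensions have to be aggregated. The sketch below counts dimension subsets only; the number of stored keys is larger still, because it also depends on how many values each dimension takes. The helper name `dimension_subsets` is hypothetical.

```python
from math import comb  # Python 3.8+

def dimension_subsets(n_dims, max_depth):
    """Number of ways to pick between 0 and max_depth of n_dims dimensions."""
    return sum(comb(n_dims, k) for k in range(max_depth + 1))

for n_dims in (11, 14):
    limited = dimension_subsets(n_dims, 5)
    unlimited = 2 ** n_dims  # every subset, no depth limit
    print(f"{n_dims} dimensions: {limited} subsets at 5-deep, {unlimited} unlimited")
```

Going from 11 to 14 dimensions at the same depth more than triples the number of groupings to compute (1024 versus 3473).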

WHAT WE TRIED
III. ???

WHAT WE LEARNED
‣ Problem with RDBMS: scans are slow
‣ Problem with NoSQL: computationally intractable
‣ Tackling the RDBMS issue seems easier

INTRODUCING DRUID

WHAT IS DRUID?
‣ Open-source
‣ Column-oriented
‣ Distributed
‣ Fast
‣ Real-time
‣ Highly-Available
‣ Approximate or Exact
‣ Data store

ARCHITECTURE

ARCHITECTURE
[Diagram: Realtime Nodes, Query API]

REAL-TIME NODES
‣ Log-structured merge-tree
‣ Ingest data and buffer events in memory in a write-optimized data structure
‣ Periodically persist collected events to disk (converting to a read-optimized format)
‣ Query data as soon as it is ingested
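
The ingest, persist, and query cycle described above can be sketched in a few lines. This is a toy model rather than Druid's implementation: the class `RealtimeBuffer` and its methods are hypothetical, the "write-optimized" structure is a plain append-only list, and "persisting" just freezes a sorted tuple in memory where a real node would write a file to disk.

```python
from bisect import bisect_left

class RealtimeBuffer:
    def __init__(self, max_buffered):
        self.max_buffered = max_buffered
        self.buffer = []      # write-optimized: unsorted, append-only
        self.persisted = []   # read-optimized: immutable runs sorted by timestamp

    def ingest(self, timestamp, value):
        self.buffer.append((timestamp, value))
        if len(self.buffer) >= self.max_buffered:
            self.persist()

    def persist(self):
        """Freeze the buffered events into a sorted, immutable run."""
        if self.buffer:
            self.persisted.append(tuple(sorted(self.buffer)))
            self.buffer = []

    def query_sum(self, start, end):
        """Sum values with start <= timestamp < end, including unpersisted events."""
        total = sum(v for t, v in self.buffer if start <= t < end)
        for run in self.persisted:
            # (start,) sorts before any (start, value), so this finds the range bounds.
            lo = bisect_left(run, (start,))
            hi = bisect_left(run, (end,))
            total += sum(v for _, v in run[lo:hi])
        return total

node = RealtimeBuffer(max_buffered=2)
node.ingest(1, 10)
node.ingest(2, 15)   # buffer is full, so it is persisted
node.ingest(3, 32)   # still in memory, but already queryable
print(node.query_sum(1, 4))
```

The point of the split is that writes stay cheap (an append) while reads over persisted runs can use binary search instead of a full scan.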

ARCHITECTURE
[Diagram: Realtime Nodes (Query API), Historical Nodes (Query API); label: Hand Off Data]

HISTORICAL NODES
‣ Main workhorses of a Druid cluster
‣ Shared-nothing architecture
‣ Load immutable read-optimized data
‣ Respond to queries

ARCHITECTURE
[Diagram: Realtime Nodes (Query API), Historical Nodes (Query API), Broker Nodes (Query API; Query Rewrite, Scatter/Gather); label: Hand Off Data]

BROKER NODES
‣ Query scatter/gather (send requests to nodes and merge results)
‣ (Distributed) Caching
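
Scatter/gather can be illustrated with a small in-process sketch. Everything here is hypothetical (node names, the `query_node` and `broker_query` functions, and the choice of a thread pool); a real broker talks to nodes over the network and also handles caching, which is left out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical nodes, each holding some (page, added) rows.
NODES = {
    "realtime-1": [("Ke$ha", 12)],
    "historical-1": [("Justin Bieber", 10), ("Justin Bieber", 15), ("Justin Bieber", 32)],
    "historical-2": [("Ke$ha", 17), ("Ke$ha", 43)],
}

def query_node(rows):
    """Each node computes a partial result over its own data only."""
    partial = Counter()
    for page, added in rows:
        partial[page] += added
    return partial

def broker_query(nodes):
    # Scatter: send the same query to every node in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(query_node, nodes.values()))
    # Gather: merge the partial sums into one answer.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)

result = broker_query(NODES)
print(result)
```

This works because a sum can be merged from partial sums; aggregates that cannot be merged this way need a different representation.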

ARCHITECTURE
[Diagram: Data Stream, Real-time Nodes, Historical Nodes, Broker Nodes, Coordinator Nodes, Queries; label: Hand Off Data]

COORDINATOR NODES
• Distributes data across historical nodes
• Assigns historical nodes to load & drop data
• Manages replication
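
As a rough illustration of distribution with replication, the sketch below places each segment on the least-loaded nodes. It is not Druid's balancing algorithm, which the slides do not describe; the function `assign_segments`, the segment names, and the node names are all hypothetical.

```python
def assign_segments(segments, nodes, replicas):
    """Place each segment on `replicas` distinct nodes, least-loaded first."""
    if replicas > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    assignment = {node: [] for node in nodes}
    for segment in segments:
        # Ties are broken by node name so the result is deterministic.
        targets = sorted(nodes, key=lambda n: (len(assignment[n]), n))[:replicas]
        for node in targets:
            assignment[node].append(segment)
    return assignment

plan = assign_segments(["seg-a", "seg-b", "seg-c"], ["h1", "h2", "h3"], replicas=2)
for node, segs in plan.items():
    print(node, segs)
```

With two replicas per segment, losing any single node still leaves one copy of every segment available.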

DATA

timestamp             page           language  city     country  ...  added  deleted
2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA           10     65
2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA           15     62
2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA           32     45
2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA            17     87
2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            43     99
2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            12     53
...

COLUMN COMPRESSION · DICTIONARIES
‣ Create ids
  • Justin Bieber -> 0, Ke$ha -> 1
‣ Store
  • page -> [0 0 0 1 1 1]
  • language -> [0 0 0 0 0 0]
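
Dictionary encoding of the page and language columns can be sketched as follows. The function name `dictionary_encode` is hypothetical, and ids are handed out in first-seen order, which is an assumption of this sketch rather than something the slide specifies.

```python
def dictionary_encode(values):
    """Replace each distinct string with a small integer id."""
    ids = {}
    encoded = []
    for value in values:
        if value not in ids:
            ids[value] = len(ids)   # next unused id
        encoded.append(ids[value])
    return ids, encoded

page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
language = ["en"] * 6

page_ids, page_col = dictionary_encode(page)
lang_ids, lang_col = dictionary_encode(language)
print(page_ids, page_col)
print(lang_ids, lang_col)
```

The column now stores small integers instead of repeated strings, and each distinct string is stored once in the dictionary.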

BITMAP INDICES
‣ Justin Bieber -> [0, 1, 2] -> [111000]
‣ Ke$ha -> [3, 4, 5] -> [000111]
‣ Compressed with CONCISE
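
Building such an index amounts to recording, for each distinct value, which rows contain it. A minimal sketch with a hypothetical `build_bitmap_index` function; the bitmaps are plain uncompressed Python lists, so the CONCISE compression step is not shown.

```python
def build_bitmap_index(column):
    """Map each distinct value to a bitmap with a 1 at every row holding it."""
    index = {}
    for row, value in enumerate(column):
        bitmap = index.setdefault(value, [0] * len(column))
        bitmap[row] = 1
    return index

page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
index = build_bitmap_index(page)
for value, bitmap in index.items():
    print(value, "".join(map(str, bitmap)))
```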

FAST AND FLEXIBLE QUERIES

row  page
0    Justin Bieber
1    Justin Bieber
2    Ke$ha
3    Ke$ha

JUSTIN BIEBER           [1, 1, 0, 0]
KE$HA                   [0, 0, 1, 1]
JUSTIN BIEBER OR KE$HA  [1, 1, 1, 1]
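
With one bitmap per value, a boolean filter becomes a bitwise operation over the bitmaps, and only the matching rows need to be read afterwards. A small sketch using plain lists (the helper names are hypothetical):

```python
def bitmap_or(a, b):
    return [x | y for x, y in zip(a, b)]

def bitmap_and(a, b):
    return [x & y for x, y in zip(a, b)]

def matching_rows(bitmap):
    """Row numbers whose bit is set."""
    return [row for row, bit in enumerate(bitmap) if bit]

justin = [1, 1, 0, 0]
kesha = [0, 0, 1, 1]

either = bitmap_or(justin, kesha)
both = bitmap_and(justin, kesha)
print(either, matching_rows(either))
print(both, matching_rows(both))
```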

DRUID IN PRACTICE

DRUID IN PRODUCTION
REALTIME INGESTION
>200K EVENTS / SECOND SUSTAINED
10 – 100K EVENTS / SECOND / CORE

DRUID IN PRODUCTION
CLUSTER SIZE
150TB OF DATA (~10 TRILLION RAW EVENTS)
>2000 CORES (>100 NODES, 15TB RAM)
IT’S CHEAP
MOST COST EFFECTIVE AT THIS SCALE

DRUID IN PRODUCTION
QUERY LATENCY (500MS AVERAGE)
90% < 1S
95% < 5S
99% < 10S
[Figure: Query latency percentiles (90%ile, 95%ile, 99%ile) for datasources a to h; x-axis: time (Feb 03 to Feb 24), y-axis: query time (seconds)]

DRUID IN PRODUCTION
QUERY VOLUME
>1000 QUERIES / MINUTE
VARIETY OF GROUP BY & TOP-K QUERIES

AVAILABILITY
REPLICATION
ROLLING DEPLOYMENTS + RESTARTS
GROW = START PROCESSES
SHRINK = KILL PROCESSES
2 YEARS · NO DOWNTIME FOR SOFTWARE UPDATE

DRUID AS A PLATFORM
[Diagram: Druid, Batch Ingestion (Hadoop), Streaming Ingestion (Storm), Approximate Algorithms, Machine Learning (SciPy, R, ScalaNLP), Visualizations]

LOOKING FOR COLLABORATION
‣ Druid is open source
‣ Multi-tenancy
  • Tied Requests, Better resource management, query prioritization
‣ Approximate Algorithms
  • must be memory bounded and allow distributed folding
  • e.g. HyperLogLog, approximate histograms, t-digest
‣ Memory bounded, fast, distributed Joins
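
"Memory bounded" and "distributed folding" can be made concrete with HyperLogLog-style registers. The sketch below is a simplified illustration and not Druid's implementation: the register count, the SHA-1 hash, and the function names are assumptions, and the final cardinality-estimation step is left out. What it shows is that the sketch has a fixed size and that two sketches built on different nodes fold into exactly the sketch of the combined data.

```python
import hashlib

P = 10            # assumption: 2**10 = 1024 registers, fixed for any input size
M = 1 << P
HASH_BITS = 64

def new_sketch():
    return [0] * M

def add(registers, value):
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    h = int.from_bytes(digest[:8], "big")       # 64-bit hash
    index = h & (M - 1)                         # low P bits choose a register
    rest = h >> P                               # remaining 54 bits
    # Rank = position of the first 1-bit in the remaining bits (1-based).
    rank = (HASH_BITS - P) - rest.bit_length() + 1
    if rank > registers[index]:
        registers[index] = rank

def fold(a, b):
    """Merge two sketches: element-wise maximum of their registers."""
    return [max(x, y) for x, y in zip(a, b)]

# Two nodes see overlapping sets of users.
node_a, node_b, combined = new_sketch(), new_sketch(), new_sketch()
for i in range(0, 6000):
    add(node_a, f"user-{i}")
    add(combined, f"user-{i}")
for i in range(4000, 10000):
    add(node_b, f"user-{i}")
    add(combined, f"user-{i}")

print(fold(node_a, node_b) == combined)
```

Because folding is an element-wise maximum, it is order-independent and duplicates across nodes do no harm, which is what lets a broker merge partial results.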

THANK YOU

ACKNOWLEDGEMENTS
• Eric Tschetter
• Fangjin Yang
• Xavier Léauté
• Gian Merlino
Fangjin Yang & Nelson Ray 2013

ARCHITECTURE
[Diagram: Data Stream, Real-time Nodes, Compute Nodes, Broker Nodes, Master Nodes, Zookeeper, MySQL, Deep Storage, Queries]
