Slide 1

DRUID: INTERACTIVE ANALYTICS AT SCALE
FANGJIN YANG · DRUID COMMITTER

Slide 2

OVERVIEW
DEMO · SEE SOME NEAT THINGS
MOTIVATION · WHY DRUID?
ARCHITECTURE · PICTURES WITH ARROWS
COMMUNITY · CONTRIBUTE TO DRUID

Slide 3

DEMO
IN CASE THE INTERNET DIDN’T WORK: PRETEND YOU SAW SOMETHING COOL

Slide 4

MOTIVATION

Slide 5

2013 THE PROBLEM
‣ Arbitrary and interactive exploration of event data
  • Online advertising
  • System/application metrics
  • Network traffic monitoring
  • Activity stream analysis
‣ Multi-tenancy: lots of concurrent users
‣ Scalability: 20+ TB/day, ad-hoc queries on trillions of events
‣ Recency matters! Real-time analysis

Slide 6

2013 FINDING A SOLUTION
‣ Load all your data into Hadoop. Query it. Done!
‣ Good job guys, let’s go home

Slide 7

2013 FINDING A SOLUTION
[Diagram: Event Streams → Hadoop → Insight]

Slide 8

2013 PROBLEMS WITH THE NAIVE SOLUTION
‣ MapReduce can handle almost every distributed computing problem
‣ MapReduce over your raw data is flexible but slow
‣ Hadoop is not optimized for query latency
‣ To optimize queries, we need a query layer

Slide 9

2013 FINDING A SOLUTION
[Diagram: Event Streams → Hadoop (pre-processing and storage) → Query Layer → Insight]

Slide 10

2013 MAKE QUERIES FASTER
‣ What types of queries to optimize for?
  • Revenue over time broken down by demographic
  • Top publishers by clicks over the last month
  • Number of unique visitors broken down by any dimension
  • Not dumping the entire dataset
  • Not examining individual events

Slide 11

WHAT WE TRIED

Slide 12

2013 FINDING A SOLUTION
[Diagram: Event Streams → Hadoop (pre-processing and storage) → RDBMS → Insight]

Slide 13

2013 I. RDBMS - THE SETUP
‣ Common solution in data warehousing:
  • Star Schema
  • Aggregate Tables
  • Query Caching

Slide 14

2013 I. RDBMS - THE RESULTS
‣ Queries that were cached
  • fast
‣ Queries against aggregate tables
  • fast to acceptable
‣ Queries against base fact table
  • generally unacceptable

Slide 15

2013 I. RDBMS - PERFORMANCE
Naive benchmark scan rate: ~5.5M rows / second / core
1 day of summarized aggregates: 60M+ rows
1 query over 1 week, 16 cores: ~5 seconds
Page load with 20 queries over a week of data: a long time
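
The figures above roughly hang together; here is a back-of-envelope sanity check (a sketch: the scan rate, core count, and row counts are the slide's numbers, the way they are combined is our own assumption):

```python
# Back-of-envelope check of the RDBMS figures above (the rates and row
# counts come from the slide; how they combine here is an assumption).
scan_rate_per_core = 5.5e6        # rows / second / core (naive benchmark)
cores = 16
rows_per_day = 60e6               # summarized aggregate rows for one day
days = 7
queries_per_page = 20

rows_per_week = rows_per_day * days                          # ~420M rows
seconds_per_query = rows_per_week / (scan_rate_per_core * cores)
print(f"one query over a week: ~{seconds_per_query:.1f} s")  # ~4.8 s
print(f"page with {queries_per_page} queries, run serially: "
      f"~{seconds_per_query * queries_per_page:.0f} s")      # ~95 s
```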

Slide 16

2013 FINDING A SOLUTION
[Diagram: Event Streams → Hadoop (pre-processing and storage) → NoSQL K/V Stores → Insight]

Slide 17

2013 II. NOSQL - THE SETUP
‣ Pre-aggregate all dimensional combinations
‣ Store results in a NoSQL store

Input rows:
  ts  gender  age  revenue
  1   M       18   $0.15
  1   F       25   $1.03
  1   F       18   $0.01

Pre-aggregated key/value pairs:
  Key      Value
  1        revenue=$1.19
  1,M      revenue=$0.15
  1,F      revenue=$1.04
  1,18     revenue=$0.16
  1,25     revenue=$1.03
  1,M,18   revenue=$0.15
  1,F,18   revenue=$0.01
  1,F,25   revenue=$1.03
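
A minimal sketch of the pre-aggregation described on this slide, using its three sample rows (the roll-up loop is illustrative, not the production pipeline):

```python
from itertools import combinations
from collections import defaultdict

# Sample rows from the slide: (ts, gender, age, revenue)
rows = [(1, "M", "18", 0.15), (1, "F", "25", 1.03), (1, "F", "18", 0.01)]
dims = ("gender", "age")

aggregates = defaultdict(float)
for ts, gender, age, revenue in rows:
    values = {"gender": gender, "age": age}
    # Roll up every combination of dimensions (including none) per timestamp.
    for depth in range(len(dims) + 1):
        for combo in combinations(dims, depth):
            key = ",".join([str(ts)] + [values[d] for d in combo])
            aggregates[key] += revenue

for key, revenue in sorted(aggregates.items()):
    print(f"{key} -> revenue=${revenue:.2f}")
# With d dimensions this produces up to 2**d keys per row --
# the exponential blow-up the next slides describe.
```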

Slide 18

2013 II. NOSQL - THE RESULTS
‣ Queries were fast
  • range scan on primary key
‣ Inflexible
  • not aggregated, not available
‣ Not continuously updated
  • aggregate first, then display
‣ Processing scales exponentially

Slide 19

2013 II. NOSQL - PERFORMANCE
‣ Dimensional combinations => exponential increase
‣ Tried limiting dimensional depth
  • still expands exponentially
‣ Example: ~500k records
  • 11 dimensions, 5-deep: 4.5 hours on a 15-node Hadoop cluster
  • 14 dimensions, 5-deep: 9 hours on a 25-node Hadoop cluster
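
The blow-up is easy to quantify: even with a depth limit of 5, each record still expands into every combination of up to 5 of its dimensions. The dimension counts below are from the slide; the counting itself is standard combinatorics:

```python
from math import comb

# Output keys produced per record when keys use at most `depth` dimensions.
def combos_per_record(num_dims, depth):
    return sum(comb(num_dims, k) for k in range(depth + 1))

print(combos_per_record(11, 5))   # 1024 keys per record (11 dims, 5-deep)
print(combos_per_record(14, 5))   # 3473 keys per record (14 dims, 5-deep)
```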

Slide 20

2013 FINDING A SOLUTION
[Diagram: Event Streams → Hadoop (pre-processing and storage) → Commercial Databases → Insight]

Slide 21

DRUID AS A QUERY LAYER

Slide 22

2013 DRUID · KEY FEATURES
LOW LATENCY INGESTION
FAST AGGREGATIONS
ARBITRARY SLICE-N-DICE CAPABILITIES
HIGHLY AVAILABLE
APPROXIMATE & EXACT CALCULATIONS

Slide 23

DATA STORAGE

Slide 24

2013 DATA!
  timestamp              page           language  city     country  ...  added  deleted
  2011-01-01T00:01:35Z   Justin Bieber  en        SF       USA           10     65
  2011-01-01T00:03:63Z   Justin Bieber  en        SF       USA           15     62
  2011-01-01T00:04:51Z   Justin Bieber  en        SF       USA           32     45
  2011-01-01T01:00:00Z   Ke$ha          en        Calgary  CA            17     87
  2011-01-01T02:00:00Z   Ke$ha          en        Calgary  CA            43     99
  2011-01-01T02:00:00Z   Ke$ha          en        Calgary  CA            12     53
  ...

Slide 25

2013 PARTITION DATA
‣ Shard data by time
‣ Immutable chunks of data called “segments”
  Segment 2011-01-01T00/2011-01-01T01
  Segment 2011-01-01T01/2011-01-01T02
  Segment 2011-01-01T02/2011-01-01T03
(same sample table as Slide 24, split into one segment per hour)
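
A toy illustration of the time-sharding idea (not Druid's actual segment format): bucket events into hourly chunks, each of which becomes an immutable segment.

```python
from collections import defaultdict
from datetime import datetime

# Toy time-sharding: one immutable bucket per hour (illustrative only).
events = [
    {"timestamp": "2011-01-01T00:01:35Z", "page": "Justin Bieber", "added": 10},
    {"timestamp": "2011-01-01T01:00:00Z", "page": "Ke$ha", "added": 17},
    {"timestamp": "2011-01-01T02:00:00Z", "page": "Ke$ha", "added": 43},
]

segments = defaultdict(list)
for event in events:
    ts = datetime.strptime(event["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    hour = ts.replace(minute=0, second=0)
    segments[hour.strftime("%Y-%m-%dT%H")].append(event)

for interval, rows in sorted(segments.items()):
    print(f"Segment {interval}: {len(rows)} event(s)")
```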

Slide 26

2013 IMMUTABLE SEGMENTS
‣ Fundamental storage unit in Druid
‣ No contention between reads and writes
‣ One thread scans one segment
‣ Multiple threads can access same underlying data
‣ Segment sizes -> computation completes in 100s of ms
‣ Simplifies distribution & replication

Slide 27

2013 COLUMNAR STORAGE
‣ Scan/load only what you need
‣ Compression!
‣ Indexes!
(same sample table as Slide 24, stored column by column)

Slide 28

2013 COLUMN COMPRESSION · DICTIONARIES
‣ Create ids
  • Justin Bieber -> 0, Ke$ha -> 1
‣ Store
  • page -> [0 0 0 1 1 1]
  • language -> [0 0 0 0 0 0]
(using the sample table from Slide 24)
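
A minimal sketch of the dictionary encoding shown above, applied to the page and language columns of the sample table:

```python
# Dictionary-encode a string column: map each distinct value to an id,
# then store the column as an array of ids.
def dictionary_encode(column):
    dictionary = {}
    encoded = []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)   # Justin Bieber -> 0, Ke$ha -> 1
        encoded.append(dictionary[value])
    return dictionary, encoded

page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
language = ["en"] * 6

print(dictionary_encode(page))       # ({'Justin Bieber': 0, 'Ke$ha': 1}, [0, 0, 0, 1, 1, 1])
print(dictionary_encode(language))   # ({'en': 0}, [0, 0, 0, 0, 0, 0])
```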

Slide 29

2013 BITMAP INDICES
‣ Justin Bieber -> [0, 1, 2] -> [111000]
‣ Ke$ha -> [3, 4, 5] -> [000111]
(using the sample table from Slide 24)

Slide 30

2013 FAST AND FLEXIBLE QUERIES
  row  page
  0    Justin Bieber
  1    Justin Bieber
  2    Ke$ha
  3    Ke$ha
JUSTIN BIEBER            [1, 1, 0, 0]
KE$HA                    [0, 0, 1, 1]
JUSTIN BIEBER OR KE$HA   [1, 1, 1, 1]
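
A minimal sketch of the same idea in code, using the four rows on this slide (plain 0/1 lists stand in for real bitmap structures):

```python
# Build one bitmap (here just a list of 0/1 flags) per dimension value,
# then answer "Justin Bieber OR Ke$ha" with a bitwise OR per row.
pages = ["Justin Bieber", "Justin Bieber", "Ke$ha", "Ke$ha"]

bitmaps = {}
for row, value in enumerate(pages):
    bitmaps.setdefault(value, [0] * len(pages))[row] = 1

bieber, kesha = bitmaps["Justin Bieber"], bitmaps["Ke$ha"]
either = [a | b for a, b in zip(bieber, kesha)]

print(bieber)   # [1, 1, 0, 0]
print(kesha)    # [0, 0, 1, 1]
print(either)   # [1, 1, 1, 1] -> rows matching Justin Bieber OR Ke$ha
```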

Slide 31

2013 BITMAP INDEX COMPRESSION
‣ Supports CONCISE and Roaring
  • Boolean operations directly on compressed indices
  • Less memory => faster scan rates
‣ More details
  • http://ricerca.mat.uniroma3.it/users/colanton/concise.html
  • http://roaringbitmap.org/

Slide 32

ARCHITECTURE

Slide 33

2013 ARCHITECTURE (BATCH ONLY)
[Diagram: Data → Hadoop → Segments → Historical Nodes (×3)]

Slide 34

2013 HISTORICAL NODES
‣ Main workhorses of a Druid cluster
‣ Shared-nothing architecture
‣ Load immutable read-optimized data
‣ Respond to queries

Slide 35

2013 ARCHITECTURE (BATCH ONLY)
[Diagram: Data → Hadoop → Segments → Historical Nodes (×3); Queries → Broker Nodes (×2) → Historical Nodes]

Slide 36

2013 BROKER NODES
‣ Knows which nodes hold what data
‣ Query scatter/gather (send requests to nodes and merge results)
‣ Caching
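
A toy sketch of the scatter/gather pattern (illustrative only, not the broker's actual code or API): fan the query out to the nodes holding relevant data and merge their partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Each "historical node" returns partial sums for the segments it holds;
# the broker fans out the query (scatter) and merges the results (gather).
node_data = {
    "historical-1": {"Justin Bieber": 57},
    "historical-2": {"Justin Bieber": 32, "Ke$ha": 60},
    "historical-3": {"Ke$ha": 55},
}

def query_node(node):
    return node_data[node]            # stand-in for an HTTP call to the node

def broker_query(nodes):
    merged = {}
    with ThreadPoolExecutor() as pool:              # scatter
        for partial in pool.map(query_node, nodes):
            for key, value in partial.items():      # gather / merge
                merged[key] = merged.get(key, 0) + value
    return merged

print(broker_query(list(node_data)))   # {'Justin Bieber': 89, 'Ke$ha': 115}
```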

Slide 37

2013 EVOLVING A SOLUTION
[Diagram: Event Streams → Hadoop (pre-processing and storage) → Druid → Insight]

Slide 38

2013 MORE PROBLEMS
‣ We’ve solved the query problem
  • Druid gave us arbitrary data exploration & fast queries
‣ But what about data freshness?
  • Batch loading is slow!
  • We want “real-time”
  • Alerts, operational monitoring, etc.

Slide 39

2013 FAST LOADING WITH DRUID
‣ We have an indexing system
‣ We have a serving system that runs queries on data
‣ We can serve queries while building indexes!
‣ Real-time indexing workers do this

Slide 40

2013 REAL-TIME NODES
‣ Log-structured merge-tree
‣ Ingest data and buffer events in memory in a write-optimized data structure
‣ Periodically persist collected events to disk (converting to a read-optimized format)
‣ Query data as soon as it is ingested
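
A highly simplified sketch of that ingestion pattern (buffer writes in memory, periodically persist immutable chunks, answer queries over both); purely illustrative, not Druid's real-time node implementation:

```python
# Simplified write-buffer + immutable-chunk pattern (illustrative only):
# events land in a write-optimized in-memory buffer, get flushed to
# read-only chunks periodically, and queries scan both.
class RealtimeIndex:
    def __init__(self, flush_threshold=1000):
        self.buffer = []          # write-optimized, in-memory
        self.persisted = []       # read-optimized, immutable chunks
        self.flush_threshold = flush_threshold

    def ingest(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.flush_threshold:
            self.persist()

    def persist(self):
        self.persisted.append(tuple(self.buffer))   # freeze the buffer as a chunk
        self.buffer = []

    def query(self, predicate):
        # Data is queryable immediately: scan all chunks plus the live buffer.
        chunks = [*self.persisted, self.buffer]
        return [e for chunk in chunks for e in chunk if predicate(e)]

index = RealtimeIndex(flush_threshold=2)
for added in (10, 17, 43):
    index.ingest({"page": "Ke$ha", "added": added})
print(index.query(lambda e: e["added"] > 15))
# [{'page': 'Ke$ha', 'added': 17}, {'page': 'Ke$ha', 'added': 43}]
```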

Slide 41

2013 ARCHITECTURE (STREAMING-ONLY)
[Diagram: Streaming Data → Real-time Nodes → Segments → Historical Nodes (×3); Queries → Broker Nodes (×2) → Real-time Nodes + Historical Nodes]

Slide 42

2013 ARCHITECTURE (LAMBDA)
[Diagram: Streaming Data → Real-time Nodes → Segments; Batch Data → Hadoop → Segments → Historical Nodes (×3); Queries → Broker Nodes (×2) → Real-time Nodes + Historical Nodes]

Slide 43

2013 AVAILABILITY
REPLICATION
ROLLING DEPLOYMENTS + RESTARTS
GROW = START PROCESSES
SHRINK = KILL PROCESSES
3 YEARS · NO DOWNTIME FOR SOFTWARE UPDATES

Slide 44

2013 ASK ABOUT OUR PLUGINS
‣ Extensible architecture
  • Build your own modules and extend Druid
  • Add your own complex metrics (cardinality estimation, approximate histograms and quantiles, approximate top-K algorithms, etc.)
  • Add your own proprietary modules

Slide 45

DRUID TODAY

Slide 46

2013 THE COMMUNITY
‣ Growing community
  • 50+ contributors from many different companies
  • In production at multiple companies; we’re hoping for more!
  • Ad-tech, network traffic, operations, activity streams, etc.
  • Support through community forums and IRC
  • We love contributions!

Slide 47

2014 DRUID IN PRODUCTION · REALTIME INGESTION
>500K EVENTS / SECOND SUSTAINED
>1M EVENTS / SECOND AT PEAK
10 – 100K EVENTS / SECOND / CORE

Slide 48

2014 DRUID IN PRODUCTION · CLUSTER SIZE
>500TB OF SEGMENTS (>20 TRILLION RAW EVENTS)
>5000 CORES (>400 NODES, >100TB RAM)
IT’S CHEAP · MOST COST EFFECTIVE AT THIS SCALE

Slide 49

Slide 49 text

2014 DRUID IN PRODUCTION · QUERY LATENCY
[Chart: query latency percentiles (90%ile, 95%ile, 99%ile) per datasource, Feb 03 – Feb 24]
QUERY LATENCY (500MS AVERAGE)
95% < 1S
99% < 10S

Slide 50

Slide 50 text

2014 DRUID IN PRODUCTION · QUERY VOLUME
SEVERAL HUNDRED QUERIES / SECOND
VARIETY OF GROUP BY & TOP-K QUERIES
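
For reference, a top-K query against Druid's native JSON-over-HTTP query API looks roughly like the sketch below (the datasource, dimension, metric and broker host are placeholders, not values from the talk):

```python
import json
from urllib.request import Request, urlopen

# Illustrative Druid topN (top-K) query; datasource, dimension, metric
# names and the broker host are placeholders.
query = {
    "queryType": "topN",
    "dataSource": "wikipedia_edits",
    "dimension": "page",
    "metric": "added",
    "threshold": 5,
    "granularity": "all",
    "aggregations": [{"type": "longSum", "name": "added", "fieldName": "added"}],
    "intervals": ["2011-01-01/2011-01-02"],
}

request = Request(
    "http://broker.example.com:8082/druid/v2",   # Druid broker query endpoint (placeholder host)
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(json.load(urlopen(request)))
```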

Slide 51

Slide 51 text

DRUID AND THE DATA INFRASTRUCTURE SPACE

Slide 52

Slide 52 text

2013 STREAMING SETUP
[Diagram: Event Streams → Kafka → Samza → Druid → Insight, replacing the earlier Event Streams → Hadoop (pre-processing and storage) → Druid → Insight pipeline]

Slide 53

Slide 53 text

2013 STREAMING ONLY INGESTION
‣ Stream processing isn’t perfect
‣ Difficult to handle corrections of existing data
‣ Windows may be too small for fully accurate operations
‣ Hadoop was actually good at these things

Slide 54

Slide 54 text

2013 OPEN SOURCE LAMBDA ARCHITECTURE
[Diagram:
  Event Streams → Kafka → Samza → Druid → Insight   (real-time; only on-time data)
  Event Streams → Kafka → Hadoop → Druid → Insight   (some hours later; all data)]

Slide 55

Slide 55 text

2013 TAKE-AWAYS
‣ When Druid?
  • Interactive, fast exploration of large amounts of data
  • You need analytics (not a key-value store)
  • You want to do your analysis on data as it’s happening (realtime)
  • You need availability, extensibility and flexibility

Slide 56

Slide 56 text

DRUID IS OPEN SOURCE
WWW.DRUID.IO
twitter @druidio
irc.freenode.net #druid-dev

Slide 57

Slide 57 text

THANK YOU