OVERVIEW
‣ ORIGINS: WHY DOES DRUID EXIST?
‣ DRUID: HOW DRUID WORKS
‣ MINI-WORKSHOP: TRY IT OUT FOR YOURSELF
‣ VISUALIZATIONS: POWERED BY DRUID
‣ THE FUTURE: SPOOKY!
THE PROBLEM
‣ Arbitrary, interactive exploration
‣ Multi-tenancy: thousands of concurrent users
‣ Recency: explore current data, alert on major changes
‣ Efficiency: each event is individually very low-value
‣ Scale: petabytes of raw data
‣ Questions lead to more questions
‣ Interested not just in what happened, but why
‣ Dig into the dataset using filters, aggregates, and comparisons
‣ All interesting queries cannot be determined upfront
DRUID
‣ Druid project started in 2011, went open source in 2012
‣ Druid is an event stream database
‣ Low latency ingestion
‣ Ad-hoc aggregations (no precomputation)
‣ Can keep around a lot of history
‣ Community driven
• 90+ contributors
• In production at Yahoo!, Netflix, Metamarkets, many others
EVENT STREAMS
‣ Unifying feature: events happening over time
‣ Questions often time-oriented
• Monitoring: CPU usage over the past 3 days, in 5-min buckets
• Web analytics: Top pages by number of unique users this month
• Performance: 99%ile latency over the past hour
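As a sketch of the kind of time-bucketed aggregation these questions imply (hypothetical CPU events, plain Python standing in for what Druid does at scale):

```python
from datetime import datetime

# Hypothetical CPU events; in Druid these would be ingested rows.
events = [
    {"timestamp": "2015-06-01T01:22:33Z", "host": "host001", "value": 0.81},
    {"timestamp": "2015-06-01T01:23:10Z", "host": "host001", "value": 0.65},
    {"timestamp": "2015-06-01T01:27:45Z", "host": "host001", "value": 0.90},
]

def bucket_5min(ts):
    """Truncate an ISO-8601 timestamp to its 5-minute bucket."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    return dt.replace(minute=dt.minute - dt.minute % 5, second=0)

# Average %cpu per 5-minute bucket: the shape of a typical monitoring query.
buckets = {}
for e in events:
    buckets.setdefault(bucket_5min(e["timestamp"]), []).append(e["value"])
averages = {b: sum(v) / len(v) for b, v in buckets.items()}
```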
TIME SERIES
‣ Measure your world with some resolution
‣ Timestamp each data point
‣ Name each series
‣ Maybe include some tags for filtering
‣ Examples: %cpu, disk usage, network traffic
‣ Druid is not a time series database!
‣ But that’s okay…
‣ …because time series are actually event streams
TIME SERIES AS EVENT STREAMS
{
  "timestamp": "2015-06-01T01:22:33Z",
  "series_name": "cpu",
  "host": "host001.example.com",
  "cpu": "cpu0",
  "value": 0.81
}
DRUID IN PRODUCTION (2014)
REALTIME INGESTION
>500K EVENTS / SECOND AVERAGE
>1M EVENTS / SECOND PEAK
10 – 100K EVENTS / SECOND / CORE
CLUSTER SIZE
>500TB OF SEGMENTS (>20 TRILLION RAW EVENTS)
>5000 CORES (>350 NODES, >100TB RAM)
IT’S CHEAP
MOST COST EFFECTIVE AT THIS SCALE
[Chart: query latency percentiles (90%ile, 95%ile, 99%ile) per datasource, Feb 03 to Feb 24]
QUERY LATENCY (500MS AVERAGE)
90% < 1s, 95% < 5s, 99% < 10s
USING DRUID
QUERIES
‣ JSON over HTTP
‣ All computation pushed down to the data nodes
METRICS
‣ Count
‣ Sum
‣ Average
‣ Min/Max
‣ Approximate cardinality (HyperLogLog)
‣ Approximate histograms and quantiles
‣ Extend with custom metrics and sketches
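For example, a query exercising a couple of the metrics above (a sum plus HyperLogLog-based approximate cardinality) might look like this. The datasource and field names here are invented, and exact aggregator names can vary by Druid version:

```json
{
  "queryType": "timeseries",
  "dataSource": "events",
  "granularity": "hour",
  "intervals": ["2015-06-01/2015-06-02"],
  "aggregations": [
    {"type": "longSum", "name": "clicks", "fieldName": "clicks"},
    {"type": "hyperUnique", "name": "uniques", "fieldName": "user_id_sketch"}
  ]
}
```

This would be POSTed to a broker, e.g. `curl -X POST http://broker:8082/druid/v2 -H 'Content-Type: application/json' -d @query.json` (the host and port depend on your configuration).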
STEP BY STEP
‣ https://github.com/gianm/druid-monitorama-2015
‣ Kafka for ingestion
‣ Druid for analytics
‣ Grafana for visualization
‣ Single machine setup
‣ Distributed setup needs a bit more configuration (see the docs)
MORE RESOURCES
‣ http://druid.io/
‣ http://druid.io/docs/latest/Tutorials.html
INSIDE DRUID: DISTRIBUTION
ARCHITECTURE
[Diagram: real-time nodes exposing a query API]
REAL-TIME NODES
‣ Ingest event streams
‣ Query data in-memory as soon as it is ingested
‣ Periodically create and “hand off” immutable segments
[Diagram: real-time nodes hand off data to historical nodes; both serve the query API]
HISTORICAL NODES
‣ Main workhorses of a Druid cluster
‣ Store immutable data segments
‣ Respond to queries
[Diagram: broker nodes rewrite queries and scatter/gather across real-time and historical nodes]
BROKER NODES
‣ Know which other nodes hold what data
‣ Query scatter/gather (send requests to nodes and merge results)
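The gather half of scatter/gather can be sketched in a few lines of Python; the per-node partial results and merge-by-key logic here are simplified stand-ins for what the broker does with real partial aggregates:

```python
# Partial sums from two hypothetical data nodes, keyed by hour bucket.
node_results = [
    {"2011-01-01T01": 25, "2011-01-01T02": 17},   # a historical node
    {"2011-01-01T02": 170, "2011-01-01T03": 42},  # a real-time node
]

def gather(partials):
    """Merge per-node partial sums into one result, as a broker would."""
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value
    return merged

final = gather(node_results)
```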
Slide 37
Slide 37 text
ONE WEIRD TIP FOR FAST QUERIES
‣ Doctors hate it!
‣ Two storage engines
‣ Historical
‣ Time-partitioned, immutable, mmapped “Druid segments”
‣ Locality: Compute partial results on data nodes
‣ Fast filtering: Global time index, local CONCISE/Roaring bitmaps
‣ Fast scans: Column-oriented, compressed
‣ Real-time
‣ In-memory k/v tree + mmapped Druid segments
‣ Similar to memtable + sstable in RocksDB
‣ …but Druid segments can be queried much faster than sstables
‣ Periodically, merge and hand off Druid segments
INSIDE DRUID: SEGMENTS
RAW DATA
timestamp publisher advertiser gender country click price
2011-01-01T01:01:35Z bieberfever.com google.com Male USA 0 0.65
2011-01-01T01:03:53Z bieberfever.com google.com Male USA 0 0.62
2011-01-01T01:04:51Z bieberfever.com google.com Male USA 1 0.45
...
2011-01-01T01:00:00Z ultratrimfast.com google.com Female UK 0 0.87
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 0 0.99
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 1 1.53
ROLLUP DATA
timestamp publisher advertiser gender country impressions clicks revenue
2011-01-01T01:00:00Z ultratrimfast.com google.com Male USA 1800 25 15.70
2011-01-01T01:00:00Z bieberfever.com google.com Male USA 2912 42 29.18
2011-01-01T02:00:00Z ultratrimfast.com google.com Male UK 1953 17 17.31
2011-01-01T02:00:00Z bieberfever.com google.com Male UK 3194 170 34.01
‣ Truncate timestamps
‣ GroupBy over string columns (dimensions)
‣ Aggregate during ingestion when possible
‣ Can incrementally update aggregate rows
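The rollup above (truncate timestamps, group by dimensions, aggregate metrics) can be sketched like this; the field names follow the example table, and hourly truncation is done on the ISO timestamp string:

```python
from collections import defaultdict

# Raw events shaped like the RAW DATA table.
raw = [
    {"timestamp": "2011-01-01T01:01:35Z", "publisher": "bieberfever.com",
     "advertiser": "google.com", "gender": "Male", "country": "USA",
     "click": 0, "price": 0.65},
    {"timestamp": "2011-01-01T01:04:51Z", "publisher": "bieberfever.com",
     "advertiser": "google.com", "gender": "Male", "country": "USA",
     "click": 1, "price": 0.45},
]

def rollup(events):
    """Truncate to the hour, group by dimensions, aggregate the metrics."""
    agg = defaultdict(lambda: {"impressions": 0, "clicks": 0, "revenue": 0.0})
    for e in events:
        hour = e["timestamp"][:13] + ":00:00Z"  # hourly truncation
        key = (hour, e["publisher"], e["advertiser"], e["gender"], e["country"])
        row = agg[key]
        row["impressions"] += 1        # one raw event = one impression
        row["clicks"] += e["click"]
        row["revenue"] += e["price"]
    return dict(agg)

rolled = rollup(raw)
```

Incremental updates fall out naturally: a new event for an existing (hour, dimensions) key just updates that aggregate row in place.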
PARTITION DATA
timestamp publisher advertiser gender country impressions clicks revenue
2011-01-01T01:00:00Z ultratrimfast.com google.com Male USA 1800 25 15.70
2011-01-01T01:00:00Z bieberfever.com google.com Male USA 2912 42 29.18
2011-01-01T02:00:00Z ultratrimfast.com google.com Male UK 1953 17 17.31
2011-01-01T02:00:00Z bieberfever.com google.com Male UK 3194 170 34.01
‣ Shard segments by time
Segment 2011-01-01T02/2011-01-01T03
Segment 2011-01-01T01/2011-01-01T02
Segment 2011-01-01T01/2011-01-01T02
COLUMN ORIENTED
timestamp publisher advertiser gender country impressions clicks revenue
2011-01-01T01:00:00Z ultratrimfast.com google.com Male USA 1800 25 15.70
2011-01-01T01:00:00Z bieberfever.com google.com Male USA 2912 42 29.18
‣ Scan/load only what you need
‣ Per-column compression (dictionary encoding, LZ4)
‣ Per-column indexes (CONCISE/Roaring bitmaps)
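A toy version of dictionary encoding plus per-value bitmap indexes, with plain Python sets of row ids standing in for CONCISE/Roaring bitmaps:

```python
# One string column, like the "publisher" dimension above.
column = ["ultratrimfast.com", "bieberfever.com", "ultratrimfast.com",
          "bieberfever.com", "bieberfever.com"]

# Dictionary encoding: store each distinct value once, keep the column as ints.
dictionary = sorted(set(column))
value_to_id = {v: i for i, v in enumerate(dictionary)}
encoded = [value_to_id[v] for v in column]

# Per-value "bitmap": the set of row ids where the value occurs.
bitmaps = {v: set() for v in dictionary}
for row_id, v in enumerate(column):
    bitmaps[v].add(row_id)

# Filtering publisher = 'bieberfever.com' is a bitmap lookup,
# with no scan of the raw strings.
matching_rows = bitmaps["bieberfever.com"]
```

Real bitmap implementations also make AND/OR of filters cheap: intersect or union the bitmaps before touching any column data.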
DRUID POWERED VISUALIZATIONS
METAMARKETS
‣ Dogfooded our BI tool as an ops monitoring tool
STREAM PROCESSING
QUANTIPLY
‣ Financial services company
‣ Lots of microservices (1000+)
‣ Using Druid to find and debug latency hot spots
‣ Graphics courtesy of Roger Hoover ([email protected])
LATENCY TREE MAP BY SERVICE
‣ Size by total time
‣ Color by deviation from norm
LATENCY TREE MAP BY DB BACKEND
‣ View by DB backend instead of service
DRILL DOWN
‣ Drill down on any dimension
‣ Approximate quantiles
GRAFANA
‣ https://github.com/Quantiply/grafana-plugins/tree/master/features/druid
‣ Written by Roger Hoover ([email protected])
‣ Works with Grafana 1.9.x
DEPENDENCY HIT LIST
‣ ZooKeeper for coordination
‣ MySQL for metadata storage
SIMPLER ARCHITECTURE
‣ Arose from lots of experimentation
‣ Consolidate node types
‣ Consolidate ingestion methods
‣ …we currently support four methods; it should probably be one or two
INGESTION WINDOW
‣ Allow real-time writes for any time period
‣ …in 0.8.x, real-time writes must be “recent”
‣ …although batch writes can cover any time period
PLUGGABLE INDEXES
‣ CONCISE/Roaring bitmap indexes built in
‣ Also an experimental R-tree spatial index
‣ Would like new indexes to be possible as extensions
VISUALIZATIONS
‣ Free, interactive, exploratory dashboard
‣ Grafana is nice, but a bit too static and lacking context
TAKE AWAYS
‣ Think about metrics as event streams rather than time series
‣ Druid is good for large datasets you want to query interactively
‣ Supporting infrastructure is a bit complex in current versions
‣ But not too bad if you already use Kafka
‣ …which you should!