Druid @ Monitorama 2015

DRUID
MONITORAMA 2015
GIAN MERLINO · METRICS ENTHUSIAST · DRUID COMMITTER
@GIANMERLINO · @DRUIDIO

OVERVIEW
‣ ORIGINS: WHY DOES DRUID EXIST?
‣ DRUID: HOW DRUID WORKS
‣ MINI-WORKSHOP: TRY IT OUT FOR YOURSELF
‣ VISUALIZATIONS: POWERED BY DRUID
‣ THE FUTURE: SPOOKY!

THE PROBLEM

THE PROBLEM
‣ Arbitrary, interactive exploration
‣ Multi-tenancy: thousands of concurrent users
‣ Recency: explore current data, alert on major changes
‣ Efficiency: each event is individually very low-value
‣ Scale: petabytes of raw data

THE PROBLEM
‣ Questions lead to more questions
‣ Interested not just in what happened, but why
‣ Dig into the dataset using filters, aggregates, and comparisons
‣ Not all interesting queries can be determined upfront

DRUID
‣ Druid project started in 2011, went open source in 2012
‣ Druid is an event stream database
‣ Low latency ingestion
‣ Ad-hoc aggregations (no precomputation)
‣ Can keep around a lot of history
‣ Community driven
  • 90+ contributors
  • In production at Yahoo!, Netflix, Metamarkets, many others

EVENT STREAMS
‣ Unifying feature: events happening over time
‣ Questions often time-oriented
‣ Monitoring: CPU usage over the past 3 days, in 5-min buckets
‣ Web analytics: Top pages by number of unique users this month
‣ Performance: 99%ile latency over the past hour

EVENT STREAMS
{
  "timestamp": "2015-06-01T01:22:33Z",
  "page": "Augustinian theodicy",
  "user": "Glory of Space",
  "user_country": "USA",
  "delta_bytes": -47,
  "delta_words": -7
}

EVENT STREAMS
{
  "timestamp": "2015-06-01T01:22:33Z",
  "user": "Gian Merlino",
  "action": "edit profile",
  "host": "host001.example.com",
  "fields_edited": ["email", "phone"],
  "response_latency": 38,
  "response_bytes": 2041
}

TIME SERIES
‣ Measure your world with some resolution
‣ Timestamp each data point
‣ Name each series
‣ Maybe include some tags for filtering
‣ Examples: %cpu, disk usage, network traffic

TIME SERIES
‣ Druid is not a time series database!
‣ But that's okay…
‣ …because time series are actually event streams

TIME SERIES AS EVENT STREAMS
{
  "timestamp": "2015-06-01T01:22:33Z",
  "series_name": "cpu",
  "host": "host001.example.com",
  "cpu": "cpu0",
  "value": 0.81
}
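The mapping on this slide can be sketched in a few lines of Python: a named, tagged time series point becomes a flat event whose tags are ordinary fields. The helper name `series_point_to_event` and its `tags` parameter are hypothetical, chosen only for illustration.

```python
import json

def series_point_to_event(series_name, timestamp, value, tags):
    """Turn one time series data point into a flat event (hypothetical helper)."""
    event = {"timestamp": timestamp, "series_name": series_name}
    event.update(tags)       # tags become ordinary filterable fields
    event["value"] = value
    return event

event = series_point_to_event(
    "cpu",
    "2015-06-01T01:22:33Z",
    0.81,
    {"host": "host001.example.com", "cpu": "cpu0"},
)
print(json.dumps(event))
```

Nothing about the data point is lost in the conversion; the series name simply becomes one more field to filter or group on.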

DRUID IN PRODUCTION

DRUID IN PRODUCTION
2014 REALTIME INGESTION
‣ >500K EVENTS / SECOND AVERAGE
‣ >1M EVENTS / SECOND PEAK
‣ 10 – 100K EVENTS / SECOND / CORE

DRUID IN PRODUCTION
2014 CLUSTER SIZE
‣ >500TB OF SEGMENTS (>20 TRILLION RAW EVENTS)
‣ >5000 CORES (>350 NODES, >100TB RAM)
IT'S CHEAP
‣ MOST COST EFFECTIVE AT THIS SCALE

DRUID IN PRODUCTION
2014 QUERY LATENCY (500MS AVERAGE)
‣ 90% < 1S
‣ 95% < 5S
‣ 99% < 10S
[Figure: Query latency percentiles (90%ile, 95%ile, 99%ile) per datasource (a to h); x-axis: time (Feb 03 to Feb 24), y-axis: query time (seconds)]

USING DRUID

QUERIES
‣ JSON over HTTP
‣ All computation pushed down to the data nodes
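As a rough illustration of "JSON over HTTP", here is a minimal Python sketch that builds a native timeseries query and wraps it in a POST request. The broker address, the data source name `metrics`, and the field name `value` are assumptions for illustration; the request is only constructed, not sent.

```python
import json
import urllib.request

# Assumption: a broker reachable at this address; adjust for your cluster.
BROKER_URL = "http://localhost:8082/druid/v2/"

def build_timeseries_query(datasource, interval, granularity="minute"):
    """Build a timeseries query body (data source and field names are hypothetical)."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": granularity,
        "intervals": [interval],
        "aggregations": [
            {"type": "count", "name": "rows"},
            {"type": "doubleSum", "name": "total_value", "fieldName": "value"},
        ],
    }

def build_request(query, url=BROKER_URL):
    """Wrap the query in an HTTP POST request; nothing is sent here."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

query = build_timeseries_query(
    "metrics", "2015-06-01T00:00:00Z/2015-06-02T00:00:00Z"
)
request = build_request(query)
# Against a live broker: urllib.request.urlopen(request).read()
```

The query itself is plain data, so it can be logged, stored, or generated by any of the client libraries without special tooling.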

QUERIES
‣ TimeBoundary
‣ Timeseries
‣ GroupBy
‣ Approximate TopN
  ‣ …like GroupBy + limit, but faster
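To make "like GroupBy + limit, but faster" concrete, the sketch below contrasts an exact group-then-limit with one way of approximating it: each partition reports only its local top k, and the partial lists are merged. This is an illustrative model written for this transcript, not Druid's actual code; the function names and the sample data are hypothetical.

```python
from collections import Counter

def exact_top_n(partitions, dimension, metric, n):
    """GroupBy + limit: aggregate every group across all partitions, then sort."""
    totals = Counter()
    for events in partitions:
        for event in events:
            totals[event[dimension]] += event[metric]
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

def approximate_top_n(partitions, dimension, metric, n, per_partition_k):
    """Each partition reports only its local top k, so the merge sees partial data."""
    merged = Counter()
    for events in partitions:
        local = Counter()
        for event in events:
            local[event[dimension]] += event[metric]
        for key, total in local.most_common(per_partition_k):
            merged[key] += total
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

partitions = [
    [{"page": "a", "edits": 5}, {"page": "b", "edits": 4}, {"page": "c", "edits": 3}],
    [{"page": "c", "edits": 4}, {"page": "a", "edits": 2}, {"page": "b", "edits": 1}],
]
exact = exact_top_n(partitions, "page", "edits", 2)
approx = approximate_top_n(partitions, "page", "edits", 2, per_partition_k=2)
print(exact)   # [('a', 7), ('c', 7)]
print(approx)  # [('a', 7), ('b', 4)]
```

With k = 2 the approximate answer misses page "c", because each partition dropped part of its total before the merge. Raising `per_partition_k` trades speed for accuracy.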

METRICS
‣ Count
‣ Sum
‣ Average
‣ Min/Max
‣ Approximate cardinality (HyperLogLog)
‣ Approximate histograms and quantiles
‣ Extend with custom metrics and sketches

CLIENT LIBRARIES
‣ Python: https://github.com/metamx/pydruid
‣ R: https://github.com/metamx/RDruid
‣ Ruby: https://github.com/madvertise/ruby-druid
‣ JavaScript: https://github.com/facetjs/facetjs
‣ SQL: https://github.com/facetjs/facet-cli
‣ More at http://druid.io/docs/latest/Libraries.html

DO TRY THIS AT HOME

STEP BY STEP
‣ https://github.com/gianm/druid-monitorama-2015
‣ Kafka for ingestion
‣ Druid for analytics
‣ Grafana for visualization

STEP BY STEP
‣ https://github.com/gianm/druid-monitorama-2015
‣ Single machine setup
‣ Distributed setup needs a bit more configuration (see the docs)

MORE RESOURCES
‣ http://druid.io/
‣ http://druid.io/docs/latest/Tutorials.html

INSIDE DRUID: DISTRIBUTION

ARCHITECTURE
[Diagram labels: Realtime Nodes, Query API]

REAL-TIME NODES
‣ Ingest event streams
‣ Query data in-memory as soon as it is ingested
‣ Periodically create and "hand off" immutable segments

ARCHITECTURE
[Diagram labels: Realtime Nodes (Query API), Historical Nodes (Query API), Hand Off Data]

HISTORICAL NODES
‣ Main workhorses of a Druid cluster
‣ Store immutable data segments
‣ Respond to queries

ARCHITECTURE
[Diagram labels: Realtime Nodes (Query API), Historical Nodes (Query API), Broker Nodes (Query API, Query Rewrite, Scatter/Gather), Hand Off Data]

BROKER NODES
‣ Know which other nodes hold what data
‣ Query scatter/gather (send requests to nodes and merge results)
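The scatter/gather step can be sketched as follows. This is a simplified model, assuming each "node" is just an in-memory list of (time bucket, value) rows; `query_node` and `scatter_gather` are hypothetical names, and a real broker would send HTTP requests rather than call local functions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def query_node(segment_rows, lo, hi):
    """Stand-in for a data node: compute a partial sum per time bucket."""
    partial = defaultdict(int)
    for bucket, value in segment_rows:
        if lo <= bucket < hi:
            partial[bucket] += value
    return dict(partial)

def scatter_gather(nodes, lo, hi):
    """Send the same query to every node, then merge the partial results."""
    merged = defaultdict(int)
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(lambda rows: query_node(rows, lo, hi), nodes):
            for bucket, value in partial.items():
                merged[bucket] += value
    return dict(sorted(merged.items()))

# Two hypothetical nodes holding overlapping time buckets.
nodes = [[(1, 10), (2, 5)], [(2, 7), (3, 1)]]
result = scatter_gather(nodes, 1, 3)
print(result)  # {1: 10, 2: 12}
```

Because sums merge associatively, each node only needs to return its small partial result rather than its raw rows.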

ONE WEIRD TIP FOR FAST QUERIES ‣ Doctors hate it!

ONE WEIRD TIP FOR FAST QUERIES ‣ Two storage engines

‣ Historical
  ‣ Time-partitioned, immutable, mmapped "Druid segments"
  ‣ Locality: Compute partial results on data nodes
  ‣ Fast filtering: Global time index, local CONCISE/Roaring bitmaps
  ‣ Fast scans: Column-oriented, compressed

‣ Real-time
  ‣ In-memory k/v tree + mmapped Druid segments
  ‣ Similar to memtable + sstable in RocksDB
  ‣ …but Druid segments can be queried much faster than sstables
  ‣ Periodically, merge and hand off Druid segments

INSIDE DRUID: SEGMENTS

RAW DATA
timestamp             publisher          advertiser  gender  country  click  price
2011-01-01T01:01:35Z  bieberfever.com    google.com  Male    USA      0      0.65
2011-01-01T01:03:63Z  bieberfever.com    google.com  Male    USA      0      0.62
2011-01-01T01:04:51Z  bieberfever.com    google.com  Male    USA      1      0.45
...
2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Female  UK       0      0.87
2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Female  UK       0      0.99
2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Female  UK       1      1.53

ROLLUP DATA
timestamp             publisher          advertiser  gender  country  impressions  clicks  revenue
2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male    USA      1800         25      15.70
2011-01-01T01:00:00Z  bieberfever.com    google.com  Male    USA      2912         42      29.18
2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Male    UK       1953         17      17.31
2011-01-01T02:00:00Z  bieberfever.com    google.com  Male    UK       3194         170     34.01
‣ Truncate timestamps
‣ GroupBy over string columns (dimensions)
‣ Aggregate during ingestion when possible
‣ Can incrementally update aggregate rows
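The rollup steps on this slide (truncate timestamps, group by dimensions, aggregate) can be sketched in Python. This is a toy model, not Druid's ingestion code: the helper names are hypothetical, truncation is hourly by plain string slicing, and the three input rows are a small sample in the shape of the raw data table.

```python
from collections import defaultdict

def truncate_to_hour(timestamp):
    """'2011-01-01T01:01:35Z' -> '2011-01-01T01:00:00Z' (plain string truncation)."""
    return timestamp[:13] + ":00:00Z"

def rollup(rows, dimensions):
    """Group rows by (truncated timestamp, dimensions) and aggregate the metrics."""
    groups = defaultdict(lambda: {"impressions": 0, "clicks": 0, "revenue": 0.0})
    for row in rows:
        key = (truncate_to_hour(row["timestamp"]),) + tuple(row[d] for d in dimensions)
        agg = groups[key]
        agg["impressions"] += 1          # one raw row is one impression
        agg["clicks"] += row["click"]
        agg["revenue"] += row["price"]
    return dict(groups)

DIMENSIONS = ["publisher", "advertiser", "gender", "country"]
raw_rows = [
    {"timestamp": "2011-01-01T01:01:35Z", "publisher": "bieberfever.com",
     "advertiser": "google.com", "gender": "Male", "country": "USA",
     "click": 0, "price": 0.65},
    {"timestamp": "2011-01-01T01:04:51Z", "publisher": "bieberfever.com",
     "advertiser": "google.com", "gender": "Male", "country": "USA",
     "click": 1, "price": 0.45},
    {"timestamp": "2011-01-01T02:00:00Z", "publisher": "ultratrimfast.com",
     "advertiser": "google.com", "gender": "Female", "country": "UK",
     "click": 1, "price": 1.53},
]
rolled = rollup(raw_rows, DIMENSIONS)
for key, agg in rolled.items():
    print(key, agg)
```

Since each aggregate row is updated in place as events arrive, the same loop also illustrates the "incrementally update aggregate rows" point.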

PARTITION DATA
‣ Shard segments by time
timestamp             publisher          advertiser  gender  country  impressions  clicks  revenue
Segment 2011-01-01T01/2011-01-01T02
2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male    USA      1800         25      15.70
2011-01-01T01:00:00Z  bieberfever.com    google.com  Male    USA      2912         42      29.18
Segment 2011-01-01T02/2011-01-01T03
2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Male    UK       1953         17      17.31
2011-01-01T02:00:00Z  bieberfever.com    google.com  Male    UK       3194         170     34.01
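Sharding by time can be sketched as assigning each row to the hourly interval that contains its timestamp. The hourly granularity matches the segment names on this slide; the function names `segment_id` and `shard_by_time` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def segment_id(timestamp):
    """Return the hourly interval 'start/end' that contains the timestamp."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    start = parsed.replace(minute=0, second=0)
    end = start + timedelta(hours=1)
    fmt = "%Y-%m-%dT%H"
    return f"{start.strftime(fmt)}/{end.strftime(fmt)}"

def shard_by_time(rows):
    """Group rows into segments keyed by their hourly interval."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_id(row["timestamp"])].append(row)
    return dict(segments)

rows = [
    {"timestamp": "2011-01-01T01:00:00Z", "publisher": "ultratrimfast.com"},
    {"timestamp": "2011-01-01T01:00:00Z", "publisher": "bieberfever.com"},
    {"timestamp": "2011-01-01T02:00:00Z", "publisher": "ultratrimfast.com"},
    {"timestamp": "2011-01-01T02:00:00Z", "publisher": "bieberfever.com"},
]
segments = shard_by_time(rows)
print(sorted(segments))
```

A query over a given time range then only needs to touch the segments whose intervals overlap that range.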

COLUMN ORIENTED
Segment 2011-01-01T01/2011-01-01T02
timestamp             publisher          advertiser  gender  country  impressions  clicks  revenue
2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male    USA      1800         25      15.70
2011-01-01T01:00:00Z  bieberfever.com    google.com  Male    USA      2912         42      29.18
‣ Scan/load only what you need
‣ Per-column compression (dictionary encoding, LZ4)
‣ Per-column indexes (CONCISE/Roaring bitmaps)
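Dictionary encoding and per-column bitmap indexes can be illustrated with a small Python sketch. It uses plain Python integers as uncompressed bitsets, which is a stand-in for CONCISE/Roaring compressed bitmaps, not an implementation of them; the function names and the four sample rows are hypothetical.

```python
def build_column(values):
    """Dictionary-encode a string column and build one bitmap per distinct value."""
    dictionary = sorted(set(values))
    ids = {value: i for i, value in enumerate(dictionary)}
    encoded = [ids[value] for value in values]   # column stored as small ints
    bitmaps = {value: 0 for value in dictionary}
    for row, value in enumerate(values):
        bitmaps[value] |= 1 << row               # bit `row` set: that row has `value`
    return dictionary, encoded, bitmaps

def matching_rows(bitmap):
    """List the row numbers whose bit is set."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]

publisher = ["ultratrimfast.com", "bieberfever.com",
             "ultratrimfast.com", "bieberfever.com"]
country = ["USA", "USA", "UK", "UK"]

pub_dict, pub_encoded, pub_bitmaps = build_column(publisher)
_, _, country_bitmaps = build_column(country)

# Filter: publisher = bieberfever.com AND country = UK, as a single bitwise AND.
hits = pub_bitmaps["bieberfever.com"] & country_bitmaps["UK"]
print(matching_rows(hits))  # [3]
```

The filter is resolved entirely on the indexes; only the matching rows of the metric columns then need to be scanned.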

DRUID POWERED VISUALIZATIONS

METAMARKETS
‣ Dogfooded our BI tool as an ops monitoring tool

STREAM PROCESSING

QUANTIPLY
‣ Financial services company
‣ Lots of microservices (1000+)
‣ Using Druid to find and debug latency hot spots
‣ Graphics courtesy of Roger Hoover ([email protected])

LATENCY HEAT MAP
‣ Credit: Roger Hoover ([email protected])

LATENCY TREE MAP BY SERVICE
‣ Credit: Roger Hoover ([email protected])
‣ Size by total time
‣ Color by deviation from norm

LATENCY TREE MAP BY DB BACKEND
‣ Credit: Roger Hoover ([email protected])
‣ View by DB backend instead of service

DRILL DOWN
‣ Credit: Roger Hoover ([email protected])
‣ Drill down on any dimension
‣ Approximate quantiles

GRAFANA
‣ https://github.com/Quantiply/grafana-plugins/tree/master/features/druid
‣ Written by Roger Hoover ([email protected])
‣ Works with Grafana 1.9.x

GRAFANA
‣ Credit: Roger Hoover ([email protected])

GRAFANA
‣ Setup instructions in the workshop repo!
‣ https://github.com/gianm/druid-monitorama-2015

INGESTION

INGESTION
[Diagram labels: Data Source, Druid Realtime Workers (Immediate), Druid Historical Nodes (Periodic), Druid Broker Nodes, User queries]

[Diagram labels: Kafka, Druid Broker Nodes, User queries]

INGESTION
[Diagram labels: Data Source, Druid Realtime Workers, Druid Historical Nodes (Periodic), Druid Broker Nodes, User queries]

[Diagram labels: Data Source, Stream Processor, Druid Broker Nodes, User queries]

[Diagram labels: Kafka, Druid Broker Nodes, User queries]

THE MYSTERIOUS FUTURE

DEPENDENCY HIT LIST
‣ ZooKeeper for coordination
‣ MySQL for metadata storage

SIMPLER ARCHITECTURE
‣ Arose from lots of experimentation
‣ Consolidate node types
‣ Consolidate ingestion methods
  ‣ …we support four methods; that should probably be one or two

INGESTION WINDOW
‣ Allow real-time writes for any time period
  ‣ …in 0.8.x, real-time writes must be "recent"
  ‣ …although batch writes can cover any time period

PLUGGABLE INDEXES
‣ CONCISE/Roaring bitmap indexes built in
‣ Also an experimental R-tree spatial index
‣ Would like new indexes to be possible as extensions

VISUALIZATIONS
‣ Free, interactive, exploratory dashboard
‣ Grafana is nice, but a bit too static and lacks context

TAKE AWAYS

TAKE AWAYS
‣ Think about metrics as event streams rather than time series
‣ Druid is good for large datasets you want to query interactively
‣ Supporting infrastructure is a bit complex in current versions
  ‣ But not too bad if you already use Kafka
  ‣ …which you should!

THANK YOU
