OVERVIEW
‣ ORIGINS: WHY DOES DRUID EXIST?
‣ DRUID: HOW DRUID WORKS
‣ MINI-WORKSHOP: TRY IT OUT FOR YOURSELF
‣ VISUALIZATIONS: POWERED BY DRUID
‣ THE FUTURE: SPOOKY!
THE PROBLEM
‣ Arbitrary, interactive exploration
‣ Multi-tenancy: thousands of concurrent users
‣ Recency: explore current data, alert on major changes
‣ Efficiency: each event is individually very low-value
‣ Scale: petabytes of raw data
‣ Questions lead to more questions
‣ Interested not just in what happened, but why
‣ Dig into the dataset using filters, aggregates, and comparisons
‣ All interesting queries cannot be determined upfront
DRUID
‣ Druid project started in 2011, went open source in 2012
‣ Druid is an event stream database
‣ Low latency ingestion
‣ Ad-hoc aggregations (no precomputation)
‣ Can keep around a lot of history
‣ Community driven
• 90+ contributors
• In production at Yahoo!, Netflix, Metamarkets, many others
EVENT STREAMS
‣ Unifying feature: events happening over time
‣ Questions often time-oriented
• Monitoring: CPU usage over the past 3 days, in 5-min buckets
• Web analytics: Top pages by number of unique users this month
• Performance: 99%ile latency over the past hour
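As a sketch of the kind of time-bucketed aggregation these questions imply (hypothetical CPU events, plain Python standing in for what Druid does at scale):

```python
from datetime import datetime

# Hypothetical CPU events; in Druid these would be ingested rows.
events = [
    {"timestamp": "2015-06-01T01:22:33Z", "host": "host001", "value": 0.81},
    {"timestamp": "2015-06-01T01:23:10Z", "host": "host001", "value": 0.65},
    {"timestamp": "2015-06-01T01:27:45Z", "host": "host001", "value": 0.90},
]

def bucket_5min(ts):
    """Truncate an ISO-8601 timestamp to its 5-minute bucket."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    return dt.replace(minute=dt.minute - dt.minute % 5, second=0)

# Average %cpu per 5-minute bucket: the shape of a typical monitoring query.
buckets = {}
for e in events:
    buckets.setdefault(bucket_5min(e["timestamp"]), []).append(e["value"])
averages = {b: sum(v) / len(v) for b, v in buckets.items()}
```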
TIME SERIES
‣ Measure your world with some resolution
‣ Timestamp each data point
‣ Name each series
‣ Maybe include some tags for filtering
‣ Examples: %cpu, disk usage, network traffic
‣ Druid is not a time series database!
‣ But that’s okay…
‣ …because time series are actually event streams
TIME SERIES AS EVENT STREAMS
{
  "timestamp": "2015-06-01T01:22:33Z",
  "series_name": "cpu",
  "host": "host001.example.com",
  "cpu": "cpu0",
  "value": 0.81
}
DRUID IN PRODUCTION (2014)
REALTIME INGESTION
>500K EVENTS / SECOND AVERAGE
>1M EVENTS / SECOND PEAK
10 – 100K EVENTS / SECOND / CORE
CLUSTER SIZE
>500TB OF SEGMENTS (>20 TRILLION RAW EVENTS)
>5000 CORES (>350 NODES, >100TB RAM)
IT’S CHEAP
MOST COST EFFECTIVE AT THIS SCALE
[Chart: query latency percentiles (90%ile, 95%ile, 99%ile) per datasource, Feb 03 to Feb 24]
QUERY LATENCY (500MS AVERAGE)
90% < 1s, 95% < 5s, 99% < 10s
USING DRUID
QUERIES
‣ JSON over HTTP
‣ All computation pushed down to the data nodes
METRICS
‣ Count
‣ Sum
‣ Average
‣ Min/Max
‣ Approximate cardinality (HyperLogLog)
‣ Approximate histograms and quantiles
‣ Extend with custom metrics and sketches
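For example, a query exercising a couple of the metrics above (a sum plus HyperLogLog-based approximate cardinality) might look like this. The datasource and field names here are invented, and exact aggregator names can vary by Druid version:

```json
{
  "queryType": "timeseries",
  "dataSource": "events",
  "granularity": "hour",
  "intervals": ["2015-06-01/2015-06-02"],
  "aggregations": [
    {"type": "longSum", "name": "clicks", "fieldName": "clicks"},
    {"type": "hyperUnique", "name": "uniques", "fieldName": "user_id_sketch"}
  ]
}
```

This would be POSTed to a broker, e.g. `curl -X POST http://broker:8082/druid/v2 -H 'Content-Type: application/json' -d @query.json` (the host and port depend on your configuration).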
STEP BY STEP
‣ https://github.com/gianm/druid-monitorama-2015
‣ Kafka for ingestion
‣ Druid for analytics
‣ Grafana for visualization
‣ Single machine setup
‣ Distributed setup needs a bit more configuration (see the docs)
MORE RESOURCES
‣ http://druid.io/
‣ http://druid.io/docs/latest/Tutorials.html
INSIDE DRUID: DISTRIBUTION
ARCHITECTURE
[Diagram: real-time nodes exposing a query API]
REAL-TIME NODES
‣ Ingest event streams
‣ Query data in-memory as soon as it is ingested
‣ Periodically create and “hand off” immutable segments
[Diagram: real-time nodes hand off data to historical nodes; both serve the query API]
HISTORICAL NODES
‣ Main workhorses of a Druid cluster
‣ Store immutable data segments
‣ Respond to queries
[Diagram: broker nodes rewrite queries and scatter/gather across real-time and historical nodes]
BROKER NODES
‣ Know which other nodes hold what data
‣ Query scatter/gather (send requests to nodes and merge results)
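The gather half of scatter/gather can be sketched in a few lines of Python; the per-node partial results and merge-by-key logic here are simplified stand-ins for what the broker does with real partial aggregates:

```python
# Partial sums from two hypothetical data nodes, keyed by hour bucket.
node_results = [
    {"2011-01-01T01": 25, "2011-01-01T02": 17},   # a historical node
    {"2011-01-01T02": 170, "2011-01-01T03": 42},  # a real-time node
]

def gather(partials):
    """Merge per-node partial sums into one result, as a broker would."""
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value
    return merged

final = gather(node_results)
```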
Slide 37
Slide 37 text
ONE WEIRD TIP FOR FAST QUERIES
‣ Doctors hate it!
‣ Two storage engines
‣ Historical
‣ Time-partitioned, immutable, mmapped “Druid segments”
‣ Locality: Compute partial results on data nodes
‣ Fast filtering: Global time index, local CONCISE/Roaring bitmaps
‣ Fast scans: Column-oriented, compressed
‣ Real-time
‣ In-memory k/v tree + mmapped Druid segments
‣ Similar to memtable + sstable in RocksDB
‣ …but Druid segments can be queried much faster than sstables
‣ Periodically, merge and hand off Druid segments
INSIDE DRUID: SEGMENTS
RAW DATA
timestamp publisher advertiser gender country click price
2011-01-01T01:01:35Z bieberfever.com google.com Male USA 0 0.65
2011-01-01T01:03:53Z bieberfever.com google.com Male USA 0 0.62
2011-01-01T01:04:51Z bieberfever.com google.com Male USA 1 0.45
...
2011-01-01T01:00:00Z ultratrimfast.com google.com Female UK 0 0.87
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 0 0.99
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 1 1.53
ROLLUP DATA
timestamp publisher advertiser gender country impressions clicks revenue
2011-01-01T01:00:00Z ultratrimfast.com google.com Male USA 1800 25 15.70
2011-01-01T01:00:00Z bieberfever.com google.com Male USA 2912 42 29.18
2011-01-01T02:00:00Z ultratrimfast.com google.com Male UK 1953 17 17.31
2011-01-01T02:00:00Z bieberfever.com google.com Male UK 3194 170 34.01
‣ Truncate timestamps
‣ GroupBy over string columns (dimensions)
‣ Aggregate during ingestion when possible
‣ Can incrementally update aggregate rows
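The rollup above (truncate timestamps, group by dimensions, aggregate metrics) can be sketched like this; the field names follow the example table, and hourly truncation is done on the ISO timestamp string:

```python
from collections import defaultdict

# Raw events shaped like the RAW DATA table.
raw = [
    {"timestamp": "2011-01-01T01:01:35Z", "publisher": "bieberfever.com",
     "advertiser": "google.com", "gender": "Male", "country": "USA",
     "click": 0, "price": 0.65},
    {"timestamp": "2011-01-01T01:04:51Z", "publisher": "bieberfever.com",
     "advertiser": "google.com", "gender": "Male", "country": "USA",
     "click": 1, "price": 0.45},
]

def rollup(events):
    """Truncate to the hour, group by dimensions, aggregate the metrics."""
    agg = defaultdict(lambda: {"impressions": 0, "clicks": 0, "revenue": 0.0})
    for e in events:
        hour = e["timestamp"][:13] + ":00:00Z"  # hourly truncation
        key = (hour, e["publisher"], e["advertiser"], e["gender"], e["country"])
        row = agg[key]
        row["impressions"] += 1        # one raw event = one impression
        row["clicks"] += e["click"]
        row["revenue"] += e["price"]
    return dict(agg)

rolled = rollup(raw)
```

Incremental updates fall out naturally: a new event for an existing (hour, dimensions) key just updates that aggregate row in place.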
PARTITION DATA
timestamp publisher advertiser gender country impressions clicks revenue
2011-01-01T01:00:00Z ultratrimfast.com google.com Male USA 1800 25 15.70
2011-01-01T01:00:00Z bieberfever.com google.com Male USA 2912 42 29.18
2011-01-01T02:00:00Z ultratrimfast.com google.com Male UK 1953 17 17.31
2011-01-01T02:00:00Z bieberfever.com google.com Male UK 3194 170 34.01
‣ Shard segments by time
Segment 2011-01-01T02/2011-01-01T03
Segment 2011-01-01T01/2011-01-01T02
Segment 2011-01-01T01/2011-01-01T02
COLUMN ORIENTED
timestamp publisher advertiser gender country impressions clicks revenue
2011-01-01T01:00:00Z ultratrimfast.com google.com Male USA 1800 25 15.70
2011-01-01T01:00:00Z bieberfever.com google.com Male USA 2912 42 29.18
‣ Scan/load only what you need
‣ Per-column compression (dictionary encoding, LZ4)
‣ Per-column indexes (CONCISE/Roaring bitmaps)
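A toy version of dictionary encoding plus per-value bitmap indexes, with plain Python sets of row ids standing in for CONCISE/Roaring bitmaps:

```python
# One string column, like the "publisher" dimension above.
column = ["ultratrimfast.com", "bieberfever.com", "ultratrimfast.com",
          "bieberfever.com", "bieberfever.com"]

# Dictionary encoding: store each distinct value once, keep the column as ints.
dictionary = sorted(set(column))
value_to_id = {v: i for i, v in enumerate(dictionary)}
encoded = [value_to_id[v] for v in column]

# Per-value "bitmap": the set of row ids where the value occurs.
bitmaps = {v: set() for v in dictionary}
for row_id, v in enumerate(column):
    bitmaps[v].add(row_id)

# Filtering publisher = 'bieberfever.com' is a bitmap lookup,
# with no scan of the raw strings.
matching_rows = bitmaps["bieberfever.com"]
```

Real bitmap implementations also make AND/OR of filters cheap: intersect or union the bitmaps before touching any column data.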
DRUID POWERED VISUALIZATIONS
METAMARKETS
‣ Dogfooded our BI tool as an ops monitoring tool
STREAM PROCESSING
QUANTIPLY
‣ Financial services company
‣ Lots of microservices (1000+)
‣ Using Druid to find and debug latency hot spots
‣ Graphics courtesy of Roger Hoover ([email protected])
LATENCY TREE MAP BY SERVICE
‣ Size by total time
‣ Color by deviation from norm
LATENCY TREE MAP BY DB BACKEND
‣ View by DB backend instead of service
DRILL DOWN
‣ Drill down on any dimension
‣ Approximate quantiles
GRAFANA
‣ https://github.com/Quantiply/grafana-plugins/tree/master/features/druid
‣ Written by Roger Hoover ([email protected])
‣ Works with Grafana 1.9.x
DEPENDENCY HIT LIST
‣ ZooKeeper for coordination
‣ MySQL for metadata storage
SIMPLER ARCHITECTURE
‣ Arose from lots of experimentation
‣ Consolidate node types
‣ Consolidate ingestion methods
‣ …we currently support four methods; it should probably be one or two
INGESTION WINDOW
‣ Allow real-time writes for any time period
‣ …in 0.8.x, real-time writes must be “recent”
‣ …although batch writes can cover any time period
PLUGGABLE INDEXES
‣ CONCISE/Roaring bitmap indexes built in
‣ Also an experimental R-tree spatial index
‣ Would like new indexes to be possible as extensions
VISUALIZATIONS
‣ Free, interactive, exploratory dashboard
‣ Grafana is nice, but a bit too static and lacking context
TAKE AWAYS
‣ Think about metrics as event streams rather than time series
‣ Druid is good for large datasets you want to query interactively
‣ Supporting infrastructure is a bit complex in current versions
‣ But not too bad if you already use Kafka
‣ …which you should!