Druid @ Monitorama 2015

Gian Merlino

June 17, 2015

Transcript

  1. OVERVIEW
    ‣ ORIGINS: Why does Druid exist?
    ‣ DRUID: How Druid works
    ‣ MINI-WORKSHOP: Try it out for yourself
    ‣ VISUALIZATIONS: Powered by Druid
    ‣ THE FUTURE: Spooky!
  2. THE PROBLEM
    ‣ Arbitrary, interactive exploration
    ‣ Multi-tenancy: thousands of concurrent users
    ‣ Recency: explore current data, alert on major changes
    ‣ Efficiency: each event is individually very low-value
    ‣ Scale: petabytes of raw data
  3. THE PROBLEM
    ‣ Questions lead to more questions
    ‣ Interested not just in what happened, but why
    ‣ Dig into the dataset using filters, aggregates, and comparisons
    ‣ Not all interesting queries can be determined upfront
  4. DRUID
    ‣ Druid project started in 2011, went open source in 2012
    ‣ Druid is an event stream database
    ‣ Low latency ingestion
    ‣ Ad-hoc aggregations (no precomputation)
    ‣ Can keep around a lot of history
    ‣ Community driven
      • 90+ contributors
      • In production at Yahoo!, Netflix, Metamarkets, many others
  5. EVENT STREAMS
    ‣ Unifying feature: events happening over time
    ‣ Questions often time-oriented
    ‣ Monitoring: CPU usage over the past 3 days, in 5-min buckets
    ‣ Web analytics: Top pages by number of unique users this month
    ‣ Performance: 99th percentile latency over the past hour
  6. EVENT STREAMS
    {
      "timestamp": "2015-06-01T01:22:33Z",
      "page": "Augustinian theodicy",
      "user": "Glory of Space",
      "user_country": "USA",
      "delta_bytes": -47,
      "delta_words": -7
    }
  7. EVENT STREAMS
    {
      "timestamp": "2015-06-01T01:22:33Z",
      "user": "Gian Merlino",
      "action": "edit profile",
      "host": "host001.example.com",
      "fields_edited": ["email", "phone"],
      "response_latency": 38,
      "response_bytes": 2041
    }
  8. TIME SERIES
    ‣ Measure your world with some resolution
    ‣ Timestamp each data point
    ‣ Name each series
    ‣ Maybe include some tags for filtering
    ‣ Examples: %cpu, disk usage, network traffic
  9. TIME SERIES
    ‣ Druid is not a time series database!
    ‣ But that’s okay…
    ‣ …because time series are actually event streams
  10. TIME SERIES AS EVENT STREAMS
    {
      "timestamp": "2015-06-01T01:22:33Z",
      "series_name": "cpu",
      "host": "host001.example.com",
      "cpu": "cpu0",
      "value": 0.81
    }
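
As a rough illustration of the slide above, one time series data point (a series name, some tags, a timestamp, and a value) can be flattened into a single event. The helper name `point_to_event` and its argument layout are assumptions made for this sketch; they are not part of Druid or of the talk.

```python
import json
from datetime import datetime, timezone


def point_to_event(series_name, tags, ts, value):
    """Flatten one time series data point into a single flat event dict."""
    event = {
        "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "series_name": series_name,
    }
    event.update(tags)      # each tag becomes an ordinary string column
    event["value"] = value  # the measurement becomes a numeric column
    return event


# Hypothetical data point, mirroring the values shown on the slide.
event = point_to_event(
    "cpu",
    {"host": "host001.example.com", "cpu": "cpu0"},
    datetime(2015, 6, 1, 1, 22, 33, tzinfo=timezone.utc),
    0.81,
)
print(json.dumps(event))
```

Once every point is an event like this, "series" and "tags" are simply columns to filter or group on.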
  11. 2014 DRUID IN PRODUCTION: REALTIME INGESTION
    ‣ >500K EVENTS / SECOND AVERAGE
    ‣ >1M EVENTS / SECOND PEAK
    ‣ 10 – 100K EVENTS / SECOND / CORE
  12. 2014 DRUID IN PRODUCTION: CLUSTER SIZE
    ‣ >500TB OF SEGMENTS (>20 TRILLION RAW EVENTS)
    ‣ >5000 CORES (>350 NODES, >100TB RAM)
    ‣ IT’S CHEAP: MOST COST EFFECTIVE AT THIS SCALE
  13. 2014 DRUID IN PRODUCTION: QUERY LATENCY (500MS AVERAGE)
    ‣ 90% < 1S
    ‣ 95% < 5S
    ‣ 99% < 10S
    [Figure: Query latency percentiles (90%ile, 95%ile, 99%ile) per datasource (a to h); x-axis: time (Feb 03 to Feb 24); y-axis: query time (seconds)]
  14. METRICS
    ‣ Count
    ‣ Sum
    ‣ Average
    ‣ Min/Max
    ‣ Approximate cardinality (HyperLogLog)
    ‣ Approximate histograms and quantiles
    ‣ Extend with custom metrics and sketches
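
To make the metric types above more concrete, here is a minimal sketch that builds a Druid-style native timeseries query as a Python dict: count, sum, and max aggregators, plus an average computed as sum divided by count in a post-aggregation. The field names follow my understanding of Druid's JSON query language and may differ between versions, so check them against the documentation for your release. The datasource name `metrics` and the columns `series_name` and `value` are hypothetical.

```python
import json


def build_timeseries_query(datasource, interval):
    """Build a Druid-style native timeseries query as a plain dict (sketch)."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": {"type": "period", "period": "PT5M"},  # 5-minute buckets
        "intervals": [interval],
        "filter": {"type": "selector", "dimension": "series_name", "value": "cpu"},
        "aggregations": [
            {"type": "count", "name": "rows"},
            {"type": "doubleSum", "name": "value_sum", "fieldName": "value"},
            {"type": "doubleMax", "name": "value_max", "fieldName": "value"},
        ],
        "postAggregations": [
            {
                # Average = sum / count, computed after aggregation.
                "type": "arithmetic",
                "name": "value_avg",
                "fn": "/",
                "fields": [
                    {"type": "fieldAccess", "fieldName": "value_sum"},
                    {"type": "fieldAccess", "fieldName": "rows"},
                ],
            }
        ],
    }


# Hypothetical datasource and a 3-day interval.
query = build_timeseries_query("metrics", "2015-06-14T00:00:00Z/2015-06-17T00:00:00Z")
body = json.dumps(query, indent=2)  # JSON body that a client would send to a broker
print(body)
```

The sketch only builds the request body; it does not contact a cluster. Note that on a rolled-up datasource a `count` aggregator counts stored rows rather than raw events.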
  15. CLIENT LIBRARIES
    ‣ Python: https://github.com/metamx/pydruid
    ‣ R: https://github.com/metamx/RDruid
    ‣ Ruby: https://github.com/madvertise/ruby-druid
    ‣ JavaScript: https://github.com/facetjs/facetjs
    ‣ SQL: https://github.com/facetjs/facet-cli
    ‣ More at http://druid.io/docs/latest/Libraries.html
  16. 2013 STEP BY STEP
    ‣ https://github.com/gianm/druid-monitorama-2015
    ‣ Single machine setup
    ‣ Distributed setup needs a bit more configuration (see the docs)
  17. 2013 REAL-TIME NODES
    ‣ Ingest event streams
    ‣ Query data in-memory as soon as it is ingested
    ‣ Periodically create and “hand off” immutable segments
  18. 2013 HISTORICAL NODES
    ‣ Main workhorses of a Druid cluster
    ‣ Store immutable data segments
    ‣ Respond to queries
  19. 2013 ARCHITECTURE
    [Diagram. Labels: Broker Nodes (Query API; query rewrite, scatter/gather), Realtime Nodes (Query API), Historical Nodes (Query API), hand off data]
  20. 2013 BROKER NODES
    ‣ Knows which other nodes hold what data
    ‣ Query scatter/gather (send requests to nodes and merge results)
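
The scatter/gather idea can be sketched in a few lines. This is a toy, single-process model under stated assumptions: the `NODES` table, the node names, and the functions `partial_result` and `broker_query` are all hypothetical, and a real broker talks to other processes over the network rather than to an in-memory dict.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "data nodes": node name -> {segment id -> list of values}.
NODES = {
    "historical-1": {"2011-01-01T01": [0.65, 0.62, 0.45]},
    "historical-2": {"2011-01-01T02": [0.87, 0.99]},
    "realtime-1": {"2011-01-01T03": [1.53]},
}


def partial_result(node, segment_ids):
    """Runs "on" a data node: aggregate only the requested local segments."""
    count, total = 0, 0.0
    for seg in segment_ids:
        values = NODES[node][seg]
        count += len(values)
        total += sum(values)
    return {"count": count, "sum": total}


def broker_query(wanted_segments):
    """Scatter the query to nodes that hold wanted segments, then merge."""
    # The broker knows which node holds which segment.
    plan = {}
    for node, segments in NODES.items():
        hits = [s for s in segments if s in wanted_segments]
        if hits:
            plan[node] = hits
    # Scatter: one request per node, issued concurrently.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(partial_result, n, segs) for n, segs in plan.items()]
        partials = [f.result() for f in futures]
    # Gather: merge the partial aggregates into one result.
    merged = {"count": 0, "sum": 0.0}
    for p in partials:
        merged["count"] += p["count"]
        merged["sum"] += p["sum"]
    return merged


result = broker_query({"2011-01-01T01", "2011-01-01T02"})
print(result)
```

Counts and sums merge by simple addition, which is why the data nodes can return small partial results instead of raw rows.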
  21. ONE WEIRD TIP FOR FAST QUERIES
    ‣ Two storage engines
    ‣ Historical
      ‣ Time-partitioned, immutable, mmapped “Druid segments”
      ‣ Locality: Compute partial results on data nodes
      ‣ Fast filtering: Global time index, local CONCISE/Roaring bitmaps
      ‣ Fast scans: Column-oriented, compressed
  22. ONE WEIRD TIP FOR FAST QUERIES
    ‣ Two storage engines
    ‣ Real-time
      ‣ In-memory k/v tree + mmapped Druid segments
      ‣ Similar to memtable + sstable in RocksDB
      ‣ …but Druid segments can be queried much faster than sstables
      ‣ Periodically, merge and hand off Druid segments
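
The real-time engine described above can be pictured with a toy model: a mutable in-memory map that is periodically frozen into immutable chunks, with queries reading both, and a final merge at hand-off. The class `RealtimeIndex`, its methods, and the `max_rows_in_memory` threshold are hypothetical names for this sketch; it does not reflect Druid's actual data structures or on-disk format.

```python
class RealtimeIndex:
    """Toy real-time store: mutable in-memory map plus immutable persisted chunks."""

    def __init__(self, max_rows_in_memory=2):
        self.max_rows_in_memory = max_rows_in_memory
        self.in_memory = {}   # (timestamp, page) -> running event count
        self.persisted = []   # immutable, sorted chunks (tuples)

    def add(self, timestamp, page):
        key = (timestamp, page)
        self.in_memory[key] = self.in_memory.get(key, 0) + 1
        if len(self.in_memory) >= self.max_rows_in_memory:
            self.persist()

    def persist(self):
        """Freeze the in-memory rows into an immutable sorted chunk."""
        if self.in_memory:
            self.persisted.append(tuple(sorted(self.in_memory.items())))
            self.in_memory = {}

    def count(self, page):
        """Query both the in-memory rows and every persisted chunk."""
        total = sum(n for (_, p), n in self.in_memory.items() if p == page)
        for chunk in self.persisted:
            total += sum(n for (_, p), n in chunk if p == page)
        return total

    def hand_off(self):
        """Merge all chunks into one immutable segment and clear local state."""
        self.persist()
        merged = {}
        for chunk in self.persisted:
            for key, n in chunk:
                merged[key] = merged.get(key, 0) + n
        self.persisted = []
        return tuple(sorted(merged.items()))


idx = RealtimeIndex(max_rows_in_memory=2)
idx.add("T01", "a")
idx.add("T01", "a")
idx.add("T01", "b")   # second distinct row triggers a persist
idx.add("T02", "a")
count_a = idx.count("a")
segment = idx.hand_off()
print(count_a, segment)
```

Events are queryable as soon as `add` returns, which mirrors the "query data in-memory as soon as it is ingested" behaviour described for real-time nodes.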
  23. 2013 RAW DATA
    timestamp             publisher          advertiser  gender  country  click  price
    2011-01-01T01:01:35Z  bieberfever.com    google.com  Male    USA      0      0.65
    2011-01-01T01:03:63Z  bieberfever.com    google.com  Male    USA      0      0.62
    2011-01-01T01:04:51Z  bieberfever.com    google.com  Male    USA      1      0.45
    ...
    2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Female  UK       0      0.87
    2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Female  UK       0      0.99
    2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Female  UK       1      1.53
  24. 2013 ROLLUP DATA
    timestamp             publisher          advertiser  gender  country  impressions  clicks  revenue
    2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male    USA      1800         25      15.70
    2011-01-01T01:00:00Z  bieberfever.com    google.com  Male    USA      2912         42      29.18
    2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Male    UK       1953         17      17.31
    2011-01-01T02:00:00Z  bieberfever.com    google.com  Male    UK       3194         170     34.01
    ‣ Truncate timestamps
    ‣ GroupBy over string columns (dimensions)
    ‣ Aggregate during ingestion when possible
    ‣ Can incrementally update aggregate rows
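
The rollup steps listed above (truncate timestamps, group by dimensions, aggregate) can be sketched as follows. The sample rows are hypothetical and only modeled on the RAW DATA slide, and `rollup` is a name chosen for this sketch rather than a Druid API.

```python
from collections import defaultdict

# Hypothetical raw events:
# (timestamp, publisher, advertiser, gender, country, click, price)
RAW = [
    ("2011-01-01T01:01:35Z", "bieberfever.com", "google.com", "Male", "USA", 0, 0.65),
    ("2011-01-01T01:03:53Z", "bieberfever.com", "google.com", "Male", "USA", 0, 0.62),
    ("2011-01-01T01:04:51Z", "bieberfever.com", "google.com", "Male", "USA", 1, 0.45),
    ("2011-01-01T02:00:00Z", "ultratrimfast.com", "google.com", "Female", "UK", 0, 0.99),
    ("2011-01-01T02:00:00Z", "ultratrimfast.com", "google.com", "Female", "UK", 1, 1.53),
]


def rollup(rows):
    """Roll raw events up to one row per (hour, dimension values)."""
    agg = defaultdict(lambda: {"impressions": 0, "clicks": 0, "revenue": 0.0})
    for ts, publisher, advertiser, gender, country, click, price in rows:
        hour = ts[:13] + ":00:00Z"  # truncate the timestamp to the hour
        key = (hour, publisher, advertiser, gender, country)
        row = agg[key]              # existing aggregate rows are updated in place
        row["impressions"] += 1
        row["clicks"] += click
        row["revenue"] += price
    return dict(agg)


rolled = rollup(RAW)
for key, metrics in sorted(rolled.items()):
    print(key, metrics)
```

Five raw events collapse into two stored rows here; the saving grows with the number of events that share a time bucket and dimension values.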
  25. 2013 PARTITION DATA
    timestamp             publisher          advertiser  gender  country  impressions  clicks  revenue
    Segment 2011-01-01T01/2011-01-01T02
    2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male    USA      1800         25      15.70
    2011-01-01T01:00:00Z  bieberfever.com    google.com  Male    USA      2912         42      29.18
    Segment 2011-01-01T02/2011-01-01T03
    2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Male    UK       1953         17      17.31
    2011-01-01T02:00:00Z  bieberfever.com    google.com  Male    UK       3194         170     34.01
    ‣ Shard segments by time
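
Sharding by time can be sketched as assigning each row to the hourly interval that contains its timestamp. The hourly granularity matches the segment names on the slide; the function names `segment_interval` and `partition` and the trimmed-down row layout are assumptions for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def segment_interval(ts):
    """Return the hourly interval (start/end) that contains the timestamp."""
    start = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(minute=0, second=0)
    end = start + timedelta(hours=1)
    fmt = "%Y-%m-%dT%H"
    return f"{start.strftime(fmt)}/{end.strftime(fmt)}"


def partition(rows):
    """Group rows into segments keyed by their hourly interval."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_interval(row["timestamp"])].append(row)
    return dict(segments)


# Rows taken from the rollup table, reduced to three columns for brevity.
ROWS = [
    {"timestamp": "2011-01-01T01:00:00Z", "publisher": "ultratrimfast.com", "clicks": 25},
    {"timestamp": "2011-01-01T01:00:00Z", "publisher": "bieberfever.com", "clicks": 42},
    {"timestamp": "2011-01-01T02:00:00Z", "publisher": "ultratrimfast.com", "clicks": 17},
    {"timestamp": "2011-01-01T02:00:00Z", "publisher": "bieberfever.com", "clicks": 170},
]

segments = partition(ROWS)
for interval, rows in sorted(segments.items()):
    print(interval, len(rows))
```

A query restricted to one hour then only needs to touch the segment for that hour.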
  26. 2013 COLUMN ORIENTED
    Segment 2011-01-01T01/2011-01-01T02
    timestamp             publisher          advertiser  gender  country  impressions  clicks  revenue
    2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male    USA      1800         25      15.70
    2011-01-01T01:00:00Z  bieberfever.com    google.com  Male    USA      2912         42      29.18
    ‣ Scan/load only what you need
    ‣ Per-column compression (dictionary encoding, LZ4)
    ‣ Per-column indexes (CONCISE/Roaring bitmaps)
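
A small sketch of the two per-column techniques named above: dictionary encoding of a string column, and a bitmap index that answers filters with bitwise AND. Plain Python integers stand in for bitmaps here; CONCISE and Roaring are compressed bitmap formats, which this toy does not implement. The function names are hypothetical.

```python
def dictionary_encode(values):
    """Replace each string with a small integer id into a sorted dictionary."""
    dictionary = sorted(set(values))
    ids = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [ids[v] for v in values]


def bitmap_index(values):
    """Map each distinct value to an int whose set bits are the matching rows."""
    index = {}
    for row, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << row)
    return index


# Columns stored separately, one list per column (values from the rollup table).
publisher = ["ultratrimfast.com", "bieberfever.com", "ultratrimfast.com", "bieberfever.com"]
country = ["USA", "USA", "UK", "UK"]
clicks = [25, 42, 17, 170]

dictionary, encoded = dictionary_encode(publisher)
pub_idx = bitmap_index(publisher)
country_idx = bitmap_index(country)

# Filter: publisher = bieberfever.com AND country = UK, using only the indexes.
matches = pub_idx["bieberfever.com"] & country_idx["UK"]

# Scan only the clicks column, and only the rows whose bit is set.
total_clicks = sum(c for row, c in enumerate(clicks) if (matches >> row) & 1)
print(dictionary, encoded, bin(matches), total_clicks)
```

The filter is resolved without reading the string columns at all, and the aggregation touches a single numeric column.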
  27. QUANTIPLY
    ‣ Financial services company
    ‣ Lots of microservices (1000+)
    ‣ Using Druid to find and debug latency hot spots
    ‣ Graphics courtesy of Roger Hoover
  28. LATENCY TREE MAP BY SERVICE
    ‣ Credit: Roger Hoover
    ‣ Size by total time
    ‣ Color by deviation from norm
  29. 2013 INGESTION
    [Diagram. Labels: Data Source, Stream Processor, Druid Realtime Workers (immediate), Druid Historical Nodes (periodic), Druid Broker Nodes, User queries]
  30. SIMPLER ARCHITECTURE
    ‣ Arose from lots of experimentation
    ‣ Consolidate node types
    ‣ Consolidate ingestion methods
    ‣ …we support four methods; that should probably be one or two
  31. INGESTION WINDOW
    ‣ Allow real-time writes for any time period
    ‣ …in 0.8.x, real-time writes must be “recent”
    ‣ …although batch writes can cover any time period
  32. PLUGGABLE INDEXES
    ‣ CONCISE/Roaring bitmap indexes built in
    ‣ Also an experimental R-tree spatial index
    ‣ Would like new indexes to be possible as extensions
  33. TAKEAWAYS
    ‣ Think about metrics as event streams rather than time series
    ‣ Druid is good for large datasets you want to query interactively
    ‣ Supporting infrastructure is a bit complex in current versions
    ‣ But not too bad if you already use Kafka
    ‣ …which you should!