
Druid: Interactive Analytics at Scale

Druid
May 01, 2015

Introduction to Druid. 04/2015


Transcript

  1. OVERVIEW · DEMO (see some neat things) · MOTIVATION (why Druid?) · ARCHITECTURE (pictures with arrows) · COMMUNITY (contribute to Druid)
  2. 2013 · THE PROBLEM
     ‣ Arbitrary and interactive exploration of event data
       • Online advertising
       • System/application metrics
       • Network traffic monitoring
       • Activity stream analysis
     ‣ Multi-tenancy: lots of concurrent users
     ‣ Scalability: 20+ TB/day, ad-hoc queries on trillions of events
     ‣ Recency matters! Real-time analysis
  3. 2013 · FINDING A SOLUTION
     ‣ Load all your data into Hadoop. Query it. Done!
     ‣ Good job guys, let’s go home
  4. 2013 · PROBLEMS WITH THE NAIVE SOLUTION
     ‣ MapReduce can handle almost every distributed computing problem
     ‣ MapReduce over your raw data is flexible but slow
     ‣ Hadoop is not optimized for query latency
     ‣ To optimize queries, we need a query layer
  5. 2013 · MAKE QUERIES FASTER
     ‣ What types of queries to optimize for?
       • Revenue over time broken down by demographic
       • Top publishers by clicks over the last month
       • Number of unique visitors broken down by any dimension
       • Not dumping the entire dataset
       • Not examining individual events
  6. 2013 · I. RDBMS - THE SETUP
     ‣ Common solution in data warehousing:
       • Star Schema
       • Aggregate Tables
       • Query Caching
  7. 2013 · I. RDBMS - THE RESULTS
     ‣ Queries that were cached: fast
     ‣ Queries against aggregate tables: fast to acceptable
     ‣ Queries against the base fact table: generally unacceptable
  8. 2013 · I. RDBMS - PERFORMANCE
     ‣ Naive benchmark scan rate: ~5.5M rows / second / core
     ‣ 1 day of summarized aggregates: 60M+ rows
     ‣ 1 query over 1 week, 16 cores: ~5 seconds
     ‣ Page load with 20 queries over a week of data: long time
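
     A quick back-of-the-envelope check of these numbers, assuming the one-week query scans 7 days of summarized aggregates in parallel across all 16 cores:

       rows_per_day = 60_000_000      # summarized aggregate rows per day (from the slide)
       scan_rate = 5_500_000          # rows / second / core (from the slide)
       cores = 16

       seconds = (7 * rows_per_day) / (scan_rate * cores)
       print(round(seconds, 1))       # ~4.8 s, in line with the quoted ~5 seconds
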
  9. 2013 · II. NOSQL - THE SETUP
     ‣ Pre-aggregate all dimensional combinations
     ‣ Store results in a NoSQL store

     Input rows:
       ts  gender  age  revenue
       1   M       18   $0.15
       1   F       25   $1.03
       1   F       18   $0.01

     Pre-aggregated key-value pairs:
       Key      Value
       1        revenue=$1.19
       1,M      revenue=$0.15
       1,F      revenue=$1.04
       1,18     revenue=$0.16
       1,25     revenue=$1.03
       1,M,18   revenue=$0.15
       1,F,18   revenue=$0.01
       1,F,25   revenue=$1.03
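
     A minimal sketch of this pre-aggregation step in Python (illustrative only; the actual pipeline ran as batch jobs, and the rows here are just the three example rows above). It reproduces the key-value table, though in a different order:

       from itertools import combinations
       from collections import defaultdict

       rows = [
           {"ts": 1, "gender": "M", "age": "18", "revenue": 0.15},
           {"ts": 1, "gender": "F", "age": "25", "revenue": 1.03},
           {"ts": 1, "gender": "F", "age": "18", "revenue": 0.01},
       ]
       dimensions = ["gender", "age"]

       aggregates = defaultdict(float)
       for row in rows:
           # Every subset of the dimensions (including the empty set) becomes a key.
           for depth in range(len(dimensions) + 1):
               for combo in combinations(dimensions, depth):
                   key = (row["ts"],) + tuple(row[d] for d in combo)
                   aggregates[key] += row["revenue"]

       for key, revenue in aggregates.items():
           print(",".join(map(str, key)), f"revenue=${revenue:.2f}")
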
  10. 2013 · II. NOSQL - THE RESULTS
     ‣ Queries were fast
       • range scan on primary key
     ‣ Inflexible
       • if a combination wasn’t pre-aggregated, it wasn’t available
     ‣ Not continuously updated
       • aggregate first, then display
     ‣ Processing scales exponentially
  11. 2013 · II. NOSQL - PERFORMANCE
     ‣ Dimensional combinations => exponential increase
     ‣ Tried limiting dimensional depth
       • still expands exponentially
     ‣ Example: ~500k records
       • 11 dimensions, 5-deep: 4.5 hours on a 15-node Hadoop cluster
       • 14 dimensions, 5-deep: 9 hours on a 25-node Hadoop cluster
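
     The blow-up is easy to quantify: each input row produces one output key per dimension subset of size at most the depth limit, so the per-row key count is a sum of binomial coefficients:

       from math import comb  # Python 3.8+

       def combos_up_to_depth(n_dims, depth):
           # Number of dimension subsets of size <= depth for a single row.
           return sum(comb(n_dims, k) for k in range(depth + 1))

       print(combos_up_to_depth(11, 5))  # 1024 keys per row
       print(combos_up_to_depth(14, 5))  # 3473 keys per row
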
  12. 2013 · DRUID · KEY FEATURES
     ‣ LOW LATENCY INGESTION
     ‣ FAST AGGREGATIONS
     ‣ ARBITRARY SLICE-N-DICE CAPABILITIES
     ‣ HIGHLY AVAILABLE
     ‣ APPROXIMATE & EXACT CALCULATIONS
  13. 2013 · DATA!
     timestamp             page           language  city     country  ...  added  deleted
     2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA           10     65
     2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA           15     62
     2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA           32     45
     2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA            17     87
     2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            43     99
     2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            12     53
     ...
  14. 2013 · PARTITION DATA
     ‣ Shard data by time
     ‣ Immutable chunks of data called “segments”
     ‣ The example table from slide 13 splits into:
       • Segment 2011-01-01T00/2011-01-01T01
       • Segment 2011-01-01T01/2011-01-01T02
       • Segment 2011-01-01T02/2011-01-01T03
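
     A minimal sketch of time-based sharding at hourly granularity (a real Druid segment is a columnar, indexed binary file, not an in-memory list; the helper below is purely illustrative):

       from collections import defaultdict
       from datetime import datetime, timedelta

       def segment_interval(ts: str) -> str:
           # Truncate an ISO timestamp to its containing hour, e.g. 2011-01-01T00/2011-01-01T01.
           start = datetime.fromisoformat(ts.rstrip("Z")).replace(minute=0, second=0, microsecond=0)
           end = start + timedelta(hours=1)
           return f"{start:%Y-%m-%dT%H}/{end:%Y-%m-%dT%H}"

       rows = [
           {"timestamp": "2011-01-01T00:01:35Z", "page": "Justin Bieber", "added": 10},
           {"timestamp": "2011-01-01T01:00:00Z", "page": "Ke$ha", "added": 17},
           {"timestamp": "2011-01-01T02:00:00Z", "page": "Ke$ha", "added": 43},
       ]

       segments = defaultdict(list)
       for row in rows:
           segments[segment_interval(row["timestamp"])].append(row)

       for interval, contents in sorted(segments.items()):
           print(interval, len(contents), "rows")
       # 2011-01-01T00/2011-01-01T01 1 rows
       # 2011-01-01T01/2011-01-01T02 1 rows
       # 2011-01-01T02/2011-01-01T03 1 rows
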
  15. 2013 · IMMUTABLE SEGMENTS
     ‣ Fundamental storage unit in Druid
     ‣ No contention between reads and writes
     ‣ One thread scans one segment
     ‣ Multiple threads can access the same underlying data
     ‣ Segments sized so computation completes in 100s of ms
     ‣ Simplifies distribution & replication
  16. 2013 · COLUMNAR STORAGE
     ‣ Scan/load only what you need
     ‣ Compression!
     ‣ Indexes!
     (example table from slide 13)
  17. 2013 · COLUMN COMPRESSION · DICTIONARIES
     ‣ Create ids
       • Justin Bieber -> 0, Ke$ha -> 1
     ‣ Store
       • page -> [0 0 0 1 1 1]
       • language -> [0 0 0 0 0 0]
     (example table from slide 13)
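
     A minimal sketch of dictionary encoding a string column (Druid additionally bit-packs and compresses the resulting integer array; this just shows the id mapping):

       def dictionary_encode(column):
           # Map each distinct value to a small integer id and store the column as ids.
           dictionary = {}
           encoded = []
           for value in column:
               if value not in dictionary:
                   dictionary[value] = len(dictionary)
               encoded.append(dictionary[value])
           return dictionary, encoded

       page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
       dictionary, encoded = dictionary_encode(page)
       print(dictionary)  # {'Justin Bieber': 0, 'Ke$ha': 1}
       print(encoded)     # [0, 0, 0, 1, 1, 1]
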
  18. 2013 · BITMAP INDICES
     ‣ Justin Bieber -> [0, 1, 2] -> [111000]
     ‣ Ke$ha -> [3, 4, 5] -> [000111]
     (example table from slide 13)
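
     A minimal sketch of building one bitmap per dictionary id, using plain Python lists of 0/1 as the bitmaps (Druid stores them compressed, as the next slides note):

       def build_bitmap_index(encoded_column, cardinality):
           # One bitmap (list of 0/1, position = row number) per value id.
           n_rows = len(encoded_column)
           bitmaps = {vid: [0] * n_rows for vid in range(cardinality)}
           for row, value_id in enumerate(encoded_column):
               bitmaps[value_id][row] = 1
           return bitmaps

       # page column from the dictionary-encoding sketch: [0, 0, 0, 1, 1, 1]
       bitmaps = build_bitmap_index([0, 0, 0, 1, 1, 1], cardinality=2)
       print(bitmaps[0])  # [1, 1, 1, 0, 0, 0]  Justin Bieber -> rows 0, 1, 2
       print(bitmaps[1])  # [0, 0, 0, 1, 1, 1]  Ke$ha -> rows 3, 4, 5
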
  19. 2013 · FAST AND FLEXIBLE QUERIES
     row  page
     0    Justin Bieber
     1    Justin Bieber
     2    Ke$ha
     3    Ke$ha

     JUSTIN BIEBER            [1, 1, 0, 0]
     KE$HA                    [0, 0, 1, 1]
     JUSTIN BIEBER OR KE$HA   [1, 1, 1, 1]
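
     Filters combine with bitwise operations on those bitmaps, without touching the rows themselves. Continuing the list-of-bits representation above:

       def bitmap_or(a, b):
           # OR the two filters position-wise; a 1 means "row matches".
           return [x | y for x, y in zip(a, b)]

       justin_bieber = [1, 1, 0, 0]
       kesha = [0, 0, 1, 1]
       print(bitmap_or(justin_bieber, kesha))  # [1, 1, 1, 1] -> every row matches the OR filter
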
  20. 2013 · BITMAP INDEX COMPRESSION
     ‣ Supports CONCISE and Roaring
       • Boolean operations directly on compressed indices
       • Less memory => faster scan rates
     ‣ More details
       • http://ricerca.mat.uniroma3.it/users/colanton/concise.html
       • http://roaringbitmap.org/
  21. 2013 · HISTORICAL NODES
     ‣ Main workhorses of a Druid cluster
     ‣ Shared-nothing architecture
     ‣ Load immutable read-optimized data
     ‣ Respond to queries
  22. 2013 · ARCHITECTURE (BATCH ONLY)
     [Diagram: Queries -> Broker Nodes -> Historical Nodes; Hadoop turns batch data into segments loaded by the Historical Nodes]
  23. 2013 · BROKER NODES
     ‣ Know which nodes hold what data
     ‣ Query scatter/gather (send requests to nodes and merge results)
     ‣ Caching
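
     A minimal sketch of the scatter/gather step for an additive aggregation. The in-process "nodes" and helper names here are hypothetical; a real broker issues queries over HTTP against historical and real-time nodes, consults a segment timeline, and caches per-segment results:

       from collections import Counter
       from concurrent.futures import ThreadPoolExecutor

       def query_node(node_segments, filter_page):
           # Each "historical node" aggregates only the segments it is serving.
           partial = Counter()
           for segment in node_segments:
               for row in segment:
                   if row["page"] == filter_page:
                       partial[row["page"]] += row["added"]
           return partial

       def broker_query(nodes, filter_page):
           # Scatter the query to every node, then gather and merge the partial results.
           with ThreadPoolExecutor() as pool:
               partials = pool.map(lambda segs: query_node(segs, filter_page), nodes)
           merged = Counter()
           for partial in partials:
               merged.update(partial)
           return merged

       nodes = [
           [[{"page": "Ke$ha", "added": 17}], [{"page": "Ke$ha", "added": 43}]],  # node A: 2 segments
           [[{"page": "Ke$ha", "added": 12}]],                                    # node B: 1 segment
       ]
       print(broker_query(nodes, "Ke$ha"))  # Counter({'Ke$ha': 72})
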
  24. 2013 · MORE PROBLEMS
     ‣ We’ve solved the query problem
       • Druid gave us arbitrary data exploration & fast queries
     ‣ But what about data freshness?
       • Batch loading is slow!
       • We want “real-time”
       • Alerts, operational monitoring, etc.
  25. 2013 · FAST LOADING WITH DRUID
     ‣ We have an indexing system
     ‣ We have a serving system that runs queries on data
     ‣ We can serve queries while building indexes!
     ‣ Real-time indexing workers do this
  26. 2013 · REAL-TIME NODES
     ‣ Log-structured merge-tree
     ‣ Ingest data and buffer events in memory in a write-optimized data structure
     ‣ Periodically persist collected events to disk (converting to a read-optimized format)
     ‣ Query data as soon as it is ingested
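
     A minimal sketch of that buffer-then-persist loop (the class and its parameters are hypothetical; real real-time nodes persist columnar segment files and eventually hand them off to historical nodes):

       class RealtimeIndex:
           # Toy write-optimized buffer that is queryable immediately and flushed periodically.

           def __init__(self, persist_every=3):
               self.in_memory = []        # write-optimized buffer (an append-only list here)
               self.persisted = []        # stand-in for read-optimized spills on disk
               self.persist_every = persist_every

           def ingest(self, event):
               self.in_memory.append(event)
               if len(self.in_memory) >= self.persist_every:
                   self.persist()

           def persist(self):
               # Convert the buffer to a "read-optimized" chunk and clear it.
               self.persisted.append(tuple(self.in_memory))
               self.in_memory = []

           def query(self, page):
               # Queries see both persisted chunks and the still-in-memory buffer.
               chunks = list(self.persisted) + [tuple(self.in_memory)]
               return sum(e["added"] for chunk in chunks for e in chunk if e["page"] == page)

       node = RealtimeIndex()
       for added in (17, 43, 12, 5):
           node.ingest({"page": "Ke$ha", "added": added})
       print(node.query("Ke$ha"))  # 77 -> the just-ingested event is already queryable
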
  27. 2013 · ARCHITECTURE (STREAMING-ONLY)
     [Diagram: Queries -> Broker Nodes -> Historical Nodes and Real-time Nodes; Real-time Nodes ingest streaming data and hand segments to the Historical Nodes]
  28. 2013 · ARCHITECTURE (LAMBDA)
     [Diagram: Queries -> Broker Nodes -> Historical Nodes and Real-time Nodes; Hadoop builds segments from batch data, Real-time Nodes build segments from streaming data]
  29. 2013 · AVAILABILITY
     ‣ REPLICATION
     ‣ ROLLING DEPLOYMENTS + RESTARTS
     ‣ GROW = START PROCESSES
     ‣ SHRINK = KILL PROCESSES
     ‣ 3 YEARS · NO DOWNTIME FOR SOFTWARE UPDATES
  30. 2013 · ASK ABOUT OUR PLUGINS
     ‣ Extensible architecture
       • Build your own modules and extend Druid
       • Add your own complex metrics (cardinality estimation, approximate histograms and quantiles, approximate top-K algorithms, etc.)
       • Add your own proprietary modules
  31. 2013 · THE COMMUNITY
     ‣ Growing community
       • 50+ contributors from many different companies
       • In production at multiple companies; we’re hoping for more!
       • Ad-tech, network traffic, operations, activity streams, etc.
       • Support through community forums and IRC
       • We love contributions!
  32. 2014 · DRUID IN PRODUCTION · REALTIME INGESTION
     ‣ >500K EVENTS / SECOND SUSTAINED
     ‣ >1M EVENTS / SECOND AT PEAK
     ‣ 10 – 100K EVENTS / SECOND / CORE
  33. 2014 · DRUID IN PRODUCTION · CLUSTER SIZE
     ‣ >500TB OF SEGMENTS (>20 TRILLION RAW EVENTS)
     ‣ >5000 CORES (>400 NODES, >100TB RAM)
     ‣ IT’S CHEAP · MOST COST EFFECTIVE AT THIS SCALE
  34. 2014 · DRUID IN PRODUCTION · QUERY LATENCY
     [Chart: query latency percentiles (90%ile, 95%ile, 99%ile) per datasource, Feb 03 – Feb 24]
     ‣ QUERY LATENCY: 500MS AVERAGE
     ‣ 95% < 1S
     ‣ 99% < 10S
  35. 2014 · DRUID IN PRODUCTION · QUERY VOLUME
     ‣ SEVERAL HUNDRED QUERIES / SECOND
     ‣ VARIETY OF GROUP BY & TOP-K QUERIES
  36. 2013 · STREAMING ONLY INGESTION
     ‣ Stream processing isn’t perfect
     ‣ Difficult to handle corrections of existing data
     ‣ Windows may be too small for fully accurate operations
     ‣ Hadoop was actually good at these things
  37. 2013 · OPEN SOURCE LAMBDA ARCHITECTURE
     [Diagram: Event Streams -> Kafka -> Samza (real-time, only on-time data) and Hadoop (some hours later, all data) -> Druid -> Insight]
  38. 2013 · TAKE-AWAYS
     ‣ When Druid?
       • Interactive, fast exploration of large amounts of data
       • You need analytics (not a key-value store)
       • You want to do your analysis on data as it’s happening (realtime)
       • You need availability, extensibility and flexibility