Druid - A Realtime Analytical Data Store

Nishant
August 07, 2014

Transcript

  1. THIS PRESENTATION

    ‣ Demo
    ‣ Problems
    ‣ Druid Architecture
    ‣ Benchmarks and numbers
    ‣ Many hard problems looking for solutions
  2. WE HAVE A LOT OF EVENT DATA

    timestamp             page           language  city     country  ...  added  deleted
    2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA           10     65
    2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA           15     62
    2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA           32     45
    2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA            17     87
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            43     99
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            12     53
    ...
  3. I. RDBMS - THE RESULTS

    ‣ Queries that were cached
      • fast
    ‣ Queries against aggregate tables
      • fast to acceptable
    ‣ Queries against base fact table
      • generally unacceptable
  4. I. RDBMS - PERFORMANCE

    Naive benchmark scan rate                         ~5.5M rows / second / core
    1 day of summarized aggregates                    60M+ rows
    1 query over 1 week, 16 cores                     ~5 seconds
    Page load with 20 queries over a week of data     long time
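The "~5 seconds" figure can be checked from the other numbers on the slide. The sketch below is only a back-of-the-envelope calculation. It assumes perfect scaling across cores, treats "60M+" as exactly 60M rows per day, and assumes the 20 page-load queries run one after another (the slide only says "long time").

```python
# Figures quoted on the slide.
SCAN_RATE_PER_CORE = 5.5e6   # rows / second / core
ROWS_PER_DAY = 60e6          # "60M+" summarized rows per day, taken as a lower bound
CORES = 16
DAYS = 7
QUERIES_PER_PAGE = 20

rows_scanned = ROWS_PER_DAY * DAYS                   # rows touched by one query
cluster_rate = SCAN_RATE_PER_CORE * CORES            # assumes perfect scaling
seconds_per_query = rows_scanned / cluster_rate
seconds_per_page = QUERIES_PER_PAGE * seconds_per_query  # assumes sequential queries

print(f"{seconds_per_query:.1f} s per query, {seconds_per_page:.0f} s per page load")
```

Under these assumptions one query takes a little under 5 seconds, which is consistent with the slide, and a page of 20 such queries takes on the order of a minute and a half.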
  5. II. NOSQL - THE SETUP

    ‣ Pre-aggregate all dimensional combinations
    ‣ Store results in a NoSQL store

    ts  gender  age  revenue
    1   M       18   US$ 0.15
    1   F       25   US$ 1.03
    1   F       18   US$ 0.01

    Key     Value
    1       revenue=$1.19
    1,M     revenue=$0.15
    1,F     revenue=$1.04
    1,18    revenue=$0.16
    1,25    revenue=$1.03
    1,M,18  revenue=$0.15
    1,F,18  revenue=$0.01
    1,F,25  revenue=$1.03
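The key/value table on this slide can be reproduced with a short script. This is a minimal sketch, not the pipeline the speakers used: the function name `preaggregate` is hypothetical, revenue is held in integer cents to avoid floating-point rounding, and a plain dict stands in for the NoSQL store.

```python
from collections import defaultdict
from itertools import combinations

# Rows from the slide: (ts, gender, age, revenue in cents).
rows = [
    (1, "M", 18, 15),
    (1, "F", 25, 103),
    (1, "F", 18, 1),
]

def preaggregate(rows):
    """Sum revenue for every combination of the dimensions (gender, age)."""
    totals = defaultdict(int)
    for ts, gender, age, cents in rows:
        dims = (gender, age)
        # depth 0 is the per-timestamp total, depth 2 is the full combination
        for depth in range(len(dims) + 1):
            for combo in combinations(dims, depth):
                totals[(ts, *combo)] += cents
    return dict(totals)

table = preaggregate(rows)
for key, cents in sorted(table.items(), key=lambda kv: (len(kv[0]), str(kv[0]))):
    print(",".join(map(str, key)), f"revenue=${cents / 100:.2f}")
```

Three input rows with two dimensions already produce eight keys. Each extra dimension multiplies the number of keys written per row.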
  6. II. NOSQL - THE RESULTS

    ‣ Queries were fast
      • range scan on primary key
    ‣ Inflexible
      • not aggregated, not available
    ‣ Not continuously updated
      • aggregate first, then display
    ‣ Processing scales exponentially
  7. II. NOSQL - PERFORMANCE

    ‣ Dimensional combinations => exponential increase
    ‣ Tried limiting dimensional depth
      • still expands exponentially
    ‣ Example: ~500k records
      • 11 dimensions, 5-deep
        • 4.5 hours on a 15-node Hadoop cluster
      • 14 dimensions, 5-deep
        • 9 hours on a 25-node Hadoop cluster
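To get a feel for the growth described on this slide, one can count how many distinct groupings of dimensions exist at a given depth limit. The sketch below counts only subsets of dimension names, which is an assumption about what "5-deep" means. The real key count is larger still, because each subset is multiplied by the number of distinct value combinations in the data.

```python
from math import comb

def groupings(dimensions, max_depth):
    """Number of dimension subsets with at most max_depth members."""
    return sum(comb(dimensions, k) for k in range(max_depth + 1))

for dims in (11, 14):
    print(f"{dims} dimensions, 5-deep: {groupings(dims, 5)} groupings")
print(f"14 dimensions, unlimited depth: {groupings(14, 14)} groupings")
```

Going from 11 to 14 dimensions more than triples the number of groupings even with the depth capped at 5, which fits the jump in cluster size and runtime reported in the example.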
  8. WHAT WE LEARNED

    ‣ Problem with RDBMS: scans are slow
    ‣ Problem with NoSQL: computationally intractable
  9. WHAT WE LEARNED

    ‣ Problem with RDBMS: scans are slow
    ‣ Problem with NoSQL: computationally intractable
    ‣ Tackling the RDBMS issue seems easier
  10. WHAT IS DRUID?

    ‣ Open-source
    ‣ Column-oriented
    ‣ Distributed
    ‣ Fast
    ‣ Real-time
    ‣ Highly-Available
    ‣ Approximate or Exact
    ‣ Data store
  11. REAL-TIME NODES

    ‣ Log-structured merge-tree
    ‣ Ingest data and buffer events in memory in a write-optimized data structure
    ‣ Periodically persist collected events to disk (converting to a read-optimized format)
    ‣ Query data as soon as it is ingested
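The buffer, persist and query cycle listed above can be illustrated with a toy class. This is a minimal sketch and not Druid code: the class name `RealtimeNodeSketch`, the `max_buffered` threshold and the in-memory "persisted" list are all assumptions made for illustration, and "read-optimized" here only means sorted and frozen.

```python
class RealtimeNodeSketch:
    """Toy model: buffer events in memory, persist periodically, query both."""

    def __init__(self, max_buffered=2):
        self.buffer = []        # write-optimized: plain appends
        self.persisted = []     # immutable, time-sorted batches (stand-in for disk)
        self.max_buffered = max_buffered

    def ingest(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.max_buffered:
            self.persist()

    def persist(self):
        # "Read-optimized" is simplified to sorting by timestamp and freezing.
        batch = tuple(sorted(self.buffer, key=lambda e: e["timestamp"]))
        self.persisted.append(batch)
        self.buffer = []

    def total_added(self, page):
        # Queries see persisted batches and the live buffer alike,
        # so an event is queryable as soon as it is ingested.
        rows = [e for batch in self.persisted for e in batch] + self.buffer
        return sum(e["added"] for e in rows if e["page"] == page)


node = RealtimeNodeSketch(max_buffered=2)
node.ingest({"timestamp": "2011-01-01T00:01:35Z", "page": "Justin Bieber", "added": 10})
node.ingest({"timestamp": "2011-01-01T00:04:51Z", "page": "Justin Bieber", "added": 32})
node.ingest({"timestamp": "2011-01-01T01:00:00Z", "page": "Ke$ha", "added": 17})
print(node.total_added("Justin Bieber"), node.total_added("Ke$ha"))
```

The third event is still in the buffer when the query runs, yet it contributes to the result.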
  12. HISTORICAL NODES

    ‣ Main workhorses of a Druid cluster
    ‣ Shared-nothing architecture
    ‣ Load immutable read-optimized data
    ‣ Respond to queries
  13. ARCHITECTURE

    [Diagram: Broker Nodes (Query API, query rewrite, scatter/gather), Realtime Nodes (Query API), Historical Nodes (Query API); data hand-off between nodes]
  14. BROKER NODES

    ‣ Query scatter/gather (send requests to nodes and merge results)
    ‣ (Distributed) Caching
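The scatter/gather pattern named on this slide can be sketched in a few lines. Everything here is illustrative: `BrokerSketch` and `partial_added_by_page` are hypothetical names, each "node" is just a list of rows, and a local dict stands in for the distributed cache.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partial_added_by_page(node_rows):
    """Work done on one data node: aggregate only the rows it holds."""
    partial = Counter()
    for page, added in node_rows:
        partial[page] += added
    return partial

class BrokerSketch:
    def __init__(self, nodes):
        self.nodes = nodes
        self.cache = {}   # local stand-in for a distributed cache

    def added_by_page(self):
        key = "added_by_page"
        if key not in self.cache:
            # Scatter: send the same query to every node in parallel.
            with ThreadPoolExecutor() as pool:
                partials = list(pool.map(partial_added_by_page, self.nodes))
            # Gather: merge the partial results (Counter.update adds counts).
            merged = Counter()
            for partial in partials:
                merged.update(partial)
            self.cache[key] = dict(merged)
        return self.cache[key]


node_a = [("Justin Bieber", 10), ("Justin Bieber", 15), ("Ke$ha", 17)]
node_b = [("Justin Bieber", 32), ("Ke$ha", 43), ("Ke$ha", 12)]
broker = BrokerSketch([node_a, node_b])
print(broker.added_by_page())
```

The merge step works because sums can be combined from partial sums. Aggregates without that property need a different intermediate representation.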
  15. COORDINATOR NODES

    • Distributes data across historical nodes
    • Assigns historical nodes to load & drop data
    • Manages replication
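As a rough illustration of these responsibilities, the sketch below places each unit of data on the least-loaded nodes until it has the requested number of replicas. This is not Druid's balancing algorithm, which the slides do not describe. The function name `assign_segments`, the segment names and the node names are all hypothetical.

```python
def assign_segments(segments, nodes, replicas=2):
    """Place each segment on the `replicas` least-loaded nodes."""
    load = {node: [] for node in nodes}
    for segment in segments:
        # Ties are broken by node name so the result is deterministic.
        targets = sorted(nodes, key=lambda n: (len(load[n]), n))[:replicas]
        for node in targets:
            load[node].append(segment)
    return load


assignment = assign_segments(
    segments=["seg-2011-01-01", "seg-2011-01-02", "seg-2011-01-03"],
    nodes=["historical-1", "historical-2", "historical-3"],
    replicas=2,
)
for node, held in assignment.items():
    print(node, held)
```

With every segment held by two different nodes, losing a single node leaves all data queryable, which is the point of managing replication centrally.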
  16. DATA

    timestamp             page           language  city     country  ...  added  deleted
    2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA           10     65
    2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA           15     62
    2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA           32     45
    2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA            17     87
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            43     99
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            12     53
    ...
  17. COLUMN COMPRESSION · DICTIONARIES

    ‣ Create ids
      • Justin Bieber -> 0, Ke$ha -> 1
    ‣ Store
      • page -> [0 0 0 1 1 1]
      • language -> [0 0 0 0 0 0]

    timestamp             page           language  city     country  ...  added  deleted
    2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA           10     65
    2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA           15     62
    2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA           32     45
    2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA            17     87
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            43     99
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            12     53
    ...
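The two steps on this slide, creating ids and storing the encoded column, can be sketched as follows. The function name `dictionary_encode` is hypothetical, and ids are assigned in order of first appearance, which is an assumption that happens to match the ids shown on the slide.

```python
def dictionary_encode(column):
    """Return (value -> id dictionary, list of ids) for one string column."""
    ids = {}
    encoded = []
    for value in column:
        # setdefault returns the existing id, or assigns the next free one
        encoded.append(ids.setdefault(value, len(ids)))
    return ids, encoded


page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
language = ["en"] * 6

page_ids, page_encoded = dictionary_encode(page)
language_ids, language_encoded = dictionary_encode(language)
print(page_ids, page_encoded)
print(language_ids, language_encoded)
```

Replacing repeated strings with small integers shrinks the column and makes equality comparisons cheaper.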
  18. BITMAP INDICES

    ‣ Justin Bieber -> [0, 1, 2] -> [111000]
    ‣ Ke$ha -> [3, 4, 5] -> [000111]
    ‣ Compressed with CONCISE

    timestamp             page           language  city     country  ...  added  deleted
    2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA           10     65
    2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA           15     62
    2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA           32     45
    2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA            17     87
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            43     99
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            12     53
    ...
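The mapping from a value to its row ids and then to a bitmap can be sketched as below. The function name `bitmap_index` is hypothetical, and the bitmaps are left uncompressed; the CONCISE compression mentioned on the slide is not shown.

```python
def bitmap_index(column):
    """Map each distinct value to a list of 0/1 flags, one flag per row."""
    index = {}
    for row, value in enumerate(column):
        index.setdefault(value, [0] * len(column))[row] = 1
    return index


page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
index = bitmap_index(page)

for value, bits in index.items():
    row_ids = [row for row, bit in enumerate(bits) if bit]
    print(value, "->", row_ids, "->", "".join(map(str, bits)))
```

Each value gets one bit per row, so a column with few distinct values needs only a few bitmaps, and long runs of identical bits are what a compression scheme can exploit.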
  19. FAST AND FLEXIBLE QUERIES

    JUSTIN BIEBER           [1, 1, 0, 0]
    KE$HA                   [0, 0, 1, 1]
    JUSTIN BIEBER OR KE$HA  [1, 1, 1, 1]

    row  page
    0    Justin Bieber
    1    Justin Bieber
    2    Ke$ha
    3    Ke$ha
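A filter such as "Justin Bieber OR Ke$ha" reduces to a bitwise operation on the per-value bitmaps. The sketch below stores each bitmap in a Python integer, with bit i standing for row i. That representation and the helper names `build_bitmaps` and `as_list` are assumptions for illustration.

```python
def build_bitmaps(column):
    """Map each value to an int whose bit i is set when row i holds that value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps

def as_list(bitmap, n_rows):
    """Expand an integer bitmap into a list of 0/1 flags in row order."""
    return [(bitmap >> row) & 1 for row in range(n_rows)]


pages = ["Justin Bieber", "Justin Bieber", "Ke$ha", "Ke$ha"]
bitmaps = build_bitmaps(pages)

either = bitmaps["Justin Bieber"] | bitmaps["Ke$ha"]   # OR filter
both = bitmaps["Justin Bieber"] & bitmaps["Ke$ha"]     # AND filter
matching_rows = [row for row, bit in enumerate(as_list(either, len(pages))) if bit]

print(as_list(either, len(pages)), matching_rows)
print(as_list(both, len(pages)))
```

Only the rows whose bit survives the Boolean expression need to be read from the metric columns, so the filter itself never touches the row data.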
  20. DRUID IN PRODUCTION

    REALTIME INGESTION
    >200K EVENTS / SECOND SUSTAINED
    10 – 100K EVENTS / SECOND / CORE
  21. DRUID IN PRODUCTION

    CLUSTER SIZE
    150TB OF DATA (~10 TRILLION RAW EVENTS)
    >2000 CORES (>100 NODES, 15TB RAM)

    IT’S CHEAP
    MOST COST EFFECTIVE AT THIS SCALE
  22. DRUID IN PRODUCTION

    QUERY LATENCY (500MS AVERAGE)
    90% < 1S
    95% < 5S
    99% < 10S

    [Plot: Query latency percentiles (90%ile, 95%ile, 99%ile) for datasources a to h; x-axis: time (Feb 03 to Feb 24); y-axis: query time (seconds)]
  23. DRUID IN PRODUCTION

    QUERY VOLUME
    >1000 QUERIES / MINUTE
    VARIETY OF GROUP BY & TOP-K QUERIES
  24. AVAILABILITY

    REPLICATION
    ROLLING DEPLOYMENTS + RESTARTS
    GROW = START PROCESSES
    SHRINK = KILL PROCESSES
    2 YEARS · NO DOWNTIME FOR SOFTWARE UPDATE
  25. DRUID AS A PLATFORM

    [Diagram: Druid at the center, connected to the following]
    ‣ Batch Ingestion (Hadoop)
    ‣ Streaming Ingestion (Storm)
    ‣ Approximate Algorithms
    ‣ Machine Learning (SciPy, R, ScalaNLP)
    ‣ Visualizations
  26. LOOKING FOR COLLABORATION

    ‣ Druid is open source!
    ‣ Multi-tenancy
      • Tied Requests, Better resource management, query prioritization
    ‣ Approximate Algorithms
      • must be memory bounded and allow distributed folding
      • e.g. HyperLogLog, approximate histograms, t-digest
    ‣ Memory bounded, fast, distributed Joins
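The two requirements for approximate algorithms, bounded memory and distributed folding, can be seen in a bare-bones HyperLogLog. This is a simplified sketch and not Druid's implementation: the class name `HyperLogLogSketch`, the choice of SHA-1 as the hash, and the register count (2^10) are assumptions, and only the small-range correction from the original algorithm is included.

```python
import hashlib
import math

class HyperLogLogSketch:
    """Fixed-size cardinality estimator whose instances can be folded together."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p                  # number of registers, fixed up front
        self.registers = [0] * self.m    # memory use does not grow with input

    def add(self, item):
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")         # 64-bit hash value
        width = 64 - self.p
        idx = h >> width                              # first p bits pick a register
        rest = h & ((1 << width) - 1)                 # remaining bits
        rank = width - rest.bit_length() + 1          # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def fold(self, other):
        """Combine two sketches built on different nodes (register-wise max)."""
        merged = HyperLogLogSketch(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


node_a, node_b, whole = HyperLogLogSketch(), HyperLogLogSketch(), HyperLogLogSketch()
for i in range(0, 6000):
    node_a.add(i)
for i in range(4000, 10000):
    node_b.add(i)
for i in range(10000):
    whole.add(i)

folded = node_a.fold(node_b)
print(round(folded.estimate()), "estimated distinct values (true count: 10000)")
```

Folding the two per-node sketches yields exactly the same registers as a single sketch over all the data, even though the inputs overlap. That property is what lets a broker merge partial results without double counting.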
  27. ACKNOWLEDGEMENTS

    Fangjin Yang & Nelson Ray, 2013
    • Eric Tschetter
    • Fangjin Yang
    • Xavier Léauté
    • Gian Merlino
  28. ARCHITECTURE

    [Diagram components: Data Stream, Real-time Nodes, Compute Nodes, Broker Nodes, Master Nodes, Zookeeper, MySQL, Deep Storage, Queries]