ACM/SIGMOD 2014

Druid
June 24, 2014

Druid – A Real-time Analytical Data Store

Transcript

  1. DRUID A REAL-TIME ANALYTICAL DATA STORE ERIC TSCHETTER · FANGJIN

    YANG · XAVIER LÉAUTÉ DRUID COMMITTERS DRUID.IO @DRUIDIO
  2. 2013 THIS PRESENTATION ‣ Description of a production system ‣

    Benchmarks and numbers ‣ Many hard problems looking for solutions
  3. 2013 WHAT IS DRUID? ‣ Open-source ‣ Column-oriented ‣ Distributed

    ‣ Approximate or Exact ‣ Real-time ‣ Highly-Available ‣ Data store
  4. 2013 I.E. IT IS SIMILAR/COMPLEMENTARY TO ‣ Apache Hadoop ‣

    Apache Storm ‣ Elasticsearch ‣ Apache Spark ‣ Facebook Presto ‣ Google Dremel ‣ Google PowerDrill ‣ etc.
  5. 2013 REAL-TIME NODES ‣ Log-structured merge-tree ‣ Ingest data and buffer events

    in memory in a write-optimized data structure ‣ Periodically persist collected events to disk (converting to a read-optimized format) ‣ Query data as soon as it is ingested
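
The ingest, persist, and query cycle on slide 5 can be sketched roughly as below. This is a toy illustration and not Druid's implementation: the class name `RealtimeBuffer`, the `max_rows` threshold (standing in for the periodic persist), and the use of sorted tuples as the "read-optimized format" are all assumptions made for the example.

```python
from bisect import bisect_left


class RealtimeBuffer:
    """Toy real-time node: a write-optimized buffer plus immutable persisted runs."""

    def __init__(self, max_rows=3):
        self.max_rows = max_rows   # hypothetical persist threshold
        self.buffer = []           # write-optimized: append-only list
        self.persisted = []        # read-optimized: sorted, immutable tuples

    def ingest(self, timestamp, event):
        self.buffer.append((timestamp, event))
        if len(self.buffer) >= self.max_rows:
            self.persist()

    def persist(self):
        # Convert the buffered events into a sorted, immutable run.
        if self.buffer:
            run = tuple(sorted(self.buffer, key=lambda r: r[0]))
            self.persisted.append(run)
            self.buffer = []

    def query(self, start, end):
        """Return events with start <= timestamp < end, including unpersisted ones."""
        out = []
        for run in self.persisted:
            keys = [ts for ts, _ in run]
            out.extend(run[bisect_left(keys, start):bisect_left(keys, end)])
        out.extend(r for r in self.buffer if start <= r[0] < end)
        return sorted(out, key=lambda r: r[0])
```

Queries read both the persisted runs and the in-memory buffer, which is what makes an event visible as soon as it is ingested.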
  6. 2013 HISTORICAL NODES ‣ Main workhorses of a Druid cluster ‣ Shared-nothing

    architecture ‣ Load immutable read-optimized data ‣ Respond to queries
  7. 2013 ARCHITECTURE [Diagram: Broker Nodes perform query rewrite and scatter/gather

    across Realtime Nodes and Historical Nodes; each node type exposes a Query API; Realtime Nodes hand off data to Historical Nodes]
  8. 2013 BROKER NODES ‣ Query scatter/gather (send requests to nodes and merge

    results) ‣ (Distributed) Caching
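
A minimal sketch of the scatter/gather step described on slide 8, under stated assumptions: nodes are modeled as plain Python callables that return partial per-page counts, and the cache is a local dict rather than a distributed one. The function name `scatter_gather` and the sample data are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(nodes, query, cache):
    """Send `query` to every node in parallel and merge the partial counts."""
    if query in cache:                       # broker-side result cache
        return cache[query]
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda node: node(query), nodes))
    merged = Counter()
    for partial in partials:                 # gather: sum the per-key counts
        merged.update(partial)
    cache[query] = dict(merged)
    return cache[query]


# Hypothetical nodes: each returns partial counts per page for its own data.
historical = lambda q: {"Justin Bieber": 3, "Ke$ha": 1}
realtime = lambda q: {"Ke$ha": 2}

cache = {}
result = scatter_gather([historical, realtime], "count by page", cache)
```

Summing works here because counts are additive across nodes; other aggregates need their own merge function.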
  9. 2013 COLUMN COMPRESSION · DICTIONARIES ‣ Create ids

    • Justin Bieber -> 0, Ke$ha -> 1 ‣ Store • page -> [0 0 0 1 1 1] • language -> [0 0 0 0 0 0]

    timestamp             page           language  city     country  ...  added  deleted
    2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA           10     65
    2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA           15     62
    2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA           32     45
    2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA            17     87
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            43     99
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            12     53
    ...
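
The dictionary encoding on slide 9 can be reproduced with a few lines of Python. This is a sketch only: the helper name `dictionary_encode` is hypothetical, and assigning ids in order of first appearance is an assumption that happens to match the slide's example.

```python
def dictionary_encode(values):
    """Return (dictionary, ids), with ids assigned in order of first appearance."""
    dictionary = {}
    ids = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        ids.append(dictionary[value])
    return dictionary, ids


# The page and language columns from the slide's sample table.
page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
language = ["en"] * 6

page_dict, page_ids = dictionary_encode(page)
lang_dict, lang_ids = dictionary_encode(language)
```

Each column then stores small integers instead of repeated strings, which is where the compression comes from.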
  10. 2013 BITMAP INDICES ‣ Justin Bieber -> [0, 1, 2] -> [111000]

    ‣ Ke$ha -> [3, 4, 5] -> [000111] ‣ Compressed with CONCISE

    timestamp             page           language  city     country  ...  added  deleted
    2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA           10     65
    2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA           15     62
    2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA           32     45
    2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA            17     87
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            43     99
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA            12     53
    ...
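
As an illustration of slide 10, the sketch below builds one uncompressed bitmap per distinct value of a column. The helper name `build_bitmaps` is hypothetical, and the CONCISE compression step mentioned on the slide is deliberately left out.

```python
def build_bitmaps(column):
    """Map each distinct value to a bitmap with one bit per row."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[row] = 1
    return bitmaps


# The page column from the slide's sample table.
page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
bitmaps = build_bitmaps(page)

# Row numbers where page == "Ke$ha", read straight off the bitmap.
rows_for_kesha = [row for row, bit in enumerate(bitmaps["Ke$ha"]) if bit]
```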
  11. 2013 FAST AND FLEXIBLE QUERIES JUSTIN BIEBER [1, 1, 0, 0]

    KE$HA [0, 0, 1, 1] JUSTIN BIEBER OR KE$HA [1, 1, 1, 1]

    row  page
    0    Justin Bieber
    1    Justin Bieber
    2    Ke$ha
    3    Ke$ha
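
The OR filter on slide 11 reduces to a bitwise operation once each bitmap is packed into an integer. The sketch below assumes Python ints as bitsets, with bit i standing for row i; the helper names `to_bitset` and `matching_rows` are hypothetical.

```python
def to_bitset(bits):
    """Pack a list of 0/1 flags into an int; bit i corresponds to row i."""
    return sum(bit << row for row, bit in enumerate(bits))


def matching_rows(bitset):
    """Return the row numbers whose bit is set."""
    return [row for row in range(bitset.bit_length()) if bitset >> row & 1]


bieber = to_bitset([1, 1, 0, 0])
kesha = to_bitset([0, 0, 1, 1])

either = bieber | kesha   # page = 'Justin Bieber' OR page = 'Ke$ha'
both = bieber & kesha     # AND of the same two filters
```

AND and NOT filters follow the same pattern, so arbitrary boolean filter expressions can be evaluated without touching the row data.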
  12. 2014 DRUID IN PRODUCTION REALTIME INGESTION >200K EVENTS / SECOND SUSTAINED

    10 – 100K EVENTS / SECOND / CORE
  13. 2014 DRUID IN PRODUCTION CLUSTER SIZE 120TB OF DATA (~8 TRILLION RAW EVENTS)

    >2000 CORES (>100 NODES, 15TB RAM) IT’S CHEAP MOST COST EFFECTIVE AT THIS SCALE
  14. 2014 DRUID IN PRODUCTION QUERY LATENCY (500MS AVERAGE) 90% < 1S 95% < 5S 99% < 10S

    [Figure: Query latency percentiles (90%ile, 95%ile, 99%ile) per datasource (a to h); x-axis: time (Feb 03 to Feb 24); y-axis: query time (seconds)]
  15. 2014 DRUID IN PRODUCTION QUERY VOLUME >1000 QUERIES / MINUTE

    VARIETY OF GROUP BY & TOP-K QUERIES
  16. 2013 MULTI-TENANCY IS HARD ‣ Prioritize • Recent data over

    old data, short queries over long queries ‣ Make it Stop • Ability to time-out / cancel queries ‣ Small units of computation • Keep shards small ‣ Cost / performance trade-offs • Put redundant copies on cheaper hardware • Tweak ratio of data in memory / on-disk
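
One way to picture the "Prioritize" and "Make it Stop" points on slide 16 is a priority queue with deadlines. This is a sketch under stated assumptions, not how Druid schedules queries: the class `QueryScheduler`, its scoring formula (data age in days plus estimated seconds), and the deadline handling are all hypothetical.

```python
import heapq
import itertools


class QueryScheduler:
    """Toy scheduler: lower score runs first; queries past their deadline are dropped."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # tie-breaker keeps FIFO order

    def submit(self, name, data_age_days, est_seconds, deadline):
        # Hypothetical scoring: recent data and short queries go first.
        priority = data_age_days + est_seconds
        heapq.heappush(self._heap, (priority, next(self._order), name, deadline))

    def next_query(self, now):
        """Pop the best query whose deadline has not passed, or None if none remain."""
        while self._heap:
            _, _, name, deadline = heapq.heappop(self._heap)
            if now <= deadline:
                return name
        return None
```

A real system would also need to cancel queries that are already running; this sketch only drops queries that time out while still queued.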
  17. 2013 DRUID AS A PLATFORM Batch Ingestion (Hadoop) Streaming Ingestion

    (Storm) Druid Approximate Algorithms Machine Learning (SciPy, R, ScalaNLP) Visualizations
  18. 2013 LOOKING FOR COLLABORATION ‣ Druid is open source

    ‣ Multi-tenancy • Tied requests, better resource management, query prioritization ‣ Approximate Algorithms • Must be memory bounded and allow distributed folding • e.g. HyperLogLog, approximate histograms, t-digest ‣ Memory-bounded, fast, distributed joins
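
To make "memory bounded" and "distributed folding" on slide 18 concrete, here is a minimal HyperLogLog sketch. It is a textbook-style illustration, not Druid's implementation: the class layout, the SHA-1 based 64-bit hash, and the default of 2**10 registers are assumptions chosen for brevity.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog: fixed memory (2**p registers), mergeable by max."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        index = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the first 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Fold another sketch into this one (register-wise maximum)."""
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)          # constant valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:              # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw
```

Memory stays at 2**p registers regardless of input size, and because `merge` is a register-wise maximum, sketches built independently on different nodes fold into exactly the sketch a single node would have built over the combined data.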