Inside Apache Druid: Designed for Performance

Imply
May 13, 2019

Apache Druid is a modern analytical database that implements a memory-mappable storage format, indexes, compression, late tuple materialization, and a query engine that can operate directly on compressed data. A patch adding vectorized processing is also in progress and can be expected in a future release. This talk goes into detail on how Druid's query processing layer works and how each component contributes to top performance for analytical queries.


Transcript

  1. Inside Apache Druid Designed for Performance Gian Merlino gian@imply.io

  2. Who am I? Gian Merlino

    Committer & PMC member on Apache Druid • Cofounder at Imply • 10 years working on scalable systems 2
  3. Agenda • Why does Apache Druid exist? • The problem

    • How does it work? • Do try this at home! 3
  4. Why does Apache Druid exist? 4

  5. Where Druid fits in 5

    (Architecture diagram; labels: Data Sources, Stream processors, Data lake / stream hub, Stream processing, ETL, Storage, Apps, Real-time analytics, Archive to data lake.)
  6. The problem 6

  7. 7 “Druid is an open source data store designed for

    real-time exploratory analytics on large data sets. Druid was originally developed to power a slice-and-dice analytical UI built on top of large event streams. The original use case for Druid targeted ingest rates of millions of records/sec, retention of over a year of data, and query latencies of sub-second to a few seconds.” Source: https://wiki.apache.org/incubator/DruidProposal
  8. The problem 8

  9. Challenges • Scale: when data is large, we need a

    lot of servers • Speed: aiming for sub-second response time • Complexity: too much fine grain to precompute • High dimensionality: 10s or 100s of dimensions • Concurrency: many users and tenants • Freshness: load from streams 9
  10. 10 high performance real-time analytics database

  11. What is Druid? • “high performance”: bread-and-butter fast scan rates

    + ‘tricks’ • “real-time”: streaming ingestion, interactive query speeds • “analytics”: counting, ranking, groupBy, time trend • “database”: the cluster stores a copy of your data and helps you manage it 11
  12. Key features • Column oriented • High concurrency • Scalable

    to 100s of servers, millions of messages/sec • Continuous, real-time ingest • Query through SQL • Target query latency sub-second to a few seconds 12
  13. Use cases • Clickstreams, user behavior • Digital advertising •

    Application performance management • Network flows • IoT 13
  14. Powered by Druid 14 Source: http://druid.io/druid-powered.html + many more!

  15. Powered by Druid “The performance is great ... some of

    the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.” 15 Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html From Yahoo:
  16. Why this works • Computers are fast these days •

    Indexes help save work and cost • But don’t be afraid to scan tables — it can be done efficiently 16
  17. 17 How does it work?

  18. “Bad programmers worry about the code. Good programmers worry about

    data structures and their relationships.” ― Linus Torvalds
  19. Tricks of the trade • Druid stores data in immutable

    segments • Global time index • Secondary indexes on individual columns • Memory-mappable compressed columns • Late tuple materialization • Query engines operate directly on compressed data • Vectorization (coming soon) • Rollup (partial aggregation)
  20. Druid’s logical data model Timestamp Dimensions Metrics

  21. Druid Segments

    timestamp              page           city  added  deleted
    2011-01-01T00:01:35Z   Justin Bieber  SF    10     5
    2011-01-01T00:03:45Z   Justin Bieber  LA    25     37
    2011-01-01T00:05:62Z   Justin Bieber  SF    15     19
    2011-01-01T01:06:33Z   Ke$ha          LA    30     45
    2011-01-01T01:08:51Z   Ke$ha          LA    16     8
    2011-01-01T01:09:17Z   Miley Cyrus    DC    75     10
    2011-01-01T02:23:30Z   Miley Cyrus    DC    22     12
    2011-01-01T02:49:33Z   Miley Cyrus    DC    90     41

    Rows are partitioned by hour into Segment 2011-01-01T00/2011-01-01T01, Segment 2011-01-01T01/2011-01-01T02, and Segment 2011-01-01T02/2011-01-01T03. Enables global time index.
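The hour-aligned partitioning above can be sketched in a few lines. This is a hypothetical helper, not Druid's API: it maps a row timestamp to the hour-granularity segment interval that would contain it.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch: map a row timestamp to its hour-granularity
// segment interval, as in the 2011-01-01T00/2011-01-01T01 chunks above.
public class SegmentInterval {
    // Returns the [start, end) hour interval containing the timestamp.
    public static String intervalFor(String timestamp) {
        Instant t = Instant.parse(timestamp);
        Instant start = t.truncatedTo(ChronoUnit.HOURS);
        Instant end = start.plus(1, ChronoUnit.HOURS);
        return start + "/" + end;
    }

    public static void main(String[] args) {
        // A query scoped to one hour only needs to touch the matching segment.
        System.out.println(intervalFor("2011-01-01T00:03:45Z"));
        // 2011-01-01T00:00:00Z/2011-01-01T01:00:00Z
    }
}
```

Because every segment carries its interval in its identifier, the broker can prune all segments outside a query's time filter before any data is read.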
  22. Anatomy of a Druid Segment: physical storage format

    __time (LONG)   DATA: 1293840000000 (repeated for all 8 rows)
    page (STRING)   DICT (dict-encoded, sorted): Justin = 0, Ke$ha = 1, Miley = 2
                    DATA: 0 0 0 1 1 2 2 2
                    INDEX (bitmap index, stored compressed): Justin → [0,1,2] (11100000), Ke$ha → [3,4] (00011000), Miley → [5,6,7] (00000111)
    city (STRING)   DICT: DC = 0, LA = 1, SF = 2
                    DATA: 2 1 2 1 1 0 0 0
                    INDEX: SF → [0,2] (10100000), LA → [1,3,4] (01011000), DC → [5,6,7] (00000111)
    added (LONG)    DATA: 1800 2912 1953 3194 5690 1100 8423 9080
    removed (LONG)  DATA: 25 42 17 170 112 67 53 94
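The dictionary-plus-bitmap layout on this slide can be sketched directly. This is an illustrative toy, not Druid's storage code: a `BitSet` stands in for the compressed bitmaps Druid actually uses.

```java
import java.util.*;

// Hypothetical sketch of the slide's layout: a string column stored as a
// sorted dictionary, an array of per-row integer codes, and one bitmap
// (here a plain BitSet) per dictionary value marking the rows holding it.
public class DictColumn {
    final List<String> dictionary = new ArrayList<>(); // sorted distinct values
    final int[] codes;                                 // per-row dictionary codes
    final BitSet[] bitmaps;                            // per-value row bitmaps

    DictColumn(String[] rows) {
        dictionary.addAll(new TreeSet<>(Arrays.asList(rows)));
        codes = new int[rows.length];
        bitmaps = new BitSet[dictionary.size()];
        for (int v = 0; v < bitmaps.length; v++) bitmaps[v] = new BitSet();
        for (int row = 0; row < rows.length; row++) {
            int code = dictionary.indexOf(rows[row]); // a real impl would use a map
            codes[row] = code;
            bitmaps[code].set(row);
        }
    }

    public static void main(String[] args) {
        DictColumn page = new DictColumn(new String[] {
            "Justin", "Justin", "Justin", "Ke$ha", "Ke$ha",
            "Miley", "Miley", "Miley"});
        System.out.println(page.dictionary);             // [Justin, Ke$ha, Miley]
        System.out.println(Arrays.toString(page.codes)); // [0, 0, 0, 1, 1, 2, 2, 2]
        System.out.println(page.bitmaps[1]);             // {3, 4} (rows with Ke$ha)
    }
}
```

The codes and bitmaps match the `page` column on the slide; in Druid the bitmaps are additionally compressed (e.g. roaring or concise encodings).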
  23. Filtering with indexes

    timestamp              page
    2011-01-01T00:01:35Z   Justin Bieber
    2011-01-01T00:03:45Z   Justin Bieber
    2011-01-01T00:05:62Z   Justin Bieber
    2011-01-01T00:06:33Z   Ke$ha
    2011-01-01T00:08:51Z   Ke$ha

    Justin Bieber → [1 1 1 0 0]
    Ke$ha         → [0 0 0 1 1]
    JB or KS      → [1 1 1 1 1]
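The OR filter above is just a bitwise union of the two value bitmaps; no row data is touched. A minimal sketch (illustrative names, not Druid's API):

```java
import java.util.BitSet;

// Hypothetical sketch of the slide's OR filter: the "Justin Bieber" and
// "Ke$ha" bitmaps are unioned bit-by-bit without reading any row data.
public class BitmapFilter {
    public static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b); // bitwise union, one word at a time
        return result;
    }

    public static void main(String[] args) {
        BitSet justin = new BitSet(); justin.set(0, 3); // rows 0-2
        BitSet kesha  = new BitSet(); kesha.set(3, 5);  // rows 3-4
        // Rows matching "page = 'Justin Bieber' OR page = 'Ke$ha'":
        System.out.println(or(justin, kesha)); // {0, 1, 2, 3, 4}
    }
}
```

AND, OR, and NOT filters all reduce to these word-wide bitwise operations, which is why index-driven filtering is so much cheaper than scanning the column values.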
  24. Query execution pipeline 24

    Druid's query execution pipeline: Scan → Filter → Project → Aggregate → Filter → Project → Sort
  25. Late materialization 25

    Scan → Filter → Aggregate
    • Scan is a no-op!
    • Filter: load column data necessary for filtering; cache it in case it's used by later operators too
    • Aggregate: load column data necessary for aggregation
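Late materialization means the filter runs against the index alone, and metric values are only read for rows that survive. A toy sketch of the idea (hypothetical helper, not Druid's engine):

```java
import java.util.BitSet;

// Hypothetical sketch of late materialization: the filter has already been
// answered by the bitmap index, and only the "added" column values for the
// surviving rows are ever read during aggregation.
public class LateMaterialization {
    public static long filteredSum(BitSet matches, long[] addedColumn) {
        long sum = 0;
        // Iterate only over rows that passed the filter.
        for (int row = matches.nextSetBit(0); row >= 0;
             row = matches.nextSetBit(row + 1)) {
            sum += addedColumn[row]; // column data is materialized only here
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] added = {10, 25, 15, 30, 16};
        BitSet matches = new BitSet();
        matches.set(0, 3); // rows 0-2 selected by the index filter
        System.out.println(filteredSum(matches, added)); // 50
    }
}
```

For a selective filter this touches a small fraction of the column, which is where the savings over eager row materialization come from.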
  26. Operating on compressed data 26 • Recall that columnar data

    is dictionary-encoded as a form of compression • Aggregation operators read dictionary codes and use them as array slots or keys in hashtable • Only when merging inter-segment results (different dictionaries) is a dictionary lookup done Aggregate
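The idea of aggregating on codes rather than strings can be sketched as follows. This is an illustration, not Druid's aggregator code: per-row dictionary codes index straight into an array of partial sums, and the string dictionary is consulted only once, when results leave the segment.

```java
import java.util.*;

// Hypothetical sketch of operating on compressed data: group-by "page"
// using dictionary codes as array slots, with the dictionary lookup
// deferred until the per-segment result is produced.
public class CodeAggregator {
    public static Map<String, Long> sumByPage(int[] codes, long[] added,
                                              List<String> dictionary) {
        long[] sums = new long[dictionary.size()]; // one slot per code
        for (int row = 0; row < codes.length; row++) {
            sums[codes[row]] += added[row]; // no string work per row
        }
        Map<String, Long> result = new LinkedHashMap<>();
        for (int code = 0; code < sums.length; code++) {
            result.put(dictionary.get(code), sums[code]); // late lookup
        }
        return result;
    }

    public static void main(String[] args) {
        int[] codes = {0, 0, 0, 1, 1, 2, 2, 2};
        long[] added = {10, 25, 15, 30, 16, 75, 22, 90};
        System.out.println(sumByPage(codes, added,
            List.of("Justin", "Ke$ha", "Miley")));
        // {Justin=50, Ke$ha=46, Miley=187}
    }
}
```

Translating back to strings only at segment boundaries is necessary because each segment has its own dictionary, so codes are not comparable across segments.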
  27. Vectorization 27

    public interface SumAggregator { void aggregate(int input); int get(); }
    vs.
    public interface VectorAggregator { void aggregate(int[] input); int get(); }
  28. Vectorization 28

    public interface SumAggregator { void aggregate(int input); int get(); }
    vs.
    public interface VectorAggregator { void aggregate(int[] input); int get(); }

    • Minimize # of JVM function calls and related overhead
    • Improve CPU cache locality
    • Opens up possibility of using vectorized CPU instructions
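Filling in the two interfaces from the slide with concrete sum implementations makes the difference tangible. The implementing classes here are hypothetical; the interfaces are the ones shown above.

```java
// Hypothetical concrete implementations of the slide's two interfaces.
// The vectorized form makes one call per batch of rows instead of one call
// per row, which is the function-call overhead the slide is describing.
public class VectorizedSum {
    interface SumAggregator { void aggregate(int input); int get(); }
    interface VectorAggregator { void aggregate(int[] input); int get(); }

    static class RowSum implements SumAggregator {
        private int sum;
        public void aggregate(int input) { sum += input; } // one call per row
        public int get() { return sum; }
    }

    static class VectorSum implements VectorAggregator {
        private int sum;
        public void aggregate(int[] input) { // one call per batch
            for (int v : input) sum += v;    // tight loop the JIT can optimize
        }
        public int get() { return sum; }
    }

    public static void main(String[] args) {
        VectorSum agg = new VectorSum();
        agg.aggregate(new int[] {10, 25, 15});
        agg.aggregate(new int[] {30, 16});
        System.out.println(agg.get()); // 96
    }
}
```

The batch loop is also the shape the JIT (or, eventually, explicit SIMD) can turn into vectorized CPU instructions, which per-row virtual calls prevent.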
  29. Rollup • Pre-aggregation at ingestion time • Saves space, better

    compression • Query performance boost
  30. Rollup

    Raw rows:
    timestamp              page           city  added  deleted
    2011-01-01T00:01:35Z   Justin Bieber  SF    10     5
    2011-01-01T00:03:45Z   Justin Bieber  SF    25     37
    2011-01-01T00:05:62Z   Justin Bieber  SF    15     19
    2011-01-01T00:06:33Z   Ke$ha          LA    30     45
    2011-01-01T00:08:51Z   Ke$ha          LA    16     8
    2011-01-01T00:09:17Z   Miley Cyrus    DC    75     10
    2011-01-01T00:11:25Z   Miley Cyrus    DC    11     25
    2011-01-01T00:23:30Z   Miley Cyrus    DC    22     12
    2011-01-01T00:49:33Z   Miley Cyrus    DC    90     41

    Rolled up:
    timestamp              page           city  count  sum_added  sum_deleted
    2011-01-01T00:00:00Z   Justin Bieber  SF    3      50         61
    2011-01-01T00:00:00Z   Ke$ha          LA    2      46         53
    2011-01-01T00:00:00Z   Miley Cyrus    DC    4      198        88
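The rollup on this slide is a group-by at ingestion time: rows sharing the hour-truncated timestamp, page, and city collapse into one row carrying a count and metric sums. A minimal sketch (hypothetical types, not Druid's ingestion code):

```java
import java.util.*;

// Hypothetical sketch of rollup at ingestion time, matching the slide's
// tables: raw rows are keyed by (hour-truncated timestamp, page, city)
// and collapsed into count / sum_added / sum_deleted.
public class Rollup {
    record Row(String timestamp, String page, String city,
               long added, long deleted) {}
    record Rolled(long count, long sumAdded, long sumDeleted) {}

    public static Map<String, Rolled> rollup(List<Row> rows) {
        Map<String, Rolled> out = new LinkedHashMap<>();
        for (Row r : rows) {
            // Truncate "2011-01-01T00:01:35Z" to "2011-01-01T00:00:00Z".
            String hour = r.timestamp().substring(0, 13) + ":00:00Z";
            String key = hour + "|" + r.page() + "|" + r.city();
            out.merge(key, new Rolled(1, r.added(), r.deleted()),
                (a, b) -> new Rolled(a.count() + b.count(),
                                     a.sumAdded() + b.sumAdded(),
                                     a.sumDeleted() + b.sumDeleted()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("2011-01-01T00:01:35Z", "Justin Bieber", "SF", 10, 5),
            new Row("2011-01-01T00:03:45Z", "Justin Bieber", "SF", 25, 37),
            new Row("2011-01-01T00:06:33Z", "Ke$ha", "LA", 30, 45),
            new Row("2011-01-01T00:08:51Z", "Ke$ha", "LA", 16, 8));
        System.out.println(rollup(rows).size() + " rolled-up rows from "
            + rows.size() + " raw rows");
    }
}
```

Because rollup happens before segments are written, the stored data is smaller and already partially aggregated, which is where the compression and query-speed benefits on the previous slide come from.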
  31. Roll-up vs no roll-up

    Do roll-up:
    • No need to retain high-cardinality dimensions (like user id, precise location information)
    • All queries are some form of "GROUP BY"

    Don't roll-up:
    • Need the ability to retrieve individual events
    • May need to group or filter on any column
  32. 32

  33. Download 33

    Druid community site: https://druid.apache.org/
    Druid community site (legacy): http://druid.io/
    Imply distribution: https://imply.io/get-started
  34. Contribute 34 https://github.com/apache/druid

  35. Stay in touch 35

    Join the community! http://druid.apache.org/
    Follow Apache Druid on Twitter! @druidio