
Inside Apache Druid: Designed for Performance

Imply
May 13, 2019


Apache Druid is a modern analytical database that implements a memory-mappable storage format, indexes, compression, late tuple materialization, and a query engine that can operate directly on compressed data. A patch adding vectorized processing is also out, and we can expect it to land in a future release. This talk goes into detail on how Druid's query processing layer works and how each component contributes to achieving top performance for analytical queries.


Transcript

  1. Who am I? Gian Merlino • Committer & PMC member on Apache Druid • Cofounder at Imply • 10 years working on scalable systems
  2. Agenda • Why does Apache Druid exist? • The problem • How does it work? • Do try this at home!
  3. Where Druid fits in [architecture diagram: data sources feed stream processors and a data lake / stream hub; Druid sits between the stream hub / data lake and the apps, serving real-time analytics; stream processing and ETL handle ingestion, and older data is archived to the data lake]
  4. “Druid is an open source data store designed for real-time exploratory analytics on large data sets. Druid was originally developed to power a slice-and-dice analytical UI built on top of large event streams. The original use case for Druid targeted ingest rates of millions of records/sec, retention of over a year of data, and query latencies of sub-second to a few seconds.” Source: https://wiki.apache.org/incubator/DruidProposal
  5. Challenges • Scale: when data is large, we need a lot of servers • Speed: aiming for sub-second response time • Complexity: too much fine grain to precompute • High dimensionality: 10s or 100s of dimensions • Concurrency: many users and tenants • Freshness: load from streams
  6. What is Druid? • “high performance”: bread-and-butter fast scan rates + ‘tricks’ • “real-time”: streaming ingestion, interactive query speeds • “analytics”: counting, ranking, groupBy, time trend • “database”: the cluster stores a copy of your data and helps you manage it
  7. Key features • Column oriented • High concurrency • Scalable to 100s of servers, millions of messages/sec • Continuous, real-time ingest • Query through SQL • Target query latency sub-second to a few seconds
  8. Use cases • Clickstreams, user behavior • Digital advertising • Application performance management • Network flows • IoT
  9. Powered by Druid. From Yahoo: “The performance is great ... some of the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.” Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html
  10. Why this works • Computers are fast these days • Indexes help save work and cost • But don’t be afraid to scan tables: it can be done efficiently
  11. “Bad programmers worry about the code. Good programmers worry about data structures and their relationships.” ― Linus Torvalds
  12. Tricks of the trade • Druid stores data in immutable segments • Global time index • Secondary indexes on individual columns • Memory-mappable compressed columns • Late tuple materialization • Query engines operate directly on compressed data • Vectorization (coming soon) • Rollup (partial aggregation)
  13. Druid segments. Each segment covers one time interval, which enables a global time index.

      timestamp             page           city  added  deleted
      -- Segment 2011-01-01T00/2011-01-01T01 --
      2011-01-01T00:01:35Z  Justin Bieber  SF    10     5
      2011-01-01T00:03:45Z  Justin Bieber  LA    25     37
      2011-01-01T00:05:62Z  Justin Bieber  SF    15     19
      -- Segment 2011-01-01T01/2011-01-01T02 --
      2011-01-01T01:06:33Z  Ke$ha          LA    30     45
      2011-01-01T01:08:51Z  Ke$ha          LA    16     8
      2011-01-01T01:09:17Z  Miley Cyrus    DC    75     10
      -- Segment 2011-01-01T02/2011-01-01T03 --
      2011-01-01T02:23:30Z  Miley Cyrus    DC    22     12
      2011-01-01T02:49:33Z  Miley Cyrus    DC    90     41
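The hourly partitioning above can be sketched as follows. This is a minimal illustration, not Druid's actual API: `intervalFor` is a hypothetical helper, and real Druid lets you configure segment granularity (hour, day, etc.).

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class SegmentInterval {
    // Truncate an event timestamp to the hourly segment interval it belongs to.
    static String intervalFor(String isoTimestamp) {
        Instant t = Instant.parse(isoTimestamp);
        Instant start = t.truncatedTo(ChronoUnit.HOURS);
        Instant end = start.plus(1, ChronoUnit.HOURS);
        return start + "/" + end;
    }

    public static void main(String[] args) {
        // Rows from hour 00 and hour 01 land in different segments.
        System.out.println(intervalFor("2011-01-01T00:01:35Z"));
        System.out.println(intervalFor("2011-01-01T01:06:33Z"));
    }
}
```

Because every segment is pinned to an interval like this, a query with a time filter can skip whole segments without reading them.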
  14. Anatomy of a Druid segment (physical storage format)

      __time (LONG):  1293840000000 (×8, one per row)
      page (STRING):
        DATA:  0 0 0 1 1 2 2 2                                   (dict encoded)
        DICT:  Justin = 0, Ke$ha = 1, Miley = 2                  (sorted)
        INDEX: [0,1,2] (11100000), [3,4] (00011000), [5,6,7] (00000111)
      city (STRING):
        DATA:  2 1 2 1 1 0 0 0
        DICT:  DC = 0, LA = 1, SF = 2
        INDEX: [0,2] (10100000), [1,3,4] (01011000), [5,6,7] (00000111)
      added (LONG):   1800 2912 1953 3194 5690 1100 8423 9080
      removed (LONG): 25 42 17 170 112 67 53 94

      Bitmap indexes are stored compressed.
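A toy sketch of the dictionary-plus-bitmap layout above. `buildIndex` is an illustrative helper, not Druid code, and it uses an uncompressed `java.util.BitSet` where Druid uses compressed (Concise/Roaring) bitmaps.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

public class DictBitmap {
    // Build one bitmap per distinct value of a string column:
    // bit N is set in a value's bitmap iff row N holds that value.
    static Map<String, BitSet> buildIndex(String[] column) {
        Map<String, BitSet> bitmaps = new TreeMap<>(); // sorted keys, like Druid's dictionary
        for (int row = 0; row < column.length; row++) {
            bitmaps.computeIfAbsent(column[row], v -> new BitSet()).set(row);
        }
        return bitmaps;
    }

    public static void main(String[] args) {
        // The 'city' column from the slide, decoded: 2 1 2 1 1 0 0 0 with DC=0, LA=1, SF=2.
        String[] city = {"SF", "LA", "SF", "LA", "LA", "DC", "DC", "DC"};
        // Expect DC -> {5,6,7}, LA -> {1,3,4}, SF -> {0,2}, matching the slide's bitmaps.
        buildIndex(city).forEach((value, rows) -> System.out.println(value + " -> " + rows));
    }
}
```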
  15. Filtering with indexes

      timestamp             page
      2011-01-01T00:01:35Z  Justin Bieber
      2011-01-01T00:03:45Z  Justin Bieber
      2011-01-01T00:05:62Z  Justin Bieber
      2011-01-01T00:06:33Z  Ke$ha
      2011-01-01T00:08:51Z  Ke$ha

      Justin Bieber  [1 1 1 0 0]
      Ke$ha          [0 0 0 1 1]
      JB or KS       [1 1 1 1 1]
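The "JB or KS" row above is just a bitwise OR of the two value bitmaps. A minimal sketch with `java.util.BitSet` (an assumption for illustration; Druid's own bitmaps are compressed):

```java
import java.util.BitSet;

public class BitmapFilter {
    // Evaluate "page = 'Justin Bieber' OR page = 'Ke$ha'" as a union of bitmaps.
    static BitSet union(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet justinBieber = BitSet.valueOf(new long[]{0b00111}); // rows 0,1,2
        BitSet kesha        = BitSet.valueOf(new long[]{0b11000}); // rows 3,4
        BitSet matching     = union(justinBieber, kesha);
        System.out.println("matching rows: " + matching); // all five rows, as on the slide
    }
}
```

Only the rows whose bits survive the combined filter ever need their other columns read.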
  16. Late materialization. Query pipeline: Scan → Filter → Aggregate. The scan itself is a no-op: the filter step loads only the column data it needs (and caches it in case later operators use it too), and the aggregation step loads only the column data needed for aggregation.
  17. Operating on compressed data • Recall that columnar data is dictionary-encoded as a form of compression • Aggregation operators read dictionary codes and use them as array slots or keys in a hashtable • Only when merging inter-segment results (different dictionaries) is a dictionary lookup done
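A minimal sketch of aggregating directly on dictionary codes, assuming the decoded 'page' column from the earlier segment example; `sumByCode` is a hypothetical helper, not Druid's engine.

```java
public class CodeAggregation {
    // Sum a metric per page using dictionary codes as array slots;
    // no string is touched until the final result is materialized.
    static long[] sumByCode(int[] pageCodes, long[] added, int dictSize) {
        long[] sums = new long[dictSize];
        for (int row = 0; row < pageCodes.length; row++) {
            sums[pageCodes[row]] += added[row];
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] dict = {"Justin Bieber", "Ke$ha", "Miley Cyrus"}; // sorted dictionary
        int[] pageCodes = {0, 0, 0, 1, 1, 2, 2, 2};                // compressed column data
        long[] added    = {10, 25, 15, 30, 16, 75, 22, 90};
        long[] sums = sumByCode(pageCodes, added, dict.length);
        for (int code = 0; code < dict.length; code++) {
            // Dictionary lookup happens only here, at output time.
            System.out.println(dict[code] + " = " + sums[code]);
        }
    }
}
```

When results from segments with different dictionaries are merged, codes are translated back to values first, which is the one place a lookup is unavoidable.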
  18. Vectorization

      public interface SumAggregator {
        void aggregate(int input);
        int get();
      }

      vs.

      public interface VectorAggregator {
        void aggregate(int[] input);
        int get();
      }
  19. Vectorization

      public interface SumAggregator {
        void aggregate(int input);
        int get();
      }

      vs.

      public interface VectorAggregator {
        void aggregate(int[] input);
        int get();
      }

      • Minimize # of JVM function calls and related overhead • Improve CPU cache locality • Opens up possibility of using vectorized CPU instructions
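To make the contrast concrete, here is one toy implementation of each interface from the slide. The implementations are illustrative assumptions, not Druid's actual aggregators.

```java
public class VectorizationDemo {
    // The two interfaces shown on the slide.
    interface SumAggregator { void aggregate(int input); int get(); }
    interface VectorAggregator { void aggregate(int[] input); int get(); }

    static class ScalarSum implements SumAggregator {
        private int sum;
        public void aggregate(int input) { sum += input; } // one call per row
        public int get() { return sum; }
    }

    static class VectorSum implements VectorAggregator {
        public void aggregate(int[] input) {               // one call per batch
            for (int v : input) { sum += v; }              // tight loop: cache friendly,
        }                                                  // and JIT-friendly for SIMD
        public int get() { return sum; }
        private int sum;
    }

    public static void main(String[] args) {
        int[] rows = {10, 25, 15, 30, 16, 75, 22, 90};
        ScalarSum scalar = new ScalarSum();
        for (int v : rows) { scalar.aggregate(v); }        // 8 virtual calls
        VectorSum vector = new VectorSum();
        vector.aggregate(rows);                            // 1 virtual call
        System.out.println(scalar.get() + " == " + vector.get());
    }
}
```

Same answer either way; the vectorized form simply amortizes call overhead over a whole batch of rows.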
  20. Rollup

      Raw rows:
      timestamp             page           city  added  deleted
      2011-01-01T00:01:35Z  Justin Bieber  SF    10     5
      2011-01-01T00:03:45Z  Justin Bieber  SF    25     37
      2011-01-01T00:05:62Z  Justin Bieber  SF    15     19
      2011-01-01T00:06:33Z  Ke$ha          LA    30     45
      2011-01-01T00:08:51Z  Ke$ha          LA    16     8
      2011-01-01T00:09:17Z  Miley Cyrus    DC    75     10
      2011-01-01T00:11:25Z  Miley Cyrus    DC    11     25
      2011-01-01T00:23:30Z  Miley Cyrus    DC    22     12
      2011-01-01T00:49:33Z  Miley Cyrus    DC    90     41

      Rolled up:
      timestamp             page           city  count  sum_added  sum_deleted
      2011-01-01T00:00:00Z  Justin Bieber  SF    3      50         61
      2011-01-01T00:00:00Z  Ke$ha          LA    2      46         53
      2011-01-01T00:00:00Z  Miley Cyrus    DC    4      198        88
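The rollup above is a partial aggregation done at ingest time: truncate the timestamp, group by the remaining dimensions, and keep only the aggregates. A minimal sketch (hypothetical `rollup` helper, not Druid code) that tracks count and sum_added:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RollupDemo {
    // Group rows of {timestamp, page, city, added} by (hour, page, city),
    // accumulating [count, sum_added] per group.
    static Map<String, long[]> rollup(String[][] rows) {
        Map<String, long[]> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            String hour = row[0].substring(0, 13) + ":00:00Z";      // truncate to the hour
            String key = hour + "|" + row[1] + "|" + row[2];
            long[] agg = grouped.computeIfAbsent(key, k -> new long[2]);
            agg[0] += 1;                                            // count
            agg[1] += Long.parseLong(row[3]);                       // sum_added
        }
        return grouped;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"2011-01-01T00:01:35Z", "Justin Bieber", "SF", "10"},
            {"2011-01-01T00:03:45Z", "Justin Bieber", "SF", "25"},
            {"2011-01-01T00:06:33Z", "Ke$ha", "LA", "30"},
            {"2011-01-01T00:08:51Z", "Ke$ha", "LA", "16"},
        };
        rollup(rows).forEach((key, agg) ->
            System.out.println(key + " count=" + agg[0] + " sum_added=" + agg[1]));
    }
}
```

Because count and sum are themselves mergeable, rolled-up rows from different segments can still be combined correctly at query time.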
  21. Roll-up vs no roll-up

      Do roll-up when:
      • No need to retain high cardinality dimensions (like user id, precise location information).
      • All queries are some form of “GROUP BY”.

      Don’t roll-up when:
      • Need the ability to retrieve individual events.
      • May need to group or filter on any column.