
Inside Apache Druid: Designed for Performance

Imply
May 13, 2019

Apache Druid is a modern analytical database that implements a memory-mappable storage format, indexes, compression, late tuple materialization, and a query engine that can operate directly on compressed data. There is also a patch out to add vectorized processing, which we can expect to show up in a future release. This talk goes into detail on how Druid's query processing layer works and how each component contributes to achieving top performance for analytical queries.

Transcript

  1. Inside Apache Druid
    Designed for Performance
    Gian Merlino
    [email protected]

  2. Who am I?
    Gian Merlino
    Committer & PMC member on Apache Druid
    Cofounder at Imply
    10 years working on scalable systems

  3. Agenda
    ● Why does Apache Druid exist?
    ● The problem
    ● How does it work?
    ● Do try this at home!

  4. Why does Apache Druid exist?

  5. Where Druid fits in
    (Architecture diagram: data sources feed stream processors and a
    data lake / stream hub; stream processing and ETL load data into
    Druid, which serves real-time analytics to apps and archives data
    back to the data lake.)

  6. The problem

  7. “Druid is an open source data store designed for real-time
    exploratory analytics on large data sets. Druid was originally
    developed to power a slice-and-dice analytical UI built on top of
    large event streams. The original use case for Druid targeted
    ingest rates of millions of records/sec, retention of over a year of
    data, and query latencies of sub-second to a few seconds.”
    Source: https://wiki.apache.org/incubator/DruidProposal

  8. The problem

  9. Challenges
    ● Scale: when data is large, we need a lot of servers
    ● Speed: aiming for sub-second response time
    ● Complexity: too much fine grain to precompute
    ● High dimensionality: 10s or 100s of dimensions
    ● Concurrency: many users and tenants
    ● Freshness: load from streams

  10. high performance
    real-time analytics
    database

  11. What is Druid?
    ● “high performance”: bread-and-butter fast scan rates + ‘tricks’
    ● “real-time”: streaming ingestion, interactive query speeds
    ● “analytics”: counting, ranking, groupBy, time trend
    ● “database”: the cluster stores a copy of your data and helps
    you manage it

  12. Key features
    ● Column oriented
    ● High concurrency
    ● Scalable to 100s of servers, millions of messages/sec
    ● Continuous, real-time ingest
    ● Query through SQL
    ● Target query latency sub-second to a few seconds

  13. Use cases
    ● Clickstreams, user behavior
    ● Digital advertising
    ● Application performance management
    ● Network flows
    ● IoT

  14. Powered by Druid
    Source: http://druid.io/druid-powered.html
    + many more!

  15. Powered by Druid
    From Yahoo:
    “The performance is great ... some of the tables that we have
    internally in Druid have billions and billions of events in them,
    and we’re scanning them in under a second.”
    Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html

  16. Why this works
    ● Computers are fast these days
    ● Indexes help save work and cost
    ● But don’t be afraid to scan tables — it can be done efficiently

  17. How does it work?

  18. “Bad programmers worry about the code. Good
    programmers worry about data structures and their
    relationships.”
    ― Linus Torvalds

  19. Tricks of the trade
    ● Druid stores data in immutable segments
    ● Global time index
    ● Secondary indexes on individual columns
    ● Memory-mappable compressed columns
    ● Late tuple materialization
    ● Query engines operate directly on compressed data
    ● Vectorization (coming soon)
    ● Rollup (partial aggregation)

  20. Druid’s logical data model
    Timestamp Dimensions Metrics

  21. Druid Segments
    timestamp             page           city  added  deleted
    2011-01-01T00:01:35Z  Justin Bieber  SF    10     5
    2011-01-01T00:03:45Z  Justin Bieber  LA    25     37
    2011-01-01T00:05:52Z  Justin Bieber  SF    15     19
    → Segment 2011-01-01T00/2011-01-01T01
    2011-01-01T01:06:33Z  Ke$ha          LA    30     45
    2011-01-01T01:08:51Z  Ke$ha          LA    16     8
    2011-01-01T01:09:17Z  Miley Cyrus    DC    75     10
    → Segment 2011-01-01T01/2011-01-01T02
    2011-01-01T02:23:30Z  Miley Cyrus    DC    22     12
    2011-01-01T02:49:33Z  Miley Cyrus    DC    90     41
    → Segment 2011-01-01T02/2011-01-01T03
    Partitioning by time enables a global time index.
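Time partitioning is what makes the global time index work: a query's time filter can skip whole segments before any column data is read. A minimal sketch in Java, with illustrative class and method names (not Druid's actual API):

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentPruning {
    // A segment covering the half-open interval [start, end) in epoch millis.
    record Segment(String name, long start, long end) {}

    // Keep only segments whose interval overlaps the query interval.
    static List<Segment> prune(List<Segment> segments, long queryStart, long queryEnd) {
        List<Segment> matched = new ArrayList<>();
        for (Segment s : segments) {
            if (s.start() < queryEnd && s.end() > queryStart) {
                matched.add(s);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
            new Segment("2011-01-01T00/T01", 0, 3_600_000),
            new Segment("2011-01-01T01/T02", 3_600_000, 7_200_000),
            new Segment("2011-01-01T02/T03", 7_200_000, 10_800_000));
        // A query over hour 01 only: two of the three segments are skipped entirely.
        System.out.println(prune(segments, 3_600_000, 7_200_000).size()); // prints 1
    }
}
```

Because pruning needs only segment metadata, it scales with the number of segments rather than the number of rows.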

  22. Anatomy of a Druid Segment
    Physical storage format
    __time (LONG)
    DATA: 1293840000000 (×8, one value per row)
    page (STRING), dict encoded (sorted):
    DATA: 0 0 0 1 1 2 2 2
    DICT: Justin = 0, Ke$ha = 1, Miley = 2
    INDEX (bitmap index, stored compressed):
    Justin → rows [0,1,2] (11100000)
    Ke$ha → rows [3,4] (00011000)
    Miley → rows [5,6,7] (00000111)
    city (STRING), dict encoded (sorted):
    DATA: 2 1 2 1 1 0 0 0
    DICT: DC = 0, LA = 1, SF = 2
    INDEX (bitmap index, stored compressed):
    DC → rows [5,6,7] (00000111)
    LA → rows [1,3,4] (01011000)
    SF → rows [0,2] (10100000)
    added (LONG)
    DATA: 1800 2912 1953 3194 5690 1100 8423 9080
    removed (LONG)
    DATA: 25 42 17 170 112 67 53 94
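The dictionary-encoded string column above can be sketched in Java. This is a minimal, illustrative column type (not Druid's actual classes): it builds the sorted dictionary, the per-row code array (DATA), and one bitmap per value (INDEX), with java.util.BitSet standing in for a compressed bitmap format such as Roaring.

```java
import java.util.Arrays;
import java.util.BitSet;

public class StringColumn {
    final String[] dict;    // sorted distinct values; code = position in this array
    final int[] codes;      // one dictionary code per row (DATA)
    final BitSet[] bitmaps; // bitmaps[code] marks the rows holding that value (INDEX)

    StringColumn(String[] rows) {
        dict = Arrays.stream(rows).distinct().sorted().toArray(String[]::new);
        codes = new int[rows.length];
        bitmaps = new BitSet[dict.length];
        for (int c = 0; c < dict.length; c++) {
            bitmaps[c] = new BitSet(rows.length);
        }
        for (int r = 0; r < rows.length; r++) {
            int code = Arrays.binarySearch(dict, rows[r]); // dict is sorted
            codes[r] = code;
            bitmaps[code].set(r);
        }
    }

    public static void main(String[] args) {
        StringColumn page = new StringColumn(new String[]{
            "Justin", "Justin", "Justin", "Ke$ha", "Ke$ha", "Miley", "Miley", "Miley"});
        System.out.println(Arrays.toString(page.codes)); // prints [0, 0, 0, 1, 1, 2, 2, 2]
    }
}
```

Sorting the dictionary is what lets range and equality filters binary-search for a code, and the per-value bitmaps are exactly what the next slide's filtering step consumes.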

  23. Filtering with indexes
    timestamp             page
    2011-01-01T00:01:35Z  Justin Bieber
    2011-01-01T00:03:45Z  Justin Bieber
    2011-01-01T00:05:52Z  Justin Bieber
    2011-01-01T00:06:33Z  Ke$ha
    2011-01-01T00:08:51Z  Ke$ha
    Justin Bieber → [1 1 1 0 0]
    Ke$ha         → [0 0 0 1 1]
    JB or KS      → [1 1 1 1 1]
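The OR above can be computed directly on the bitmaps, without touching any row data. A minimal Java sketch, with BitSet again standing in for a compressed bitmap:

```java
import java.util.BitSet;

public class BitmapFilter {
    // Combine two value bitmaps for an OR filter; only rows set in the
    // result ever need to be read by later operators.
    static BitSet or(BitSet a, BitSet b) {
        BitSet out = (BitSet) a.clone();
        out.or(b);
        return out;
    }

    public static void main(String[] args) {
        BitSet justin = new BitSet(5);
        justin.set(0, 3); // rows 0..2 → [1 1 1 0 0]
        BitSet kesha = new BitSet(5);
        kesha.set(3, 5);  // rows 3..4 → [0 0 0 1 1]
        System.out.println(or(justin, kesha).cardinality()); // prints 5
    }
}
```

AND and NOT filters work the same way, which is why arbitrary boolean filter trees stay cheap: the row data is only consulted after the combined bitmap is known.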

  24. Query execution pipeline
    Druid’s query execution pipeline:
    Scan → Filter → Project → Aggregate → Filter → Project → Sort

  25. Late materialization
    Scan → Filter → Aggregate
    Scan is a no-op!
    Filter loads only the column data needed for filtering, and caches
    it in case later operators use the same columns.
    Aggregate loads only the column data needed for aggregation.

  26. Operating on compressed data
    ● Recall that columnar data is dictionary-encoded as a form of
    compression
    ● Aggregation operators read dictionary codes and use them as
    array indexes or as keys in a hash table
    ● A dictionary lookup is only done when merging inter-segment
    results (each segment has its own dictionary)
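A minimal sketch of this idea, reusing the dictionary codes and metric values from the segment example earlier (illustrative code, not Druid's engine): per-group sums accumulate in an array indexed by dictionary code, and strings are only resolved once at the end.

```java
import java.util.Arrays;

public class CodeGroupBy {
    // Sum a metric grouped by a dict-encoded column; returns sums[code].
    // No string comparison or dictionary lookup happens per row.
    static long[] sumByCode(int[] codes, long[] metric, int dictSize) {
        long[] sums = new long[dictSize];
        for (int r = 0; r < codes.length; r++) {
            sums[codes[r]] += metric[r];
        }
        return sums;
    }

    public static void main(String[] args) {
        int[] cityCodes = {2, 1, 2, 1, 1, 0, 0, 0}; // city DATA from the segment example
        long[] added = {1800, 2912, 1953, 3194, 5690, 1100, 8423, 9080};
        long[] sums = sumByCode(cityCodes, added, 3);
        // Only now map codes back through DICT (DC = 0, LA = 1, SF = 2).
        System.out.println(Arrays.toString(sums)); // prints [18603, 11796, 3753]
    }
}
```

The inner loop is a plain array increment, which is much cheaper than hashing strings per row; hashing only becomes necessary when the group-by key cardinality is too large for a dense array.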

  27. Vectorization
    public interface SumAggregator
    {
      void aggregate(int input);
      int get();
    }

    vs.

    public interface VectorAggregator
    {
      void aggregate(int[] input);
      int get();
    }

  28. Vectorization
    public interface SumAggregator
    {
      void aggregate(int input);
      int get();
    }

    vs.

    public interface VectorAggregator
    {
      void aggregate(int[] input);
      int get();
    }
    ● Minimize # of JVM function calls and related overhead
    ● Improve CPU cache locality
    ● Opens up possibility of using vectorized CPU instructions
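To make the contrast concrete, here is a minimal sketch implementing both interfaces (illustrative class names, not Druid's actual aggregators): the vectorized version makes one call per batch of rows instead of one call per row.

```java
public class Vectorized {
    interface SumAggregator { void aggregate(int input); int get(); }
    interface VectorAggregator { void aggregate(int[] input); int get(); }

    static class RowSum implements SumAggregator {
        private int sum;
        public void aggregate(int input) { sum += input; } // one call per row
        public int get() { return sum; }
    }

    static class VectorSum implements VectorAggregator {
        public int sum;
        public void aggregate(int[] input) {   // one call per batch
            for (int v : input) sum += v;      // tight loop the JIT can optimize
        }
        public int get() { return sum; }
    }

    public static void main(String[] args) {
        int[] batch = {10, 25, 15, 30, 16};
        RowSum rows = new RowSum();
        for (int v : batch) rows.aggregate(v);
        VectorSum vec = new VectorSum();
        vec.aggregate(batch);
        System.out.println(rows.get() + " " + vec.get()); // prints 96 96
    }
}
```

Both produce the same result; the vectorized form simply amortizes call overhead over the batch and keeps the hot loop over a contiguous array, which is also the shape SIMD instructions want.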

  29. Rollup
    ● Pre-aggregation at ingestion time
    ● Saves space, better compression
    ● Query performance boost

  30. Rollup
    Raw data:
    timestamp             page           city  added  deleted
    2011-01-01T00:01:35Z  Justin Bieber  SF    10     5
    2011-01-01T00:03:45Z  Justin Bieber  SF    25     37
    2011-01-01T00:05:52Z  Justin Bieber  SF    15     19
    2011-01-01T00:06:33Z  Ke$ha          LA    30     45
    2011-01-01T00:08:51Z  Ke$ha          LA    16     8
    2011-01-01T00:09:17Z  Miley Cyrus    DC    75     10
    2011-01-01T00:11:25Z  Miley Cyrus    DC    11     25
    2011-01-01T00:23:30Z  Miley Cyrus    DC    22     12
    2011-01-01T00:49:33Z  Miley Cyrus    DC    90     41
    Rolled up to hourly granularity:
    timestamp             page           city  count  sum_added  sum_deleted
    2011-01-01T00:00:00Z  Justin Bieber  SF    3      50         61
    2011-01-01T00:00:00Z  Ke$ha          LA    2      46         53
    2011-01-01T00:00:00Z  Miley Cyrus    DC    4      198        88
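The transformation above can be sketched as follows. This is a minimal, illustrative rollup (hypothetical class and field names; Druid does this during segment creation, not with this code): truncate each timestamp to the hour, group by (hour, page, city), and keep only a count and metric sums.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Rollup {
    record Row(long ts, String page, String city, long added, long deleted) {}

    static class Agg {
        long count, sumAdded, sumDeleted;
    }

    static Map<String, Agg> rollup(Row[] rows) {
        Map<String, Agg> grouped = new LinkedHashMap<>();
        for (Row r : rows) {
            long hour = r.ts() - (r.ts() % 3_600_000); // truncate to the hour
            String key = hour + "|" + r.page() + "|" + r.city();
            Agg a = grouped.computeIfAbsent(key, k -> new Agg());
            a.count++;
            a.sumAdded += r.added();
            a.sumDeleted += r.deleted();
        }
        return grouped;
    }

    public static void main(String[] args) {
        Row[] rows = {
            new Row(95_000L, "Justin Bieber", "SF", 10, 5),
            new Row(225_000L, "Justin Bieber", "SF", 25, 37),
            new Row(352_000L, "Justin Bieber", "SF", 15, 19)
        };
        for (Map.Entry<String, Agg> e : rollup(rows).entrySet()) {
            Agg a = e.getValue();
            System.out.println(e.getKey() + " count=" + a.count
                + " sum_added=" + a.sumAdded + " sum_deleted=" + a.sumDeleted);
        }
    }
}
```

The space savings come from the fact that many raw rows collapse into one stored row per unique (hour, dimensions) combination, which is also why high cardinality dimensions defeat rollup.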

  31. Roll-up vs no roll-up
    Do roll up when:
    ● There is no need to retain high cardinality dimensions (like user id or
    precise location information).
    ● All queries are some form of “GROUP BY”.
    Don’t roll up when:
    ● You need the ability to retrieve individual events.
    ● You may need to group or filter on any column.

  33. Download
    Druid community site: https://druid.apache.org/
    Druid community site (legacy): http://druid.io/
    Imply distribution: https://imply.io/get-started

  34. Contribute
    https://github.com/apache/druid

  35. Stay in touch
    Join the community! http://druid.apache.org/
    Follow Apache Druid on Twitter! @druidio
