Slide 1

Slide 1 text

Druid
High performance, column-oriented, distributed data store
Fangjin Yang
Cofounder @ Imply

Slide 2

Slide 2 text

Overview
History & Motivation
Demo
Alternative Architectures
Druid Architecture

Slide 3

Slide 3 text

History & Motivation
First lines of Druid started in 2011
Initial use case: power ad-tech analytics product
Requirements:
- Scalable (trillions of events/day, petabytes of data)
- Multi-tenant (thousands of concurrent users)
- Interactive (low latency queries)
- “Real-time” (low latency data ingestion)

Slide 4

Slide 4 text

History & Motivation
Druid went open source in late 2012
- GPL license initially
- Part-time development until early 2014
Community growth
- Apache v2 licensed in early 2015
- 150+ contributors from 100+ organizations
In production at many different companies across many verticals
- Ad-tech, network traffic, security, finance, gaming, operations, activity streams, etc.

Slide 5

Slide 5 text

Use Cases
Powering user-facing analytic applications
Unify historical and real-time events
Business intelligence/OLAP queries (slice and dice and drill into data)
Behavioral analysis (measuring distinct counts, retention analysis, funnel analysis, A/B testing)
Exploratory analytics/root cause analysis

Slide 6

Slide 6 text

Demo In case the internet didn’t work, pretend you saw something cool

Slide 7

Slide 7 text

Business Intelligence Queries
Event data
- Time, dimensions (attributes), measures
Business intelligence/OLAP queries
- “How much revenue did product X generate last quarter in SF?”
- “How many of my users that visited last week returned this week?”
- Not dumping entire data set
- Not examining single events
- Filtering, grouping, and aggregating data
- Result set < input set (aggregations)

Slide 8

Slide 8 text

Solution Space
Relational databases (MySQL, Postgres)
Key/value stores (HBase, Cassandra)
General compute engines (Hadoop, Spark)
Column stores

Slide 9

Slide 9 text

Relational Database
Traditional Data Warehouse
- Row oriented
- Star schema
- Aggregate tables & query caches
Fast becoming outdated
Slow!

Slide 10

Slide 10 text

Key/Value Stores
Very fast writes
Very fast lookups
Timeseries databases often have K/V storage engines

Slide 11

Slide 11 text

Key/Value Stores
Pre-computation
- Pre-compute every possible query
- Pre-compute a set of queries
- Exponential scaling costs
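The exponential cost can be seen by counting: with n dimensions there are 2^n possible sets of dimensions to group by, before filter values are even considered. The following is a minimal sketch of that count; the dimension names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def groupings(dimensions):
    """Return every subset of dimensions a pre-computed table would have to cover."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


# Hypothetical dimension names, not taken from any real schema.
dims = ["domain", "gender", "country", "device"]

for n in range(1, len(dims) + 1):
    print(n, "dimensions ->", len(groupings(dims[:n])), "group-by combinations")
```

Each added dimension doubles the number of tables to maintain, which is why pre-computing every possible query stops being practical quickly.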

Slide 12

Slide 12 text

Key/Value Stores
Range scans
- Primary key: dimensions/attributes
- Value: measures/metrics (things to aggregate)
- Still too slow!

Slide 13

Slide 13 text

Key/Value Stores

Slide 14

Slide 14 text

SQL-on-Hadoop
Enable ad-hoc queries on different input formats
Examples: Impala, Hive, Spark SQL, Drill, Presto

Slide 15

Slide 15 text

SQL-on-Hadoop

Slide 16

Slide 16 text

Column stores
Load/scan exactly what you need for a query
Different compression algorithms for different columns
- Encoding for string columns
- Compression for measure columns
Different indexes for different columns

Slide 17

Slide 17 text

Druid

Slide 18

Slide 18 text

Druid
Custom column format optimized for event data and BI queries
Supports lots of concurrent reads
Streaming data ingestion
Supports extremely fast filters
Ideal for powering user-facing analytic applications

Slide 19

Slide 19 text

Storage Format

Slide 20

Slide 20 text

Raw data

timestamp             domain      gender  clicked
2011-01-01T00:01:35Z  bieber.com  Female  1
2011-01-01T00:03:03Z  bieber.com  Female  0
2011-01-01T00:04:51Z  ultra.com   Male    1
2011-01-01T00:05:33Z  ultra.com   Male    1
2011-01-01T00:05:53Z  ultra.com   Female  0
2011-01-01T00:06:17Z  ultra.com   Female  1
2011-01-01T00:23:15Z  bieber.com  Female  0
2011-01-01T00:38:51Z  ultra.com   Male    1
2011-01-01T00:49:33Z  bieber.com  Female  1
2011-01-01T00:49:53Z  ultra.com   Female  0

Slide 21

Slide 21 text

Summarization

Raw events:

timestamp             domain      gender  clicked
2011-01-01T00:01:35Z  bieber.com  Female  1
2011-01-01T00:03:03Z  bieber.com  Female  0
2011-01-01T00:04:51Z  ultra.com   Male    1
2011-01-01T00:05:33Z  ultra.com   Male    1
2011-01-01T00:05:53Z  ultra.com   Female  0
2011-01-01T00:06:17Z  ultra.com   Female  1
2011-01-01T00:23:15Z  bieber.com  Female  0
2011-01-01T00:38:51Z  ultra.com   Male    1
2011-01-01T00:49:33Z  bieber.com  Female  1
2011-01-01T00:49:53Z  ultra.com   Female  0

Summarized by hour:

timestamp             domain      gender  clicked
2011-01-01T00:00:00Z  bieber.com  Female  2
2011-01-01T00:00:00Z  ultra.com   Female  1
2011-01-01T00:00:00Z  ultra.com   Male    3
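Summarization here means truncating each timestamp to the hour and summing clicked for every (timestamp, domain, gender) combination. A minimal sketch of that rollup over the ten raw events, written in plain Python as an illustration rather than as Druid's actual ingestion code:

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

RAW_EVENTS = [
    ("2011-01-01T00:01:35Z", "bieber.com", "Female", 1),
    ("2011-01-01T00:03:03Z", "bieber.com", "Female", 0),
    ("2011-01-01T00:04:51Z", "ultra.com", "Male", 1),
    ("2011-01-01T00:05:33Z", "ultra.com", "Male", 1),
    ("2011-01-01T00:05:53Z", "ultra.com", "Female", 0),
    ("2011-01-01T00:06:17Z", "ultra.com", "Female", 1),
    ("2011-01-01T00:23:15Z", "bieber.com", "Female", 0),
    ("2011-01-01T00:38:51Z", "ultra.com", "Male", 1),
    ("2011-01-01T00:49:33Z", "bieber.com", "Female", 1),
    ("2011-01-01T00:49:53Z", "ultra.com", "Female", 0),
]


def rollup_hourly(events):
    """Sum clicked per (hour, domain, gender)."""
    totals = defaultdict(int)
    for timestamp, domain, gender, clicked in events:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.strptime(timestamp, TIME_FORMAT).replace(minute=0, second=0)
        totals[(hour.strftime(TIME_FORMAT), domain, gender)] += clicked
    return dict(sorted(totals.items()))


for key, clicked in rollup_hourly(RAW_EVENTS).items():
    print(*key, clicked)
```

Ten input rows collapse to three, one per distinct combination of dimensions within the hour. The storage saving grows with how often the same combination repeats.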

Slide 22

Slide 22 text

Segmentation

Slide 23

Slide 23 text

Immutable Segments
Fundamental storage unit in Druid
No contention between reads and writes
One thread scans one segment
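Because segments never change once written, a query can hand each segment to its own thread without locking and then merge the partial results. The following is a rough sketch of that pattern; the segment names, their contents, and the function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical segments: immutable tuples of (domain, clicked) rows, one per hour.
SEGMENTS = {
    "2011-01-01T00": (("bieber.com", 1), ("ultra.com", 1), ("ultra.com", 0)),
    "2011-01-01T01": (("ultra.com", 1), ("bieber.com", 0)),
    "2011-01-01T02": (("ultra.com", 1),),
}


def scan_segment(rows, domain):
    """Scan a single segment and return its partial sum of clicks."""
    return sum(clicked for row_domain, clicked in rows if row_domain == domain)


def total_clicks(segments, domain):
    """Submit one scan task per segment, then merge the partial sums."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda rows: scan_segment(rows, domain), segments.values())
        return sum(partials)


print(total_clicks(SEGMENTS, "ultra.com"))
```

Python threads are used here only to show the structure of the fan-out and merge; they are not meant to demonstrate a real parallel speedup.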

Slide 24

Slide 24 text

Columnar Storage
Create IDs
● Justin Bieber → 0, Ke$ha → 1
Store
● page → [0 0 0 1 1 1]
● language → [0 0 0 0 0 0]
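This step is dictionary encoding: each distinct string gets a small integer ID, and the column stores the IDs instead of the strings. A minimal sketch, assuming IDs are handed out in first-seen order (a real implementation may assign them differently, for example in sorted order); the language value "en" is a placeholder, since the slide only shows the encoded IDs.

```python
def dictionary_encode(values):
    """Map each distinct string to an integer ID and return (ids, encoded column)."""
    ids = {}
    encoded = []
    for value in values:
        if value not in ids:
            ids[value] = len(ids)  # next unused ID
        encoded.append(ids[value])
    return ids, encoded


page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
language = ["en"] * 6  # placeholder value, not given in the slide

page_ids, page_column = dictionary_encode(page)
_, language_column = dictionary_encode(language)

print(page_ids)
print("page     ->", page_column)
print("language ->", language_column)
```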

Slide 25

Slide 25 text

Columnar Storage
Justin Bieber → [0, 1, 2] → [111000]
Ke$ha → [3, 4, 5] → [000111]
Justin Bieber OR Ke$ha → [111111]
Compression!
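The mapping from a value to the rows that contain it is a bitmap index, and a filter such as "Justin Bieber OR Ke$ha" becomes a bitwise OR of two bitmaps. A small sketch of the idea, using Python integers as stand-ins for bitmaps; the compression step is not shown and the function names are illustrative.

```python
def build_bitmaps(column):
    """Return {value: bitmask}, where bit i is set when row i holds the value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def as_bits(mask, n_rows):
    """Render a bitmask with row 0 on the left, matching the slide's notation."""
    return "".join("1" if (mask >> row) & 1 else "0" for row in range(n_rows))


page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 3
bitmaps = build_bitmaps(page)
either = bitmaps["Justin Bieber"] | bitmaps["Ke$ha"]  # OR filter

print("Justin Bieber ->", as_bits(bitmaps["Justin Bieber"], len(page)))
print("Ke$ha         ->", as_bits(bitmaps["Ke$ha"], len(page)))
print("OR            ->", as_bits(either, len(page)))
```

The filter is resolved on the bitmaps alone, so only the matching rows of the measure columns need to be read afterwards.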

Slide 26

Slide 26 text

Plugin Architecture
Write your own plugins for different computations and components
Often used for approximate algorithms
- Count distinct (HyperLogLog)
- Approximate histograms
- Funnel/behavioral analysis (theta sketches)
Approximate algorithms are very powerful for fast queries

Slide 27

Slide 27 text

Approximate Algorithms

Raw events:

timestamp             domain      gender  clicked
2011-01-01T00:01:35Z  bieber.com  Female  1
2011-01-01T00:03:03Z  bieber.com  Female  0
2011-01-01T00:04:51Z  ultra.com   Male    1
2011-01-01T00:05:33Z  ultra.com   Male    1
2011-01-01T00:05:53Z  ultra.com   Female  0
2011-01-01T00:06:17Z  ultra.com   Female  1
2011-01-01T00:23:15Z  bieber.com  Female  0
2011-01-01T00:38:51Z  ultra.com   Male    1
2011-01-01T00:49:33Z  bieber.com  Female  1
2011-01-01T00:49:53Z  ultra.com   Female  0

Summarized by hour:

timestamp             domain      gender  clicked
2011-01-01T00:00:00Z  bieber.com  Female  2
2011-01-01T00:00:00Z  ultra.com   Female  1
2011-01-01T00:00:00Z  ultra.com   Male    3

Slide 28

Slide 28 text

Approximate Algorithms

timestamp             domain      user        gender  clicked
2011-01-01T00:01:35Z  bieber.com  4312345532  Female  1
2011-01-01T00:03:03Z  bieber.com  3484920241  Female  0
2011-01-01T00:04:51Z  ultra.com   9530174728  Male    1
2011-01-01T00:05:33Z  ultra.com   4098310573  Male    1
2011-01-01T00:05:53Z  ultra.com   5832058870  Female  0
2011-01-01T00:06:17Z  ultra.com   5789283478  Female  1
2011-01-01T00:23:15Z  bieber.com  4730093842  Female  0
2011-01-01T00:38:51Z  ultra.com   9530174728  Male    1
2011-01-01T00:49:33Z  bieber.com  4930097162  Female  1
2011-01-01T00:49:53Z  ultra.com   3081837193  Female  0

Slide 29

Slide 29 text

Approximate Algorithms

Raw events:

timestamp             domain      user        gender  clicked
2011-01-01T00:01:35Z  bieber.com  4312345532  Female  1
2011-01-01T00:03:03Z  bieber.com  3484920241  Female  0
2011-01-01T00:04:51Z  ultra.com   9530174728  Male    1
2011-01-01T00:05:33Z  ultra.com   4098310573  Male    1
2011-01-01T00:05:53Z  ultra.com   5832058870  Female  0
2011-01-01T00:06:17Z  ultra.com   5789283478  Female  1
2011-01-01T00:23:15Z  bieber.com  4730093842  Female  0
2011-01-01T00:38:51Z  ultra.com   9530174728  Male    1
2011-01-01T00:49:33Z  bieber.com  4930097162  Female  1
2011-01-01T00:49:53Z  ultra.com   3081837193  Female  0

Summarized by hour, with users kept as a sketch:

timestamp             domain      gender  clicked  users
2011-01-01T00:00:00Z  bieber.com  Female  2        {sketch_data_structure}
2011-01-01T00:00:00Z  ultra.com   Female  1        {sketch_data_structure}
2011-01-01T00:00:00Z  ultra.com   Male    3        {sketch_data_structure}
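A high-cardinality column such as user would defeat summarization if it were kept as a dimension, so each summarized row instead stores a small mergeable sketch that can estimate the number of distinct users. The code below is a toy K-minimum-values (KMV) sketch chosen for brevity. It is not Druid's HyperLogLog or theta sketch implementation, and the class name, hash choice, and default k are assumptions made for this example.

```python
import hashlib


class KMVSketch:
    """K-minimum-values sketch: keeps the k smallest hash values seen so far."""

    def __init__(self, k=64):
        self.k = k
        self.hashes = set()

    @staticmethod
    def _hash(value):
        # Map the value to a float in [0, 1] using the first 8 bytes of SHA-1.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def _trim(self):
        if len(self.hashes) > self.k:
            self.hashes = set(sorted(self.hashes)[: self.k])

    def add(self, value):
        self.hashes.add(self._hash(value))
        self._trim()

    def merge(self, other):
        """Union two sketches, as a query would when combining summarized rows."""
        merged = KMVSketch(min(self.k, other.k))
        merged.hashes = self.hashes | other.hashes
        merged._trim()
        return merged

    def estimate(self):
        if len(self.hashes) < self.k:
            return float(len(self.hashes))  # exact while under capacity
        return (self.k - 1) / max(self.hashes)


def sketch_of(users):
    sketch = KMVSketch()
    for user in users:
        sketch.add(user)
    return sketch


# One sketch per summarized row, built from the user column of the raw events.
bieber_female = sketch_of([4312345532, 3484920241, 4730093842, 4930097162])
ultra_female = sketch_of([5832058870, 5789283478, 3081837193])
ultra_male = sketch_of([9530174728, 4098310573, 9530174728])

print("ultra.com Male distinct users:", ultra_male.estimate())
print("all rows distinct users:",
      bieber_female.merge(ultra_female).merge(ultra_male).estimate())
```

The sketch has a fixed maximum size regardless of how many users it has seen, and merging two sketches gives the same answer as sketching the combined input, which is what allows it to live in a summarized row.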

Slide 30

Slide 30 text

Architecture

Slide 31

Slide 31 text

Architecture (Batch Ingestion)

Slide 32

Slide 32 text

Architecture (Batch Ingestion)

Slide 33

Slide 33 text

Real-time Nodes
Write-optimized data structure: hash map in heap
Convert write optimized → read optimized
Read-optimized data structure: Druid segments
Query data immediately
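The flow on a real-time node can be pictured as a mutable hash map that absorbs writes, a persist step that freezes it into a columnar layout, and queries that read from both. This is a loose sketch of that idea only; the class, method, and column names are hypothetical and do not mirror Druid's internals.

```python
from collections import defaultdict


class RealtimeBuffer:
    """Write-optimized buffer: a hash map keyed by (hour, domain, gender)."""

    def __init__(self):
        self.rows = defaultdict(int)

    def ingest(self, hour, domain, gender, clicked):
        self.rows[(hour, domain, gender)] += clicked

    def persist(self):
        """Convert to a read-optimized form: sorted, immutable columns."""
        ordered = sorted(self.rows.items())
        segment = {
            "timestamp": tuple(key[0] for key, _ in ordered),
            "domain": tuple(key[1] for key, _ in ordered),
            "gender": tuple(key[2] for key, _ in ordered),
            "clicked": tuple(value for _, value in ordered),
        }
        self.rows = defaultdict(int)  # start a fresh buffer
        return segment


def query_clicks(buffer, segments, domain):
    """Answer from the in-memory buffer and from persisted segments together."""
    total = sum(v for (_, d, _), v in buffer.rows.items() if d == domain)
    for segment in segments:
        total += sum(
            c for d, c in zip(segment["domain"], segment["clicked"]) if d == domain
        )
    return total


buffer = RealtimeBuffer()
buffer.ingest("2011-01-01T00:00:00Z", "ultra.com", "Male", 1)
print(query_clicks(buffer, [], "ultra.com"))  # visible before any persist

segment = buffer.persist()
buffer.ingest("2011-01-01T01:00:00Z", "ultra.com", "Female", 1)
print(query_clicks(buffer, [segment], "ultra.com"))
```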

Slide 34

Slide 34 text

Architecture (Streaming Ingestion)

Slide 35

Slide 35 text

Architecture (Lambda)

Slide 36

Slide 36 text

Querying
Query libraries:
- JSON over HTTP
- SQL
- R
- Python
- Ruby
Open source UIs
- Pivot
- Grafana
- Caravel
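As an illustration of the JSON over HTTP option, the sketch below builds a native timeseries query and wraps it in an HTTP POST without sending it. The broker address, the datasource name "events", and the column names are assumptions made for this example, and the query fields should be checked against the Druid documentation for the version in use.

```python
import json
from urllib.request import Request

# Assumed broker endpoint; adjust host and port for a real cluster.
BROKER_URL = "http://localhost:8082/druid/v2/"


def build_timeseries_query(datasource, interval, domain):
    """Build a native timeseries query that sums clicks per hour for one domain."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "hour",
        "intervals": [interval],
        "filter": {"type": "selector", "dimension": "domain", "value": domain},
        "aggregations": [
            {"type": "longSum", "name": "clicked", "fieldName": "clicked"}
        ],
    }


def build_request(query, url=BROKER_URL):
    """Wrap the query in an HTTP POST; pass it to urllib.request.urlopen to send."""
    body = json.dumps(query).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "events" is a hypothetical datasource name.
query = build_timeseries_query("events", "2011-01-01/2011-01-02", "ultra.com")
request = build_request(query)

print(request.get_method(), request.full_url)
print(json.dumps(query, indent=2))
```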

Slide 37

Slide 37 text

Druid in Production

Slide 38

Slide 38 text

Ingestion
>3M events/second sustained (200B+ events/day)
10k – 100k events/second/core

Slide 39

Slide 39 text

Volume
Largest known cluster
- >500 TB of segments (>50 trillion raw events, >50 PB raw data)
Extremely cost effective at scale

Slide 40

Slide 40 text

Queries
500 ms average query latency
90% < 1s, 95% < 2s, 99% < 10s

Slide 41

Slide 41 text

Multi-tenancy
Several hundred queries/second
Variety of group-by & top-K queries

Slide 42

Slide 42 text

Druid & the Data Space

Slide 43

Slide 43 text

End-to-end Data Stack
[Diagram with components: Events, Message bus, Stream Processor, Batch Processor, Druid, Apps]

Slide 44

Slide 44 text

Integration
Druid is complementary to many solutions
- SQL-on-Hadoop (Hive, Impala, Spark SQL, Drill, Presto)
- Stream processors (Storm, Spark Streaming, Flink, Samza)
- Batch processors (Spark, Hadoop, Flink)
- Message buses (Kafka, RabbitMQ)

Slide 45

Slide 45 text

Takeaway
Druid is pretty good for analytic applications
Druid is pretty good at fast OLAP queries
Druid is pretty good at streaming ingestion
Druid works well with existing data infrastructure systems

Slide 46

Slide 46 text

Thanks!
http://imply.io/docs/latest/quickstart