Druid at Charter

Imply
January 24, 2019

Learn more about how Druid and Imply are used at Charter Communications. Presented at the Druid Denver meetup in January 2019.


Transcript

  1. Druid at Charter Spectrum
     January 24th, 2019
     Nate Vogel, Andy Amick
  2. Agenda
     • Product Data Platform and Dataset
     • What we do with the Data
     • Our Druid Story
     • Future work
  3. Product Data Platform
     Supports 30+ consumer-facing applications and portals. Allows us to provide the best experience possible to our customers. The data is only as good as the platform (and vice versa).
  4. Product Data Platform

  5. Product Data Quantum: the evaluative data set
     • Avro format
     • Hundreds of fields
     • Heavily iterated data model
     • Billions of events per day
  6. Data Use Cases
     We provide three core data access tiers - real-time aggregate, real-time raw, and traditional D/W - each with a designated set of use cases. Realtime dashboards, alerting, A/B testing, the self-service UI, and self-service ad hoc queries span all three tiers; developer tooling spans two; anomaly ML, scheduled business reporting, and data science each map to a single tier.
  7. The Druid Story – Proof of Concept
     Challenges addressed through success criteria:
     Self-service
     • Evaluate Pivot UI
     • Develop complex queries across aggregates
     Performance and Scalability
     • Develop ingest specs
     • Understand infrastructure scaling formulae
     • Query performance
     Cluster Operations
     • AWS infrastructure and Imply Cloud
     • Security and monitoring
  8. Self-Service

  9. Pivot UI

  10. Pivot UI Queries
      ...
      {
        "queryType": "topN",
        "dataSource": "quantum",
        "intervals": "2019-01-15T20Z/2019-01-16T20Z",
        "granularity": "all",
        "filter": {
          "type": "selector",
          "dimension": "charter.api.category",
          "value": "setup"
        },
        "dimension": {
          "type": "default",
          "dimension": "charter.api.name",
          "outputName": "charter.api.name"
        },
        "aggregations": [
          {
            "name": "cd_distinct_ids",
            "type": "hyperUnique",
            "fieldName": "distinct_ids",
            "round": true
          }
        ],
        "metric": "cd_distinct_ids",
        "threshold": 50
      }
      ...
  11. Performance and Scalability

  12. Data Ingest Spec – Druid Data Design
      The self-service goal drives the data design:
      • 5 minute granularity for 90 days
      • 1 hour granularity for 1+ years
      • 60 dimensions
      • 20 transforms
      • 30 metrics
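      The two retention tiers imply two datasources rolled up at different granularities. A minimal sketch of the 5-minute tier's granularitySpec, assuming Druid's uniform type (the segmentGranularity value is illustrative, not from the deck); the 1-hour source would swap in "queryGranularity": "HOUR":

      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "FIVE_MINUTE",
        "rollup": true
      }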
  13. Data Ingest Spec – Transforms
      "transformSpec": {
        "filter": {
          "type": "selector",
          "dimension": "event.discard",
          "value": "false"
        },
        "transforms": [
          {
            "type": "expression",
            "name": "is_apiresponsetime_positive",
            "expression": "nvl(\"apiResponseTime\", 0) > 0"
          },
          {
            "type": "expression",
            "name": "apiresponsetime_normalized",
            "expression": "min(nvl(\"apiResponseTime\", 0), 60000)"
          }
        ]
        ...
      }
  14. Data Ingest Spec – Metrics
      "metricsSpec": [
        { "name": "event_count", "type": "count" },
        { "name": "apiResponseTime_min", "type": "longMin", "fieldName": "apiresponsetime_normalized" },
        { "name": "apiResponseTime_max", "type": "longMax", "fieldName": "apiresponsetime_normalized" },
        { "name": "apiResponseTime_sum", "type": "longSum", "fieldName": "apiresponsetime_normalized" }
        ...
      ]
  15. Data Ingest Spec – Approximate Aggregates
      "metricsSpec": [
        { "name": "apiResponseTime", "type": "approxHistogramFold", "fieldName": "apiresponsetime_normalized" },
        { "name": "distinct_ids", "type": "hyperUnique", "fieldName": "event.id" }
        ...
      ]
  16. Self-service data recency and ingest task load at peak
      Ingestion depends on:
      • Ingest broken down into tasks
      • Ingest must keep up at peak
      • Maintain a task buffer for ingest spec changes
      In our case:
      • 20 tasks per source (5m and 1h)
      • 40 total ingest tasks
      • 40 buffer tasks for ingest changes
      • 80 tasks for current load
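      The deck does not name the ingestion mechanism behind these tasks. Purely as an illustration, a minimal ioConfig sketch for one source, assuming Druid's Kafka indexing service and a hypothetical topic name; taskCount is the knob that yields 20 tasks per source ("replicas" here is task replication, distinct from the 2x segment replication on the next slide):

      "ioConfig": {
        "topic": "quantum-events",
        "taskCount": 20,
        "replicas": 1,
        "taskDuration": "PT1H",
        "consumerProperties": { "bootstrap.servers": "KAFKA_BROKERS:9092" }
      }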
  17. Self-service data availability and storage requirements
      Storage depends on:
      • Cardinality of the set
      • Width of rows
      • Replication
      • Retention
      In our case:
      • 20M rows / hour
      • 2x replication
      • 300GB/day for 5 minute ingest
      • 180GB/day for 1 hour ingest
      • 100TB for 90 days + 1 year
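      A rough check of that estimate, assuming the GB/day figures already include the 2x replication:

      5-minute tier:  90 days × 300 GB/day ≈ 27 TB
      1-hour tier:   365 days × 180 GB/day ≈ 66 TB
      Combined:      ≈ 93 TB, consistent with the ~100 TB figure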
  18. POC Infrastructure – AWS EC2 Footprint
      • i3.8xlarge data nodes: 32 CPUs, 244GB RAM, 7TB disk
      • 20 data nodes: supports 200 ingest tasks, 5TB RAM, 128TB disk
  19. Query Performance – Time

  20. Query Performance – Rows Processed

  21. Cluster Operations

  22. AWS VPCs

  23. Imply Cloud Management UI
      Imply Cloud UI:
      • Provisioning & deployment
      • Upgrades
      • Elastic scaling
      • Druid extensions
      • Clarity metrics
  24. Monitoring and Alerting
      Running it right means running it well
      • CloudWatch metrics
  25. Druid POC follow-up
      Identified next steps on the road to full production status:
      • Upgrades
      • Historical production backfill
      • Expected improvements
      • Security
      • User adoption phases
  26. Thank You. Questions?