Druid at Charter

January 24, 2019

Learn how Druid and Imply are used at Charter Communications. Presented at the Druid Denver meetup in January 2019.



  1. Agenda
    • Product Data Platform and Dataset
    • What we do with the Data
    • Our Druid Story
    • Future work
  2. Product Data Platform
    • Supports 30+ consumer-facing applications and portals
    • Allows us to provide the best experience possible to our customers
    • The data is only as good as the platform (and vice versa)
  3. Product Data Quantum
    The evaluative data set
    • Avro format
    • Hundreds of fields
    • Heavily iterated data model
    • Billions of events per day
  4. Data Use Cases
    We provide three core data access tiers (real-time aggregate, real-time raw, and traditional D/W), each with a designated set of use cases.
    Use Case
    • Realtime Dashboards: X X X
    • Alerting: X X X
    • A/B testing: X X X
    • Anomaly ML: X
    • Scheduled business reporting: X
    • Data Science: X
    • Developer tooling: X X
    • Self-service UI: X X X
    • Self-service AdHoc: X X X
  5. The Druid Story – Proof of Concept
    Challenges addressed through success criteria
    Self-service
    • Evaluate Pivot UI
    • Develop complex queries across aggregates
    Performance and Scalability
    • Develop ingest specs
    • Understand infrastructure scaling formulae
    • Query performance
    Cluster Operations
    • AWS infrastructure and Imply Cloud
    • Security and monitoring
  6. Pivot UI Queries
    ...
    {
      "queryType": "topN",
      "dataSource": "quantum",
      "intervals": "2019-01-15T20Z/2019-01-16T20Z",
      "granularity": "all",
      "filter": {
        "type": "selector",
        "dimension": "charter.api.category",
        "value": "setup"
      },
      "dimension": {
        "type": "default",
        "dimension": "charter.api.name",
        "outputName": "charter.api.name"
      },
      "aggregations": [
        {
          "name": "cd_distinct_ids",
          "type": "hyperUnique",
          "fieldName": "distinct_ids",
          "round": true
        }
      ],
      "metric": "cd_distinct_ids",
      "threshold": 50
    }
    ...
  7. Data Ingest Spec
    Self-service goal drives the data design
    Druid Data Design
    • 5 minute granularity for 90 days
    • 1 hour granularity for 1+ years
    • 60 dimensions
    • 20 transforms
    • 30 metrics
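To make the two retention tiers concrete, the short sketch below counts how many time buckets each tier holds for a single combination of dimension values. It assumes exactly 90 days for the 5 minute tier and 365 days for the "1+ years" tier (a lower bound); the helper name `bucket_count` is hypothetical.

```python
from datetime import timedelta


def bucket_count(retention: timedelta, granularity: timedelta) -> int:
    """Number of whole time buckets of the given granularity in the retention window."""
    return retention // granularity


# Assumption: 90 days for the 5 minute tier, 365 days for the 1 hour tier.
five_minute_buckets = bucket_count(timedelta(days=90), timedelta(minutes=5))
one_hour_buckets = bucket_count(timedelta(days=365), timedelta(hours=1))

print(five_minute_buckets, one_hour_buckets)
```

Under these assumptions the fine-grained tier holds roughly three times as many buckets as the hourly tier, despite covering a quarter of the time span.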
  8. Data Ingest Spec – Transforms
    "transformSpec": {
      "filter": {
        "type": "selector",
        "dimension": "event.discard",
        "value": "false"
      },
      "transforms": [
        {
          "type": "expression",
          "name": "is_apiresponsetime_positive",
          "expression": "nvl(\"apiResponseTime\", 0) > 0"
        },
        {
          "type": "expression",
          "name": "apiresponsetime_normalized",
          "expression": "min(nvl(\"apiResponseTime\", 0), 60000)"
        }
      ]
      ...
    }
  9. Data Ingest Spec – Metrics
    "metricsSpec": [
      {
        "name": "event_count",
        "type": "count"
      },
      {
        "name": "apiResponseTime_min",
        "type": "longMin",
        "fieldName": "apiresponsetime_normalized"
      },
      {
        "name": "apiResponseTime_max",
        "type": "longMax",
        "fieldName": "apiresponsetime_normalized"
      },
      {
        "name": "apiResponseTime_sum",
        "type": "longSum",
        "fieldName": "apiresponsetime_normalized"
      }
      ...
    ]
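The filter, transforms, and metrics above can be illustrated with a small in-memory emulation. This is only a sketch of the semantics, not how Druid implements ingestion: the `timestamp` field (epoch milliseconds), the sample events, and the helper names `transform` and `rollup` are assumptions made for illustration, and the boolean transform is assumed to produce 1 or 0.

```python
FIVE_MINUTES_MS = 5 * 60 * 1000
MAX_RESPONSE_MS = 60000  # clamp used by apiresponsetime_normalized


def nvl(value, default):
    """Return default when value is None, mirroring nvl() in the spec."""
    return default if value is None else value


def transform(event):
    """Apply the selector filter and both expression transforms.

    Returns None when the event is filtered out.
    """
    if event.get("event.discard") != "false":
        return None
    response_time = nvl(event.get("apiResponseTime"), 0)
    row = dict(event)
    row["is_apiresponsetime_positive"] = 1 if response_time > 0 else 0
    row["apiresponsetime_normalized"] = min(response_time, MAX_RESPONSE_MS)
    return row


def rollup(events, bucket_ms, dimensions):
    """Aggregate events per (time bucket, dimension values) like the metricsSpec."""
    rows = {}
    for event in events:
        row = transform(event)
        if row is None:
            continue
        bucket = row["timestamp"] - row["timestamp"] % bucket_ms
        key = (bucket,) + tuple(row.get(d) for d in dimensions)
        value = row["apiresponsetime_normalized"]
        metrics = rows.get(key)
        if metrics is None:
            rows[key] = {
                "event_count": 1,
                "apiResponseTime_min": value,
                "apiResponseTime_max": value,
                "apiResponseTime_sum": value,
            }
        else:
            metrics["event_count"] += 1
            metrics["apiResponseTime_min"] = min(metrics["apiResponseTime_min"], value)
            metrics["apiResponseTime_max"] = max(metrics["apiResponseTime_max"], value)
            metrics["apiResponseTime_sum"] += value
    return rows


# Hypothetical sample events; "login" is an invented dimension value.
events = [
    {"timestamp": 0, "event.discard": "false", "charter.api.name": "login", "apiResponseTime": 120},
    {"timestamp": 60000, "event.discard": "false", "charter.api.name": "login", "apiResponseTime": 75000},
    {"timestamp": 90000, "event.discard": "true", "charter.api.name": "login", "apiResponseTime": 50},
    {"timestamp": 300000, "event.discard": "false", "charter.api.name": "login", "apiResponseTime": None},
]
rolled = rollup(events, FIVE_MINUTES_MS, ["charter.api.name"])
```

In this sample, the discarded event never reaches the aggregates, the 75000 ms response is clamped to 60000 before it is summed, and the event with a missing response time contributes 0.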
  10. Data Ingest Spec – Approximate Aggregates
    "metricsSpec": [
      {
        "name": "apiResponseTime",
        "type": "approxHistogramFold",
        "fieldName": "apiresponsetime_normalized"
      },
      {
        "name": "distinct_ids",
        "type": "hyperUnique",
        "fieldName": "event.id"
      }
      ...
    ]
  11. Ingestion
    Self-service data recency and ingest task load at peak
    Depends On
    • Ingest broken down into tasks
    • Ingest must keep up at peak
    • Maintain task buffer for ingest spec changes
    In our case
    • 20 tasks per source (5m and 1h)
    • 40 total ingest tasks
    • 40 buffer tasks for ingest changes
    • 80 tasks for current load
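The task figures above appear to follow from simple arithmetic, sketched below. The reading that the buffer equals one spare slot per running ingest task is an assumption inferred from the numbers on the slide, and the function name `required_task_slots` is hypothetical.

```python
def required_task_slots(tasks_per_source: int, source_count: int, buffer_factor: int = 1) -> dict:
    """Estimate task slots: running ingest tasks plus a buffer for spec changes.

    buffer_factor=1 assumes one spare slot per running task, so a changed
    ingest spec can start alongside the tasks it replaces.
    """
    ingest_tasks = tasks_per_source * source_count
    buffer_tasks = ingest_tasks * buffer_factor
    return {
        "ingest_tasks": ingest_tasks,
        "buffer_tasks": buffer_tasks,
        "total_tasks": ingest_tasks + buffer_tasks,
    }


# Two sources (5 minute and 1 hour granularity), 20 tasks each.
slots = required_task_slots(tasks_per_source=20, source_count=2)
print(slots)
```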
  12. Storage
    Self-service data availability and storage requirements
    Depends On
    • Cardinality of the set
    • Width of rows
    • Replication
    • Retention
    In our case
    • 20M rows / hour
    • 2x replication
    • 300GB/day for 5 minute ingest
    • 180GB/day for 1 hour ingest
    • 100TB for 90 days + 1 year
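As a rough check on the storage figure, the sketch below multiplies the daily volumes by their retention windows. It assumes the per-day figures already include the 2x replication (the slide does not say), that one year is 365 days, and that 1TB is 1000GB.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes


def retained_gb(gb_per_day: int, retention_days: int) -> int:
    """Total stored volume for a steady daily ingest over the retention window."""
    return gb_per_day * retention_days


five_minute_gb = retained_gb(300, 90)   # 5 minute granularity, kept 90 days
one_hour_gb = retained_gb(180, 365)     # 1 hour granularity, kept 1 year
total_tb = (five_minute_gb + one_hour_gb) / GB_PER_TB

print(five_minute_gb, one_hour_gb, total_tb)
```

Under these assumptions the total comes to about 93TB, which is consistent with the roughly 100TB quoted above.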
  13. POC Infrastructure
    AWS EC2 Footprint
    • i3.8xlarge Data Nodes: 32 CPUs, 244GB RAM, 7TB disk
    • 20 Data Nodes: supports 200 ingest tasks, 5TB RAM, 128TB disk
  14. Imply Cloud UI
    Imply Cloud Management UI
    • Provisioning & deployment
    • Upgrades
    • Elastic scaling
    • Druid extensions
    • Clarity metrics
  15. Druid POC follow up
    Identified next steps on the road to full production status
    • Upgrades
    • Historical production backfill
    • Expected improvements
    • Security
    • User adoption phases