Druid at Charter

Imply
January 24, 2019

Learn more about how Druid and Imply are used at Charter Communications. Presented at the Druid Denver meetup in January 2019.

Transcript

  1. Agenda
    • Product Data Platform and Dataset
    • What we do with the Data
    • Our Druid Story
    • Future work
  2. Product Data Platform
    • Supports 30+ consumer-facing applications and portals
    • Allows us to provide the best experience possible to our customers
    • The data is only as good as the platform (and vice versa)
  3. Product Data Quantum: The evaluative data set
    • Avro format
    • Hundreds of fields
    • Heavily iterated data model
    • Billions of events per day
  4. Data Use Cases
    We provide three core data access tiers (real-time aggregate, real-time raw, traditional D/W), each with a designated set of use cases.
    Use Case
    • Realtime Dashboards: X X X
    • Alerting: X X X
    • A/B testing: X X X
    • Anomaly ML: X
    • Scheduled business reporting: X
    • Data Science: X
    • Developer tooling: X X
    • Self-service UI: X X X
    • Self-service AdHoc: X X X
  5. The Druid Story – Proof of Concept: Challenges addressed through success criteria
    Self-service
    • Evaluate Pivot UI
    • Develop complex queries across aggregates
    Performance and Scalability
    • Develop ingest specs
    • Understand infrastructure scaling formulae
    • Query performance
    Cluster Operations
    • AWS infrastructure and Imply cloud
    • Security and monitoring
  6. Pivot UI Queries
    ...
    {
      "queryType": "topN",
      "dataSource": "quantum",
      "intervals": "2019-01-15T20Z/2019-01-16T20Z",
      "granularity": "all",
      "filter": { "type": "selector", "dimension": "charter.api.category", "value": "setup" },
      "dimension": { "type": "default", "dimension": "charter.api.name", "outputName": "charter.api.name" },
      "aggregations": [
        { "name": "cd_distinct_ids", "type": "hyperUnique", "fieldName": "distinct_ids", "round": true }
      ],
      "metric": "cd_distinct_ids",
      "threshold": 50
    }
    ...
  7. Data Ingest Spec: Self-service goal drives the data design
    Druid Data Design
    • 5 minute granularity for 90 days
    • 1 hour granularity for 1+ years
    • 60 dimensions
    • 20 transforms
    • 30 metrics
  8. Data Ingest Spec – Transforms
    "transformSpec": {
      "filter": { "type": "selector", "dimension": "event.discard", "value": "false" },
      "transforms": [
        { "type": "expression", "name": "is_apiresponsetime_positive", "expression": "nvl(\"apiResponseTime\", 0) > 0" },
        { "type": "expression", "name": "apiresponsetime_normalized", "expression": "min(nvl(\"apiResponseTime\", 0), 60000)" }
      ]
      ...
    }
  9. Data Ingest Spec – Metrics
    "metricsSpec": [
      { "name": "event_count", "type": "count" },
      { "name": "apiResponseTime_min", "type": "longMin", "fieldName": "apiresponsetime_normalized" },
      { "name": "apiResponseTime_max", "type": "longMax", "fieldName": "apiresponsetime_normalized" },
      { "name": "apiResponseTime_sum", "type": "longSum", "fieldName": "apiresponsetime_normalized" }
      ...
    ]
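The metrics above are computed during rollup: rows that share a time bucket and dimension values are folded into one stored row. The Python sketch below emulates that idea for the 5 minute and 1 hour granularities mentioned earlier. It illustrates the concept only and is not Druid's implementation; the sample rows and helper names are hypothetical.

```python
from datetime import datetime, timezone


def truncate(ts, minutes):
    """Floor a timezone-aware datetime to the start of its N-minute bucket."""
    size = minutes * 60
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def rollup(rows, minutes, dims):
    """Group rows by (time bucket, dimensions) and fold the four metrics."""
    out = {}
    for row in rows:
        key = (truncate(row["ts"], minutes),) + tuple(row[d] for d in dims)
        value = row["apiresponsetime_normalized"]
        agg = out.setdefault(key, {
            "event_count": 0,
            "apiResponseTime_min": value,
            "apiResponseTime_max": value,
            "apiResponseTime_sum": 0,
        })
        agg["event_count"] += 1
        agg["apiResponseTime_min"] = min(agg["apiResponseTime_min"], value)
        agg["apiResponseTime_max"] = max(agg["apiResponseTime_max"], value)
        agg["apiResponseTime_sum"] += value
    return out


def at(hour, minute):
    """Build a UTC timestamp on an arbitrary sample day."""
    return datetime(2019, 1, 15, hour, minute, tzinfo=timezone.utc)


# Invented sample rows for illustration only.
rows = [
    {"ts": at(20, 1), "charter.api.name": "login", "apiresponsetime_normalized": 120},
    {"ts": at(20, 3), "charter.api.name": "login", "apiresponsetime_normalized": 80},
    {"ts": at(20, 7), "charter.api.name": "login", "apiresponsetime_normalized": 60000},
]

five_minute = rollup(rows, 5, ["charter.api.name"])
one_hour = rollup(rows, 60, ["charter.api.name"])
print(len(five_minute), len(one_hour))
```

Because count, min, max, and sum can all be combined across buckets, the coarser 1 hour table gives the same totals as the 5 minute table while storing fewer rows.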
  10. Data Ingest Spec – Approximate Aggregates
    "metricsSpec": [
      { "name": "apiResponseTime", "type": "approxHistogramFold", "fieldName": "apiresponsetime_normalized" },
      { "name": "distinct_ids", "type": "hyperUnique", "fieldName": "event.id" }
      ...
    ]
  11. Ingestion: Self-service data recency and ingest tasks load at peak
    Depends On
    • Ingest broken down into tasks
    • Ingest must keep up at peak
    • Maintain task buffer for ingest spec changes
    In our case
    • 20 tasks per source (5m and 1h)
    • 40 total ingest tasks
    • 40 buffer tasks for ingest changes
    • 80 tasks for current load
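The task counts above follow from simple arithmetic, sketched here in Python. The variable names are hypothetical, and sizing the buffer equal to the running ingest tasks is an interpretation of the slide's numbers (40 ingest, 40 buffer), not a stated rule.

```python
TASKS_PER_SOURCE = 20
SOURCES = ["5m", "1h"]  # one ingest source per granularity

# 20 tasks for each of the two sources.
ingest_tasks = TASKS_PER_SOURCE * len(SOURCES)

# Assumption: the buffer mirrors the running tasks, so a changed ingest spec
# can start its replacement tasks while the old ones are still running.
buffer_tasks = ingest_tasks

total_tasks = ingest_tasks + buffer_tasks
print(ingest_tasks, buffer_tasks, total_tasks)
```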
  12. Storage: Self-service data availability and storage requirements
    Depends On
    • Cardinality of the set
    • Width of rows
    • Replication
    • Retention
    In our case
    • 20M rows / hour
    • 2x replication
    • 300GB/day for 5 minute ingest
    • 180GB/day for 1 hour ingest
    • 100TB for 90 days + 1 year
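The storage total can be checked with a short calculation, sketched here in Python. Two assumptions are made that the slide does not state: the daily figures already include the 2x replication, and "1 year" is taken as 365 days. Under those assumptions the result lands close to the quoted 100TB.

```python
GB_PER_TB = 1000  # decimal units, an assumption

FIVE_MINUTE_GB_PER_DAY = 300
ONE_HOUR_GB_PER_DAY = 180
FIVE_MINUTE_RETENTION_DAYS = 90
ONE_HOUR_RETENTION_DAYS = 365  # "1 year" taken as 365 days

five_minute_gb = FIVE_MINUTE_GB_PER_DAY * FIVE_MINUTE_RETENTION_DAYS
one_hour_gb = ONE_HOUR_GB_PER_DAY * ONE_HOUR_RETENTION_DAYS
total_gb = five_minute_gb + one_hour_gb

print(f"5 minute data: {five_minute_gb / GB_PER_TB:.1f} TB")
print(f"1 hour data:   {one_hour_gb / GB_PER_TB:.1f} TB")
print(f"total:         {total_gb / GB_PER_TB:.1f} TB")
```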
  13. POC Infrastructure: AWS EC2 Footprint
    • i3.8xlarge Data Nodes: 32 CPUs, 244GB RAM, 7TB disk
    • 20 Data Nodes: supports 200 ingest tasks, 5TB RAM, 128TB disk
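Combining the cluster figures above with the earlier task and storage requirements gives a rough view of headroom. This Python sketch uses only numbers quoted in the slides; the variable names are hypothetical, and real capacity planning would also account for query load and segment caching.

```python
DATA_NODES = 20
RAM_GB_PER_NODE = 244

SUPPORTED_INGEST_TASKS = 200  # cluster figure from the slide
CLUSTER_DISK_TB = 128         # cluster figure from the slide

REQUIRED_TASKS = 80           # current load, including buffer tasks
REQUIRED_STORAGE_TB = 100     # 90 days of 5 minute data plus 1 year of 1 hour data

tasks_per_node = SUPPORTED_INGEST_TASKS // DATA_NODES
cluster_ram_gb = DATA_NODES * RAM_GB_PER_NODE  # about 5TB, as quoted

task_utilization = REQUIRED_TASKS / SUPPORTED_INGEST_TASKS
storage_utilization = REQUIRED_STORAGE_TB / CLUSTER_DISK_TB

print(f"ingest tasks per node: {tasks_per_node}")
print(f"cluster RAM: {cluster_ram_gb} GB")
print(f"task slots in use: {task_utilization:.0%}")
print(f"disk in use: {storage_utilization:.0%}")
```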
  14. Imply Cloud UI: Imply Cloud Management UI
    • Provisioning & deployment
    • Upgrades
    • Elastic scaling
    • Druid extensions
    • Clarity metrics
  15. Druid POC follow up: Identified next steps on the road to full production status
    • Upgrades
    • Historical production backfill
    • Expected improvements
    • Security
    • User adoption phases