Druid at Charter

Imply
January 24, 2019

Learn more about how Druid and Imply are used at Charter Communications. Presented at the Druid Denver meetup in January 2019.


Transcript

  1. Druid at Charter Spectrum
     January 24th, 2019
     Nate Vogel, Andy Amick
  2. Agenda
     • Product Data Platform and Dataset
     • What we do with the Data
     • Our Druid Story
     • Future work
  3. Product Data Platform
     Supports 30+ consumer-facing applications and portals. Allows us to provide the best experience possible to our customers. The data is only as good as the platform (and vice versa).
  4. Product Data Platform

  5. Product Data Quantum: the evaluative data set
     • Avro format
     • Hundreds of fields
     • Heavily iterated data model
     • Billions of events per day
  6. Data Use Cases
     We provide three core data access tiers - real-time aggregate, real-time raw, and traditional D/W - each with a designated set of use cases. Realtime dashboards, alerting, A/B testing, the self-service UI, and self-service ad hoc queries span all three tiers; developer tooling spans two; anomaly ML, scheduled business reporting, and data science each map to a single tier.
  7. The Druid Story – Proof of Concept
     Challenges addressed through success criteria:
     Self-service
     • Evaluate Pivot UI
     • Develop complex queries across aggregates
     Performance and Scalability
     • Develop ingest specs
     • Understand infrastructure scaling formulae
     • Query performance
     Cluster Operations
     • AWS infrastructure and Imply Cloud
     • Security and monitoring
  8. Self-Service

  9. Pivot UI

  10. Pivot UI Queries
      ...
      {
        "queryType": "topN",
        "dataSource": "quantum",
        "intervals": "2019-01-15T20Z/2019-01-16T20Z",
        "granularity": "all",
        "filter": {
          "type": "selector",
          "dimension": "charter.api.category",
          "value": "setup"
        },
        "dimension": {
          "type": "default",
          "dimension": "charter.api.name",
          "outputName": "charter.api.name"
        },
        "aggregations": [
          {
            "name": "cd_distinct_ids",
            "type": "hyperUnique",
            "fieldName": "distinct_ids",
            "round": true
          }
        ],
        "metric": "cd_distinct_ids",
        "threshold": 50
      }
      ...
  11. Performance and Scalability

  12. Data Ingest Spec – Druid Data Design
      The self-service goal drives the data design:
      • 5 minute granularity for 90 days
      • 1 hour granularity for 1+ years
      • 60 dimensions
      • 20 transforms
      • 30 metrics
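      The two retention tiers imply two datasources rolled up at different granularities. A minimal sketch of the 5-minute tier's granularitySpec, assuming Druid's uniform type (the segmentGranularity value is illustrative, not from the deck); the 1-hour source would swap in "queryGranularity": "HOUR":

      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "FIVE_MINUTE",
        "rollup": true
      }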
  13. Data Ingest Spec – Transforms
      "transformSpec": {
        "filter": {
          "type": "selector",
          "dimension": "event.discard",
          "value": "false"
        },
        "transforms": [
          {
            "type": "expression",
            "name": "is_apiresponsetime_positive",
            "expression": "nvl(\"apiResponseTime\", 0) > 0"
          },
          {
            "type": "expression",
            "name": "apiresponsetime_normalized",
            "expression": "min(nvl(\"apiResponseTime\", 0), 60000)"
          }
        ]
        ...
      }
  14. Data Ingest Spec – Metrics
      "metricsSpec": [
        { "name": "event_count", "type": "count" },
        { "name": "apiResponseTime_min", "type": "longMin", "fieldName": "apiresponsetime_normalized" },
        { "name": "apiResponseTime_max", "type": "longMax", "fieldName": "apiresponsetime_normalized" },
        { "name": "apiResponseTime_sum", "type": "longSum", "fieldName": "apiresponsetime_normalized" }
        ...
      ]
  15. Data Ingest Spec – Approximate Aggregates
      "metricsSpec": [
        { "name": "apiResponseTime", "type": "approxHistogramFold", "fieldName": "apiresponsetime_normalized" },
        { "name": "distinct_ids", "type": "hyperUnique", "fieldName": "event.id" }
        ...
      ]
  16. Self-service data recency and ingest task load at peak
      Ingestion depends on:
      • Ingest broken down into tasks
      • Ingest must keep up at peak
      • Maintain a task buffer for ingest spec changes
      In our case:
      • 20 tasks per source (5m and 1h)
      • 40 total ingest tasks
      • 40 buffer tasks for ingest changes
      • 80 tasks for current load
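      The deck does not name the ingestion mechanism behind these tasks. Purely as an illustration, a minimal ioConfig sketch for one source, assuming Druid's Kafka indexing service and a hypothetical topic name; taskCount is the knob that yields 20 tasks per source ("replicas" here is task replication, distinct from the 2x segment replication on the next slide):

      "ioConfig": {
        "topic": "quantum-events",
        "taskCount": 20,
        "replicas": 1,
        "taskDuration": "PT1H",
        "consumerProperties": { "bootstrap.servers": "KAFKA_BROKERS:9092" }
      }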
  17. Self-service data availability and storage requirements
      Storage depends on:
      • Cardinality of the set
      • Width of rows
      • Replication
      • Retention
      In our case:
      • 20M rows / hour
      • 2x replication
      • 300GB/day for 5 minute ingest
      • 180GB/day for 1 hour ingest
      • 100TB for 90 days + 1 year
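      A rough check of that estimate, assuming the GB/day figures already include the 2x replication:

      5-minute tier:  90 days × 300 GB/day ≈ 27 TB
      1-hour tier:   365 days × 180 GB/day ≈ 66 TB
      Combined:      ≈ 93 TB, consistent with the ~100 TB figure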
  18. POC Infrastructure – AWS EC2 Footprint
      • i3.8xlarge data nodes: 32 CPUs, 244GB RAM, 7TB disk
      • 20 data nodes: supports 200 ingest tasks, 5TB RAM, 128TB disk
  19. Query Performance – Time

  20. Query Performance – Rows Processed

  21. Cluster Operations

  22. AWS VPCs

  23. Imply Cloud Management UI
      Imply Cloud UI:
      • Provisioning & deployment
      • Upgrades
      • Elastic scaling
      • Druid extensions
      • Clarity metrics
  24. Monitoring and Alerting
      Running it right means running it well
      • CloudWatch metrics
  25. Druid POC follow-up
      Identified next steps on the road to full production status:
      • Upgrades
      • Historical production backfill
      • Expected improvements
      • Security
      • User adoption phases
  26. Thank You. Questions?