
Druid at Charter

Imply
January 24, 2019

Learn more about how Druid and Imply are used at Charter Communications. Presented at the Druid Denver meetup in January 2019.


Transcript

  1. Druid at Charter Spectrum
    January 24th, 2019
    Nate Vogel
    Andy Amick


  2. Agenda
    • Product Data Platform and Dataset
    • What we do with the Data
    • Our Druid Story
    • Future work


  3. Product Data Platform
    Supports 30+ consumer-facing applications and portals
    Allows us to provide the best experience possible to our customers
    The data is only as good as the platform (and vice versa)


  4. Product Data Platform


  5. Product Data Quantum
    The evaluative data set
    • Avro format
    • Hundreds of fields
    • Heavily iterated data model
    • Billions of events per day
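    As a rough illustration of the shape of this data (a minimal sketch; the field names here are hypothetical, not Charter's actual schema), a heavily trimmed Avro schema might look like:

    # Hypothetical, heavily trimmed Avro schema for a product-data event.
    # The real record has hundreds of fields; these names are illustrative only.
    quantum_event_schema = {
        "type": "record",
        "name": "QuantumEvent",
        "fields": [
            {"name": "event_id", "type": "string"},
            {"name": "event_timestamp", "type": "long"},  # epoch millis
            {"name": "api_name", "type": ["null", "string"], "default": None},
            {"name": "api_category", "type": ["null", "string"], "default": None},
            {"name": "api_response_time_ms", "type": ["null", "long"], "default": None},
        ],
    }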


  6. Data Use Cases
    We provide three core data access tiers (real-time aggregate, real-time raw, traditional D/W), each with a designated set of use cases:

    Use Case                        Real-time aggregate   Real-time raw   Traditional D/W
    Realtime Dashboards                      X                  X                X
    Alerting                                 X                  X                X
    A/B testing                              X                  X                X
    Anomaly ML                                                  X
    Scheduled business reporting                                                 X
    Data Science                                                                 X
    Developer tooling                        X                  X
    Self-service UI                          X                  X                X
    Self-service AdHoc                       X                  X                X


  7. The Druid Story – Proof of Concept
    Challenges addressed through success criteria
    Self-service
    • Evaluate Pivot UI
    • Develop complex queries across aggregates
    Performance and Scalability
    • Develop ingest specs
    • Understand infrastructure scaling formulae
    • Query performance
    Cluster Operations
    • AWS infrastructure and Imply Cloud
    • Security and monitoring


  8. Self-Service


  9. Pivot UI


  10. Pivot UI Queries
    ...
    {
      "queryType": "topN",
      "dataSource": "quantum",
      "intervals": "2019-01-15T20Z/2019-01-16T20Z",
      "granularity": "all",
      "filter": {
        "type": "selector",
        "dimension": "charter.api.category",
        "value": "setup"
      },
      "dimension": {
        "type": "default",
        "dimension": "charter.api.name",
        "outputName": "charter.api.name"
      },
      "aggregations": [
        {
          "name": "cd_distinct_ids",
          "type": "hyperUnique",
          "fieldName": "distinct_ids",
          "round": true
        }
      ],
      "metric": "cd_distinct_ids",
      "threshold": 50
    }
    ...
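    Pivot generates native Druid queries like the one above. A minimal sketch of submitting the same query by hand to the broker's native query endpoint (the broker hostname is an assumption; 8082 is Druid's default broker port):

    import json
    import urllib.request

    # The topN query from the slide, as a Python dict.
    query = {
        "queryType": "topN",
        "dataSource": "quantum",
        "intervals": "2019-01-15T20Z/2019-01-16T20Z",
        "granularity": "all",
        "filter": {"type": "selector",
                   "dimension": "charter.api.category", "value": "setup"},
        "dimension": {"type": "default",
                      "dimension": "charter.api.name",
                      "outputName": "charter.api.name"},
        "aggregations": [{"name": "cd_distinct_ids", "type": "hyperUnique",
                          "fieldName": "distinct_ids", "round": True}],
        "metric": "cd_distinct_ids",
        "threshold": 50,
    }

    # POST to the broker's native query API and print the result rows.
    req = urllib.request.Request(
        "http://broker.example.com:8082/druid/v2",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))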


  11. Performance and Scalability


  12. Data Ingest Spec
    Self-service goal drives the data design; the two granularities are sketched below
    Druid Data Design
    • 5 minute granularity for 90 days
    • 1 hour granularity for 1+ years
    • 60 dimensions
    • 20 transforms
    • 30 metrics
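    A minimal sketch of how the two rollup granularities might appear in the ingest specs (the segment granularities and the split into two datasources are assumptions; the 90-day vs. 1+ year retention is configured separately through Druid load/drop rules):

    # Hypothetical granularitySpec fragments for the two datasources.
    five_minute_spec = {
        "type": "uniform",
        "segmentGranularity": "hour",       # segment size on disk
        "queryGranularity": "five_minute",  # rows rolled up to 5-minute buckets
    }
    one_hour_spec = {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "hour",         # coarser rollup for long retention
    }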


  13. Data Ingest Spec – Transforms
    "transformSpec": {
      "filter": {
        "type": "selector",
        "dimension": "event.discard",
        "value": "false"
      },
      "transforms": [
        {
          "type": "expression",
          "name": "is_apiresponsetime_positive",
          "expression": "nvl(\"apiResponseTime\", 0) > 0"
        },
        {
          "type": "expression",
          "name": "apiresponsetime_normalized",
          "expression": "min(nvl(\"apiResponseTime\", 0), 60000)"
        }
      ]
      ...
    }
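    In plain Python, the two transform expressions are roughly equivalent to:

    # Python equivalents of the Druid transform expressions above.
    def is_apiresponsetime_positive(api_response_time):
        # nvl("apiResponseTime", 0) > 0
        return (api_response_time or 0) > 0

    def apiresponsetime_normalized(api_response_time):
        # min(nvl("apiResponseTime", 0), 60000): cap outliers at 60 seconds
        return min(api_response_time or 0, 60000)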


  14. Data Ingest Spec – Metrics
    "metricsSpec": [
      {
        "name": "event_count",
        "type": "count"
      },
      {
        "name": "apiResponseTime_min",
        "type": "longMin",
        "fieldName": "apiresponsetime_normalized"
      },
      {
        "name": "apiResponseTime_max",
        "type": "longMax",
        "fieldName": "apiresponsetime_normalized"
      },
      {
        "name": "apiResponseTime_sum",
        "type": "longSum",
        "fieldName": "apiresponsetime_normalized"
      }
      ...
    ]

  15. Data Ingest Spec – Approximate Aggregates
    "metricsSpec": [
      {
        "name": "apiResponseTime",
        "type": "approxHistogramFold",
        "fieldName": "apiresponsetime_normalized"
      },
      {
        "name": "distinct_ids",
        "type": "hyperUnique",
        "fieldName": "event.id"
      }
      ...
    ]
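    The folded histogram makes percentiles available at query time through the quantile post-aggregator (this assumes the approximate-histogram extension is loaded on the cluster):

    # p95 API response time, derived from the approxHistogramFold metric above.
    p95_response_time = {
        "type": "quantile",
        "name": "apiResponseTime_p95",
        "fieldName": "apiResponseTime",
        "probability": 0.95,
    }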


  16. Ingestion
    Self-service data recency and ingest task load at peak
    Depends On
    • Ingest broken down into tasks
    • Ingest must keep up at peak
    • Maintain a task buffer for ingest spec changes
    In our case
    • 20 tasks per source (5m and 1h)
    • 40 total ingest tasks
    • 40 buffer tasks for ingest changes
    • 80 tasks for current load (worked through below)
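    The task arithmetic works out as follows:

    # Worked task-count arithmetic from the slide.
    tasks_per_source = 20
    sources = 2                                # 5-minute and 1-hour datasources
    ingest_tasks = tasks_per_source * sources  # 40 tasks running at peak
    buffer_tasks = ingest_tasks                # headroom for ingest spec rollovers
    total_tasks = ingest_tasks + buffer_tasks
    print(total_tasks)                         # 80 task slots for current load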


  17. Storage
    Self-service data availability and storage requirements
    Depends On
    • Cardinality of the set
    • Width of rows
    • Replication
    • Retention
    In our case
    • 20M rows / hour
    • 2x replication
    • 300GB/day for 5 minute ingest
    • 180GB/day for 1 hour ingest
    • 100TB for 90 days + 1 year (worked through below)
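    The retention math roughly reproduces the 100TB figure (assuming the daily volumes already include the 2x replication):

    # Worked storage arithmetic from the slide.
    gb_per_day_5m = 300                  # 5-minute datasource, kept 90 days
    gb_per_day_1h = 180                  # 1-hour datasource, kept 1+ years
    total_tb = (gb_per_day_5m * 90 + gb_per_day_1h * 365) / 1000.0
    print(total_tb)                      # ~92.7TB; ~100TB with growth headroom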


  18. POC Infrastructure
    AWS EC2 Footprint
    • i3.8xlarge Data Nodes
      32 CPUs, 244GB RAM, 7TB disk
    • 20 Data Nodes
      Supports 200 ingest tasks, 5TB RAM, 128TB disk


  19. Query Performance – Time


  20. Query Performance – Rows Processed


  21. Cluster Operations


  22. AWS VPCs


  23. Imply Cloud Management UI
    • Provisioning & deployment
    • Upgrades
    • Elastic scaling
    • Druid extensions
    • Clarity metrics


  24. Monitoring and Alerting
    Running it right means running it well
    • CloudWatch metrics


  25. Druid POC Follow-up
    Identified next steps on the road to full production status
    • Upgrades
    • Historical production backfill
    • Expected improvements
    • Security
    • User adoption phases


  26. Thank you.
    Questions?
