Product Data Platform
• Supports 30+ consumer facing applications and portals
• Allows us to provide the best experience possible to our customers
• The data is only as good as the platform (and vice-versa)
Data Use Cases
We provide three core data access tiers (real-time aggregate, real-time raw, traditional D/W), each with a designated set of use-cases
Use Case
• Realtime Dashboards: X X X
• Alerting: X X X
• A/B testing: X X X
• Anomaly ML: X
• Scheduled business reporting: X
• Data Science: X
• Developer tooling: X X
• Self-service UI: X X X
• Self-service AdHoc: X X X
Self-service goal drives the data design
Data Ingest Spec
Druid Data Design
• 5 minute granularity for 90 days
• 1 hour granularity for 1+ years
• 60 dimensions
• 20 transforms
• 30 metrics
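As a rough illustration of how the two granularities and retention windows above might be expressed, the sketch below builds Druid-style `granularitySpec` fragments and coordinator retention rules as Python dictionaries. The datasource names (`events_5m`, `events_1h`), the `day` segment granularity, the `P1Y` period standing in for "1+ years", and the replicant count of 2 are assumptions for illustration; the slides do not show the actual ingest spec.

```python
import json

def granularity_spec(query_granularity, segment_granularity="day"):
    # Druid-style granularitySpec; segment granularity is an assumed value.
    return {
        "type": "uniform",
        "segmentGranularity": segment_granularity,
        "queryGranularity": query_granularity,
        "rollup": True,
    }

def retention_rules(period, replicants=2):
    # Keep data for `period` with the given replication, then drop the rest.
    return [
        {
            "type": "loadByPeriod",
            "period": period,
            "tieredReplicants": {"_default_tier": replicants},
        },
        {"type": "dropForever"},
    ]

# Hypothetical datasource names, one per granularity.
design = {
    "events_5m": {
        "granularitySpec": granularity_spec("five_minute"),
        "retention": retention_rules("P90D"),
    },
    "events_1h": {
        "granularitySpec": granularity_spec("hour"),
        "retention": retention_rules("P1Y"),
    },
}

print(json.dumps(design, indent=2))
```

Keeping the 5 minute and 1 hour data as separate datasources is one possible reading of the design; the same idea could be realized differently in practice.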
Self-service data recency and ingest tasks load at peak
Ingestion
Depends On
• Ingest broken down into tasks
• Ingest must keep up at peak
• Maintain task buffer for ingest spec changes
In our case
• 20 tasks per source (5m and 1h)
• 40 total ingest tasks
• 40 buffer tasks for ingest changes
• 80 tasks for current load
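The task counts above follow from simple arithmetic, sketched below. The function name and the `buffer_ratio` parameter are hypothetical; the sketch assumes the buffer is sized equal to the active task count, which matches the 40 + 40 figures on the slide.

```python
def ingest_task_budget(tasks_per_source, num_sources, buffer_ratio=1.0):
    # Active tasks needed to keep up with ingest at peak.
    active = tasks_per_source * num_sources
    # Spare task slots held back for ingest spec changes (assumed 1:1).
    buffer = int(active * buffer_ratio)
    return {"active": active, "buffer": buffer, "total": active + buffer}

# Two sources: the 5 minute and the 1 hour ingest, 20 tasks each.
budget = ingest_task_budget(tasks_per_source=20, num_sources=2)
print(budget)  # {'active': 40, 'buffer': 40, 'total': 80}
```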
Self-service data availability and storage requirements
Storage
Depends On
• Cardinality of the set
• Width of rows
• Replication
• Retention
In our case
• 20M rows / hour
• 2x replication
• 300GB/day for 5 minute ingest
• 180GB/day for 1 hour ingest
• 100TB for 90 days + 1 year
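The 100TB figure can be sanity-checked from the daily volumes and retention windows. The sketch below assumes the per-day figures already include the 2x replication, that "1 year" means 365 days, and that 1 TB = 1000 GB; under those assumptions the total comes to about 93 TB, in line with the rounded 100TB on the slide. The function name is hypothetical.

```python
def retained_storage_gb(gb_per_day, retention_days):
    # Total footprint once the retention window is full.
    return gb_per_day * retention_days

five_minute_gb = retained_storage_gb(300, 90)   # 5 minute data kept for 90 days
one_hour_gb = retained_storage_gb(180, 365)     # 1 hour data kept for 1 year
total_gb = five_minute_gb + one_hour_gb

print(five_minute_gb, one_hour_gb, total_gb)  # 27000 65700 92700
print(f"{total_gb / 1000:.1f} TB")            # 92.7 TB
```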
Identified next steps on the road to full production status
Druid POC follow up
• Upgrades
• Historical production backfill
• Expected improvements
• Security
• User adoption phases