Presentation by Niketh Sabbineni, Principal Engineer @ Yahoo!, for the San Francisco Bay Area Druid Meetup at Unity.

Druid Ecosystem @ Yahoo
Niketh Sabbineni
[email protected]
[email protected]

Who am I
Yahoo Confidential & Proprietary
▪ Principal Engineer @ Yahoo
▪ CTO @ Bookpad
▪ SDE @ Amazon

Flurry Overview
▪ Measure → Analyse → Insights → Action
▪ 1M Apps
▪ 2.1B Devices
▪ 100B+ Events (daily)
▪ 10B Sessions (daily)
▪ Raw data well over 20PB

Features
▪ Realtime
▪ Crash
▪ Technical
▪ Audience
▪ Retention
▪ ....
▪ ....
▪ Free Free Free!

Why Druid?
▪ Realtime + Batch
▪ Horizontally Scalable
▪ Sub-Second Query Latency
▪ Resilient to failures
▪ Custom plugins

Architecture
[Diagram. Component labels: Collectors, HBase, Storm, Druid, Kafka, MapReduce]

Architecture
[Diagram. Component labels: Collectors, HBase, Storm, Druid, Kafka, MapReduce, Druid, Metrics Cluster, UI, Programmatic, Alerts, Hive, Pivot, External]

Architecture
[Diagram. Two copies of the pipeline (Collectors, HBase, Storm, Druid, Kafka, MapReduce) linked by Replication. Other labels: Druid, Metrics Cluster, UI, Programmatic, Alerts, Hive, Pivot, External]

Architecture
● 300 Historicals - 256GB RAM / 7TB SSD
● 80 Middle Managers
● HDFS / Kafka
● 5 clusters in Flurry
● 18 clusters in Yahoo/Oath
● Imply Pivot / Superset
● Hive / SQL
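
The hardware figures above can be turned into rough fleet totals. This is a minimal sketch, assuming the 256GB RAM / 7TB SSD figures are per Historical node and that the 300 Historicals form one homogeneous pool; neither assumption is stated on the slide. The function name is illustrative, and 1 TB is taken as 1024 GB.

```python
# Figures from the slide; treating them as per-node values is an assumption.
HISTORICALS = 300
RAM_GB_PER_NODE = 256
SSD_TB_PER_NODE = 7


def cluster_totals(nodes, ram_gb_per_node, ssd_tb_per_node):
    """Return (total RAM in TB, total SSD in TB, RAM-to-disk ratio per node)."""
    ram_tb_per_node = ram_gb_per_node / 1024  # assumes 1 TB = 1024 GB
    total_ram_tb = nodes * ram_tb_per_node
    total_ssd_tb = nodes * ssd_tb_per_node
    ratio = ram_tb_per_node / ssd_tb_per_node
    return total_ram_tb, total_ssd_tb, ratio


total_ram_tb, total_ssd_tb, ram_disk_ratio = cluster_totals(
    HISTORICALS, RAM_GB_PER_NODE, SSD_TB_PER_NODE
)
print(f"RAM: {total_ram_tb} TB, SSD: {total_ssd_tb} TB, ratio: {ram_disk_ratio:.4f}")
```

Under these assumptions, roughly 3.6% of the SSD-resident data fits in RAM on each node.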

Lessons Learnt
▪ Querying
▪ Ingestion
▪ Monitoring

Querying
▪ Column Types - String/Float/Double/Long/Custom
▪ Heterogeneous Nodes - Ensure constant RAM/disk ratio
▪ Cost Balancer - diskNormalized
▪ Partitioning - Broker uses ShardSpec & less paging
▪ Spill to Disk
▪ Sketch (Count Distinct) Size - Adjust sketch sizes
▪ LimitSpec
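
Two of the bullets above, sketch size and LimitSpec, can be illustrated with a native groupBy query. This is a minimal sketch, not a query from the talk. The datasource `app_events`, the dimension `country`, the column `device_id_sketch`, and the helper `build_groupby_query` are all hypothetical names. It assumes the DataSketches extension is loaded so that the `thetaSketch` aggregator is available.

```python
import json


def build_groupby_query(datasource, dimension, interval, sketch_size=16384, limit=10):
    """Build a Druid native groupBy query with a theta sketch and a limitSpec."""
    # Theta sketch sizes are expected to be powers of two.
    if sketch_size <= 0 or (sketch_size & (sketch_size - 1)) != 0:
        raise ValueError("sketch_size must be a power of two")
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": [interval],
        "dimensions": [dimension],
        "aggregations": [
            {
                "type": "thetaSketch",
                "name": "unique_devices",          # hypothetical output name
                "fieldName": "device_id_sketch",   # hypothetical sketch column
                "size": sketch_size,
            }
        ],
        # The limitSpec sorts groups and truncates the result to `limit` rows.
        "limitSpec": {
            "type": "default",
            "limit": limit,
            "columns": [{"dimension": "unique_devices", "direction": "descending"}],
        },
    }


query = build_groupby_query(
    "app_events", "country", "2018-01-01/2018-01-02", sketch_size=4096, limit=5
)
print(json.dumps(query, indent=2))
```

A smaller `size` reduces the memory each sketch needs at query time, at the cost of a wider error bound on the distinct count.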

Ingestion
▪ Query / Segment Granularity (50% space)
▪ Staggered Runs with Replication (30% compute)
▪ Partitioning - Can result in smaller segment sizes
▪ Reindexing (Dimensions)
▪ Late Arriving Data - Periodic Backfills
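
The effect of query granularity on stored rows can be shown with a toy rollup. This is a minimal sketch in plain Python, not Druid's implementation. The events, the app names, and the `rollup_rows` helper are made up for illustration. The `granularity_spec` dict shows the general shape of the matching setting in an ingestion spec, with illustrative values.

```python
from datetime import datetime

# Illustrative granularitySpec fragment; the values are assumptions.
granularity_spec = {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "HOUR",
    "rollup": True,
}


def rollup_rows(events, granularity):
    """Count rows left after rolling up (timestamp, dimension) pairs."""
    def truncate(ts):
        if granularity == "minute":
            return ts.replace(second=0, microsecond=0)
        if granularity == "hour":
            return ts.replace(minute=0, second=0, microsecond=0)
        raise ValueError(f"unsupported granularity: {granularity}")

    # Rows sharing a truncated timestamp and dimension value collapse into one.
    return len({(truncate(ts), dim) for ts, dim in events})


events = [
    (datetime(2018, 1, 1, 10, 0, 5), "app_a"),
    (datetime(2018, 1, 1, 10, 0, 40), "app_a"),
    (datetime(2018, 1, 1, 10, 30, 0), "app_a"),
    (datetime(2018, 1, 1, 10, 45, 0), "app_b"),
]

minute_rows = rollup_rows(events, "minute")
hour_rows = rollup_rows(events, "hour")
print(minute_rows, hour_rows)
```

Here four raw events roll up to three rows at minute granularity and two rows at hour granularity. The actual saving depends on the cardinality of the dimensions.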

Monitoring
▪ Health Checks - Processes / Disks / RAM
▪ Querying - Time, Failure counts, GC Time, Paging Time
▪ Ingestion Tasks - Ingest lag, Waiting task counts
▪ Coordinator - Load Queue Size, Disk Size
▪ SSL Certificate Expiry
▪ Metrics Cluster
▪ Use Pivot / Turnilo for root cause analysis
▪ Kafka / HDFS - NameNode storage
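
A threshold check over a few of the signals above might look like the following. This is a minimal sketch, not the alerting setup described in the talk. The metric keys follow names Druid emits, such as `query/time`, but are used here only as labels. The threshold values, the sample readings, and the helper names are assumptions.

```python
from datetime import datetime

# Hypothetical alert thresholds; tune per cluster.
THRESHOLDS = {
    "query/time": 1000,            # milliseconds
    "query/failed/count": 0,
    "ingest/kafka/lag": 100000,    # messages behind
    "segment/loadQueue/size": 50,  # illustrative units
}


def breached(metrics, thresholds):
    """Return the sorted names of metrics whose value exceeds its threshold."""
    return sorted(
        name for name, limit in thresholds.items() if metrics.get(name, 0) > limit
    )


def days_until_expiry(not_after, now):
    """Whole days remaining before a certificate's not-after date."""
    return (not_after - now).days


sample = {
    "query/time": 1500,
    "query/failed/count": 0,
    "ingest/kafka/lag": 250000,
    "segment/loadQueue/size": 10,
}
alerts = breached(sample, THRESHOLDS)
days_left = days_until_expiry(datetime(2018, 3, 1), datetime(2018, 2, 1))
print(alerts, days_left)
```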

Q/A
▪ Niketh Sabbineni - [email protected], [email protected]
▪ Ankit Kothari - [email protected]

Flurry Demo