Druid Ecosystem @ Yahoo!

Imply
April 02, 2019

Presentation by Niketh Sabbineni, Principal Engineer @ Yahoo!, for the San Francisco Bay Area Druid Meetup at Unity.


Transcript

  1. Who am I (Yahoo Confidential & Proprietary) ▪ Principal Engineer @ Yahoo ▪ CTO @ Bookpad ▪ SDE @ Amazon
  2. Flurry Overview ▪ Measure → Analyse → Insights → Action ▪ 1M Apps ▪ 2.1B Devices ▪ 100B+ Events (daily) ▪ 10B Sessions (daily) ▪ Raw data well over 20PB
  3. Features ▪ Realtime ▪ Crash ▪ Technical ▪ Audience ▪ Retention ▪ .... ▪ .... ▪ Free Free Free!
  4. Why Druid? ▪ Realtime + Batch ▪ Horizontally Scalable ▪ Sub-Second Query Latency ▪ Resilient to failures ▪ Custom plugins
  5. Architecture (diagram; component labels only) ▪ Collectors ▪ HBase ▪ Storm ▪ Druid ▪ Kafka ▪ MapReduce ▪ Druid Metrics Cluster ▪ UI ▪ Programmatic ▪ Alerts ▪ Hive ▪ Pivot ▪ External
  6. Architecture (diagram; component labels only) ▪ Collectors ▪ HBase ▪ Storm ▪ Druid ▪ Kafka ▪ MapReduce ▪ Druid Metrics Cluster ▪ UI ▪ Programmatic ▪ Alerts ▪ Hive ▪ Pivot ▪ Collectors ▪ HBase ▪ Storm ▪ Druid ▪ Kafka ▪ MapReduce (labels repeated for a second pipeline) ▪ Replication ▪ External
  7. Architecture ▪ 300 Historicals - 256GB RAM / 7TB SSD ▪ 80 Middle Managers ▪ HDFS / Kafka ▪ 5 clusters in Flurry ▪ 18 clusters in Yahoo/Oath ▪ Imply Pivot / Superset ▪ Hive / SQL
  8. Querying ▪ Column Types - String/Float/Double/Long/Custom ▪ Heterogeneous Nodes - Ensure constant RAM/disk ratio ▪ Cost Balancer - diskNormalized ▪ Partitioning - Broker uses ShardSpec & less paging ▪ Spill to Disk ▪ Sketch (Count Distinct) Size - Adjust sketch sizes ▪ LimitSpec
  9. Ingestion ▪ Query / Segment Granularity (50% space) ▪ Staggered Runs with Replication (30% compute) ▪ Partitioning - Can result in smaller segment sizes ▪ Reindexing (Dimensions) ▪ Late Arriving Data - Periodic Backfills
  10. Monitoring ▪ Health Checks - Processes / Disks / RAM ▪ Querying - Time, Failure counts, GC Time, Paging Time ▪ Ingestion Tasks - Ingest lag, Waiting task counts ▪ Coordinator - Load Queue Size, Disk Size ▪ SSL Certificate Expiry ▪ Metrics Cluster ▪ Use Pivot / Turnilo for root cause analysis ▪ Kafka / HDFS - Name node storage
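
The Architecture slide lists 300 Historicals with 256GB RAM and 7TB SSD each, and the Querying slide recommends keeping a constant RAM/disk ratio on heterogeneous nodes. The arithmetic can be sketched as below. The helper names and the 128GB node are hypothetical, and the decimal conversion (1 TB = 1000 GB) is an assumption; the slides do not say how the ratio is enforced in practice.

```python
def ram_to_disk_ratio(ram_gb, disk_tb, gb_per_tb=1000):
    """RAM per unit of disk, both expressed in GB (decimal units assumed)."""
    return ram_gb / (disk_tb * gb_per_tb)


def disk_for_ram(ram_gb, ratio, gb_per_tb=1000):
    """Disk size in TB that keeps a node with ram_gb of RAM at the given ratio."""
    return ram_gb / ratio / gb_per_tb


# Figures taken from the Architecture slide.
HISTORICALS = 300
RAM_GB = 256
DISK_TB = 7

reference = ram_to_disk_ratio(RAM_GB, DISK_TB)
print(round(reference, 4))                       # 0.0366

# Aggregate capacity of the Historical tier.
print(HISTORICALS * RAM_GB / 1000, "TB RAM")     # 76.8 TB RAM
print(HISTORICALS * DISK_TB, "TB SSD")           # 2100 TB SSD

# Hypothetical smaller node: how much disk keeps it at the same ratio?
print(round(disk_for_ram(128, reference), 2))    # 3.5
```

Under these assumptions, a node with half the RAM would be given half the disk, so the fraction of segment data that fits in memory stays roughly the same across node types.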
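
The Querying slide mentions adjusting sketch sizes for count-distinct and using a LimitSpec. A minimal sketch of what such a Druid native groupBy query might look like, built as a Python dict, follows. The datasource `flurry_events`, the dimension `app_id`, the column `device_sketch`, and the helper name are all hypothetical; the `thetaSketch` aggregator assumes the druid-datasketches extension is loaded. This is an illustration, not the queries used at Yahoo.

```python
import json


def build_top_apps_query(datasource, interval, sketch_size=4096, limit=10):
    """Build a Druid native groupBy query body as a plain dict.

    sketch_size trades accuracy for memory; theta sketch sizes must be
    a power of two.
    """
    if sketch_size < 1 or sketch_size & (sketch_size - 1):
        raise ValueError("sketch_size must be a power of two")
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "all",
        "dimensions": ["app_id"],  # hypothetical dimension name
        "aggregations": [
            {
                "type": "thetaSketch",
                "name": "unique_devices",
                "fieldName": "device_sketch",  # hypothetical sketch column
                "size": sketch_size,
            }
        ],
        # The limitSpec sorts and truncates the grouped result,
        # so only the top rows are returned to the caller.
        "limitSpec": {
            "type": "default",
            "limit": limit,
            "columns": [
                {"dimension": "unique_devices", "direction": "descending"}
            ],
        },
    }


query = build_top_apps_query("flurry_events", "2019-04-01/2019-04-02")
print(json.dumps(query, indent=2))
```

The resulting JSON would be posted to a Broker's native query endpoint. A smaller `size` reduces the memory each sketch needs at the cost of a larger estimation error.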
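
The Ingestion slide attributes a space saving to the choice of query and segment granularity. The mechanism behind a coarser query granularity is rollup: timestamps are truncated to the granularity, and rows that then share a timestamp and dimension values are merged. The toy model below illustrates the idea only; it is not Druid's implementation, and the event data is made up, so it does not reproduce the 50% figure from the slide.

```python
from collections import defaultdict


def rollup(events, granularity_seconds):
    """Merge (epoch_seconds, app_id, count) rows that fall in the same bucket."""
    totals = defaultdict(int)
    for epoch_seconds, app_id, count in events:
        # Truncate the timestamp down to the start of its bucket.
        bucket = epoch_seconds - (epoch_seconds % granularity_seconds)
        totals[(bucket, app_id)] += count
    return dict(totals)


# Made-up raw events: (epoch seconds, app id, event count).
events = [
    (0, "app_a", 1),
    (59, "app_a", 1),
    (60, "app_a", 1),
    (3599, "app_a", 1),
    (3600, "app_a", 1),
    (10, "app_b", 1),
]

by_minute = rollup(events, 60)
by_hour = rollup(events, 3600)

# Stored row counts: raw, minute granularity, hour granularity.
print(len(events), len(by_minute), len(by_hour))  # 6 5 3
```

The trade-off is that queries can no longer resolve time below the chosen granularity, so the setting has to match the finest time resolution the dashboards actually need.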