Streaming data pipelines @ Slack


Slack's data platform is evolving from a batch system to near real-time. I will also touch on how Samza helps us build low-latency data pipelines and an experimentation framework.


Ananth Packkildurai

December 04, 2017

Transcript

  1. Ananth Packkildurai: Streaming data pipelines @ Slack

  2. About Slack: Public launch in 2014. 800+ employees across 7 countries worldwide, HQ in San Francisco. Diverse set of industries including software/technology, retail, media, telecom and professional services.
  3. An unprecedented adoption rate

  4. Agenda: 1. A bit of history 2. NRT infrastructure & use cases 3. Challenges
  5. A bit of history

  6. March 2016: 5 Data Engineers, 350+ Slack employees, 2M Active users
  7. October 2017: 10 Data Engineers, 800+ Slack employees, 6M Active users
  8. Data usage: 1 in 3 access the data warehouse per week; 500+ tables; 400k events per second at peak
  9. It is all about Slogs

  10. Well, not exactly

  11. Slog

  12. Slog

  13. NRT infrastructure & use cases

  14. What can go wrong?

  15. We want more...

  16. Performance & Experimentation: • Engineering & CE teams should be able to detect performance bottlenecks proactively. • Engineers should be able to see their experiments' performance in near real-time.
  17. Near Real-Time Pipeline

  18. Why Analytics Kafka: Keeps the load on the DW Kafka predictable. Easier to upgrade and verify newer Kafka versions. A smaller Kafka cluster is relatively more straightforward to operate.
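The deck doesn't spell out how slog events get from the DW cluster into the smaller analytics cluster. Below is a minimal bridge sketch using the plain Kafka client; the bootstrap addresses and the "slog" topic name are illustrative assumptions, and in practice a dedicated mirroring tool would usually do this job.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AnalyticsKafkaBridge {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "dw-kafka:9092");        // main DW cluster (assumed address)
        consumerProps.put("group.id", "analytics-bridge");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "analytics-kafka:9092"); // smaller analytics cluster (assumed address)
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("slog"));                        // hypothetical topic name
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // Copy each event verbatim so the analytics cluster sees a predictable load.
                    producer.send(new ProducerRecord<>(record.topic(), record.key(), record.value()));
                }
            }
        }
    }
}
```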
  19. Samza pipeline design

  20. Samza pipeline design: • Content-based Router. • Router: deserializes Kafka events and adds instrumentation. • Processor: represents the abstraction for a streaming job; adds the sink operation and instrumentation. • Converter: implements the business logic (join, filter, projection, etc.).
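As a rough sketch of the router stage in Samza's low-level API (not Slack's actual code; the system, topic, and field names here are invented):

```java
import java.util.Map;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

/** Content-based router: deserializes raw slog events and forwards each one
 *  to a per-event-type stream that a downstream processor consumes. */
public class SlogRouterTask implements StreamTask {
    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        // Assume the configured serde hands us a parsed map; in practice this
        // is where deserialization and instrumentation counters would live.
        @SuppressWarnings("unchecked")
        Map<String, Object> event = (Map<String, Object>) envelope.getMessage();

        String type = (String) event.getOrDefault("type", "unknown");
        // Route on content: one output stream per event type (names are made up).
        SystemStream out = new SystemStream("analytics-kafka", "slog-" + type);
        collector.send(new OutgoingMessageEnvelope(out, event));
    }
}
```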
  21. Performance monitoring

  22. Performance monitoring: • Approximate percentiles using the Druid Histogram extension [http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf]. • Unique users based on Druid's HyperLogLog implementation. • Slack bot integration to alert on performance metrics.
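The linked Ben-Haim & Tom-Tov paper describes the streaming histogram behind Druid's approximate-histogram aggregator: keep at most B weighted centroids and, whenever a new value overflows that budget, merge the two closest ones. A toy Java version of just that update step (not Druid's implementation):

```java
import java.util.TreeMap;

/** Minimal Ben-Haim/Tom-Tov streaming histogram: a bounded set of
 *  (value -> count) centroids kept sorted by value. */
class StreamingHistogram {
    private final int maxBins;
    private final TreeMap<Double, Long> bins = new TreeMap<>();

    StreamingHistogram(int maxBins) { this.maxBins = maxBins; }

    void add(double value) {
        bins.merge(value, 1L, Long::sum);
        if (bins.size() > maxBins) compress();
    }

    // Merge the two adjacent centroids that are closest together,
    // replacing them with their count-weighted average.
    private void compress() {
        Double prev = null, left = null, right = null;
        double bestGap = Double.MAX_VALUE;
        for (double v : bins.keySet()) {
            if (prev != null && v - prev < bestGap) {
                bestGap = v - prev; left = prev; right = v;
            }
            prev = v;
        }
        long lc = bins.remove(left), rc = bins.remove(right);
        double merged = (left * lc + right * rc) / (lc + rc);
        bins.merge(merged, lc + rc, Long::sum);
    }
}
```

Percentile queries then interpolate over the cumulative centroid counts, which is the part the Druid extension serves at query time.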
  23. Experimentation framework

  24. Experimentation Framework: • Both streams are hash-partitioned by team & user. • A RocksDB store holds the exposure table (team_users_experimentation mapping). • Metrics events range-join with the exposure table. • A periodic snapshot of RocksDB for quality checks against the batch system.
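A condensed sketch of what that join can look like as a Samza task with a RocksDB-backed store; the stream, store, and key names are assumptions, and the deck's range join on exposure time is simplified here to a plain key lookup:

```java
import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

/** Joins metric events with the experiment exposure table held in a local
 *  RocksDB store. Both inputs must be hash-partitioned by the same
 *  team/user key so each task sees the matching exposure rows. */
public class ExposureJoinTask implements StreamTask, InitableTask {
    private KeyValueStore<String, String> exposures;

    @Override
    @SuppressWarnings("unchecked")
    public void init(Config config, TaskContext context) {
        // "exposure-store" must be declared as a RocksDB store in the job config.
        exposures = (KeyValueStore<String, String>) context.getStore("exposure-store");
    }

    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        String stream = envelope.getSystemStreamPartition().getStream();
        String key = (String) envelope.getKey();   // "team:user" join key (assumed format)
        String value = envelope.getMessage().toString();

        if ("experiment-exposures".equals(stream)) {
            // Exposure event: remember which experiment this team/user saw.
            exposures.put(key, value);
        } else {
            // Metric event: enrich with the exposure row, if present, and emit.
            String exposure = exposures.get(key);
            if (exposure != null) {
                collector.send(new OutgoingMessageEnvelope(
                        new SystemStream("analytics-kafka", "experiment-metrics"),
                        key, value + "|" + exposure));
            }
        }
    }
}
```

The periodic RocksDB snapshot from the last bullet can then be exported and diffed against the batch system's join output for quality checks.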
  25. Challenges

  26. Cascading failures

  27. Version mismatch among Samza, Kafka, Scala & the Pants build

  28. Streaming Metrics Adoption

  29. Multi-instance Kafka clusters?

  30. Bridge the gap between batch and real-time tables.

  31. Thank You!