Reliable_Event_Pipeline___scale.pdf

Ananth Packkildurai

February 27, 2019

Transcript

  1. Reliable Events Pipeline. Ananth Packkildurai, February 27, 2019

  2. Events: “An event is a single occurrence within an environment, usually involving an attempted state change.”

  3. Logs: “A log is a collection of event records.”

  4. Logs @ Slack: 2M events per second, 4 Kafka clusters, 3TB per hour
  5. Me
    ➢ @ananthdurai
    ➢ Data Infrastructure Engineer @ Slack
    ➢ Passionate about all things related to ethical data management
  6. Team REP: Derek Smith, Jackson Argo

  7. About Slack
    • Public launch: 2014
    • 1000+ employees across 7 countries worldwide
    • HQ in San Francisco
    • $841M in capital raised
    • Key investors include Softbank, Accel, a16z, Social Capital, Index, Thrive, GV, Kleiner Perkins, GGV, Horizons, Spark, IVP and DST
    • Diverse set of industries including software/technology, retail, media, telecom and professional services
  8. An unprecedented adoption rate

  9. Data Decisions

  10. Growth Metrics

  11. Service Quality Metrics

  12. Billing Metrics

  13. How did we start?

  14. Is it reliable?

  15. REP Characteristics: Trust in Logs

  16. REP Characteristics: Trust in Logs, High Availability

  17. REP Characteristics: Trust in Logs, High Availability, Low Latency

  18. REP Characteristics: Trust in Logs, High Availability, Low Latency, Efficient

  19. REP Characteristics: Trust in Logs, High Availability, Low Latency, Efficient

  20. REP pipeline

  21. Murron logging agent: Murron is a sidecar that runs on each instance and collects logs from the host and its containers.
    • Guarantees at-least-once message delivery
    • Supports retry, back pressure and configurable dynamic routing
    • Supports gRPC, TCP, HTTP and Unix domain socket protocols
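
The delivery guarantees on the Murron slide (at-least-once delivery, retry, back pressure) can be illustrated with a minimal sketch. This is not Murron's code, and the deck does not show its implementation. The class `AtLeastOnceSender`, the `send` callable, and every parameter name are hypothetical. The idea shown is that a message is dropped from the buffer only after an acknowledged send, failed sends are retried with exponential backoff, and a bounded buffer signals back pressure to the producer.

```python
import queue
import time


class AtLeastOnceSender:
    """Hypothetical sender: a message is forgotten only after an acknowledged send."""

    def __init__(self, send, max_buffered=1000, max_attempts=5, base_delay=0.01):
        self._send = send  # assumed callable(message) -> bool, True means acknowledged
        self._buffer = queue.Queue(maxsize=max_buffered)
        self._max_attempts = max_attempts
        self._base_delay = base_delay

    def submit(self, message):
        """Return False when the buffer is full so the caller can slow down (back pressure)."""
        try:
            self._buffer.put_nowait(message)
            return True
        except queue.Full:
            return False

    def flush(self):
        """Try to deliver everything buffered; return the messages still unacknowledged."""
        undelivered = []
        while not self._buffer.empty():
            message = self._buffer.get_nowait()
            if not self._deliver(message):
                undelivered.append(message)
        for message in undelivered:
            self._buffer.put_nowait(message)  # keep them for the next flush
        return undelivered

    def _deliver(self, message):
        for attempt in range(self._max_attempts):
            try:
                if self._send(message):
                    return True
            except OSError:
                pass  # treat a transport error like a missing acknowledgement
            if attempt < self._max_attempts - 1:
                time.sleep(self._base_delay * (2 ** attempt))  # exponential backoff
        return False
```

Because a retry can follow a send whose acknowledgement was lost, the receiver may see the same message more than once. That is the usual cost of at-least-once delivery, and it is one reason a per-message identifier is useful downstream.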
  22. Murron Protocol

  23. UID

  24. Message Signature

  25. Container

  26. Measuring Reliability
    • Log correctness: Did we log correctly?
    • Log reliability: Are we missing any data?
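
One way to make the "Are we missing any data?" question concrete is to compare the identifiers of the events a source emitted with those that arrived downstream. The sketch below is an assumption for illustration only: the deck does not describe how the check is done, and the function `completeness` and its inputs are hypothetical.

```python
from collections import Counter


def completeness(sent_uids, received_uids):
    """Compare emitted event IDs with received ones (hypothetical reliability check)."""
    sent = set(sent_uids)
    received_counts = Counter(received_uids)
    # IDs that were emitted but never arrived
    missing = sorted(sent - set(received_counts))
    # IDs that arrived more than once (expected under at-least-once delivery)
    duplicates = sorted(uid for uid, n in received_counts.items() if n > 1)
    delivered = len(sent) - len(missing)
    ratio = delivered / len(sent) if sent else 1.0
    return {"ratio": ratio, "missing": missing, "duplicates": duplicates}


report = completeness(["a", "b", "c", "d"], ["a", "b", "b", "d"])
print(report)  # {'ratio': 0.75, 'missing': ['c'], 'duplicates': ['b']}
```

A ratio below 1.0 points to lost events, while the duplicates list shows where retries produced repeated deliveries.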
  27. Log reliability

  28. Log reliability

  29. Log Inspector

  30. Apache Pinot: Pinot is a realtime distributed OLAP datastore
    • A column-oriented database with various compression schemes such as Run Length, Fixed Bit Length
    • Pluggable indexing technologies: Sorted Index, Bitmap Index, Inverted Index
    • Near real time ingestion from Kafka and batch ingestion from Hadoop
    • SQL-like language that supports selection, aggregation, filtering, group by, order by, distinct queries on fact data
    • Horizontally scalable and fault tolerant
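
To show why run-length compression suits a column-oriented store, here is a minimal sketch of run-length encoding applied to a single column of values. It is a generic illustration of the technique named on the slide, not Pinot's implementation. The function names and the sample column are made up.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


country = ["us", "us", "us", "in", "in", "us"]
encoded = rle_encode(country)
print(encoded)  # [('us', 3), ('in', 2), ('us', 1)]
```

The encoding pays off when equal values sit next to each other, which is why it pairs well with a sorted column.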
  31. REP extended

  32. Log Inspector

  33. Thank You! For more information go to: slack.com