Scaling big with Apache Kafka

Apache Kafka sits at the core of modern, scalable, event-driven architectures. It is no longer used only as logging infrastructure, but as a core component at thousands of companies around the world. It provides a low-latency, fault-tolerant pipeline at scale, which is essential in today’s world of big data. During this talk we’ll see what makes Apache Kafka well suited for the job. We’ll explore how to optimize it for throughput or for durability, and we’ll go over the messaging semantics it provides. Last but not least, we’ll see how Apache Kafka can help us elegantly solve some everyday problems that we face when we build large-scale systems.

Nikolay Stoitsev

November 16, 2018

Transcript

  1. None
  2. Going big with Apache Kafka Nikolay Stoitsev - Sr. Software Engineer @ Uber
  3. Kafka Cluster

  4. Broker Broker Broker

  5. Topic Message Message Message Message Message Message

  6. Partition - ordered, immutable sequence Message Message Message Message Message

    Message Message Message Message Message Message Message Message Message Message Partition 0 Partition 1 Partition 2
  7. Offset Message Message Message Message Message Partition 0 Offset: 1

  8. Offset Message Message Message Message Message Partition 0 Offset: 2

  9. Multi Subscriber Message Message Message Message Message Offset: 5 Offset:

    3 Consumer: 2 Consumer: 1
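The partition, offset, and multi-subscriber slides above can be sketched in a few lines. This is a toy in-memory model, not Kafka's implementation; the class names `Partition` and `Consumer` are made up for illustration. It shows that a partition is an append-only sequence, that an offset is a position in that sequence, and that each consumer tracks its own offset without removing anything from the log.

```python
class Partition:
    """Toy append-only log: each message gets the next sequential offset."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset of the message just written

    def read(self, offset, max_messages=10):
        # Reading is non-destructive: the log itself never changes.
        return self._log[offset:offset + max_messages]


class Consumer:
    """Tracks its own position in the partition, independent of other consumers."""

    def __init__(self, partition):
        self.partition = partition
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.partition.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


partition = Partition()
for i in range(5):
    partition.append(f"message-{i}")

consumer_1 = Consumer(partition)
consumer_2 = Consumer(partition)
consumer_1.poll(max_messages=5)  # consumer 1 has read everything
consumer_2.poll(max_messages=3)  # consumer 2 is two messages behind
```

Because each consumer only advances its own offset, a slow consumer does not hold back a fast one.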
  10. Broker Broker Broker P0 P0 P0 P1 P1 P1 P2

    P2 P2 Partitioned and Replicated
  11. Broker Broker Broker P0 P0 P0 P1 P1 P1 P2

    P2 P3 Fault Tolerant
  12. Broker Broker Broker P0 P0 P0 P1 P1 P1 P2

    P2 P2 Producers Producer Producer
  13. Broker Broker Broker P0 P0 P0 P1 P1 P1 P2

    P2 P2 Producer Producer Consumer Consumer Consumer Group 1 Consumer Consumer Consumer Consumer Group 2 Consumers
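As a rough illustration of the consumer group slide: within one group each partition is read by exactly one consumer, while every group sees all partitions. The sketch below uses a plain round-robin spread; the function name `assign_partitions` is hypothetical and this is not one of Kafka's actual assignor implementations.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of ONE group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2"]

# Each group gets its own full assignment of every partition.
group_1 = assign_partitions(partitions, ["g1-c1", "g1-c2"])
group_2 = assign_partitions(partitions, ["g2-c1", "g2-c2", "g2-c3"])
```

Adding consumers to a group beyond the number of partitions leaves the extra consumers idle, which is why the partition count caps the parallelism of a single group.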
  14. Broker Broker P0 P0 P1 P1 P2 P2 Producer Consumer

    Consumer Consumer Group 1 ZooKeeper Get Broker ID Update Offset
  15. None
  16. System for “collecting and delivering high volumes of log data with low latency”
  17. Logging Kafka ELK Hadoop

  18. Publish - Subscribe Or Kafka for interservice communication

  19. Good throughput

  20. Built-in partitioning, replication, and fault-tolerance

  21. Durability

  22. Latency vs. Durability

  23. One leader for every partition Follower Leader Follower Follower Producer

    Consumer
  24. In-sync Replicas Follower Leader Follower Follower Producer Consumer

  25. In-sync Replicas Follower Leader Follower Follower Producer Consumer

  26. In-sync Replicas Leader Follower Follower Producer Consumer

  27. Tune for lower latency • Acknowledgement after persisted on the leader • Can lose messages on leadership changes • At-most-once semantics
  28. Tune for durability • Acknowledgement after persisted on all ISR (after committed) • No data loss • At-least-once semantics
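The two tuning slides map onto a handful of standard Kafka settings. The dictionaries below are a sketch of what such configurations might look like; the config keys (`acks`, `retries`, `min.insync.replicas`) are real Kafka settings, but the concrete values, including the replication factor of 3, are assumptions rather than values from the talk.

```python
# Lower latency: the leader acknowledges as soon as it has the write.
# A message can be lost if the leader fails before followers copy it.
low_latency_producer = {
    "acks": "1",
    "retries": 0,  # assumption: no retries, so no duplicates (at-most-once)
}

# Durability: acknowledge only after all in-sync replicas have the write.
durable_producer = {
    "acks": "all",
    "retries": 5,  # assumption: retries can cause duplicates (at-least-once)
}

# Topic-level settings that make acks=all meaningful (assumed values).
durable_topic = {
    "replication.factor": 3,
    "min.insync.replicas": 2,  # writes fail if fewer than 2 replicas are in sync
}
```

With `acks` set to `all` but `min.insync.replicas` left at 1, a write can still be acknowledged by a lone leader, so the two settings are usually tuned together.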
  29. At-most-once cluster for logging

  30. At-least-once cluster for message bus

  31. Kafka as a message bus Kafka Upstream Downstream

  32. Failure isolation

  33. Message queueing

  34. Event driven architecture

  35. Avro

  36. Kafka Upstream Downstream Schema Registry Publish Schema Fetch Schema https://docs.confluent.io/current/schema-registry/docs/index.html
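For readers unfamiliar with Avro, a schema is a JSON document that upstream services publish and downstream services fetch. The example below builds one in Python; the record name `PaymentRequested`, its namespace, and its fields are hypothetical and only illustrate the shape of a schema.

```python
import json

# Hypothetical event schema; all names are made up for illustration.
payment_schema = {
    "type": "record",
    "name": "PaymentRequested",
    "namespace": "com.example.payments",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount_cents", "type": "long"},
        # An optional field with a default can be added later without
        # breaking readers that still use the older schema.
        {"name": "currency", "type": ["null", "string"], "default": None},
    ],
}

schema_json = json.dumps(payment_schema, indent=2)
```

The resulting JSON string is what would be registered with a schema registry under the topic's subject.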
  37. How to make sure something is durably stored and will be processed exactly once?
  38. None
  39. Process 1 OS Process 2 OS

  40. Process 1 OS Process 2 OS

  41. Process 1 OS Process 2 OS

  42. Process 1 OS Process 2 OS

  43. Process 1 OS Process 2 OS

  44. Idempotency

  45. Idempotency + at least once delivery Process 1 Process 2

    Kafka Consumer
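A minimal sketch of the "idempotency + at-least-once delivery" idea: the consumer remembers which message IDs it has already handled, so a redelivered message has no additional effect. The `PaymentHandler` class and the message fields are hypothetical, and a real system would keep the processed IDs in durable storage, updated atomically with the side effect.

```python
class PaymentHandler:
    """Applies each message at most once, even if it is delivered many times."""

    def __init__(self):
        self.processed_ids = set()  # assumption: in-memory stand-in for a durable store
        self.charges = []

    def handle(self, message):
        if message["id"] in self.processed_ids:
            return False  # duplicate delivery: skip the side effect
        self.charges.append(message["amount"])
        self.processed_ids.add(message["id"])
        return True


handler = PaymentHandler()
deliveries = [
    {"id": "order-1", "amount": 100},
    {"id": "order-2", "amount": 250},
    {"id": "order-1", "amount": 100},  # redelivered after a retry
]
results = [handler.handle(m) for m in deliveries]
```

The effect is exactly-once processing built from at-least-once delivery plus a duplicate check.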
  46. Out of the box exactly once delivery after 0.11

  47. https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
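The exactly-once support introduced in Kafka 0.11 is switched on through client configuration. The sketch below lists the relevant settings as plain dictionaries; the keys are real Kafka client settings, while the `transactional.id` value is a made-up example.

```python
# Producer side: idempotent writes plus transactions.
exactly_once_producer = {
    "enable.idempotence": True,       # broker de-duplicates producer retries
    "acks": "all",                    # required when idempotence is enabled
    "transactional.id": "payment-producer-1",  # hypothetical ID, stable per producer instance
}

# Consumer side: only read messages from committed transactions.
exactly_once_consumer = {
    "isolation.level": "read_committed",
    "enable.auto.commit": False,      # offsets are committed explicitly, not on a timer
}
```

These guarantees cover data flowing through Kafka itself; side effects in external systems, such as charging a card, still need their own idempotency checks.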

  48. Order Service Kafka Payment Consumer Payment Provider Handling Failures

  49. Order Service Kafka Payment Consumer Payment Provider Clogged processing

  50. Order Service Kafka Payment Consumer Payment Provider Dead Letter Queue

    Payment Retry 0 Payment Retry 1 Dead Letter Queue
  51. Order Service Kafka Payment Consumer Payment Provider Dead Letter Queue

    Payment Retry 0 Payment Retry 1 Dead Letter Queue Payment Consumer Payment Consumer
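The retry and dead letter queue slides can be modeled with in-memory queues. In this sketch a failed message moves to the next topic in a chain instead of blocking the main topic; after the last retry topic it lands in the dead letter queue. The topic names and helper functions are hypothetical, and real retry topics would usually also add a delay before reprocessing.

```python
from collections import deque

# Hypothetical topic names, ordered from main topic to dead letter queue.
RETRY_CHAIN = ["payment", "payment_retry_0", "payment_retry_1", "payment_dlq"]


def next_topic(topic):
    """Return the topic a failed message should move to."""
    index = RETRY_CHAIN.index(topic)
    return RETRY_CHAIN[min(index + 1, len(RETRY_CHAIN) - 1)]


def run(topics, process):
    """Drain every topic except the DLQ, forwarding failures down the chain."""
    for topic in RETRY_CHAIN[:-1]:
        queue = topics[topic]
        while queue:
            message = queue.popleft()
            try:
                process(message)
            except Exception:
                topics[next_topic(topic)].append(message)


processed = []


def charge(message):
    if message == "bad":
        raise ValueError("payment provider rejected the request")
    processed.append(message)


topics = {name: deque() for name in RETRY_CHAIN}
topics["payment"].extend(["ok-1", "bad", "ok-2"])
run(topics, charge)
```

The failing message never delays `ok-2`, and it ends up in the dead letter queue where it can be inspected or replayed later.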
  52. Multi Data Center Application

  53. Regional Cluster Local Producer Regional Kafka DC1 Local Producer Regional

    Kafka DC2
  54. Aggregated Cluster Local Producer Regional Kafka DC1 Local Producer Regional

    Kafka DC2 Kafka Replicator Aggregated Kafka DC3
  55. https://github.com/uber/uReplicator

  56. https://github.com/confluentinc/kafka-rest

  57. Kafka Upstream Downstream Kafka REST Proxy Kafka REST Proxy
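With a REST proxy in front of Kafka, producing a message becomes a plain HTTP POST. The sketch below only builds such a request using the JSON format of the Confluent REST Proxy v2 API and does not send it; the host, port, topic name, and payload are assumptions for illustration.

```python
import json
import urllib.request


def build_produce_request(base_url, topic, values):
    """Build (but do not send) a produce request for the REST proxy."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body,
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


# Assumed proxy address and hypothetical topic and payload.
request = build_produce_request(
    "http://localhost:8082", "payments", [{"order_id": "order-1", "amount_cents": 100}]
)
# urllib.request.urlopen(request) would send it to a running proxy.
```

This keeps Kafka client libraries out of services written in languages that lack a well-maintained client.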

  58. How to monitor Kafka?

  59. https://github.com/uber/chaperone

  60. Detect data loss, lag and duplication
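One way to detect loss and duplication is to count messages per time window at each stage of the pipeline and compare the counts. The sketch below illustrates that idea only; it is not Chaperone's implementation, and the window size, function names, and message format are assumptions.

```python
from collections import Counter

WINDOW_SECONDS = 600  # assumption: 10-minute windows keyed by event timestamp


def count_by_window(messages):
    """Count messages per time window, keyed by the window's start time."""
    counts = Counter()
    for message in messages:
        window_start = message["ts"] // WINDOW_SECONDS * WINDOW_SECONDS
        counts[window_start] += 1
    return counts


def compare(upstream, downstream):
    """Report windows where the downstream count differs from upstream."""
    report = {}
    for window in sorted(set(upstream) | set(downstream)):
        diff = downstream[window] - upstream[window]
        if diff < 0:
            report[window] = f"lost {-diff}"
        elif diff > 0:
            report[window] = f"duplicated {diff}"
    return report


produced = [{"ts": 0}, {"ts": 10}, {"ts": 700}]
consumed = [{"ts": 0}, {"ts": 700}, {"ts": 700}]
report = compare(count_by_window(produced), count_by_window(consumed))
```

Counts alone cannot distinguish one lost message plus one duplicate in the same window from a clean window, so this kind of audit is a signal rather than a proof.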

  61. Audit Library Regional Kafka Service Kafka REST Proxy Audit Library

    Audit Library
  62. Chaperone Service Regional Kafka Service Kafka REST Proxy Audit Library

    Audit Library Chaperone Service
  63. Chaperone Collector Regional Kafka Service Kafka REST Proxy Audit Library

    Audit Library Aggregate Kafka Chaperone Service Chaperone Service Chaperone Collector DB
  64. None
  65. None
  66. None
  67. Summary Tune for durability

  68. Summary Tune for durability Define Avro Schemas

  69. Summary Tune for durability Define Avro Schemas Use Kafka REST

    Proxy
  70. Summary Tune for durability Define Avro Schemas Use Kafka REST

    Proxy Add idempotency checks
  71. Summary Tune for durability Define Avro Schemas Use Kafka REST

    Proxy Add idempotency checks Use Dead Letter Queue
  72. Summary Tune for durability Define Avro Schemas Use Kafka REST

    Proxy Add idempotency checks Use Dead Letter Queue Monitor everything
  73. THANK YOU! Nikolay Stoitsev Sr. Software Engineer @ Uber