Event streaming fundamentals with Apache Kafka

Keith Resar
February 24, 2022

Transcript

  1. The Rise of Event Streaming. 2010: Apache Kafka created at LinkedIn. 2022: Most Fortune 100 companies trust and use Kafka.
  2. Example Application Architecture. Diagram labels: Serving Layer (Microservices, Elastic, etc.); Java apps with Kafka Streams or ksqlDB; Continuous Computation; High-Throughput Event Streaming Platform; API-Based Clustering. @KeithResar
  3. Apache Kafka is an Event Streaming Platform: (1) Storage, (2) Pub/Sub, (3) Processing.
  4. LOG

  5. Topics divide into Partitions. Messages are guaranteed to be strictly ordered within a partition. (Diagram: topic Clicks with partitions P0, P1, P2.)
  6. Consuming Data. Consume via sequential data access starting from a specific offset: read to offset and scan. (Diagram: log running from Old to New.)
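
The offset-based read described on the slide above can be sketched with a toy in-memory log. This is a minimal illustration, not Kafka's storage format: the `PartitionLog` class and its method names are hypothetical, and offsets are simply list indexes.

```python
class PartitionLog:
    """Toy append-only log for one partition (hypothetical, in-memory only)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record and return the offset it was stored at."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset, max_records=100):
        """Return (offset, value) pairs read sequentially starting at `offset`."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


log = PartitionLog()
for click in ["home", "search", "cart", "checkout"]:
    log.append(click)

# Seek to offset 2, then scan forward in order.
batch = log.read_from(2)
print(batch)  # [(2, 'cart'), (3, 'checkout')]
```

Because a reader only needs to remember one number (the next offset), resuming after a restart amounts to calling `read_from` with the last stored position.
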
  7. Producing to Kafka - No Key. Messages will be produced in a round-robin fashion. (Diagram: partitions P0 to P3.)
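
A minimal sketch of the round-robin behaviour named on the slide above, assuming four partitions as in the diagram. `RoundRobinPartitioner` is a hypothetical helper, not a client API. Real clients may differ: the Java client since Kafka 2.4, for example, sticks to one partition per batch for unkeyed records, so treat this as a conceptual model.

```python
from itertools import count


class RoundRobinPartitioner:
    """Hypothetical partitioner that cycles through partitions for unkeyed messages."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = count()  # 0, 1, 2, ...

    def partition(self):
        """Return the partition for the next unkeyed message."""
        return next(self._counter) % self.num_partitions


partitioner = RoundRobinPartitioner(num_partitions=4)
assignments = [partitioner.partition() for _ in range(6)]
print(assignments)  # [0, 1, 2, 3, 0, 1]
```
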
  9. Producing to Kafka - With Key. hash(key) % numPartitions = N. (Diagram: partitions P0 to P3.)
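
The formula on the slide above, `hash(key) % numPartitions = N`, can be tried out directly. The sketch below uses `zlib.crc32` purely as a stand-in hash so the result is stable across runs; Kafka's default partitioner uses a murmur2 hash, so the partition numbers here will not match a real cluster. `partition_for` and the sample keys are hypothetical.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition with hash(key) % num_partitions.

    ASSUMPTION: crc32 stands in for the real hash function.
    """
    return zlib.crc32(key) % num_partitions


keys = [b"user-1", b"user-2", b"user-1"]
partitions = [partition_for(k, 4) for k in keys]

# The same key always lands on the same partition, which is what
# preserves per-key ordering.
print(partitions[0] == partitions[2])  # True
```

One consequence of the modulo: changing the partition count changes `N` for most keys, so per-key placement only holds while the number of partitions stays fixed.
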
  11. Consumer from Kafka - Single. Single consumer reads from all partitions. (Diagram: partitions P0 to P3.)
  12. Consumer from Kafka - Multiple. Consumers can be split into multiple groups, each of which operates in isolation. (Diagram: partitions P0 to P3.)
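
One way to picture the isolation between groups described above is that each group keeps its own read position per partition. The sketch below is a simplification under that assumption: `GroupOffsets`, the group names `billing` and `analytics`, and the sample events are all hypothetical.

```python
class GroupOffsets:
    """Hypothetical tracker of the next offset to read per (group, partition)."""

    def __init__(self):
        self._next = {}

    def poll(self, group, partition, log, max_records=2):
        """Return the next records for this group and advance only its position."""
        start = self._next.get((group, partition), 0)
        records = log[start:start + max_records]
        self._next[(group, partition)] = start + len(records)
        return records


partition_0 = ["e0", "e1", "e2", "e3"]
offsets = GroupOffsets()

first = offsets.poll("billing", 0, partition_0)
second = offsets.poll("billing", 0, partition_0)
other = offsets.poll("analytics", 0, partition_0)

print(first)   # ['e0', 'e1']
print(second)  # ['e2', 'e3']
print(other)   # ['e0', 'e1']  (unaffected by the billing group's progress)
```
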
  15. Grouped Consumers. Consumers can be split into multiple groups, each of which operates in isolation. (Diagram: partitions P0 to P3.)
  16. Grouped Consumers. Consumers can be split into multiple groups, each of which operates in isolation. (Diagram: partitions P0 to P3; one element marked X.)
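
Within a single group, each partition is read by exactly one member. The sketch below spreads the four partitions over a group in round-robin order and then recomputes the assignment after a member leaves, reading the X on the slide above as a consumer dropping out (an assumption, since the transcript does not say). `assign_partitions` and the consumer names are hypothetical; Kafka's real assignors are more elaborate.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over one group's consumers in round-robin order."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Two groups are assigned independently: each covers every partition.
group_a = assign_partitions(partitions, ["a1", "a2"])
group_b = assign_partitions(partitions, ["b1"])
print(group_a)  # {'a1': [0, 2], 'a2': [1, 3]}
print(group_b)  # {'b1': [0, 1, 2, 3]}

# If a2 leaves group A, its partitions move to the remaining member.
rebalanced = assign_partitions(partitions, ["a1"])
print(rebalanced)  # {'a1': [0, 1, 2, 3]}
```
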
  17. Linearly Scalable Architecture. Single topic, no bottleneck! • Many producer machines • Many consumer machines • Many broker machines. (Diagram: Producers and Consumers.)
  18. Partition Leadership / Replication. Broker 1: Partitions 0, 2, 3. Broker 2: Partitions 0, 1, 3. Broker 3: Partitions 0, 1, 2. Broker 4: Partitions 1, 2, 3. (Diagram: producer partitions P0 to P3; legend: Follower, Leader.)
  22. Partition Leadership / Replication. Broker 1: Partitions 0, 2, 3. Broker 2: Partitions 0, 1, 3. Broker 3: Partitions 0, 1, 2. Broker 4: Partitions 1, 2, 3. Additional labels: Partition 2, Partition 1, Partition 3. (Diagram: producer partitions P0 to P3; legend: Follower, Leader.)
  23. Partition Leadership / Replication. Broker 1: Partitions 0, 2, 3. Broker 2: Partitions 0, 1, 3. Broker 3: Partitions 0, 1, 2. Broker 4: no partitions shown. Additional labels: Partition 2, Partition 1, Partition 3. (Diagram: producer partitions P0 to P3; legend: Follower, Leader.)
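
The replication slides above appear to end with Broker 4 out of the picture. A rough sketch of what leadership failover could look like follows. The replica placement is taken from the broker listing in the transcript; the initial leaders are an assumption (the transcript does not say which replica leads), and `fail_broker` is a hypothetical helper that simply promotes the lowest-numbered surviving replica. In Kafka the new leader is chosen from the partition's in-sync replicas, which this sketch does not model.

```python
# Broker -> partitions it hosts, as listed on the slides.
replicas = {
    1: [0, 2, 3],
    2: [0, 1, 3],
    3: [0, 1, 2],
    4: [1, 2, 3],
}

# ASSUMPTION: partition -> leading broker; not stated in the transcript.
leaders = {0: 1, 1: 2, 2: 3, 3: 4}


def fail_broker(failed, replicas, leaders):
    """Return a new leader map after `failed` goes down."""
    new_leaders = dict(leaders)
    for partition, leader in leaders.items():
        if leader != failed:
            continue  # leadership is unaffected for this partition
        survivors = [
            broker
            for broker, hosted in sorted(replicas.items())
            if broker != failed and partition in hosted
        ]
        if not survivors:
            raise RuntimeError(f"partition {partition} has no surviving replica")
        new_leaders[partition] = survivors[0]
    return new_leaders


after = fail_broker(4, replicas, leaders)
print(after)  # {0: 1, 1: 2, 2: 3, 3: 1}
```

With three replicas per partition, losing one broker still leaves two copies of every partition, which is why producers and consumers can keep working.
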
  24. The log is a type of durable messaging system. Similar to a traditional messaging system (ActiveMQ, Rabbit, etc.) but with: • Far better scalability • Built-in fault tolerance/HA • Storage
  25. Origins in Stream Processing. Diagram labels: Serving Layer (Microservices, Elastic, etc.); Java apps with Kafka Streams or ksqlDB; Continuous Computation; High-Throughput Event Streaming Platform; API-Based Clustering.
  26. What is stream processing? Chart axes: User Population, Coding Sophistication. Audience segments: core developers who use Java/Scala; core developers who don't use Java/Scala; data engineers, architects, DevOps/SRE; BI analysts. (Label: streams.)
  27. Standing on the Shoulders of Streaming Giants. Layers: Producer and Consumer APIs; Kafka Streams; ksqlDB; ksqlDB UDFs. Axes: Ease of use, Flexibility. Connectors between layers: Powered by.
  28. What is stream processing? CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
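
To make the query on the slide above concrete, here is a batch approximation of its logic in Python: bucket each authorization attempt into a 5-minute tumbling window, count per card, and keep counts above 3. This is only a sketch; ksqlDB evaluates the query continuously over a stream, and the timestamps and card numbers below are made up. Note that ksqlDB treats the result of a GROUP BY aggregate as a table, so a current release would likely expect CREATE TABLE here rather than CREATE STREAM.

```python
from collections import Counter

WINDOW_SECONDS = 5 * 60  # tumbling window size: 5 minutes


def possible_fraud(attempts, threshold=3):
    """Count attempts per (window, card) and keep counts above `threshold`.

    `attempts` is an iterable of (timestamp_seconds, card_number) pairs.
    Returns {(window_start_seconds, card_number): count}.
    """
    counts = Counter()
    for timestamp, card_number in attempts:
        # Tumbling windows do not overlap: each event falls in exactly one.
        window_start = timestamp - (timestamp % WINDOW_SECONDS)
        counts[(window_start, card_number)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical sample data.
attempts = [
    (0, "card-A"), (60, "card-A"), (120, "card-A"), (240, "card-A"),
    (290, "card-B"), (310, "card-B"), (320, "card-B"), (330, "card-B"),
]

flagged = possible_fraud(attempts)
print(flagged)  # {(0, 'card-A'): 4}
```

`card-B` also has four attempts, but they straddle a window boundary (one in the first window, three in the second), so neither window exceeds the threshold. That boundary effect is inherent to tumbling windows.
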
  35. Free eBooks: Designing Event-Driven Systems (Ben Stopford); Kafka: The Definitive Guide (Neha Narkhede, Gwen Shapira, Todd Palino); Making Sense of Stream Processing (Martin Kleppmann); I ❤ Logs (Jay Kreps). http://cnfl.io/book-bundle