Slide 1

Slide 1 text

Event Streaming Fundamentals with Apache Kafka
Keith Resar, Sr. Kafka Developer (@KeithResar)

Slide 2

Slide 2 text

Data-Driven Operations

Slide 8

Slide 8 text

The Rise of Event Streaming
• 2010: Apache Kafka is created at LinkedIn.
• 2022: Most Fortune 100 companies trust and use Kafka.

Slide 9

Slide 9 text

A company is built on _DATA FLOWS_, but all we have are _DATA STORES_.

Slide 10

Slide 10 text

Example Application Architecture
• Serving layer (microservices, Elastic, etc.)
• Continuous computation: Java apps with Kafka Streams, or ksqlDB
• High-throughput event streaming platform
• API-based clustering

Slide 11

Slide 11 text

Apache Kafka is an Event Streaming Platform
1. Storage
2. Pub / Sub
3. Processing

Slide 12

Slide 12 text

Storage

Slide 13

Slide 13 text

Core Abstractions
• DB → table
• Hadoop → file
• Kafka → ?

Slide 14

Slide 14 text

LOG

Slide 15

Slide 15 text

Immutable Event Log
New messages are appended to the end of the log; existing messages are never changed, so the log runs from oldest to newest.
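A minimal in-memory sketch of the idea, assuming nothing about Kafka's real on-disk format: the log only ever appends, and each message's position (its offset) is fixed once written. The class name `AppendOnlyLog` is hypothetical.

```python
from typing import List


class AppendOnlyLog:
    """Toy append-only log: records are never modified or removed in place."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, record: bytes) -> int:
        """Add a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int) -> bytes:
        """Return the record stored at a given offset."""
        return self._records[offset]

    def end_offset(self) -> int:
        """Offset that the next appended record will receive."""
        return len(self._records)


log = AppendOnlyLog()
first = log.append(b"user signed up")
second = log.append(b"user clicked")
print(first, second, log.end_offset())  # 0 1 2
```

Offsets only grow, which is what lets many readers share one log without coordinating with each other.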

Slide 16

Slide 16 text

Messages Are Key/Value Bytes
• key: byte[]
• value: byte[]
• headers: [Header]
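Because Kafka only sees bytes, applications serialize keys and values themselves (in the Java client this is the job of a `Serializer`, and a `Header` is a string key with a byte[] value). The sketch below assumes UTF-8 string keys and JSON values purely for illustration; the `Record` dataclass and both helper functions are hypothetical and only mirror the key/value/headers shape on the slide.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical record shape: Kafka itself stores only the raw bytes."""
    key: Optional[bytes]
    value: Optional[bytes]
    headers: List[Tuple[str, bytes]] = field(default_factory=list)


def serialize_order(order_id: str, order: dict) -> Record:
    """Encode an order as UTF-8 key bytes and JSON value bytes."""
    return Record(
        key=order_id.encode("utf-8"),
        value=json.dumps(order, sort_keys=True).encode("utf-8"),
        headers=[("content-type", b"application/json")],
    )


def deserialize_order(record: Record) -> dict:
    """Reverse the encoding on the consumer side."""
    if record.value is None:
        raise ValueError("record has no value")
    return json.loads(record.value.decode("utf-8"))


rec = serialize_order("order-42", {"item": "book", "qty": 1})
print(rec.key)                 # b'order-42'
print(deserialize_order(rec))  # {'item': 'book', 'qty': 1}
```

Producer and consumer must agree on the encoding; the broker never inspects it.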

Slide 17

Slide 17 text

Messages Inside Topics
Example topics: Clicks, Orders, Customers
Topics are similar to database tables.

Slide 18

Slide 18 text

Topics Divide into Partitions
Messages are guaranteed to be strictly ordered within a partition.
[Diagram: the Clicks topic split into partitions P0, P1, and P2]
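A toy model, assuming a topic is simply a fixed set of independent append-only logs. Each partition numbers its own offsets and keeps its own order; nothing orders messages across partitions. `Topic` and its `append` method are hypothetical names, not the Kafka client API.

```python
from typing import List


class Topic:
    """Toy topic: a named set of independent, append-only partition logs."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[str]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: str) -> int:
        """Append to one partition and return the offset within that partition."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1


clicks = Topic("Clicks", num_partitions=3)
clicks.append(0, "a1")
clicks.append(1, "b1")
clicks.append(0, "a2")  # each partition numbers its own offsets: this is offset 1 in P0
clicks.append(2, "c1")

# Order is guaranteed inside each partition...
print(clicks.partitions[0])  # ['a1', 'a2']
# ...but a reader of P0, P1 and P2 together may see them interleaved in any order.
```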

Slide 20

Slide 20 text

Pub / Sub

Slide 21

Slide 21 text

Producing Data
New messages are appended to the end of the log.

Slide 22

Slide 22 text

Consuming Data
Consumers read sequentially, starting from a specific offset: seek to that offset, then scan forward.
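A sketch of that access pattern, assuming the log is an in-memory list: seek to an offset, read a bounded batch, and remember where to resume. The function `read_from` is hypothetical; in the Java client the analogous steps are `KafkaConsumer.seek()` followed by repeated `poll()` calls, with batch size bounded by the `max.poll.records` setting.

```python
from typing import List, Tuple


def read_from(log: List[str], offset: int, max_records: int) -> Tuple[List[str], int]:
    """Seek to `offset`, scan forward sequentially, and return the records
    plus the offset to resume from on the next call."""
    batch = log[offset:offset + max_records]
    return batch, offset + len(batch)


log = ["e0", "e1", "e2", "e3", "e4"]
batch, next_offset = read_from(log, offset=2, max_records=2)
print(batch, next_offset)  # ['e2', 'e3'] 4
batch, next_offset = read_from(log, next_offset, max_records=2)
print(batch, next_offset)  # ['e4'] 5
```

Reading never removes anything, so the same data can be scanned again from any earlier offset.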

Slide 23

Slide 23 text

Distinct Consumer Positions
Each consumer tracks its own position in the same log: Sally is at offset 12, Fred at offset 3, and Rick at offset 9.
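To make the slide concrete, a sketch assuming a 15-event in-memory log (the size is arbitrary) and the three positions from the diagram. Consuming does not delete messages, so each reader advances independently. In Kafka these positions are committed as offsets; the plain dict and the `poll` function here are hypothetical stand-ins.

```python
from typing import Dict, List

# One shared log of 15 events; reading never removes anything from it.
log: List[str] = [f"event-{i}" for i in range(15)]

# Each consumer keeps its own position (the next offset it will read).
positions: Dict[str, int] = {"Sally": 12, "Fred": 3, "Rick": 9}


def poll(consumer: str, max_records: int = 2) -> List[str]:
    """Read from this consumer's own position and advance only that position."""
    start = positions[consumer]
    batch = log[start:start + max_records]
    positions[consumer] = start + len(batch)
    return batch


print(poll("Fred"))   # ['event-3', 'event-4']
print(poll("Sally"))  # ['event-12', 'event-13']
print(positions)      # {'Sally': 14, 'Fred': 5, 'Rick': 9}
print(len(log))       # 15
```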

Slide 26

Slide 26 text

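Producing to Kafka: No Key
Partitions P0, P1, P2, P3. Messages without a key are spread across the partitions in round-robin fashion. (Since Kafka 2.4, producers batch keyless messages with a "sticky" partitioner instead, which still spreads load across partitions over time.)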
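A minimal sketch of per-message round-robin placement across four partitions, as the slide describes it. It is a simplification: it ignores batching, which real producers take into account when placing keyless messages. `round_robin_partitioner` is a hypothetical helper, not a Kafka API.

```python
import itertools
from collections import defaultdict
from typing import Callable, Dict, List


def round_robin_partitioner(num_partitions: int) -> Callable[[str], int]:
    """Return a chooser that hands keyless messages to partitions in turn."""
    counter = itertools.count()

    def choose(_message: str) -> int:
        # The message content is ignored: only the arrival order matters.
        return next(counter) % num_partitions

    return choose


choose = round_robin_partitioner(num_partitions=4)  # P0 to P3, as on the slide
placed: Dict[int, List[str]] = defaultdict(list)
for msg in ["m0", "m1", "m2", "m3", "m4", "m5"]:
    placed[choose(msg)].append(msg)

print(dict(placed))  # {0: ['m0', 'm4'], 1: ['m1', 'm5'], 2: ['m2'], 3: ['m3']}
```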

Slide 28

Slide 28 text

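Producing to Kafka: With Key
Partitions P0, P1, P2, P3. The target partition is derived from the key:
N = hash(key) % numPartitions
so every message with the same key lands in the same partition, which preserves per-key ordering.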
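A sketch of the formula on the slide. Kafka's Java producer hashes keys with murmur2; this sketch swaps in `zlib.crc32` for brevity, so the partition numbers it prints will not match a real cluster. `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: hash(key) % numPartitions.

    zlib.crc32 is a stand-in hash (Kafka uses murmur2), chosen because it is
    deterministic across runs, unlike Python's built-in hash() for bytes.
    """
    return zlib.crc32(key) % num_partitions


for key in [b"customer-1", b"customer-2", b"customer-1"]:
    print(key, partition_for(key, num_partitions=4))
# Both b"customer-1" lines print the same partition: same key, same partition.
```

One consequence of the formula: changing numPartitions remaps keys, so the same-key-same-partition guarantee only holds while the partition count stays fixed.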

Slide 30

Slide 30 text

Consuming from Kafka: Single Consumer
A single consumer reads from all partitions (P0, P1, P2, P3).

Slide 31

Slide 31 text

Consuming from Kafka: Multiple Consumers
Consumers can be organized into multiple groups, each of which operates in isolation.
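A sketch of group isolation, assuming two hypothetical groups named `billing` and `analytics` reading one in-memory, single-partition log. Each group tracks its own offset (Kafka commits these per group), so both groups see every message and neither one's progress affects the other.

```python
from typing import Dict, List

log: List[str] = ["o1", "o2", "o3"]

# Each consumer group keeps its own committed offset for the same data.
group_offsets: Dict[str, int] = {"billing": 0, "analytics": 0}


def consume_all(group: str) -> List[str]:
    """Deliver everything this group has not yet seen, then advance its offset."""
    start = group_offsets[group]
    batch = log[start:]
    group_offsets[group] = len(log)
    return batch


print(consume_all("billing"))    # ['o1', 'o2', 'o3']
print(consume_all("analytics"))  # ['o1', 'o2', 'o3'] (unaffected by billing)
print(consume_all("billing"))    # []
```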

Slide 32

Slide 32 text

[Diagram: a consumer group, its consumers, and the group coordinator that manages the group]

Slide 35

Slide 35 text

Grouped Consumers
Consumers can be split into multiple groups, each of which operates in isolation. Within a group, each partition (P0 to P3) is read by exactly one consumer.

Slide 36

Slide 36 text

Grouped Consumers
[Diagram: one consumer in the group fails (X); its partitions are reassigned to the group's remaining consumers]
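A sketch of partition assignment inside one group, and of what happens when a member fails. The consumer names are hypothetical, and the assignment rule is a simple round-robin chosen for illustration. Kafka ships several real assignors (range, round-robin, sticky, cooperative sticky), and this sketch ignores how the group coordinator actually runs a rebalance.

```python
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over a group's consumers so that each partition has
    exactly one owner (simple round-robin assignment)."""
    owners: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        owners[consumers[i % len(consumers)]].append(p)
    return owners


partitions = [0, 1, 2, 3]
group = ["c1", "c2", "c3"]
print(assign(partitions, group))  # {'c1': [0, 3], 'c2': [1], 'c3': [2]}

# c2 fails: rebalance over the surviving members so P1 gets a new owner.
group.remove("c2")
print(assign(partitions, group))  # {'c1': [0, 2], 'c3': [1, 3]}
```

This naive version reshuffles every partition on each rebalance; sticky assignors exist to keep most partitions where they were.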

Slide 38

Slide 38 text

Linearly Scalable Architecture
● Many producer machines
● Many consumer machines
● Many broker machines
A single topic, with no bottleneck!

Slide 39

Slide 39 text

Replicate for Fault Tolerance
A message is written to the partition leader on Broker A and replicated to Broker B.
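A toy model of the slide, assuming all replicas live in one process and copying is instantaneous. In a real cluster followers fetch from the leader asynchronously, and a producer configured with `acks=all` is acknowledged once the leader's in-sync replicas have the write. The `Partition` class is hypothetical.

```python
from typing import Dict, List


class Partition:
    """Toy replicated partition: one leader log plus follower copies."""

    def __init__(self, leader: str, followers: List[str]) -> None:
        self.leader = leader
        # One log per broker that hosts a replica of this partition.
        self.replicas: Dict[str, List[bytes]] = {b: [] for b in [leader, *followers]}

    def produce(self, message: bytes) -> bool:
        """Append on the leader, copy to each follower, and report whether every
        replica now holds the write (akin to acks=all with all replicas in sync)."""
        self.replicas[self.leader].append(message)
        for broker, log in self.replicas.items():
            if broker != self.leader:
                log.append(message)  # real followers fetch from the leader asynchronously
        leader_len = len(self.replicas[self.leader])
        return all(len(log) == leader_len for log in self.replicas.values())


p = Partition(leader="Broker A", followers=["Broker B"])
print(p.produce(b"order placed"))  # True
print(p.replicas["Broker B"])      # [b'order placed']
```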

Slide 40

Slide 40 text

Partition Leadership / Replication
[Diagram: partitions P0 to P3 replicated across Brokers 1 to 4; each partition has one leader replica and follower replicas on other brokers]

Slide 41

Slide 41 text

Replication Provides Resiliency
When a machine fails, replica followers become leaders, so producers and consumers can keep working.

Slide 45

Slide 45 text

Partition Leadership / Replication
[Diagram: the same layout as one broker fails; followers of the partitions it led are highlighted as they take over leadership]

Slide 46

Slide 46 text

Partition Leadership / Replication
[Diagram: the failed broker's replicas are gone; each of its partitions is now led by a replica on a surviving broker]
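A sketch of leader failover, assuming a hypothetical layout of four partitions with three replicas each on Brokers 1 to 4 (not necessarily the exact layout in the diagram). When a broker fails, each partition it led gets a new leader from its surviving replicas. In Kafka the controller makes this choice from the in-sync replicas; this sketch simply takes the first survivor in the list.

```python
from typing import Dict, List

# Hypothetical layout: replica list per partition, first entry is the leader.
replicas: Dict[int, List[int]] = {
    0: [1, 2, 3],
    1: [2, 3, 4],
    2: [3, 4, 1],
    3: [4, 1, 2],
}


def fail_broker(layout: Dict[int, List[int]], broker: int) -> Dict[int, int]:
    """Remove a failed broker from every replica list (mutating `layout`) and
    return the leader of each partition: the first surviving replica."""
    leaders: Dict[int, int] = {}
    for partition, brokers in layout.items():
        survivors = [b for b in brokers if b != broker]
        if not survivors:
            raise RuntimeError(f"partition {partition} has no replicas left")
        layout[partition] = survivors
        leaders[partition] = survivors[0]
    return leaders


print(fail_broker(replicas, broker=3))  # {0: 1, 1: 2, 2: 4, 3: 4}
```

Partitions whose leader survived keep it; only partition 2, which Broker 3 led, changes leader.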

Slide 48

Slide 48 text

The log is a type of durable messaging system
Similar to a traditional messaging system (ActiveMQ, RabbitMQ, etc.), but with:
• Far better scalability
• Built-in fault tolerance/HA
• Storage

Slide 50

Slide 50 text

Origins in Stream Processing
• Serving layer (microservices, Elastic, etc.)
• Continuous computation: Java apps with Kafka Streams, or ksqlDB
• High-throughput event streaming platform
• API-based clustering

Slide 51

Slide 51 text

Processing

Slide 52

Slide 52 text

Streaming is the toolset for working with events as they move!

Slide 53

Slide 53 text

What is stream processing?
[Diagram: a stream of auth attempts is processed into a stream of possible fraud]

Slide 54

Slide 54 text

What is stream processing?
[Chart: user population plotted against coding sophistication]
• Core developers who use Java/Scala
• Core developers who don’t use Java/Scala
• Data engineers, architects, DevOps/SRE
• BI analysts

Slide 55

Slide 55 text

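Standing on the Shoulders of Streaming Giants
• ksqlDB (extensible with ksqlDB UDFs), powered by Kafka Streams
• Kafka Streams, powered by the Producer and Consumer APIs
[Diagram axis: ease of use increases toward ksqlDB; flexibility increases toward the Producer and Consumer APIs]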

Slide 56

Slide 56 text

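What is stream processing?
CREATE TABLE possible_fraud AS
  SELECT card_number, COUNT(*) AS attempts
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTES)
  GROUP BY card_number
  HAVING COUNT(*) > 3;
A windowed aggregation yields a continuously updated count per card and window, so ksqlDB materializes it as a TABLE.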
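To show what the windowed aggregation computes, here is a plain-Python sketch of the same logic over an in-memory list. It assumes (timestamp_ms, card_number) pairs, and the sample card numbers are invented for illustration. Unlike ksqlDB, it runs once over a finished batch instead of updating results continuously as events arrive, and it ignores late or out-of-order events.

```python
from collections import Counter
from typing import Dict, List, Tuple

WINDOW_MS = 5 * 60 * 1000  # SIZE 5 MINUTES
THRESHOLD = 3              # HAVING COUNT(*) > 3


def possible_fraud(attempts: List[Tuple[int, str]]) -> Dict[Tuple[int, str], int]:
    """Count attempts per (window start, card) and keep groups over the threshold.

    A tumbling window is a fixed, non-overlapping bucket, so each event
    belongs to exactly one window, identified here by its start time.
    """
    counts: Counter = Counter()
    for ts, card in attempts:
        window_start = ts - ts % WINDOW_MS
        counts[(window_start, card)] += 1
    return {k: n for k, n in counts.items() if n > THRESHOLD}


minute = 60 * 1000
events = [(0, "4111"), (minute, "4111"), (2 * minute, "4111"), (3 * minute, "4111"),
          (minute, "5500"),
          (6 * minute, "4111")]  # falls into the next 5-minute window
print(possible_fraud(events))  # {(0, '4111'): 4}
```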

Slide 64

Slide 64 text

Wrap Up

Slide 65

Slide 65 text

Learn Kafka
Start building with Apache Kafka at Confluent Developer: developer.confluent.io

Slide 66

Slide 66 text

Free eBooks (http://cnfl.io/book-bundle)
• Designing Event-Driven Systems, Ben Stopford
• Kafka: The Definitive Guide, Neha Narkhede, Gwen Shapira, Todd Palino
• Making Sense of Stream Processing, Martin Kleppmann
• I ❤ Logs, Jay Kreps

Slide 68

Slide 68 text

Thank You
Keith Resar (@KeithResar), Kafka Developer, confluent.io