Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Event streaming fundamentals with Apache Kafka
Search
Keith Resar
February 24, 2022
Technology
1
500
Event streaming fundamentals with Apache Kafka
Keith Resar
February 24, 2022
Tweet
Share
More Decks by Keith Resar
See All by Keith Resar
Real-Time Data Transformation by Example
keithresar
0
61
Exactly-Once Semantics and Transactions in Kafka
keithresar
0
150
Implementing Strangler pattern for microservices migrations
keithresar
0
390
Stream processing with ksqlDB and Apache Kafka
keithresar
1
420
How Nagios is leveraging Ansible Network Automation
keithresar
1
90
Automating Satellite Installation and Configuration With the Ansible Foreman Modules
keithresar
1
710
Writing your first Ansible operator for OpenShift
keithresar
1
230
Intro to CI/CD in GitLab and Anatomy of a Pipeline
keithresar
2
380
Ansible Ecosystem Future Directions
keithresar
0
170
Other Decks in Technology
See All in Technology
.NET 10 のパフォーマンス改善
nenonaninu
2
4.7k
原理から解き明かす AIと人間の成長 - Progate BAR
teba_eleven
2
300
こがヘンだよ!Snowflake?サービス名称へのこだわり
tarotaro0129
0
110
AI 時代のデータ戦略
na0
8
3.2k
M5UnifiedとPicoRubyで楽しむM5シリーズ
kishima
0
110
MS Ignite 2025で発表されたFoundry IQをRecap
satodayo
3
230
モバイルゲーム開発におけるエージェント技術活用への試行錯誤 ~開発効率化へのアプローチの紹介と未来に向けた展望~
qualiarts
0
280
Databricksによるエージェント構築
taka_aki
1
120
ブロックテーマとこれからの WordPress サイト制作 / Toyama WordPress Meetup Vol.81
torounit
0
240
Ryzen NPUにおけるAI Engineプログラミング
anjn
0
210
How native lazy objects will change Doctrine and Symfony forever
beberlei
1
380
Claude Code Getting Started Guide(en)
oikon48
0
140
Featured
See All Featured
Designing for humans not robots
tammielis
254
26k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
31
2.7k
Designing for Performance
lara
610
69k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
34
2.5k
Building Flexible Design Systems
yeseniaperezcruz
329
39k
Visualization
eitanlees
150
16k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3.1k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.1k
Music & Morning Musume
bryan
46
7k
The Language of Interfaces
destraynor
162
25k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
132
19k
Navigating Team Friction
lara
191
16k
Transcript
Event Streaming Fundamentals with Apache Kafka Keith Resar Sr. Kafka
Developer @KeithResar
Data-Driven Operations
Data-Driven Operations
Data-Driven Operations
None
@KeithResar
@KeithResar
The Rise of Event Streaming 2010 Apache Kafka created at
LinkedIn 2022 Most fortune 100 companies trust and use Kafka
A company is built on _DATA FLOWS_ but all we
have are _DATA STORES_
Example Application Architecture Serving Layer (Microservices, Elastic, etc.) Java Apps
with Kafka Streams or ksqlDB Continuous Computation High-Throughput Event Streaming Platform API-Based Clustering @KeithResar
Apache Kafka is an Event Streaming Platform 1. Storage 2.
Pub / Sub 3. Processing @KeithResar
Storage 12 @KeithResar
Core Abstractions @KeithResar • DB → table • Hadoop →
file • Kafka - ?
LOG
Immutable Event Log New Messages are added at the end
of the log Old @KeithResar
Messages are KV Bytes key: byte[] value: byte[] Headers =>
[Header] @KeithResar
Messages Inside Topics Clicks Orders Customers Topics are similar to
database tables @KeithResar
Topics divide into Partitions Messages are guaranteed to be strictly
ordered within a partition @KeithResar P 0 Clicks P 1 P 2
None
Pub / Sub 20 @KeithResar
Producing Data New Messages are added at the end of
the log Old @KeithResar
Consuming Data New Consume via sequential data access starting from
a specific offset. Old @KeithResar Read to offset & scan
Distinct Consumer Positions New Old @KeithResar Sally offset 12 Fred
offset 3 Rick offset 9
None
Messages are KV Bytes key: byte[] value: byte[] Headers =>
[Header] @KeithResar
Producing to Kafka - No Key @KeithResar P 0 P
1 P 2 P 3 Messages will be produced in a round robin fashion
Producing to Kafka - No Key @KeithResar P 0 P
1 P 2 P 3 Messages will be produced in a round robin fashion
Producing to Kafka - With Key @KeithResar P 0 P
1 P 2 P 3 hash(key) % numPartitions = N
Producing to Kafka - With Key @KeithResar P 0 P
1 P 2 P 3 hash(key) % numPartitions = N
Consumer from Kafka - Single @KeithResar P 0 P 1
P 2 P 3 Single consumer reads from all partitions
Consumer from Kafka - Multiple @KeithResar P 0 P 1
P 2 P 3 Consumers can be split into multiple groups each of which operate in isolation
CONSUMER GROUP COORDINATOR CONSUMERS CONSUMER GROUP
Consumer from Kafka - Multiple @KeithResar P 0 P 1
P 2 P 3 Consumers can be split into multiple groups each of which operate in isolation
Consumer from Kafka - Multiple @KeithResar P 0 P 1
P 2 P 3 Consumers can be split into multiple groups each of which operate in isolation
Grouped Consumers @KeithResar P 0 P 1 P 2 P
3 Consumers can be split into multiple groups each of which operate in isolation
Grouped Consumers @KeithResar P 0 P 1 P 2 P
3 Consumers can be split into multiple groups each of which operate in isolation X
None
Linearly Scalable Architecture @KeithResar Producers • Many producers machines •
Many consumer machines • Many Broker machines Consumers Single topic, No Bottleneck!
Replicate for Fault Tolerance @KeithResar Broker A Broker B Message
✓ Leader Replicate
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader
Replication Provides Resiliency @KeithResar Producers Consumers Replica followers become leaders
on machine failure X X X X X
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader Partition 2 Partition 1 Partition 3
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Follower Leader Partition 2 Partition 1 Partition 3
None
The log is a type of durable messaging system @KeithResar
Similar to a traditional messaging system (ActiveMQ, Rabbit, etc.) but with: • Far better scalability • Built-in fault tolerance/HA • Storage
None
Origins in Stream Processing Serving Layer (Microservices, Elastic, etc.) Java
Apps with Kafka Streams or ksqlDB Continuous Computation High-Throughput Event Streaming Platform API-Based Clustering
Processing 51 @KeithResar
Streaming is the toolset for working with events as they
move! @KeithResar
What is stream processing? @KeithResar auth attempts possible fraud
What is stream processing? @KeithResar User Population Coding Sophistication Core
developers who use Java/Scala Core developers who don’t use Java/Scala Data engineers, architects, DevOps/SRE BI analysts streams
Standing on the Shoulders of Streaming Giants Producer, Consumer APIs
Kafka Streams ksqlDB Ease of use Flexibility ksqlDB UDFs Powered by Powered by
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
None
Wrap Up 64 @KeithResar
developer.confluent.io Learn Kafka. Start building with Apache Kafka at Confluent
Developer.
Free eBooks Designing Event-Driven Systems Ben Stopford Kafka: The Definitive
Guide Neha Narkhede, Gwen Shapira, Todd Palino Making Sense of Stream Processing Martin Kleppmann I ❤ Logs Jay Kreps http://cnfl.io/book-bundle
None
Thank You @KeithResar Kafka Developer confluent.io