Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Event streaming fundamentals with Apache Kafka
Search
Keith Resar
February 24, 2022
Technology
1
490
Event streaming fundamentals with Apache Kafka
Keith Resar
February 24, 2022
Tweet
Share
More Decks by Keith Resar
See All by Keith Resar
Real-Time Data Transformation by Example
keithresar
0
60
Exactly-Once Semantics and Transactions in Kafka
keithresar
0
130
Implementing Strangler pattern for microservices migrations
keithresar
0
380
Stream processing with ksqlDB and Apache Kafka
keithresar
1
410
How Nagios is leveraging Ansible Network Automation
keithresar
1
85
Automating Satellite Installation and Configuration With the Ansible Foreman Modules
keithresar
1
700
Writing your first Ansible operator for OpenShift
keithresar
1
220
Intro to CI/CD in GitLab and Anatomy of a Pipeline
keithresar
2
370
Ansible Ecosystem Future Directions
keithresar
0
160
Other Decks in Technology
See All in Technology
NOT A HOTEL SOFTWARE DECK (2025/11/06)
notahotel
0
3.4k
設計に疎いエンジニアでも始めやすいアーキテクチャドキュメント
phaya72
29
20k
Boxを“使われる場”にする統制と自動化の仕組み
demaecan
0
210
データエンジニアとして生存するために 〜界隈を盛り上げる「お祭り」が必要な理由〜 / data_summit_findy_Session_1
sansan_randd
1
990
How Fast Is Fast Enough? [PerfNow 2025]
tammyeverts
2
280
Kotlinで型安全にバイテンポラルデータを扱いたい! ReladomoラッパーをAIと実装してみた話
itohiro73
3
290
進化する大規模言語モデル評価: Swallowプロジェクトにおける実践と知見
chokkan
PRO
3
480
AWS IAM Identity Centerによる権限設定をグラフ構造で可視化+グラフRAGへの挑戦
ykimi
2
380
[AWS 秋のオブザーバビリティ祭り 2025 〜最新アップデートと生成 AI × オブザーバビリティ〜] Amazon Bedrock AgentCore で実現!お手軽 AI エージェントオブザーバビリティ
0nihajim
2
380
Findy Data Engineering Summit 2025/11/06 mybest
jonnojun
0
110
CLIPでマルチモーダル画像検索 →とても良い
wm3
2
810
QAEが生成AIと越える、ソフトウェア開発の境界線
rinchsan
0
520
Featured
See All Featured
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
132
19k
Optimizing for Happiness
mojombo
379
70k
Become a Pro
speakerdeck
PRO
29
5.6k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.2k
Side Projects
sachag
455
43k
Building a Scalable Design System with Sketch
lauravandoore
463
33k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.6k
The Cost Of JavaScript in 2023
addyosmani
55
9.1k
Java REST API Framework Comparison - PWX 2021
mraible
34
8.9k
Build your cross-platform service in a week with App Engine
jlugia
234
18k
Transcript
Event Streaming Fundamentals with Apache Kafka Keith Resar Sr. Kafka
Developer @KeithResar
Data-Driven Operations
Data-Driven Operations
Data-Driven Operations
None
@KeithResar
@KeithResar
The Rise of Event Streaming 2010 Apache Kafka created at
LinkedIn 2022 Most fortune 100 companies trust and use Kafka
A company is built on _DATA FLOWS_ but all we
have are _DATA STORES_
Example Application Architecture Serving Layer (Microservices, Elastic, etc.) Java Apps
with Kafka Streams or ksqlDB Continuous Computation High-Throughput Event Streaming Platform API-Based Clustering @KeithResar
Apache Kafka is an Event Streaming Platform 1. Storage 2.
Pub / Sub 3. Processing @KeithResar
Storage 12 @KeithResar
Core Abstractions @KeithResar • DB → table • Hadoop →
file • Kafka - ?
LOG
Immutable Event Log New Messages are added at the end
of the log Old @KeithResar
Messages are KV Bytes key: byte[] value: byte[] Headers =>
[Header] @KeithResar
Messages Inside Topics Clicks Orders Customers Topics are similar to
database tables @KeithResar
Topics divide into Partitions Messages are guaranteed to be strictly
ordered within a partition @KeithResar P 0 Clicks P 1 P 2
None
Pub / Sub 20 @KeithResar
Producing Data New Messages are added at the end of
the log Old @KeithResar
Consuming Data New Consume via sequential data access starting from
a specific offset. Old @KeithResar Read to offset & scan
Distinct Consumer Positions New Old @KeithResar Sally offset 12 Fred
offset 3 Rick offset 9
None
Messages are KV Bytes key: byte[] value: byte[] Headers =>
[Header] @KeithResar
Producing to Kafka - No Key @KeithResar P 0 P
1 P 2 P 3 Messages will be produced in a round robin fashion
Producing to Kafka - No Key @KeithResar P 0 P
1 P 2 P 3 Messages will be produced in a round robin fashion
Producing to Kafka - With Key @KeithResar P 0 P
1 P 2 P 3 hash(key) % numPartitions = N
Producing to Kafka - With Key @KeithResar P 0 P
1 P 2 P 3 hash(key) % numPartitions = N
Consumer from Kafka - Single @KeithResar P 0 P 1
P 2 P 3 Single consumer reads from all partitions
Consumer from Kafka - Multiple @KeithResar P 0 P 1
P 2 P 3 Consumers can be split into multiple groups each of which operate in isolation
CONSUMER GROUP COORDINATOR CONSUMERS CONSUMER GROUP
Consumer from Kafka - Multiple @KeithResar P 0 P 1
P 2 P 3 Consumers can be split into multiple groups each of which operate in isolation
Consumer from Kafka - Multiple @KeithResar P 0 P 1
P 2 P 3 Consumers can be split into multiple groups each of which operate in isolation
Grouped Consumers @KeithResar P 0 P 1 P 2 P
3 Consumers can be split into multiple groups each of which operate in isolation
Grouped Consumers @KeithResar P 0 P 1 P 2 P
3 Consumers can be split into multiple groups each of which operate in isolation X
None
Linearly Scalable Architecture @KeithResar Producers • Many producers machines •
Many consumer machines • Many Broker machines Consumers Single topic, No Bottleneck!
Replicate for Fault Tolerance @KeithResar Broker A Broker B Message
✓ Leader Replicate
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader
Replication Provides Resiliency @KeithResar Producers Consumers Replica followers become leaders
on machine failure X X X X X
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader Partition 2 Partition 1 Partition 3
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Follower Leader Partition 2 Partition 1 Partition 3
None
The log is a type of durable messaging system @KeithResar
Similar to a traditional messaging system (ActiveMQ, Rabbit, etc.) but with: • Far better scalability • Built-in fault tolerance/HA • Storage
None
Origins in Stream Processing Serving Layer (Microservices, Elastic, etc.) Java
Apps with Kafka Streams or ksqlDB Continuous Computation High-Throughput Event Streaming Platform API-Based Clustering
Processing 51 @KeithResar
Streaming is the toolset for working with events as they
move! @KeithResar
What is stream processing? @KeithResar auth attempts possible fraud
What is stream processing? @KeithResar User Population Coding Sophistication Core
developers who use Java/Scala Core developers who don’t use Java/Scala Data engineers, architects, DevOps/SRE BI analysts streams
Standing on the Shoulders of Streaming Giants Producer, Consumer APIs
Kafka Streams ksqlDB Ease of use Flexibility ksqlDB UDFs Powered by Powered by
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
None
Wrap Up 64 @KeithResar
developer.confluent.io Learn Kafka. Start building with Apache Kafka at Confluent
Developer.
Free eBooks Designing Event-Driven Systems Ben Stopford Kafka: The Definitive
Guide Neha Narkhede, Gwen Shapira, Todd Palino Making Sense of Stream Processing Martin Kleppmann I ❤ Logs Jay Kreps http://cnfl.io/book-bundle
None
Thank You @KeithResar Kafka Developer confluent.io