Event streaming fundamentals with Apache Kafka
Keith Resar
February 24, 2022
Transcript
Event Streaming Fundamentals with Apache Kafka
Keith Resar, Sr. Kafka Developer, @KeithResar
Data-Driven Operations
The Rise of Event Streaming
• 2010: Apache Kafka created at LinkedIn
• 2022: Most Fortune 100 companies trust and use Kafka
A company is built on DATA FLOWS, but all we have are DATA STORES
Example Application Architecture
• Serving Layer (Microservices, Elastic, etc.)
• Java Apps with Kafka Streams or ksqlDB
• Continuous Computation
• High-Throughput Event Streaming Platform
• API-Based Clustering
Apache Kafka is an Event Streaming Platform
1. Storage
2. Pub / Sub
3. Processing
Storage
Core Abstractions
• DB → table
• Hadoop → file
• Kafka → ?
LOG
Immutable Event Log
• New messages are added at the end of the log
• Diagram: the log runs from Old to New
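The append-only behavior described on this slide can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Kafka's storage engine: the `EventLog` class and its `append` and `read_from` methods are hypothetical names chosen for this example.

```python
from typing import List, Optional, Tuple


class EventLog:
    """Hypothetical in-memory stand-in for one append-only log."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, value: bytes) -> int:
        """Add a record at the end of the log and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset: int, max_records: Optional[int] = None) -> List[Tuple[int, bytes]]:
        """Scan sequentially from `offset`; existing records are never modified."""
        end = len(self._records) if max_records is None else offset + max_records
        end = min(end, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


log = EventLog()
offsets = [log.append(v) for v in (b"click-a", b"click-b", b"click-c")]
print(offsets)            # [0, 1, 2]
print(log.read_from(1))   # [(1, b'click-b'), (2, b'click-c')]
```

Because records are only ever added at the end, an offset keeps identifying the same record for as long as that record is retained.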
Messages are KV Bytes
• key: byte[]
• value: byte[]
• Headers => [Header]
Messages Inside Topics
• Example topics: Clicks, Orders, Customers
• Topics are similar to database tables
Topics divide into Partitions
• Messages are guaranteed to be strictly ordered within a partition
• Diagram: the Clicks topic split into partitions P 0, P 1, P 2
Pub / Sub
Producing Data
• New messages are added at the end of the log
• Diagram: the log runs from Old to New
Consuming Data
• Consume via sequential data access starting from a specific offset
• Read to offset & scan
• Diagram: the log runs from Old to New
Distinct Consumer Positions
• Sally: offset 12
• Fred: offset 3
• Rick: offset 9
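A small Python sketch of the idea on this slide: several readers scan the same log, each from its own position, and reading never changes the log. The consumer names and offsets come from the slide; the 15-record `log`, the `positions` dictionary, and the `poll` helper are assumptions made up for this illustration and are not the Kafka consumer API.

```python
from typing import Dict, List

# Hypothetical log with offsets 0..14; the contents are placeholders.
log: List[bytes] = [f"event-{i}".encode() for i in range(15)]

# Each consumer tracks its own position, as on the slide.
positions: Dict[str, int] = {"Sally": 12, "Fred": 3, "Rick": 9}


def poll(consumer: str, max_records: int = 2) -> List[bytes]:
    """Read sequentially from the consumer's position, then advance only that position."""
    start = positions[consumer]
    batch = log[start:start + max_records]
    positions[consumer] = start + len(batch)
    return batch


sally_batch = poll("Sally")  # offsets 12 and 13
fred_batch = poll("Fred")    # offsets 3 and 4
print(sally_batch, fred_batch, positions)
```

Rick's position is untouched by the other two reads, which is what lets slow and fast consumers share one log.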
Producing to Kafka - No Key
• Partitions: P 0, P 1, P 2, P 3
• Messages will be produced in a round-robin fashion
Producing to Kafka - With Key
• Partitions: P 0, P 1, P 2, P 3
• hash(key) % numPartitions = N
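The two producing rules from these slides (round robin without a key, `hash(key) % numPartitions` with a key) can be sketched as follows. The `Partitioner` class is a hypothetical name for this example, and `zlib.crc32` is only a stand-in hash chosen because it is deterministic and in the standard library; it is not the hash function Kafka's own partitioner uses.

```python
import itertools
import zlib
from typing import Optional


class Partitioner:
    """Hypothetical partition chooser following the two rules on the slides."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition_for(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread messages across partitions in turn.
            return next(self._round_robin)
        # With key: hash(key) % numPartitions (crc32 is a stand-in hash).
        return zlib.crc32(key) % self.num_partitions


partitioner = Partitioner(num_partitions=4)
keyless = [partitioner.partition_for(None) for _ in range(6)]
same_key = {partitioner.partition_for(b"customer-42") for _ in range(5)}
print(keyless)        # [0, 1, 2, 3, 0, 1]
print(len(same_key))  # 1
```

Since every message with the same key lands in the same partition, and ordering is guaranteed within a partition, all events for one key are read back in the order they were produced.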
Consumer from Kafka - Single
• Partitions: P 0, P 1, P 2, P 3
• Single consumer reads from all partitions
Consumer from Kafka - Multiple
• Partitions: P 0, P 1, P 2, P 3
• Consumers can be split into multiple groups, each of which operates in isolation
Diagram: Consumer Group Coordinator, Consumers, Consumer Group
Grouped Consumers
• Partitions: P 0, P 1, P 2, P 3
• Consumers can be split into multiple groups, each of which operates in isolation
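One way to picture grouped consumers is a function that divides a topic's partitions among the members of a single group, applied separately to each group. The sketch below uses a plain round-robin split; the `assign` function, the group names, and the member names are hypothetical, and Kafka's real assignment strategies are more involved than this.

```python
from typing import Dict, List


def assign(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Split partitions across the members of ONE group, round robin."""
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    for i, partition in enumerate(partitions):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
groups = {"billing": ["b1", "b2"], "analytics": ["a1"]}

# Each group receives every partition, independently of the other groups.
assignments = {name: assign(partitions, members) for name, members in groups.items()}
print(assignments["billing"])    # {'b1': [0, 2], 'b2': [1, 3]}
print(assignments["analytics"])  # {'a1': [0, 1, 2, 3]}

# If a member leaves, the same function spreads its partitions over the rest.
after_loss = assign(partitions, ["b1"])
print(after_loss)                # {'b1': [0, 1, 2, 3]}
```

Within a group each partition has exactly one reader, while separate groups each see the full topic.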
Linearly Scalable Architecture
• Many producer machines
• Many consumer machines
• Many broker machines
• Single topic, no bottleneck!
Replicate for Fault Tolerance
• Diagram labels: Broker A, Broker B, Message, Leader, Replicate, ✓
Partition Leadership / Replication
• Partitions: P 0, P 1, P 2, P 3
• Broker 1: Partition 0, Partition 2, Partition 3
• Broker 2: Partition 0, Partition 1, Partition 3
• Broker 3: Partition 0, Partition 1, Partition 2
• Broker 4: Partition 1, Partition 2, Partition 3
• Legend: Follower, Leader
Replication Provides Resiliency
• Replica followers become leaders on machine failure
• Diagram: Producers, Consumers, with X marks
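The failover idea on this slide can be sketched as a function that picks, for each partition, a leader among the replicas on brokers that are still alive. The replica placement below follows the four-broker layout from the replication slides; the order of each replica list, the `elect_leaders` function, and the "first live replica wins" rule are simplifying assumptions for this example, not Kafka's actual election procedure.

```python
from typing import Dict, List, Optional, Set

# partition -> broker ids holding a replica (list order is an assumption).
replicas: Dict[int, List[int]] = {
    0: [1, 2, 3],
    1: [2, 3, 4],
    2: [3, 4, 1],
    3: [4, 1, 2],
}


def elect_leaders(replicas: Dict[int, List[int]], live: Set[int]) -> Dict[int, Optional[int]]:
    """Pick the first live replica of each partition as its leader (simplified)."""
    leaders: Dict[int, Optional[int]] = {}
    for partition, brokers in replicas.items():
        candidates = [b for b in brokers if b in live]
        leaders[partition] = candidates[0] if candidates else None
    return leaders


before = elect_leaders(replicas, live={1, 2, 3, 4})
after = elect_leaders(replicas, live={1, 2, 3})  # broker 4 has failed
print(before)  # {0: 1, 1: 2, 2: 3, 3: 4}
print(after)   # {0: 1, 1: 2, 2: 3, 3: 1}
```

Only partition 3 changes leader here: a follower on broker 1 takes over, and producers and consumers carry on against the new leader.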
Partition Leadership / Replication (later builds of the same slide)
• The diagram gains a further set of replicas: Partition 2, Partition 1, Partition 3
• The last build no longer lists the original fourth group (Partition 1, Partition 2, Partition 3)
The log is a type of durable messaging system
Similar to a traditional messaging system (ActiveMQ, Rabbit, etc.) but with:
• Far better scalability
• Built-in fault tolerance/HA
• Storage
Origins in Stream Processing
• Serving Layer (Microservices, Elastic, etc.)
• Java Apps with Kafka Streams or ksqlDB
• Continuous Computation
• High-Throughput Event Streaming Platform
• API-Based Clustering
Processing
Streaming is the toolset for working with events as they move!
What is stream processing?
• Diagram labels: auth attempts, possible fraud
What is stream processing?
• Diagram axes: User Population, Coding Sophistication
• Core developers who use Java/Scala
• Core developers who don't use Java/Scala
• Data engineers, architects, DevOps/SRE
• BI analysts
• Diagram label: streams
Standing on the Shoulders of Streaming Giants
• Producer, Consumer APIs
• Kafka Streams
• ksqlDB
• ksqlDB UDFs
• Diagram labels: Ease of use, Flexibility, Powered by
What is stream processing?
-- The windowed GROUP BY aggregation yields a table of counts per card_number.
CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;
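To make the windowing in the query above concrete, here is a small Python sketch of the same logic: count authorization attempts per card in five-minute tumbling windows and keep the combinations with more than three attempts. It runs once over a finished list, whereas ksqlDB evaluates the query continuously as events arrive. The `possible_fraud` function, the tuple layout, and the sample data are assumptions made up for this illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_MS = 5 * 60 * 1000  # tumbling window size: 5 minutes


def possible_fraud(attempts: Iterable[Tuple[int, str]], threshold: int = 3) -> Dict[Tuple[str, int], int]:
    """attempts: (timestamp_ms, card_number) pairs.

    Returns {(card_number, window_start_ms): count} where count > threshold.
    """
    counts: Counter = Counter()
    for timestamp_ms, card_number in attempts:
        # Tumbling windows do not overlap, so each event falls in exactly one.
        window_start = timestamp_ms - timestamp_ms % WINDOW_MS
        counts[(card_number, window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


attempts = [
    (0, "4111"), (60_000, "4111"), (120_000, "4111"), (180_000, "4111"),
    (200_000, "5500"),
    (310_000, "4111"),  # falls into the next window
]
flagged = possible_fraud(attempts)
print(flagged)  # {('4111', 0): 4}
```

The fifth attempt on card 4111 is not added to the flagged count because it belongs to the window starting at 300,000 ms, which is the defining property of a tumbling window.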
Wrap Up
Learn Kafka. Start building with Apache Kafka at Confluent Developer: developer.confluent.io
Free eBooks (http://cnfl.io/book-bundle)
• Designing Event-Driven Systems, by Ben Stopford
• Kafka: The Definitive Guide, by Neha Narkhede, Gwen Shapira, Todd Palino
• Making Sense of Stream Processing, by Martin Kleppmann
• I ❤ Logs, by Jay Kreps
Thank You
@KeithResar, Kafka Developer, confluent.io