Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Kafka, the hard parts
Search
Chris Keathley
January 10, 2019
Programming
3
1.7k
Kafka, the hard parts
This talk tries to summarize a lot of the lessons I've learned building systems on kafka.
Chris Keathley
January 10, 2019
Tweet
Share
More Decks by Chris Keathley
See All by Chris Keathley
Solid code isn't flexible
keathley
5
1k
Building Adaptive Systems
keathley
43
2.6k
Contracts for building reliable systems
keathley
6
890
Building Resilient Elixir Systems
keathley
7
2.2k
Consistent, Distributed Elixir
keathley
6
1.5k
Telling stories with data visualization
keathley
1
620
Easing into continuous deployment
keathley
2
380
Leveling up your git skills
keathley
0
760
Generative Testing in Elixir
keathley
0
520
Other Decks in Programming
See All in Programming
dbt民主化とLLMによる開発ブースト ~ AI Readyな分析サイクルを目指して ~
yoshyum
2
240
Create a website using Spatial Web
akkeylab
0
310
Node-RED を(HTTP で)つなげる MCP サーバーを作ってみた
highu
0
120
エラーって何種類あるの?
kajitack
5
330
アンドパッドの Go 勉強会「 gopher 会」とその内容の紹介
andpad
0
290
deno-redisの紹介とJSRパッケージの運用について (toranoana.deno #21)
uki00a
0
170
『自分のデータだけ見せたい!』を叶える──Laravel × Casbin で複雑権限をスッキリ解きほぐす 25 分
akitotsukahara
1
600
Blazing Fast UI Development with Compose Hot Reload (droidcon New York 2025)
zsmb
1
270
0626 Findy Product Manager LT Night_高田スライド_speaker deck用
mana_takada
0
140
ペアプロ × 生成AI 現場での実践と課題について / generative-ai-in-pair-programming
codmoninc
0
230
すべてのコンテキストを、 ユーザー価値に変える
applism118
2
1.1k
AWS CDKの推しポイント 〜CloudFormationと比較してみた〜
akihisaikeda
3
320
Featured
See All Featured
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Optimising Largest Contentful Paint
csswizardry
37
3.3k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
34
5.9k
Stop Working from a Prison Cell
hatefulcrawdad
270
20k
The World Runs on Bad Software
bkeepers
PRO
69
11k
How to Think Like a Performance Engineer
csswizardry
24
1.7k
The Art of Programming - Codeland 2020
erikaheidi
54
13k
A Tale of Four Properties
chriscoyier
160
23k
The Illustrated Children's Guide to Kubernetes
chrisshort
48
50k
Speed Design
sergeychernyshev
32
1k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Transcript
Kafka The Hard Parts Chris Keathley / @ChrisKeathley / keathley.io
Kafka is great
Kafka is just a log
https://flic.kr/p/9aXr88
https://flic.kr/p/9aXr88 Kafka
Kafka https://flic.kr/p/9aXr88 (metaphor)
Log aggregation Analytics and activity tracking Queuing ETL Messaging Stream
Processing Kafka Uses
Event Sourcing
Log aggregation Analytics and activity tracking Queuing ETL Messaging Stream
Processing Kafka Uses
https://flic.kr/p/hrrbVx
https://flic.kr/p/hrrbVx (still a metaphor) Kafka
Large consequences for failure
Joke about mr. glass
Joke about mr. glass
Iteration Is Hard
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Topic
Topic
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Written to the File system
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Messages are ordered
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer Consumer
Topic
Topic Topic Topic Topic Broker
Broker Broker Broker
None
Replication Leader
Clients Java Client librdkafka
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Order is important User Events
Order is important User Events
Order is important Follow
Order is important Follow
Order is important Follow Message
Order is important Follow Message Unfollow
Order is important Follow Message Unfollow Causal
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow Consumer
Group records based on order
Partitioner to_int(hash(key)) % partitions
Partitioner to_int(hash(user_id)) % partitions
Follow Message Unfollow Grouping Consumers
Follow Message Unfollow Causal Grouping Consumers
Follow Message Unfollow Grouping Consumers Follow Processor Message Processor
Follow Message Unfollow Grouping Consumers Follow Processor Message Processor
Follow Message Unfollow Grouping Consumers Follow Processor Message Processor
Follow Message Unfollow Grouping Consumers User event processor
Follow Message Unfollow Grouping Consumers User event processor
Follow Message Unfollow Grouping Consumers User event processor
User Events Create pipelines User event processor Messages
User Events Create pipelines User event processor Messages Consumes
User Events Create pipelines User event processor Messages Consumes Produces
"Commander: Better Distributed Applications through CQRS and Event Sourcing" by
Bobby Calderwood https://youtu.be/B1-gS0oEtYc
The less dependence you can have between consumers the better
Random partitioning is best if you can avoid ordering
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Errors have the potential to wreck your day
Consumer Errors
Consumer Errors
Consumer Errors
Consumer Errors
Consumer Errors Blocking the head of the line
Consumer What should we do? Errors
Non-Blocking vs. Blocking
Non-Blocking vs. Blocking
Non-Blocking Errors Consumer 42 1337 “Robert’);drop table students;—”
Non-Blocking Errors Consumer 42 1337 “Robert’);drop table students;—” What do
we do?
Non-Blocking Errors Consumer
Non-Blocking Errors Consumer
Non-Blocking Errors Consumer Error Topic
Non-Blocking Errors Consumer
Non-Blocking Errors Consumer
Non-Blocking Errors Consumer
Non-Blocking vs. Blocking
Non-Blocking vs. Blocking
Blocking Errors Database Consumer
Blocking Errors Database Consumer Process messages Store Information
Blocking Errors Database Consumer
Blocking Errors Database Consumer
Blocking Errors Database Consumer What do we do?
Blocking Errors Database Consumer Retry
Blocking Errors Database Consumer Send alerts
Skip non-blocking errors & Retry blocking errors
Design errors out of existence
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Delivery Guarantees
Computer A Communication is hard Computer B What time is
it?
Computer A Communication is hard Computer B
Computer A Communication is hard Computer B Did you get
it?
Computer A Communication is hard Computer B How about now?
Computer A Communication is hard Computer B Now?
0 <= 1 <= n Delivery At least once At
most once Impossible-ish
Consumers should *ALWAYS* assume “At Least Once”
The Joys of Functional Programming
None
You
You Functional Programming
Immutability and Idempotence
Immutability: An immutable object is an object whose state cannot
be modified after it is created.
Idempotence: …the property of certain operations in mathematics and computer
science whereby they can be applied multiple times without changing the result beyond the initial application.
Idempotence: Execute the same operation more than once but only
see the effect once.
Idempotent Operations
Counting comments comment comment comment increment 1
Counting comments comment comment comment increment 1
Counting comments comment comment comment increment 2
Counting comments comment comment comment increment 2
Counting comments comment comment comment increment 3
Counting comments comment comment comment increment 3 Some Error
Counting comments comment comment comment increment 3
Counting comments comment comment comment increment 3
Counting comments comment comment comment increment 4
Counting comments comment comment comment increment 4
Counting comments comment comment comment increment 5
Counting comments comment comment comment increment 5
Counting comments comment comment comment increment 6
Kafka Record { data: {}, type: “comment.created”, }
Kafka Record { data: {}, type: “comment.created”, msg_id: UUIDv4 }
Kafka Record { data: {}, type: “comment.created”, msg_id: UUIDv4 }
Used for managing idempotence
Counting comments comment comment comment increment 1
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1, 2)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1, 2)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1, 2, 3)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1, 2, 3) Some Error
Counting comments comment comment comment id: 1 id: 2 id:
3 Set.add(id) (1, 2, 3)
Counting comments comment comment comment id: 1 id: 2 id:
3 Set.add(id) (1, 2, 3)
Counting comments comment comment comment id: 1 id: 2 id:
3 Set.add(id) (1, 2, 3)
Counting comments (1, 2, 3)
Counting comments cardinality(1, 2, 3)
Counting comments cardinality(1, 2, 3) => 3
Idempotent Side-Effects
smtp send_email Sending Emails email id: 1 email id: 2
email id: 3
smtp send_email Sending Emails email id: 1 email id: 2
email id: 3 What do we do if this fails?
smtp send_email Sending Emails email id: 1 email id: 2
email id: 3 Send at most once
smtp send_email Sending Emails email id: 1
Cache send_email Sending Emails email id: 1 smtp
Cache send_email Sending Emails email id: 1 smtp id?(1)
Cache send_email Sending Emails email id: 1 smtp id?(1) If
id exists then skip it
Cache send_email Sending Emails email id: 1 smtp
Cache send_email Sending Emails email id: 1 smtp add(1)
Cache send_email Sending Emails email id: 1 smtp
Cache send_email Sending Emails email id: 1 smtp
Cache send_email Sending Emails email id: 1 smtp
send_email Sending Emails email id: 1
send_email Sending Emails email id: 1 If we see this
message again move it to an audit topic
send_email Sending Emails If we see this message again move
it to an audit topic email id: 1
send_email Sending Emails
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
User Events Teams User event processor Messages Notifications Notifications Notification
Sender
User Events Teams User event processor Messages Notifications Notifications Notification
Sender Teams
Data is the language of the system
{ msg_id: "8700635f-1802-417e-89e7-595ad3600104", type: "comment.created", data: { user_id: 1234, msg:
"This is a super fun conference!" } } Data payloads
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads None of this tells you anything useful about your data
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads What do we do when these things change?
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads What do we do when these things change?
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads Lets just use versions!
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads Lets just use versions! (spoiler: this isn’t great)
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads
{ msg_id: String, type: String, data: { user_id: String, msg:
String }, meta: { version: 2 } } Data payloads
Data Versions Consumer v1 v1 v1 v1 v2
Data Versions Consumer v1 v1 v1 v1 v2 This consumer
needs to understand both versions
Data Versions Consumer v1 v1 v1 v1 v2 This team
needs to know to make these changes
Versioning is broken
(sem)Versioning is broken
Change Growth Breakage
Change Growth Breakage Never do this
Growing schemas should be the default
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads What are these?
Dependent Types
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads What are these?
Norm
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads
UUID = string? & re_matches?(/^[0-9A-F]{8}-[0-9A-F] {4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i) ) CommentCreated = schema{
req :msg_id, UUID req :type, lit(“comment.created”) req :data, schema { req :user_id, integer? | UUID req :msg, string? } } Data payloads
json = {type: “comment.created”, msg: “Hello world”} Norm.decode(CommentEvent, json) =>
{:ok, data} Norm.decode(CommentEvent, {}) => {:error, errors} Norm.explain(CommentEvent, {}) => "In :msg_id, val: {} fails spec: required In :type, val: {} fails spec: required In :data, val: {} fails spec: required" Data payloads
Norm is built for extensibility
CommentEvent = schema{ req :type, lit(“comment.created”) req :msg, string? }
json = { type: “comment.created”, msg: “Hello world”, data: { msg: “Hello world” } } Norm.decode(CommentEvent, json) => {:ok, data} Norm is extensible
CommentEvent = schema{ req :type, lit(“comment.created”) req :msg, string? }
json = { type: “comment.created”, msg: “Hello world”, data: { msg: “Hello world” } } Norm.decode(CommentEvent, json) => {:ok, data} Norm is extensible This will still get passed through
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Property Based Testing
Property based testing Database Consumer
Property based testing Database Consumer id: 1 id: 2 id:
3 id: 1
Property based testing Database Consumer id: 1 id: 2 id:
3 id: 1 Information should end up here
Property based testing Database Consumer id: 1 id: 2 id:
3 id: 1 Some combination of these messages causes a failure
Property based testing Database id: 1 id: 1 Consumer
Property based testing Database id: 1 id: 1 Looks like
we aren’t handling duplicates correctly Consumer
Property based testing Database id: 1 id: 1 Consumer
Property based testing Database Consumer id: 1 id: 1 Deterministically
fail this connection
Chaos Engineering
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Monitoring vs. Observability
Monitoring: Figuring out that there’s a problem
Observability: Determining what the problem is.
Goal: Detect lagging or blocked consumers
Wisen
Wisen User Events User Consumer
metadata topic Wisen User Events Checkpoints its position in the
log to an offset topic User Consumer
Wisen metadata topic Wisen User Consumer User Events
Wisen metadata topic Wisen User Consumer User Events Compares farthest
offset from checkpoints over a time-window
Wisen user_consumer_errors Wisen User Consumer User Events
Wisen user_consumer_errors Wisen User Consumer User Events
Wisen user_consumer_errors Wisen User Consumer User Events Alert if we
see a rise in errors
Other useful metrics: Median and Tail latencies Internal buffers DB/Cache/RPC
latencies
OpenTracing
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Monitoring Capacity Planning #hottakes
This has to be done up-front
Calculating partions messages in the system = arrival rate *
mean time in system
Calculating partions Desired throughput / measured throughput on one partition
=> partitions needed
Calculating partions partitions < 100 x brokers x replication factor
source: https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster
Increasing partitions is tricky if you rely on ordering
to_int(hash(user_id)) % partitions
to_int(hash(user_id)) % partitions Existing data is not reshuffled if partitions
are increased
Data is not forever.
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Monitoring Capacity Planning #hottakes
CQRS & Event Sourcing
Don’t rush to democratize your data
Embrace data and design
Go forth and build awesome stuff!
Thanks Chris Keathley / @ChrisKeathley / keathley.io