Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Kafka, the hard parts
Search
Chris Keathley
January 10, 2019
Programming
2
1.5k
Kafka, the hard parts
This talk tries to summarize a lot of the lessons I've learned building systems on kafka.
Chris Keathley
January 10, 2019
Tweet
Share
More Decks by Chris Keathley
See All by Chris Keathley
Solid code isn't flexible
keathley
4
980
Building Adaptive Systems
keathley
38
2.3k
Contracts for building reliable systems
keathley
5
750
Building Resilient Elixir Systems
keathley
6
2k
Consistent, Distributed Elixir
keathley
5
1.5k
Telling stories with data visualization
keathley
0
560
Easing into continuous deployment
keathley
1
310
Leveling up your git skills
keathley
0
680
Generative Testing in Elixir
keathley
0
450
Other Decks in Programming
See All in Programming
OSSで起業してもうすぐ10年 / Open Source Conference 2024 Shimane
furukawayasuto
0
100
Webの技術スタックで マルチプラットフォームアプリ開発を可能にするElixirDesktopの紹介
thehaigo
2
1k
Less waste, more joy, and a lot more green: How Quarkus makes Java better
hollycummins
0
100
Amazon Bedrock Agentsを用いてアプリ開発してみた!
har1101
0
330
初めてDefinitelyTypedにPRを出した話
syumai
0
400
Enabling DevOps and Team Topologies Through Architecture: Architecting for Fast Flow
cer
PRO
0
310
Ethereum_.pdf
nekomatu
0
460
詳細解説! ArrayListの仕組みと実装
yujisoftware
0
580
AWS Lambdaから始まった Serverlessの「熱」とキャリアパス / It started with AWS Lambda Serverless “fever” and career path
seike460
PRO
1
250
Streams APIとTCPフロー制御 / Web Streams API and TCP flow control
tasshi
2
350
とにかくAWS GameDay!AWSは世界の共通言語! / Anyway, AWS GameDay! AWS is the world's lingua franca!
seike460
PRO
1
860
What’s New in Compose Multiplatform - A Live Tour (droidcon London 2024)
zsmb
1
470
Featured
See All Featured
Dealing with People You Can't Stand - Big Design 2015
cassininazir
364
24k
Docker and Python
trallard
40
3.1k
Designing on Purpose - Digital PM Summit 2013
jponch
115
7k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.3k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
26
1.4k
A Modern Web Designer's Workflow
chriscoyier
693
190k
YesSQL, Process and Tooling at Scale
rocio
169
14k
The Cost Of JavaScript in 2023
addyosmani
45
6.7k
Optimising Largest Contentful Paint
csswizardry
33
2.9k
Unsuck your backbone
ammeep
668
57k
How to Think Like a Performance Engineer
csswizardry
20
1.1k
Java REST API Framework Comparison - PWX 2021
mraible
PRO
28
8.2k
Transcript
Kafka The Hard Parts Chris Keathley / @ChrisKeathley / keathley.io
Kafka is great
Kafka is just a log
https://flic.kr/p/9aXr88
https://flic.kr/p/9aXr88 Kafka
Kafka https://flic.kr/p/9aXr88 (metaphor)
Log aggregation Analytics and activity tracking Queuing ETL Messaging Stream
Processing Kafka Uses
Event Sourcing
Log aggregation Analytics and activity tracking Queuing ETL Messaging Stream
Processing Kafka Uses
https://flic.kr/p/hrrbVx
https://flic.kr/p/hrrbVx (still a metaphor) Kafka
Large consequences for failure
Joke about mr. glass
Joke about mr. glass
Iteration Is Hard
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Topic
Topic
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Written to the File system
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Messages are ordered
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer Consumer
Topic
Topic Topic Topic Topic Broker
Broker Broker Broker
None
Replication Leader
Clients Java Client librdkafka
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Order is important User Events
Order is important User Events
Order is important Follow
Order is important Follow
Order is important Follow Message
Order is important Follow Message Unfollow
Order is important Follow Message Unfollow Causal
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow Consumer
Group records based on order
Partitioner to_int(hash(key)) % partitions
Partitioner to_int(hash(user_id)) % partitions
Follow Message Unfollow Grouping Consumers
Follow Message Unfollow Causal Grouping Consumers
Follow Message Unfollow Grouping Consumers Follow Processor Message Processor
Follow Message Unfollow Grouping Consumers Follow Processor Message Processor
Follow Message Unfollow Grouping Consumers Follow Processor Message Processor
Follow Message Unfollow Grouping Consumers User event processor
Follow Message Unfollow Grouping Consumers User event processor
Follow Message Unfollow Grouping Consumers User event processor
User Events Create pipelines User event processor Messages
User Events Create pipelines User event processor Messages Consumes
User Events Create pipelines User event processor Messages Consumes Produces
"Commander: Better Distributed Applications through CQRS and Event Sourcing" by
Bobby Calderwood https://youtu.be/B1-gS0oEtYc
The less dependence you can have between consumers the better
Random partitioning is best if you can avoid ordering
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Errors have the potential to wreck your day
Consumer Errors
Consumer Errors
Consumer Errors
Consumer Errors
Consumer Errors Blocking the head of the line
Consumer What should we do? Errors
Non-Blocking vs. Blocking
Non-Blocking vs. Blocking
Non-Blocking Errors Consumer 42 1337 “Robert’);drop table students;—”
Non-Blocking Errors Consumer 42 1337 “Robert’);drop table students;—” What do
we do?
Non-Blocking Errors Consumer
Non-Blocking Errors Consumer
Non-Blocking Errors Consumer Error Topic
Non-Blocking Errors Consumer
Non-Blocking Errors Consumer
Non-Blocking Errors Consumer
Non-Blocking vs. Blocking
Non-Blocking vs. Blocking
Blocking Errors Database Consumer
Blocking Errors Database Consumer Process messages Store Information
Blocking Errors Database Consumer
Blocking Errors Database Consumer
Blocking Errors Database Consumer What do we do?
Blocking Errors Database Consumer Retry
Blocking Errors Database Consumer Send alerts
Skip non-blocking errors & Retry blocking errors
Design errors out of existence
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Delivery Guarantees
Computer A Communication is hard Computer B What time is
it?
Computer A Communication is hard Computer B
Computer A Communication is hard Computer B Did you get
it?
Computer A Communication is hard Computer B How about now?
Computer A Communication is hard Computer B Now?
0 <= 1 <= n Delivery At least once At
most once Impossible-ish
Consumers should *ALWAYS* assume “At Least Once”
The Joys of Functional Programming
None
You
You Functional Programming
Immutability and Idempotence
Immutability: An immutable object is an object whose state cannot
be modified after it is created.
Idempotence: …the property of certain operations in mathematics and computer
science whereby they can be applied multiple times without changing the result beyond the initial application.
Idempotence: Execute the same operation more than once but only
see the effect once.
Idempotent Operations
Counting comments comment comment comment increment 1
Counting comments comment comment comment increment 1
Counting comments comment comment comment increment 2
Counting comments comment comment comment increment 2
Counting comments comment comment comment increment 3
Counting comments comment comment comment increment 3 Some Error
Counting comments comment comment comment increment 3
Counting comments comment comment comment increment 3
Counting comments comment comment comment increment 4
Counting comments comment comment comment increment 4
Counting comments comment comment comment increment 5
Counting comments comment comment comment increment 5
Counting comments comment comment comment increment 6
Kafka Record { data: {}, type: “comment.created”, }
Kafka Record { data: {}, type: “comment.created”, msg_id: UUIDv4 }
Kafka Record { data: {}, type: “comment.created”, msg_id: UUIDv4 }
Used for managing idempotence
Counting comments comment comment comment increment 1
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1, 2)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1, 2)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1, 2, 3)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1, 2, 3) Some Error
Counting comments comment comment comment id: 1 id: 2 id:
3 Set.add(id) (1, 2, 3)
Counting comments comment comment comment id: 1 id: 2 id:
3 Set.add(id) (1, 2, 3)
Counting comments comment comment comment id: 1 id: 2 id:
3 Set.add(id) (1, 2, 3)
Counting comments (1, 2, 3)
Counting comments cardinality(1, 2, 3)
Counting comments cardinality(1, 2, 3) => 3
Idempotent Side-Effects
smtp send_email Sending Emails email id: 1 email id: 2
email id: 3
smtp send_email Sending Emails email id: 1 email id: 2
email id: 3 What do we do if this fails?
smtp send_email Sending Emails email id: 1 email id: 2
email id: 3 Send at most once
smtp send_email Sending Emails email id: 1
Cache send_email Sending Emails email id: 1 smtp
Cache send_email Sending Emails email id: 1 smtp id?(1)
Cache send_email Sending Emails email id: 1 smtp id?(1) If
id exists then skip it
Cache send_email Sending Emails email id: 1 smtp
Cache send_email Sending Emails email id: 1 smtp add(1)
Cache send_email Sending Emails email id: 1 smtp
Cache send_email Sending Emails email id: 1 smtp
Cache send_email Sending Emails email id: 1 smtp
send_email Sending Emails email id: 1
send_email Sending Emails email id: 1 If we see this
message again move it to an audit topic
send_email Sending Emails If we see this message again move
it to an audit topic email id: 1
send_email Sending Emails
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
User Events Teams User event processor Messages Notifications Notifications Notification
Sender
User Events Teams User event processor Messages Notifications Notifications Notification
Sender Teams
Data is the language of the system
{ msg_id: "8700635f-1802-417e-89e7-595ad3600104", type: "comment.created", data: { user_id: 1234, msg:
"This is a super fun conference!" } } Data payloads
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads None of this tells you anything useful about your data
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads What do we do when these things change?
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads What do we do when these things change?
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads Lets just use versions!
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads Lets just use versions! (spoiler: this isn’t great)
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads
{ msg_id: String, type: String, data: { user_id: String, msg:
String }, meta: { version: 2 } } Data payloads
Data Versions Consumer v1 v1 v1 v1 v2
Data Versions Consumer v1 v1 v1 v1 v2 This consumer
needs to understand both versions
Data Versions Consumer v1 v1 v1 v1 v2 This team
needs to know to make these changes
Versioning is broken
(sem)Versioning is broken
Change Growth Breakage
Change Growth Breakage Never do this
Growing schemas should be the default
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads What are these?
Dependent Types
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads What are these?
Norm
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads
UUID = string? & re_matches?(/^[0-9A-F]{8}-[0-9A-F] {4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i) ) CommentCreated = schema{
req :msg_id, UUID req :type, lit(“comment.created”) req :data, schema { req :user_id, integer? | UUID req :msg, string? } } Data payloads
json = {type: “comment.created”, msg: “Hello world”} Norm.decode(CommentEvent, json) =>
{:ok, data} Norm.decode(CommentEvent, {}) => {:error, errors} Norm.explain(CommentEvent, {}) => "In :msg_id, val: {} fails spec: required In :type, val: {} fails spec: required In :data, val: {} fails spec: required" Data payloads
Norm is built for extensibility
CommentEvent = schema{ req :type, lit(“comment.created”) req :msg, string? }
json = { type: “comment.created”, msg: “Hello world”, data: { msg: “Hello world” } } Norm.decode(CommentEvent, json) => {:ok, data} Norm is extensible
CommentEvent = schema{ req :type, lit(“comment.created”) req :msg, string? }
json = { type: “comment.created”, msg: “Hello world”, data: { msg: “Hello world” } } Norm.decode(CommentEvent, json) => {:ok, data} Norm is extensible This will still get passed through
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Property Based Testing
Property based testing Database Consumer
Property based testing Database Consumer id: 1 id: 2 id:
3 id: 1
Property based testing Database Consumer id: 1 id: 2 id:
3 id: 1 Information should end up here
Property based testing Database Consumer id: 1 id: 2 id:
3 id: 1 Some combination of these messages causes a failure
Property based testing Database id: 1 id: 1 Consumer
Property based testing Database id: 1 id: 1 Looks like
we aren’t handling duplicates correctly Consumer
Property based testing Database id: 1 id: 1 Consumer
Property based testing Database Consumer id: 1 id: 1 Deterministically
fail this connection
Chaos Engineering
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Monitoring vs. Observability
Monitoring: Figuring out that there’s a problem
Observability: Determining what the problem is.
Goal: Detect lagging or blocked consumers
Wisen
Wisen User Events User Consumer
metadata topic Wisen User Events Checkpoints its position in the
log to an offset topic User Consumer
Wisen metadata topic Wisen User Consumer User Events
Wisen metadata topic Wisen User Consumer User Events Compares farthest
offset from checkpoints over a time-window
Wisen user_consumer_errors Wisen User Consumer User Events
Wisen user_consumer_errors Wisen User Consumer User Events
Wisen user_consumer_errors Wisen User Consumer User Events Alert if we
see a rise in errors
Other useful metrics: Median and Tail latencies Internal buffers DB/Cache/RPC
latencies
OpenTracing
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Monitoring Capacity Planning #hottakes
This has to be done up-front
Calculating partions messages in the system = arrival rate *
mean time in system
Calculating partions Desired throughput / measured throughput on one partition
=> partitions needed
Calculating partions partitions < 100 x brokers x replication factor
source: https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster
Increasing partitions is tricky if you rely on ordering
to_int(hash(user_id)) % partitions
to_int(hash(user_id)) % partitions Existing data is not reshuffled if partitions
are increased
Data is not forever.
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Monitoring Capacity Planning #hottakes
CQRS & Event Sourcing
Don’t rush to democratize your data
Embrace data and design
Go forth and build awesome stuff!
Thanks Chris Keathley / @ChrisKeathley / keathley.io