Apache Kafka: Turning your architecture inside out
Riviera DEV, 18 May 2018
Tom Bentley
Slide 2
Kafka is...
● … a distributed, fault-tolerant commit log
● … a horizontally scalable publish/subscribe message broker
● … a message streaming platform
[Diagram: three brokers, one producer, two consumers]
Slide 3
Records (aka Messages)
● Optional Key
○ Kafka only sees the key as uninterpreted bytes
● Value
● (Headers, since v0.11.0.0)
● Kafka doesn’t care about the content of value or headers
● Timestamp
Conceptually:
Record
key(): K
value(): V
headers(): Map
timestamp(): Long
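The conceptual record shape can be sketched as a small Python dataclass. This is an illustration only, not a Kafka client class: the field defaults and the example values are assumptions, and the timestamp is taken to be milliseconds since the epoch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    """Illustrative record shape; Kafka treats key, value and header values as opaque bytes."""
    value: bytes
    key: Optional[bytes] = None                       # the key is optional
    headers: Dict[str, bytes] = field(default_factory=dict)
    timestamp: int = 0                                # assumed: milliseconds since the epoch


# Example values are made up for illustration.
record = Record(key=b"alice", value=b'{"score": 45}', timestamp=1526630400000)
keyless = Record(value=b"no key here")
```

The broker never parses `value` or `headers`; any structure (JSON here) is a convention between producer and consumer.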
Slide 4
Topics
● Logical grouping of records identified by a name
○ In reality the partitions are not stored together
● Sharded into n partitions, 0...n-1
[Diagram: topic customers.new with Partition 0 and Partition 1, each holding records]
Slide 5
Producers
● Producers publish new records to one or more topics
● Producer decides which partition a record belongs in
○ Semantic partitioning
○ Else, if there is a key: hash(key) mod #partitions
○ Otherwise: round-robin
[Diagram: a producer sending records]
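The three-way partition choice above can be sketched as follows. The function name `choose_partition` is hypothetical, and `zlib.crc32` is only a stand-in hash chosen for this sketch; it is not the hash the Kafka client uses.

```python
import itertools
import zlib
from typing import Optional

_round_robin = itertools.count()  # shared counter for records without a key


def choose_partition(num_partitions: int,
                     key: Optional[bytes] = None,
                     explicit: Optional[int] = None) -> int:
    """Pick a partition: explicit choice, else hash of key, else round-robin."""
    if explicit is not None:
        return explicit                               # semantic partitioning
    if key is not None:
        return zlib.crc32(key) % num_partitions       # hash(key) mod #partitions
    return next(_round_robin) % num_partitions        # no key: rotate


same_key_a = choose_partition(3, key=b"alice")
same_key_b = choose_partition(3, key=b"alice")        # same key, same partition
```

Because the hash is taken modulo the partition count, changing the number of partitions changes where existing keys land.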
Slide 6
Brokers
[Diagram: a record (T: foo, P: 1, M: blah) sent to a broker holding T: foo P: 1 and T: bar P: 2]
● Producer sends record to the leader broker for the partition
⇒ So records in different partitions get sent to different brokers
● Broker keeps an append-only log of records for each partition
● Once appended, records are identified by their offset within the partition
● Records retained according to a policy:
○ deleted: according to a size- or time-based threshold
○ compacted: until a new message with the same key arrives
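A toy in-memory model of one partition's log, showing offsets and the two retention policies. The class and method names are hypothetical, the "delete" policy is simplified to a record-count threshold, and keeping keyless records during compaction is an assumption of this sketch.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, Optional[bytes], bytes]  # (offset, key, value)


class PartitionLog:
    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.next_offset = 0

    def append(self, key: Optional[bytes], value: bytes) -> int:
        """Append a record and return the offset it was assigned."""
        offset = self.next_offset
        self.entries.append((offset, key, value))
        self.next_offset += 1
        return offset

    def delete_oldest(self, max_records: int) -> None:
        """'delete' policy, simplified: drop the oldest records beyond a count."""
        excess = len(self.entries) - max_records
        if excess > 0:
            del self.entries[:excess]

    def compact(self) -> None:
        """'compact' policy: keep only the newest record for each key."""
        latest: Dict[bytes, int] = {}
        for offset, key, _ in self.entries:
            if key is not None:
                latest[key] = offset
        self.entries = [e for e in self.entries
                        if e[1] is None or latest[e[1]] == e[0]]


log = PartitionLog()
offsets = [log.append(b"a", b"1"), log.append(b"b", b"2"), log.append(b"a", b"3")]
log.compact()  # the older record for key "a" is dropped; offsets are unchanged
```

Surviving records keep their original offsets, so consumers can still address them after retention has run.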
Slide 7
Consumers
[Diagram: a consumer fetching a record (M: blah) for T: foo, P: 1 from a broker]
● Consumers fetch records from the leader for a partition
⇒ Consuming all partitions in a topic means connections to many brokers
● Consumers address records they’re reading by the offset
○ Can re-read by seeking to a previous offset
○ Can skip messages by seeking to a later offset
● Message order is preserved within a given partition
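Offset-based reading can be sketched with a toy reader over a single partition. `PartitionReader` is a hypothetical class for illustration, and the Python list stands in for the broker's log, with the list index playing the role of the offset.

```python
from typing import List


class PartitionReader:
    """Toy consumer for one partition; `log` stands in for the broker's log."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.position = 0                 # offset of the next record to read

    def poll(self, max_records: int = 1) -> List[str]:
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        self.position = offset


reader = PartitionReader(["r0", "r1", "r2", "r3"])
first = reader.poll(2)     # reads offsets 0 and 1, in order
reader.seek(0)
again = reader.poll(1)     # re-reads offset 0
reader.seek(3)
last = reader.poll(1)      # skips offset 2 and reads offset 3
```

The position belongs to the reader, not the log, which is why re-reading and skipping do not affect other consumers.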
Slide 8
Partitions ⇒ Scalability
Key insight:
If each partition is stored on a different broker then the load for producing and consuming the topic is spread across those brokers
● So throughput can be scaled up by adding partitions and/or brokers
Slide 9
Consumer groups
● Consumers can be different processes on different machines
● Consumers in the same consumer group discover each other via the Kafka protocol
● A group leader is elected
● Leader assigns partitions to consumers
● Membership changes ⇒ reassignment
● Leader dies ⇒ another election
● Makes it very easy to scale up consumption
[Diagram: T: bar with P: 0, P: 1 and P: 2; two consumers in consumer group xyz (cg: xyz)]
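What the group leader does when it assigns partitions can be sketched as a round-robin split. This is one possible strategy under simple assumptions, and the function name and member ids are hypothetical.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Deal partitions out to group members one at a time, in sorted order."""
    if not members:
        return {}
    ordered = sorted(members)
    assignment: Dict[str, List[int]] = {m: [] for m in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


two_members = assign_round_robin([0, 1, 2], ["c1", "c2"])
after_leave = assign_round_robin([0, 1, 2], ["c1"])   # membership change => reassignment
```

Re-running the function whenever membership changes models the reassignment step; each partition always has exactly one owner within the group.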
Slide 10
Replicas ⇒ Fault tolerance
● Partitions replicated on other brokers
● Follower broker for a replica fetches from leader broker
● If leader crashes, one of the followers is elected new leader
○ Producers, consumers and other followers produce/fetch from the new leader
● When old leader restarts it will be a follower
[Diagram: Broker 1 and Broker 2 each hold T: foo P: 1 and T: bar P: 2; replicas are marked as leaders or followers]
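Leader failover for a single partition can be sketched as below. The class is hypothetical, and picking the first live replica in assignment order is an assumption of this sketch, not a description of Kafka's actual election rules.

```python
from typing import Iterable, List, Set


class ReplicatedPartition:
    def __init__(self, replicas: Iterable[int]) -> None:
        self.replicas: List[int] = list(replicas)   # broker ids holding a copy
        self.leader: int = self.replicas[0]

    def handle_failure(self, failed: int, alive: Set[int]) -> None:
        """If the leader failed, promote one of the live followers."""
        if failed != self.leader:
            return
        candidates = [b for b in self.replicas if b in alive]
        if not candidates:
            raise RuntimeError("no live replica left for this partition")
        self.leader = candidates[0]

    def followers(self, alive: Set[int]) -> List[int]:
        return [b for b in self.replicas if b in alive and b != self.leader]


partition = ReplicatedPartition([1, 2])
partition.handle_failure(1, alive={2})               # broker 1 crashes
leader_after_crash = partition.leader
followers_after_restart = partition.followers({1, 2})  # broker 1 returns as a follower
```

Leadership is sticky in this model: a restarted broker rejoins as a follower and does not take the leadership back.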
Slide 11
Replicas ⇒ Fault tolerance
[Diagram: Broker 2 fetches P: 1; Broker 1 fetches P: 2]
Slide 13
Performance
● Partitioning and batching are prominent features
● Gain scalability by making clients aware of cluster topology
● Clients need to talk to leader broker ⇒ must be able to talk to all brokers
● Clients know identity of brokers
● Can’t hide brokers behind a load balancer
Slide 14
Balancing
● Some partitions cause a lot more load than others
● We want to avoid having any saturated brokers
● ⇒ Need to spread the hot partitions around
● Reassigning partitions between brokers can be slow
● Constrained optimization problem (Bin packing)
● Automated solutions
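The bin-packing view of balancing can be sketched with a greedy heuristic: place the hottest partitions first, each on the currently least-loaded broker. The function name and the load figures are made up for illustration, and the sketch ignores replicas and the cost of moving data.

```python
import heapq
from typing import Dict, List, Tuple


def spread_partitions(loads: Dict[str, float], brokers: List[int]) -> Dict[int, List[str]]:
    """Greedy placement: heaviest partition first, onto the least-loaded broker."""
    heap: List[Tuple[float, int]] = [(0.0, b) for b in brokers]
    heapq.heapify(heap)
    placement: Dict[int, List[str]] = {b: [] for b in brokers}
    for partition, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, broker = heapq.heappop(heap)           # least-loaded broker so far
        placement[broker].append(partition)
        heapq.heappush(heap, (total + load, broker))
    return placement


# Hypothetical per-partition load figures (arbitrary units).
loads = {"foo-0": 9, "foo-1": 5, "bar-0": 4, "bar-1": 1}
placement = spread_partitions(loads, brokers=[1, 2])
```

A greedy pass like this gives a reasonable spread but not necessarily the optimal one, which is why the problem is framed as constrained optimization.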
Slide 15
DEMO: Producing
Slide 16
Warning: Detour next 2 slides
Slide 17
Core Kafka & Microservices
● History included – free audit log
● Loosely coupled – sender needs no knowledge of receiver(s)
● Availability – Sender doesn’t require receiver to be available
● Immutable log ⇒ less need to encapsulate access to the data
○ emphasis is more on sharing the data
○ the data is more important than the API used to access it
Slide 18
Events & Tables
A table is a snapshot of a stream
Stream (over time): Alice 45, Bob 12, Carol 23, Alice 32, Carol 19
Snapshot! Table: Alice 32, Bob 12, Carol 19

A stream is a changelog of a table
Table: Alice 45, Bob 12, Carol 23
Updates: Update Alice set score=32; Update Carol set score=19
Stream: Alice 45, Bob 12, Carol 23, Alice 32, Carol 19
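The stream/table duality can be worked through with the slide's own numbers. The two helper functions are hypothetical names for this sketch: `snapshot` folds a stream into a table, and `changelog` turns a table plus its updates back into a stream.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int]  # (name, score)


def snapshot(stream: Iterable[Event]) -> Dict[str, int]:
    """A table is a snapshot of a stream: the latest value per key wins."""
    table: Dict[str, int] = {}
    for name, score in stream:
        table[name] = score
    return table


def changelog(initial: Dict[str, int], updates: Iterable[Event]) -> List[Event]:
    """A stream is a changelog of a table: initial rows followed by each update."""
    return list(initial.items()) + list(updates)


stream = [("Alice", 45), ("Bob", 12), ("Carol", 23), ("Alice", 32), ("Carol", 19)]
table = snapshot(stream)
rebuilt = changelog({"Alice": 45, "Bob": 12, "Carol": 23},
                    [("Alice", 32), ("Carol", 19)])
```

Replaying `rebuilt` through `snapshot` gives the same table again, which is the round trip the two statements describe.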
Slide 19
Kafka Streams
● Typical Kafka microservices share a lot of common code
● Kafka Streams is a framework for writing applications
● Just a jar file, runs in your application
● Leverages consumer group scaling, so it’s easy to scale your application horizontally
● Presents a higher-level API using “Streams” rather than (low-level) Topics
● Perform operations on whole streams rather than individual records
○ E.g. filter, map
● Applications are written by composing such operations
● The composition graph is called the “processor topology”
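Composing whole-stream operations can be sketched in plain Python. This is not the Kafka Streams API (which is a Java library); `filter_op`, `map_op` and `compose` are hypothetical helpers, and the composed function plays the role of a very small linear topology.

```python
from typing import Callable, Iterable, Iterator, Tuple

KV = Tuple[str, int]
Operation = Callable[[Iterable[KV]], Iterator[KV]]


def filter_op(predicate: Callable[[KV], bool]) -> Operation:
    return lambda stream: (kv for kv in stream if predicate(kv))


def map_op(fn: Callable[[KV], KV]) -> Operation:
    return lambda stream: (fn(kv) for kv in stream)


def compose(*ops: Operation) -> Operation:
    """Chain operations so the output stream of one feeds the next."""
    def topology(stream: Iterable[KV]) -> Iterator[KV]:
        for op in ops:
            stream = op(stream)
        return iter(stream)
    return topology


topology = compose(
    filter_op(lambda kv: kv[1] >= 20),                # keep scores of 20 or more
    map_op(lambda kv: (kv[0].upper(), kv[1])),        # upper-case the key
)
result = list(topology([("alice", 45), ("bob", 12), ("carol", 23)]))
```

The application is the composition itself; no operation needs to know about individual records outside its own step.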
Slide 20
Processor topology
● Processors form a directed graph
● Processors as nodes
● Incoming edges are the operand streams
● Outgoing edges are the result stream(s)
● Source processors create a stream from a Kafka topic or other source
● Sink processors are the output of the Streams application and write to a Kafka topic
[Diagram: processor topology graph with source processors and a sink processor labelled]
Slide 21
Streams and Tables
● Kafka Streams has tables too!
● Stateless operations result in Streams
● Stateful operations can result in tables
○ Aggregation, Join, Windowing
● Can always turn a Table back into a Stream
● Tables can be interactively queried
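A stateful operation can be sketched as a running count per key. The plain dict is a stand-in for a state store, the generator output is the table turned back into a stream of updates, and all names here are hypothetical rather than Kafka Streams API.

```python
from typing import Dict, Iterable, Iterator, Tuple


def count_updates(stream: Iterable[Tuple[str, int]],
                  table: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Maintain a count per key in `table` and emit every change as an event."""
    for key, _ in stream:
        table[key] = table.get(key, 0) + 1
        yield key, table[key]


events = [("Alice", 45), ("Bob", 12), ("Carol", 23), ("Alice", 32), ("Carol", 19)]
counts: Dict[str, int] = {}
updates = list(count_updates(events, counts))
alice_count = counts["Alice"]          # interactive query against the table
```

The table holds the current answer, while the update stream records how it got there.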
Slide 22
Stream Processors (low-level API)
● Low-level API corresponds to a very generic, possibly stateful, processor within the topology
● Writing your own processor ⇒ creating a custom operator in the high-level API (DSL)
● Kafka Streams uses in-memory and RocksDB state stores to implement the higher-level operations
● Custom state stores are also possible
Slide 23
Streams DSL
● Operations on streams:
○ Stateless: Filter, Map, GroupBy, etc.
○ Stateful: Aggregation, Join, etc.
● Applications can compose operations to perform computation
Diagram credit: Kafka docs
Slide 24
“Traditional” Microservices
● Synchronous microservices
● OrderService orchestrates processing of an order: reserveStock(), then takePayment(), then dispatchOrder()
● Exceptional flows for things like payment failure ⇒ unreserveStock()
[Sequence diagram: OrderService, StockService, PaymentService, DispatchService; calls createOrder(), reserveStock(), takePayment(), dispatchOrder()]
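The synchronous orchestration, including the compensating step on payment failure, can be sketched as below. The `PaymentFailed` exception, the `calls` trace list and the stubbed payment functions are hypothetical, introduced only to make the flow observable.

```python
from typing import Callable, List


class PaymentFailed(Exception):
    """Raised by the (hypothetical) payment step when a payment is declined."""


def create_order(take_payment: Callable[[], None], calls: List[str]) -> bool:
    """OrderService drives every step itself and undoes work on failure."""
    calls.append("reserveStock")
    try:
        take_payment()
        calls.append("takePayment")
    except PaymentFailed:
        calls.append("unreserveStock")    # exceptional flow: release the stock
        return False
    calls.append("dispatchOrder")
    return True


def accepted() -> None:
    return None


def declined() -> None:
    raise PaymentFailed()


ok_calls: List[str] = []
ok = create_order(accepted, ok_calls)
failed_calls: List[str] = []
failed = create_order(declined, failed_calls)
```

All knowledge of the workflow, including every compensation path, lives in the orchestrator.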
Slide 25
Microservices: Orchestration
● Asynchronous microservices
● OrderService can still orchestrate
● Needs to watch for the replies explicitly
● OrderService will sit idle waiting for those replies
[Sequence diagram: OrderService, StockService, PaymentService, DispatchService; createOrder() followed by messages reserve.stock, stock.reservation, take.payment, payment.result, dispatch.order]
Slide 26
Microservices: Choreography
● Asynchronous alternative: Choreography
● Services listen for specific triggering events and take action
● Ordering imposed by the event types
● Multiple components can respond to the same event, e.g. StockService could respond to a paymentFailure by restoring reserved stock
[Sequence diagram: OrderService, StockService, PaymentService, DispatchService; createOrder() followed by events order.created, stock.reserved, payment.result]
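Choreography can be sketched with an in-memory event bus standing in for Kafka topics. The event names come from the slide; the `EventBus` class, the `run_order` helper, the `ok` payload field and the recorded action strings are assumptions made for this sketch.

```python
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque, Dict, List, Tuple

Event = Tuple[str, Dict[str, object]]
Handler = Callable[[Dict[str, object]], List[Event]]   # returns follow-up events


class EventBus:
    """In-memory stand-in for Kafka topics: one handler list per event type."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict[str, object]) -> None:
        queue: Deque[Event] = deque([(event_type, payload)])
        while queue:
            current, data = queue.popleft()
            self.history.append(current)
            for handler in self.handlers[current]:
                queue.extend(handler(data))


def run_order(card_ok: bool) -> Tuple[List[str], List[str]]:
    bus = EventBus()
    actions: List[str] = []

    def dispatch_on_payment(data: Dict[str, object]) -> List[Event]:
        if data["ok"]:
            actions.append("dispatch order")
        return []

    def stock_on_payment(data: Dict[str, object]) -> List[Event]:
        if not data["ok"]:
            actions.append("restore stock")       # reacts to a failed payment
        return []

    # StockService reacts to order.created; PaymentService to stock.reserved.
    bus.subscribe("order.created", lambda d: [("stock.reserved", d)])
    bus.subscribe("stock.reserved", lambda d: [("payment.result", {**d, "ok": card_ok})])
    # Two services respond to the same payment.result event.
    bus.subscribe("payment.result", dispatch_on_payment)
    bus.subscribe("payment.result", stock_on_payment)

    bus.publish("order.created", {"order_id": 1})
    return bus.history, actions


success = run_order(card_ok=True)
failure = run_order(card_ok=False)
```

No service calls another directly; the ordering emerges from which event types each service listens to.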
Slide 27
DEMO: Streams
Slide 28
How is Kafka turning your architecture inside-out?