Records (aka Messages)
● Optional key
○ Kafka only sees the key as uninterpreted bytes
● Value
● Headers (since v0.11.0.0)
● Kafka doesn't care about the content of the value or headers
● Timestamp
Conceptually: Record { key(): K, value(): V, headers(): Map, timestamp(): Long }
Topics
● Logical grouping of records, identified by a name
○ In reality the partitions are not stored together
● Sharded into n partitions, numbered 0…n-1
[Diagram: topic customers.new with records distributed across Partition 0 and Partition 1]
Producers
● Producers publish new records to one or more topics
● The producer decides which partition a record belongs in:
○ Semantic partitioning, if the application supplies one
○ Else, if there is a key: hash(key) mod #partitions
○ Otherwise: round-robin
[Diagram: producer sending records to a topic's partitions]
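The partition choice above can be sketched in plain Python. This is a simplified model, not the real producer: Kafka's default partitioner hashes the serialized key bytes with murmur2, for which Python's built-in hash() stands in here.

```python
import itertools

class Partitioner:
    """Simplified model of the producer's default partition choice.
    (Real Kafka hashes the serialized key bytes with murmur2;
    Python's hash() stands in for it in this sketch.)"""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.count()

    def partition_for(self, key):
        if key is not None:
            # keyed record: same key always lands in the same partition
            return hash(key) % self.num_partitions
        # keyless record: spread evenly, round-robin
        return next(self._round_robin) % self.num_partitions

p = Partitioner(3)
assert p.partition_for("customer-42") == p.partition_for("customer-42")
assert [p.partition_for(None) for _ in range(4)] == [0, 1, 2, 0]
```

The keyed case is what gives per-key ordering: all records for "customer-42" go to one partition, and order within a partition is preserved.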
Brokers
● The producer sends a record to the leader broker for its partition
⇒ Records in different partitions get sent to different brokers
● A broker keeps an append-only log of records for each partition
● Once appended, a record is identified by its offset within the partition
● Records are retained according to a policy:
○ delete: removed once a size- or time-based threshold is exceeded
○ compact: retained until a newer record with the same key arrives
[Diagram: producer sending a record (T: foo, P: 1) to the broker hosting partitions foo-1 and bar-2]
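The append-only log and the compact retention policy can be modelled in a few lines (an illustrative sketch, not Kafka's on-disk log format):

```python
def append(log, key, value):
    """Append-only: each record gets the next offset in the partition."""
    log.append((len(log), key, value))

def compacted_view(log):
    """'compact' retention: only the latest record per key survives."""
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)
    return sorted(latest.values())

log = []
append(log, "alice", 45)
append(log, "bob", 12)
append(log, "alice", 32)   # supersedes the record at offset 0 under compaction

assert [r[0] for r in log] == [0, 1, 2]                       # offsets = append order
assert compacted_view(log) == [(1, "bob", 12), (2, "alice", 32)]
```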
Consumers
● Consumers fetch records from the leader for a partition
⇒ Consuming all partitions of a topic means connections to many brokers
● Consumers address the records they're reading by offset
○ Can re-read by seeking to a previous offset
○ Messages can be skipped
● Message order is preserved within a given partition
[Diagram: consumer fetching records for (T: foo, P: 1) from the broker]
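Offset-based consumption, including re-reading by seeking backwards and skipping forwards, can be sketched with a toy in-memory model (not the real consumer API):

```python
class PartitionConsumer:
    """Toy model of a consumer's position within one partition."""

    def __init__(self, records):
        self.records = records   # the partition's log, indexed by offset
        self.position = 0        # next offset to fetch

    def poll(self):
        if self.position >= len(self.records):
            return None          # caught up with the end of the log
        record = self.records[self.position]
        self.position += 1
        return record

    def seek(self, offset):
        """Jump backwards to re-read, or forwards to skip messages."""
        self.position = offset

c = PartitionConsumer(["a", "b", "c", "d"])
assert [c.poll(), c.poll()] == ["a", "b"]
c.seek(0)                        # re-read from the beginning
assert c.poll() == "a"
c.seek(3)                        # skip ahead
assert c.poll() == "d"
```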
Partitions ⇒ Scalability
● Key insight: if each partition is stored on a different broker, then the load for producing and consuming the topic is spread across those brokers
● So throughput can be scaled up by using a larger number of partitions and/or brokers
Consumer groups
● Consumers can be different processes on different machines
● Consumers in the same consumer group discover each other via the Kafka protocol
● A group leader is elected
● The leader assigns partitions to consumers
● Membership changes ⇒ reassignment
● Leader dies ⇒ another election
● Makes it very easy to scale up consumption
[Diagram: two consumers in group xyz sharing partitions bar-0, bar-1 and bar-2]
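The leader's assignment step can be approximated with a simple round-robin spread. Kafka actually ships several assignor strategies (e.g. range, round-robin, sticky); this sketch only illustrates the idea that assignment is a pure function of the current membership:

```python
def assign(partitions, consumers):
    """Round-robin assignment of partitions to group members."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

partitions = ["bar-0", "bar-1", "bar-2"]
assert assign(partitions, ["c1", "c2"]) == {"c1": ["bar-0", "bar-2"],
                                            "c2": ["bar-1"]}
# A membership change just means running the assignment again:
assert assign(partitions, ["c1", "c2", "c3"]) == {"c1": ["bar-0"],
                                                  "c2": ["bar-1"],
                                                  "c3": ["bar-2"]}
```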
Replicas ⇒ Fault tolerance
● Partitions are replicated on other brokers
● The follower broker for a replica fetches from the leader broker
● If the leader crashes, one of the followers is elected as the new leader
○ Producers, consumers and the other followers then produce to / fetch from the new leader
● When the old leader restarts it will be a follower
[Diagram: brokers 1 and 2 each hold replicas of foo-1 and bar-2; broker 1 leads foo-1 (broker 2 fetches it), broker 2 leads bar-2 (broker 1 fetches it)]
Performance
● Partitioning and batching are prominent features
● Scalability is gained by making clients aware of the cluster topology
● Clients need to talk to the leader broker ⇒ must be able to talk to all brokers
● Clients know the identity of the brokers
● So brokers can't be hidden behind a load balancer
Balancing
● Some partitions cause a lot more load than others
● We want to avoid having any saturated brokers
⇒ Need to spread the hot partitions around
● Reassigning partitions between brokers can be slow
● This is a constrained optimization problem (bin packing)
● Automated solutions exist
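A minimal sketch of the balancing idea: treat each broker as a bin and greedily place the hottest partitions on the least-loaded broker. Real balancing tools also weigh replica placement and the cost of moving data, which this toy heuristic ignores:

```python
def balance(partition_loads, num_brokers):
    """Greedy heuristic for the bin-packing view of balancing:
    place partitions, hottest first, on the least-loaded broker."""
    brokers = [{"load": 0, "partitions": []} for _ in range(num_brokers)]
    for name, load in sorted(partition_loads.items(),
                             key=lambda kv: kv[1], reverse=True):
        target = min(brokers, key=lambda b: b["load"])
        target["load"] += load
        target["partitions"].append(name)
    return brokers

result = balance({"foo-0": 90, "foo-1": 10, "bar-0": 40, "bar-1": 45}, 2)
assert sorted(b["load"] for b in result) == [90, 95]   # near-even split
```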
Core Kafka & Microservices
● History included – free audit log
● Loosely coupled – the sender needs no knowledge of the receiver(s)
● Availability – the sender doesn't require the receiver to be available
● Immutable log ⇒ less need to encapsulate access to the data
○ the emphasis is more on sharing the data
○ the data is more important than the API used to access it
Events & Tables
● A table is a snapshot of a stream
○ e.g. applying the updates "Update Alice set score=32" and "Update Carol set score=19" to the table (Alice 45, Bob 12, Carol 23) yields the snapshot (Alice 32, Bob 12, Carol 19)
● A stream is a changelog of a table
[Diagram: a stream of update events replayed over time, producing successive table snapshots]
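The duality on this slide can be shown directly by folding the changelog into the old snapshot, using the slide's scores (a sketch of the idea, not a Kafka API):

```python
def apply_changelog(table, changelog):
    """A table is a snapshot of a stream: fold the updates into the table."""
    table = dict(table)
    for key, value in changelog:
        table[key] = value          # last write per key wins
    return table

snapshot = {"Alice": 45, "Bob": 12, "Carol": 23}
changelog = [("Alice", 32), ("Carol", 19)]        # the stream of updates

assert apply_changelog(snapshot, changelog) == {"Alice": 32, "Bob": 12,
                                                "Carol": 19}
```

The other direction is just as direct: capturing every update as it happens *is* the changelog, so the stream can always rebuild the table.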
Kafka Streams
● Typical Kafka microservices share a lot of common code
● Kafka Streams is a framework for writing such applications
● Just a jar file: it runs inside your application
● Leverages consumer-group scaling, so it's easy to horizontally scale your application
● Presents a higher-level API using "Streams" rather than (low-level) topics
● Operations are performed on whole streams rather than individual records
○ e.g. filter, map
● Applications are written by composing such operations
● The composition graph is called the "processor topology"
Processor topology
● Processors form a directed graph
● Processors are the nodes
● Incoming edges are the operand streams
● Outgoing edges are the result stream(s)
● Source processors create a stream from a Kafka topic or other source
● Sink processors are the output of the Streams application and produce to a Kafka topic
[Diagram: source processors feeding intermediate processors, ending in a sink processor]
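A processor topology is just a composed pipeline of per-stream operations. This tiny pure-Python model (not the Kafka Streams API) shows a source → filter → map → sink chain, where each processor consumes its incoming stream and yields its result stream:

```python
def source(records):
    """Source processor: turn an input (here, a list) into a stream."""
    yield from records

def filter_op(stream, predicate):
    """Stream-level filter: drop records that fail the predicate."""
    return (r for r in stream if predicate(r))

def map_op(stream, fn):
    """Stream-level map: transform each record."""
    return (fn(r) for r in stream)

def sink(stream):
    """Sink processor: materialize the result (Kafka would produce a topic)."""
    return list(stream)

# Compose processors into a topology: source -> filter -> map -> sink
out = sink(map_op(filter_op(source([1, 2, 3, 4]),
                            lambda n: n % 2 == 0),
                  lambda n: n * 10))
assert out == [20, 40]
```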
Streams and Tables
● Kafka Streams has tables too!
● Stateless operations result in streams
● Stateful operations can result in tables
○ aggregation, join, windowing
● A table can always be turned back into a stream
● Tables can be interactively queried
Stream Processors (low-level API)
● The low-level API corresponds to a very generic, possibly stateful, processor within the topology
● Writing your own processor ⇒ creating a custom operator for the high-level API (DSL)
● Kafka Streams uses in-memory and RocksDB state stores to implement the higher-level operations
● Custom state stores are also possible
"Traditional" Microservices
● Synchronous microservices
● OrderService orchestrates the processing of an order: reserveStock(), then takePayment(), then dispatchOrder()
● Exceptional flows for things like payment failure ⇒ unreserveStock()
[Diagram: createOrder() triggers OrderService, which calls StockService.reserveStock(), PaymentService.takePayment() and DispatchService.dispatchOrder() synchronously]
Microservices: Orchestration
● Asynchronous microservices
● OrderService can still orchestrate
● But it needs to watch for the replies explicitly
● OrderService will sit idle waiting for those replies
[Diagram: OrderService exchanging messages (reserve.stock / stock.reservation, take.payment / payment.result, dispatch.order) with StockService, PaymentService and DispatchService]
Microservices: Choreography
● Asynchronous alternative: choreography
● Services listen for specific triggering events and take action
● Ordering is imposed by the event types
● Multiple components can respond to the same event, e.g. StockService could respond to a paymentFailure by restoring reserved stock
[Diagram: order.created, stock.reserved and payment.result events flowing between OrderService, StockService, PaymentService and DispatchService]
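Choreography can be sketched as services subscribing to event types on a shared bus, with no central orchestrator. The bus and handler names here are illustrative stand-ins for Kafka topics and consumers, echoing the slide's order.created / payment.result events:

```python
from collections import defaultdict

class Bus:
    """Minimal in-process event bus standing in for Kafka topics."""
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in list(self.handlers[event_type]):
            handler(payload)

bus = Bus()
trace = []

# Each service reacts to the events that concern it -- no orchestrator:
bus.subscribe("order.created", lambda o: (trace.append("stock reserved"),
                                          bus.publish("stock.reserved", o)))
bus.subscribe("stock.reserved", lambda o: (trace.append("payment taken"),
                                           bus.publish("payment.result", o)))
bus.subscribe("payment.result", lambda o: trace.append("order dispatched"))

bus.publish("order.created", {"id": 1})
assert trace == ["stock reserved", "payment taken", "order dispatched"]
```

Note that the ordering emerges purely from which event each service listens for, which is exactly the "ordering imposed by the event types" point above.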