the key as uninterpreted bytes
• Value
• (Headers, since v0.11.0.0)
• Kafka doesn’t care about the content of value or headers
• Timestamp

Conceptually:
Record<K, V>
  key(): K
  value(): V
  headers(): Map<String, byte[]>
  timestamp(): Long
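As a rough illustration of the record shape on this slide, here is a minimal Python sketch. It is a toy model, not a Kafka client class: the field names simply mirror the slide, and treating the key as optional is an assumption based on the "if there is a key" partitioning rule.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    """Toy model of a record: opaque bytes plus metadata (not a Kafka API)."""
    key: Optional[bytes]            # uninterpreted bytes; may be absent
    value: bytes                    # uninterpreted bytes
    headers: Dict[str, bytes] = field(default_factory=dict)
    timestamp: int = 0              # assumed here to be epoch milliseconds


# The broker never parses the payload; JSON here is purely the application's choice.
example = Record(key=b"42", value=b'{"name": "Alice"}',
                 headers={"trace-id": b"abc"}, timestamp=1700000000000)
```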
◦ In reality the partitions are not stored together
• Sharded into n partitions, 0...n-1
[Diagram: topic customers.new, with records spread across Partition 0 and Partition 1]
Producer decides which partition a record belongs in
◦ Semantic partitioning
◦ Else, if there is a key: hash(key) mod #partitions
◦ Otherwise: round-robin
[Diagram: a producer distributing records across partitions]
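The three-way decision on this slide can be sketched as a small function. This is a toy model under stated assumptions: `make_partitioner` and the `custom` callback are hypothetical names, and `zlib.crc32` stands in for whatever hash function a real client uses.

```python
import zlib
from itertools import count
from typing import Callable, Optional

# Hypothetical signature for a user-supplied "semantic" partitioner.
Custom = Callable[[Optional[bytes], bytes, int], int]


def make_partitioner(num_partitions: int, custom: Optional[Custom] = None):
    """Return a function that picks a partition for each (key, value) pair."""
    round_robin = count()  # shared counter for keyless records

    def choose(key: Optional[bytes], value: bytes) -> int:
        if custom is not None:
            # 1. Semantic partitioning: the application decides.
            return custom(key, value, num_partitions)
        if key is not None:
            # 2. Keyed record: hash(key) mod #partitions (crc32 is a stand-in hash).
            return zlib.crc32(key) % num_partitions
        # 3. No key: spread records round-robin.
        return next(round_robin) % num_partitions

    return choose


choose = make_partitioner(3)
```

Because the keyed branch depends only on the key and the partition count, every record with the same key lands in the same partition, which is what makes per-key ordering possible. Changing the partition count changes the mapping.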
• Producer sends each record to the leader broker for its partition
  ⇒ So records in different partitions may be sent to different brokers
• Broker has an append-only log of records for each partition
• Once appended, a record can be identified by its offset within the partition
• Records are retained according to a policy:
  ◦ deleted: according to a size or time-based threshold
  ◦ compacted: until a new message with the same key arrives
[Diagram: partitions of topic bar (P: 1, P: 2) hosted on brokers]
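The append-only log, offsets, and the compaction policy can be modelled in a few lines. This is an in-memory sketch only: `PartitionLog` is a hypothetical name, and the model assumes every record has a key.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Entry = Tuple[int, str, str]  # (offset, key, value)


@dataclass
class PartitionLog:
    """Toy append-only log for one partition (not a Kafka API)."""
    records: List[Entry] = field(default_factory=list)
    next_offset: int = 0

    def append(self, key: str, value: str) -> int:
        """Append a record and return the offset that now identifies it."""
        offset = self.next_offset
        self.records.append((offset, key, value))
        self.next_offset += 1
        return offset

    def read(self, from_offset: int) -> List[Entry]:
        """Return all retained records at or after the given offset."""
        return [r for r in self.records if r[0] >= from_offset]

    def compact(self) -> None:
        """Keep only the newest record per key; surviving offsets are unchanged."""
        latest: Dict[str, int] = {}
        for offset, key, _ in self.records:
            latest[key] = offset
        self.records = [r for r in self.records if latest[r[1]] == r[0]]


log = PartitionLog()
log.append("alice", "45")
log.append("bob", "12")
log.append("alice", "32")
log.compact()  # the older "alice" record at offset 0 is dropped
```

Note that compaction leaves gaps in the offset sequence rather than renumbering, so an offset keeps identifying the same record for as long as that record is retained.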
• Consumers fetch records from the leader for a partition
  ⇒ Consuming all partitions in a topic means connections to many brokers
• Consumers address the records they’re reading by offset
  ◦ Can re-read by seeking to a previous offset
  ◦ Messages can be skipped
• Message order is preserved within a given partition
[Diagram: a consumer reading from partition P: 1]
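Offset-based addressing means the consumer, not the broker, tracks where it is in the partition. A minimal sketch of that idea, assuming a plain Python list stands in for the partition and `PartitionConsumer` is a hypothetical class rather than a client API:

```python
from typing import List


class PartitionConsumer:
    """Toy consumer that reads one partition by position (not a Kafka API)."""

    def __init__(self, partition: List[str]):
        self.partition = partition  # list index plays the role of the offset
        self.position = 0           # offset of the next record to read

    def poll(self, max_records: int = 1) -> List[str]:
        """Return up to max_records in partition order and advance the position."""
        batch = self.partition[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move to any offset: backwards to re-read, forwards to skip."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        self.position = offset


consumer = PartitionConsumer(["a", "b", "c", "d"])
first = consumer.poll(2)     # reads offsets 0 and 1
consumer.seek(0)             # rewind
replayed = consumer.poll(1)  # reads offset 0 again
consumer.seek(3)             # skip offsets 1 and 2
tail = consumer.poll(5)      # reads whatever is left
```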
on a different broker then the load for producing and consuming the topic is spread across those brokers
• So throughput can be scaled up by having a larger number of partitions and/or brokers
machines
• Consumers in the same consumer group discover each other via the Kafka protocol
• A group leader is elected
• Leader assigns partitions to consumers
• Membership changes ⇒ reassignment
• Leader dies ⇒ another election
• Makes it very easy to scale up consumption
[Diagram: partitions 0, 1 and 2 of topic bar shared between two consumers in consumer group xyz]
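The "leader assigns partitions to consumers" step can be illustrated with a simplified round-robin assignment. This is a sketch under assumptions: `assign` is a hypothetical function, and real clients offer several pluggable assignment strategies rather than this exact one.

```python
from typing import Dict, List


def assign(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Deal partitions out to group members in turn (simplified round-robin)."""
    if not members:
        return {}
    ordered = sorted(members)  # a deterministic order so the result is stable
    assignment: Dict[str, List[int]] = {m: [] for m in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


# Two consumers in group xyz share the three partitions of topic bar.
before = assign([0, 1, 2], ["consumer-1", "consumer-2"])
# Membership change: consumer-2 leaves, so the partitions are reassigned.
after = assign([0, 1, 2], ["consumer-1"])
```

Each partition goes to exactly one member of the group, so adding consumers beyond the number of partitions leaves the extra consumers idle.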
• Follower broker for a replica fetches from the leader broker
• If the leader crashes, one of the followers is elected as the new leader
  ◦ Producers, consumers and other followers produce to/fetch from the new leader
• When the old leader restarts it will be a follower
[Diagram: Broker 1 and Broker 2 each hold T: foo P: 1 and T: bar P: 2; each replica is marked as a leader or a follower]
[Diagram: Broker 2 fetches P: 1; Broker 1 fetches P: 2]
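The fetch, failover and rejoin cycle for a single partition can be sketched as follows. This is a deliberately simplified model: `Replica` and `ReplicatedPartition` are hypothetical names, election picks the live replica with the longest log, and a restarted broker discards its log and refetches everything, which a real broker does not need to do.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    broker: str
    log: List[str] = field(default_factory=list)
    alive: bool = True


class ReplicatedPartition:
    """Toy model of one partition replicated across brokers (not a Kafka API)."""

    def __init__(self, brokers: List[str]):
        self.replicas = [Replica(b) for b in brokers]
        self.leader = self.replicas[0]

    def produce(self, record: str) -> None:
        self.leader.log.append(record)  # producers only write to the leader

    def replicate(self) -> None:
        """Each live follower fetches the records it is missing from the leader."""
        for replica in self.replicas:
            if replica is not self.leader and replica.alive:
                replica.log.extend(self.leader.log[len(replica.log):])

    def fail(self, broker: str) -> None:
        for replica in self.replicas:
            if replica.broker == broker:
                replica.alive = False
        if not self.leader.alive:
            candidates = [r for r in self.replicas if r.alive]
            if not candidates:
                raise RuntimeError("no live replica to elect")
            self.leader = max(candidates, key=lambda r: len(r.log))

    def restart(self, broker: str) -> None:
        """A restarted broker rejoins as a follower and catches up by fetching."""
        for replica in self.replicas:
            if replica.broker == broker:
                replica.alive = True
                replica.log = []  # simplification: refetch from scratch


partition = ReplicatedPartition(["broker-1", "broker-2"])
partition.produce("r0")
partition.replicate()
partition.fail("broker-1")     # broker-2 takes over as leader
partition.produce("r1")
partition.restart("broker-1")  # the old leader comes back as a follower
partition.replicate()
```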
scalability by making clients aware of cluster topology
• Clients need to talk to the leader broker ⇒ must be able to talk to all brokers
• Clients know the identity of brokers
• Can’t hide brokers behind a load balancer
others
• We want to avoid having any saturated brokers
  ⇒ Need to spread the hot partitions around
• Reassigning partitions between brokers can be slow
• Constrained optimization problem (bin packing)
• Automated solutions
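To make the bin-packing framing concrete, here is a sketch of one common greedy heuristic: place the hottest partition first, always onto the currently least-loaded broker. The function name, partition names and load figures are all illustrative assumptions, and this ignores real constraints such as replica placement and the cost of moving data.

```python
import heapq
from typing import Dict, List


def place_partitions(loads: Dict[str, float], brokers: List[str]) -> Dict[str, List[str]]:
    """Greedy heuristic: hottest partition first, onto the least-loaded broker."""
    heap = [(0.0, broker) for broker in brokers]  # (total load, broker)
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {broker: [] for broker in brokers}
    # Sort by descending load; ties are broken by name for a stable result.
    for partition, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, broker = heapq.heappop(heap)
        placement[broker].append(partition)
        heapq.heappush(heap, (total + load, broker))
    return placement


# Made-up relative loads: bar-0 and bar-1 are the hot partitions.
loads = {"bar-0": 9, "bar-1": 7, "bar-2": 2, "foo-0": 1, "foo-1": 1}
placement = place_partitions(loads, ["b1", "b2"])
```

A greedy pass like this is not guaranteed to be optimal in general, which is one reason the slide treats the problem as a constrained optimization with dedicated automated solutions.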
log
• Loosely coupled: sender needs no knowledge of receiver(s)
• Availability: sender doesn’t require receiver to be available
• Immutable log ⇒ less need to encapsulate access to the data
  ◦ the emphasis is more on sharing the data
  ◦ the data is more important than the API used to access it
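A table is a snapshot of a stream
[Diagram: over time, the stream Alice 45, Bob 12, Carol 23, Alice 32, Carol 19 is snapshotted into the table Alice 32, Bob 12, Carol 19]

A stream is a changelog of a table
[Diagram: the table Alice 45, Bob 12, Carol 23 receives "Update Alice set score=32" and "Update Carol set score=19"; the changelog stream is Alice 45, Bob 12, Carol 23, Alice 32, Carol 19]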
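The duality on this slide can be checked with the slide's own numbers. The two helper functions below are hypothetical names for this sketch: one replays a changelog into a table, the other derives a changelog from a table and its updates.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, int]  # (name, score)


def table_from_stream(changelog: List[Change]) -> Dict[str, int]:
    """A table is a snapshot of a stream: replay it, the latest value per key wins."""
    table: Dict[str, int] = {}
    for name, score in changelog:
        table[name] = score
    return table


def stream_from_table(initial: Dict[str, int], updates: List[Change]) -> List[Change]:
    """A stream is a changelog of a table: the initial rows, then every update."""
    return list(initial.items()) + list(updates)


initial = {"Alice": 45, "Bob": 12, "Carol": 23}
updates = [("Alice", 32), ("Carol", 19)]

changelog = stream_from_table(initial, updates)
snapshot = table_from_stream(changelog)
```

Replaying only a prefix of the changelog gives the table as it stood at that earlier point in time, which is the sense in which the snapshot depends on when it is taken.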
common code
• Kafka Streams is a framework for writing applications
• Just a jar file, runs in your application
• Leverages consumer group scaling, so it’s easy to scale your application horizontally
• Presents a higher-level API using “Streams” rather than (low-level) topics
• Perform operations on whole streams rather than individual records
  ◦ E.g. filter, map
• Applications are written by composing such operations
• The composition graph is called the “processor topology”
as nodes
• Incoming edges are the operand streams
• Outgoing edges are the result stream(s)
• Source processors create a stream from a Kafka topic or other source
• Sink processors are the output of the Streams application and produce a Kafka topic
[Diagram: a processor topology with its source processors and sink processor labelled]
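The idea of composing whole-stream operations into a topology can be sketched with a tiny fluent class. This is not the Kafka Streams API (which is a Java library): `Stream`, its methods and the in-memory lists standing in for topics are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Rec = Tuple[str, int]  # (key, value)


class Stream:
    """Toy stream whose operations compose into a linear topology."""

    def __init__(self, source: Iterable[Rec]):
        self._source = source  # source processor: wraps an input "topic"

    def filter(self, predicate: Callable[[Rec], bool]) -> "Stream":
        # Stateless node: one operand stream in, one result stream out.
        return Stream(r for r in self._source if predicate(r))

    def map(self, fn: Callable[[Rec], Rec]) -> "Stream":
        return Stream(fn(r) for r in self._source)

    def to(self, sink: List[Rec]) -> None:
        # Sink processor: writes the result stream to an output "topic".
        sink.extend(self._source)


input_topic = [("alice", 45), ("bob", 12), ("carol", 23)]
output_topic: List[Rec] = []

# Topology: source -> filter -> map -> sink
(Stream(input_topic)
    .filter(lambda r: r[1] >= 20)
    .map(lambda r: (r[0].upper(), r[1]))
    .to(output_topic))
```

Nothing is processed until the sink pulls records through the chain, so the chained calls describe the graph first and run it second.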
Stateless operations result in Streams
• Stateful operations can result in tables
  ◦ Aggregation, Join, Windowing
• Can always turn a Table back into a Stream
• Tables can be interactively queried
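A count per key is a small example of a stateful operation that yields a table, and of turning that table back into a stream of updates. The sketch below uses a hypothetical `count_by_key` function and a plain dict in place of a state store.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def count_by_key(stream: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, int], List[Tuple[str, int]]]:
    """Aggregate a stream into a table of counts, recording each table update."""
    table: Dict[str, int] = defaultdict(int)   # the state behind the aggregation
    updates: List[Tuple[str, int]] = []        # the table turned back into a stream
    for key, _value in stream:
        table[key] += 1
        updates.append((key, table[key]))
    return dict(table), updates


events = [("alice", "click"), ("bob", "click"), ("alice", "view")]
counts, update_stream = count_by_key(events)

# "Interactive query": look up the current value for one key.
alice_count = counts["alice"]
```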
very generic, possibly stateful, processor within the topology
• Writing your own processor ⇒ creating a custom operator in the high-level API (DSL)
• Kafka Streams uses in-memory and RocksDB state stores to implement the higher-level operations
• Custom state stores are also possible
an order: reserveStock(), then takePayment(), then dispatchOrder()
• Exceptional flows for things like payment failure ⇒ unreserveStock()
[Diagram: createOrder() arrives at OrderService, which calls reserveStock() on StockService, takePayment() on PaymentService and dispatchOrder() on DispatchService]
• Needs to watch for the replies explicitly
• OrderService will sit idle waiting for those replies
[Diagram: createOrder() arrives at OrderService; the messages reserve.stock, take.payment, dispatch.order, stock.reservation and payment.result flow between OrderService, StockService, PaymentService and DispatchService]
specific triggering events and take action
• Ordering imposed by the event types
• Multiple components can respond to the same event, e.g. StockService could respond to a paymentFailure by restoring reserved stock
[Diagram: createOrder() arrives at OrderService; the events order.created, stock.reserved and payment.result flow between OrderService, StockService, PaymentService and DispatchService]
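The event-driven flow can be sketched with an in-memory bus standing in for Kafka topics. Everything here is a toy under stated assumptions: `EventBus` and `wire_services` are hypothetical, the payment outcome is passed in as a flag, and `order.dispatched` is an invented event name that does not appear on the slide.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List


class EventBus:
    """Toy in-memory event bus (a stand-in for topics, not a Kafka API)."""

    def __init__(self):
        self.handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.queue = deque()
        self.history: List[str] = []  # event types in the order they were handled

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.queue.append((event_type, payload))

    def run(self) -> None:
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.history.append(event_type)
            for handler in self.handlers[event_type]:
                handler(payload)


def wire_services(bus: EventBus, stock: Dict[str, int], payment_ok: bool) -> None:
    """Each service reacts to its triggering event; none calls another directly."""
    def reserve_stock(order):       # StockService
        stock[order["item"]] -= 1
        bus.publish("stock.reserved", order)

    def take_payment(order):        # PaymentService
        bus.publish("payment.result", {**order, "ok": payment_ok})

    def dispatch(result):           # DispatchService
        if result["ok"]:
            bus.publish("order.dispatched", result)  # hypothetical event name

    def restore_stock(result):      # StockService again, reacting to a failure
        if not result["ok"]:
            stock[result["item"]] += 1

    bus.subscribe("order.created", reserve_stock)
    bus.subscribe("stock.reserved", take_payment)
    bus.subscribe("payment.result", dispatch)
    bus.subscribe("payment.result", restore_stock)


stock = {"widget": 5}
bus = EventBus()
wire_services(bus, stock, payment_ok=False)
bus.publish("order.created", {"item": "widget"})
bus.run()
```

Both the dispatch handler and the stock handler subscribe to `payment.result`, which illustrates the slide's point that several components can respond to the same event without the publisher knowing about any of them.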