Riviera Dev 2018: Apache Kafka: Turning your architecture inside out

Apache Kafka: Turning your architecture inside out
Riviera DEV, 18 May 2018
Tom Bentley

Kafka is...
• … a distributed, fault-tolerant commit log
• … a horizontally scalable publish/subscribe message broker
• … a message streaming platform
(Diagram: three brokers, one producer, two consumers)

Records (aka Messages)
• Optional Key
  ◦ Kafka only sees the key as uninterpreted bytes
• Value
• (Headers, since v0.11.0.0)
• Kafka doesn’t care about the content of value or headers
• Timestamp

Conceptually:
Record<K, V>
  key(): K
  value(): V
  headers(): Map<String, byte[]>
  timestamp(): Long
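The conceptual record above can be sketched as a small data class. This is only an illustration of the shape of a record, not the Kafka client API; the class name `SketchRecord` and the example values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class SketchRecord:
    """Hypothetical stand-in for a record; not the real client class."""
    value: bytes                      # opaque to the broker
    key: Optional[bytes] = None       # optional, also opaque bytes
    headers: Dict[str, bytes] = field(default_factory=dict)
    timestamp: Optional[int] = None   # epoch milliseconds


# Arbitrary example values, chosen only for illustration.
record = SketchRecord(key=b"customer-42", value=b'{"name": "Alice"}')
```

Key and value are plain bytes here because, as the slide says, the broker never interprets them; serialization is the client's concern.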

Topics
• Logical grouping of records identified by a name
  ◦ In reality the partitions are not stored together
• Sharded into n partitions, 0...n-1
(Diagram: topic customers.new with records spread across partition 0 and partition 1)

Producers
• Producers publish new records to one or more topics
• Producer decides which partition a record belongs in
  ◦ Semantic partitioning (the application chooses the partition)
  ◦ Else, if there is a key: hash(key) mod #partitions
  ◦ Otherwise: round-robin
(Diagram: a producer emitting records)
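The three-step partition choice can be sketched as follows. This is a minimal illustration, assuming `zlib.crc32` as a stand-in hash (the real client uses a different hash function); the class and parameter names are hypothetical.

```python
import zlib
from itertools import count
from typing import Optional


class SketchPartitioner:
    """Illustrative only: explicit partition, else key hash, else round-robin."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._counter = count()  # drives round-robin for keyless records

    def partition(self, key: Optional[bytes], explicit: Optional[int] = None) -> int:
        if explicit is not None:
            # Semantic partitioning: the application already decided.
            return explicit
        if key is not None:
            # Same key always maps to the same partition (crc32 is a stand-in).
            return zlib.crc32(key) % self.num_partitions
        # No key: spread records evenly across partitions.
        return next(self._counter) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=3)
```

Because the key hash is deterministic, all records for one key land in one partition, which is what makes per-key ordering possible.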

Brokers
(Diagram: a record with T: foo, P: 1, M: blah sent to a broker hosting T: foo P: 1 and T: bar P: 2)
• Producer sends record to the leader broker for the partition
  ⇒ So records in different partitions may get sent to different brokers
• Broker has an append-only log of records for each partition
• Once appended, records can be identified by their offset within the partition
• Records retained according to a policy:
  ◦ deleted: according to size or time-based threshold
  ◦ compacted: until a new message with the same key arrives
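A partition log with offsets and a compaction pass might look like the sketch below. It models the idea only, not the broker's storage format; `PartitionLog` and `compact` are hypothetical names.

```python
from typing import List, Tuple

Entry = Tuple[int, bytes, bytes]  # (offset, key, value)


class PartitionLog:
    """Append-only list; an offset is simply a record's position in the log."""

    def __init__(self) -> None:
        self._records: List[Tuple[bytes, bytes]] = []

    def append(self, key: bytes, value: bytes) -> int:
        self._records.append((key, value))
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset: int, max_records: int = 100) -> List[Entry]:
        end = min(offset + max_records, len(self._records))
        return [(o, *self._records[o]) for o in range(offset, end)]


def compact(entries: List[Entry]) -> List[Entry]:
    """Keep only the latest record per key; surviving offsets are unchanged."""
    latest = {key: offset for offset, key, _ in entries}
    return [e for e in entries if latest[e[1]] == e[0]]


log = PartitionLog()
log.append(b"alice", b"45")
log.append(b"bob", b"12")
log.append(b"alice", b"32")
```

Here `compact(log.read(0))` drops the first `alice` record because a newer record with the same key arrived, while the surviving records keep their original offsets.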

Consumers
(Diagram: a consumer fetching record M: blah for T: foo, P: 1 from a broker hosting T: foo P: 1)
• Consumers fetch records from the leader for a partition
  ⇒ Consuming all partitions in a topic means connections to many brokers
• Consumers address records they’re reading by the offset
  ◦ Can re-read by seeking to a previous offset
  ◦ Messages can be skipped
• Message order is preserved within a given partition
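Offset-based reading can be sketched with a consumer that tracks its own position in one partition. The partition is modelled as a plain list, and `SketchConsumer` with its `poll` and `seek` methods is a hypothetical simplification rather than the client API.

```python
from typing import List


class SketchConsumer:
    """Reads one partition (a list) starting from a position it controls."""

    def __init__(self, partition_log: List[str]) -> None:
        self._log = partition_log
        self.position = 0  # offset of the next record to read

    def poll(self, max_records: int = 2) -> List[str]:
        batch = self._log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        # Moving backwards re-reads records; moving forwards skips them.
        self.position = offset


partition = ["m0", "m1", "m2", "m3", "m4"]
consumer = SketchConsumer(partition)
first = consumer.poll()          # reads from offset 0
consumer.seek(0)
replayed = consumer.poll()       # same records again
consumer.seek(4)
skipped_ahead = consumer.poll()  # m2 and m3 are never read
```

The log itself is untouched by reading, so several consumers can hold different positions in the same partition.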

Partitions ⇒ Scalability
Key insight: If each partition is stored on a different broker then the load for producing and consuming the topic is spread across those brokers
• So can scale up throughput by having a larger number of partitions and/or brokers

Consumer groups
• Consumers can be different processes on different machines
• Consumers in same consumer group discover each other via Kafka protocol
• A group leader is elected
• Leader assigns partitions to consumers
• Membership changes ⇒ reassignment
• Leader dies ⇒ Another election
• Makes it very easy to scale up consumption
(Diagram: partitions T: bar P: 0, P: 1 and P: 2 divided between two consumers in group cg: xyz)
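The assignment step performed by the group leader can be sketched as a pure function from members and partitions to an assignment. Round-robin is one possible strategy, assumed here for illustration; the function and member names are hypothetical.

```python
from typing import Dict, List


def assign_round_robin(members: List[str], partitions: List[int]) -> Dict[str, List[int]]:
    """Deal partitions out to group members one at a time."""
    if not members:
        raise ValueError("a consumer group needs at least one member")
    ordered = sorted(members)
    assignment: Dict[str, List[int]] = {member: [] for member in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


# Two consumers in group "xyz" share the three partitions of topic "bar".
before = assign_round_robin(["consumer-a", "consumer-b"], [0, 1, 2])
# consumer-b leaves the group, so the leader reassigns everything.
after = assign_round_robin(["consumer-a"], [0, 1, 2])
```

Recomputing the whole assignment on every membership change is the simplest approach to reason about, which is why the sketch does it that way.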

Replicas ⇒ Fault tolerance
• Partitions replicated on other brokers
• Follower broker for a replica fetches from leader broker
• If leader crashes, one of the followers is elected new leader
  ◦ Producers and consumers and other followers produce/fetch from new leader
• When old leader restarts it will be a follower
(Diagram: Broker 1 and Broker 2 each hold T: foo P: 1 and T: bar P: 2, one copy as leader and the other as follower; Broker 2 fetches P: 1 and Broker 1 fetches P: 2)
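Leader failover for a single partition can be sketched as below. This is a deliberate simplification: it promotes the first surviving replica, whereas a real cluster applies further conditions when electing a leader. All names are hypothetical.

```python
from typing import List, Set


class ReplicatedPartition:
    """Tracks which broker currently leads one partition."""

    def __init__(self, replicas: List[int]) -> None:
        self.replicas = replicas   # broker ids holding a copy
        self.leader = replicas[0]  # assumption: first replica starts as leader

    def update_live_brokers(self, live: Set[int]) -> None:
        if self.leader in live:
            return  # the current leader is fine, nothing to elect
        candidates = [b for b in self.replicas if b in live]
        if not candidates:
            raise RuntimeError("no live replica left for this partition")
        self.leader = candidates[0]

    def followers(self, live: Set[int]) -> List[int]:
        return [b for b in self.replicas if b in live and b != self.leader]


foo_p1 = ReplicatedPartition(replicas=[1, 2])
foo_p1.update_live_brokers({2})       # broker 1 crashes
leader_after_crash = foo_p1.leader
foo_p1.update_live_brokers({1, 2})    # broker 1 restarts
leader_after_restart = foo_p1.leader
```

After the restart broker 1 does not reclaim leadership in this sketch; it rejoins as a follower, matching the last bullet above.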

Performance
• Partitioning & batching are prominent features
• Gain scalability by making clients aware of cluster topology
• Clients need to talk to leader broker ⇒ must be able to talk to all brokers
• Clients know identity of brokers
• Can’t hide brokers behind a load balancer

Balancing
• Some partitions cause a lot more load than others
• We want to avoid having any saturated brokers
• ⇒ Need to spread the hot partitions around
• Reassigning partitions between brokers can be slow
• Constrained optimization problem (Bin packing)
• Automated solutions
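To make the bin-packing framing concrete, here is a greedy heuristic that places the heaviest partitions first, always on the currently least-loaded broker. It is one textbook heuristic, not what any particular balancing tool does, and it ignores replication and the cost of moving data. The load figures are invented for illustration.

```python
import heapq
from typing import Dict, List


def balance(partition_load: Dict[str, float], brokers: List[int]) -> Dict[int, List[str]]:
    """Greedy placement: heaviest partition first, onto the lightest broker."""
    heap = [(0.0, broker) for broker in brokers]  # (total load, broker id)
    heapq.heapify(heap)
    placement: Dict[int, List[str]] = {broker: [] for broker in brokers}
    by_load = sorted(partition_load.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, load in by_load:
        total, broker = heapq.heappop(heap)
        placement[broker].append(name)
        heapq.heappush(heap, (total + load, broker))
    return placement


# Hypothetical per-partition load in arbitrary units.
loads = {"foo-0": 9.0, "foo-1": 7.0, "bar-0": 5.0, "bar-1": 3.0, "bar-2": 2.0}
placement = balance(loads, brokers=[1, 2])
```

With these numbers the two brokers end up with totals of 14 and 12 units, so neither carries both hot `foo` partitions.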

DEMO: Producing

Warning: Detour next 2 slides

Core Kafka & Microservices
• History included – free audit log
• Loosely coupled – sender needs no knowledge of receiver(s)
• Availability – Sender doesn’t require receiver to be available
• Immutable log ⇒ less need to encapsulate access to the data
  ◦ emphasis more about sharing the data
  ◦ the data is more important than the API used to access it

Events & Tables
• A table is a snapshot of a stream
  (Diagram: over time the stream carries Alice 45, Bob 12, Carol 23, Alice 32, Carol 19; a snapshot yields the table Alice 32, Bob 12, Carol 19)
• A stream is a changelog of a table
  (Diagram: the table Alice 45, Bob 12, Carol 23 receives "Update Alice set score=32" and "Update Carol set score=19", yielding the stream Alice 45, Bob 12, Carol 23, Alice 32, Carol 19)
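The duality can be checked with the slide's own data: replaying the stream gives the table, and diffing two tables gives back the changes. The helper names `snapshot` and `changelog_between` are hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (name, score)

stream: List[Event] = [
    ("Alice", 45), ("Bob", 12), ("Carol", 23), ("Alice", 32), ("Carol", 19),
]


def snapshot(events: List[Event]) -> Dict[str, int]:
    """A table is a snapshot of a stream: the last value seen per key wins."""
    table: Dict[str, int] = {}
    for name, score in events:
        table[name] = score
    return table


def changelog_between(before: Dict[str, int], after: Dict[str, int]) -> List[Event]:
    """A stream is a changelog of a table: emit every row that changed."""
    return [(name, score) for name, score in after.items() if before.get(name) != score]


early_table = snapshot(stream[:3])
final_table = snapshot(stream)
updates = changelog_between(early_table, final_table)
```

The `updates` list corresponds to the two UPDATE statements in the diagram, and appending it to the first three events reproduces the full stream.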

Kafka Streams
• Typical Kafka microservices share a lot of common code
• Kafka Streams is a framework for writing applications
• Just a jar file, runs in your application
• Leverages consumer group scaling so it’s easy to horizontally scale your application
• Presents a higher level API using “Streams” rather than (low-level) Topics
• Perform operations on whole streams rather than individual records
  ◦ E.g. filter, map
• Applications are written by composing such operations
• The composition graph is called the “processor topology”

Processor topology
• Processors form a directed graph
• Processors as nodes
• Incoming edges are the operand streams
• Outgoing edges are the result stream(s)
• Source processors create a stream from a Kafka topic or other source
• Sink processors are the output of the Streams application and produce a Kafka topic
(Diagram: source processors at the top of the graph, a sink processor at the bottom)
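A processor graph can be sketched in a few lines: each node is a function from one record to zero or more records, and edges say where the results go next. This models the concept only and is not the Kafka Streams `Topology` API; every name here is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

Processor = Callable[[int], List[int]]  # one record in, zero or more out


class SketchTopology:
    def __init__(self) -> None:
        self.processors: Dict[str, Processor] = {}
        self.children: Dict[str, List[str]] = {}

    def add(self, name: str, fn: Processor, parents: Sequence[str] = ()) -> None:
        self.processors[name] = fn
        self.children[name] = []
        for parent in parents:
            self.children[parent].append(name)  # edge parent -> name

    def process(self, node: str, record: int,
                sinks: Optional[Dict[str, List[int]]] = None) -> Dict[str, List[int]]:
        """Push one record through the graph; collect what reaches the sinks."""
        sinks = {} if sinks is None else sinks
        for out in self.processors[node](record):
            if not self.children[node]:  # no outgoing edges: a sink
                sinks.setdefault(node, []).append(out)
            for child in self.children[node]:
                self.process(child, out, sinks)
        return sinks


topology = SketchTopology()
topology.add("source", lambda r: [r])
topology.add("non_negative", lambda r: [r] if r >= 0 else [], parents=["source"])
topology.add("doubled_sink", lambda r: [r * 2], parents=["non_negative"])
```

Calling `topology.process("source", 3)` delivers 6 to the sink, while a negative record is dropped by the filter node and never reaches it.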

Streams and Tables
• Kafka Streams has tables too!
• Stateless operations result in Streams
• Stateful operations can result in tables
  ◦ Aggregation, Join, Windowing
• Can always turn a Table back into a Stream
• Tables can be interactively queried

Stream Processors (low-level API)
• Low-level API corresponds to a very generic, possibly stateful, processor within the topology
• Writing your own processor ⇒ creating a custom operator in the high level API (DSL)
• Kafka Streams uses in-memory and RocksDB state stores to implement the higher level operations
• Custom state stores are also possible

Streams DSL
• Operations on streams:
  ◦ Stateless: Filter, Map, GroupBy etc
  ◦ Stateful: Aggregation, Join, etc
• Applications can compose operations to perform computation
Diagram credit: Kafka docs
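Composing operations on whole streams can be imitated with a tiny in-memory class. Stateless operations return a new stream, while the stateful count returns a table (a dict). This is an analogy to the DSL, not the DSL itself; `SketchStream` and its methods are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

KeyValue = Tuple[str, int]


class SketchStream:
    def __init__(self, records: Iterable[KeyValue]) -> None:
        self._records: List[KeyValue] = list(records)

    def filter(self, predicate: Callable[[str, int], bool]) -> "SketchStream":
        # Stateless: each record is judged on its own.
        return SketchStream(kv for kv in self._records if predicate(kv[0], kv[1]))

    def map_values(self, fn: Callable[[int], int]) -> "SketchStream":
        # Stateless: transforms the value, keeps the key.
        return SketchStream((k, fn(v)) for k, v in self._records)

    def count_by_key(self) -> Dict[str, int]:
        # Stateful: needs memory of earlier records, so the result is a table.
        table: Dict[str, int] = {}
        for key, _ in self._records:
            table[key] = table.get(key, 0) + 1
        return table

    def to_list(self) -> List[KeyValue]:
        return list(self._records)


orders = SketchStream([("alice", 30), ("bob", 5), ("alice", 12), ("carol", 50)])
large = orders.filter(lambda key, amount: amount >= 10)
in_cents = large.map_values(lambda amount: amount * 100)
counts = large.count_by_key()
```

A real streams application processes records continuously rather than from a finished list, so the table would keep updating as new records arrive.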

“Traditional” Microservices
• Synchronous microservices
• OrderService orchestrates processing of an order: reserveStock(), then takePayment(), then dispatchOrder()
• Exceptional flows for things like payment failure ⇒ unreserveStock()
(Diagram: createOrder() arrives at OrderService, which calls reserveStock() on StockService, takePayment() on PaymentService and dispatchOrder() on DispatchService)

Microservices: Orchestration
• Asynchronous microservices
• OrderService can still orchestrate
• Needs to watch for the replies explicitly
• OrderService will sit idle waiting for those replies
(Diagram: after createOrder(), OrderService, StockService, PaymentService and DispatchService exchange the messages reserve.stock, stock.reservation, take.payment, payment.result and dispatch.order)

Microservices: Choreography
• Asynchronous alternative: Choreography
• Services listen for specific triggering events and take action
• Ordering imposed by the event types
• Multiple components can respond to the same event, e.g. StockService could respond to a paymentFailure by restoring reserved stock
(Diagram: after createOrder(), OrderService, StockService, PaymentService and DispatchService react to the events order.created, stock.reserved and payment.result)
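Choreography can be sketched with an in-memory event bus where each service only knows which event it reacts to. The topics `order.created`, `stock.reserved` and `payment.result` come from the slide; `order.dispatched`, `stock.released`, the payment limit of 100 and all function names are assumptions made for the example.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, dict]  # (topic, payload)
Handler = Callable[[dict], List[Event]]


class EventBus:
    """In-memory stand-in for topics: handlers return follow-up events."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []  # topics in the order they were published

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        queue = deque([(topic, payload)])
        while queue:
            current, data = queue.popleft()
            self.log.append(current)
            for handler in self._handlers[current]:
                queue.extend(handler(data))


def build_bus() -> EventBus:
    bus = EventBus()
    # StockService reserves stock when an order is created.
    bus.subscribe("order.created", lambda o: [("stock.reserved", o)])
    # PaymentService charges once stock is reserved (limit is made up).
    bus.subscribe("stock.reserved",
                  lambda o: [("payment.result", {**o, "ok": o["amount"] <= 100})])
    # DispatchService and StockService both react to the same event.
    bus.subscribe("payment.result",
                  lambda o: [("order.dispatched", o)] if o["ok"] else [])
    bus.subscribe("payment.result",
                  lambda o: [] if o["ok"] else [("stock.released", o)])
    return bus


happy = build_bus()
happy.publish("order.created", {"id": 1, "amount": 20})
failed = build_bus()
failed.publish("order.created", {"id": 2, "amount": 500})
```

No service calls another directly: the order of processing falls out of which event each handler listens for, and the failure path is simply a second subscriber on `payment.result`.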

DEMO: Streams

How is Kafka turning your architecture inside-out?

Fin
• Thanks for listening
• Questions?
