Riviera Dev 2018: Apache Kafka: Turning your architecture inside out

Tom Bentley

May 18, 2018

Transcript

  1. Apache Kafka: Turning your architecture
    inside out
    Riviera DEV, 18 May 2018
    Tom Bentley


  2. Kafka is...
    ● … a distributed, fault tolerant
    commit log
    ● … a horizontally scalable
    publish/subscribe message broker
    ● … a message streaming platform
    (Diagram: one producer publishing to three brokers, with two consumers reading)


  3. Records (aka Messages)
    ● Optional Key
    ○ Kafka only sees the key as uninterpreted bytes
    ● Value
    ● (Headers, since v0.11.0.0)
    ● Kafka doesn’t care about the content of value or headers
    ● Timestamp
    Conceptually (see the sketch below):
    Record
      key(): K
      value(): V
      headers(): Map
      timestamp(): Long
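    The same shape surfaces in the Java client; a minimal sketch, assuming string
    key/value deserializers (the class and method names here are invented for
    illustration):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.header.Header;

    public class RecordFields {
        // Sketch: how the conceptual accessors map onto the Java client's ConsumerRecord.
        static void printRecord(ConsumerRecord<String, String> record) {
            String key = record.key();            // optional; may be null
            String value = record.value();        // opaque to Kafka, typed by your deserializer
            long timestamp = record.timestamp();  // milliseconds since the epoch
            for (Header header : record.headers()) {  // headers, since 0.11.0.0
                System.out.println(header.key() + " = " + new String(header.value()));
            }
            System.out.println(key + " -> " + value + " @ " + timestamp);
        }
    }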


  4. Topics
    ● Logical grouping of records identified by a name
    ○ In reality the partitions are not stored together
    ● Sharded into n partitions, 0...n-1
    (Diagram: topic customers.new, with its records split between partition 0 and partition 1)


  5. Producers
    ● Producers publish new records to one or more topics
    ● Producer decides which partition a record belongs in
    ○ Semantic partitioning
    ○ Else, if there is a key: hash(key) mod #partitions
    ○ Otherwise: round-robin
    (Diagram: a producer appending records to a topic's partitions; see the sketch below)
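    A minimal producer sketch under those partitioning rules (the topic comes from the
    earlier slide; the broker address and record contents are placeholders): keyed
    records go to hash(key) mod #partitions, unkeyed records are spread round-robin.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class NewCustomerProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Keyed record: the default partitioner picks hash(key) mod #partitions,
                // so every record for "alice" lands in the same partition (order preserved).
                producer.send(new ProducerRecord<>("customers.new", "alice", "{\"name\":\"Alice\"}"));

                // Unkeyed record: partitions are chosen round-robin.
                producer.send(new ProducerRecord<>("customers.new", "{\"name\":\"Bob\"}"));
                producer.flush();
            }
        }
    }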


  6. Brokers
    (Diagram: a record (T: foo, P: 1, M: blah) sent to the broker that hosts partitions foo-1 and bar-2)
    ● Producer sends record to the leader broker for the partition
    ⇒ So records in different partitions get sent to different brokers
    ● Broker has an append-only log of records for individual partitions
    ● Once appended, records can be identified by the offset within the partition
    ● Records retained according to a policy:
    ○ deleted: according to size or time-based threshold
    ○ compacted: until a new message with the same key arrives
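    A hedged sketch of how those retention policies are set per topic, using the Java
    AdminClient (the topic names, sizes, and broker address are illustrative):

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;

    public class CreateTopics {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder

            try (AdminClient admin = AdminClient.create(props)) {
                // "deleted" policy: old segments dropped after 7 days or 1 GiB per partition.
                Map<String, String> deleteConfig = new HashMap<>();
                deleteConfig.put(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_DELETE);
                deleteConfig.put(TopicConfig.RETENTION_MS_CONFIG, String.valueOf(7 * 24 * 60 * 60 * 1000L));
                deleteConfig.put(TopicConfig.RETENTION_BYTES_CONFIG, String.valueOf(1024L * 1024 * 1024));

                // "compacted" policy: keep at least the latest record for each key.
                Map<String, String> compactConfig = new HashMap<>();
                compactConfig.put(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT);

                admin.createTopics(Arrays.asList(
                    new NewTopic("events", 3, (short) 2).configs(deleteConfig),
                    new NewTopic("customers.latest", 3, (short) 2).configs(compactConfig)
                )).all().get();
            }
        }
    }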


  7. Consumers
    (Diagram: a consumer fetching the record (T: foo, P: 1, M: blah) from the broker hosting foo-1)
    ● Consumers fetch records from the leader for a partition
    ⇒ Consuming all partitions in a topic means connections to many brokers
    ● Consumers address records they’re reading by the offset
    ○ Can re-read by seeking to a previous offset
    ○ Messages can be skipped
    ● Message order preserved for a given partition
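    A sketch of offset-based consumption (topic, partition, and offset are illustrative):
    the consumer is assigned one partition and seeks back to re-read from a known offset.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ReplayConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder
            props.put("enable.auto.commit", "false");          // manual assignment, no group offsets
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition foo1 = new TopicPartition("foo", 1);
                consumer.assign(Collections.singletonList(foo1));
                consumer.seek(foo1, 42L);  // re-read from a previous offset (illustrative)

                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }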


  8. Partitions ⇒ Scalability
    Key insight:
    If each partition is stored on a different broker then the load for producing
    and consuming the topic is spread across those brokers
    ● So throughput can be scaled up by using more partitions and/or more
    brokers


  9. Consumer groups
    ● Consumers can be different processes on
    different machines
    ● Consumers in same consumer group
    discover each other via Kafka protocol
    ● A group leader is elected
    ● Leader assigns partitions to consumers
    ● Membership changes ⇒ reassignment
    ● Leader dies ⇒ Another election
    ● Makes it very easy to scale up
    consumption
    (Diagram: partitions 0-2 of topic bar shared between two consumers in group xyz; see the sketch below)
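    A sketch of group-based consumption (the group id and topic follow the diagram above;
    the broker address is a placeholder): every consumer started with the same group.id
    joins the group, and starting another copy of this process triggers a rebalance that
    spreads the partitions across the members.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class GroupedConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder
            props.put("group.id", "xyz");                      // consumers sharing this id form one group
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("bar"));
                while (true) {
                    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }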


  10. Replicas ⇒ Fault tolerance
    ● Partitions replicated on other brokers
    ● Follower broker for a replica fetches from leader broker
    ● If leader crashes, one of the followers is elected new leader
    ○ Producers, consumers, and the other followers then produce to and fetch
    from the new leader
    ● When the old leader restarts, it rejoins as a follower
    (Diagram: Broker 1 and Broker 2 each hold foo-1 and bar-2; Broker 1 leads foo-1, Broker 2 leads bar-2, and each follows the other's partition)


  11. Replicas ⇒ Fault tolerance
    (Same bullets as slide 10; diagram build: Broker 2 fetches foo-1 from its leader on Broker 1, while Broker 1 fetches bar-2 from its leader on Broker 2)


  12. Replicas ⇒ Fault tolerance
    (Same bullets and diagram as slide 10)


  13. Performance
    ● Partitioning & batching are prominent features
    ● Gain scalability by making clients aware of cluster topology
    ● Clients need to talk to leader broker ⇒ must be able to talk to all brokers
    ● Clients know identity of brokers
    ● Can’t hide brokers behind a load balancer


  14. Balancing
    ● Some partitions cause a lot more load than others
    ● We want to avoid having any saturated brokers
    ● ⇒ Need to spread the hot partitions around
    ● Reassigning partitions between brokers can be slow
    ● Constrained optimization problem (Bin packing)
    ● Automated solutions


  15. DEMO: Producing


  16. Warning: Detour next 2 slides


  17. Core Kafka & Microservices
    ● History included – free audit log
    ● Loosely coupled – sender needs no knowledge of receiver(s)
    ● Availability – Sender doesn’t require receiver to be available
    ● Immutable log ⇒ less need to encapsulate access to the data
    ○ the emphasis shifts towards sharing the data
    ○ the data is more important than the API used to access it


  18. Events & Tables
    (Diagram, reconstructed:)
    Table at the start:      Alice 45, Bob 12, Carol 23
    Stream of updates:       Update Alice set score=32; Update Carol set score=19
    Snapshot after updates:  Alice 32, Bob 12, Carol 19
    A table is a snapshot of a stream, taken at a point in time
    A stream is a changelog of a table (see the sketch below)
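    A minimal illustration of that duality in plain Java (the names and numbers follow
    the example above): replaying the changelog into a map reproduces the table's
    snapshot, and every mutation of the map could itself be emitted as a changelog event.

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class StreamTableDuality {
        public static void main(String[] args) {
            // The changelog stream: (key, new value) events in arrival order.
            String[][] changelog = {
                {"Alice", "45"}, {"Bob", "12"}, {"Carol", "23"},
                {"Alice", "32"}, {"Carol", "19"}
            };

            // Replaying the stream into a map yields the table's snapshot at "now".
            Map<String, Integer> table = new LinkedHashMap<>();
            for (String[] update : changelog) {
                table.put(update[0], Integer.parseInt(update[1]));
            }

            System.out.println(table);  // {Alice=32, Bob=12, Carol=19}
        }
    }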


  19. Kafka Streams
    ● Typical Kafka microservices share a lot of common code
    ● Kafka Streams is a framework for writing applications
    ● Just a jar file, runs in your application
    ● Leverages consumer group scaling, so it's easy to horizontally scale your
    application
    ● Presents a higher-level API using “Streams” rather than (low-level) Topics
    ● Perform operations on whole streams rather than individual records
    ○ E.g. filter, map
    ● Applications are written by composing such operations
    ● The composition graph is called the “processor topology”
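    A minimal Kafka Streams sketch (the topic names and filtering predicate are invented
    for illustration): a source topic is filtered and mapped, and the result is written to
    a sink topic; the composed operations form the processor topology described on the
    next slide.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class FilterMapApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-map-app");    // also the consumer group id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> orders = builder.stream("orders");          // source processor
            orders.filter((key, value) -> value.contains("\"valid\":true"))     // stateless operations
                  .mapValues(String::toUpperCase)
                  .to("orders.valid");                                          // sink processor

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }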


  20. Processor topology
    ● Processors form a directed graph
    ● Processors as nodes
    ● Incoming edges are the operand streams
    ● Outgoing edges are the result stream(s)
    ● Source processors create a stream from a
    Kafka topic or other source
    ● Sink processors are the outputs of the Streams application and write to a
    Kafka topic
    (Diagram: a topology graph with source processors at the top and a sink processor at the bottom)


  21. Streams and Tables
    ● Kafka Streams has tables too!
    ● Stateless operations result in Streams
    ● Stateful operations can result in tables
    ○ Aggregation, Join, Windowing
    ● Can always turn a Table back into a Stream
    ● Tables can be interactively queried
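    A hedged sketch of a stateful operation (the store and topic names are invented):
    counting per key turns a stream into a KTable backed by a queryable state store,
    toStream() turns the table's changelog back into a stream, and the store can be read
    directly from the running application as an interactive query.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    public class CountsPerKey {
        static KafkaStreams build(StreamsBuilder builder, Properties props) {
            KStream<String, String> orders = builder.stream("orders");

            // Stateful operation: counting per key turns the stream into a table,
            // materialized in a local state store named "order-counts".
            KTable<String, Long> counts = orders
                    .groupByKey()
                    .count(Materialized.as("order-counts"));

            // A table can always be turned back into a stream (its changelog).
            counts.toStream().to("order-counts-changelog",
                    Produced.with(Serdes.String(), Serdes.Long()));

            return new KafkaStreams(builder.build(), props);
        }

        // Interactive query: read the table's store directly from the running application.
        static Long countFor(KafkaStreams streams, String key) {
            ReadOnlyKeyValueStore<String, Long> store =
                    streams.store("order-counts", QueryableStoreTypes.keyValueStore());
            return store.get(key);
        }
    }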


  22. Stream Processors (low-level API)
    ● Low-level API corresponds to a very generic, possibly stateful, processor
    within the topology
    ● Writing your own processor ⇒ creating a custom operator in the high-level
    API (DSL)
    ● Kafka Streams uses in-memory and RocksDB state stores to implement the
    higher level operations
    ● Custom state stores are also possible
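    A sketch of the low-level API (the processor, store, and topic names are invented;
    this uses the classic Processor interface that was current at the time of the talk,
    later revised in newer Streams versions): a stateful processor that counts records
    per key in a key-value store.

    import org.apache.kafka.streams.processor.AbstractProcessor;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;

    // Counts records per key using a state store named "counts" attached to this processor.
    public class CountingProcessor extends AbstractProcessor<String, String> {
        private KeyValueStore<String, Long> counts;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            super.init(context);
            counts = (KeyValueStore<String, Long>) context.getStateStore("counts");
        }

        @Override
        public void process(String key, String value) {
            Long current = counts.get(key);
            long updated = (current == null) ? 1L : current + 1;
            counts.put(key, updated);
            context().forward(key, updated);  // emit the updated count downstream
        }
    }

    In the DSL this would just be groupByKey().count(); writing it by hand also means
    wiring the processor and its store into the Topology via addSource, addProcessor,
    addStateStore, and addSink.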


  23. Streams DSL
    ● Operations on streams:
    ○ Stateless: Filter, Map,
    GroupBy etc
    ○ Stateful: Aggregation,
    Join, etc
    ● Applications can compose
    operations to perform
    computation
    (Diagram from the Kafka docs, not reproduced here)


  24. “Traditional” Microservices
    ● Synchronous microservices
    ● OrderService orchestrates processing of an order: reserveStock(),
    then takePayment(), then dispatchOrder()
    ● Exceptional flows for things like payment failure ⇒ unreserveStock()
    (Sequence diagram: createOrder() arrives at OrderService, which calls reserveStock() on StockService, takePayment() on PaymentService, then dispatchOrder() on DispatchService)


  25. Microservices: Orchestration
    ● Asynchronous microservices
    ● OrderService can still orchestrate
    ● Needs to watch for the replies explicitly
    ● OrderService will sit idle waiting for those replies
    (Sequence diagram: OrderService handles createOrder(), sends to reserve.stock and awaits stock.reservation, sends to take.payment and awaits payment.result, then sends to dispatch.order)


  26. Microservices: Choreography
    ● Asynchronous alternative: Choreography
    ● Services listen for specific triggering events and take action
    ● Ordering imposed by the event types
    ● Multiple components can respond to the same event, e.g. StockService
    could respond to a paymentFailure by restoring reserved stock
    (Sequence diagram: OrderService handles createOrder() and publishes order.created; StockService reacts and publishes stock.reserved; PaymentService reacts and publishes payment.result; DispatchService reacts to complete the order; see the sketch below)
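    A hedged sketch of the choreography style for one of the services (topic names follow
    the diagram above; group id, broker address, and the payload handling are simplified
    placeholders): StockService listens for order.created events and emits stock.reserved
    events, without knowing who produced them or who will consume them.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class StockService {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");  // placeholder
            consumerProps.put("group.id", "stock-service");
            consumerProps.put("key.deserializer", StringDeserializer.class.getName());
            consumerProps.put("value.deserializer", StringDeserializer.class.getName());

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092");  // placeholder
            producerProps.put("key.serializer", StringSerializer.class.getName());
            producerProps.put("value.serializer", StringSerializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                consumer.subscribe(Collections.singletonList("order.created"));
                while (true) {
                    for (ConsumerRecord<String, String> order : consumer.poll(Duration.ofSeconds(1))) {
                        // React to the triggering event: reserve stock, then emit a new event.
                        String orderId = order.key();
                        // ... reserve stock for orderId here (omitted) ...
                        producer.send(new ProducerRecord<>("stock.reserved", orderId, order.value()));
                    }
                }
            }
        }
    }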


  27. DEMO: Streams


  28. How is Kafka turning your architecture inside-out?


  29. Fin
    ● Thanks for listening
    ● Questions?
