
2019-04 Kafka on Kubernetes

Presented by: Marius Bogoevici
Video recording: https://youtu.be/CXy_T_rWcLE

Event-centric design and event-driven architecture are powerful tools for designing scalable distributed systems, capable of taking advantage of the agility and organizational efficiencies promised by microservices. In this presentation we will show you how to build such an architecture using Kafka and Kubernetes. Doing so requires a reliable and scalable messaging system (Kafka), a powerful programming model (Spring/Kafka Streams), and a platform where they can all run reliably and resiliently (Kubernetes).

In this presentation, you will see a demo-centric introduction to how these technologies complement each other and deliver a cohesive solution:

* how to run Kafka on Kubernetes using the Strimzi operator for Kafka;
* how to build microservices using Spring and Kafka Streams;
* how to bring them all together in complex data processing topologies on Kubernetes.

Toronto Java Users Group

April 25, 2019

Transcript

  1. Event-Driven Microservices with Kafka & Kubernetes Toronto Java User Group

    - April 25, 2019 Marius Bogoevici Principal Specialist Solutions Architect Red Hat [email protected] @mariusbogoevici
  2. Marius Bogoevici • Principal Specialist Solutions Architect at Red Hat

    ◦ Specialize in Integration/Messaging/Data Streaming • OSS contributor since 2008 ◦ Spring Integration ◦ Spring XD, Spring Integration Kafka ◦ Former Spring Cloud Stream project lead • Co-author “Spring Integration in Action”, Manning, 2012
  3. Still, exactly why microservices? Fast value delivery, meaning:
    fixes, new features, experiments, increased confidence
  4. Adopting microservices means dealing with the inherent complexity of distributed systems
  5. Request-reply vs. event-driven communication
    • Request-reply: synchronous & ephemeral ◦ low composability ◦ simplified model ◦ low tolerance to failure ◦ best practices evolved as REST
    • Event-driven: asynchronous and persistent ◦ decoupled ◦ highly composable ◦ complex model ◦ high tolerance to failure ◦ best practices are still evolving
  6. Event-driven architecture reduces friction
    • From a technical standpoint: building robust and resilient distributed architectures
    • From a development process standpoint: high composability encourages agility and experimentation
    • From a business standpoint: aligning digital business with the real world
  7. Why event-driven microservices?
    • Asynchronous communication patterns ◦ decoupling: logical, spatial, temporal
    • Enable eventual consistency across heterogeneous resources ◦ an alternative to distributed transactions
    • Composability via pub-sub integration
  8. Key challenges of event-driven microservices
    • Programming model ◦ requires higher abstractions than vendor-specific producer/consumer APIs ◦ requires higher-level DSLs than simple message handling
    • Messaging infrastructure ◦ large number of producers/consumers ◦ complex interaction patterns, especially for pub-sub
    • Complex operations ◦ scaling, elasticity, resiliency, etc.
  9. What is Apache Kafka? A publish/subscribe messaging system. A data streaming platform. A distributed, horizontally scalable, fault-tolerant commit log.
  10. Traditional messaging queue (producer → consumer)
    • Reference-count-based message retention model ◦ when a message is consumed, it is deleted from the broker
    • “Smart broker, dumb client” ◦ the broker knows about all consumers ◦ it can perform per-consumer filtering
  11. Apache Kafka topic (producer → consumer, ordered offsets)
    • Time-based message retention model by default ◦ messages are retained according to topic config (time or capacity) ◦ also “compacted topics” – like a “last-value topic”
    • “Dumb broker, smart client” ◦ the client maintains its own position in the message stream ◦ the message stream can be replayed
  12. Kafka concepts – high availability: leader and follower partitions are spread across the cluster (diagram: partitions T1-P1 … T2-P2 replicated across brokers 1–3)
  13. Kafka concepts – high availability: if the broker holding a leader partition goes down, a new leader is elected on a different node (diagram: partition leadership moving between brokers 1–3)
  14. Kafka concepts – clients interact with leaders: producers and consumers always talk to the current leader of each partition (diagram: producers P1–P2 and consumers C1–C3 connected to leader partitions across brokers 1–3)
  15. Consumer groups – partition assignment (diagram: a topic with partitions 0–3 distributed across the two consumers of group 1 and the three consumers of group 2)
  16. Consumer groups – rebalancing (diagram: the same four partitions reassigned among the consumers when group membership changes)
  17. Consumer groups – max parallelism & idle consumers (diagram: a group of five consumers on a four-partition topic, leaving one consumer idle)
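The assignment rule behind these three slides can be sketched in plain Java. This is a simplified illustration of the semantics, not Kafka's actual assignor implementation; the class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of consumer-group partition assignment:
// each partition is owned by exactly one consumer in the group,
// so consumers beyond the partition count sit idle.
public class PartitionAssignment {

    static Map<String, List<Integer>> assign(int partitions, List<String> consumers) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) {
            assignment.put(c, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            // round-robin style distribution across group members
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 4 partitions, 5 consumers in one group: max parallelism is 4,
        // so the fifth consumer receives no partitions (idle).
        Map<String, List<Integer>> a =
            assign(4, List.of("c1", "c2", "c3", "c4", "c5"));
        System.out.println(a); // {c1=[0], c2=[1], c3=[2], c4=[3], c5=[]}
    }
}
```

The same rule explains rebalancing: when group membership changes, the partitions are simply redistributed over the new member list.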
  18. Kafka vs. traditional messaging
    • Traditional messaging – advantage in: individual message exchanges (transactionality, acknowledgment, error handling/DLQs), P2P/competing consumer support ◦ strong support for queueing/competing consumers ◦ publish-subscribe support (with limitations) ◦ no replay support
    • Kafka – advantage in: long-term persistence, replay and late-coming subscribers, semantic partitioning, large publisher/subscriber imbalances ◦ weak support for individual message acknowledgment, P2P/competing consumers
  19. Containerization
    • Reduced overhead in running services
    • Higher density/utilization gains
    • Portable across deployment platforms
    • Rich ecosystem (see Kubernetes!)
  20. Kubernetes as a runtime platform
    • Kubernetes makes running complex topologies reliable, transparent and boring
    • Stateless and stateful workloads ◦ not only applications, but also messaging infrastructure
    • Built-in resource management ◦ memory, CPU, disk
    • Elastic scaling
    • Monitoring and failover ◦ health, logging, metrics
    • Routing and load balancing
    • Rolling upgrades and CI/CD
    • Namespacing
  21. Strimzi: provisioning Kafka on Kubernetes. What is Strimzi?
    • Open source project focused on running Apache Kafka on Kubernetes and OpenShift
    • Available as a part of Red Hat AMQ
    • Licensed under Apache License 2.0
    • Web site: http://strimzi.io/ • GitHub: https://github.com/strimzi • Slack: strimzi.slack.com • Mailing list: [email protected] • Twitter: @strimziio
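As an illustration of what deploying a cluster through Strimzi looks like, a minimal `Kafka` custom resource might resemble the following sketch. Field names follow the Strimzi CRDs, but the exact `apiVersion` and listener syntax depend on the Strimzi version installed:

```yaml
apiVersion: kafka.strimzi.io/v1beta1   # version-dependent; check your Strimzi release
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3                  # broker count
    storage:
      type: persistent-claim     # durable broker state survives pod restarts
      size: 100Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
```

Applying this resource with `kubectl apply` hands the rest (pods, services, storage) to the Cluster Operator.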
  22. Kafka on Kubernetes?
    • As more application workloads move to Kubernetes, it makes sense to bring Kafka to the same environment ◦ serve as the foundation for event-driven microservices ◦ benefit from Kubernetes' core strengths
    • However, Kafka is stateful, which requires ◦ a stable broker identity ◦ a way for the brokers to discover each other on the network ◦ durable broker state (i.e., the messages) ◦ the ability to recover broker state after a failure
    • Kubernetes primitives help, but it is still not easy
  23. Goals for Strimzi
    • Simplifying the deployment of Apache Kafka on Kubernetes
    • Using Kubernetes-native mechanisms for ◦ provisioning the cluster ◦ persistence ◦ ordering and identity ◦ managing topics and users
    • Providing better integration with applications running on Kubernetes ◦ microservices, data streaming, event sourcing, etc.
  24. StatefulSets and persistent volumes
    • Description ◦ provides an identity to each pod of the set that corresponds to that pod's persistent volume(s) ◦ if a StatefulSet pod is lost, a new pod with the same virtual identity is reinstated and the associated storage is reattached
    • Benefits ◦ alleviates complex, state-related problems ◦ automates a manual process ◦ makes it easy to run stateful applications at scale
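For reference, the shape of a StatefulSet that gives each broker pod a stable identity and its own persistent volume looks roughly like this. It is an illustrative sketch with a placeholder image; in practice Strimzi generates the real manifest for you:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-cluster-kafka
spec:
  serviceName: my-cluster-kafka      # headless service: stable per-pod DNS names
  replicas: 3                        # stable identities: ...-kafka-0, -1, -2
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: kafka-image:tag     # placeholder image for the sketch
  volumeClaimTemplates:              # one PersistentVolumeClaim per pod,
    - metadata:                      # reattached to the same identity after failure
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
```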
  25. The operator pattern
    • An application used to create, configure and manage other complex applications ◦ contains domain-specific operational knowledge
    • The operator works based on input from Custom Resource Definitions (CRDs) ◦ the user describes the desired state ◦ the controller applies this state to the application
    • It watches both the *desired* state and the *actual* state and makes forward progress to reconcile them (observe → analyze → act)
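The observe → analyze → act loop can be sketched as a toy reconcile function. The class and its names are illustrative, not part of any real operator framework:

```java
// Toy reconcile step in the observe → analyze → act loop: compare the
// desired state (declared in a custom resource) with the observed state
// and pick the action that moves the system toward the desired state.
public class ReconcileSketch {

    enum Action { NONE, SCALE_UP, SCALE_DOWN }

    static Action reconcile(int desiredReplicas, int actualReplicas) {
        if (actualReplicas < desiredReplicas) return Action.SCALE_UP;
        if (actualReplicas > desiredReplicas) return Action.SCALE_DOWN;
        return Action.NONE; // observed state already matches the resource
    }

    public static void main(String[] args) {
        // A Kafka CR asks for 3 brokers but only 2 are running: scale up.
        System.out.println(reconcile(3, 2)); // SCALE_UP
        System.out.println(reconcile(3, 3)); // NONE
    }
}
```

A real operator runs this step continuously, so the cluster converges back to the declared state after any failure.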
  26. Strimzi operators (diagram): the Cluster Operator watches the Kafka CR and deploys & manages the Kafka and ZooKeeper clusters; the Topic Operator and User Operator watch the Topic and User CRs and manage topics & users
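A declaratively managed topic, as handled by the Topic Operator, might look like the following illustrative `KafkaTopic` resource (the `strimzi.io/cluster` label ties it to a cluster named `my-cluster`; the config keys are standard Kafka topic settings, and the `apiVersion` again depends on the Strimzi release):

```yaml
apiVersion: kafka.strimzi.io/v1beta1   # version-dependent
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster     # ties the topic to the Kafka cluster CR
spec:
  partitions: 4
  replicas: 3
  config:
    retention.ms: 604800000            # time-based retention (7 days)
    # cleanup.policy: compact          # alternative: a compacted "last-value" topic
```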
  27. Kafka Streams overview
    • Client library for stream processing ◦ embed stream processing features into regular Java applications ◦ create sophisticated topologies of independent applications ◦ one-record-at-a-time processing (no microbatching)
    • Kafka-to-Kafka semantics ◦ event/state management coordination ◦ stateful processing support ◦ transactions/exactly-once
    (diagram: an application embedding Kafka Streams, exchanging events and state with the Kafka cluster)
  28. Kafka Streams – high-level functional DSL

    KStream<String, String> words = builder.stream("words");
    KTable<Windowed<String>, Long> countsTable = words
        .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
        .map((key, value) -> new KeyValue<>(value, value))
        .groupByKey(Serdes.String(), Serdes.String())
        .count(timeWindows, "WordCounts");
    KStream<Windowed<String>, Long> counts = countsTable.toStream();
    counts.to("counts");
  29. Key Kafka Streams abstractions
    • KStream ◦ record stream abstraction ◦ read from/written to an external topic as-is
    • KTable/GlobalKTable ◦ key/value map abstraction ◦ read from/written to a topic as a sequence of updates based on the record key ◦ complex operations: joins, aggregations
    • Stream/table duality ◦ KStream → KTable: read a stream as a changelog centered around the key ◦ KTable → KStream: table updates are produced as a stream
    • Time windowing for aggregate operations
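The stream/table duality described above can be sketched without any Kafka dependency: folding a changelog of (key, value) records into a latest-value-per-key map is exactly the KStream → KTable direction. This is a simplified illustration, not Kafka Streams code:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stream/table duality in miniature: replaying a changelog stream and
// keeping only the latest value per key materializes the table
// (KStream -> KTable); each table update is itself a stream record.
public class StreamTableDuality {

    static Map<String, Long> toTable(List<Map.Entry<String, Long>> changelog) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Map.Entry<String, Long> update : changelog) {
            table.put(update.getKey(), update.getValue()); // last value wins
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> changelog = List.of(
            Map.entry("alice", 1L), Map.entry("bob", 1L), Map.entry("alice", 2L));
        // The table reflects only the most recent value for "alice".
        System.out.println(toTable(changelog)); // {alice=2, bob=1}
    }
}
```

The reverse direction falls out for free: every `put` into the table is one record of the update stream.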
  30. Kafka Streams on Kubernetes (diagram: several application containers, each embedding Kafka Streams, exchanging events and changelog records with the Kafka cluster)
  31. Kafka Streams: stateful and stateless deployments
    • State changes are propagated to a changelog topic in Kafka and also stored locally (in-memory state store plus local disk) for recovery/restart
    • Fully stateless deployments require replaying the changelog topic on restart/failover
    • State store recovery can be optimized by giving stateful deployments access to their persisted local state
  32. Kafka Streams with Kubernetes StatefulSets (diagram: pods word-count-0/1/2, each with its own persistent volume volume-word-count-0/1/2 reattached across restarts)
  33. Back to the future – related projects
    • Istio – https://istio.io/
    • Camel K – https://github.com/apache/camel-k
    • Knative – https://github.com/knative/
    • Quarkus – https://quarkus.io/
    • Debezium – https://debezium.io/