Cloud Native Apache Kafka with Strimzi on Kubernetes and Openshift

This presentation gives a brief introduction to Apache Kafka and then dives into how to run and manage an Apache Kafka cluster on OpenShift and Kubernetes.

The Strimzi open-source project implements a Kubernetes Operator for managing various aspects of Apache Kafka on OpenShift and Kubernetes:
* Cluster configuration
* Topic management
* User management

Here is a gist to get started with minikube and Strimzi:
https://gist.github.com/matzew/a5efcaa60eedeb910711becaa1534e01

Matthias Wessendorf

October 11, 2018

Transcript

  1. What is Apache Kafka?
    • A publish/subscribe messaging system?
    • A streaming data platform?
    • A distributed, horizontally scalable, fault-tolerant commit log?
  2. Apache Kafka Concepts
    • Messages are sent to and received from a topic
      ◦ Topics are split into one or more partitions (aka shards)
      ◦ All actual work is done at the partition level; the topic is just a virtual object
    • Each message is written to exactly one selected partition
      ◦ Partitioning is usually done based on the message key
      ◦ Message ordering within a partition is fixed
    • Retention
      ◦ Based on size / message age
      ◦ Compacted based on message key
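The two key-based ideas on this slide (partition selection and compaction) can be sketched in a few lines of Python. This is only an illustration, not Kafka's implementation: `choose_partition` and `compact` are hypothetical helper names, and CRC32 is a stand-in hash chosen because it ships with the standard library, whereas Kafka's own default partitioner uses a different hash function.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index.

    CRC32 is an assumption made for this sketch; the point is only that
    the same key always lands in the same partition.
    """
    return zlib.crc32(key) % num_partitions


def compact(log):
    """Keep only the newest record for each key, preserving log order."""
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if latest_offset[key] == offset
    ]


# Route a few keyed messages into three in-memory "partitions".
partitions = [[] for _ in range(3)]
for key, value in [(b"user-1", "a"), (b"user-2", "b"), (b"user-1", "c")]:
    partitions[choose_partition(key, len(partitions))].append((key, value))

# Compacting a single partition log drops the superseded value for user-1.
compacted = compact([(b"user-1", "a"), (b"user-2", "b"), (b"user-1", "c")])
```

Because both messages keyed `user-1` hash to the same partition, their relative order is preserved there, which is what makes per-key compaction meaningful.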
  3. Kafka concepts: Topics & partitions with Producers
    [Diagram: a Producer writing to Partition 0, Partition 1, and Partition 2 of a topic; each partition is drawn as a row of numbered offsets running from old to new.]
  4. Kafka concepts: Topics & partitions with Consumers
    [Diagram: a Consumer reading from Partition 0, Partition 1, and Partition 2 of a topic; each partition is drawn as a row of numbered offsets running from old to new.]
  5. Kafka concepts: Consumer Groups
    [Diagram: a cluster with Broker 1 (T1-P1, T1-P2) and Broker 2 (T1-P3, T1-P4), consumed by Group1 (C1, C2) and Group2 (C3, C4, C5, C6).]
    • Logical grouping of consumers
      ◦ The group receives the message...
      ◦ Consumer might have a partition assigned
    • Separate scaling of groups
      ◦ Scaling on use-case…
        ▪ Non-time-sensitive (down)
        ▪ Time-sensitive (up)
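As a rough illustration of the consumer group idea, the sketch below distributes the four partitions from the diagram across the consumers of each group. The `assign` helper is hypothetical and uses plain round-robin purely as an assumption; Kafka's actual assignment strategy is configurable and more involved.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["T1-P1", "T1-P2", "T1-P3", "T1-P4"]

# Each group independently covers all partitions of the topic.
group1 = assign(partitions, ["C1", "C2"])              # scaled down
group2 = assign(partitions, ["C3", "C4", "C5", "C6"])  # scaled up

# With more consumers than partitions, the extra consumer stays idle.
oversized = assign(partitions, ["A", "B", "C", "D", "E"])
```

The oversized group shows why a consumer only "might" have a partition assigned: adding consumers beyond the partition count does not add parallelism.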
  6. Kafka concepts: High availability
    [Diagram: Broker 1, Broker 2, and Broker 3 each hold a replica of T1-P1, T1-P2, T2-P1, and T2-P2.]
    Leaders and followers are spread across the cluster
  7. Kafka concepts: High availability
    [Diagram: Broker 1, Broker 2, and Broker 3 each hold a replica of T1-P1, T1-P2, T2-P1, and T2-P2.]
    If a broker holding a leader partition goes down, a new leader partition is elected on a different node
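The failover behaviour can be sketched as follows. This is a deliberately simplified model: the initial leader placement is invented for the example, `elect_leaders` is a hypothetical helper, and "pick the first surviving replica" is an assumption standing in for Kafka's real election, which is coordinated by the cluster and considers which replicas are in sync.

```python
def elect_leaders(replicas, leaders, failed_broker):
    """Return the leader map after failed_broker goes down.

    Partitions led by a surviving broker keep their leader; the others
    fall back to the first surviving replica (simplifying assumption).
    """
    new_leaders = {}
    for partition, leader in leaders.items():
        if leader != failed_broker:
            new_leaders[partition] = leader
            continue
        survivors = [b for b in replicas[partition] if b != failed_broker]
        new_leaders[partition] = survivors[0] if survivors else None
    return new_leaders


# Every partition is replicated on all three brokers, as in the diagram.
replicas = {p: [1, 2, 3] for p in ["T1-P1", "T1-P2", "T2-P1", "T2-P2"]}
# Assumed initial leader placement, spread across the brokers.
leaders = {"T1-P1": 1, "T1-P2": 2, "T2-P1": 3, "T2-P2": 1}

after_failure = elect_leaders(replicas, leaders, failed_broker=1)
```

Only the partitions that were led by the failed broker change leader; the rest are untouched.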
  8. Apache Kafka on Kubernetes & OpenShift: The challenges
    • Apache Kafka is *stateful*, which means we require …
      ◦ … a stable broker identity
      ◦ … a way for the brokers to discover each other on the network
      ◦ … durable broker state (i.e., the messages)
      ◦ … the ability to recover broker state after a failure
    • All the above are true for Apache Zookeeper as well
    • StatefulSets, PersistentVolumeClaims, Services can help but …
  9. Apache Kafka on Kubernetes & OpenShift: What is Strimzi?
    • Open source project focused on running Apache Kafka on Kubernetes and OpenShift
    • Provides:
      ◦ Docker images for running Apache Kafka and Zookeeper
      ◦ Tooling for managing and configuring Apache Kafka clusters and topics
    • Follows the Kubernetes “operator” model
    • Licensed under Apache License 2.0
    • Web site: http://strimzi.io/
    • GitHub: https://github.com/strimzi
    • Slack: strimzi.slack.com
    • Mailing list: [email protected]
    • Twitter: @strimziio
  10. Strimzi on Kubernetes & OpenShift: Goals
    • Simplifying the Apache Kafka deployment on OpenShift/k8s
    • Using the OpenShift / kube-native mechanisms for...
      ◦ Provisioning the cluster
      ◦ Managing the topics
    • … thereby removing the need to use Kafka command-line tools
    • Providing a better integration with applications running on OpenShift/k8s
      ◦ microservices, data streaming, event-sourcing, etc.
  11. Strimzi on Kubernetes & OpenShift: The “Operator” model
    • An application used to create, configure and manage other complex applications
      ◦ Contains specific domain / application knowledge
    • Controller operates based on input from Config Maps or Custom Resource Definitions
      ◦ User describes the desired state
      ◦ Controller applies this state to the application
    • It watches the *desired* state and the *actual* state …
      ◦ … taking appropriate actions
    [Diagram: Observe, Analyze, Act]
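The Observe / Analyze / Act cycle can be sketched as a small reconcile function. Everything here is hypothetical and in-memory: plain dictionaries stand in for the desired state (what the user declared) and for the actual state of the managed application, and no Kubernetes or Strimzi API is involved.

```python
def observe(cluster):
    """Take a snapshot of the actual state."""
    return dict(cluster)


def analyze(desired, actual):
    """Compare desired and actual state and list the actions needed."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


def act(cluster, actions):
    """Apply the actions so the actual state converges on the desired state."""
    for operation, name, spec in actions:
        if operation == "delete":
            del cluster[name]
        else:
            cluster[name] = spec


def reconcile(desired, cluster):
    actions = analyze(desired, observe(cluster))
    act(cluster, actions)
    return actions


desired = {"my-topic": {"partitions": 3}}
cluster = {"my-topic": {"partitions": 1}, "old-topic": {"partitions": 1}}

first_pass = reconcile(desired, cluster)   # updates my-topic, deletes old-topic
second_pass = reconcile(desired, cluster)  # nothing left to do
```

A real operator runs this cycle continuously, reacting to changes on either side; the second pass returning no actions shows that the loop is idempotent once the states match.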
  12. Strimzi on Kubernetes & OpenShift: Config Map versus Custom Resource Definitions
    • OLD version: used Config Maps for configuration...
      ◦ Main advantage of Config Maps is no need for special permissions to install Strimzi/AMQ Streams on OpenShift
    • However, CRDs have some advantages as well
      ◦ Flexible data structure
      ◦ Possibility to set permissions for the CRD resources
  13. Topic Operator: Creating and managing Kafka topics
    [Diagram: the Topic Operator performs a 3-way diff between the Topic CR, the Kafka topics, and Zookeeper (the Topic Operator’s own storage).]
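The slide only names the technique ("3-way diff"), so the following is a generic sketch of a three-way merge rather than the Topic Operator's actual logic. It assumes the operator's stored copy acts as the common base, compares the Topic CR and the Kafka topic against it key by key, and merely reports conflicts instead of resolving them. `three_way_merge` and the sample settings are hypothetical.

```python
def three_way_merge(base, cr, kafka):
    """Merge changes made on the CR side and the Kafka side since `base`.

    Returns (merged, conflicts). A missing key is treated as None, so a
    deletion on one side is handled like any other change.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(cr) | set(kafka)):
        b, c, k = base.get(key), cr.get(key), kafka.get(key)
        if c == k:        # both sides agree (changed alike or unchanged)
            value = c
        elif c == b:      # only the Kafka side changed
            value = k
        elif k == b:      # only the CR side changed
            value = c
        else:             # both changed differently: keep base, report it
            conflicts.append(key)
            value = b
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"partitions": 1, "retention.ms": "1000"}
topic_cr = {"partitions": 3, "retention.ms": "1000"}     # user edited the CR
kafka_topic = {"partitions": 1, "retention.ms": "2000"}  # changed in Kafka

merged, conflicts = three_way_merge(base, topic_cr, kafka_topic)
```

Keeping a base copy is what lets the merge tell "the CR changed" apart from "Kafka changed"; a plain two-way comparison could only see that the two differ.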
  14. Outlook: A rich ecosystem for Apache Kafka
    • Kafka-CDI / reactive Messaging (MP)
    • Eclipse Vert.x
      ◦ Reactive wrappers for Apache Kafka client API
    • Debezium.io: CDC platform
      ◦ Not just Apache Kafka! Gunnar knows more
      ◦ Contains KafkaCluster for unit tests
    • AMQP-Kafka-Bridge (Strimzi)
    • ...
  15. Resources
    • Strimzi: http://strimzi.io/
    • Apache Kafka: https://kafka.apache.org/
    • Demo: https://github.com/matzew/kafka-presentation