Distributed Streaming with Apache Kafka in Kubernetes
Aykut M. Bulgu - @systemcraftsman
Middleware Consultant - Red Hat
What is Apache Kafka?
- Open-sourced in 2011
- Distributed by design
- High throughput; designed to be fast, scalable, durable, and highly available
- Data partitioning (sharding)
- Ability to handle a huge number of consumers
Traditional message brokers:
- Time- or count-based message retention model
- When a message is consumed, it is deleted from the broker
- "Smart broker, dumb client": the broker knows about all consumers and can perform per-consumer filtering
Kafka, in contrast:
- Time-based message retention model by default; messages are retained according to topic configuration (time or capacity)
- Also "compacted topics" – like a "last-value topic"
- "Dumb broker, smart client": the client maintains its position in the message stream, and the stream can be replayed
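As an illustration, retention and compaction are controlled through topic-level configuration; the property names below are real Kafka topic configs, while the values are just example choices:

```properties
# Delete messages older than 7 days (time-based retention)
retention.ms=604800000
# ...or cap each partition's size at 1 GiB (capacity-based retention)
retention.bytes=1073741824
# Alternatively, keep only the latest record per key: a compacted "last-value" topic
cleanup.policy=compact
```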
High scale, high throughput, built-in partitioning, replication, and fault tolerance, with some limitations compared to traditional brokers (filtering, standard protocols, JMS …). Common use cases:
- Website Activity Tracking: rebuild user activity tracking pipelines as a set of real-time publish-subscribe feeds; activity is published to central topics, one topic per activity type
- Metrics: aggregation of statistics from distributed applications to produce centralized feeds of operational data
- Log Aggregation: abstracts away the details of files and presents event data as a stream of messages; offers good performance and stronger durability guarantees thanks to replication
- Stream Processing: enables continuous, real-time applications built to react to, process, or transform streams
- Data Integration: captures streams of events or data changes and feeds them to other data systems (see the Debezium project)
Kubernetes
- Abstracts the underlying hardware in terms of "nodes"
- On the nodes, a set of different "resources" can be deployed and managed
- Containerized applications are deployed, using and sharing those resources
What Kubernetes provides:
- Scaling: scale containers up and down
- Persistence: survive data beyond the container lifecycle
- Aggregation: compose apps from multiple containers
- Scheduling: decide where to deploy containers
- Lifecycle and health: keep containers running despite failures
- Discovery: find other containers on the network
- Monitoring: visibility into running containers
Accessing Kafka isn't so simple. Kafka needs:
- A stable broker identity and stable network address
- A way for brokers to discover each other and communicate
- Durable state on brokers and storage recovery
- Brokers accessible from clients, directly
It runs alongside a ZooKeeper ensemble, which requires:
- Each node to have the configuration of the others
- Nodes to be able to communicate with each other
Kubernetes primitives help, but it is still not easy:
- StatefulSets for stable identity and network
- Together with headless Services for internal discovery
- Services for accessing the cluster
- Secrets and ConfigMaps for handling configuration
- PersistentVolumes and PersistentVolumeClaims for durable storage
It is still hard to deploy and manage Kafka on Kubernetes...
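For example, internal broker discovery is typically handled with a headless Service, which gives each broker pod a stable DNS name instead of a load-balanced virtual IP. A minimal sketch (names and labels are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka-brokers        # hypothetical name
spec:
  clusterIP: None            # "headless": no virtual IP, DNS resolves to individual pods
  selector:
    app: kafka               # hypothetical pod label
  ports:
    - name: clients
      port: 9092
```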
Operators: applications that deploy and manage other complex applications
- Contain domain-specific operational knowledge
- Work based on input from Custom Resource Definitions (CRDs): the user describes the desired state, and a controller applies this state to the application
- Watch the *desired* state and the *actual* state and make forward progress to reconcile them (observe, analyze, act)
- See OperatorHub.io for available operators
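The observe/analyze/act cycle can be sketched as a reconcile function. This is not any real operator framework's API, just a minimal illustration of the idea, with hypothetical resource names:

```python
def reconcile(desired, actual):
    """Compare desired state (from custom resources) with actual cluster
    state and return the actions needed to converge them."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec))   # resource missing: create it
        elif actual[name] != spec:
            actions.append(("update", name, spec))   # resource drifted: update it
    for name in actual:
        if name not in desired:
            actions.append(("delete", name, None))   # no longer desired: delete it
    return actions
```

A real controller runs this in a loop, re-observing the cluster after each pass until the two states match.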
Strimzi: an open-source project licensed under Apache License 2.0
- Focuses on running Apache Kafka on Kubernetes and OpenShift:
- Container images for Apache Kafka and Apache ZooKeeper
- Operators for managing and configuring Kafka clusters, topics, or users
- Provides a Kubernetes-native experience for running Kafka on Kubernetes and OpenShift: Kafka cluster, topic, and user as Kubernetes custom resources
Part of the Red Hat AMQ suite:
- AMQ Streams on OCP: running Apache Kafka on OpenShift Container Platform, based on the Strimzi project
- AMQ Streams on RHEL: running Apache Kafka on "bare metal"
Cluster Operator
- Deploys and manages Kafka clusters, Kafka Connect, and ZooKeeper
- Also deploys other operators: Topic Operator, User Operator
- The only component the user has to install on their own
- Uses CRDs as blueprints for the clusters it deploys and manages
- CRDs act as extensions to the Kubernetes API and can be used similarly to native resources: oc get kafkas or kubectl get kafkas
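A Kafka custom resource handled by the Cluster Operator looks roughly like the sketch below. The exact apiVersion and available fields depend on the Strimzi release; sizes and the cluster name are example values:

```yaml
apiVersion: kafka.strimzi.io/v1beta2   # varies by Strimzi version
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: persistent-claim           # durable storage via PVCs
      size: 100Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
  entityOperator:                      # asks the Cluster Operator to also
    topicOperator: {}                  # deploy the Topic and User Operators
    userOperator: {}
```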
- Runs as a regular Deployment on Kubernetes
- Configuration options are passed as environment variables
- Installation requirements: ServiceAccount, RBAC resources, CRD definitions
- Should always run as a single replica
Topic Operator
- Using CRDs, users can just do: oc get kafkatopics or kubectl get kafkatopics
- Installation: one Topic Operator per Kafka cluster
- Users are expected to install the Topic Operator through the Cluster Operator; standalone installation is available and supported
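A KafkaTopic custom resource managed by the Topic Operator can be sketched as follows (apiVersion depends on the Strimzi release; names and values are examples). The strimzi.io/cluster label ties the topic to its Kafka cluster:

```yaml
apiVersion: kafka.strimzi.io/v1beta2   # varies by Strimzi version
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster     # the Kafka cluster this topic belongs to
spec:
  partitions: 3
  replicas: 2
  config:
    retention.ms: 604800000            # keep messages for 7 days
```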
User Operator
- Using CRDs, users can just do: oc get kafkausers or kubectl get kafkausers
- Installation: one User Operator per Kafka cluster
- Users are expected to install the User Operator through the Cluster Operator; standalone installation is available and supported
Authentication
- Currently supports TLS Client Authentication and SASL SCRAM-SHA-512
- The KafkaUser CR requests TLS Client Authentication; the User Operator issues a TLS certificate and stores it in a Secret
Authorization
- Currently supports Kafka's built-in SimpleAclAuthorizer
- The KafkaUser CR lists the desired ACL rights; the User Operator updates them in ZooKeeper
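Putting both together, a KafkaUser custom resource might look like the sketch below (apiVersion depends on the Strimzi release; user, topic, and cluster names are examples):

```yaml
apiVersion: kafka.strimzi.io/v1beta2   # varies by Strimzi version
kind: KafkaUser
metadata:
  name: my-user
  labels:
    strimzi.io/cluster: my-cluster     # the Kafka cluster this user belongs to
spec:
  authentication:
    type: tls                          # User Operator issues a cert into a Secret
  authorization:
    type: simple                       # Kafka's built-in ACL authorizer
    acls:
      - resource:
          type: topic
          name: my-topic
        operation: Read
      - resource:
          type: topic
          name: my-topic
        operation: Write
```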