Slide 1

1 Distributed Streaming with Apache Kafka in Kubernetes. Distributed streaming made easy in the cloud. Aykut M. Bulgu - @systemcraftsman, Middleware Consultant - Red Hat

Slide 2

2 Who am I?

#oc apply -f aykutbulgu.yaml
apiVersion: redhat/v1.1
kind: Middleware Consultant
metadata:
  name: Aykut Bulgu
  namespace: Red Hat Consulting EMEA
  annotations:
    twitter: @systemcraftsman
    organizer: Software Craftsmanship Turkey
    founder: System Craftsman
  labels:
    married: yes
    children: [daughter]
    interests: openshift, kubernetes, spring boot, middleware, infinispan, kafka, strimzi
spec:
  replicas: 1
  containers:
    - image: aykut:latest

@systemcraftsman

Slide 3

3 What we'll be discussing today: Messaging Types & Kafka, Kubernetes, Some Challenges, Strimzi (Operators, Accessing Kafka), Demo @systemcraftsman

Slide 4

Chapter One: Messaging 4

Slide 5

5 Messaging ≠ Messaging: Low-latency pub/sub, Cross-cloud backbone, Temporal decoupling, Load levelling, Load balancing, Enterprise application integration, IoT device connectivity, Message-driven beans, Event-driven microservices, Long-term message storage, Replayable streams, Event sourcing, Geo-aware routing, Database change data capture @systemcraftsman

Slide 6

Messaging Technologies 6 @systemcraftsman

Slide 7

What is Apache Kafka? 7 A publish/subscribe messaging system. A data streaming platform. A distributed, horizontally scalable, fault-tolerant commit log. @systemcraftsman

Slide 8

8 What is Apache Kafka? Developed at LinkedIn back in 2010, open sourced in 2011. Distributed by design. High throughput. Designed to be fast, scalable, durable and highly available. Data partitioning (sharding). Ability to handle a huge number of consumers. @systemcraftsman

Slide 9

Traditional Messaging 9 [Diagram: Producer -> Queue (messages 1, 2, 3) -> Consumer] Reference-count-based message retention model: when a message is consumed, it is deleted from the broker. "Smart broker, dumb client": the broker knows about all consumers and can perform per-consumer filtering. @systemcraftsman

Slide 10

Apache Kafka 10 [Diagram: Producer -> Kafka Topic (messages 1, 2, 3 retained) -> Consumer] Time-based message retention model by default: messages are retained according to topic config (time or capacity). There is also the "compacted topic", which behaves like a "last-value topic". "Dumb broker, smart client": the client maintains its position in the message stream, and the message stream can be replayed. @systemcraftsman
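
To make retention and compaction concrete: both are ordinary topic-level configuration entries. A minimal sketch, with illustrative values that are not taken from the slides:

# Illustrative Kafka topic-level settings (values are examples)
retention.ms: 604800000        # time-based retention: keep messages for roughly 7 days
retention.bytes: 1073741824    # and/or cap the retained data per partition (capacity)
cleanup.policy: compact        # "compacted topic": keep only the latest record per key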

Slide 11

Kafka Concepts - Producers 11 [Diagram: a topic with three partitions (Partition 0, 1, 2), each an append-only log of numbered offsets running from old to new; the producer appends new records to the end of a partition] @systemcraftsman

Slide 12

Kafka Concepts - Consumers 12 [Diagram: the same three-partition topic; the consumer reads each partition in order, from old to new] @systemcraftsman

Slide 13

Kafka Concepts - High Availability 13 [Diagram: three brokers, each holding replicas of partitions T1-P1, T1-P2, T2-P1 and T2-P2] Leaders and followers spread across the cluster @systemcraftsman

Slide 14

Kafka Concepts - High Availability 14 [Diagram: the same three-broker, replicated layout] If a broker hosting a leader partition goes down, a new leader is elected on a different node. @systemcraftsman

Slide 15

Kafka Concepts - Interaction with Leaders 15 [Diagram: producers P1, P2 and consumers C1, C2, C3 each connect to the broker hosting the leader of the partition they use] @systemcraftsman

Slide 16

Kafka Use Cases 16 Messaging: replacement of a traditional message broker; high scale, high throughput, built-in partitioning, replication, and fault tolerance; some limitations compared to traditional brokers (filtering, standard protocols, JMS …). Website Activity Tracker: rebuild the user activity tracking pipeline as a set of real-time publish-subscribe feeds; activity is published to central topics with one topic per activity type. Metrics: aggregation of statistics from distributed applications to produce centralized feeds of operational data. Log Aggregation: abstracts away the details of files and gives event data as a stream of messages; offers good performance and stronger durability guarantees due to replication. Stream Processing: enables continuous, real-time applications built to react to, process, or transform streams. Data Integration: captures streams of events or data changes and feeds these to other data systems (see the Debezium project). @systemcraftsman

Slide 17

Chapter Two: Kubernetes 17

Slide 18

18 Kubernetes is an open-source system for automating deployment, operations, and scaling of containerized applications across multiple hosts. kubernetes @systemcraftsman

Slide 19

19 Kubernetes: Comes from Google's experience with the "Borg" project. Abstracts the underlying hardware in terms of "nodes". On the nodes, a set of different "resources" can be deployed and handled. Containerized applications are deployed, using and sharing "resources". kubernetes @systemcraftsman

Slide 20

20 Scheduling: decide where to deploy containers. Lifecycle and health: keep containers running despite failures. Discovery: find other containers on the network. Monitoring: visibility into running containers. Security: control who can do what. Scaling: scale containers up and down. Persistence: survive data beyond the container lifecycle. Aggregation: compose apps from multiple containers. kubernetes @systemcraftsman
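
Several of these concerns show up directly as fields on a single resource. A minimal sketch of a Deployment, assuming a hypothetical application called my-app and a placeholder image (neither appears in the slides):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                     # hypothetical example application
spec:
  replicas: 3                      # scaling: run three copies
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: quay.io/example/my-app:latest   # placeholder image
          resources:
            requests:
              cpu: 100m            # scheduling: the scheduler decides where this fits
              memory: 128Mi
          livenessProbe:           # lifecycle and health: restart the container on failure
            httpGet:
              path: /healthz       # assumed health endpoint
              port: 8080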

Slide 21

21 OpenShift: an open-source, enterprise Kubernetes platform based on Docker and Kubernetes for building, distributing and running containers at scale @systemcraftsman

Slide 22

22 Routing & Load Balancing Multi-tenancy CI/CD Pipelines Role-based Authorization Capacity Management Infrastructure Visibility Chargeback Vulnerability Scanning Container Isolation Image Build Automation Quota Management Teams and Collaboration @systemcraftsman

Slide 23

23 Now we know about Kafka, and we know about Kubernetes. How can these two work together? There are some challenges... @systemcraftsman

Slide 24

Chapter Three: Challenges 24

Slide 25

Challenges 25 A Kafka cluster requires: a stable broker identity and a stable network address; a way for brokers to discover each other and communicate; durable state on brokers and storage recovery; brokers that are directly accessible from clients. It runs alongside a Zookeeper ensemble, which requires: each node to have the configuration of the others; nodes that are able to communicate with each other. Accessing Kafka isn't so simple @systemcraftsman

Slide 26

How Kubernetes Can Help 26 Kubernetes provides: StatefulSets for stable identity and networking, together with headless Services for internal discovery; Services for accessing the cluster; Secrets and ConfigMaps for handling configuration; PersistentVolumes and PersistentVolumeClaims for durable storage. Kubernetes primitives help, but it is still hard to deploy and manage Kafka on Kubernetes... @systemcraftsman
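
A minimal sketch of the underlying pattern, assuming placeholder names (kafka-headless, the image, the mount path) rather than anything Strimzi actually generates: a headless Service gives each StatefulSet pod a stable DNS name such as kafka-0.kafka-headless, and a volumeClaimTemplate gives each broker its own durable volume.

apiVersion: v1
kind: Service
metadata:
  name: kafka-headless             # placeholder name
spec:
  clusterIP: None                  # headless: per-pod DNS instead of a single virtual IP
  selector:
    app: kafka
  ports:
    - name: clients
      port: 9092
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka-headless      # pods get stable identities: kafka-0, kafka-1, kafka-2
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: quay.io/example/kafka:latest    # placeholder image
          ports:
            - containerPort: 9092
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka          # assumed data directory
  volumeClaimTemplates:            # durable storage: one PersistentVolumeClaim per broker
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi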

Slide 27

Operator Framework 27 An application used to create, configure and manage other complex applications. Contains domain-specific knowledge. An operator works based on input from Custom Resource Definitions (CRDs): the user describes the desired state, and the controller applies this state to the application. It watches the *desired* state and the *actual* state and makes forward progress to reconcile the two (observe, analyze, act). OperatorHub.io @systemcraftsman
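
For orientation, this is roughly what a CRD looks like; a trimmed, illustrative sketch only (Strimzi ships its own, much richer CRD files). Registering the resource is what later makes commands like kubectl get kafkas possible.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: kafkas.kafka.strimzi.io    # illustrative; use the CRDs shipped with the operator
spec:
  group: kafka.strimzi.io
  scope: Namespaced
  names:
    kind: Kafka
    plural: kafkas                 # enables: kubectl get kafkas
    singular: kafka
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true   # real CRDs define a full schema here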

Slide 28

Chapter Four: Strimzi 28 Operators Accessing Kafka

Slide 29

Operators 29

Slide 30

Strimzi Project 30 @systemcraftsman

Slide 31

Strimzi - The open-source Apache Kafka Operator 31 Open source project licensed under Apache License 2.0. Focuses on running Apache Kafka on Kubernetes and OpenShift: container images for Apache Kafka and Apache Zookeeper, and operators for managing and configuring Kafka clusters, topics or users. Provides a Kubernetes-native experience for running Kafka on Kubernetes and OpenShift: Kafka cluster, topic and user as Kubernetes custom resources. @systemcraftsman

Slide 32

Red Hat AMQ Streams - Apache Kafka for the Enterprise 32 Part of the Red Hat AMQ suite. AMQ Streams on OCP: running Apache Kafka on OpenShift Container Platform, based on the Strimzi project. AMQ Streams on RHEL: running Apache Kafka on "bare metal". @systemcraftsman

Slide 33

Strimzi Operators 33 [Diagram: the Cluster Operator watches the Kafka CR and deploys & manages the Kafka and Zookeeper cluster; it also deploys the Topic Operator and User Operator, which watch Topic and User CRs and manage topics & users] @systemcraftsman

Slide 34

Cluster Operator 34 Responsible for deploying and managing clusters: Kafka, Kafka Connect, Zookeeper. Also deploys the other operators: Topic Operator, User Operator. The only component which the user has to install on their own. Uses CRDs as blueprints for the clusters it deploys and manages; the CRDs act as extensions to the Kubernetes API and can be used similarly to native resources … oc get kafkas or kubectl get kafkas @systemcraftsman

Slide 35

Cluster Operator 35 Installation: runs as a Deployment inside Kubernetes; configuration options are passed as environment variables. Installation requirements: Service Account, RBAC resources, CRD definitions. Should always run as a single replica. @systemcraftsman
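
As an illustrative fragment (the real Deployment comes with the Strimzi install files; the image tag and values below are examples), the single-replica operator Deployment carries its configuration as environment variables:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: strimzi-cluster-operator
spec:
  replicas: 1                      # single replica, as noted above
  selector:
    matchLabels:
      name: strimzi-cluster-operator
  template:
    metadata:
      labels:
        name: strimzi-cluster-operator
    spec:
      serviceAccountName: strimzi-cluster-operator    # needs its ServiceAccount and RBAC
      containers:
        - name: strimzi-cluster-operator
          image: strimzi/cluster-operator:latest      # illustrative image reference
          env:
            - name: STRIMZI_NAMESPACE                 # namespace(s) the operator watches
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: STRIMZI_FULL_RECONCILIATION_INTERVAL_MS   # periodic full reconciliation
              value: "120000"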

Slide 36

Cluster Operator 36 Deploying Kafka: using a Kafka CR, which configures Kafka, Zookeeper and the other operators. Minimal options: number of replicas, storage, listeners. Other options available as well.

apiVersion: kafka.strimzi.io/v1alpha1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      plain: {}
      tls: {}
    storage:
      type: persistent-claim
      size: 1Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 1Gi
  topicOperator: {}

@systemcraftsman
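
Applying this resource is all it takes, for example with oc apply -f kafka.yaml (the file name is arbitrary). The Cluster Operator reacts to the new Kafka resource and creates the Kafka and Zookeeper pods along with the supporting Kubernetes resources (Services, Secrets, ConfigMaps and so on), and oc get kafkas then lists the cluster like any native resource.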

Slide 37

Topic Operator 37 Manages Kafka topics: bi-directional synchronization and 3-way diff, using CRDs. Users can just do … oc get kafkatopics or kubectl get kafkatopics. Installation: one Topic Operator per Kafka cluster; users are expected to install the Topic Operator through the Cluster Operator; standalone installation is available and supported. @systemcraftsman

Slide 38

Topic Operator 38 Managing Kafka topics: using a KafkaTopic CR, with a label defining the Kafka cluster. Minimal options: number of partitions, replication factor.

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 1
  replicas: 1
  config:
    retention.ms: 7200000
    segment.bytes: 1073741824

@systemcraftsman
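
Applying this resource (for example oc apply -f topic.yaml, file name arbitrary) makes the Topic Operator create the topic inside Kafka with the given partitions, replicas and config. Because the synchronization is bi-directional, a topic created directly against Kafka also shows up as a KafkaTopic resource, so oc get kafkatopics reflects both.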

Slide 39

User Operator 39 Manages authentication and authorization, using CRDs. Users can just do … oc get kafkausers or kubectl get kafkausers. Installation: one User Operator per Kafka cluster; users are expected to install the User Operator through the Cluster Operator; standalone installation is available and supported. @systemcraftsman

Slide 40

User Operator 40 Authentication: currently supports TLS Client Authentication and SASL SCRAM-SHA-512. The KafkaUser CR requests TLS Client Authentication; the User Operator issues a TLS certificate and stores it in a Secret. Authorization: currently supports Kafka's built-in SimpleAclAuthorizer. The KafkaUser CR lists the desired ACL rights; the User Operator updates them in Zookeeper. @systemcraftsman

Slide 41

User Operator 41 Managing Kafka users: using a KafkaUser CR, with a label defining the Kafka cluster, an authentication configuration and an authorization configuration.

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaUser
metadata:
  name: my-user
  labels:
    strimzi.io/cluster: my-cluster
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operation: Read
        host: "*"
      - resource:
          # ...

@systemcraftsman
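
Once this user exists, the User Operator issues the client certificate and stores it in a Secret (in Strimzi's convention the Secret is named after the user, my-user here); an application can mount or read that Secret to configure TLS client authentication, while the ACL list above is applied on the Kafka side.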

Slide 42

Accessing Kafka 42

Slide 43

Kafka’s Discovery Protocol 43 Broker 1 T1 - P1 T1 - P2 T2 - P1 T2 - P2 Broker 2 T1 - P1 T1 - P2 T2 - P1 T2 - P2 Broker 3 T1 - P1 T1 - P2 T2 - P1 T2 - P2 Producer P2 Consumer C3 Consumer C1 Producer P1 Consumer C2 @systemcraftsman

Slide 44

Kafka’s Discovery Protocol 44 @systemcraftsman

Slide 45

Kubernetes Cluster Internal Access 45 @systemcraftsman

Slide 46

Kubernetes Cluster External Access 46 @systemcraftsman
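
The internal and external access patterns on these two slides map onto listener configuration in the Kafka CR. A sketch under the same v1alpha1 syntax used earlier in this deck; the route type below is one option for external access (loadbalancer and nodeport are others), and the exact fields depend on the Strimzi version:

apiVersion: kafka.strimzi.io/v1alpha1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      plain: {}              # internal, plaintext (port 9092 inside the cluster)
      tls: {}                # internal, TLS (port 9093 inside the cluster)
      external:              # external access for clients outside Kubernetes/OpenShift
        type: route          # here via OpenShift Routes; alternatives: loadbalancer, nodeport
    storage:
      type: persistent-claim
      size: 1Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 1Gi
  topicOperator: {}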

Slide 47

Features 47 Tolerations Memory and CPU resources High Availability Mirroring Affinity Authentication Storage Encryption Scale Down JVM Configuration Logging Metrics Off cluster access Scale Up Authorization Healthchecks Source2Image Configuration @systemcraftsman

Slide 48

Chapter Five: Demo 48

Slide 49

Resources 49 Strimzi : https://strimzi.io/ OperatorHub.io : https://www.operatorhub.io/ Apache Kafka : https://kafka.apache.org/ Kubernetes : https://kubernetes.io/ OpenShift : https://www.openshift.com/ Operator framework : https://github.com/operator-framework Demo : https://github.com/systemcraftsman/strimzi-demo @systemcraftsman

Slide 50

Thank You 50 @systemcraftsman aykut@systemcraftsman.com aykut@redhat.com