Slide 1

Slide 1 text

Cloud Native Apache Kafka with Strimzi.io
Matthias Wessendorf, Principal Software Engineer, @mwessendorf

Slide 2

Slide 2 text

What is Apache Kafka?
A publish/subscribe messaging system?
A streaming data platform?
A distributed, horizontally scalable, fault-tolerant commit log?

Slide 3

Slide 3 text

DEMO : WebSocket to Kafka

Slide 4

Slide 4 text

Apache Kafka Concepts
• Messages are sent to and received from a topic
  • Topics are split into one or more partitions (aka shards)
  • All actual work is done at the partition level; the topic is just a virtual object
• Each message is written to exactly one selected partition
  • Partitioning is usually done based on the message key
  • Message ordering within the partition is fixed
• Retention
  • Based on size / message age
  • Compacted based on message key
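
Compaction based on the message key can be illustrated with a minimal sketch. It assumes a partition modelled as an in-memory list of (offset, key, value) records; the `compact` function and the sample keys are hypothetical and are not part of any Kafka API. Real brokers compact log segments in the background, so this only shows the end result: the latest record per key survives.

```python
from typing import Dict, List, Optional, Tuple

# (offset, key, value); an illustrative in-memory model of one partition
Record = Tuple[int, str, Optional[str]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the most recent record for each key, in offset order."""
    latest: Dict[str, Record] = {}
    for offset, key, value in log:
        # A later record with the same key replaces the earlier one.
        latest[key] = (offset, key, value)
    # Offsets are unique, so sorting restores the original log order.
    return sorted(latest.values())


log = [
    (0, "user-1", "a"),
    (1, "user-2", "b"),
    (2, "user-1", "c"),
    (3, "user-3", "d"),
    (4, "user-2", "e"),
]
compacted = compact(log)
```

Offsets of the surviving records are unchanged, so the ordering within the partition stays fixed after compaction.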

Slide 5

Slide 5 text

Kafka concepts
Topics & partitions with Producers
[Diagram: a Producer appends messages to Partition 0 (offsets 0 to 11), Partition 1 (offsets 0 to 6), and Partition 2 (offsets 0 to 10); offsets in each partition run from old to new.]
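
How a producer picks a partition from the message key can be sketched as follows. This is a simplified model, not the Kafka client: the `Topic` class is hypothetical, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to the key.

```python
import zlib
from typing import List, Tuple


class Topic:
    """Illustrative in-memory topic: one append-only list per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[bytes, bytes]]] = [
            [] for _ in range(num_partitions)
        ]

    def send(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Append a message and return (partition, offset)."""
        # Assumption: crc32 replaces Kafka's murmur2 key hash.
        partition = zlib.crc32(key) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset


topic = Topic(num_partitions=3)
first = topic.send(b"sensor-42", b"21.5")
second = topic.send(b"sensor-42", b"21.7")
```

Because the partition depends only on the key, both messages for `sensor-42` land in the same partition, with the second at the next offset.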

Slide 6

Slide 6 text

Kafka concepts
Topics & partitions with Consumers
[Diagram: a Consumer reads messages from Partition 0 (offsets 0 to 11), Partition 1 (offsets 0 to 6), and Partition 2 (offsets 0 to 10); offsets in each partition run from old to new.]

Slide 7

Slide 7 text

Kafka concepts
Consumer Groups
[Diagram: a Cluster with Broker 1 (T1-P1, T1-P2) and Broker 2 (T1-P3, T1-P4); Group1 contains consumers C1 and C2, Group2 contains consumers C3, C4, C5, and C6.]
• Logical grouping of consumers
  • The group receives the message...
  • Consumer might have a partition assigned
• Separate scaling of groups
  • Scaling on use-case…
    ■ Non-time-sensitive (down)
    ■ Time-sensitive (up)
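
The partition assignment inside a consumer group can be sketched as a simple round-robin. This is an assumption for illustration only: Kafka's actual assignors (range, round-robin, sticky) are configurable and more involved, and the `assign` function below is hypothetical. Partition and consumer names follow the diagram; the five-member group is made up to show an idle consumer.

```python
from typing import Dict, List


def assign(partitions: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Give every partition to exactly one consumer of the group."""
    assignment: Dict[str, List[str]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["T1-P1", "T1-P2", "T1-P3", "T1-P4"]

# Each group receives every message, so each group gets all partitions.
group1 = assign(partitions, ["C1", "C2"])
group2 = assign(partitions, ["C3", "C4", "C5", "C6"])

# Hypothetical group with more consumers than partitions: one stays idle.
oversized = assign(partitions, ["A", "B", "C", "D", "E"])
```

In the scaled-down Group1 each consumer handles two partitions, in the scaled-up Group2 each handles one, and in the oversized group consumer `E` has no partition assigned.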

Slide 8

Slide 8 text

Kafka concepts
High availability
[Diagram: Broker 1, Broker 2, and Broker 3 each hold a replica of T1-P1, T1-P2, T2-P1, and T2-P2.]
Leaders and followers spread across the cluster

Slide 9

Slide 9 text

Kafka concepts
High availability
[Diagram: Broker 1, Broker 2, and Broker 3 each hold a replica of T1-P1, T1-P2, T2-P1, and T2-P2.]
If a broker with a leader partition goes down, a new leader partition is elected on a different node
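
The failover described here can be sketched in a few lines. The sketch assumes that every replica is fully in sync and that the first surviving replica in the list becomes the new leader; the `elect_leaders` function and the replica placement are hypothetical, and real Kafka elects leaders only from the in-sync replica set.

```python
from typing import Dict, List, Optional


def elect_leaders(
    replicas: Dict[str, List[int]],
    leaders: Dict[str, int],
    failed_broker: int,
) -> Dict[str, Optional[int]]:
    """Return the leader of each partition after one broker has failed."""
    new_leaders: Dict[str, Optional[int]] = {}
    for partition, leader in leaders.items():
        if leader != failed_broker:
            # Leader is still alive: nothing changes for this partition.
            new_leaders[partition] = leader
            continue
        survivors = [b for b in replicas[partition] if b != failed_broker]
        # Assumption: the first surviving replica takes over.
        new_leaders[partition] = survivors[0] if survivors else None
    return new_leaders


# Illustrative placement: every partition is replicated on brokers 1, 2 and 3.
replicas = {
    "T1-P1": [1, 2, 3],
    "T1-P2": [2, 3, 1],
    "T2-P1": [3, 1, 2],
    "T2-P2": [1, 3, 2],
}
leaders = {partition: brokers[0] for partition, brokers in replicas.items()}
after_failure = elect_leaders(replicas, leaders, failed_broker=1)
```

Only the partitions led by the failed broker change leader; the others keep serving from their current leader.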

Slide 10

Slide 10 text

DEMO : WebSocket behind the scenes
The Java API

Slide 11

Slide 11 text

Apache Kafka on Kubernetes & OpenShift
The challenges
● Apache Kafka is *stateful*, which means we require …
  ○ … a stable broker identity
  ○ … a way for the brokers to discover each other on the network
  ○ … durable broker state (i.e., the messages)
  ○ … the ability to recover broker state after a failure
● All the above are true for Apache Zookeeper as well
● StatefulSets, PersistentVolumeClaims, Services can help but …

Slide 12

Slide 12 text

It’s not easy!

Slide 13

Slide 13 text

Apache Kafka on Kubernetes & OpenShift
What is Strimzi?
● Open source project focused on running Apache Kafka on Kubernetes and OpenShift
● Provides:
  ○ Docker images for running Apache Kafka and Zookeeper
  ○ Tooling for managing and configuring Apache Kafka clusters and topics
● Follows the Kubernetes “operator” model
● Licensed under Apache License 2.0
● Web site: http://strimzi.io/
● GitHub: https://github.com/strimzi
● Slack: strimzi.slack.com
● Mailing list: [email protected]
● Twitter: @strimziio

Slide 14

Slide 14 text

Strimzi on Kubernetes & OpenShift
Goals
● Simplifying the Apache Kafka deployment on OpenShift/k8s
● Using the OpenShift / kube-native mechanisms for...
  ○ Provisioning the cluster
  ○ Managing the topics
● … thereby removing the need to use Kafka command-line tools
● Providing a better integration with applications running on OpenShift/k8s
  ○ microservices, data streaming, event-sourcing, etc.

Slide 15

Slide 15 text

Strimzi on Kubernetes & OpenShift
The “Operator” model
● An application used to create, configure and manage other complex applications
  ○ Contains specific domain / application knowledge
● Controller operates based on input from Config Maps or Custom Resource Definitions
  ○ User describes the desired state
  ○ Controller applies this state to the application
● It watches the *desired* state and the *actual* state …
  ○ … taking appropriate actions
[Diagram: control loop of Observe, Analyze, Act]
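
The observe / analyze / act cycle can be sketched as a single reconciliation step. This is a toy model under stated assumptions: state is a plain dictionary mapping a resource name to a replica count, and the `reconcile` function and all resource names are hypothetical. It does not use the Kubernetes API or any Strimzi code.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[str]:
    """Bring `actual` in line with `desired` and report the actions taken."""
    actions: List[str] = []
    # Analyze: compare what the user asked for with what is running.
    for name, replicas in desired.items():
        if name not in actual:
            actions.append(f"create {name} with {replicas} replicas")
        elif actual[name] != replicas:
            actions.append(f"scale {name} from {actual[name]} to {replicas}")
        # Act: apply the desired value (a no-op when already equal).
        actual[name] = replicas
    # Anything running that is no longer described gets removed.
    for name in list(actual):
        if name not in desired:
            actions.append(f"delete {name}")
            del actual[name]
    return actions


# Hypothetical desired state (from the user) and observed actual state.
desired = {"my-cluster-kafka": 3, "my-cluster-zookeeper": 3}
actual = {"my-cluster-kafka": 1, "old-cluster-kafka": 3}

first_pass = reconcile(desired, actual)
second_pass = reconcile(desired, actual)
```

The second pass returns no actions, which is the property a controller relies on: once desired and actual state match, repeating the loop changes nothing.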

Slide 16

Slide 16 text

Strimzi on Kubernetes & OpenShift
Config Map versus Custom Resource Definitions
● OLD version: used Config Maps for configuration...
  ○ Main advantage of Config Maps is no need for special permissions to install Strimzi/AMQ Streams on OpenShift
● However, CRDs have some advantages as well
  ○ Flexible data structure
  ○ Possibility to set permissions for the CRD resources

Slide 17

Slide 17 text

Cluster Operator
Creating and managing Apache Kafka clusters
[Diagram: the Cluster Operator reads the Cluster CR and manages Zookeeper and Kafka.]

Slide 18

Slide 18 text

DEMO : CLUSTER DEPLOYMENT (using minikube)

Slide 19

Slide 19 text

Topic Operator
Creating and managing Kafka topics
[Diagram: Zookeeper, Kafka, Topic Operator, and Topic CR; the Topic Operator manages topics.]

Slide 20

Slide 20 text

Topic Operator
Creating and managing Kafka topics
[Diagram: the Topic Operator performs a 3-way diff between the Topic CR, the Kafka topics, and its own storage in Zookeeper.]
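
A 3-way diff of this kind can be sketched on plain configuration dictionaries. The sketch assumes that the operator's own storage holds the state from the last reconciliation (`base`) and that conflicting changes are rejected; the slides do not say how Strimzi resolves conflicts, so raising an error is an assumption, and `three_way_merge` is a hypothetical helper rather than Strimzi code.

```python
from typing import Dict

Config = Dict[str, str]


def three_way_merge(base: Config, cr: Config, kafka: Config) -> Config:
    """Merge changes made to the Topic CR and to the Kafka topic since `base`."""
    merged: Config = {}
    for key in sorted(set(base) | set(cr) | set(kafka)):
        b, c, k = base.get(key), cr.get(key), kafka.get(key)
        if c == k:
            value = c  # both sides agree
        elif c == b:
            value = k  # only the Kafka side changed
        elif k == b:
            value = c  # only the CR side changed
        else:
            # Assumption: conflicting edits are reported, not resolved.
            raise ValueError(f"conflicting change for {key!r}: {c!r} vs {k!r}")
        if value is not None:  # None means the key was removed
            merged[key] = value
    return merged


base = {"partitions": "3", "retention.ms": "3600000"}
cr = {"partitions": "6", "retention.ms": "3600000"}     # edited in Kubernetes
kafka = {"partitions": "3", "retention.ms": "7200000"}  # edited in Kafka

merged = three_way_merge(base, cr, kafka)
```

Keeping the last known state is what makes it possible to tell which side changed; with only the CR and the Kafka topic, a difference between them would be ambiguous.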

Slide 21

Slide 21 text

DEMO : TOPICS MANAGEMENT

Slide 22

Slide 22 text

Outlook
A rich ecosystem for Apache Kafka
● Kafka-CDI / reactive Messaging (MP)
● Eclipse Vert.x
  ○ Reactive wrappers for Apache Kafka client API
● Debezium.io: CDC platform
  ○ Not just Apache Kafka! Gunnar knows more
  ○ Contains KafkaCluster for unit tests
● AMQP-Kafka-Bridge (Strimzi)
● ...

Slide 23

Slide 23 text

Resources
● Strimzi: http://strimzi.io/
● Apache Kafka: https://kafka.apache.org/
● Demo: https://github.com/matzew/kafka-presentation

Slide 24

Slide 24 text

THANK YOU
plus.google.com/+RedHat
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHatNews