Slide 1

Slide 1 text

Kafka on Kubernetes with Strimzi
Ken (Kenju Wagatsuma), Senior SRE
September 2021

Slide 2

Slide 2 text

Background / History

Slide 3

Slide 3 text

[May 2018~] Confluent Cloud
Introduced Personal Feed
● Replaced a MySQL table with Apache Kafka
● Started using Confluent Cloud (managed service)
● Easy to set up, easy to maintain

Slide 4

Slide 4 text

[Jan 2021~] More Focus on Feed
Business features will depend much more on Feed
● Resiliency+
● Performance+
● Flexibility+

Slide 5

Slide 5 text

[Jan 2021~] Confluent Cloud Deprecation
“Action required: Migrate your Standard Legacy cluster(s) prior to May 31st, 2021” (notice from the Confluent Cloud team)
● Hard deadline for migrating the old cluster elsewhere
● Downtime is inevitable either way
● The pricing plan will change

Slide 6

Slide 6 text

Our Experiences with Confluent Cloud
The Kafka cluster is a “black box”
● No access to broker metrics/configuration: lack of observability
● Cluster upgrades just happen: less flexibility on versioning
● Some limitations, unless you pay for the Enterprise plan

Slide 7

Slide 7 text

Solution

Slide 8

Slide 8 text

Self-Hosting Kafka on Kubernetes (AWS EKS)

Slide 9

Slide 9 text

Self-Hosting Kafka on Kubernetes
Why do we need self-hosting?
● More flexibility and observability are achievable
● Theoretically no limitations
● Scalability with more optimized solutions

Slide 10

Slide 10 text

Self-Hosting Kafka on Kubernetes
Why on Kubernetes?
● The community is growing and best practices are becoming widespread
● Kubernetes professionals at Cookpad Global
● The ecosystem has a lot of library options

Slide 11

Slide 11 text

Kubernetes Operator
… is an architectural pattern for deploying and operating applications
● Writing full YAML manifests for all Kafka components by hand is hard
  ○ Kafka/Zookeeper/Connect/MirrorMaker/Bridge/RBAC/...
● Application-specific requirements are complex
  ○ Cluster configuration
  ○ Networking among processes
  ○ Distributing mTLS and client certificates
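The operator pattern boils down to a reconcile loop: compare the declared spec with what is actually running, then act to close the gap. The sketch below shows only that diff-and-act step in Python. It is not Strimzi's code (Strimzi is written in Java); the `DesiredSpec`/`ObservedState` types, the `<cluster>-kafka-<i>` pod naming and the action strings are hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class DesiredSpec:
    # Hypothetical, heavily simplified stand-in for a custom resource spec.
    replicas: int
    config_version: str


@dataclass
class ObservedState:
    # Pod name -> config version that pod is currently running with.
    pods: dict[str, str] = field(default_factory=dict)


def reconcile(cluster: str, desired: DesiredSpec, observed: ObservedState) -> list[str]:
    """Return the actions needed to move `observed` towards `desired`."""
    actions: list[str] = []
    wanted = [f"{cluster}-kafka-{i}" for i in range(desired.replicas)]

    # Scale up: create pods that should exist but do not.
    for name in wanted:
        if name not in observed.pods:
            actions.append(f"create {name}")

    # Scale down: delete pods beyond the desired replica count.
    for name in sorted(observed.pods):
        if name not in wanted:
            actions.append(f"delete {name}")

    # Restart existing pods that still run a stale configuration.
    for name in wanted:
        version = observed.pods.get(name)
        if version is not None and version != desired.config_version:
            actions.append(f"restart {name}")
    return actions


if __name__ == "__main__":
    observed = ObservedState(pods={"my-cluster-kafka-0": "v1", "my-cluster-kafka-1": "v2"})
    print(reconcile("my-cluster", DesiredSpec(replicas=3, config_version="v2"), observed))
    # ['create my-cluster-kafka-2', 'restart my-cluster-kafka-0']
```

A real operator also watches the API server for changes, updates the resource status and handles certificates and networking; this sketch only illustrates why encoding that logic once is easier than hand-maintaining YAML.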

Slide 12

Slide 12 text

Kubernetes Operators for Kafka
● Strimzi Operator: OSS, Red Hat, CNCF Sandbox
● Confluent Operator: Confluent
● KOperator: Banzai Cloud
● KUDO Operator: KUDO
● Writing our own Kafka operator: Cookpad
● ...

Slide 13

Slide 13 text

Strimzi Operator
… is an OSS-driven, production-ready Kafka operator
● CNCF Sandbox project since Aug. 2019
● Core contributors from Red Hat
● The ecosystem and community are growing
● Strimzi Survey 2020: 40% of the 40+ users surveyed are using it in production

Slide 14

Slide 14 text

3 Strimzi Operators
● Cluster Operator
● User Operator
● Topic Operator
Kafka Components
● Kafka Brokers
● Zookeeper Nodes
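To illustrate how these pieces are declared, the sketch below builds a minimal Strimzi `Kafka` custom resource as a Python dict and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. It assumes the `kafka.strimzi.io/v1beta2` API; the cluster name, replica counts, listener and disk size are placeholders, not our production values. The Cluster Operator acts on this resource, and the empty `topicOperator`/`userOperator` entries under `entityOperator` ask it to deploy the other two operators.

```python
import json


def kafka_manifest(name: str, brokers: int = 3, zk_nodes: int = 3, disk: str = "100Gi") -> dict:
    """Build a minimal Strimzi `Kafka` custom resource (kafka.strimzi.io/v1beta2)."""
    persistent = {"type": "persistent-claim", "size": disk, "deleteClaim": False}
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "Kafka",
        "metadata": {"name": name},
        "spec": {
            # Kafka brokers, managed by the Cluster Operator.
            "kafka": {
                "replicas": brokers,
                "listeners": [
                    # Internal TLS listener that requires mTLS client certificates.
                    {"name": "tls", "port": 9093, "type": "internal", "tls": True,
                     "authentication": {"type": "tls"}},
                ],
                # ACL-based authorization with Kafka's built-in authorizer.
                "authorization": {"type": "simple"},
                "config": {
                    "offsets.topic.replication.factor": min(3, brokers),
                    "transaction.state.log.replication.factor": min(3, brokers),
                },
                "storage": dict(persistent),
            },
            # ZooKeeper ensemble, also managed by the Cluster Operator.
            "zookeeper": {"replicas": zk_nodes, "storage": dict(persistent)},
            # Empty objects ask the Cluster Operator to deploy the Topic and User Operators.
            "entityOperator": {"topicOperator": {}, "userOperator": {}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(kafka_manifest("my-cluster"), indent=2))
```

Generating the manifest from code is only one option; most setups keep the same structure as a YAML file in version control.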

Slide 15

Slide 15 text

Strimzi Operator is Production-Ready
… supports
● MirrorMaker 2.0/Kafka Connect/Kafka Bridge
● Observability (JMX Exporter integration)
● Rolling updates
● Automatic disk resizing
● mTLS
● RBAC
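A rolling update is only safe if restarting a broker never drops a partition below `min.insync.replicas`. The sketch below models that kind of availability check in simplified form; it is in the spirit of what an operator's rolling logic has to verify, not Strimzi's actual implementation, and the `Partition` type and example topology are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    topic: str
    index: int
    isr: frozenset[int]  # broker ids currently in the in-sync replica set
    min_isr: int  # effective min.insync.replicas for the topic


def can_restart(broker: int, partitions: list[Partition]) -> bool:
    """True if taking `broker` down keeps every partition at or above min.insync.replicas."""
    for p in partitions:
        if broker in p.isr and len(p.isr) - 1 < p.min_isr:
            return False
    return True


def safe_to_restart(brokers: list[int], partitions: list[Partition]) -> list[int]:
    """Brokers that could be restarted right now (re-check after each restart)."""
    return [b for b in brokers if can_restart(b, partitions)]


if __name__ == "__main__":
    parts = [
        Partition("feed", 0, frozenset({0, 1, 2}), 2),
        Partition("feed", 1, frozenset({1, 2}), 2),  # broker 0's replica has fallen behind
    ]
    print(safe_to_restart([0, 1, 2], parts))  # [0]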

Slide 16

Slide 16 text

Next Challenges

Slide 17

Slide 17 text

Next Challenges
More operational knowledge/experience within the SRE team
● Stable rolling updates
● Disaster recovery (backup/restore)
● Improved developer experience
  ○ Topic + user management

Slide 18

Slide 18 text

Thank you
Kenju Wagatsuma, September 2021

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

For those who’d like to know more about the Strimzi platform: the following slides are for reference only.

Slide 21

Slide 21 text

Glossary of Terms

Slide 22

Slide 22 text

● broker: a single Kafka server; a cluster is composed of several brokers
● commit: the action of recording the current offset position for each partition
● consumer group: a logical group which ensures that each partition is consumed by only one member
● controller: one of the brokers, responsible for electing partition leaders
● consumer: an application or process which reads messages from a broker
● message: the unit of data, internally an array of bytes
● offset: an integer marking a message's position within a partition; consumers use it to track which message to read next
● partition: a single append-only log; the partitions of a topic can be hosted on different servers
● producer: an application or process which writes messages to a broker
● topic: a collection of partitions, comparable to a “table” in an RDBMS or a “folder” in a file system
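The consumer-group rule in the glossary (each partition is consumed by only one member) can be illustrated with a round-robin assignment. This is a simplified single-topic sketch in the spirit of Kafka's round-robin assignor, not the client library's implementation; the member names are hypothetical.

```python
def assign_round_robin(members: list[str], partitions: int) -> dict[str, list[int]]:
    """Give each partition of one topic to exactly one group member, round-robin."""
    if not members:
        raise ValueError("a consumer group needs at least one member")
    # Sort so that every member computes the same, deterministic result.
    ordered = sorted(members)
    assignment: dict[str, list[int]] = {m: [] for m in ordered}
    for p in range(partitions):
        assignment[ordered[p % len(ordered)]].append(p)
    return assignment


if __name__ == "__main__":
    print(assign_round_robin(["c2", "c1", "c3"], 7))
    # {'c1': [0, 3, 6], 'c2': [1, 4], 'c3': [2, 5]}
```

If a group has more members than partitions, the extra members receive no partitions and stay idle, which is why the partition count caps a group's parallelism.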

Slide 23

Slide 23 text

FAQ

Slide 24

Slide 24 text

Q. Alternative designs?
A. We also considered the following designs:
a. Upgrade Confluent Cloud to the Standard plan
b. AWS MSK
c. Confluent Operator
d. Writing an operator ourselves

Slide 25

Slide 25 text

Q. Do you have authorization?
A. Yes. Kafka supports authorization using ACLs: you can limit who (users/machines) can perform which operations (read/write/admin) on which resources (topics/consumer groups/cluster).
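The who/what/which model can be sketched as a small authorizer. This is a simplified Python model of how Kafka's ACL authorizer evaluates rules (a matching DENY wins over any ALLOW, and a request with no matching ALLOW is denied), not its actual implementation; the `Acl` type, principals and topic names are hypothetical, and host matching, operations that imply DESCRIBE, and the `allow.everyone.if.no.acl.found` setting are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    principal: str  # who, e.g. "User:feed-writer" ("User:*" matches anyone)
    operation: str  # what, e.g. "Read", "Write", or "All"
    resource_type: str  # which kind of resource, e.g. "Topic", "Group", "Cluster"
    resource_name: str  # literal name, "*" for any, or a prefix
    pattern: str = "LITERAL"  # "LITERAL" or "PREFIXED"
    permission: str = "Allow"  # "Allow" or "Deny"

    def matches(self, principal: str, operation: str, resource_type: str, name: str) -> bool:
        if self.principal not in (principal, "User:*"):
            return False
        if self.operation not in (operation, "All") or self.resource_type != resource_type:
            return False
        if self.pattern == "PREFIXED":
            return name.startswith(self.resource_name)
        return self.resource_name in (name, "*")


def authorize(acls: list[Acl], principal: str, operation: str, resource_type: str, name: str) -> bool:
    """Deny wins over allow; no matching allow means deny."""
    matching = [a for a in acls if a.matches(principal, operation, resource_type, name)]
    if any(a.permission == "Deny" for a in matching):
        return False
    return any(a.permission == "Allow" for a in matching)


if __name__ == "__main__":
    acls = [
        Acl("User:feed-writer", "Write", "Topic", "feed-", pattern="PREFIXED"),
        Acl("User:feed-writer", "Write", "Topic", "feed-audit", permission="Deny"),
    ]
    print(authorize(acls, "User:feed-writer", "Write", "Topic", "feed-events"))  # True
    print(authorize(acls, "User:feed-writer", "Write", "Topic", "feed-audit"))  # False
    print(authorize(acls, "User:feed-writer", "Read", "Topic", "feed-events"))  # False
```

With Strimzi, such rules are declared in a `KafkaUser` resource and applied to the cluster by the User Operator.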

Slide 26

Slide 26 text

Q. Downsides of the Strimzi Operator?
A. The Strimzi Operator is another abstraction layer:
● Written in Java: no Java professionals within the team (yet)
● Software can have bugs
● Strimzi has to keep supporting the latest version of Kafka
  ○ … the Strimzi team has been doing a great job so far :)

Slide 27

Slide 27 text

Diagrams

Slide 28

Slide 28 text

Networking: N+1 Load Balancers (one per broker, plus one bootstrap)
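Why N+1: a Kafka client uses the bootstrap address only for its first connection. The metadata it receives lists every broker's advertised address, and by default the client then talks directly to the leader broker of each partition, so each broker needs its own externally reachable address. The sketch below models that lookup; the `Metadata` type, broker ids, hostnames and port are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    # Broker id -> advertised address (one load balancer per broker).
    brokers: dict[int, str]
    # (topic, partition) -> id of the broker currently leading that partition.
    leaders: dict[tuple[str, int], int]


def load_balancers_needed(broker_count: int) -> int:
    """One load balancer per broker plus one shared bootstrap load balancer."""
    return broker_count + 1


def route(metadata: Metadata, topic: str, partition: int) -> str:
    """Address the client connects to for producing to / consuming from a partition."""
    leader = metadata.leaders[(topic, partition)]
    return metadata.brokers[leader]


if __name__ == "__main__":
    meta = Metadata(
        brokers={0: "lb-0.example.internal:9094", 1: "lb-1.example.internal:9094",
                 2: "lb-2.example.internal:9094"},
        leaders={("feed", 0): 2, ("feed", 1): 0},
    )
    print(route(meta, "feed", 0))  # lb-2.example.internal:9094
    print(load_balancers_needed(len(meta.brokers)))  # 4
```

A single shared load balancer in front of all brokers would not work here, because it could forward a request to a broker that does not lead the requested partition.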