Kafka on Kubernetes with Strimzi

How we manage an Apache Kafka cluster on Kubernetes with the Strimzi Operator

Ken Wagatsuma

September 19, 2021


  1. Kafka on Kubernetes with Strimzi
    Ken (Kenju Wagatsuma), Senior SRE
    September 2021
  2. Background / History

  3. [May 2018~] Confluent Cloud
    Introduced Personal Feed
    • Replaced a MySQL table with Apache Kafka
    • Started using Confluent Cloud (managed service)
    • Easy to set up, easy to maintain
  4. [Jan 2021~] More Focus on Feed
    Business features will depend much more on Feed
    • Resiliency+
    • Performance+
    • Flexibility+
  5. [Jan 2021~] Confluent Cloud Deprecation
    “Action required: Migrate your Standard Legacy cluster(s) prior to May 31st, 2021” from the Confluent Cloud team
    • Hard deadline for migrating the old cluster elsewhere
    • Downtime is inevitable either way
    • The pricing plan will change
  6. Our Experiences with Confluent Cloud
    The Kafka cluster is a “black box”
    • No access to broker metrics/configuration - lack of observability
    • Cluster upgrades just happen - less flexibility on versioning
    • Some limitations, unless we pay for the Enterprise plan
  7. Solution

  8. Self-Hosting Kafka on Kubernetes (AWS EKS)

  9. Self-Hosting Kafka on Kubernetes
    Why do we need self-hosting?
    • More flexibility and observability are achievable
    • Theoretically no limitations
    • Scalability with more optimized solutions
  10. Self-Hosting Kafka on Kubernetes
    Why on Kubernetes?
    • The community is growing, and best practices are spreading widely
    • Kubernetes professionals at Cookpad Global
    • The ecosystem has many library options
  11. Kubernetes Operator
    … is an architectural pattern for deploying and operating applications
    • Writing full YAML files for all Kafka components is hard
      ◦ Kafka/Zookeeper/Connect/MirrorMaker/Bridge/RBAC/...
    • Application-specific requirements are complex
      ◦ Cluster configuration
      ◦ Networking among processes
      ◦ Distributing mTLS and client certificates
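At its core, the operator pattern is a control loop: read the desired state from a custom resource, observe the actual state, and act on the difference. The toy below sketches one reconciliation pass in plain Python, assuming a hypothetical `ClusterState` that only tracks broker IDs. It is not how Strimzi is implemented (Strimzi is written in Java and talks to the Kubernetes API); it only makes the "desired vs. observed" idea concrete.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Hypothetical, simplified observed state: just the running broker IDs."""
    brokers: set[int] = field(default_factory=set)


def reconcile(desired_replicas: int, observed: ClusterState) -> list[str]:
    """Diff desired vs. observed state and return the actions to converge.

    A real operator would read the custom resource and call the Kubernetes
    API; this toy only plans actions so the control-loop idea is visible.
    It also ignores data movement (partition reassignment) on scale-down.
    """
    desired = set(range(desired_replicas))
    actions = [f"create broker {b}" for b in sorted(desired - observed.brokers)]
    actions += [f"delete broker {b}" for b in sorted(observed.brokers - desired)]
    return actions


def apply_actions(actions: list[str], state: ClusterState) -> None:
    """Pretend to execute the planned actions by mutating the toy state."""
    for action in actions:
        verb, _, broker_id = action.split()
        if verb == "create":
            state.brokers.add(int(broker_id))
        else:
            state.brokers.discard(int(broker_id))


if __name__ == "__main__":
    state = ClusterState(brokers={0, 1, 3})
    # One pass of the loop; an operator repeats this on every change or resync.
    plan = reconcile(3, state)
    print(plan)  # ['create broker 2', 'delete broker 3']
    apply_actions(plan, state)
    print(reconcile(3, state))  # [] means converged, nothing left to do
```

Because each pass only compares states, the loop is idempotent: running it again after convergence plans no work.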
  12. Kubernetes Operators for Kafka
    • Strimzi Operator - OSS, Red Hat, CNCF Sandbox
    • Confluent Operator - Confluent
    • KOperator - Banzai Cloud
    • KUDO Operator - KUDO
    • Writing Our Own Kafka Operator - Cookpad
    • ...
  13. Strimzi Operator
    … is an OSS-driven, production-ready Kafka Operator
    • CNCF Sandbox project since Aug. 2019
    • Core contributors from Red Hat
    • Ecosystem & community are growing
    • Strimzi Survey 2020: 40% of 40+ users are using it in production
  14. 3 Strimzi Operators
    • Cluster Operator
    • User Operator
    • Topic Operator
    Kafka Components
    • Kafka Brokers
    • Zookeeper Nodes
  15. Strimzi Operator is Production-Ready
    … supports
    • MirrorMaker 2.0/Kafka Connect/Kafka Bridge
    • Observability (JMX Exporter integration)
    • Rolling Update Support
    • Automatic Disk Resizing Support
    • mTLS
    • RBAC
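With Strimzi, the brokers, Zookeeper nodes and entity operators listed above are all described by a single `Kafka` custom resource that the Cluster Operator reconciles. The sketch below builds such a resource as a Python dict and prints it as JSON. Field names follow the `kafka.strimzi.io/v1beta2` API as we understand it, so check them against the Strimzi documentation for the version you run; the cluster name, sizes and the `kafka-metrics` ConfigMap are hypothetical.

```python
import json


def kafka_resource(name: str, replicas: int = 3, storage_size: str = "100Gi") -> dict:
    """Build a minimal Strimzi `Kafka` custom resource as a Python dict (sketch)."""
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "Kafka",
        "metadata": {"name": name},
        "spec": {
            "kafka": {
                "replicas": replicas,
                "listeners": [
                    # Internal TLS listener; clients authenticate with mTLS certificates.
                    {"name": "tls", "port": 9093, "type": "internal",
                     "tls": True, "authentication": {"type": "tls"}},
                ],
                "authorization": {"type": "simple"},  # enables Kafka ACLs
                "config": {
                    "default.replication.factor": 3,
                    "min.insync.replicas": 2,
                },
                "storage": {"type": "persistent-claim", "size": storage_size,
                            "deleteClaim": False},
                # JMX Prometheus Exporter config read from a (hypothetical) ConfigMap.
                "metricsConfig": {
                    "type": "jmxPrometheusExporter",
                    "valueFrom": {"configMapKeyRef": {
                        "name": "kafka-metrics", "key": "kafka-metrics-config.yml"}},
                },
            },
            "zookeeper": {
                "replicas": 3,
                "storage": {"type": "persistent-claim", "size": "10Gi",
                            "deleteClaim": False},
            },
            # Deploys the Topic Operator and the User Operator.
            "entityOperator": {"topicOperator": {}, "userOperator": {}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(kafka_resource("my-cluster"), indent=2))
```

Saved to a file, the output could be applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.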
  16. Next Challenges

  17. Next Challenges
    More operational knowledge/experience within the SRE team
    • Stable Rolling Updates
    • Disaster Recovery (Backup/Restore)
    • Improve Developer Experience
      ◦ Topic + User Management
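A stable rolling update largely means restarting brokers one at a time without pushing any partition below `min.insync.replicas`. The toy planner below encodes two such rules: restart the controller last, and flag a broker whose restart would leave some partition with too few in-sync replicas. As we understand it, Strimzi's rolling-update logic applies similar checks, but this is an illustrative sketch over a hypothetical ISR snapshot, not Strimzi's code.

```python
def restart_order(brokers: list[int], controller_id: int) -> list[int]:
    """One broker at a time: non-controllers in ascending ID order, controller last."""
    order = sorted(b for b in brokers if b != controller_id)
    if controller_id in brokers:
        order.append(controller_id)
    return order


def safe_to_restart(broker_id: int, isr: dict[str, set[int]], min_insync: int) -> bool:
    """True if every partition keeps >= min_insync in-sync replicas while broker_id is down."""
    return all(len(replicas - {broker_id}) >= min_insync for replicas in isr.values())


def plan_rolling_restart(
    brokers: list[int], controller_id: int, isr: dict[str, set[int]], min_insync: int
) -> list[tuple[int, bool]]:
    """Pair each broker (in restart order) with whether restarting it now is safe.

    A real roller would also wait for each restarted broker to rejoin the ISR
    before moving on; this toy evaluates a single static snapshot.
    """
    return [(b, safe_to_restart(b, isr, min_insync)) for b in restart_order(brokers, controller_id)]


if __name__ == "__main__":
    # Hypothetical ISR snapshot: partition "feed-1" has already lost broker 1.
    isr = {"feed-0": {0, 1, 2}, "feed-1": {0, 2}}
    for broker, ok in plan_rolling_restart([0, 1, 2], controller_id=1, isr=isr, min_insync=2):
        print(broker, "restart" if ok else "wait: would drop below min.insync.replicas")
```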
  18. Thank you
    Kenju Wagatsuma, September 2021

  20. For those who’d like to know more about the Strimzi platform…
    The following slides are for reference only
  21. Glossary of Terms

  22. broker: a single Kafka server, one component of a cluster
    commit: the action of updating the current offset position in each partition
    consumer group: a logical group of consumers which ensures that each partition is consumed by only one member of the group
    controller: one of the brokers, responsible for electing partition leaders
    consumer: an application or process which reads messages from a broker
    message: the unit of data, internally an array of bytes
    offset: an integer that marks a message's position within a partition, showing where consumers read from
    partition: a single append-only log; the partitions of a topic can be hosted on different servers
    producer: an application or process which writes messages to a broker
    topic: a collection of partitions, comparable to a “table” in an RDBMS or a “folder” in a file system
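Several of these terms fit together in a few lines of code. The toy model below (plain Python, no Kafka client) shows a topic as a set of append-only partitions, offsets as positions in those logs, a round-robin assignment that gives each partition to exactly one group member, and a commit that records the offset of the next message to read. Real Kafka clients offer several assignment strategies and normally separate polling from committing; this sketch combines them for brevity, and all names are hypothetical.

```python
from collections import defaultdict


class Topic:
    """Toy topic: a fixed number of append-only partitions (lists of bytes)."""

    def __init__(self, name: str, partitions: int) -> None:
        self.name = name
        self.partitions: list[list[bytes]] = [[] for _ in range(partitions)]

    def append(self, partition: int, message: bytes) -> int:
        """Producer side: append a message and return its offset."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1


def assign(partitions: int, members: list[str]) -> dict[str, list[int]]:
    """Round-robin assignment: each partition goes to exactly one member.

    With more members than partitions, some members receive nothing.
    """
    assignment: dict[str, list[int]] = {m: [] for m in members}
    for p in range(partitions):
        assignment[members[p % len(members)]].append(p)
    return assignment


class ConsumerGroup:
    """Toy consumer group that tracks one committed offset per partition."""

    def __init__(self, group_id: str) -> None:
        self.group_id = group_id
        # Committed offset = offset of the next message to read (starts at 0).
        self.committed: dict[int, int] = defaultdict(int)

    def poll(self, topic: Topic, partition: int) -> list[bytes]:
        """Read everything after the committed offset, then commit the new position."""
        start = self.committed[partition]
        messages = topic.partitions[partition][start:]
        self.committed[partition] = start + len(messages)  # the "commit"
        return messages


if __name__ == "__main__":
    feed = Topic("personal-feed", partitions=3)
    feed.append(0, b"recipe-1")
    feed.append(0, b"recipe-2")
    print(assign(3, ["consumer-a", "consumer-b"]))
    group = ConsumerGroup("feed-readers")
    print(group.poll(feed, 0), group.committed[0])
```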
  23. FAQ

  24. Q. Alternative designs?
    A. We also considered the following designs:
    a. Upgrade Confluent Cloud to the Standard Plan
    b. AWS MSK
    c. Confluent Cloud Operator
    d. Writing an operator ourselves
  25. Q. Do you have authorization?
    A. Yes, we do. Kafka supports authorization using ACLs. You can limit who (users/machines) can perform what kind of operation (read/write/admin) on which resources (topics/consumer groups/cluster).
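An ACL answers "who may do what on which resource". The toy check below sketches the evaluation order of Kafka's built-in ACL authorizer as we understand it: a matching Deny wins over any Allow, and a request that matches no ACL is denied (the default behaviour, with `allow.everyone.if.no.acl.found=false`). It ignores prefixed resource patterns, host restrictions and implied operations such as Read implying Describe; the principals and topic names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    principal: str             # e.g. "User:feed-writer", or "User:*" for everyone
    operation: str             # e.g. "Read", "Write", "Describe", or "All"
    resource_type: str         # e.g. "Topic", "Group", "Cluster"
    resource_name: str         # literal name, or "*" for every resource of the type
    permission: str = "Allow"  # "Allow" or "Deny"


def _matches(acl: Acl, principal: str, operation: str, rtype: str, rname: str) -> bool:
    """Does this ACL apply to the request? (literal names and "*" wildcards only)"""
    return (
        acl.principal in (principal, "User:*")
        and acl.operation in (operation, "All")
        and acl.resource_type == rtype
        and acl.resource_name in (rname, "*")
    )


def authorize(acls: list[Acl], principal: str, operation: str, rtype: str, rname: str) -> bool:
    """Deny wins over Allow; with no matching ACL the request is denied."""
    matching = [a for a in acls if _matches(a, principal, operation, rtype, rname)]
    if any(a.permission == "Deny" for a in matching):
        return False
    return any(a.permission == "Allow" for a in matching)


if __name__ == "__main__":
    acls = [
        Acl("User:feed-writer", "Write", "Topic", "personal-feed"),
        Acl("User:feed-reader", "Read", "Topic", "*"),
        Acl("User:feed-reader", "Read", "Topic", "internal-audit", permission="Deny"),
    ]
    print(authorize(acls, "User:feed-reader", "Read", "Topic", "personal-feed"))
    print(authorize(acls, "User:feed-reader", "Read", "Topic", "internal-audit"))
```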
  26. Q. Downsides of the Strimzi Operator?
    A. The Strimzi Operator is another abstraction layer:
    • Written in Java - no Java professionals within the team (yet)
    • Software can have bugs
    • Strimzi has to keep supporting the latest version of Kafka
      ◦ … the Strimzi team has been doing a great job so far :)
  27. Diagrams

  28. Networking - N+1 Load Balancers
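Why N+1? A Kafka client uses the bootstrap address only to fetch cluster metadata; it then connects directly to the broker that leads each partition, using the address that broker advertises. When brokers are exposed through Kubernetes `LoadBalancer` services (as Strimzi's `loadbalancer` listener type does), an N-broker cluster therefore needs one load balancer per broker plus one shared bootstrap load balancer. The arithmetic is sketched below; the naming scheme is hypothetical and not the one Strimzi uses.

```python
def load_balancers(cluster: str, brokers: int) -> list[str]:
    """One shared bootstrap LB plus one LB per broker: N + 1 in total."""
    return [f"{cluster}-bootstrap"] + [f"{cluster}-broker-{i}" for i in range(brokers)]


def endpoint_for(cluster: str, leaders: dict[str, int], partition: str) -> str:
    """After bootstrapping, a client talks to the partition leader's own LB."""
    return f"{cluster}-broker-{leaders[partition]}"


if __name__ == "__main__":
    lbs = load_balancers("feed", 3)
    print(len(lbs), lbs)
    # Hypothetical metadata the client fetched through the bootstrap LB.
    leaders = {"personal-feed-0": 2, "personal-feed-1": 0}
    print(endpoint_for("feed", leaders, "personal-feed-0"))
```

A single shared load balancer in front of all brokers would not work for produce/fetch traffic, because the client must reach the specific leader for each partition.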