Kafka on Kubernetes with Strimzi

Ken Wagatsuma
September 19, 2021

How we manage Apache Kafka cluster on Kubernetes with Strimzi Operator

Transcript

  1. Kafka on Kubernetes
    with Strimzi
    Ken (Kenju Wagatsuma)
    Senior SRE
    September 2021

  2. Background / History

  3. [May 2018~] Confluent Cloud Introduced
    Personal Feed
    ● Replaced a MySQL table with Apache Kafka
    ● Started using Confluent Cloud (Managed Service)
    ● Easy to set up, easy to maintain

  4. [Jan 2021~] More Focus on Feed
    Business features will depend much more on Feed
    ● Resiliency+
    ● Performance+
    ● Flexibility+

  5. [Jan 2021~] Confluent Cloud Deprecation
    “Action required: Migrate your Standard Legacy cluster(s) prior to May
    31st, 2021” from the Confluent Cloud team
    ● Hard deadline for migrating the old cluster to somewhere else
    ● Downtime is inevitable anyway
    ● Pricing plan will be changed

  6. Our Experiences with Confluent Cloud
    Kafka Cluster is a “black box”
    ● No access to broker metrics/configuration - lack of observability
    ● Cluster upgrades just happen - less flexibility on versioning
    ● Some limitations, or pay for the Enterprise Plan

  7. Self-Hosting Kafka
    on
    Kubernetes (AWS EKS)

  8. Self-Hosting Kafka on Kubernetes
    Why do we need Self-Hosting?
    ● More flexibility and observability are achievable
    ● Theoretically, no limitations
    ● Scalability with more optimized solutions

  9. Self-Hosting Kafka on Kubernetes
    Why on Kubernetes?
    ● The community is growing and best practices are spreading
    ● Kubernetes professionals at Cookpad Global
    ● Ecosystem has a lot of library options

  10. Kubernetes Operator
    … is an architectural pattern for deploying and operating applications
    ● Writing a full YAML file for all Kafka components is hard
    ○ Kafka/Zookeeper/Connect/MirrorMaker/Bridge/RBAC/...
    ● Application-specific requirements are complex
    ○ Cluster configuration
    ○ Networking among processes
    ○ Distributing mTLS and client certificates
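
The heart of the operator pattern is a reconciliation loop: the operator repeatedly compares the desired state (declared in a custom resource) with what is actually running, and acts on the difference. Below is a minimal Python sketch of that idea under loud assumptions: `ClusterState` and `reconcile` are hypothetical names, and plain strings stand in for Kubernetes API calls. It is not how Strimzi is implemented (Strimzi is written in Java and handles far more cases).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical, heavily simplified view of a Kafka cluster."""
    brokers: int
    kafka_version: str


def reconcile(desired: ClusterState, observed: ClusterState) -> list[str]:
    """Return the actions that move `observed` towards `desired`.

    Each action is a plain string here; a real operator would call the
    Kubernetes API (and Kafka's admin API) instead.
    """
    actions: list[str] = []
    keep = min(observed.brokers, desired.brokers)
    # Scale down: remove surplus brokers (a real operator must first
    # move their partitions to the remaining brokers).
    actions += [f"delete broker-{i}" for i in range(keep, observed.brokers)]
    # Version drift: restart the remaining brokers one at a time (rolling update).
    if observed.kafka_version != desired.kafka_version:
        actions += [f"roll broker-{i} to {desired.kafka_version}" for i in range(keep)]
    # Scale up: new brokers start directly on the desired version.
    actions += [f"create broker-{i}" for i in range(keep, desired.brokers)]
    return actions
```

Because the loop runs again after every change or periodic resync, `reconcile` has to return no actions once the cluster has converged, i.e. `reconcile(desired, desired) == []`.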

  11. Kubernetes Operator for Kafka
    ● Strimzi Operator - OSS, Red Hat, CNCF Sandbox
    ● Confluent Operator - Confluent Cloud
    ● KOperator - Banzai Cloud
    ● KUDO Operator - KUDO
    ● Writing Your Own Kafka Operator - Cookpad
    ● ...

  12. Strimzi Operator
    … is an OSS-driven, production-ready Kafka Operator
    ● CNCF Sandbox Project since Aug. 2019
    ● Core Contributors from Red Hat
    ● Ecosystem & Community are growing
    ● Strimzi Survey 2020: 40% of 40+ users are using it in production

  13. 3 Strimzi Operators
    ● Cluster Operator
    ● User Operator
    ● Topic Operator
    Kafka Components
    ● Kafka Brokers
    ● Zookeeper Nodes
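
For reference, the Cluster Operator is driven by a `Kafka` custom resource that describes both the brokers and the ZooKeeper nodes; an `entityOperator` section asks it to deploy the Topic and User Operators as well. The sketch below builds such a resource as a Python dict and prints it as JSON (which `kubectl apply -f` accepts). It assumes the `kafka.strimzi.io/v1beta2` API; the function name, cluster name, sizes and listener settings are illustrative assumptions, not the configuration used in this talk.

```python
import json


def kafka_cluster(name: str, brokers: int = 3, zookeepers: int = 3) -> dict:
    """Build a minimal Strimzi `Kafka` custom resource (illustrative values only)."""
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "Kafka",
        "metadata": {"name": name},
        "spec": {
            "kafka": {
                "replicas": brokers,
                # One in-cluster listener with TLS encryption and mTLS client authentication.
                "listeners": [
                    {"name": "tls", "port": 9093, "type": "internal", "tls": True,
                     "authentication": {"type": "tls"}},
                ],
                "config": {
                    "offsets.topic.replication.factor": min(3, brokers),
                    "default.replication.factor": min(3, brokers),
                },
                "storage": {"type": "persistent-claim", "size": "100Gi", "deleteClaim": False},
            },
            "zookeeper": {
                "replicas": zookeepers,
                "storage": {"type": "persistent-claim", "size": "10Gi", "deleteClaim": False},
            },
            # Empty objects are enough to deploy the Topic and User Operators.
            "entityOperator": {"topicOperator": {}, "userOperator": {}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(kafka_cluster("my-cluster"), indent=2))
```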

  14. Strimzi Operator is Production-Ready
    … supports
    ● MirrorMaker 2.0/Kafka Connect/Kafka Bridge
    ● Observability (JMX Exporter integration)
    ● Rolling Update Support
    ● Automatic Disk Resizing Support
    ● mTLS
    ● RBAC

  15. Next Challenges

  16. Next Challenges
    More operational knowledge/experience within the SRE team:
    ● Stable Rolling Update
    ● Disaster Recovery (Backup/Restore)
    ● Improve Developer Experience
    ○ Topic + User Management
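
With Strimzi, topic and user management can be expressed declaratively: a `KafkaTopic` is reconciled by the Topic Operator and a `KafkaUser` by the User Operator, both linked to a cluster through the `strimzi.io/cluster` label. The sketch below generates both as Python dicts; it assumes the `kafka.strimzi.io/v1beta2` API, and the helper names, topic, user and group names are hypothetical. The ACLs use the singular `operation` field; check which form your Strimzi version expects. The ACLs only take effect if the `Kafka` resource enables `simple` authorization.

```python
from __future__ import annotations


def kafka_topic(cluster: str, name: str, partitions: int, replicas: int = 3) -> dict:
    """A Strimzi `KafkaTopic`; the Topic Operator creates the topic in the labelled cluster."""
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "KafkaTopic",
        "metadata": {"name": name, "labels": {"strimzi.io/cluster": cluster}},
        "spec": {"partitions": partitions, "replicas": replicas},
    }


def kafka_user(cluster: str, user: str, read_topics: list[str], group: str) -> dict:
    """A Strimzi `KafkaUser` with an mTLS client certificate and read-only ACLs."""
    acls = [
        {"resource": {"type": "topic", "name": t, "patternType": "literal"},
         "operation": "Read", "host": "*"}
        for t in read_topics
    ]
    # A consumer also needs Read on the consumer group it commits offsets to.
    acls.append({"resource": {"type": "group", "name": group, "patternType": "literal"},
                 "operation": "Read", "host": "*"})
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "KafkaUser",
        "metadata": {"name": user, "labels": {"strimzi.io/cluster": cluster}},
        "spec": {
            "authentication": {"type": "tls"},
            "authorization": {"type": "simple", "acls": acls},
        },
    }
```

Generating these from a small, reviewed helper (or templating tool) keeps naming and ACL conventions consistent, which is one way to lower the burden on developers who only need a topic and a client identity.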

  17. Thank you
    Kenju Wagatsuma, September 2021

  18. For those who’d like to know more about
    the Strimzi platform…
    the following slides are for reference only

  19. Glossary of Terms

  20. broker: a single Kafka server; a cluster is composed of several brokers
    commit: the action of saving the current offset position for each partition
    consumer group: a logical group of consumers which ensures that each partition is consumed by only one member
    controller: one of the brokers, responsible for electing partition leaders
    consumer: applications or processes which read messages from a broker
    message: the unit of data, which is internally an array of bytes
    offset: an integer marking a message's position within a partition, which tells consumers where to read from next
    partition: a single append-only log; the partitions of a topic can be hosted on different servers
    producer: applications or processes which write messages to a broker
    topic: a collection of partitions, comparable to a “table” in an RDBMS or a “folder” in a file system
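
To make the relationships between topic, partition, offset, commit and consumer group concrete, here is a toy in-memory model with hypothetical names and no networking, replication or rebalancing. The round-robin assignment is only one of the strategies Kafka ships and is not its default; the committed offset follows Kafka's convention of pointing at the next message to read.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only log: a message's offset is its index in the log."""
    log: list[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Producer side: append a message and return its offset."""
        self.log.append(message)
        return len(self.log) - 1


def assign(num_partitions: int, members: list[str]) -> dict[str, list[int]]:
    """Round-robin assignment: every partition is owned by exactly one group member."""
    ordered = sorted(members)
    owners: dict[str, list[int]] = {m: [] for m in ordered}
    for p in range(num_partitions):
        owners[ordered[p % len(ordered)]].append(p)
    return owners


def poll(partition: Partition, committed: int, max_records: int = 10) -> tuple[list[bytes], int]:
    """Consumer side: read from the committed offset and return the records
    plus the offset to commit next (the offset of the next unread message)."""
    records = partition.log[committed:committed + max_records]
    return records, committed + len(records)
```

With three partitions and two members, one member owns two partitions and the other owns one; a third member would own one partition each, and a fourth would sit idle, since a partition is never shared within a group.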

  21. Q. Alternative Design
    A. We also considered the following designs...
    a. Upgrade Confluent Cloud to Standard Plan
    b. AWS MSK
    c. Confluent Cloud Operator
    d. Writing Operator by ourselves

  22. Q. Do you have Authorization?
    A. Yes, we do. Kafka supports authorization using ACLs. You can limit
    who (users/machines) can do what kind of operations
    (read/write/admin) on which resources (topics/consumer groups/cluster).
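
Each ACL binds a principal (who), an operation (what) and a resource (which). The sketch below is a simplified, hypothetical model of how such rules are evaluated, not Kafka's authorizer: it follows Kafka's rules that a matching DENY overrides any ALLOW and that, with the default `allow.everyone.if.no.acl.found=false`, a request with no matching ALLOW is rejected. It ignores wildcards, host filters, the `All` operation and implied operations (e.g. `Read` implying `Describe`).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    principal: str       # who, e.g. "User:feed-reader" (hypothetical)
    operation: str       # what, e.g. "Read" or "Write"
    resource_type: str   # which kind of resource, e.g. "Topic", "Group", "Cluster"
    resource_name: str   # a literal name, or a prefix when `prefixed` is True
    prefixed: bool = False
    allow: bool = True   # False models a DENY rule


def _matches(acl: Acl, principal: str, operation: str,
             resource_type: str, resource_name: str) -> bool:
    if acl.prefixed:
        name_ok = resource_name.startswith(acl.resource_name)
    else:
        name_ok = resource_name == acl.resource_name
    return (acl.principal == principal
            and acl.operation == operation
            and acl.resource_type == resource_type
            and name_ok)


def authorize(acls: list[Acl], principal: str, operation: str,
              resource_type: str, resource_name: str) -> bool:
    matching = [a for a in acls
                if _matches(a, principal, operation, resource_type, resource_name)]
    if any(not a.allow for a in matching):
        return False  # a matching DENY always wins
    return any(a.allow for a in matching)  # no matching ALLOW means no access
```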

  23. Q. Downside of Strimzi Operator?
    A. The Strimzi Operator is another abstraction layer:
    ● Written in Java - no Java professionals within the team (yet)
    ● The software can have bugs
    ● Strimzi has to keep supporting the latest versions of Kafka
    ○ … the Strimzi team has been doing a great job so far :)

  24. Networking - N+1 Load Balancers
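
Why N+1: a Kafka client first connects to a bootstrap address to fetch cluster metadata, then connects directly to the broker that leads each partition it uses, so every broker must be reachable on its own address. When N brokers are exposed through Strimzi's `loadbalancer` listener type, that means one Kubernetes `LoadBalancer` Service for bootstrap plus one per broker. The sketch below only counts and names them; the function name is hypothetical, and the naming pattern is an assumption modelled on the Services Strimzi creates, so check it against your Strimzi version.

```python
from __future__ import annotations


def loadbalancer_services(cluster: str, listener: str, brokers: int) -> list[str]:
    """Service names one `type: loadbalancer` listener needs: N brokers -> N + 1."""
    # Used only for the initial metadata request.
    bootstrap = f"{cluster}-kafka-{listener}-bootstrap"
    # One per broker, because clients talk to partition leaders directly.
    per_broker = [f"{cluster}-kafka-{listener}-{broker_id}" for broker_id in range(brokers)]
    return [bootstrap, *per_broker]
```

On a cloud provider such as AWS, each of these Services typically provisions its own load balancer, so the count grows linearly with the number of brokers.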
