Strimzi: Distributed Streaming with Apache Kafka in Kubernetes

Istanbul JUG, July 4th 2019

Follow me on Twitter (@systemcraftsman) or subscribe to https://www.systemcraftsman.com/join/ to get updates from me.

Aykut Bulgu

July 04, 2019

Transcript

  1. Distributed Streaming with Apache Kafka in Kubernetes
    Distributed streaming made easy in the cloud
    Aykut M. Bulgu - @systemcraftsman
    Middleware Consultant - Red Hat

  2. Who am I?
    # oc apply -f aykutbulgu.yaml
    apiVersion: redhat/v1.1
    kind: Middleware Consultant
    metadata:
      name: Aykut Bulgu
      namespace: Red Hat Consulting EMEA
      annotations:
        twitter: "@systemcraftsman"
        organizer: Software Craftsmanship Turkey
        founder: System Craftsman
      labels:
        married: yes
        children: [daughter]
        interests: openshift, kubernetes, spring boot, middleware, infinispan, kafka, strimzi
    spec:
      replicas: 1
      containers:
        - image: aykut:latest
    @systemcraftsman

  3. What we’ll be discussing today
    Messaging Types & Kafka
    Kubernetes
    Some Challenges
    Strimzi
      Operators
      Accessing Kafka
    Demo

  4. Chapter One: Messaging

  5. Messaging
    Messaging ≠ Messaging
    Low-latency pub/sub
    Cross-cloud backbone
    Temporal decoupling
    Load levelling
    Load balancing
    Enterprise application integration
    IoT device connectivity
    Message-driven beans
    Event-driven microservices
    Long-term message storage
    Replayable streams
    Event sourcing
    Geo-aware routing
    Database change data capture

  6. Messaging Technologies

  7. What is Apache Kafka?
    A publish/subscribe messaging system
    A data streaming platform
    A distributed, horizontally scalable, fault-tolerant commit log

  8. What is Apache Kafka?
    Developed at LinkedIn back in 2010, open sourced in 2011
    Distributed by design
    High throughput
    Designed to be fast, scalable, durable and highly available
    Data partitioning (sharding)
    Ability to handle a huge number of consumers

  9. Traditional Messaging
    [Diagram: Producer → Queue (messages 1, 2, 3) → Consumer]
    Reference count-based message retention model
    When a message is consumed, it is deleted from the broker
    “Smart broker, dumb client”
    Broker knows about all consumers
    Can perform per-consumer filtering

  10. Apache Kafka
    [Diagram: Producer → Kafka Topic (messages 1, 2, 3) → Consumer]
    Time-based message retention model by default
    Messages are retained according to the topic config (time or capacity)
    Also “compacted topics”, similar to a “last-value topic”
    “Dumb broker, smart client”
    Client maintains its position in the message stream
    Message stream can be replayed
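    As a sketch of the two retention models, here are two KafkaTopic resources in the Strimzi format shown later in this deck (same `v1alpha1` API version and `my-cluster` label). The topic names and values are illustrative assumptions; `retention.ms`, `retention.bytes` and `cleanup.policy` are standard Kafka topic configs, and `replicas: 3` assumes a cluster with at least three brokers.

    ```yaml
    # Time/size-based retention: old log segments are deleted once they are
    # older than 7 days or the partition grows beyond ~1 GiB.
    apiVersion: kafka.strimzi.io/v1alpha1
    kind: KafkaTopic
    metadata:
      name: page-views                # hypothetical topic name
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      partitions: 3
      replicas: 3
      config:
        retention.ms: 604800000       # 7 days
        retention.bytes: 1073741824   # 1 GiB, applied per partition
    ---
    # Compacted ("last-value") topic: Kafka keeps at least the latest
    # message for each key instead of deleting purely by age.
    apiVersion: kafka.strimzi.io/v1alpha1
    kind: KafkaTopic
    metadata:
      name: customer-profiles         # hypothetical topic name
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      partitions: 3
      replicas: 3
      config:
        cleanup.policy: compact
    ```

    In both cases consumers keep their own offsets, so either topic can be replayed from the oldest message still retained.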

  11. Kafka Concepts - Producers
    [Diagram: a producer writes to a topic with three partitions, ordered from old to new: Partition 0 (offsets 0 to 9), Partition 1 (offsets 0 to 6), Partition 2 (offsets 0 to 8)]

  12. Kafka Concepts - Consumers
    [Diagram: a consumer reads from a topic with three partitions, ordered from old to new: Partition 0 (offsets 0 to 9), Partition 1 (offsets 0 to 6), Partition 2 (offsets 0 to 8)]

  13. Kafka Concepts - High Availability
    [Diagram: Brokers 1, 2 and 3 each hold a replica of partitions T1 - P1, T1 - P2, T2 - P1 and T2 - P2]
    Leaders and followers are spread across the cluster

  14. Kafka Concepts - High Availability
    If the broker hosting a partition's leader goes down, a new leader is elected from that partition's replicas on a different broker
    [Diagram: Brokers 1, 2 and 3 each hold a replica of partitions T1 - P1, T1 - P2, T2 - P1 and T2 - P2]
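    A minimal sketch of how replication is requested per topic with Strimzi, assuming the three-broker `my-cluster` defined on a later slide; the topic name is hypothetical. `min.insync.replicas` is a standard Kafka topic config and only takes effect for producers that send with `acks=all`.

    ```yaml
    apiVersion: kafka.strimzi.io/v1alpha1
    kind: KafkaTopic
    metadata:
      name: orders                  # hypothetical topic name
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      partitions: 2                 # like P1 and P2 in the diagram
      replicas: 3                   # one leader + two followers, on different brokers
      config:
        # With acks=all, a write is acknowledged only once at least 2 in-sync
        # replicas have it, so acknowledged data survives one broker failure.
        min.insync.replicas: 2
    ```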

  15. Kafka Concepts - Interaction with Leaders
    [Diagram: Brokers 1, 2 and 3 hold replicas of T1 - P1, T1 - P2, T2 - P1 and T2 - P2; producers P1, P2 and consumers C1, C2, C3 each interact with the partition leaders]

  16. Kafka Use Cases
    Messaging
    Replacement for a traditional message broker: high scale, high throughput, built-in partitioning, replication and fault tolerance. Some limitations compared to a traditional broker (filtering, standard protocols, JMS …)
    Website Activity Tracking
    Rebuild the user activity tracking pipeline as a set of real-time publish/subscribe feeds. Activity is published to central topics, with one topic per activity type
    Metrics
    Aggregation of statistics from distributed applications to produce centralized feeds of operational data
    Log Aggregation
    Abstracts away the details of files and gives event data as a stream of messages. Offers good performance and stronger durability guarantees due to replication
    Stream Processing
    Enables continuous, real-time applications built to react to, process, or transform streams
    Data Integration
    Captures streams of events or data changes and feeds these to other data systems (see the Debezium project)

  17. Chapter Two: Kubernetes

  18. Kubernetes
    Kubernetes is an open-source system for automating deployment, operations, and scaling of containerized applications across multiple hosts

  19. Kubernetes
    Comes from Google’s experience with project “Borg”
    Abstracts the underlying hardware in terms of “nodes”
    On the nodes, a set of different “resources” can be deployed and handled
    Containerized applications are deployed using and sharing these “resources”

  20. Kubernetes
    Scheduling: decide where to deploy containers
    Lifecycle and health: keep containers running despite failures
    Discovery: find other containers on the network
    Monitoring: visibility into running containers
    Security: control who can do what
    Scaling: scale containers up and down
    Persistence: keep data beyond the container lifecycle
    Aggregation: compose apps from multiple containers

  21. OpenShift
    An open-source enterprise Kubernetes platform, based on Docker and Kubernetes, for building, distributing and running containers at scale

  22. OpenShift
    Routing & Load Balancing
    Multi-tenancy
    CI/CD Pipelines
    Role-based Authorization
    Capacity Management
    Infrastructure Visibility
    Chargeback
    Vulnerability Scanning
    Container Isolation
    Image Build Automation
    Quota Management
    Teams and Collaboration

  23. Now we know about Kafka, and we know about Kubernetes
    How can these two work together?
    There are some challenges...

  24. Chapter Three: Challenges

  25. Challenges
    A Kafka cluster requires:
    A stable broker identity and a stable network address
    A way for brokers to discover and communicate with each other
    Durable state on brokers, and storage recovery
    Brokers that are directly accessible from clients
    It runs alongside a Zookeeper ensemble, which requires:
    Each node to have the configuration of the others
    Nodes that are able to communicate with each other
    Accessing Kafka isn’t so simple

  26. How Kubernetes Can Help
    Kubernetes provides:
    StatefulSets for stable identity and network
    Together with headless Services for internal discovery
    Services for accessing the cluster
    Secrets and ConfigMaps for handling configuration
    PersistentVolumes and PersistentVolumeClaims for durable storage
    Kubernetes primitives help, but it is still not easy
    It is still hard to deploy and manage Kafka on Kubernetes...
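    To illustrate the primitives above, here is a bare-bones headless Service plus StatefulSet. It is a hypothetical sketch (names and image are placeholders), not what Strimzi generates; note what it still leaves out, such as broker IDs, the Zookeeper connection and advertised listeners, which is the gap an operator fills.

    ```yaml
    # Headless Service (clusterIP: None): gives each StatefulSet pod a stable
    # DNS name such as kafka-0.kafka-headless.<namespace>.svc.cluster.local
    # (assuming the default cluster.local domain).
    apiVersion: v1
    kind: Service
    metadata:
      name: kafka-headless
    spec:
      clusterIP: None
      selector:
        app: kafka
      ports:
        - name: broker
          port: 9092
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: kafka
    spec:
      serviceName: kafka-headless       # ties pod DNS names to the headless Service
      replicas: 3                       # stable pod names: kafka-0, kafka-1, kafka-2
      selector:
        matchLabels:
          app: kafka
      template:
        metadata:
          labels:
            app: kafka
        spec:
          containers:
            - name: kafka
              image: example/kafka:latest   # hypothetical image
              ports:
                - containerPort: 9092
              volumeMounts:
                - name: data
                  mountPath: /var/lib/kafka
      volumeClaimTemplates:             # one PersistentVolumeClaim per pod, kept across restarts
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 1Gi
    ```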

  27. Operator Framework
    An application used to create, configure and manage other complex applications
    Contains domain-specific knowledge
    An Operator works based on input from Custom Resource Definitions (CRDs)
    The user describes the desired state
    The controller applies this state to the application
    It watches the *desired* state and the *actual* state and makes forward progress to reconcile them
    OperatorHub.io
    Observe → Analyze → Act

  28. Chapter Four: Strimzi
    Operators
    Accessing Kafka

  29. Operators

  30. Strimzi Project

  31. Strimzi - The open-source Apache Kafka Operator
    Open source project licensed under Apache License 2.0
    Focuses on running Apache Kafka on Kubernetes and OpenShift:
    Container images for Apache Kafka and Apache Zookeeper
    Operators for managing and configuring Kafka clusters, topics or users
    Provides a Kubernetes-native experience for running Kafka on Kubernetes and OpenShift
    Kafka cluster, topic and user as Kubernetes custom resources

  32. Red Hat AMQ Streams - Apache Kafka for the Enterprise
    Part of the Red Hat AMQ suite
    AMQ Streams on OCP
    Running Apache Kafka on OpenShift Container Platform
    Based on the Strimzi project
    AMQ Streams on RHEL
    Running Apache Kafka on “bare metal”

  33. Strimzi Operators
    Cluster Operator: Kafka CR → deploys & manages the Kafka and Zookeeper cluster
    Topic Operator and User Operator: Topic CR and User CR → manage topics & users

  34. Cluster Operator
    Responsible for deploying and managing clusters
    Kafka, Kafka Connect, Zookeeper
    Also deploys other operators
    Topic Operator, User Operator
    The only component that users have to install themselves
    Uses CRDs as blueprints for the clusters it deploys and manages
    CRDs act as extensions to the Kubernetes API
    Can be used like native resources, e.g. oc get kafkas or kubectl get kafkas
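    A simplified sketch of such an API extension follows. It registers a `Kafka` kind in the `kafka.strimzi.io` group so that `kubectl get kafkas` works; Strimzi's real CRD carries a detailed validation schema and further settings, so treat this as illustrative only. It uses the current `apiextensions.k8s.io/v1` API, whereas clusters from the time of this talk used `v1beta1`.

    ```yaml
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: kafkas.kafka.strimzi.io     # must be <plural>.<group>
    spec:
      group: kafka.strimzi.io
      scope: Namespaced
      names:
        kind: Kafka
        listKind: KafkaList
        plural: kafkas                  # enables "kubectl get kafkas"
        singular: kafka
      versions:
        - name: v1alpha1
          served: true
          storage: true
          schema:
            openAPIV3Schema:
              type: object
              # Simplification: accept any fields instead of a full schema.
              x-kubernetes-preserve-unknown-fields: true
    ```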

  35. Cluster Operator
    Installation
    Runs as a Deployment inside Kubernetes
    Configuration options are passed as environment variables
    Installation Requirements
    Service Account
    RBAC resources
    CRD definitions
    Should always run as a single replica
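    An abridged sketch of such a Deployment is below. `STRIMZI_NAMESPACE`, `STRIMZI_FULL_RECONCILIATION_INTERVAL_MS` and `STRIMZI_OPERATION_TIMEOUT_MS` are Cluster Operator environment variables; the image tag is an assumption, and the official installation files set further variables (for example the images to use), so install from the release files rather than from this fragment.

    ```yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: strimzi-cluster-operator
    spec:
      replicas: 1                       # must stay a single replica
      selector:
        matchLabels:
          name: strimzi-cluster-operator
      template:
        metadata:
          labels:
            name: strimzi-cluster-operator
        spec:
          serviceAccountName: strimzi-cluster-operator   # bound to the RBAC resources
          containers:
            - name: strimzi-cluster-operator
              image: strimzi/cluster-operator:0.12.1     # tag is an assumption
              env:
                # Namespace to watch: here, the operator's own namespace
                - name: STRIMZI_NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                - name: STRIMZI_FULL_RECONCILIATION_INTERVAL_MS
                  value: "120000"
                - name: STRIMZI_OPERATION_TIMEOUT_MS
                  value: "300000"
    ```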

  36. Cluster Operator
    Deploying Kafka
    Using a Kafka CR
    Configures Kafka, Zookeeper and the other operators
    Minimal options:
    Number of replicas
    Storage
    Listeners
    Other options available as well
    apiVersion: kafka.strimzi.io/v1alpha1
    kind: Kafka
    metadata:
      name: my-cluster
    spec:
      kafka:
        replicas: 3
        listeners:
          plain: {}
          tls: {}
        storage:
          type: persistent-claim
          size: 1Gi
      zookeeper:
        replicas: 3
        storage:
          type: persistent-claim
          size: 1Gi
      topicOperator: {}
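    Later Strimzi versions deprecate the top-level `topicOperator` field in favour of `entityOperator`, which deploys the Topic Operator and the User Operator together. A sketch of the relevant part of the spec, assuming the rest of the Kafka CR stays as above:

    ```yaml
    # Fragment of the Kafka CR spec; kafka and zookeeper sections unchanged
    spec:
      entityOperator:
        topicOperator: {}     # default settings
        userOperator: {}      # default settings
    ```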

  37. Topic Operator
    Manages Kafka topics
    Bi-directional synchronization and 3-way diff
    Using CRDs
    Users can simply run: oc get kafkatopics or kubectl get kafkatopics
    Installation
    One Topic Operator per Kafka cluster
    Users are expected to install the Topic Operator through the Cluster Operator
    Standalone installation is available and supported

  38. Topic Operator
    Managing Kafka topics
    Using a KafkaTopic CR
    Label defining Kafka Cluster
    Minimal options:
    Number of partitions
    Replication factor
    apiVersion: kafka.strimzi.io/v1alpha1
    kind: KafkaTopic
    metadata:
      name: my-topic
      labels:
        strimzi.io/cluster: my-cluster   # the Kafka cluster this topic belongs to
    spec:
      partitions: 1
      replicas: 1
      config:
        retention.ms: 7200000            # 2 hours
        segment.bytes: 1073741824        # 1 GiB

  39. User Operator
    Manages authentication and authorization
    Using CRDs
    Users can simply run: oc get kafkausers or kubectl get kafkausers
    Installation
    One User Operator per Kafka cluster
    Users are expected to install the User Operator through the Cluster Operator
    Standalone installation is available and supported

  40. User Operator
    Authentication
    Currently supports TLS client authentication and SASL SCRAM-SHA-512
    The KafkaUser CR requests TLS client authentication
    The User Operator issues a TLS certificate and stores it in a Secret
    Authorization
    Currently supports Kafka’s built-in SimpleAclAuthorizer
    The KafkaUser CR lists the desired ACL rights
    The User Operator updates them in Zookeeper
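    A sketch of the SCRAM-SHA-512 variant, assuming the `my-cluster` Kafka CR from the Cluster Operator slide. The first document is only a fragment of that CR's `spec.kafka`; the user name and ACL are hypothetical. For a SCRAM user, the User Operator generates a password and stores it in a Secret (named after the KafkaUser, in the Strimzi versions I am aware of).

    ```yaml
    # Fragment of the Kafka CR: require SCRAM-SHA-512 on the plain listener
    # and turn on ACL checks with the SimpleAclAuthorizer.
    spec:
      kafka:
        listeners:
          plain:
            authentication:
              type: scram-sha-512
        authorization:
          type: simple
    ---
    apiVersion: kafka.strimzi.io/v1alpha1
    kind: KafkaUser
    metadata:
      name: my-scram-user             # hypothetical user name
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      authentication:
        type: scram-sha-512           # password is generated into a Secret
      authorization:
        type: simple
        acls:
          - resource:
              type: topic
              name: my-topic
              patternType: literal
            operation: Write          # allow producing to my-topic
            host: "*"
    ```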

  41. User Operator
    Managing Kafka users
    Using a KafkaUser CR
    Label defining Kafka Cluster
    Authentication configuration
    Authorization configuration
    apiVersion: kafka.strimzi.io/v1alpha1
    kind: KafkaUser
    metadata:
      name: my-user
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      authentication:
        type: tls
      authorization:
        type: simple
        acls:
          - resource:
              type: topic
              name: my-topic
              patternType: literal
            operation: Read
            host: "*"
          - resource:
              # ...

  42. Accessing Kafka

  43. Kafka’s Discovery Protocol
    [Diagram: Brokers 1, 2 and 3 hold replicas of T1 - P1, T1 - P2, T2 - P1 and T2 - P2; producers P1, P2 and consumers C1, C2, C3 connect to the brokers hosting the partition leaders]

  44. Kafka’s Discovery Protocol

  45. Kubernetes Cluster Internal Access
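    Inside the Kubernetes cluster, a client only needs the bootstrap address; Kafka's discovery protocol then returns the addresses of the individual brokers. Strimzi exposes the bootstrap address as a Service named `<cluster-name>-kafka-bootstrap`, with the plain listener on port 9092 and the TLS listener on 9093. The client Deployment below is a hypothetical sketch: the image and the `BOOTSTRAP_SERVERS` variable name are assumptions about your own application.

    ```yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-consumer                       # hypothetical client application
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: my-consumer
      template:
        metadata:
          labels:
            app: my-consumer
        spec:
          containers:
            - name: consumer
              image: example/my-consumer:latest   # hypothetical image
              env:
                # Assumed to be read by the app as its bootstrap.servers value;
                # my-cluster-kafka-bootstrap is the Service Strimzi creates
                # for the Kafka CR named my-cluster (plain listener port).
                - name: BOOTSTRAP_SERVERS
                  value: my-cluster-kafka-bootstrap:9092
    ```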

  46. Kubernetes Cluster External Access
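    From outside the cluster, every broker must be reachable individually, because clients connect directly to the partition leaders. With Strimzi this is configured through an `external` listener in the Kafka CR; a sketch of that part of the `my-cluster` spec, where `type` can be `route` (OpenShift only), `loadbalancer` or `nodeport`:

    ```yaml
    # Fragment of the Kafka CR spec; other sections unchanged
    spec:
      kafka:
        listeners:
          plain: {}
          tls: {}
          external:
            type: route     # or loadbalancer / nodeport
    ```

    With `route`, Strimzi creates one Route for bootstrapping plus one per broker, and clients must connect over TLS, since OpenShift Routes rely on TLS SNI to pass the traffic through to the right broker.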

  47. Features
    Tolerations
    Memory and CPU resources
    High Availability
    Mirroring
    Affinity
    Authentication
    Storage
    Encryption
    Scale Down
    JVM Configuration
    Logging
    Metrics
    Off-cluster access
    Scale Up
    Authorization
    Healthchecks
    Source2Image
    Configuration
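    Several of these features map directly to fields of the Kafka CR. The fragment below is a sketch with illustrative values, assuming the `my-cluster` CR shown earlier: `replicas` drives scaling up and down, `resources` sets memory and CPU for each broker container, and `jvmOptions` sets the broker JVM heap.

    ```yaml
    # Fragment of the Kafka CR spec; values are illustrative
    spec:
      kafka:
        replicas: 5           # scale up from 3 brokers
        resources:            # memory and CPU per broker container
          requests:
            memory: 4Gi
            cpu: "1"
          limits:
            memory: 4Gi
            cpu: "2"
        jvmOptions:           # JVM configuration
          "-Xms": 2g
          "-Xmx": 2g          # heap below the memory limit, leaving room for non-heap memory
    ```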

  48. Chapter Five: Demo

  49. Resources
    Strimzi: https://strimzi.io/
    OperatorHub.io: https://www.operatorhub.io/
    Apache Kafka: https://kafka.apache.org/
    Kubernetes: https://kubernetes.io/
    OpenShift: https://www.openshift.com/
    Operator Framework: https://github.com/operator-framework
    Demo: https://github.com/systemcraftsman/strimzi-demo

  50. Thank You
    @systemcraftsman