
Kafka Streams from the ground up to the Cloud


[Talk given at Spring One Platform, Dec 5, 2017, San Francisco]

In this session we will introduce the Kafka Streams API and the Kafka Streams processing engine, followed by the Kafka Streams support in the Spring portfolio, showing how to easily write Kafka Streams applications using Spring Cloud Stream and deploy them on various cloud platforms using Spring Cloud Data Flow.

Marius Bogoevici

December 05, 2017

Transcript

  1. Kafka Streams:
    From the ground up to the Cloud
    Marius Bogoevici, Chief Architect, Red Hat
    Spring One Platform, Dec 4, 2017
    @mariusbogoevici


  2. Marius Bogoevici
    ● Chief Architect, Data Streaming at Red Hat
    ● Spring ecosystem contributor since 2008
    ○ Spring Integration
    ● Spring team member between 2014 and 2017
    ○ Spring XD, Spring Integration Kafka
    ○ Spring Cloud Stream project lead
    ● Co-author “Spring Integration in Action”, Manning, 2012


  3. Kafka: from messaging system to streaming platform
    (based on https://www.confluent.io/blog/apache-kafka-goes-1-0/)
    [Diagram: timeline from distributed log → replication and fault tolerance → Connect and Streams → transactions and exactly-once semantics]


  4. How about applications that are both producers and consumers
    and perform complex computations?
    Kafka as a distributed messaging system


  5. Kafka Streams
    ● Client library for stream processing
    ○ Embed stream processing features into
    regular Java applications (microservice
    model)
    ○ Create sophisticated topologies of
    independent applications
    ● Functional transformations via DSL:
    ○ Mapping, filtering, flatMap
    ○ Aggregation, joins (multiple topics)
    ○ Windowing
    ● Kafka-to-Kafka semantics
    ● One-record-at-a-time processing (no
    microbatching)
    ● Stateful processing support
    ● Transactions/exactly once
    [Diagram: a Kafka Streams application embedded in a regular application, reading from and writing to a Kafka cluster]
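    A minimal sketch of what embedding the library looks like in practice, using the pre-1.0 KStreamBuilder API that the rest of the deck assumes; the topic names, application id, and broker address here are illustrative, not from the deck.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;

    public class EmbeddedStreamsApplication {
        public static void main(String[] args) {
            // Streams configuration: the application id doubles as consumer group and state prefix
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "embedded-streams-demo");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // example broker

            // Trivial topology: copy records from one topic to another
            KStreamBuilder builder = new KStreamBuilder();
            KStream<String, String> input = builder.stream(Serdes.String(), Serdes.String(), "words");
            input.to(Serdes.String(), Serdes.String(), "words-copy");

            // The processing engine runs inside the application's own JVM; there is no cluster to submit to
            KafkaStreams streams = new KafkaStreams(builder, props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }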


  6. Kafka Streams - important concepts
    ● KStream
    ○ Record stream abstraction
    ○ Read from/written to external topic or produced from other KStream via operators such as
    map/filter
    ● KTable/GlobalKTable
    ○ Changelog stream abstraction (key is meaningful)
    ○ Read from external topic as a sequence of updates
    ○ Produced from other tables or from stream joins, aggregations, etc.
    ● State Store
    ○ Key-value store for intermediate aggregation data, KTable materialized views, and arbitrary
    key-value data produced during processing
    ○ Replicated externally
    ● Time windowing
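    As a rough illustration of the three abstractions, in the same snippet style as the DSL on the next slide; the topic and store names are hypothetical, and the exact KStreamBuilder overloads varied slightly between releases:

    KStreamBuilder builder = new KStreamBuilder();

    // KStream: every record is an independent event
    KStream<String, String> orders = builder.stream(Serdes.String(), Serdes.String(), "orders");

    // KTable: the topic is read as a changelog; the latest value per key wins
    KTable<String, String> customers =
            builder.table(Serdes.String(), Serdes.String(), "customers", "customers-store");

    // GlobalKTable: fully replicated to every application instance, handy for small reference data
    GlobalKTable<String, String> countries =
            builder.globalTable(Serdes.String(), Serdes.String(), "countries", "countries-store");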


  7. Kafka Streams - high level DSL
    KStream<String, String> words = builder.stream("words");
    // split each line into words, re-key by word, group and count per time window
    KTable<Windowed<String>, Long> countsTable = words
        .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
        .map((key, value) -> new KeyValue<>(value, value))
        .groupByKey(Serdes.String(), Serdes.String())
        .count(timeWindows, "WordCounts");
    // turn the changelog back into a record stream and write it to the output topic
    KStream<Windowed<String>, Long> counts = countsTable.toStream();
    counts.to("counts");


  8. Kafka Streams stateful processing (default stores)
    [Diagram: the application embeds Kafka Streams with an in-memory state store backed by local disk and replicated to the Kafka cluster via a changelog topic]
    ● Pluggable state store model
    ● Key-value data store
    ● Default strategy:
    ○ In-memory (fast access)
    ○ Local disk (for fast recovery)
    ○ Replicated to Kafka (for resilience)
    ● Tightly integrated with Kafka: state
    updates are correlated with offset commits
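    These stores also surface to the embedding application: a rough sketch of querying the "WordCounts" window store created on the DSL slide, assuming streams is the running KafkaStreams instance and that it hosts the partitions for the queried key (the store types come from org.apache.kafka.streams.state):

    ReadOnlyWindowStore<String, Long> store =
            streams.store("WordCounts", QueryableStoreTypes.<String, Long>windowStore());

    long now = System.currentTimeMillis();
    // counts for the word "kafka" in the windows of the last minute
    try (WindowStoreIterator<Long> iterator = store.fetch("kafka", now - 60_000, now)) {
        while (iterator.hasNext()) {
            System.out.println(iterator.next()); // KeyValue<windowStartTimestamp, count>
        }
    }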


  9. Spring Cloud Stream
    ● Event-driven microservice framework
    ● Developers focus on writing business code
    ● Middleware-agnostic programming model
    ● Binders:
    ○ Kafka
    ○ RabbitMQ
    ○ AWS Kinesis
    ○ Google Pub/Sub
    ○ Apache Artemis (community)
    ● Easy to deploy with Spring Cloud Data
    Flow
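    To make the programming model concrete, here is a minimal processor in the annotation-based style current at the time of the talk (Spring Cloud Stream 1.3); the class name and the uppercase transformation are illustrative:

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.stream.annotation.EnableBinding;
    import org.springframework.cloud.stream.annotation.StreamListener;
    import org.springframework.cloud.stream.messaging.Processor;
    import org.springframework.messaging.handler.annotation.SendTo;

    // The code only declares an input and an output; whether they are Kafka topics,
    // RabbitMQ exchanges, or Kinesis streams is decided by the binder on the classpath.
    @SpringBootApplication
    @EnableBinding(Processor.class)
    public class UppercaseProcessorApplication {

        @StreamListener(Processor.INPUT)
        @SendTo(Processor.OUTPUT)
        public String transform(String payload) {
            return payload.toUpperCase();
        }

        public static void main(String[] args) {
            SpringApplication.run(UppercaseProcessorApplication.class, args);
        }
    }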


  10. Spring Cloud Stream KStream Processor (since 1.3)
    [Diagram: a KStream processor reading from the "words" topic and writing to the "counts" topic, shown as three layers:
    KStream API - programming model (developer focus);
    Spring Cloud Stream - application model (configuration options, StreamsConfig built from Spring Boot properties, KStreamBuilder, KStream binder);
    Spring Boot - externalized configuration, uberjar construction, health monitoring endpoints]
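    With the 1.3 KStream binder, the application code reduces to the function applied to the stream. A sketch, assuming the KStreamProcessor binding interface shipped with that binder; the class name is illustrative and the splitting logic mirrors the DSL slide:

    import java.util.Arrays;

    import org.apache.kafka.streams.kstream.KStream;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.stream.annotation.EnableBinding;
    import org.springframework.cloud.stream.annotation.StreamListener;
    import org.springframework.cloud.stream.binder.kstream.annotations.KStreamProcessor;
    import org.springframework.messaging.handler.annotation.SendTo;

    @SpringBootApplication
    @EnableBinding(KStreamProcessor.class)
    public class WordStreamProcessorApplication {

        @StreamListener("input")
        @SendTo("output")
        public KStream<?, String> process(KStream<?, String> input) {
            // business code only: the binder creates the KStreamBuilder, derives the
            // StreamsConfig from Spring Boot properties, and binds input/output to topics
            return input.flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")));
        }

        public static void main(String[] args) {
            SpringApplication.run(WordStreamProcessorApplication.class, args);
        }
    }

    The "words" and "counts" topics would then be wired through ordinary binding properties, e.g. spring.cloud.stream.bindings.input.destination=words and spring.cloud.stream.bindings.output.destination=counts.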


  11. Kafka Streams in the Cloud
    [Diagram: Spring Cloud Data Flow deploying multiple Kafka Streams applications, packaged as uberjars or Docker images]


  12. Kafka Streams stateful and stateless deployments
    [Diagram: the application with its in-memory state store and local disk, connected to the Kafka cluster via the changelog topic]
    ● Changes propagated to changelog topic
    ● Stored locally for recovery/restart
    ● Fully stateless deployments must replay the
    changelog topic on restart/failover
    ● State store recovery can be optimized by giving
    deployments access to persistent local state
    (stateful deployments)
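    A sketch of the Streams settings that drive this trade-off, continuing the snippet style of the earlier slides; STATE_DIR_CONFIG and NUM_STANDBY_REPLICAS_CONFIG are real StreamsConfig keys, while the mount path is just an example of where a persistent volume might be attached:

    Properties props = new Properties();
    // keep RocksDB state on storage that survives a restart, so stores are not rebuilt from the changelog
    props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams"); // example persistent mount
    // keep warm copies of each store on another instance to shorten failover
    props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);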


  13. Kafka Streams with Kubernetes StatefulSets
    [Diagram: three StatefulSet pods (word-count-0, word-count-1, word-count-2), each running the Kafka Streams application and bound to its own persistent volume (volume-word-count-0, volume-word-count-1, volume-word-count-2)]


  14. Demo time!
    Twitter: @mariusbogoevici
    Email: [email protected]
    http://cloud.spring.io/spring-cloud-stream/
    https://github.com/EnMasseProject/barnabas
