
Kafka Streams from the grounds up to the Cloud

[Talk given at Spring One Platform, Dec 5, 2017, San Francisco]

In this session we introduce the Kafka Streams API and the Kafka Streams processing engine, followed by the Kafka Streams support in the Spring portfolio: we show how to easily write Kafka Streams applications using Spring Cloud Stream and how to deploy them on various cloud platforms using Spring Cloud Data Flow.

Marius Bogoevici

December 05, 2017

Transcript

  1. Kafka Streams: From the grounds up to the Cloud
     Marius Bogoevici, Chief Architect, Red Hat
     Spring One Platform, Dec 4, 2017
     @mariusbogoevici
  2. Marius Bogoevici
     • Chief Architect, Data Streaming at Red Hat
     • Spring ecosystem contributor since 2008
       ◦ Spring Integration
     • Spring team member between 2014 and 2017
       ◦ Spring XD, Spring Integration Kafka
       ◦ Spring Cloud Stream project lead
     • Co-author, “Spring Integration in Action”, Manning, 2012
  3. Kafka: from messaging system to streaming platform
     (based on https://www.confluent.io/blog/apache-kafka-goes-1-0/)
     Distributed log → Replication, fault tolerance → Connect and Streams → Transactions, exactly once
  4. Kafka as a distributed messaging system
     How about applications that are both producers and consumers and perform complex computations?
  5. Kafka Streams
     • Client library for stream processing
       ◦ Embed stream processing features into regular Java applications (microservice model)
       ◦ Create sophisticated topologies of independent applications
     • Functional transformations via DSL:
       ◦ Mapping, filtering, flatMap
       ◦ Aggregation, joins (multiple topics)
       ◦ Windowing
     • Kafka-to-Kafka semantics
     • One-record-at-a-time processing (no microbatching)
     • Stateful processing support
     • Transactions/exactly once
     [Diagram: an application embedding the Kafka Streams library, connected to a Kafka cluster]
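     The "client library" point is worth making concrete: a complete Kafka Streams application is an ordinary Java class with a main method, and the processing engine runs inside that JVM; there is no cluster to submit jobs to. A minimal sketch using the Kafka 1.0-era StreamsBuilder API (the topic names and application id are invented for illustration):

        import java.util.Properties;

        import org.apache.kafka.common.serialization.Serdes;
        import org.apache.kafka.streams.KafkaStreams;
        import org.apache.kafka.streams.StreamsBuilder;
        import org.apache.kafka.streams.StreamsConfig;
        import org.apache.kafka.streams.kstream.KStream;

        public class UppercaseApp {
            public static void main(String[] args) {
                Properties props = new Properties();
                // The application id doubles as the consumer group id and state store prefix.
                props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
                props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
                props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

                // A simple Kafka-to-Kafka topology: read, transform, write.
                StreamsBuilder builder = new StreamsBuilder();
                KStream<String, String> input = builder.stream("input-topic");
                input.mapValues(String::toUpperCase).to("output-topic");

                // The engine is embedded: start it like any other library component.
                KafkaStreams streams = new KafkaStreams(builder.build(), props);
                streams.start();
                Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
            }
        }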
  6. Kafka Streams - important concepts
     • KStream
       ◦ Record stream abstraction
       ◦ Read from/written to an external topic, or produced from another KStream via operators such as map/filter
     • KTable/GlobalKTable
       ◦ Changelog stream abstraction (the key is meaningful)
       ◦ Read from an external topic as a sequence of updates
       ◦ Produced from other tables, or from stream joins, aggregations, etc.
     • State Store
       ◦ Key-value store for intermediate aggregation data, KTable materialized views, and arbitrary key-value data produced during processing
       ◦ Replicated externally
     • Time windowing
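     To make the KStream/KTable distinction concrete, a short sketch (topic names invented, builder as in the previous example): a KStream treats every record as an independent event, a KTable keeps only the latest value per key, and joining the two is the typical enrichment pattern.

        // Every record on "clicks" is an independent event.
        KStream<String, String> clicks = builder.stream("clicks");
        // "user-profiles" is read as a changelog: a later record with the
        // same key replaces the earlier one.
        KTable<String, String> profiles = builder.table("user-profiles");

        // Enrich each click with the current profile for its key (stream-table join).
        KStream<String, String> enriched =
                clicks.join(profiles, (click, profile) -> click + " [" + profile + "]");
        enriched.to("enriched-clicks");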
  7. Kafka Streams - high level DSL

        KStream<String, String> words = builder.stream("words");
        KTable<Windowed<String>, Long> countsTable = words
            .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
            .map((key, value) -> new KeyValue<>(value, value))
            .groupByKey(Serdes.String(), Serdes.String())
            .count(timeWindows, "WordCounts");
        KStream<Windowed<String>, Long> counts = countsTable.toStream();
        counts.to("counts");
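     Note that the windowed count changes the key type: count(timeWindows, "WordCounts") produces a table keyed by Windowed<String> (the word together with its time window), which is why the resulting stream is typed accordingly; writing it back to a plain topic requires a serde that understands the windowed key.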
  8. Kafka Streams stateful processing (default stores)
     • Pluggable state store model
     • Key-value data store
     • Default strategy:
       ◦ In-memory (fast access)
       ◦ Local disk (for fast recovery)
       ◦ Replicated to Kafka (for resilience)
     • Tightly integrated with Kafka: state updates are correlated with offset commits
     [Diagram: application with in-memory state store and local disk, replicated to the Kafka cluster via a changelog topic]
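     The local stores are also queryable from within the embedding application ("interactive queries"). A minimal sketch, assuming the running KafkaStreams instance from the earlier example and the windowed "WordCounts" store created on slide 7:

        import org.apache.kafka.streams.state.QueryableStoreTypes;
        import org.apache.kafka.streams.state.ReadOnlyWindowStore;
        import org.apache.kafka.streams.state.WindowStoreIterator;

        // Look up the store materialized by count(timeWindows, "WordCounts").
        ReadOnlyWindowStore<String, Long> wordCounts =
                streams.store("WordCounts", QueryableStoreTypes.windowStore());

        // Fetch the counts for one word across the windows of the last minute.
        long now = System.currentTimeMillis();
        try (WindowStoreIterator<Long> iterator = wordCounts.fetch("kafka", now - 60_000L, now)) {
            while (iterator.hasNext()) {
                // Each entry pairs a window start timestamp with the count in that window.
                System.out.println(iterator.next());
            }
        }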
  9. Spring Cloud Stream
     • Event-driven microservice framework
     • Developer focus on writing business code
     • Middleware-agnostic programming model
     • Binders:
       ◦ Kafka
       ◦ RabbitMQ
       ◦ AWS Kinesis
       ◦ Google Pub/Sub
       ◦ Apache Artemis (community)
     • Easy to deploy with Spring Cloud Data Flow
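     What "developer focus on business code" looks like in practice, using the annotation-based model that was current at the time of the talk (the class name is illustrative; the concrete middleware is selected by whichever binder is on the classpath):

        import org.springframework.boot.SpringApplication;
        import org.springframework.boot.autoconfigure.SpringBootApplication;
        import org.springframework.cloud.stream.annotation.EnableBinding;
        import org.springframework.cloud.stream.annotation.StreamListener;
        import org.springframework.cloud.stream.messaging.Processor;
        import org.springframework.messaging.handler.annotation.SendTo;

        @SpringBootApplication
        @EnableBinding(Processor.class)
        public class UppercaseApplication {

            public static void main(String[] args) {
                SpringApplication.run(UppercaseApplication.class, args);
            }

            // Pure business logic: no Kafka or RabbitMQ APIs in sight. The
            // input and output bindings are mapped to concrete destinations
            // via Spring Boot properties.
            @StreamListener(Processor.INPUT)
            @SendTo(Processor.OUTPUT)
            public String uppercase(String payload) {
                return payload.toUpperCase();
            }
        }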
  10. Spring Cloud Stream KStream Processor (since 1.3)
     • KStream API: programming model (developer focus)
     • Spring Cloud Stream: application model (configuration options, StreamsConfig based on Spring Boot properties, KStreamBuilder, KStream binder)
     • Spring Boot: externalized configuration, uberjar construction, health monitoring endpoints
     [Diagram: a processor reading a "words" input and writing a "counts" output, layered as KStream API on Spring Cloud Stream on Spring Boot]
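     With the KStream binder, the same model applies, but the bound input and output are KStreams and the method body is a Kafka Streams topology. A sketch of the slide 7 word count in this style; the binding interface is spelled out inline for self-containment (the 1.3 binder ships an equivalent one), and the binding names are illustrative:

        import java.util.Arrays;

        import org.apache.kafka.common.serialization.Serdes;
        import org.apache.kafka.streams.KeyValue;
        import org.apache.kafka.streams.kstream.KStream;
        import org.springframework.cloud.stream.annotation.EnableBinding;
        import org.springframework.cloud.stream.annotation.Input;
        import org.springframework.cloud.stream.annotation.Output;
        import org.springframework.cloud.stream.annotation.StreamListener;
        import org.springframework.messaging.handler.annotation.SendTo;

        @EnableBinding(WordCountProcessor.KStreamProcessor.class)
        public class WordCountProcessor {

            // One KStream input, one KStream output; the actual topics
            // ("words", "counts") are bound through Spring Boot properties,
            // e.g. spring.cloud.stream.bindings.input.destination=words.
            public interface KStreamProcessor {
                @Input("input")
                KStream<?, ?> input();

                @Output("output")
                KStream<?, ?> output();
            }

            @StreamListener("input")
            @SendTo("output")
            public KStream<String, Long> process(KStream<Object, String> input) {
                return input
                        .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
                        .map((key, word) -> new KeyValue<>(word, word))
                        .groupByKey(Serdes.String(), Serdes.String())
                        .count("WordCounts")
                        .toStream();
            }
        }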
  11. Kafka Streams in the Cloud
     [Diagram: Kafka Streams applications packaged as uberjars or Docker images, deployed via Spring Cloud Data Flow]
  12. Kafka Streams stateful and stateless deployments
     • Changes are propagated to a changelog topic
     • State is stored locally for recovery/restart
     • Fully stateless deployments require replaying the changelog topic on restart/failover
     • State store recovery can be optimized by providing access to persistent state (stateful deployments)
     [Diagram: application with in-memory state store and local disk, backed by a changelog topic in the Kafka cluster]
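     A sketch of the configuration involved, extending the Properties setup from the earlier example (the path and values are invented): pinning the state directory to a persistent volume lets a restarted instance reuse its local state instead of replaying the changelog from scratch, and standby replicas keep warm copies of the stores on other instances to shorten failover.

        // Local RocksDB state lives here; on a persistent volume it survives restarts.
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");
        // Keep one warm replica of each state store on another instance,
        // trading extra resources for faster failover.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);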
  13. Kafka Streams with Kubernetes StatefulSets
     [Diagram: pods word-count-0, word-count-1 and word-count-2, each running a Kafka Streams application bound to its own persistent volume (volume-word-count-0, volume-word-count-1, volume-word-count-2)]