Slide 1

Slide 1 text

Writing Blazing Fast and Production-Ready Kafka Streams Apps (in Less Than 30 Min) Using Azkarra
Kafka Summit Europe 2021
Florian HUSSONNOIS

Slide 2

Slide 2 text

Hi, I'm Florian Hussonnois (@fhussonnois)

Consultant, Trainer, Software Engineer
Co-founder @StreamThoughts
Confluent Community Catalyst (2019/2021)
Apache Kafka Streams contributor

Open Source Technology Enthusiast
- Azkarra Streams
- Kafka Connect File Pulse
- Kafka Streams CEP
- Kafka Client for Kotlin

Slide 3

Slide 3 text

Like me, you probably started with the famous Word Count!

    KStream<String, String> source = builder.stream("streams-plaintext-input");
    source.flatMapValues(splitAndToLowercase())
          .groupBy((key, value) -> value)
          .count(Materialized.as("counts-store"))
          .toStream()
          .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
    Topology topology = builder.build();

Slide 4

Slide 4 text

    KStream<String, String> source = builder.stream("streams-plaintext-input");
    source.flatMapValues(splitAndToLowercase())
          .groupBy((key, value) -> value)
          .count(Materialized.as("counts-store"))
          .toStream()
          .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
    Topology topology = builder.build();

[Diagram: Stateful Stream Processing in three numbered stages: Consume, Transform, Aggregate / Join, Produce. GroupBy(Key) is annotated with "Repartition".]
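The slides never show `splitAndToLowercase()`. The following is a minimal sketch of what such a helper, and the count it feeds, might compute. It is plain Java with no Kafka dependency: the class name `WordCountSketch`, the `\W+` split pattern, and the use of `java.util.function.Function` in place of Kafka Streams' `ValueMapper` are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in: plain Java, not the Kafka Streams API.
final class WordCountSketch {

    // Assumed behaviour of the slide's splitAndToLowercase(): lower-case the
    // line, split on non-word characters, and drop empty tokens.
    static Function<String, List<String>> splitAndToLowercase() {
        return line -> Arrays.stream(line.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    // Mirrors flatMapValues -> groupBy(word) -> count for a finite batch of lines.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> splitAndToLowercase().apply(line).stream())
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }
}
```

Unlike this batch version, the real topology updates its counts continuously and keeps them in the `counts-store` state store.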

Slide 5

Slide 5 text

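    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Produced;

    public class WordCount {

        public static void main(String[] args) {
            // Core logic (splitAndToLowercase() is a helper not shown on the slide)
            var builder = new StreamsBuilder();
            KStream<String, String> source = builder.stream("streams-plaintext-input");
            source.flatMapValues(splitAndToLowercase())
                  .groupBy((key, value) -> value)
                  .count(Materialized.as("counts-store"))
                  .toStream()
                  .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
            var topology = builder.build();

            // Configuration
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            // Execution
            var streams = new KafkaStreams(topology, props);
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
            streams.start();
        }
    }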

Slide 6

Slide 6 text

Can we deploy a Kafka Streams application like this one in production, without any changes?

Slide 7

Slide 7 text

The answer is No!

Slide 8

Slide 8 text

(Well, unless you are testing your app in production… cough, cough...)

Slide 9

Slide 9 text

(Well, unless you are testing your app in production… cough, cough...)
OK, nobody does that!

Slide 10

Slide 10 text

Our TODO list
Some requirements before moving into production

▢ Test that the app works as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production

Slide 11

Slide 11 text

[Figure: Business Value vs Effort for a Kafka Streams application. Axes: Business Value (Low to High) and Effort (Low/Medium to High). Items plotted: Topology (Business Logic), Kafka Streams Management, IQ, Error Handling logic, Monitoring / Health-check, Security, Config Externalization, Streams Lifecycle, RocksDB, Offsets and Lags, Packaging.]

Slide 12

Slide 12 text

Azkarra Framework in a nutshell (#azkarrastreams)

A lightweight Java framework to make a Kafka Streams application production-ready in just a few lines of code.
■ Distributed under the Apache License 2.0.
■ Developed from experience gained on a wide range of projects.
■ Uses best practices developed by Kafka users and the open-source community.

Overview:
■ REST API: Health Check, Monitoring, Interactive Queries, etc.
■ Embedded WebUI: Topology DAG Visualization
■ Built-in features for handling exceptions and tuning RocksDB
■ Support for Server-Sent Events

Slide 13

Slide 13 text

Azkarra Streams: how to use it?

Available on Maven Central.

Azkarra Framework:

    <dependency>
      <groupId>io.streamthoughts</groupId>
      <artifactId>azkarra-streams</artifactId>
      <version>0.9.2</version>
    </dependency>

Provides reusable classes for Kafka Streams:

    <dependency>
      <groupId>io.streamthoughts</groupId>
      <artifactId>azkarra-commons</artifactId>
      <version>0.9.2</version>
    </dependency>

Quick start:

    mvn archetype:generate -DarchetypeGroupId=io.streamthoughts \
      -DarchetypeArtifactId=azkarra-quickstart-java \
      -DarchetypeVersion=0.9.2 \
      -DgroupId=azkarra.streams \
      -DartifactId=azkarra-getting-started \
      -Dversion=1.0 \
      -Dpackage=azkarra \
      -DinteractiveMode=false

Slide 14

Slide 14 text

Let’s rewrite the “Word Count” using Azkarra (we still have 25’ left) 👾

Slide 15

Slide 15 text

Concepts: TopologyProvider

Container for building and configuring a Topology.

    class WordCountTopology implements TopologyProvider, Configurable {

        private Conf conf;

        @Override
        public Topology topology() {
            var source = conf.getString("topic.source.name");
            var sink = conf.getString("topic.sink.name");
            var store = conf.getString("store.name");

            var builder = new StreamsBuilder();
            builder
                .<String, String>stream(source)
                .flatMapValues(splitAndToLowercase())
                .groupBy((key, value) -> value)
                .count(Materialized.as(store))
                .toStream()
                .to(sink, Produced.with(Serdes.String(), Serdes.Long()));
            return builder.build();
        }

        @Override
        public void configure(final Conf conf) {
            this.conf = conf;
        }

        @Override
        public String version() {
            return "1.0";
        }
    }

Slide 16

Slide 16 text

Concepts: StreamsExecutionEnvironment

Manages the life cycle of KafkaStreams instances.

    // (1) Define the KafkaStreams configuration
    var streamsConfig = Conf.of(
        BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
        DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass(),
        DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass()
    );

    // (2) Define the Topology configuration
    var topologyConfig = Conf.of(
        "topic.source.name", "topic-text-lines",
        "topic.sink.name", "topic-text-word-count",
        "store.name", "Count"
    );

    // (3) Create and configure a local execution environment
    var env = LocalStreamsExecutionEnvironment
        .create(Conf.of("streams", streamsConfig))
        // (4) Register our topology to run
        .registerTopology(
            WordCountTopology::new,
            Executed.as("WordCount").withConfig(topologyConfig)
        );

    // (5) Start the environment
    env.start();

    // (6) Add Shutdown Hook
    Runtime.getRuntime()
        .addShutdownHook(new Thread(env::stop));

Slide 17

Slide 17 text

Let’s start KafkaStreams. Boom! Transient errors

    word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1] Received error code INCOMPLETE_SOURCE_TOPIC_METADATA
    16:05:12.585 [word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1] ERROR org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1-consumer, groupId=word-count-1-0] User provided listener org.apache.kafka.streams.processor.internals.StreamsRebalanceListener failed on invocation of onPartitionsAssigned for partitions []
    org.apache.kafka.streams.errors.MissingSourceTopicException: One or more source topics were missing during rebalance
        at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsAssigned(StreamsRebalanceListener.java:57) ~[kafka-streams-2.7.0.jar:?]
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:293) [kafka-clients-2.7.0.jar:?]
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:430) [kafka-clients-2.7.0.jar:?]
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:451) [kafka-clients-2.7.0.jar:?]
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:367) [kafka-clients-2.7.0.jar:?]
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:508) [kafka-clients-2.7.0.jar:?]

Slide 18

Slide 18 text

Concepts: StreamsLifecycleInterceptor

A pluggable interface that allows intercepting a KafkaStreams instance before it is started or stopped.

    public interface StreamsLifecycleInterceptor {

        /**
         * Intercepts the streams instance before being started.
         */
        default void onStart(StreamsLifecycleContext context, StreamsLifecycleChain chain) {
            chain.execute();
        }

        /**
         * Intercepts the streams instance before being stopped.
         */
        default void onStop(StreamsLifecycleContext context, StreamsLifecycleChain chain) {
            chain.execute();
        }

        /**
         * Used for logging information.
         */
        default String name() {
            return getClass().getSimpleName();
        }
    }

Built-in implementations:
■ AutoCreateTopicsInterceptor
■ WaitForSourceTopicsInterceptor
■ KafkaBrokerReadyInterceptor
...and a few more (discussed later) 😉

Most interceptors are configurable.
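To make the chain mechanics concrete, here is a minimal, dependency-free sketch of how such an interceptor chain could be driven. The types `Chain` and `Interceptor` and the method `start` are simplified, hypothetical stand-ins, not Azkarra's actual classes; only the idea that each interceptor decides when to call `chain.execute()` is taken from the slide.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of an interceptor chain; not Azkarra's real classes.
final class InterceptorChainSketch {

    interface Chain {
        void execute();
    }

    interface Interceptor {
        // Default behaviour mirrors the slide: just pass control down the chain.
        default void onStart(Chain chain) {
            chain.execute();
        }
    }

    // Runs each interceptor in order; the terminal action runs only if every
    // interceptor calls chain.execute().
    static void start(List<Interceptor> interceptors, Runnable startStreams) {
        Iterator<Interceptor> it = interceptors.iterator();
        Chain chain = new Chain() {
            @Override
            public void execute() {
                if (it.hasNext()) {
                    it.next().onStart(this);
                } else {
                    startStreams.run();
                }
            }
        };
        chain.execute();
    }

    // Example: records the order of events around the start of the instance.
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        Interceptor waitForTopics = new Interceptor() {
            @Override
            public void onStart(Chain chain) {
                events.add("topics-ready"); // e.g. block here until source topics exist
                chain.execute();
                events.add("started");
            }
        };
        Interceptor passThrough = new Interceptor() { };
        start(List.of(waitForTopics, passThrough), () -> events.add("streams.start()"));
        return events;
    }
}
```

An interceptor that never calls `chain.execute()` prevents the instance from starting, which is what lets a "wait for source topics" style interceptor guard against the transient error shown earlier.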

Slide 19

Slide 19 text

Concepts: AutoCreateTopicsInterceptor

Automatically infers the source and sink topics to be created from Topology.describe().
■ Internally, uses the AdminClient API.
■ Can be used during development for deleting all topics when the instance is stopped.

    import static io.s.a.r.i.AutoCreateTopicsInterceptorConfig.*;

    // (1) Define the KafkaStreams configuration
    var streamsConfig = ...

    // (2) Define the Topology configuration
    var topologyConfig = ...

    // (3) Define the Environment configuration
    var envConfig = Conf.of(
        "streams", streamsConfig,
        AUTO_CREATE_TOPICS_NUM_PARTITIONS_CONFIG, 2,
        AUTO_CREATE_TOPICS_REPLICATION_FACTOR_CONFIG, 1,
        // WARN - ONLY DURING DEVELOPMENT
        AUTO_DELETE_TOPICS_ENABLE_CONFIG, true
    );

    // (4) Create and configure the local execution environment
    LocalStreamsExecutionEnvironment
        .create(envConfig)
        // (5) Add the StreamsLifecycleInterceptor
        .addStreamsLifecycleInterceptor(
            AutoCreateTopicsInterceptor::new
        )
        // ...code omitted for clarity

Slide 20

Slide 20 text

What's left to do?
Externalizing configuration (we have 20’ left) 😀

▢ Test that the app works as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production

Slide 21

Slide 21 text

External Configuration: Conf & AzkarraConf

Azkarra provides the Configurable interface, which can be implemented by most of the Azkarra components:

    void configure(final Conf configuration);

■ AzkarraConf: uses the Lightbend Config library.
  ○ Allows loading configuration settings from HOCON files.

    // file:application.conf
    azkarra {
      // The configuration settings passed to the Kafka Streams
      // instance should be prefixed with `.streams`
      streams {
        bootstrap.servers = "localhost:9092"
        default.key.serde = "org.apache.kafka..Serdes$StringSerde"
        default.value.serde = "org.apache.kafka..Serdes$StringSerde"
      }
      topic.source.name = "topic-text-lines"
      topic.sink.name = "topic-text-word-count"
      store.name = "Count"
      auto.create.topics.num.partitions = 2
      auto.create.topics.replication.factor = 1
      auto.delete.topics.enable = true
    }

    // file:Main.class
    var config = AzkarraConf.create().getSubConf("azkarra");
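The prefix convention can be illustrated without the framework. The sketch below shows one way a "sub-configuration" lookup could work over a flat map of dotted keys. `SubConfSketch` and `subConf` are hypothetical names, and this is not how `AzkarraConf#getSubConf` is implemented; it only demonstrates the idea that keys under `azkarra.streams` end up as plain Kafka Streams properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating prefix-based sub-configuration; not the Azkarra Conf API.
final class SubConfSketch {

    // Returns the entries whose key starts with "<prefix>.", with that prefix removed.
    static Map<String, Object> subConf(Map<String, Object> conf, String prefix) {
        String dotted = prefix + ".";
        Map<String, Object> result = new LinkedHashMap<>();
        conf.forEach((key, value) -> {
            if (key.startsWith(dotted)) {
                result.put(key.substring(dotted.length()), value);
            }
        });
        return result;
    }
}
```

Applying `subConf(conf, "azkarra")` and then `subConf(..., "streams")` to the settings above would leave `bootstrap.servers` and the two serde entries, ready to hand to Kafka Streams.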

Slide 22

Slide 22 text

Concepts: AzkarraContext

Container for Dependency Injection. Used to automatically configure streams execution environments.

    public static void main(final String[] args) {
        // (1) Load the configuration (application.conf)
        var config = AzkarraConf.create().getSubConf("azkarra");

        // (2) Create the Azkarra Context
        var context = DefaultAzkarraContext.create(config);

        // (3) Register StreamsLifecycleInterceptor as component
        context.registerComponent(
            ConsoleStreamsLifecycleInterceptor.class
        );

        // (4) Register the Topology to the default environment
        context.addTopology(
            WordCountTopology.class,
            Executed.as("word-count")
        );

        // (5) Start the context
        context
            .setRegisterShutdownHook(true)
            .start();
    }

Slide 23

Slide 23 text

Concepts: AzkarraApplication

Used to bootstrap and configure an Azkarra application.
■ Provides an embedded HTTP server
■ Provides component scanning

    public class WordCount {

        public static void main(final String[] args) {
            // (1) Load the configuration (application.conf)
            var config = AzkarraConf.create();

            // (2) Create the Azkarra Context
            var context = DefaultAzkarraContext.create();

            // (3) Register the Topology to the default environment
            context.addTopology(
                WordCountTopology.class,
                Executed.as("word-count")
            );

            // (4) Create Azkarra application
            new AzkarraApplication()
                .setContext(context)
                .setConfiguration(config)
                // (5) Enable and configure embedded HTTP server
                .setHttpServerEnable(true)
                .setHttpServerConf(ServerConfig.newBuilder()
                    .setListener("localhost")
                    .setPort(8080)
                    .build()
                )
                // (6) Start Azkarra
                .run(args);
        }
    }

Slide 24

Slide 24 text

Concepts: AzkarraApplication (annotation-based)

Used to bootstrap and configure an Azkarra application.
■ Provides an embedded HTTP server
■ Provides component scanning

    @AzkarraStreamsApplication
    public class WordCount {

        public static void main(String[] args) {
            AzkarraApplication.run(WordCount.class, args);
        }

        @Component
        public static class WordCountTopology implements TopologyProvider, Configurable {

            private Conf conf;

            @Override
            public Topology topology() {
                var builder = new StreamsBuilder();
                // ...code omitted for clarity
                return builder.build();
            }

            @Override
            public void configure(Conf conf) {
                this.conf = conf;
            }

            @Override
            public String version() {
                return "1.0";
            }
        }
    }

Slide 25

Slide 25 text

What's left to do?
Handling Deserialization Exceptions (we have 15’ left) 🤔

▢ Test that the app works as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production

Slide 26

Slide 26 text

Solution #1: Built-in mechanisms

default.deserialization.exception.handler
■ CONTINUE: continue with processing
■ FAIL: fail the processing and stop

Two available implementations:
■ LogAndContinueExceptionHandler
■ LogAndFailExceptionHandler

Not really suitable for production: corrupted messages cannot be monitored efficiently.

Slide 27

Slide 27 text

Solution #2: Dead Letter Queue Topic (DeserializationExceptionHandler)
Send corrupted messages to a special topic.
[Diagram: Source Topic → Handler → Topology (corrupted records are skipped) and Dead Letter Topic (corrupted records)]

Solution #3: Sentinel Value (Deserializer)
Catch any exception thrown during deserialization and return a default value (e.g., null, “N/A”).
[Diagram: Source Topic → SafeDeserializer → Delegate Deserializer; corrupted records become (null)]
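The sentinel-value pattern is small enough to sketch in plain Java. In the example below, Kafka's `Deserializer` interface is replaced by `java.util.function.Function` so the code has no dependencies; `SafeDeserializerSketch` and `parseLong` are hypothetical names, and catching `RuntimeException` is an assumption about which failures should be absorbed.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch of the sentinel-value pattern; Kafka's Deserializer interface
// is replaced by java.util.function.Function to keep the example dependency-free.
final class SafeDeserializerSketch<T> implements Function<byte[], T> {

    private final Function<byte[], T> delegate;
    private final T sentinel;

    SafeDeserializerSketch(Function<byte[], T> delegate, T sentinel) {
        this.delegate = delegate;
        this.sentinel = sentinel;
    }

    @Override
    public T apply(byte[] data) {
        try {
            return delegate.apply(data);
        } catch (RuntimeException e) {
            // Corrupted record: return the sentinel instead of failing the stream thread.
            return sentinel;
        }
    }

    // Example delegate: parses a UTF-8 encoded decimal number.
    static Long parseLong(byte[] data) {
        return Long.parseLong(new String(data, StandardCharsets.UTF_8));
    }
}
```

The trade-off is that downstream processors must recognize and filter the sentinel, so the chosen value must never occur as legitimate data.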

Slide 28

Slide 28 text

Using Azkarra

Solution #2: DeadLetterTopicExceptionHandler
■ By default, sends corrupted records to a topic named with the -rejected suffix.
■ Doesn’t change the schema/format of the corrupted message.
■ Uses Kafka headers to trace the exception cause and origin, e.g.:
  ○ __errors.exception.stacktrace
  ○ __errors.exception.message
  ○ __errors.exception.class.name
  ○ __errors.timestamp
  ○ __errors.application.id
  ○ __errors.record.[topic|partition|offset]
■ Can be configured to send records to a different Kafka cluster from the one used by KafkaStreams.

Solution #3: SafeSerdes

    SafeSerdes.Long(-1L);
    SafeSerdes.UUID(null);
    SafeSerdes.serdeFrom(
        new JsonSerializer(),
        new JsonDeserializer(),
        NullNode.getInstance()
    );
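As an illustration of the header-based tracing listed above, the following sketch builds the set of `__errors.*` headers for a failed record. The header names come from the slide; everything else is an assumption, including the class and method names, the parameter list, and the choice to encode every value as a string. Azkarra's actual handler may encode them differently.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: header names are from the slide, value encoding is assumed.
final class DeadLetterHeadersSketch {

    static Map<String, String> headers(Throwable cause, String applicationId,
                                       String topic, int partition, long offset,
                                       long timestampMs) {
        // Render the stack trace as text so it can travel with the record.
        StringWriter stacktrace = new StringWriter();
        cause.printStackTrace(new PrintWriter(stacktrace, true));

        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("__errors.exception.stacktrace", stacktrace.toString());
        headers.put("__errors.exception.message", String.valueOf(cause.getMessage()));
        headers.put("__errors.exception.class.name", cause.getClass().getName());
        headers.put("__errors.timestamp", Long.toString(timestampMs));
        headers.put("__errors.application.id", applicationId);
        headers.put("__errors.record.topic", topic);
        headers.put("__errors.record.partition", Integer.toString(partition));
        headers.put("__errors.record.offset", Long.toString(offset));
        return headers;
    }
}
```

Because the original key and value bytes are forwarded untouched, a consumer of the dead letter topic can replay a record once the cause recorded in the headers is fixed.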

Slide 29

Slide 29 text

Our TODO list
Monitoring (we have 10’ left) 🙃

▢ Test that the app works as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production

Slide 30

Slide 30 text

How to monitor Kafka Streams?

The Kafka Streams API provides a few methods for monitoring the state of the running instance.
■ KafkaStreams#state(), KafkaStreams#setStateListener()
  ⎼ CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR
  ⎼ can be used for checking the liveness and readiness of the instance.
■ KafkaStreams#localThreadsMetadata()
  ⎼ returns information about local threads/tasks and partition assignments.
■ KafkaStreams#metrics()

Best practices:
■ Build some REST APIs to expose the states of Kafka Streams
■ Export metrics using JMX, Prometheus, etc.
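One possible way to turn these states into liveness and readiness answers is sketched below. The state names are the ones listed on the slide, reproduced in a local enum so the example runs without Kafka; the policy itself (which states count as alive or ready) is an assumption and should be adapted to the application.

```java
// Hypothetical mapping from Kafka Streams states to probe results; the state names
// come from the slide, the liveness/readiness policy is an assumption.
final class HealthSketch {

    enum State { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

    // Liveness: the instance has not died and is not shutting down.
    static boolean isAlive(State state) {
        switch (state) {
            case CREATED:
            case REBALANCING:
            case RUNNING:
                return true;
            default:
                return false;
        }
    }

    // Readiness: only a RUNNING instance should receive traffic.
    static boolean isReady(State state) {
        return state == State.RUNNING;
    }
}
```

In a real application the current state would come from `KafkaStreams#state()` or from a listener registered with `KafkaStreams#setStateListener()`.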

Slide 31

Slide 31 text

Kafka Consumer Lag and Offsets
Maybe the most fundamental indicator to monitor.

How far behind the producers are the Kafka Streams consumers?
■ Consumer: implement a ConsumerInterceptor, configured using main.consumer.interceptor.classes

    public interface ConsumerInterceptor<K, V> extends Configurable, AutoCloseable {
        ConsumerRecords<K, V> onConsume(ConsumerRecords<K, V> records);
        void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets);
        void close();
    }

Is the Kafka Streams application ready to process records and serve interactive queries?
■ KafkaStreams#allLocalStorePartitionLags()
■ KafkaStreams#setGlobalStateRestoreListener()
■ NOTE: Internal KafkaStreams threads do not start consuming messages until stores are recovered.
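The lag itself is simple arithmetic: for each partition, the log end offset minus the committed offset. A minimal sketch follows, using plain strings as partition identifiers instead of Kafka's `TopicPartition`. The class name and the decision to treat a partition with no committed offset as starting from 0 are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: partitions are plain strings rather than Kafka's TopicPartition.
final class LagSketch {

    // Lag per partition = log end offset - committed offset (never negative).
    // Assumption: a partition without a committed offset is treated as offset 0.
    static Map<String, Long> lags(Map<String, Long> endOffsets, Map<String, Long> committed) {
        Map<String, Long> result = new HashMap<>();
        endOffsets.forEach((partition, end) -> {
            long position = committed.getOrDefault(partition, 0L);
            result.put(partition, Math.max(0L, end - position));
        });
        return result;
    }
}
```

In practice the committed offsets could be captured in `ConsumerInterceptor#onCommit`, while the end offsets have to be fetched separately from the brokers.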

Slide 32

Slide 32 text

Azkarra REST API

Azkarra supports a REST API for managing, monitoring and querying Kafka Streams instances.
■ Provides support for Interactive Queries
■ Built-in authentication and authorization mechanisms (Basic Auth, two-way SSL).
■ Allows registration of new JAX-RS resources using a plugin interface: AzkarraRestExtension

● Get information about the local streams instance
    GET /api/v1/streams
● Get the status for the streams instance
    GET /api/v1/streams/(string: id)/status
● Get the configuration for the streams instance
    GET /api/v1/streams/(string: id)/config
● Get current metrics for the streams instance
    GET /api/v1/streams/(string: applicationId)/metrics
● Get all metrics in Prometheus format
    GET /prometheus

Metrics: Micrometer, Prometheus
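A client for the status endpoint can be sketched with the JDK's built-in HTTP client (Java 11+). The path comes from the list above. The base URL `http://localhost:8080` matches the embedded server configured earlier in the talk, but the `word-count` id, the 5-second timeout, and the `Accept` header are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical client-side sketch; only the endpoint path is taken from the slide.
final class StatusRequestSketch {

    // Builds (but does not send) a GET request for the status of one streams instance.
    static HttpRequest statusRequest(String baseUrl, String streamsId) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/v1/streams/" + streamsId + "/status"))
                .timeout(Duration.ofSeconds(5))
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The shape of the response body is not shown in the slides, so it is left unparsed here.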

Slide 33

Slide 33 text

Putting it all together: exporting Kafka Streams states anywhere

Azkarra can be configured to periodically report the internal states of a KafkaStreams instance.
■ Uses a StreamsLifecycleInterceptor:
  ⎼ MonitoringStreamsInterceptor
■ Accepts a pluggable reporter class
  ⎼ Default: KafkaMonitoringReporter
  ⎼ Publishes events that adhere to the CloudEvents specification.

    {
      "id": "appid:word-count;appsrv:localhost:8080;ts:1620691200000",
      "source": "azkarra/ks/localhost:8080",
      "specversion": "1.0",
      "type": "io.streamthoughts.azkarra.streams.stateupdateevent",
      "time": "2021-05-11T00:00:00.000+0000",
      "datacontenttype": "application/json",
      "ioazkarramonitorintervalms": 10000,
      "ioazkarrastreamsappid": "word-count",
      "ioazkarraversion": "0.9.2",
      "ioazkarrastreamsappserver": "localhost:8080",
      "data": {
        "state": "RUNNING",
        "threads": [
          {
            "name": "word-count-...-93e9a84057ad-StreamThread-1",
            "state": "RUNNING",
            "active_tasks": [],
            "standby_tasks": [],
            "clients": {}
          }
        ],
        "offsets": {
          "group": "",
          "consumers": []
        },
        "stores": {
          "partitionRestoreInfos": [],
          "partitionLagInfos": []
        },
        "state_changed_time": 1620691200000
      }
    }
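The `id` and `time` attributes of the sample event appear to be derived from the application id, the application server, and a timestamp. The sketch below reproduces those two values. The layout is inferred from the single sample event above and not from Azkarra's source, so treat both the format and the class name as assumptions.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: formats inferred from the sample event, not from Azkarra's code.
final class StateEventIdSketch {

    private static final DateTimeFormatter TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ").withZone(ZoneOffset.UTC);

    // Builds an id such as "appid:<app>;appsrv:<host:port>;ts:<epoch millis>".
    static String id(String appId, String appServer, long epochMillis) {
        return "appid:" + appId + ";appsrv:" + appServer + ";ts:" + epochMillis;
    }

    // Renders the event time in UTC with a numeric offset, e.g. "+0000".
    static String time(long epochMillis) {
        return TIME.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

Calling `id("word-count", "localhost:8080", 1620691200000L)` and `time(1620691200000L)` yields the `id` and `time` values shown in the sample event.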

Slide 34

Slide 34 text

Our TODO list
Packaging (we still have 5’ left) 😬

▢ Test that the app works as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production

Slide 35

Slide 35 text

Packaging Kafka Streams with Azkarra

Azkarra-based applications can be packaged like any other Kafka Streams app.

Azkarra Worker → an empty Azkarra application
■ Topologies and components can be loaded from an external uber-jar
  ⎼ Similar to Kafka Connect plugins and connectors
■ Can be used as the base image for Docker
  ⎼ Use Jib to build optimized Docker images for Java

    $ docker run --net host \
        -v "$(pwd)/application.conf:/etc/azkarra/azkarra.conf" \
        -v "$(pwd)/local-topologies:/usr/share/azkarra-components/" \
        streamthoughts/azkarra-streams-worker:latest

Jib + Docker + Azkarra = ❤

Slide 36

Slide 36 text

Deploying Kafka Streams with Azkarra (in Kubernetes)

Using Kubernetes, topologies can be downloaded and mounted using an init container.

[Diagram: inside a Deployment, StatefulSet, or similar, an InitContainer fetches my-topology-with-dependencies-1.0.jar via HTTP GET from a repository manager (e.g., Nexus / Artifactory) into a shared volume, /var/lib/components/. The main container (image: azkarra-worker) loads it from there through azkarra.component.paths.]

Slide 37

Slide 37 text

DONE
In less than 30 min using Azkarra 🚀

▢ Test that the app works as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production

Slide 38

Slide 38 text

Demo (new coins... we still have 5’ left) 🤫

Slide 39

Slide 39 text

Takeaways: Conclusion

Kafka Streams is a very good choice for quickly creating streaming applications. But building applications for production can be a lot of work.

Azkarra aims to be a fast path to production by providing all the cool features you need:
■ Built-in mechanisms for handling exceptions
■ Built-in REST API for executing Interactive Queries
■ Consumer offsets and lag
■ Topology visualization
■ Dashboard UI

Slide 40

Slide 40 text

Takeaways: Roadmap

■ Add support for querying stale stores.
■ Add support for deploying and managing Kafka Streams topologies directly in Kubernetes
  ❏ i.e., KubStreamsExecutionEnvironment
■ Enhance the WebUI to add some visualizations for the key metrics to monitor.

Slide 41

Slide 41 text

Takeaways: Links

Official Website: https://www.azkarrastreams.io/
GitHub: https://github.com/streamthoughts/azkarra-streams (for contributing and adding ⭐)
Slack: https://communityinviter.com/apps/azkarra-streams/azkarra-streams-community
Demo: https://github.com/streamthoughts/demo-kafka-streams-scottify

Join us on Slack!

Slide 42

Slide 42 text

Thank you!
@fhussonnois
Florian HUSSONNOIS

Slide 43

Slide 43 text

Azkarra Dashboard [screenshot]

Slide 44

Slide 44 text

Azkarra Dashboard [screenshot]

Slide 45

Slide 45 text

Images & Icons

Images
■ Photo by Mark König on Unsplash
■ Photo by CHUTTERSNAP on Unsplash