Kafka Summit 2021: Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30 min

If you have worked on several Kafka Streams applications, you have probably found yourself rewriting the same pieces of code again and again.

Whether it's to manage processing failures or bad records, to use interactive queries, to organize your code, or to deploy and monitor your Kafka Streams app, building in-house libraries to standardize common patterns across your projects seems unavoidable.

And if you're new to Kafka Streams, you might be interested to know which patterns to use for your next streaming project.

In this talk, I introduce Azkarra, an open-source, lightweight Java framework designed to provide most of these features off the shelf by leveraging best-of-breed ideas and proven practices from the Apache Kafka community.

Florian Hussonnois

May 12, 2021

Transcript

  1. Writing Blazing Fast, and Production Ready Kafka Streams apps (in less than 30 min) using Azkarra

    Kafka Summit Europe 2021, Florian HUSSONNOIS
  2. Hi, I'm Florian Hussonnois (@fhussonnois)

    Consultant, Trainer, Software Engineer. Co-founder @StreamThoughts. Confluent Community Catalyst (2019/2021). Apache Kafka Streams contributor. Open Source Technology Enthusiast:
    - Azkarra Streams
    - Kafka Connect File Pulse
    - Kafka Streams CEP
    - Kafka Client for Kotlin
  3. Like me, you probably started with the famous Word Count!

        KStream<String, String> source = builder.stream("streams-plaintext-input");
        source.flatMapValues(splitAndToLowercase())
              .groupBy((key, value) -> value)
              .count(Materialized.as("counts-store"))
              .toStream()
              .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
        Topology topology = builder.build();
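
The Word Count code calls a `splitAndToLowercase()` helper that the slides never show. The following is a minimal sketch of what such a helper might look like; it is an assumption, not the speaker's code. In a real topology it would be typed as a Kafka Streams `ValueMapper<String, Iterable<String>>`; here `java.util.function.Function` stands in so the sketch runs on the JDK alone, and the class name `WordSplitter` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: the talk does not show its implementation.
final class WordSplitter {

    // Returns a function that lower-cases a line and splits it on runs of non-word characters.
    static Function<String, List<String>> splitAndToLowercase() {
        return line -> Arrays.stream(line.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(word -> !word.isEmpty()) // a leading separator yields an empty token
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitAndToLowercase().apply("Hello Kafka, hello Streams!"));
    }
}
```

Each returned word then becomes its own record value, which is what lets `groupBy((key, value) -> value)` group by word.
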
  4. Stateful Stream Processing

    The same Word Count topology as on the previous slide, annotated with its processing stages: Consume, Transform, Aggregate / Join, Produce. The GroupBy(Key) step triggers a Repartition.
  5. Core Logic, Configuration, Execution

        public class WordCount {
            public static void main(String[] args) {
                // Core logic
                var builder = new StreamsBuilder();
                KStream<String, String> source = builder.stream("streams-plaintext-input");
                source.flatMapValues(splitAndToLowercase())
                      .groupBy((key, value) -> value)
                      .count(Materialized.as("counts-store"))
                      .toStream()
                      .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
                var topology = builder.build();

                // Configuration
                Properties props = new Properties();
                props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
                props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
                props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

                // Execution
                var streams = new KafkaStreams(topology, props);
                Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
                streams.start(); // start processing
            }
        }
  6. Can we deploy a Kafka Streams application like this one in production, without any changes?

  7. The Answer is No!

  8. (Well, unless you are testing your app in production… cough, cough...)

  9. (Well, unless you are testing your app in production… cough, cough...) OK, nobody does that!
  10. Our TODO list: some requirements before moving into production

    ▢ Test the app is working as expected
    ▢ Externalize configuration
    ▢ Handle transient errors
    ▢ Handle deserialization exceptions
    ▢ Expose the state of the Kafka Streams application
    ▢ Be able to monitor offsets and lags of consumers and state stores
    ▢ Interactive Queries (optional)
    ▢ Package the Kafka Streams application for production
  11. Business Value vs Effort

    [Chart: the parts of a Kafka Streams application plotted by business value (low to high) against effort (low/medium to high). Items: Topology (Business Logic), Kafka Streams Management, Streams Lifecycle, IQ, Error Handling logic, Monitoring / Health-check, Security, Config Externalization, RocksDB, Offsets and Lags, Packaging.]
  12. Azkarra Framework in a nutshell (#azkarrastreams)

    A lightweight Java framework to make a Kafka Streams application production-ready in just a few lines of code.
    ▪ Distributed under the Apache License 2.0.
    ▪ Developed based on experience from a wide range of projects.
    ▪ Uses best practices developed by Kafka users and the open-source community.

    Overview:
    ▪ REST API: Health Check, Monitoring, Interactive Queries, etc.
    ▪ Embedded WebUI: Topology DAG Visualization
    ▪ Built-in features for handling exceptions and tuning RocksDB
    ▪ Support for Server-Sent Events
  13. Azkarra Streams: how to use it? (available on Maven Central)

    Azkarra Framework:

        <dependency>
            <groupId>io.streamthoughts</groupId>
            <artifactId>azkarra-streams</artifactId>
            <version>0.9.2</version>
        </dependency>

    Provides reusable classes for Kafka Streams:

        <dependency>
            <groupId>io.streamthoughts</groupId>
            <artifactId>azkarra-commons</artifactId>
            <version>0.9.2</version>
        </dependency>

    Quick start:

        mvn archetype:generate -DarchetypeGroupId=io.streamthoughts \
            -DarchetypeArtifactId=azkarra-quickstart-java \
            -DarchetypeVersion=0.9.2 \
            -DgroupId=azkarra.streams \
            -DartifactId=azkarra-getting-started \
            -Dversion=1.0 \
            -Dpackage=azkarra \
            -DinteractiveMode=false
  14. Let’s rewrite the “Word Count” with Azkarra (we still have 25’ left) 👾

  15. Concepts: TopologyProvider

    Container for building and configuring a Topology.

        class WordCountTopology implements TopologyProvider, Configurable {
            private Conf conf;

            @Override
            public Topology topology() {
                var source = conf.getString("topic.source.name");
                var sink = conf.getString("topic.sink.name");
                var store = conf.getString("store.name");
                var builder = new StreamsBuilder();
                builder
                    .<String, String>stream(source)
                    .flatMapValues(splitAndToLowercase())
                    .groupBy((key, value) -> value)
                    .count(Materialized.as(store))
                    .toStream()
                    .to(sink, Produced.with(Serdes.String(), Serdes.Long()));
                return builder.build();
            }

            @Override
            public void configure(final Conf conf) { this.conf = conf; }

            @Override
            public String version() { return "1.0"; }
        }
  16. Concepts: StreamsExecutionEnvironment

    Manages the life cycle of KafkaStreams instances.

        // (1) Define the KafkaStreams configuration
        var streamsConfig = Conf.of(
            BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
            DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass(),
            DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass()
        );
        // (2) Define the Topology configuration
        var topologyConfig = Conf.of(
            "topic.source.name", "topic-text-lines",
            "topic.sink.name", "topic-text-word-count",
            "store.name", "Count"
        );
        // (3) Create and configure a local execution environment
        var env = LocalStreamsExecutionEnvironment
            .create(Conf.of("streams", streamsConfig))
            // (4) Register our topology to run
            .registerTopology(
                WordCountTopology::new,
                Executed.as("WordCount").withConfig(topologyConfig)
            );
        // (5) Start the environment
        env.start();
        // (6) Add Shutdown Hook
        Runtime.getRuntime().addShutdownHook(new Thread(env::stop));
  17. Let’s start KafkaStreams... Boom! Transient Errors

        word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1] Received error code INCOMPLETE_SOURCE_TOPIC_METADATA
        16:05:12.585 [word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1] ERROR org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1-consumer, groupId=word-count-1-0] User provided listener org.apache.kafka.streams.processor.internals.StreamsRebalanceListener failed on invocation of onPartitionsAssigned for partitions []
        org.apache.kafka.streams.errors.MissingSourceTopicException: One or more source topics were missing during rebalance
            at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsAssigned(StreamsRebalanceListener.java:57) ~[kafka-streams-2.7.0.jar:?]
            at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:293) [kafka-clients-2.7.0.jar:?]
            at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:430) [kafka-clients-2.7.0.jar:?]
            at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:451) [kafka-clients-2.7.0.jar:?]
            at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:367) [kafka-clients-2.7.0.jar:?]
            at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:508) [kafka-clients-2.7.0.jar:?]
  18. Concepts: StreamsLifecycleInterceptor

    A pluggable interface that allows intercepting a KafkaStreams instance before it is started or stopped.

        interface StreamsLifecycleInterceptor {
            /**
             * Intercepts the streams instance before being started.
             */
            default void onStart(StreamsLifecycleContext context, StreamsLifecycleChain chain) {
                chain.execute();
            }

            /**
             * Intercepts the streams instance before being stopped.
             */
            default void onStop(StreamsLifecycleContext context, StreamsLifecycleChain chain) {
                chain.execute();
            }

            /**
             * Used for logging information.
             */
            default String name() {
                return getClass().getSimpleName();
            }
        }

    Built-in implementations:
    ▪ AutoCreateTopicsInterceptor
    ▪ WaitForSourceTopicsInterceptor
    ▪ KafkaBrokerReadyInterceptor
    ...and a few more (discussed later) 😉

    Most interceptors are configurable.
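
To make the chaining behaviour of `onStart(context, chain)` concrete, here is a minimal sketch of the chain-of-responsibility pattern behind it, using the JDK only. The types `LifecycleChain`, `LifecycleInterceptor` and `InterceptorChainDemo` are simplified, hypothetical stand-ins (the context parameter is left out); this is not Azkarra's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the interceptor types shown on the slide.
interface LifecycleChain {
    void execute();
}

interface LifecycleInterceptor {
    // Default behaviour: do nothing and hand control to the next element of the chain.
    default void onStart(LifecycleChain chain) {
        chain.execute();
    }
}

final class InterceptorChainDemo {

    // Runs each interceptor in order, then the final action (e.g. starting the streams instance).
    static void start(List<LifecycleInterceptor> interceptors, Runnable startAction) {
        invoke(interceptors, 0, startAction);
    }

    private static void invoke(List<LifecycleInterceptor> interceptors, int index, Runnable startAction) {
        if (index == interceptors.size()) {
            startAction.run();
            return;
        }
        interceptors.get(index).onStart(() -> invoke(interceptors, index + 1, startAction));
    }

    // Returns the order in which the steps ran, for illustration.
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        LifecycleInterceptor waitForTopics = new LifecycleInterceptor() {
            @Override
            public void onStart(LifecycleChain chain) {
                events.add("wait-for-topics"); // work done before the instance starts
                chain.execute();               // continue with the rest of the chain
                events.add("after-start");     // work done once the chain has completed
            }
        };
        LifecycleInterceptor passThrough = new LifecycleInterceptor() { };
        start(List.of(waitForTopics, passThrough), () -> events.add("streams-started"));
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

An interceptor that never calls `chain.execute()` would prevent the final action from running, which is why the default methods always delegate.
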
  19. Concepts: AutoCreateTopicsInterceptor

    Automatically infers the source and sink topics to be created from Topology.describe().
    ▪ Internally, uses the AdminClient API.
    ▪ Can be used during development for deleting all topics when the instance is stopped.

        import static io.s.a.r.i.AutoCreateTopicsInterceptorConfig.*;

        // (1) Define the KafkaStreams configuration
        var streamsConfig = ...
        // (2) Define the Topology configuration
        var topologyConfig = ...
        // (3) Define the Environment configuration
        var envConfig = Conf.of(
            "streams", streamsConfig,
            AUTO_CREATE_TOPICS_NUM_PARTITIONS_CONFIG, 2,
            AUTO_CREATE_TOPICS_REPLICATION_FACTOR_CONFIG, 1,
            // WARN - ONLY DURING DEVELOPMENT
            AUTO_DELETE_TOPICS_ENABLE_CONFIG, true
        );
        // (4) Create and configure the local execution environment
        LocalStreamsExecutionEnvironment
            .create(envConfig)
            // (5) Add the StreamLifecycleInterceptor
            .addStreamsLifecycleInterceptor(
                AutoCreateTopicsInterceptor::new
            )
            // ...code omitted for clarity
  20. What's left to do? Externalizing configuration (we have 20’ left) 😀

    (TODO list as on slide 10.)
  21. External Configuration: Conf & AzkarraConf

    Azkarra provides the Configurable interface, which can be implemented by most Azkarra components: void configure(final Conf configuration);
    ▪ AzkarraConf: uses the Lightbend Config library.
      ◦ Allows loading configuration settings from HOCON files.

        // file:application.conf
        azkarra {
          // The configuration settings passed to the Kafka Streams
          // instance should be prefixed with `.streams`
          streams {
            bootstrap.servers = "localhost:9092"
            default.key.serde = "org.apache.kafka..Serdes$StringSerde"
            default.value.serde = "org.apache.kafka..Serdes$StringSerde"
          }
          topic.source.name = "topic-text-lines"
          topic.sink.name = "topic-text-word-count"
          store.name = "Count"
          auto.create.topics.num.partitions = 2
          auto.create.topics.replication.factor = 1
          auto.delete.topics.enable = true
        }

        // file:Main.class
        var config = AzkarraConf.create().getSubConf("azkarra");
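
The idea of a prefixed sub-configuration (settings under `streams` are the ones handed to Kafka Streams) can be illustrated with a small JDK-only sketch. It works on a flat key/value map instead of a HOCON file, and the class `SubConfSketch` and its `subConf` method are hypothetical; they are not the `Conf` or Lightbend Config API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of prefix-based sub-configuration on a flat map.
final class SubConfSketch {

    // Returns the entries whose key starts with "<prefix>.", with that prefix stripped.
    static Map<String, String> subConf(Map<String, String> conf, String prefix) {
        String dotted = prefix + ".";
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            if (entry.getKey().startsWith(dotted)) {
                result.put(entry.getKey().substring(dotted.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> all = Map.of(
                "azkarra.streams.bootstrap.servers", "localhost:9092",
                "azkarra.store.name", "Count");
        // Only the settings under "azkarra.streams" are selected.
        System.out.println(subConf(all, "azkarra.streams"));
    }
}
```
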
  22. Concepts: AzkarraContext

    Container for Dependency Injection. Used to automatically configure streams execution environments.

        public static void main(final String[] args) {
            // (1) Load the configuration (application.conf)
            var config = AzkarraConf.create().getSubConf("azkarra");
            // (2) Create the Azkarra Context
            var context = DefaultAzkarraContext.create(config);
            // (3) Register StreamLifecycleInterceptor as component
            context.registerComponent(ConsoleStreamsLifecycleInterceptor.class);
            // (4) Register the Topology to the default environment
            context.addTopology(WordCountTopology.class, Executed.as("word-count"));
            // (5) Start the context
            context
                .setRegisterShutdownHook(true)
                .start();
        }
  23. Concepts: AzkarraApplication

    Used to bootstrap and configure an Azkarra application.
    ▪ Provides Embedded HTTP-Server
    ▪ Provides Component Scanning

        public class WordCount {
            public static void main(final String[] args) {
                // (1) Load the configuration (application.conf)
                var config = AzkarraConf.create();
                // (2) Create the Azkarra Context
                var context = DefaultAzkarraContext.create();
                // (3) Register the Topology to the default environment
                context.addTopology(WordCountTopology.class, Executed.as("word-count"));
                // (4) Create Azkarra application
                new AzkarraApplication()
                    .setContext(context)
                    .setConfiguration(config)
                    // (5) Enable and configure embedded HTTP server
                    .setHttpServerEnable(true)
                    .setHttpServerConf(ServerConfig.newBuilder()
                        .setListener("localhost")
                        .setPort(8080)
                        .build()
                    )
                    // (6) Start Azkarra
                    .run(args);
            }
        }
  24. Concepts: AzkarraApplication (annotation-based)

    Used to bootstrap and configure an Azkarra application.
    ▪ Provides Embedded HTTP-Server
    ▪ Provides Component Scanning

        @AzkarraStreamsApplication
        public class WordCount {
            public static void main(String[] args) {
                AzkarraApplication.run(WordCount.class, args);
            }

            @Component
            public static class WordCountTopology implements TopologyProvider, Configurable {
                private Conf conf;

                @Override
                public Topology topology() {
                    var builder = new StreamsBuilder();
                    // ...code omitted for clarity
                    return builder.build();
                }

                @Override
                public void configure(Conf conf) { this.conf = conf; }

                @Override
                public String version() { return "1.0"; }
            }
        }
  25. What's left to do? Handling Deserialization Exceptions (we have 15’ left) 🤔

    (TODO list as on slide 10.)
  26. Solution #1: Built-in mechanisms

    default.deserialization.exception.handler
    ▪ CONTINUE: continue with processing
    ▪ FAIL: fail the processing and stop

    Two available implementations:
    ▪ LogAndContinueExceptionHandler
    ▪ LogAndFailExceptionHandler

    Not really suitable for production: corrupted messages cannot be monitored efficiently.
  27. Solution #2: Dead Letter Queue Topic / Solution #3: Sentinel Value

    Solution #2, Dead Letter Queue Topic (DeserializationExceptionHandler): send corrupted messages to a special topic.
    [Diagram: Source Topic → Handler → Topology; corrupted records (!) are skipped and sent to the Dead Letter Topic.]

    Solution #3, Sentinel Value (Deserializer<T>): catch any exception thrown during deserialization and return a default value (e.g. null, “N/A”, etc.).
    [Diagram: Source Topic → SafeDeserializer → Delegate Deserializer; corrupted records (!) come out as (null).]
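
The sentinel-value approach can be sketched in a few lines: wrap a delegate deserializer, catch whatever it throws, and return a fixed default instead. The sketch below is an assumption-laden illustration, not Azkarra's `SafeSerdes`: `java.util.function.Function<byte[], T>` stands in for Kafka's `Deserializer<T>` so it runs on the JDK alone, and `SafeDeserializerSketch` and `decodeLong` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.function.Function;

// Hypothetical sketch of the sentinel-value pattern.
final class SafeDeserializerSketch<T> implements Function<byte[], T> {

    private final Function<byte[], T> delegate;
    private final T sentinel;

    SafeDeserializerSketch(Function<byte[], T> delegate, T sentinel) {
        this.delegate = delegate;
        this.sentinel = sentinel;
    }

    @Override
    public T apply(byte[] data) {
        try {
            return delegate.apply(data);
        } catch (RuntimeException e) {
            // Corrupted payload: return the sentinel instead of failing the caller.
            return sentinel;
        }
    }

    // A strict 8-byte big-endian long decoder, used as the delegate in the example.
    static Long decodeLong(byte[] data) {
        if (data == null || data.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes");
        }
        return ByteBuffer.wrap(data).getLong();
    }

    public static void main(String[] args) {
        SafeDeserializerSketch<Long> safe =
                new SafeDeserializerSketch<>(SafeDeserializerSketch::decodeLong, -1L);
        System.out.println(safe.apply(ByteBuffer.allocate(Long.BYTES).putLong(42L).array()));
        System.out.println(safe.apply(new byte[] {1, 2})); // corrupted input yields the sentinel
    }
}
```

Downstream processors then have to recognise the sentinel (here `-1L`) and filter or route it, so it should be a value that cannot occur in valid data.
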
  28. Using Azkarra: Solution #2 and Solution #3

    Solution #2: DeadLetterTopicExceptionHandler
    ▪ By default, sends corrupted records to <Topic>-rejected
    ▪ Doesn’t change the schema/format of the corrupted message.
    ▪ Uses Kafka Headers to trace the exception cause and origin, e.g.:
      ◦ __errors.exception.stacktrace
      ◦ __errors.exception.message
      ◦ __errors.exception.class.name
      ◦ __errors.timestamp
      ◦ __errors.application.id
      ◦ __errors.record.[topic|partition|offset]
    ▪ Can be configured to send records to a Kafka cluster other than the one used by KafkaStreams.

    Solution #3: SafeSerdes

        SafeSerdes.Long(-1L);
        SafeSerdes.UUID(null);
        SafeSerdes.serdeFrom(
            new JsonSerializer(),
            new JsonDeserializer(),
            NullNode.getInstance()
        );
  29. Our TODO list: Monitoring (we have 10’ left) 🙃

    (TODO list as on slide 10.)
  30. How to monitor Kafka Streams?

    The Kafka Streams API provides a few methods for monitoring the state of the running instance.
    ▪ KafkaStreams#state(), KafkaStreams#setStateListener()
      ⎼ CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR
      ⎼ can be used for checking the liveness and readiness of the instance.
    ▪ KafkaStreams#localThreadsMetadata()
      ⎼ returns information about local threads/tasks and partition assignments.
    ▪ KafkaStreams#metrics()

    Best Practices:
    ▪ Build some REST APIs to expose the states of Kafka Streams
    ▪ Export metrics using JMX, Prometheus, etc.
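
One way to turn the instance state into liveness and readiness probes is sketched below. The mapping is an assumption (which states count as "live" or "ready" is a design decision the slides leave open), and the local `State` enum only mirrors the state names listed on the slide so that the sketch runs without Kafka on the classpath; in a real application the value would come from `KafkaStreams#state()`.

```java
// Hypothetical mapping from Kafka Streams states to health probes.
final class StreamsHealth {

    // Mirrors the state names listed on the slide.
    enum State { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

    // Assumption: the instance is "live" unless it has stopped or failed.
    static boolean isLive(State state) {
        return state != State.NOT_RUNNING && state != State.ERROR;
    }

    // Assumption: the instance is "ready" only while it is actually processing.
    static boolean isReady(State state) {
        return state == State.RUNNING;
    }

    public static void main(String[] args) {
        for (State state : State.values()) {
            System.out.println(state + " live=" + isLive(state) + " ready=" + isReady(state));
        }
    }
}
```

With this mapping, a rebalancing instance stays alive (no restart) but is temporarily taken out of rotation for queries.
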
  31. Kafka Consumer Lag and Offsets

    Maybe the most fundamental indicator to monitor.

    How far behind the producers are the Kafka Streams consumers?
    ▪ Consumer: ConsumerInterceptor, configured in KafkaStreams using main.consumer.interceptor.classes

        public interface ConsumerInterceptor<K, V> extends Configurable, AutoCloseable {
            ConsumerRecords<K, V> onConsume(ConsumerRecords<K, V> record);
            void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets);
            void close();
        }

    Is the Kafka Streams application ready to process records, and can it serve interactive queries?
    ▪ KafkaStreams#allLocalStorePartitionLags()
    ▪ KafkaStreams#setGlobalStateRestoreListener()
    ▪ NOTE: Internal KafkaStreams threads do not start consuming messages until stores are recovered.
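
Consumer lag itself is simple arithmetic: for each partition, the log end offset minus the last committed offset. The sketch below shows only that calculation under stated assumptions: partitions are identified by plain strings rather than Kafka's `TopicPartition`, a partition without a committed offset is treated as starting from 0, and the class name `LagSketch` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper showing the lag arithmetic only.
final class LagSketch {

    // Lag per partition = log end offset - committed offset, never negative.
    static Map<String, Long> lag(Map<String, Long> endOffsets, Map<String, Long> committedOffsets) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> entry : endOffsets.entrySet()) {
            // Assumption: no committed offset yet means nothing has been consumed.
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            result.put(entry.getKey(), Math.max(0L, entry.getValue() - committed));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> endOffsets = Map.of("topic-text-lines-0", 100L, "topic-text-lines-1", 50L);
        Map<String, Long> committed = Map.of("topic-text-lines-0", 90L);
        System.out.println(lag(endOffsets, committed));
    }
}
```
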
  32. Azkarra REST API

    Azkarra supports a REST API for managing, monitoring and querying Kafka Streams instances.
    ▪ Provides support for Interactive Queries
    ▪ Built-in authentication and authorization mechanisms (Basic Auth, SSL 2-Way).
    ▪ Allows registration of new JAX-RS resources using a plugin interface: AzkarraRestExtension

    Endpoints:
    • Get information about the local streams instance: GET /api/v1/streams
    • Get the status for the streams instance: GET /api/v1/streams/(string: id)/status
    • Get the configuration for the streams instance: GET /api/v1/streams/(string: id)/config
    • Get current metrics for the streams instance: GET /api/v1/streams/(string: applicationId)/metrics
    • Get all metrics in Prometheus format (Micrometer, Prometheus): GET /prometheus
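
As an illustration of calling the status endpoint listed above, the following sketch builds the request with the JDK's `java.net.http` client (Java 11+). The base URL `http://localhost:8080` matches the embedded server settings shown earlier in the talk; the streams id `word-count` is a placeholder assumption, and the class name `AzkarraRestSketch` is hypothetical. The request is only built here; sending it requires a running Azkarra application.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side sketch for the status endpoint.
final class AzkarraRestSketch {

    // Builds GET <baseUrl>/api/v1/streams/<streamsId>/status.
    static HttpRequest statusRequest(String baseUrl, String streamsId) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/v1/streams/" + streamsId + "/status"))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = statusRequest("http://localhost:8080", "word-count");
        System.out.println(request.method() + " " + request.uri());
        // To send it against a running application:
        // java.net.http.HttpClient.newHttpClient()
        //         .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```
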
  33. Putting it all together: Exporting Kafka Streams States Anywhere

    Azkarra can be configured to periodically report the internal states of a KafkaStreams instance.
    ▪ Uses a StreamsLifecycleInterceptor:
      ⎼ MonitoringStreamsInterceptor
    ▪ Accepts a pluggable reporter class
      ⎼ Default: KafkaMonitoringReporter
      ⎼ Publishes events that adhere to the CloudEvents specification.

        {
          "id": "appid:word-count;appsrv:localhost:8080;ts:1620691200000",
          "source": "azkarra/ks/localhost:8080",
          "specversion": "1.0",
          "type": "io.streamthoughts.azkarra.streams.stateupdateevent",
          "time": "2021-05-11T00:00:00.000+0000",
          "datacontenttype": "application/json",
          "ioazkarramonitorintervalms": 10000,
          "ioazkarrastreamsappid": "word-count",
          "ioazkarraversion": "0.9.2",
          "ioazkarrastreamsappserver": "localhost:8080",
          "data": {
            "state": "RUNNING",
            "threads": [
              {
                "name": "word-count-...-93e9a84057ad-StreamThread-1",
                "state": "RUNNING",
                "active_tasks": [],
                "standby_tasks": [],
                "clients": {}
              }
            ],
            "offsets": { "group": "", "consumers": [] },
            "stores": { "partitionRestoreInfos": [], "partitionLagInfos": [] },
            "state_changed_time": 1620691200000
          }
        }
  34. Our TODO list: Packaging (we still have 5’ left) 😬

    (TODO list as on slide 10.)
  35. Packaging Kafka Streams with Azkarra

    Azkarra-based applications can be packaged like any other Kafka Streams app.

    Azkarra Worker → an empty Azkarra application
    ▪ Topologies and components can be loaded from an external uber-jar
      ⎼ Similar to Kafka Connect plugins and connectors
    ▪ Can be used as the base image for Docker
      ⎼ Use Jib to build optimized Docker images for Java

        $ docker run --net host \
            -v "$(pwd)/application.conf:/etc/azkarra/azkarra.conf" \
            -v "$(pwd)/local-topologies:/usr/share/azkarra-components/" \
            streamthoughts/azkarra-streams-worker:latest

    Jib + Docker + Azkarra = ❤
  36. Deploying Kafka Streams with Azkarra (in Kubernetes)

    Using Kubernetes, topologies can be downloaded and mounted using an init-container.
    [Diagram: inside a Deployment, StatefulSet, or similar, an InitContainer fetches my-topology-with-dependencies-1.0.jar via HTTP GET from a repository manager (e.g., Nexus / Artifactory) into a shared volume, /var/lib/components/, which the container (image: azkarra-worker) reads through azkarra.component.paths.]
  37. DONE: in less than 30 min using Azkarra 🚀

    (TODO list as on slide 10.)
  38. Demo (new coins... we still have 5’ left) 🤫

  39. Take Aways: Conclusion

    Kafka Streams is a very good choice to quickly create streaming applications. But building applications for production can be a lot of work. Azkarra aims to be a fast path to production by providing the features you need:
    ▪ Built-in mechanisms for handling exceptions
    ▪ Built-in REST API for executing Interactive Queries
    ▪ Consumer Offsets and Lag
    ▪ Topology Visualization
    ▪ Dashboard UI
  40. Take Aways: Roadmap

    ▪ Add support for querying stale stores.
    ▪ Add support for deploying and managing Kafka Streams topologies directly into Kubernetes
      ❏ i.e., KubStreamsExecutionEnvironment
    ▪ Enhance the WebUI to add some visualizations for the key metrics to monitor.
  41. Take Aways: Links

    Official Website: https://www.azkarrastreams.io/
    GitHub: https://github.com/streamthoughts/azkarra-streams (for contributing and adding ⭐)
    Slack: https://communityinviter.com/apps/azkarra-streams/azkarra-streams-community (Join us on Slack!)
    Demo: https://github.com/streamthoughts/demo-kafka-streams-scottify
  42. Thank you @fhussonnois Florian HUSSONNOIS ▪ florian@streamthoughts.io

  43. Azkarra Dashboard

  44. Azkarra Dashboard

  45. Images & Icons

    ▪ Photo by Mark König on Unsplash
    ▪ Photo by CHUTTERSNAP on Unsplash