Stream processing with ksqlDB and Apache Kafka

Kafka delivers real-time events at scale, and with libraries like KStreams, Java developers can transform those events. In this talk we introduce ksqlDB, which offers a SQL interface on top of KStreams, enabling continuous, interactive queries without requiring any Java or Python knowledge. ksqlDB lets all Apache Kafka consumers route messages and perform both stateful and stateless transformations to unlock new data insights. With ksqlDB, your data in motion is as accessible as the stale records traditionally locked away in a relational database.

In this session, after a brief introduction to Apache Kafka, we'll dive into using ksqlDB to manage data streams pulled, in real time, from the Minneapolis air traffic control system. Along the way, you'll learn the ins and outs of how ksqlDB works and see patterns that apply more broadly to common high-volume use cases like log monitoring, insurance, financial services, and consumer retail.

Keith Resar

May 20, 2021

Transcript

  1. Agenda:
     1. Data Integration (a primer!)
     2. Kafka + data transformation (another primer)
     3. ksqlDB - SQL interface to transform data in motion
  2. Data Integration: moving data from a source (A) to a target (B). Sources and targets include relational databases, NoSQL/HBase, application logs, and custom data.
  3. Direct Data Integration, e.g. MySQL → Salesforce: (1) a data source, (2) a data target, and (3) dead letters for records that fail along the way.
  4. Custom Data Integration FAIL, 1 of 3: Ephemeral isn't useful, and stateful is hard. Stateless transformations are easy, but they limit your long-term capability. Stateful transformations add dependencies and a lot of complicated scaffolding to build and maintain. You will get these wrong - consider how to recover from failure, or how to restart your integration after rolling out a new version.
  5. Custom Data Integration FAIL, 2 of 3: Point-to-point scales like bags of rocks. The first one sucks, and every one after that is even worse. Tight coupling slows development velocity and adds operational risk. ➤ A → B, sure, that's doable. But what about A → C? A → D? When is it too much? ➤ How do you manage different encodings, transformations, and schemas?
  6. Custom Data Integration FAIL, 3 of 3: Design for now, but fail with scale. Frequent advice recommends avoiding premature scaling, so where will you invest development, testing, and ops time? ➤ Build for scale from day one, or rebuild later? ➤ What drives scale - traffic volume, or longer (synchronous) processing requirements? ➤ How do you support scale-out, coordination/clustering, and work delegation?
  7. Instantly connect popular data sources & sinks (e.g. Data Diode): 120+ pre-built connectors - 90+ Confluent developed & supported, 30+ partner supported and Confluent verified.
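
     Connectors can be configured through Connect's REST API, or - keeping with this talk's theme - directly from ksqlDB. A minimal sketch: the connector class is real, but the connection details, credentials, and topic prefix below are placeholder assumptions.

     -- Minimal sketch: create a JDBC source connector from ksqlDB.
     -- Connection URL, credentials, column, and prefix are illustrative.
     CREATE SOURCE CONNECTOR flights_jdbc_source WITH (
       'connector.class'          = 'io.confluent.connect.jdbc.JdbcSourceConnector',
       'connection.url'           = 'jdbc:mysql://localhost:3306/demo',
       'connection.user'          = 'demo',
       'connection.password'      = 'demo-secret',
       'mode'                     = 'incrementing',
       'incrementing.column.name' = 'id',
       'topic.prefix'             = 'jdbc-'
     );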
  8. Kafka Data Transformation (A → B): Single Message Transforms (SMTs) in Connect offer basic stateless routing, key changes, and column changes.
  9. Kafka Data Transformation (A → B): Single Message Transforms (SMTs) in Connect offer basic stateless routing, key changes, and column changes.

     // Example Single Message Transform (SMT) //
     {
       "name": "exampleSMTRouter",
       "config": {
         ……
         "transforms": "routeRecords",
         "transforms.routeRecords.type": "org.apache.kafka.connect.transforms.RegexRouter",
         "transforms.routeRecords.regex": "(.*)",
         "transforms.routeRecords.replacement": "$1-test"
         ……
       }
     }
  10. Kafka Data Transformation (A → B) with a KStreams app: advanced message transforms in Java, reading from a source topic and writing to a destination topic.

      // Simple word count app using the KStreams library (Scala DSL) //
      import org.apache.kafka.streams.KafkaStreams
      import org.apache.kafka.streams.scala.StreamsBuilder
      import org.apache.kafka.streams.scala.ImplicitConversions._
      import org.apache.kafka.streams.scala.kstream.{KStream, KTable}
      import org.apache.kafka.streams.scala.serialization.Serdes._

      val builder = new StreamsBuilder()

      // Read each line of text from the input topic
      val textLines: KStream[String, String] =
        builder.stream[String, String]("streams-plaintext-input")

      // Split lines into words, group by word, and keep a running count
      val wordCounts: KTable[String, Long] = textLines
        .flatMapValues(textLine => textLine.toLowerCase.split("\\W+"))
        .groupBy((_, word) => word)
        .count()

      // Emit the continuously updated counts to the output topic
      wordCounts.toStream.to("streams-wordcount-output")

      // `config` holds the usual application.id / bootstrap.servers properties
      val streams: KafkaStreams = new KafkaStreams(builder.build(), config)
      streams.start()
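
      For contrast, the same word count needs no Java at all in ksqlDB. A minimal sketch, assuming the same two topics and that each record value is a plain text line; the stream and table names are illustrative, and the split is simplified to spaces rather than \W+.

      -- Declare the input topic as a stream of text lines
      CREATE STREAM text_lines (line VARCHAR)
        WITH (kafka_topic = 'streams-plaintext-input', value_format = 'KAFKA');

      -- Split each line into one row per word
      CREATE STREAM words AS
        SELECT EXPLODE(SPLIT(LCASE(line), ' ')) AS word
        FROM text_lines
        EMIT CHANGES;

      -- Continuously count occurrences of each word
      CREATE TABLE word_counts AS
        SELECT word, COUNT(*) AS total
        FROM words
        GROUP BY word
        EMIT CHANGES;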
  11. Demo Data Schema - three topics:
      ➤ airlines: code (KEY), name
      ➤ flights: flight (KEY), code, flight_num, takeoff_time, landing_time
      ➤ positions: flight (KEY), timestamp_ms, altitude, lat, lon
  12. Demo Data Schema - airlines: static data mapping airline codes to airline names, used to dereference encoded flight names into something more human readable.

      code (KEY) | name
      DAL        | Delta Air Lines
      SWA        | Southwest Airlines
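
      In ksqlDB, reference data like this is naturally modeled as a table. A minimal sketch, assuming the records land on an airlines topic in JSON (both topic name and format are assumptions):

      -- Model the static airline lookup data as a ksqlDB table
      CREATE TABLE airlines (
        code VARCHAR PRIMARY KEY,
        name VARCHAR
      ) WITH (kafka_topic = 'airlines', value_format = 'JSON');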
  13. Demo Data Schema - flights: streaming data, updated as each new flight plan is registered and whenever the flight status changes.

      flight (KEY) | code | flight_num | takeoff_time  | landing_time
      DAL1232      | DAL  | 1232       | 1620337680000 | null
      SWA345       | SWA  | 345        | 1620335280000 | 1620338340000
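
      A minimal ksqlDB sketch for this topic, plus a join that dereferences the airline code against the airlines table above (topic name and JSON format are assumptions):

      -- Model flight plan updates as a stream
      CREATE STREAM flights (
        flight VARCHAR KEY,
        code VARCHAR,
        flight_num INT,
        takeoff_time BIGINT,
        landing_time BIGINT
      ) WITH (kafka_topic = 'flights', value_format = 'JSON');

      -- Enrich each flight with the human-readable airline name
      CREATE STREAM flights_enriched AS
        SELECT f.code, f.flight, a.name AS airline,
               f.flight_num, f.takeoff_time, f.landing_time
        FROM flights f
        LEFT JOIN airlines a ON f.code = a.code
        EMIT CHANGES;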
  14. Demo Data Schema - positions: streaming data, updated with each new position report throughout the flight.

      flight (KEY) | timestamp_ms  | altitude | lat    | lon
      SKW3984      | 1620314506000 | 35000    | 45.701 | -104.29
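
      The corresponding ksqlDB sketch, again assuming a JSON-formatted positions topic, along with a push query that continuously tails matching position reports as they arrive:

      -- Model position reports as a stream
      CREATE STREAM positions (
        flight VARCHAR KEY,
        timestamp_ms BIGINT,
        altitude INT,
        lat DOUBLE,
        lon DOUBLE
      ) WITH (kafka_topic = 'positions', value_format = 'JSON');

      -- Push query: continuously emit reports from high-altitude flights
      SELECT flight, altitude, lat, lon
      FROM positions
      WHERE altitude > 30000
      EMIT CHANGES;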
  15. Windowed Aggregation (grouping events over time):
      ➤ Tumbling - fixed-size, non-overlapping windows (window n, n+1, n+2)
      ➤ Hopping - fixed-size windows that overlap (window n, n+1, n+2)
      ➤ Session - windows bounded by inactivity; a new window starts when Δt > inactivity gap
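
      In ksqlDB these windows are declared directly in SQL. A minimal sketch over the positions stream above; the one-minute and five-minute sizes are illustrative.

      -- Tumbling window: count position reports per flight per minute
      CREATE TABLE reports_per_minute AS
        SELECT flight, COUNT(*) AS reports
        FROM positions
        WINDOW TUMBLING (SIZE 1 MINUTE)
        GROUP BY flight
        EMIT CHANGES;

      -- Session window: reports separated by < 5 minutes form one session
      SELECT flight, COUNT(*) AS reports
      FROM positions
      WINDOW SESSION (5 MINUTES)
      GROUP BY flight
      EMIT CHANGES;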
  16. Recap:
      ➤ Integration - getting data from A → B; custom integrations are evil
      ➤ Kafka + Transformation - loosely coupled integration via Kafka; transformation via Connect SMTs or KStreams
      ➤ ksqlDB - SQL interface to streaming data; approachable and viable at production scale
  17. Where to go from here? 5 resources - look in the chat for links to each (including swag!)