Slide 1

Slide 1 text

Stream processing with ksqlDB and Apache Kafka @KeithResar Kafka Developer confluent.io

Slide 2

Slide 2 text

1. Data Integration (a primer!)
2. Kafka + data transformation (another primer)
3. ksqlDB - SQL interface to transform data in motion

Slide 3

Slide 3 text

Data Integration: moving data from a Data Source (A) to a Data Target (B) - relational databases, NoSQL / HBase, application logs, custom data.

Slide 4

Slide 4 text

Direct Data Integration: (1) Data Source (A), (2) Data Target (B), (3) Dead Letters - e.g. MySQL → Salesforce.

Slide 5

Slide 5 text

Custom Data Integration: (1) Data Source (A), (2) Data Target (B), (3) Custom Integration code in between.

Slide 6

Slide 6 text

Custom Data Integration FAIL (1 of 3): Ephemeral isn't useful, stateful is hard.
Stateless transformations are easy, but they limit your long-term capability. Stateful transformations add dependencies and a lot of complicated scaffolding to build and maintain. You will do these wrong. Consider how you recover from failure, or how you restart your integration after deploying a new version.

Slide 7

Slide 7 text

Custom Data Integration FAIL (2 of 3): Point to point scales like bags of rocks.
The first one sucks, and every one after that is even worse. Tight coupling slows development velocity and adds operational risk.
➤ A → B, sure that's doable. What about A → C? A → D? (when is it too much?)
➤ How do you manage different encodings, transformations, schemas?

Slide 8

Slide 8 text

Custom Data Integration FAIL (3 of 3): Design for now but fail with scale.
Frequent advice recommends avoiding premature scaling. Where will you invest development, testing, and ops time?
➤ Build for scale from day one, or rebuild later?
➤ What drives scale - traffic volume, or longer (synchronous) processing requirements?
➤ Do you support scale-out, coordination / clustering, work delegation?

Slide 9

Slide 9 text

Kafka Data Integration: (1) Data Source (A), (2) Data Target (B), (3) Kafka integration in between.

Slide 10

Slide 10 text

Kafka, a Closer Look: a topic (A → B) is an ordered log of individual event records; each consumer tracks its position with a consumer offset.

Slide 11

Slide 11 text

Kafka Connect (quick sidebar): connectors deliver code-free data ingest / egress (A → B).
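As a sketch of what "code-free" means here: a connector is just a JSON configuration submitted to the Connect REST API. The connector name, class, and connection details below are illustrative assumptions, not from the deck:

```
{
  "name": "example-jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://db.example.com:3306/inventory",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "mysql-"
  }
}
```

Once POSTed to the Connect cluster, the connector streams new rows from the source database into Kafka topics without any application code.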

Slide 12

Slide 12 text

Instantly Connect Popular Data Sources & Sinks: 120+ pre-built connectors (90+ Confluent developed & supported, 30+ partner supported and Confluent verified).

Slide 13

Slide 13 text

Kafka Data Transformation: Single Message Transforms (SMTs) in Connect offer basic stateless routing, key changes, and column changes (A → B).

Slide 14

Slide 14 text

Kafka Data Transformation: Single Message Transforms (SMTs) in Connect offer basic stateless routing, key changes, and column changes (A → B).

// Example Single Message Transform (SMT)
{
  "name": "exampleSMTRouter",
  "config": {
    ...
    "transforms": "routeRecords",
    "transforms.routeRecords.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.routeRecords.regex": "(.*)",
    "transforms.routeRecords.replacement": "$1-test"
    ...
  }
}

Slide 15

Slide 15 text

Kafka Data Transformation: KStreams App - advanced message transforms in Java (Source Topic → Destination Topic).

Slide 16

Slide 16 text

Kafka Data Transformation: KStreams App - advanced message transforms in Java (Source Topic → Destination Topic).

// Simple word count app using KStreams library
val builder = new StreamsBuilder()
val textLines: KStream[String, String] =
  builder.stream[String, String]("streams-plaintext-input")
val wordCounts: KTable[String, Long] = textLines
  .flatMapValues(textLine => textLine.toLowerCase.split("\\W+"))
  .groupBy((_, word) => word)
  .count()
wordCounts.toStream.to("streams-wordcount-output")
val streams: KafkaStreams = new KafkaStreams(builder.build(), config)

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Kafka Data Transformation: ksqlDB - advanced message transforms in SQL, not Java! (Source Topic → Destination Topic).
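A minimal sketch of what that SQL looks like - a persistent query that exposes a source topic as a stream, filters and reshapes it, and writes a destination topic. Stream, topic, and column names here are illustrative assumptions, not from the deck:

```
-- Expose an existing topic as a stream (hypothetical names)
CREATE STREAM source_events (id VARCHAR KEY, payload VARCHAR, amount DOUBLE)
  WITH (KAFKA_TOPIC='source-topic', VALUE_FORMAT='JSON');

-- Persistent transform: filter and reshape into a destination topic
CREATE STREAM destination_events
  WITH (KAFKA_TOPIC='destination-topic') AS
  SELECT id, UCASE(payload) AS payload, amount
  FROM source_events
  WHERE amount > 0;
```

The second statement runs continuously on the ksqlDB cluster, transforming every new record as it arrives - no Java application to build, deploy, or scale.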

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

Demo Data Schema
  airlines:  code (KEY), name
  flights:   flight (KEY), code, flight_num, takeoff_time, landing_time
  positions: flight (KEY), timestamp_ms, altitude, lat, lon

Slide 22

Slide 22 text

Demo Data Schema: flights, airlines, positions.

Slide 23

Slide 23 text

Demo Data Schema - airlines
Static data, mapping airline codes to airline names. Used to dereference encoded flight names into something more human readable.

  code (KEY) | name
  DAL        | Delta Air Lines
  SWA        | Southwest Airlines
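Static lookup data like this maps naturally onto a ksqlDB TABLE keyed on code. A sketch, assuming the topic name, format, and partition count below:

```
CREATE TABLE airlines (
  code VARCHAR PRIMARY KEY,
  name VARCHAR
) WITH (KAFKA_TOPIC='airlines', VALUE_FORMAT='JSON', PARTITIONS=1);
```

A TABLE (rather than a STREAM) keeps only the latest value per key, which is the right semantics for reference data.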

Slide 24

Slide 24 text

Demo Data Schema - flights
Streaming data, updated as each new flight plan is registered and when the flight status changes.

  flight (KEY) | code | flight_num | takeoff_time  | landing_time
  DAL1232      | DAL  | 1232       | 1620337680000 | null
  SWA345       | SWA  | 345        | 1620335280000 | 1620338340000
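This feed maps naturally onto a ksqlDB STREAM, which can then be joined against the airlines reference data to dereference the code. A sketch, assuming the topic names and formats below and an airlines TABLE keyed on code:

```
CREATE STREAM flights (
  flight VARCHAR KEY,
  code VARCHAR,
  flight_num INT,
  takeoff_time BIGINT,
  landing_time BIGINT
) WITH (KAFKA_TOPIC='flights', VALUE_FORMAT='JSON');

-- Stream-table join: replace the airline code with a readable name
SELECT f.flight, a.name AS airline, f.flight_num
FROM flights f
JOIN airlines a ON f.code = a.code
EMIT CHANGES;
```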

Slide 25

Slide 25 text

Demo Data Schema - positions
Streaming data, updated with each new position report throughout the flight.

  flight (KEY) | timestamp_ms  | altitude | lat    | lon
  SKW3984      | 1620314506000 | 35000    | 45.701 | -104.29
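A sketch of the corresponding STREAM declaration (topic name and format are assumptions). Pointing ksqlDB's record timestamp at timestamp_ms means time-based operations use the report time rather than the Kafka ingest time:

```
CREATE STREAM positions (
  flight VARCHAR KEY,
  timestamp_ms BIGINT,
  altitude INT,
  lat DOUBLE,
  lon DOUBLE
) WITH (KAFKA_TOPIC='positions', VALUE_FORMAT='JSON',
        TIMESTAMP='timestamp_ms');
```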

Slide 26

Slide 26 text

Windowed Aggregation: three ways to bucket events over time.
- Tumbling: fixed-size, non-overlapping windows (window n, n+1, n+2 ...).
- Hopping: fixed-size windows that overlap, advancing by a hop interval.
- Session: windows bounded by inactivity - a new window starts when the gap between events exceeds the inactivity gap (Δt > inactivity gap).
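Applied to the demo schema, windowed aggregations look like this - a sketch, assuming the positions stream described earlier; window sizes are illustrative:

```
-- Tumbling: max altitude per flight in fixed 5-minute buckets
SELECT flight, MAX(altitude) AS max_altitude
FROM positions
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY flight
EMIT CHANGES;

-- Session: count position reports per flight, splitting a new window
-- whenever reports go quiet for more than 30 minutes
SELECT flight, COUNT(*) AS reports
FROM positions
WINDOW SESSION (30 MINUTES)
GROUP BY flight
EMIT CHANGES;
```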

Slide 27

Slide 27 text

Integration - getting data from A → B; custom integrations are evil.
Kafka + Transformation - Kafka gives loosely coupled integration; transformation via Connect SMTs or KStreams.
ksqlDB - SQL interface to streaming data; approachable and viable at production scale.

Slide 28

Slide 28 text

Where to go from here? 5 resources - look in the chat for links to each (including swag!)

Slide 29

Slide 29 text

ksqlDB.io documentation, examples, code

Slide 30

Slide 30 text

GitHub examples - query cookbook, quick starts, code

Slide 31

Slide 31 text

Confluent Cloud Free access with new accounts

Slide 32

Slide 32 text

Thank You @KeithResar Kafka Developer confluent.io