Stream processing with ksqlDB and Apache Kafka

Kafka delivers real-time events at scale, and with libraries like KStreams, Java developers can transform those events. In this talk we introduce ksqlDB, which offers a SQL interface on top of KStreams to enable continuous, interactive queries without requiring any Java or Python knowledge. ksqlDB lets every Apache Kafka user route messages and perform both stateful and stateless transformations to unlock new data insights. With ksqlDB, your data in motion is as accessible as the stale records traditionally locked away in a relational database.

In this session, after a brief introduction to Apache Kafka, we'll dive into using ksqlDB to manage data streams pulled - in real time - from the Minneapolis air traffic control system. Along the way, you'll learn the ins and outs of how ksqlDB works and see patterns that apply more broadly to common high-volume use cases like log monitoring, insurance, financial services, and consumer retail.


Keith Resar

May 20, 2021

Transcript

  1. Stream processing with ksqlDB and Apache Kafka. @KeithResar, Kafka Developer, confluent.io
  2. Agenda: (1) Data Integration (a primer!), (2) Kafka + data transformation (another primer), (3) ksqlDB - a SQL interface to transform data in motion.
  3. Data Integration: moving data from a source (A) to a target (B). Sources and targets include relational databases, NoSQL / HBase, application logs, and custom data.
  4. Direct Data Integration (e.g. MySQL → Salesforce): (1) data source, (2) data target, (3) dead letters.
  5. Custom Data Integration: (1) data source, (2) data target, (3) custom integration code between them.
  6. Custom Data Integration FAIL, 1 of 3: ephemeral isn't useful, stateful is hard. Stateless transformations are easy, but they limit your long-term capability. Stateful transformations add dependencies and a lot of complicated scaffolding to build and maintain. You will get these wrong. Consider how you recover from failure, or how you restart your integration after rolling out a new version.
  7. Custom Data Integration FAIL, 2 of 3: point-to-point scales like bags of rocks. The first one sucks, and every one after that is even worse. Tight coupling slows development velocity and adds operational risk. ➤ A → B, sure, that's doable. What about A → C? A → D? (When is it too much?) ➤ How do you manage different encodings, transformations, and schemas?
  8. Custom Data Integration FAIL, 3 of 3: design for now, but fail at scale. Frequent advice recommends avoiding premature scaling, so where will you invest development, testing, and ops time? ➤ Build for scale from day one, or rebuild later? ➤ What drives scale - traffic volume, or longer (synchronous) processing requirements? ➤ Do you support scale-out, coordination / clustering, and work delegation?
  9. Kafka Data Integration: (1) data source, (2) data target, (3) Kafka as the integration layer between them.
  10. Kafka, a Closer Look: a topic holds individual event records; each consumer tracks its own offset into the topic.
  11. Kafka Connect (quick sidebar): connectors deliver code-free data ingest / egress.
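    As a taste of how little code is involved: ksqlDB (introduced later in this talk) can even define Connect connectors in SQL. A minimal sketch - the connector class is a real JDBC source connector, but the connection settings here are hypothetical:

      -- Define and launch a Connect source connector from ksqlDB.
      -- 'connection.url' and 'topic.prefix' values are illustrative only.
      CREATE SOURCE CONNECTOR flights_jdbc WITH (
        'connector.class' = 'io.confluent.connect.jdbc.JdbcSourceConnector',
        'connection.url'  = 'jdbc:mysql://db.example.com:3306/flights',
        'topic.prefix'    = 'jdbc-'
      );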
  12. Instantly connect popular data sources & sinks: 120+ pre-built connectors - 90+ developed & supported by Confluent, 30+ partner supported and Confluent verified.
  13. Kafka Data Transformation: Single Message Transforms (SMTs) in Connect offer basic stateless routing, key changes, and column changes.
  14. Kafka Data Transformation: Single Message Transforms in Connect offer basic stateless routing, key changes, and column changes.

    // Example Single Message Transform (SMT) //
    {
      "name": "exampleSMTRouter",
      "config": {
        ……
        "transforms": "routeRecords",
        "transforms.routeRecords.type": "org.apache.kafka.connect.transforms.RegexRouter",
        "transforms.routeRecords.regex": "(.*)",
        "transforms.routeRecords.replacement": "$1-test"
        ……
      }
    }
  15. Kafka Data Transformation: a KStreams app performs advanced message transforms in Java, reading from a source topic and writing to a destination topic.
  16. Kafka Data Transformation: a KStreams app performs advanced message transforms in Java, reading from a source topic and writing to a destination topic.

    // Simple word count app using the KStreams library //
    val builder = new StreamsBuilder()
    val textLines: KStream[String, String] =
      builder.stream[String, String]("streams-plaintext-input")
    val wordCounts: KTable[String, Long] = textLines
      .flatMapValues(textLine => textLine.toLowerCase.split("\\W+"))
      .groupBy((_, word) => word)
      .count()
    wordCounts.toStream.to("streams-wordcount-output")
    val streams: KafkaStreams = new KafkaStreams(builder.build(), config)
  17. (image-only slide)
  18. Kafka Data Transformation: ksqlDB performs advanced message transforms in SQL (not Java!), reading from a source topic and writing to a destination topic.
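    To make the idea concrete, here is a minimal sketch of the kind of persistent query ksqlDB runs. The topic and column names are illustrative, not from the talk: it declares a stream over a source topic, then continuously filters and reshapes it into a destination topic.

      -- Declare a stream over an existing Kafka topic (names are illustrative).
      CREATE STREAM clicks (user_id VARCHAR, url VARCHAR, ts BIGINT)
        WITH (KAFKA_TOPIC='clicks', VALUE_FORMAT='JSON');

      -- A persistent query: continuously route matching events to a new topic.
      CREATE STREAM checkout_clicks AS
        SELECT user_id, url
        FROM clicks
        WHERE url LIKE '%/checkout%'
        EMIT CHANGES;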
  19. (image-only slide)
  20. (image-only slide)
  21. Demo Data Schema - three collections: airlines (code KEY, name), flights (flight KEY, code, flight_num, takeoff_time, landing_time), and positions (flight KEY, timestamp_ms, altitude, lat, lon).
  22. Demo Data Schema: flights, airlines, positions.

  23. Demo Data Schema - airlines (code KEY, name): static data mapping airline codes to airline names, used to dereference encoded flight names into something more human readable.

    code | name
    -----+--------------------
    DAL  | Delta Air Lines
    SWA  | Southwest Airlines
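    In ksqlDB, lookup data like this is naturally declared as a table. A sketch, assuming an 'airlines' topic with JSON values:

      -- Table of airline reference data; the latest value per code wins.
      CREATE TABLE airlines (
        code VARCHAR PRIMARY KEY,
        name VARCHAR
      ) WITH (KAFKA_TOPIC='airlines', VALUE_FORMAT='JSON');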
  24. Demo Data Schema - flights (flight KEY, code, flight_num, takeoff_time, landing_time): streaming data, updated as each new flight plan is registered and when the flight status changes.

    flight  | code | flight_num | takeoff_time  | landing_time
    --------+------+------------+---------------+--------------
    DAL1232 | DAL  | 1232       | 1620337680000 | null
    SWA345  | SWA  | 345        | 1620335280000 | 1620338340000
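    A sketch of the matching stream declaration, plus the dereferencing join the airlines slide described (topic name and serialization format are assumptions):

      -- Stream of flight-plan events, keyed by flight.
      CREATE STREAM flights (
        flight VARCHAR KEY,
        code VARCHAR,
        flight_num VARCHAR,
        takeoff_time BIGINT,
        landing_time BIGINT
      ) WITH (KAFKA_TOPIC='flights', VALUE_FORMAT='JSON');

      -- Enrich each flight with a human-readable airline name.
      CREATE STREAM flights_named AS
        SELECT f.flight, a.name AS airline, f.flight_num, f.takeoff_time, f.landing_time
        FROM flights f
        LEFT JOIN airlines a ON f.code = a.code
        EMIT CHANGES;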
  25. Demo Data Schema - positions (flight KEY, timestamp_ms, altitude, lat, lon): streaming data, updated with each new position report throughout the flight.

    flight  | timestamp_ms  | altitude | lat    | lon
    --------+---------------+----------+--------+---------
    SKW3984 | 1620314506000 | 35000    | 45.701 | -104.29
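    Position reports declare as a stream the same way; using timestamp_ms as the record timestamp makes the windowed aggregations on the next slide operate on event time (topic name and format are assumptions):

      -- Stream of position reports; timestamp_ms drives event-time processing.
      CREATE STREAM positions (
        flight VARCHAR KEY,
        timestamp_ms BIGINT,
        altitude INT,
        lat DOUBLE,
        lon DOUBLE
      ) WITH (KAFKA_TOPIC='positions', VALUE_FORMAT='JSON', TIMESTAMP='timestamp_ms');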
  26. Windowed Aggregation - three window shapes over time: tumbling (fixed-size, non-overlapping windows n, n+1, n+2), hopping (fixed-size, overlapping windows), and session (a window per burst of activity, closing when Δt > inactivity gap).
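    These window shapes map directly onto ksqlDB WINDOW clauses. A sketch against the positions stream above (window sizes are arbitrary):

      -- Tumbling: fixed, non-overlapping 5-minute windows per flight.
      SELECT flight, MAX(altitude) AS max_altitude, COUNT(*) AS reports
      FROM positions
      WINDOW TUMBLING (SIZE 5 MINUTES)
      GROUP BY flight
      EMIT CHANGES;

      -- Hopping: 5-minute windows advancing every minute, so windows overlap:
      --   WINDOW HOPPING (SIZE 5 MINUTES, ADVANCE BY 1 MINUTE)
      -- Session: a window per burst of activity, closed by 30 minutes of silence:
      --   WINDOW SESSION (30 MINUTES)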
  27. Recap. Integration: getting data from A → B; custom integrations are evil. Kafka + transformation: Kafka gives loosely coupled integration, with transformation via Connect SMTs or KStreams. ksqlDB: a SQL interface to streaming data, approachable and viable at production scale.
  28. Where to go from here? 5 resources - look in the chat for links to each (including swag!).
  29. ksqlDB.io - documentation, examples, code

  30. GitHub examples - query cookbook, quick starts, code

  31. Confluent Cloud - free access with new accounts

  32. Thank You! @KeithResar, Kafka Developer, confluent.io