KSQL - The power of SQL, the simplicity of SQL

KSQL THE POWER OF KAFKA THE SIMPLICITY OF SQL ALEXIS
SEIGNEURIN - 2019

ME Data Engineer at Kafka certified Run the Kafka Meetup:
meetup.com/Apache-Kafka-NYC

FREE E-BOOK I WROTE A guide to producing, consuming and
processing events en.ippon.tech/apache-kafka

MY BLOG aseigneurin.github.io

KAFKA: A FEW REMINDERS Publish-subscribe system Messages are published to
topics Messages are persisted Messages can be read by multiple consumers Multiple partitions for scaling

PRODUCER AND CONSUMER API (JAVA) Low-level API Producer: send() Consumer:
subscribe(), seek(), poll()… Consumer groups

KAFKA STREAMS Client library for Java and Scala DSL: from(),
map(), filter(), to()… Aggregations, joins, windowing Simple deployment model Allows to create “microservices” KStream / KTable

KAFKA STREAMS: QUICK EXAMPLE Properties props = new Properties(); props.put(StreamsConfig.APPLICATION_ID_CONFIG,
"text-transformer"); props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, "8"); props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); StreamsBuilder streamsBuilder = new StreamsBuilder(); streamsBuilder.stream("text-input", Consumed.with(Serdes.String(), Serdes.String())) .filter((key, value) -> value.contains("a")) .mapValues(text -> text.toUpperCase()) .to("text-output", Produced.with(Serdes.String(), Serdes.String())); Topology topology = streamsBuilder.build(); KafkaStreams streams = new KafkaStreams(topology, props); streams.start(); Application ID = Kafka consumer group Threads for parallel processing (relates to partitions) Topic to read from + key/value deserializers Transformations: map, ﬁlter… Topic to write to + key/value serializers

KAFKA STREAMS: KSTREAM KStream: stream of events Records are independent
from each other Stateless Example (inspired from the documentation): ‣ Sum values as records arrive ‣ Records: ‣ (alice, 1) = 1 ‣ (charlie, 1) = 2 ‣ (alice, 3) = 5 ‣ → Adds to (alice, 1)

KAFKA STREAMS: KTABLE KTable: change log stream New records =
updates Stateful (requires a state store) Example (inspired from the documentation): ‣ Sum values as records arrive ‣ Records: ‣ (alice, 1) = 1 ‣ (charlie, 1) = 2 ‣ (alice, 3) = 4 ‣ → Replaces (alice, 1)

KSQL Built on top of Kafka Streams SQL queries translate
into Kafka Streams applications ‣ CREATE STREAM → KStream ‣ CREATE TABLE → KTable Interactive or non-interactive queries

WHICH API TO USE? Source: https://docs.conﬂuent.io/current/ksql/docs/concepts/ksql-and-kafka-streams.html

KSQL Works with structured data: ‣ Delimited ‣ JSON ‣
Avro (Schema Registry)

KSQL: EXAMPLE { “title":"Toy Story”, ”id”:"862", "imdb_id":"tt0114709", … } {
"title":"Full Metal Jacket”, “id”:”600”, ”imdb_id":"tt0093058", … } … { “userId":"14", “movieId":"1287", “rating":2.0, "timestamp":1260759187} {“userId”:”28", "movieId":"862", "rating": 2.0, "timestamp":1260759148} {“userId”:”7", "movieId":"1339", "rating": 3.5, “timestamp":1260759125} … Movielens dataset Topic: ratings Topic: movies

KTABLE OF MOVIES CREATE TABLE movies ( \ adult VARCHAR,
\ belongs_to_collection VARCHAR, \ budget VARCHAR, \ genres VARCHAR, \ homepage VARCHAR, \ id VARCHAR, \ imdb_id VARCHAR, \ original_language VARCHAR, \ original_title VARCHAR, \ overview VARCHAR, \ popularity VARCHAR, \ poster_path VARCHAR, \ production_companies VARCHAR, \ production_countries VARCHAR, \ release_date VARCHAR, \ revenue VARCHAR, \ runtime VARCHAR, \ spoken_languages VARCHAR, \ status VARCHAR, \ tagline VARCHAR, \ title VARCHAR, \ video VARCHAR, \ vote_average VARCHAR, \ vote_count VARCHAR) \ WITH (KAFKA_TOPIC = 'movies', \ VALUE_FORMAT='JSON', \ KEY = 'id'); SHOW TABLES; DESCRIBE EXTENDED movies;

KTABLE OF MOVIES $ kafka-console-consumer --bootstrap-server localhost:9092 --topic movies SELECT
ID, TITLE FROM MOVIES WHERE TITLE LIKE 'Toy %';

KSTREAM OF RATINGS CREATE STREAM ratings ( \ userId VARCHAR,
\ movieId VARCHAR, \ rating DOUBLE, \ timestamp BIGINT \ ) \ WITH (KAFKA_TOPIC = 'ratings', VALUE_FORMAT='JSON'); DESCRIBE EXTENDED ratings; SELECT USERID, MOVIEID, RATING FROM RATINGS;

JOIN RATINGS WITH MOVIES SELECT USERID, MOVIEID, TITLE, RATING \
FROM RATINGS \ LEFT JOIN MOVIES ON RATINGS.MOVIEID = MOVIES.ID;

QUERY RUNNING IN THE BACKGROUND CREATE STREAM ratings_enriched \ WITH
(KAFKA_TOPIC = 'ratings_enriched', VALUE_FORMAT='JSON') \ AS SELECT USERID, MOVIEID, TITLE, RATING \ FROM RATINGS \ LEFT JOIN MOVIES ON RATINGS.MOVIEID = MOVIES.ID; EXPLAIN CSAS_RATINGS_ENRICHED_0; kafka-console-consumer --bootstrap-server localhost:9092 --topic ratings_enriched Kill the KSQL CLI... TERMINATE CSAS_RATINGS_ENRICHED_0; DROP STREAM ratings_enriched;

DEPLOYMENT OPTIONS Interactive: KSQL CLI Non-interactive: ksql-server-start /path/to/ksql-server.properties --queries-file query.sql

NON-INTERACTIVE QUERIES cd dev/kafka-movies ksql-server-start \ ~/software/confluent-5.1.0/etc/ksql/ksql-server.properties \ \—queries-file src/main/resources/query1.sql
(Start a second KSQL server)

WINDOWED AGGREGATIONS Windows: tumbling, hopping, session Functions: WINDOWSTART() and WINDOWEND()

WINDOWED AGGREGATIONS $ kafka-console-consumer --bootstrap-server localhost:9092 --topic events CREATE STREAM
events ( \ id VARCHAR, \ info VARCHAR, \ source VARCHAR, \ userid BIGINT ) \ WITH (KAFKA_TOPIC = 'events', VALUE_FORMAT='JSON');

WINDOWED AGGREGATIONS SELECT * FROM EVENTS; SELECT \ SOURCE, \
WINDOWSTART(), \ WINDOWEND(), \ COUNT(*) \ FROM EVENTS \ WINDOW TUMBLING(SIZE 5 SECONDs) \ GROUP BY SOURCE; SELECT \ SOURCE, \ TIMESTAMPTOSTRING(WINDOWSTART(), 'yyyy-MM-dd HH:mm:ss'), \ TIMESTAMPTOSTRING(WINDOWEND(), 'yyyy-MM-dd HH:mm:ss'), \ COUNT(*) \ FROM EVENTS \ WINDOW TUMBLING(SIZE 5 SECONDs) \ GROUP BY SOURCE;

WINDOWED AGGREGATIONS Support for late events Event time vs Processing
time | t1 | t2 | t3 | ... |* * * * |** * | * *** * *| ... | t1 | t2 | t3 | ... |* * * . |** * | * *** * *| ... \-----> *(t1)

WRAPPING UP Streaming SQL engine Tables + Streams Windowed aggregations
Interactive vs non-interactive

THANKS! @aseigneurin aseigneurin.github.io

KSQL - The power of SQL, the simplicity of SQL

KSQL - The power of SQL, the simplicity of SQL

Alexis Seigneurin

More Decks by Alexis Seigneurin

Other Decks in Technology

Featured

Transcript

KSQL THE POWER OF KAFKA THE SIMPLICITY OF SQL ALEXIS

ME Data Engineer at Kafka certified Run the Kafka Meetup:

FREE E-BOOK I WROTE A guide to producing, consuming and

MY BLOG aseigneurin.github.io

KAFKA: A FEW REMINDERS Publish-subscribe system Messages are published to

PRODUCER AND CONSUMER API (JAVA) Low-level API Producer: send() Consumer:

KAFKA STREAMS Client library for Java and Scala DSL: from(),

KAFKA STREAMS: QUICK EXAMPLE Properties props = new Properties(); props.put(StreamsConfig.APPLICATION_ID_CONFIG,

KAFKA STREAMS: KSTREAM KStream: stream of events Records are independent

KAFKA STREAMS: KTABLE KTable: change log stream New records =

KSQL Built on top of Kafka Streams SQL queries translate

WHICH API TO USE? Source: https://docs.conﬂuent.io/current/ksql/docs/concepts/ksql-and-kafka-streams.html

KSQL Works with structured data: ‣ Delimited ‣ JSON ‣

KSQL: EXAMPLE { “title":"Toy Story”, ”id”:"862", "imdb_id":"tt0114709", … } {

KTABLE OF MOVIES CREATE TABLE movies ( \ adult VARCHAR,

KTABLE OF MOVIES $ kafka-console-consumer --bootstrap-server localhost:9092 --topic movies SELECT

KSTREAM OF RATINGS CREATE STREAM ratings ( \ userId VARCHAR,

JOIN RATINGS WITH MOVIES SELECT USERID, MOVIEID, TITLE, RATING \

QUERY RUNNING IN THE BACKGROUND CREATE STREAM ratings_enriched \ WITH

DEPLOYMENT OPTIONS Interactive: KSQL CLI Non-interactive: ksql-server-start /path/to/ksql-server.properties --queries-file query.sql

NON-INTERACTIVE QUERIES cd dev/kafka-movies ksql-server-start \ ~/software/confluent-5.1.0/etc/ksql/ksql-server.properties \ \—queries-file src/main/resources/query1.sql

WINDOWED AGGREGATIONS Windows: tumbling, hopping, session Functions: WINDOWSTART() and WINDOWEND()

WINDOWED AGGREGATIONS $ kafka-console-consumer --bootstrap-server localhost:9092 --topic events CREATE STREAM

WINDOWED AGGREGATIONS SELECT * FROM EVENTS; SELECT \ SOURCE, \

WINDOWED AGGREGATIONS Support for late events Event time vs Processing

WRAPPING UP Streaming SQL engine Tables + Streams Windowed aggregations

THANKS! @aseigneurin aseigneurin.github.io