KSQL - The power of SQL, the simplicity of SQL

KSQL - The power of SQL, the simplicity of SQL

Kafka, KSQL

B1ed299a884f153fd23b9a1b81b798ac?s=128

Alexis Seigneurin

January 29, 2019
Tweet

Transcript

  1. KSQL THE POWER OF KAFKA THE SIMPLICITY OF SQL ALEXIS

    SEIGNEURIN - 2019
  2. ME Data Engineer at Kafka certified Run the Kafka Meetup:

    meetup.com/Apache-Kafka-NYC
  3. FREE E-BOOK I WROTE A guide to producing, consuming and

    processing events en.ippon.tech/apache-kafka
  4. MY BLOG aseigneurin.github.io

  5. KAFKA: A FEW REMINDERS Publish-subscribe system Messages are published to

    topics Messages are persisted Messages can be read by multiple consumers Multiple partitions for scaling
  6. PRODUCER AND CONSUMER API (JAVA) Low-level API Producer: send() Consumer:

    subscribe(), seek(), poll()… Consumer groups
  7. KAFKA STREAMS Client library for Java and Scala DSL: from(),

    map(), filter(), to()… Aggregations, joins, windowing Simple deployment model Allows to create “microservices” KStream / KTable
  8. KAFKA STREAMS: QUICK EXAMPLE Properties props = new Properties(); props.put(StreamsConfig.APPLICATION_ID_CONFIG,

    "text-transformer"); props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, "8"); props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); StreamsBuilder streamsBuilder = new StreamsBuilder(); streamsBuilder.stream("text-input", Consumed.with(Serdes.String(), Serdes.String())) .filter((key, value) -> value.contains("a")) .mapValues(text -> text.toUpperCase()) .to("text-output", Produced.with(Serdes.String(), Serdes.String())); Topology topology = streamsBuilder.build(); KafkaStreams streams = new KafkaStreams(topology, props); streams.start(); Application ID = Kafka consumer group Threads for parallel processing (relates to partitions) Topic to read from + key/value deserializers Transformations: map, filter… Topic to write to + key/value serializers
  9. KAFKA STREAMS: KSTREAM KStream: stream of events Records are independent

    from each other Stateless Example (inspired from the documentation): ‣ Sum values as records arrive ‣ Records: ‣ (alice, 1) = 1 ‣ (charlie, 1) = 2 ‣ (alice, 3) = 5 ‣ → Adds to (alice, 1)
  10. KAFKA STREAMS: KTABLE KTable: change log stream New records =

    updates Stateful (requires a state store) Example (inspired from the documentation): ‣ Sum values as records arrive ‣ Records: ‣ (alice, 1) = 1 ‣ (charlie, 1) = 2 ‣ (alice, 3) = 4 ‣ → Replaces (alice, 1)
  11. KSQL Built on top of Kafka Streams SQL queries translate

    into Kafka Streams applications ‣ CREATE STREAM → KStream ‣ CREATE TABLE → KTable Interactive or non-interactive queries
  12. WHICH API TO USE? Source: https://docs.confluent.io/current/ksql/docs/concepts/ksql-and-kafka-streams.html

  13. KSQL Works with structured data: ‣ Delimited ‣ JSON ‣

    Avro (Schema Registry)
  14. KSQL: EXAMPLE { “title":"Toy Story”, ”id”:"862", "imdb_id":"tt0114709", … } {

    "title":"Full Metal Jacket”, “id”:”600”, ”imdb_id":"tt0093058", … } … { “userId":"14", “movieId":"1287", “rating":2.0, "timestamp":1260759187} {“userId”:”28", "movieId":"862", "rating": 2.0, "timestamp":1260759148} {“userId”:”7", "movieId":"1339", "rating": 3.5, “timestamp":1260759125} … Movielens dataset Topic: ratings Topic: movies
  15. KTABLE OF MOVIES CREATE TABLE movies ( \ adult VARCHAR,

    \ belongs_to_collection VARCHAR, \ budget VARCHAR, \ genres VARCHAR, \ homepage VARCHAR, \ id VARCHAR, \ imdb_id VARCHAR, \ original_language VARCHAR, \ original_title VARCHAR, \ overview VARCHAR, \ popularity VARCHAR, \ poster_path VARCHAR, \ production_companies VARCHAR, \ production_countries VARCHAR, \ release_date VARCHAR, \ revenue VARCHAR, \ runtime VARCHAR, \ spoken_languages VARCHAR, \ status VARCHAR, \ tagline VARCHAR, \ title VARCHAR, \ video VARCHAR, \ vote_average VARCHAR, \ vote_count VARCHAR) \ WITH (KAFKA_TOPIC = 'movies', \ VALUE_FORMAT='JSON', \ KEY = 'id'); SHOW TABLES; DESCRIBE EXTENDED movies;
  16. KTABLE OF MOVIES $ kafka-console-consumer --bootstrap-server localhost:9092 --topic movies SELECT

    ID, TITLE FROM MOVIES WHERE TITLE LIKE 'Toy %';
  17. KSTREAM OF RATINGS CREATE STREAM ratings ( \ userId VARCHAR,

    \ movieId VARCHAR, \ rating DOUBLE, \ timestamp BIGINT \ ) \ WITH (KAFKA_TOPIC = 'ratings', VALUE_FORMAT='JSON'); DESCRIBE EXTENDED ratings; SELECT USERID, MOVIEID, RATING FROM RATINGS;
  18. JOIN RATINGS WITH MOVIES SELECT USERID, MOVIEID, TITLE, RATING \

    FROM RATINGS \ LEFT JOIN MOVIES ON RATINGS.MOVIEID = MOVIES.ID;
  19. QUERY RUNNING IN THE BACKGROUND CREATE STREAM ratings_enriched \ WITH

    (KAFKA_TOPIC = 'ratings_enriched', VALUE_FORMAT='JSON') \ AS SELECT USERID, MOVIEID, TITLE, RATING \ FROM RATINGS \ LEFT JOIN MOVIES ON RATINGS.MOVIEID = MOVIES.ID; EXPLAIN CSAS_RATINGS_ENRICHED_0; kafka-console-consumer --bootstrap-server localhost:9092 --topic ratings_enriched Kill the KSQL CLI... TERMINATE CSAS_RATINGS_ENRICHED_0; DROP STREAM ratings_enriched;
  20. DEPLOYMENT OPTIONS Interactive: KSQL CLI Non-interactive: ksql-server-start /path/to/ksql-server.properties --queries-file query.sql

  21. NON-INTERACTIVE QUERIES cd dev/kafka-movies ksql-server-start \ ~/software/confluent-5.1.0/etc/ksql/ksql-server.properties \ \—queries-file src/main/resources/query1.sql

    (Start a second KSQL server)
  22. WINDOWED AGGREGATIONS Windows: tumbling, hopping, session Functions: WINDOWSTART() and WINDOWEND()

  23. WINDOWED AGGREGATIONS $ kafka-console-consumer --bootstrap-server localhost:9092 --topic events CREATE STREAM

    events ( \ id VARCHAR, \ info VARCHAR, \ source VARCHAR, \ userid BIGINT ) \ WITH (KAFKA_TOPIC = 'events', VALUE_FORMAT='JSON');
  24. WINDOWED AGGREGATIONS SELECT * FROM EVENTS; SELECT \ SOURCE, \

    WINDOWSTART(), \ WINDOWEND(), \ COUNT(*) \ FROM EVENTS \ WINDOW TUMBLING(SIZE 5 SECONDs) \ GROUP BY SOURCE; SELECT \ SOURCE, \ TIMESTAMPTOSTRING(WINDOWSTART(), 'yyyy-MM-dd HH:mm:ss'), \ TIMESTAMPTOSTRING(WINDOWEND(), 'yyyy-MM-dd HH:mm:ss'), \ COUNT(*) \ FROM EVENTS \ WINDOW TUMBLING(SIZE 5 SECONDs) \ GROUP BY SOURCE;
  25. WINDOWED AGGREGATIONS Support for late events Event time vs Processing

    time | t1 | t2 | t3 | ... |* * * * |** * | * *** * *| ... | t1 | t2 | t3 | ... |* * * . |** * | * *** * *| ... \-----> *(t1)
  26. WRAPPING UP Streaming SQL engine Tables + Streams Windowed aggregations

    Interactive vs non-interactive
  27. THANKS! @aseigneurin aseigneurin.github.io