Upgrade to Pro — share decks privately, control downloads, hide ads and more …

KSQL - The power of SQL, the simplicity of SQL

KSQL - The power of SQL, the simplicity of SQL

Kafka, KSQL

Alexis Seigneurin

January 29, 2019
Tweet

More Decks by Alexis Seigneurin

Other Decks in Technology

Transcript

  1. KSQL
    THE POWER OF KAFKA
    THE SIMPLICITY OF SQL
    ALEXIS SEIGNEURIN - 2019

    View Slide

  2. ME
    Data Engineer at
    Kafka certified
    Run the Kafka Meetup: meetup.com/Apache-Kafka-NYC

    View Slide

  3. FREE E-BOOK I WROTE
    A guide to producing, consuming and processing
    events
    en.ippon.tech/apache-kafka

    View Slide

  4. MY BLOG
    aseigneurin.github.io

    View Slide

  5. KAFKA: A FEW REMINDERS
    Publish-subscribe system
    Messages are published to topics
    Messages are persisted
    Messages can be read by multiple consumers
    Multiple partitions for scaling

    View Slide

  6. PRODUCER AND CONSUMER API (JAVA)
    Low-level API
    Producer: send()
    Consumer: subscribe(), seek(), poll()…
    Consumer groups

    View Slide

  7. KAFKA STREAMS
    Client library for Java and Scala
    DSL: from(), map(), filter(), to()…
    Aggregations, joins, windowing
    Simple deployment model
    Allows to create “microservices”
    KStream / KTable

    View Slide

  8. KAFKA STREAMS: QUICK EXAMPLE
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "text-transformer");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, "8");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    StreamsBuilder streamsBuilder = new StreamsBuilder();
    streamsBuilder.stream("text-input", Consumed.with(Serdes.String(), Serdes.String()))
    .filter((key, value) -> value.contains("a"))
    .mapValues(text -> text.toUpperCase())
    .to("text-output", Produced.with(Serdes.String(), Serdes.String()));
    Topology topology = streamsBuilder.build();
    KafkaStreams streams = new KafkaStreams(topology, props);
    streams.start();
    Application ID = Kafka consumer
    group
    Threads for parallel processing (relates
    to partitions)
    Topic to read from + key/value
    deserializers
    Transformations: map, filter…
    Topic to write to + key/value serializers

    View Slide

  9. KAFKA STREAMS: KSTREAM
    KStream: stream of events
    Records are independent from each other
    Stateless
    Example (inspired from the documentation):
    ‣ Sum values as records arrive
    ‣ Records:
    ‣ (alice, 1) = 1
    ‣ (charlie, 1) = 2
    ‣ (alice, 3) = 5
    ‣ → Adds to (alice, 1)

    View Slide

  10. KAFKA STREAMS: KTABLE
    KTable: change log stream
    New records = updates
    Stateful (requires a state store)
    Example (inspired from the documentation):
    ‣ Sum values as records arrive
    ‣ Records:
    ‣ (alice, 1) = 1
    ‣ (charlie, 1) = 2
    ‣ (alice, 3) = 4
    ‣ → Replaces (alice, 1)

    View Slide

  11. KSQL
    Built on top of Kafka Streams
    SQL queries translate into Kafka Streams applications
    ‣ CREATE STREAM → KStream
    ‣ CREATE TABLE → KTable
    Interactive or non-interactive queries

    View Slide

  12. WHICH API TO USE?
    Source: https://docs.confluent.io/current/ksql/docs/concepts/ksql-and-kafka-streams.html

    View Slide

  13. KSQL
    Works with structured data:
    ‣ Delimited
    ‣ JSON
    ‣ Avro (Schema Registry)

    View Slide

  14. KSQL: EXAMPLE
    { “title":"Toy Story”, ”id”:"862",
    "imdb_id":"tt0114709", … }
    { "title":"Full Metal Jacket”, “id”:”600”,
    ”imdb_id":"tt0093058", … }

    { “userId":"14", “movieId":"1287",
    “rating":2.0, "timestamp":1260759187}
    {“userId”:”28", "movieId":"862", "rating":
    2.0, "timestamp":1260759148}
    {“userId”:”7", "movieId":"1339", "rating":
    3.5, “timestamp":1260759125}

    Movielens dataset
    Topic: ratings
    Topic: movies

    View Slide

  15. KTABLE OF MOVIES
    CREATE TABLE movies ( \
    adult VARCHAR, \
    belongs_to_collection VARCHAR, \
    budget VARCHAR, \
    genres VARCHAR, \
    homepage VARCHAR, \
    id VARCHAR, \
    imdb_id VARCHAR, \
    original_language VARCHAR, \
    original_title VARCHAR, \
    overview VARCHAR, \
    popularity VARCHAR, \
    poster_path VARCHAR, \
    production_companies VARCHAR, \
    production_countries VARCHAR, \
    release_date VARCHAR, \
    revenue VARCHAR, \
    runtime VARCHAR, \
    spoken_languages VARCHAR, \
    status VARCHAR, \
    tagline VARCHAR, \
    title VARCHAR, \
    video VARCHAR, \
    vote_average VARCHAR, \
    vote_count VARCHAR) \
    WITH (KAFKA_TOPIC = 'movies', \
    VALUE_FORMAT='JSON', \
    KEY = 'id');
    SHOW TABLES;
    DESCRIBE EXTENDED movies;

    View Slide

  16. KTABLE OF MOVIES
    $ kafka-console-consumer --bootstrap-server localhost:9092 --topic movies
    SELECT ID, TITLE FROM MOVIES WHERE TITLE LIKE 'Toy %';

    View Slide

  17. KSTREAM OF RATINGS
    CREATE STREAM ratings ( \
    userId VARCHAR, \
    movieId VARCHAR, \
    rating DOUBLE, \
    timestamp BIGINT \
    ) \
    WITH (KAFKA_TOPIC = 'ratings', VALUE_FORMAT='JSON');
    DESCRIBE EXTENDED ratings;
    SELECT USERID, MOVIEID, RATING FROM RATINGS;

    View Slide

  18. JOIN RATINGS WITH MOVIES
    SELECT USERID, MOVIEID, TITLE, RATING \
    FROM RATINGS \
    LEFT JOIN MOVIES ON RATINGS.MOVIEID = MOVIES.ID;

    View Slide

  19. QUERY RUNNING IN THE BACKGROUND
    CREATE STREAM ratings_enriched \
    WITH (KAFKA_TOPIC = 'ratings_enriched', VALUE_FORMAT='JSON') \
    AS SELECT USERID, MOVIEID, TITLE, RATING \
    FROM RATINGS \
    LEFT JOIN MOVIES ON RATINGS.MOVIEID = MOVIES.ID;
    EXPLAIN CSAS_RATINGS_ENRICHED_0;
    kafka-console-consumer --bootstrap-server localhost:9092 --topic ratings_enriched
    Kill the KSQL CLI...
    TERMINATE CSAS_RATINGS_ENRICHED_0;
    DROP STREAM ratings_enriched;

    View Slide

  20. DEPLOYMENT OPTIONS
    Interactive: KSQL CLI
    Non-interactive:
    ksql-server-start /path/to/ksql-server.properties --queries-file query.sql

    View Slide

  21. NON-INTERACTIVE QUERIES
    cd dev/kafka-movies
    ksql-server-start \
    ~/software/confluent-5.1.0/etc/ksql/ksql-server.properties \
    \—queries-file src/main/resources/query1.sql
    (Start a second KSQL server)

    View Slide

  22. WINDOWED AGGREGATIONS
    Windows: tumbling, hopping, session
    Functions: WINDOWSTART() and WINDOWEND()

    View Slide

  23. WINDOWED AGGREGATIONS
    $ kafka-console-consumer --bootstrap-server localhost:9092 --topic events
    CREATE STREAM events ( \
    id VARCHAR, \
    info VARCHAR, \
    source VARCHAR, \
    userid BIGINT
    ) \
    WITH (KAFKA_TOPIC = 'events', VALUE_FORMAT='JSON');

    View Slide

  24. WINDOWED AGGREGATIONS
    SELECT * FROM EVENTS;
    SELECT \
    SOURCE, \
    WINDOWSTART(), \
    WINDOWEND(), \
    COUNT(*) \
    FROM EVENTS \
    WINDOW TUMBLING(SIZE 5 SECONDs) \
    GROUP BY SOURCE;
    SELECT \
    SOURCE, \
    TIMESTAMPTOSTRING(WINDOWSTART(), 'yyyy-MM-dd HH:mm:ss'), \
    TIMESTAMPTOSTRING(WINDOWEND(), 'yyyy-MM-dd HH:mm:ss'), \
    COUNT(*) \
    FROM EVENTS \
    WINDOW TUMBLING(SIZE 5 SECONDs) \
    GROUP BY SOURCE;

    View Slide

  25. WINDOWED AGGREGATIONS
    Support for late events
    Event time vs Processing time
    | t1 | t2 | t3 | ...
    |* * * * |** * | * *** * *| ...
    | t1 | t2 | t3 | ...
    |* * * . |** * | * *** * *| ...
    \-----> *(t1)

    View Slide

  26. WRAPPING UP
    Streaming SQL engine
    Tables + Streams
    Windowed aggregations
    Interactive vs non-interactive

    View Slide

  27. THANKS!
    @aseigneurin
    aseigneurin.github.io

    View Slide