Processing Streaming Data with KSQL

Apache Kafka is a de facto standard platform for processing streaming data. It is widely deployed as a messaging system, and it has a robust data integration framework (Kafka Connect) and a stream processing API (Kafka Streams) to meet the needs that commonly attend real-time message processing. But there’s more!

Kafka now offers KSQL, a declarative, SQL-like stream processing language that lets you define powerful stream-processing applications easily. What once took some moderately sophisticated Java code can now be done at the command line with a familiar and eminently approachable syntax. Come to this talk for an overview of KSQL with live coding on live streaming data.

Viktor Gamov

October 06, 2018

Transcript

  1. Processing Streaming Data with KSQL
    @gamussa
    #SQLSaturday

  2. KSQL is a Declarative Stream Processing Language
    @gamussa #SQLSaturday @confluentinc

  3. KSQL is the Streaming SQL Engine for Apache Kafka

  4. Who am I?
    Solutions Architect
    Developer Advocate
    @gamussa in internetz
    Hey you, yes, you, go follow me on Twitter

  5. Stream Processing by Analogy
    Kafka Cluster
    Connect API -> Stream Processing -> Connect API
    $ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
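The shell pipeline in this analogy has three parts: a source, processing steps, and a sink. A minimal Python sketch of the same shape follows, using generators so that each line flows through the stages one at a time. The function names (`source`, `grep`, `upper`, `sink`, `run_pipeline`) are illustrative assumptions and not part of any Kafka API.

```python
from typing import Iterable, Iterator, List


def source(lines: Iterable[str]) -> Iterator[str]:
    """Plays the role of `cat < in.txt` (the inbound Connect API in the analogy)."""
    for line in lines:
        yield line


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Keeps only lines containing the literal `pattern`, like `grep "ksql"`."""
    return (line for line in lines if pattern in line)


def upper(lines: Iterable[str]) -> Iterator[str]:
    """Upper-cases each line, like `tr a-z A-Z` does for ASCII letters."""
    return (line.upper() for line in lines)


def sink(lines: Iterable[str]) -> List[str]:
    """Plays the role of `> out.txt` (the outbound Connect API in the analogy)."""
    return list(lines)


def run_pipeline(lines: Iterable[str]) -> List[str]:
    # Stages are chained lazily; nothing runs until the sink pulls records.
    return sink(upper(grep("ksql", source(lines))))


if __name__ == "__main__":
    print(run_pipeline(["hello ksql", "plain kafka", "ksql rocks"]))
```

The filter and transform stages stand in for what a stream processor does between the two Connect API ends; in Kafka the input never "finishes", so the stages would run continuously.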

  6. Kafka is a Streaming Platform
    The Log
    Connectors
    Producer, Consumer
    Streaming Engine

  7. Streaming is the toolset for dealing with events as they move!

  8. What exactly is Stream Processing?
    authorization_attempts -> possible_fraud

  9. What exactly is Stream Processing?
    CREATE TABLE possible_fraud AS
    SELECT card_number, count(*)
    FROM authorization_attempts
    WINDOW TUMBLING (SIZE 5 MINUTE)
    GROUP BY card_number
    HAVING count(*) > 3;
    authorization_attempts -> possible_fraud
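To make the windowed aggregation concrete, here is a minimal Python sketch of what the possible_fraud query computes. Events are bucketed by timestamp into fixed, non-overlapping 5-minute windows, counted per card number, and only counts above 3 are kept. The event shape (epoch-second timestamp plus card number) and the function name are assumptions for illustration. KSQL evaluates the query continuously over a Kafka topic rather than over a finished list.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 5 * 60  # WINDOW TUMBLING (SIZE 5 MINUTE)
THRESHOLD = 3            # HAVING count(*) > 3


def possible_fraud(
    attempts: Iterable[Tuple[int, str]],
) -> Dict[Tuple[int, str], int]:
    """Count attempts per (window start, card number); keep counts above THRESHOLD.

    `attempts` holds hypothetical (epoch_seconds, card_number) pairs.
    """
    counts: Counter = Counter()
    for ts, card_number in attempts:
        # Tumbling windows do not overlap: each event falls in exactly one window.
        window_start = ts - (ts % WINDOW_SECONDS)
        counts[(window_start, card_number)] += 1
    return {key: n for key, n in counts.items() if n > THRESHOLD}


if __name__ == "__main__":
    events = [
        # Four attempts for card 1111 inside the window starting at 0.
        (0, "1111"), (10, "1111"), (20, "1111"), (30, "1111"),
        # Four attempts for card 2222, but split 2 + 2 across two windows.
        (290, "2222"), (299, "2222"), (300, "2222"), (310, "2222"),
    ]
    print(possible_fraud(events))
```

Card 2222 is not reported even though it has four attempts within 20 seconds, because they straddle a window boundary. That is a property of tumbling windows in general, not of this sketch.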

  16. Table-Stream Duality

  17. Do you think that’s a table you are querying?

  18. Streams to Tables

  20. Stream/Table Duality

  22. TABLE -> STREAM -> TABLE
    Stream of changes: ("Gary", 1), ("Viktor", 1), ("Gary", 2), ("Soby", 1)
    Table after each change:
    Gary 1
    Gary 1, Viktor 1
    Gary 2, Viktor 1
    Gary 2, Viktor 1, Soby 1
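The duality on this slide can be sketched in a few lines of Python: replaying a stream of changes as upserts yields the table, and diffing successive table states yields the stream back. This is a simplified sketch under stated assumptions. The names `table_snapshots` and `changelog` are hypothetical, and deletes (tombstones) are not modeled.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Change = Tuple[str, int]  # (key, new value), e.g. ("Gary", 1)


def table_snapshots(changes: Iterable[Change]) -> Iterator[Dict[str, int]]:
    """Stream -> table: apply each change as an upsert, yield the table after it."""
    table: Dict[str, int] = {}
    for key, value in changes:
        table[key] = value
        yield dict(table)  # copy, so earlier snapshots are not mutated


def changelog(snapshots: Iterable[Dict[str, int]]) -> Iterator[Change]:
    """Table -> stream: emit every key whose value differs from the previous snapshot."""
    previous: Dict[str, int] = {}
    for snapshot in snapshots:
        for key, value in snapshot.items():
            if previous.get(key) != value:
                yield (key, value)
        previous = snapshot


if __name__ == "__main__":
    stream: List[Change] = [("Gary", 1), ("Viktor", 1), ("Gary", 2), ("Soby", 1)]
    tables = list(table_snapshots(stream))
    print(tables[-1])                         # final table state
    print(list(changelog(tables)) == stream)  # round trip recovers the stream
```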

  23. Join Streams and Tables
    Kafka: Topic, Compacted Topic
    Kafka Streams: Stream, Table, Join
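A stream-table join can be pictured as a lookup: the table holds the latest value per key (which is what a compacted topic retains), and each stream event is enriched with the table's value at the moment the event arrives. The Python sketch below simulates that with one interleaved list of records. The record shapes and the name `stream_table_join` are assumptions for illustration, not the Kafka Streams or KSQL API.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical record shapes:
#   ("table", key, value)  -> update from the compacted topic (latest value per key wins)
#   ("stream", key, value) -> event from the regular topic
Record = Tuple[str, str, str]
Joined = Tuple[str, str, Optional[str]]


def stream_table_join(records: Iterable[Record]) -> Iterator[Joined]:
    """Left-join each stream event with the table's current value for its key."""
    table: Dict[str, str] = {}
    for kind, key, value in records:
        if kind == "table":
            table[key] = value  # upsert; a table update emits no join output here
        elif kind == "stream":
            # None when the key has no table entry yet.
            yield (key, value, table.get(key))


if __name__ == "__main__":
    records = [
        ("table", "gary", "US"),
        ("stream", "gary", "click-1"),
        ("stream", "viktor", "click-2"),
        ("table", "gary", "EU"),
        ("stream", "gary", "click-3"),
    ]
    for row in stream_table_join(records):
        print(row)
```

In this sketch only stream-side records produce output, so `click-1` keeps the value `US` even after the table row changes to `EU`.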

  24. Demo

  25. Where is KSQL not such a great fit?
    BI reports (Tableau etc.)
    • No indexes
    • No JDBC (most BI tools are not good with continuous results!)
    Ad-hoc queries
    • Limited span of time usually retained in Kafka
    • No indexes

  26. Resources and Next Steps
    https://github.com/confluentinc/ksql
    http://confluent.io/ksql
    https://slackpass.io/confluentcommunity #ksql

  27. Thanks!
    @gamussa
    [email protected]
    We are hiring!
    https://www.confluent.io/careers/