Processing Streaming Data with KSQL

Apache Kafka is a de facto standard streaming data processing platform: it is widely deployed as a messaging system, and it has a robust data integration framework (Kafka Connect) and a stream processing API (Kafka Streams) to meet the needs that commonly attend real-time message processing. But there’s more!

Kafka now offers KSQL, a declarative, SQL-like stream processing language that lets you define powerful stream-processing applications easily. What once took some moderately sophisticated Java code can now be done at the command line with a familiar and eminently approachable syntax. Come to this talk for an overview of KSQL with live coding on live streaming data.

Viktor Gamov

October 06, 2018

Transcript

  1. 4.

    @gamussa #SQLSaturday @confluentinc

    Who am I? Solutions Architect, Developer Advocate. @gamussa in internetz. Hey you, yes, you, go follow me on Twitter.
  2. 5.

    Stream Processing by Analogy

    Connect API, Kafka Cluster, Stream Processing, Connect API

    $ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
  3. 6.

    Kafka is a Streaming Platform

    The Log, Connectors, Producer, Consumer, Streaming Engine
  4. 7.
  5. 9.

    What exactly is Stream Processing?

    CREATE STREAM possible_fraud AS
      SELECT card_number, count(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 MINUTE)
      GROUP BY card_number
      HAVING count(*) > 3;

    authorization_attempts -> possible_fraud
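
To make the windowing in the possible_fraud query concrete, here is a minimal Python sketch of the same logic: count attempts per card in fixed 5-minute tumbling windows and keep only the counts above 3. It is an illustration only, not how KSQL is implemented. The function name `possible_fraud`, the `(timestamp_ms, card_number)` tuple shape, and the batch-style input are assumptions made for this example; KSQL evaluates the query continuously as events arrive.

```python
from collections import Counter

WINDOW_MS = 5 * 60 * 1000  # mirrors WINDOW TUMBLING (SIZE 5 MINUTE)


def possible_fraud(attempts, threshold=3):
    """Count attempts per (card_number, window) and keep counts above threshold.

    `attempts` is an iterable of (timestamp_ms, card_number) tuples.
    This is a hypothetical helper for illustration, not a KSQL API.
    """
    counts = Counter()
    for timestamp_ms, card_number in attempts:
        # Tumbling windows do not overlap: each event falls into exactly one
        # window, identified here by the window's start time.
        window_start = timestamp_ms - (timestamp_ms % WINDOW_MS)
        counts[(card_number, window_start)] += 1
    # Equivalent of HAVING count(*) > 3
    return {key: n for key, n in counts.items() if n > threshold}


if __name__ == "__main__":
    sample = [(0, "A"), (1000, "A"), (2000, "A"), (3000, "A"),
              (299_999, "B"), (300_000, "A")]
    print(possible_fraud(sample))
```

In the sample, card "A" has four attempts inside the first window and one in the next, so only the first window passes the threshold.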
  11. 15.
  12. 22.

    TABLE -> STREAM -> TABLE

    Stream: (“Gary”, 1), (“Viktor”, 1), (“Gary”, 2), (“Soby”, 1)

    Table after each event:
    • Gary 1
    • Gary 1, Viktor 1
    • Gary 2, Viktor 1
    • Gary 2, Viktor 1, Soby 1
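
The table/stream relationship on this slide can be sketched in a few lines of Python: replaying the stream of (key, value) events as upserts rebuilds the table, one snapshot per event. The function name `table_snapshots` is hypothetical and the sketch uses a plain dictionary as the table; it shows the idea only, not Kafka's or KSQL's actual state handling.

```python
def table_snapshots(changelog):
    """Replay (key, value) events and return the table state after each one.

    Hypothetical helper for illustration: the "table" is a dict in which
    the latest value for a key replaces the previous one.
    """
    table = {}
    snapshots = []
    for key, value in changelog:
        table[key] = value            # upsert: newest value per key wins
        snapshots.append(dict(table))  # copy so later updates don't alter it
    return snapshots


if __name__ == "__main__":
    stream = [("Gary", 1), ("Viktor", 1), ("Gary", 2), ("Soby", 1)]
    for state in table_snapshots(stream):
        print(state)
```

Each printed snapshot corresponds to one of the table states shown on the slide.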
  13. 24.
  14. 25.

    Where is KSQL not such a great fit?

    BI reports (Tableau etc.)
    • No indexes
    • No JDBC (most BI tools are not good with continuous results!)

    Ad-hoc queries
    • Limited span of time usually retained in Kafka
    • No indexes