Slide 1

Slide 1 text

RETHINKING Stream Processing with Kafka Streams and KSQL

Slide 2

Slide 2 text

@gamussa @confluentinc Who am I? Solutions Architect, Developer Advocate. @gamussa in internetz. Hey you, yes, you, go follow me on Twitter.

Slide 3

Slide 3 text

Producers, Consumers

Slide 4

Slide 4 text

What is Stream Processing? A machine for combining streams of events

Slide 5

Slide 5 text

Kafka, the Streaming Data Platform (2013 to 2018):
0.8 Intra-cluster replication
0.9 Data Integration (Connect API)
0.10 Data Processing (Streams API)
0.11 Exactly-once Semantics
1.0 Enterprise Ready

Slide 6

Slide 6 text

As developers, we want to build APPS not INFRASTRUCTURE

Slide 7

Slide 7 text

We want our apps to be: Scalable, Elastic, Fault-tolerant, Stateful, Distributed

Slide 8

Slide 8 text

Where do I put my compute?

Slide 9

Slide 9 text

Where do I put my state?

Slide 10

Slide 10 text

The actual question is: Where is my code?

Slide 11

Slide 11 text

The KAFKA STREAMS API is a JAVA API to BUILD REAL-TIME APPLICATIONS to POWER THE BUSINESS

Slide 12

Slide 12 text

App (Streams API): not running inside brokers!

Slide 13

Slide 13 text

Brokers? Nope! App (Streams API), App (Streams API), App (Streams API): same app, many instances

Slide 14

Slide 14 text

Before: Dashboard, Processing Cluster (Your Job), Shared Database

Slide 15

Slide 15 text

After: Dashboard, APP (Streams API)

Slide 16

Slide 16 text

This means you can DEPLOY your app ANYWHERE using WHATEVER TECHNOLOGY YOU WANT

Slide 17

Slide 17 text

Things Kafka Streams Does
• Runs everywhere
• Clustering done for you
• Exactly-once processing
• Event-time processing
• Integrated database
• Joins, windowing, aggregation
• S/M/L/XL/XXL/XXXL sizes

Slide 18

Slide 18 text

First, some API CONCEPTS

Slide 19

Slide 19 text

STREAMS are EVERYWHERE

Slide 20

Slide 20 text

TABLES are EVERYWHERE

Slide 21

Slide 21 text

Streams to Tables
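Turning a stream into a table means replaying its events in order and keeping only the latest value for each key. Below is a minimal plain-Java sketch of that fold with no Kafka dependency; the `StreamToTable` class and `Event` record are hypothetical, and a null value is treated as a delete (a tombstone), mirroring how a KTable interprets null-valued records.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StreamToTable {
    // Hypothetical update event: a key and its new value (null means "delete this key").
    record Event(String key, Integer value) {}

    // Replays the stream in order; a later event for a key overwrites the earlier one.
    static Map<String, Integer> materialize(List<Event> stream) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Event e : stream) {
            if (e.value() == null) {
                table.remove(e.key());          // tombstone: drop the row
            } else {
                table.put(e.key(), e.value());  // upsert: latest value wins
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Event> stream = List.of(
            new Event("alice", 1),
            new Event("bob", 5),
            new Event("alice", 3),
            new Event("bob", null));
        System.out.println(materialize(stream)); // prints {alice=3}
    }
}
```

The table is just the "current state" view of the stream: the history of alice's updates collapses into her latest value.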

Slide 22

Slide 22 text

Tables to Streams
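Going the other way, a table can be observed as a stream by capturing every change made to it (its changelog). A minimal plain-Java sketch, assuming a hypothetical `TableToStream` class that records each write and delete as a `Change` event; it illustrates the idea only and is not how Kafka Streams stores state internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TableToStream {
    // One changelog entry: which key changed and its new value (null = deleted).
    record Change(String key, Integer newValue) {}

    private final Map<String, Integer> state = new HashMap<>();
    private final List<Change> changelog = new ArrayList<>();

    // Every write to the table is also appended to its changelog stream.
    void put(String key, int value) {
        state.put(key, value);
        changelog.add(new Change(key, value));
    }

    // A delete is recorded as a tombstone (null value) in the changelog.
    void delete(String key) {
        state.remove(key);
        changelog.add(new Change(key, null));
    }

    Integer get(String key) {
        return state.get(key);
    }

    List<Change> changelog() {
        return List.copyOf(changelog);
    }

    public static void main(String[] args) {
        TableToStream table = new TableToStream();
        table.put("alice", 1);
        table.put("alice", 3);
        table.delete("alice");
        // The table ends up empty, but the stream keeps every change that happened.
        table.changelog().forEach(System.out::println);
    }
}
```

Replaying this changelog in order while keeping the last value per key rebuilds the table, which is the round trip behind the stream/table duality.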

Slide 23

Slide 23 text

Stream/Table Duality

Slide 24

Slide 24 text

Stream/Table Duality

Slide 25

Slide 25 text

STREAMS <-> TABLES

Slide 26

Slide 26 text

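import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
// Example: reading data from Kafka (byte[] keys, String values)
KStream<byte[], String> textLines = builder.stream("textlines-topic",
    Consumed.with(Serdes.ByteArray(), Serdes.String()));
// Example: transforming data (values only; keys are unchanged)
KStream<byte[], String> upperCasedLines = textLines.mapValues(String::toUpperCase);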

Slide 27

Slide 27 text

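// Example: aggregating data
KTable<String, Long> wordCounts = textLines
    // split each line into lowercase words
    .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
    // re-key each record by the word itself
    .groupBy((key, word) -> word)
    // count occurrences per word; the result is a continuously updated table
    .count();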
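To see what this topology computes without a running cluster, here is a plain-Java batch approximation (not Kafka Streams itself; the `WordCountSketch` class is hypothetical) that applies the same lowercase, `split("\\W+")`, group-by-word and count steps to a fixed list of lines. Unlike the KTable, which updates as each record arrives, this computes the counts once. It also drops the empty token that `split` produces when a line starts with punctuation or whitespace, which the fragment above would otherwise count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Batch version of the word-count topology: split, re-key by word, count.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
            // Same tokenizing as the Kafka Streams example: lowercase, split on non-word chars.
            .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
            // Skip the empty token split() yields when a line starts with a non-word char.
            .filter(word -> !word.isEmpty())
            // Group by the word itself and count; TreeMap keeps the output sorted.
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("Hello, hello world!", "  Kafka streams")));
        // prints {hello=2, kafka=1, streams=1, world=1}
    }
}
```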

Slide 28

Slide 28 text

DEMO

Slide 29

Slide 29 text

KSQL is a Declarative Stream Processing Language

Slide 30

Slide 30 text

KSQL is the Streaming SQL Engine for Apache Kafka

Slide 31

Slide 31 text

Stream Processing by Analogy
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
On the Kafka Cluster: Connect API ≈ cat < in.txt; Stream Processing ≈ grep "ksql" | tr a-z A-Z; Connect API ≈ > out.txt
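The middle of that pipeline (filter, then transform) is the stream-processing step. As a rough in-memory analogy only (plain Java, no Kafka; the `PipelineAnalogy` class and input lines are made up), the same logic looks like this. A KSQL query would keep applying it to new records as they arrive, whereas this processes a finite list.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class PipelineAnalogy {
    // In-memory stand-in for: grep "ksql" | tr a-z A-Z
    static List<String> process(List<String> in) {
        return in.stream()
            .filter(line -> line.contains("ksql"))        // grep "ksql"
            .map(line -> line.toUpperCase(Locale.ROOT))   // tr a-z A-Z (same result for ASCII text)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> in = List.of("hello ksql", "hello kafka", "ksql rocks");
        System.out.println(process(in)); // prints [HELLO KSQL, KSQL ROCKS]
    }
}
```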

Slide 32

Slide 32 text

KSQL for Data Exploration
SELECT status, bytes FROM clickstream WHERE user_agent = 'Mozilla/5.0 (compatible; MSIE 6.0)';
An easy way to inspect data in a running cluster

Slide 33

Slide 33 text

KSQL for Streaming ETL
• Kafka is popular for data pipelines.
• KSQL enables easy transformations of data within the pipe.
• Transforming data while moving from Kafka to another system.
CREATE STREAM vip_actions AS
  SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id
  WHERE u.level = 'Platinum';
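Conceptually, this is a stream-table join: each click event is looked up against the users table's state at the moment the click is processed, then filtered on the user's level. A plain-Java sketch of that logic, with hypothetical `Click`, `User` and `VipAction` records and sample data (keys simplified to strings); it is not how KSQL executes the query internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class VipActionsSketch {
    record Click(String userid, String page, String action) {}
    record User(String userId, String level) {}
    record VipAction(String userid, String page, String action) {}

    // For each click, look up the user in the table's current state (LEFT JOIN),
    // then keep it only if that user is Platinum (WHERE u.level = 'Platinum').
    static List<VipAction> vipActions(List<Click> clicks, Map<String, User> users) {
        List<VipAction> out = new ArrayList<>();
        for (Click c : clicks) {
            User u = users.get(c.userid()); // null when there is no matching user row
            if (u != null && "Platinum".equals(u.level())) {
                out.add(new VipAction(c.userid(), c.page(), c.action()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, User> users = Map.of(
            "u1", new User("u1", "Platinum"),
            "u2", new User("u2", "Gold"));
        List<Click> clicks = List.of(
            new Click("u1", "/home", "view"),
            new Click("u2", "/cart", "add"),
            new Click("u3", "/home", "view"));
        System.out.println(vipActions(clicks, users));
        // prints [VipAction[userid=u1, page=/home, action=view]]
    }
}
```

Because the WHERE clause tests a column from the table side, clicks with no matching user (u.level is null) are filtered out, so in this query the LEFT JOIN ends up behaving like an inner join.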

Slide 34

Slide 34 text

KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;
Identifying patterns or anomalies in real-time data, surfaced in milliseconds
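A tumbling window of SIZE 5 SECONDS puts each event into exactly one fixed, non-overlapping 5-second bucket, whose start is the event timestamp rounded down to a multiple of the window size. A plain-Java sketch of the query's logic, assuming epoch-millisecond timestamps and hypothetical `Attempt` and `WindowKey` records; KSQL emits and updates results continuously, whereas this computes them once over a fixed list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PossibleFraudSketch {
    record Attempt(String cardNumber, long timestampMs) {}
    record WindowKey(String cardNumber, long windowStartMs) {}

    static final long WINDOW_SIZE_MS = 5_000; // WINDOW TUMBLING (SIZE 5 SECONDS)

    static Map<WindowKey, Long> possibleFraud(List<Attempt> attempts) {
        Map<WindowKey, Long> counts = new HashMap<>();
        for (Attempt a : attempts) {
            // Round the timestamp down to the start of its 5-second window.
            long windowStart = a.timestampMs() - Math.floorMod(a.timestampMs(), WINDOW_SIZE_MS);
            // GROUP BY card_number within each window, then count(*).
            counts.merge(new WindowKey(a.cardNumber(), windowStart), 1L, Long::sum);
        }
        // HAVING count(*) > 3
        counts.values().removeIf(c -> c <= 3);
        return counts;
    }

    public static void main(String[] args) {
        List<Attempt> attempts = List.of(
            new Attempt("1111", 1_000), new Attempt("1111", 2_000),
            new Attempt("1111", 3_000), new Attempt("1111", 4_000),   // 4 in window [0, 5000)
            new Attempt("2222", 1_000), new Attempt("2222", 2_000),
            new Attempt("2222", 3_000),                               // only 3
            new Attempt("3333", 3_000), new Attempt("3333", 4_000),
            new Attempt("3333", 6_000), new Attempt("3333", 7_000));  // 2 + 2 across a boundary
        System.out.println(possibleFraud(attempts));
        // prints {WindowKey[cardNumber=1111, windowStartMs=0]=4}
    }
}
```

Card 3333 also has four attempts within 5 seconds, but they straddle a window boundary, so no single tumbling window sees more than two; a hopping or sliding window would be needed to catch that pattern.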

Slide 35

Slide 35 text

KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• Sensor / IoT data
CREATE TABLE error_counts AS
  SELECT error_code, count(*)
  FROM monitoring_stream
  WINDOW TUMBLING (SIZE 1 MINUTE)
  WHERE type = 'ERROR'
  GROUP BY error_code;

Slide 36

Slide 36 text

KSQL for Data Transformation
CREATE STREAM views_by_userid WITH (PARTITIONS=6, VALUE_FORMAT='JSON', TIMESTAMP='view_time') AS
  SELECT * FROM clickstream PARTITION BY user_id;
Make simple derivations of existing topics from the command line
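PARTITION BY re-keys each record by user_id and writes it to a new topic, so every record for a given user lands in the same one of the six partitions, where it stays in order. A sketch of that key-to-partition idea, assuming a simple hash-modulo scheme in a hypothetical `PartitionBySketch` class; Kafka's default partitioner actually hashes the serialized key bytes with murmur2, so the partition numbers printed here will not match Kafka's.

```java
import java.util.List;

public class PartitionBySketch {
    static final int PARTITIONS = 6; // WITH (PARTITIONS=6)

    // Hypothetical partitioner: the same key always maps to the same partition in [0, PARTITIONS).
    static int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), PARTITIONS); // floorMod stays non-negative for negative hashes
    }

    public static void main(String[] args) {
        for (String userId : List.of("user_1", "user_2", "user_1")) {
            System.out.println(userId + " -> partition " + partitionFor(userId));
        }
    }
}
```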

Slide 37

Slide 37 text

Where is KSQL not such a great fit?
BI reports (Tableau etc.)
• No indexes
• No JDBC (most BI tools are not good with continuous results!)
Ad-hoc queries
• Limited span of time usually retained in Kafka
• No indexes

Slide 38

Slide 38 text

Creating a Stream
CREATE STREAM clickstream (
  time BIGINT, url VARCHAR, status INTEGER, bytes INTEGER, userid VARCHAR, agent VARCHAR)
  WITH (value_format = 'JSON', kafka_topic = 'my_clickstream_topic');

Slide 39

Slide 39 text

Creating a Table
CREATE TABLE users (
  user_id INTEGER, registered_at BIGINT, username VARCHAR, name VARCHAR, city VARCHAR, level VARCHAR)
  WITH (key = 'user_id', kafka_topic = 'clickstream_users', value_format = 'JSON');

Slide 40

Slide 40 text

Joins for Enrichment
CREATE STREAM vip_actions AS
  SELECT userid, fullname, url, status
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id
  WHERE u.level = 'Platinum';

Slide 41

Slide 41 text

Trade-Offs: Flexibility vs. Simplicity
• Consumer, Producer (most flexible): subscribe(), poll(), send(), flush()
• Kafka Streams: filter(), join(), aggregate()
• KSQL (simplest): SELECT…FROM…, JOIN…WHERE…, GROUP BY…

Slide 42

Slide 42 text

How to run KSQL
#1 STAND-ALONE, AKA 'LOCAL MODE': the KSQL CLI and a KSQL Server in one JVM, connected to the Kafka Cluster

Slide 43

Slide 43 text

How to run KSQL
#2 CLIENT-SERVER: the KSQL CLI talks to several KSQL Servers, each in its own JVM, connected to the Kafka Cluster

Slide 44

Slide 44 text

How to run KSQL
#3 AS A STANDALONE APPLICATION: several KSQL Servers, each in its own JVM, connected to the Kafka Cluster, with no CLI

Slide 45

Slide 45 text

Resources and Next Steps
https://github.com/confluentinc/cp-demo
http://confluent.io/ksql
https://slackpass.io/confluentcommunity (#ksql channel)

Slide 46

Slide 46 text

Remember, we want to build APPS not INFRASTRUCTURE

Slide 47

Slide 47 text

We are hiring! https://www.confluent.io/careers/

Slide 48

Slide 48 text

Thanks! Questions? @gamussa
We are hiring! https://www.confluent.io/careers/