Slide 1

Slide 1 text

Apache Kafka and Flink Stateful Streaming Data Pipelines made easy with SQL Francesco Tisiot - Developer Advocate @ftisiot

Slide 2

Slide 2 text

@ftisiot

Slide 3

Slide 3 text

@ftisiot

Slide 4

Slide 4 text

@ftisiot

Slide 5

Slide 5 text

@ftisiot

Slide 6

Slide 6 text

@ftisiot What is Apache Kafka?

Slide 7

Slide 7 text

@ftisiot Topic A Topic B 0 1 2 3 4 0 1 2 3 Producer Consumer Producer Consumer Consumer

Slide 8

Slide 8 text

@ftisiot Brokers Replication Factor 3 2

Slide 9

Slide 9 text

@ftisiot Integrating Apache Kafka

Slide 10

Slide 10 text

@ftisiot Kafka Connect Source Kafka Connect Sink

Slide 11

Slide 11 text

@ftisiot from kafka import KafkaProducer kafka-python producer = KafkaProducer( bootstrap_servers=['broker1:1234'] ) producer.send( 'my-topic-name', b'my-message' ) producer.flush()

Slide 12

Slide 12 text

@ftisiot { "id": 1, "shop": “Mario's Pizza", "name": "Arsenio Pisaroni-Boccaccio", "phoneNumber": "+39 51 0290746", "address": "Via Ugo 01, Montegrotto, 85639 Padova(PD)", "pizzas": [ { "pizzaName": "Margherita", "additionalToppings": ["ham"] }, { "pizzaName": "Diavola", "additionalToppings": ["mozzarella","banana","onion"] }] } https://github.com/aiven/kafka-python-fake-data-producer

Slide 13

Slide 13 text

@ftisiot

Slide 14

Slide 14 text

@ftisiot Database Apache Kafka

Slide 15

Slide 15 text

@ftisiot Table Log vs The Fridge Dilemma

Slide 16

Slide 16 text

@ftisiot Table Log 3 6 +7 +3 +6 -5 -2 +5 5

Slide 17

Slide 17 text

@ftisiot Create Stateful Apps

Slide 18

Slide 18 text

@ftisiot Kafka Streams Faust KSQL Apache Flink

Slide 19

Slide 19 text

@ftisiot

Slide 20

Slide 20 text

@ftisiot

Slide 21

Slide 21 text

@ftisiot SQL Table API DataStream API

Slide 22

Slide 22 text

@ftisiot Filter Join Aggregate Explode Detect Change Shape

Slide 23

Slide 23 text

@ftisiot

Slide 24

Slide 24 text

@ftisiot Connect Flink

Slide 25

Slide 25 text

@ftisiot { "id": 1, "shop": “Mario's Pizza", "name": "Arsenio Pisaroni-Boccaccio", "phoneNumber": "+39 51 0290746", "address": "Via Ugo 01, Montegrotto, 85639 Padova(PD)", "pizzas": [ { "pizzaName": "Margherita", "additionalToppings": ["ham"] }] } pizza_name base_price Marinara 4 Diavola 6 Mari & Monti 8 Salami 7 Peperoni 8 Margherita 5

Slide 26

Slide 26 text

@ftisiot CREATE TABLE pizza_orders ( id INT, shop VARCHAR, name VARCHAR, phoneNumber VARCHAR, address VARCHAR, pizzas ARRAY )> ) CREATE TABLE pizza_orders ( id INT, shop VARCHAR, name VARCHAR, phoneNumber VARCHAR, address VARCHAR, pizzas ARRAY )> ) WITH ( 'connector' = 'kafka', 'properties.bootstrap.servers' = ‘kafka:13041', 'topic' = 'pizza-orders', 'scan.startup.mode' = 'earliest-offset', … ); Kafka Source

Slide 27

Slide 27 text

@ftisiot CREATE TEMPORARY TABLE pizza_prices ( pizza_name VARCHAR, base_price INT, PRIMARY KEY (pizza_name) NOT ENFORCED ) CREATE TEMPORARY TABLE pizza_prices ( pizza_name VARCHAR, base_price INT, PRIMARY KEY (pizza_name) NOT ENFORCED ) WITH ( 'connector' = 'jdbc', 'url' = ‘jdbc:postgresql:/pghost:13039/db', 'username'='avnadmin', 'password'='verysecurepassword123', 'table-name' = 'pizza_price' ); Pg Source

Slide 28

Slide 28 text

@ftisiot CREATE TABLE order_price ( id INT, pizza_name VARCHAR, base_price INT, nr_pizzas BIGINT NOT NULL, PRIMARY KEY (id, pizza_name) NOT ENFORCED ) Pg Tgt CREATE TABLE order_price ( id INT, pizza_name VARCHAR, base_price INT, nr_pizzas BIGINT NOT NULL, PRIMARY KEY (id, pizza_name) NOT ENFORCED ) WITH ( 'connector' = 'jdbc', 'url' = ‘jdbc:postgresql://pghost:13039/db', 'username'='avnadmin', 'password'='verysecurepassword123', 'table-name' = 'order_price' );

Slide 29

Slide 29 text

@ftisiot Create Pipeline insert into order_price insert into order_price select id, b.pizzaName, base_price, count(*) nr_pizzas from pizza_orders cross join UNNEST(pizzas) b insert into order_price select id, b.pizzaName, base_price, count(*) nr_pizzas from pizza_orders cross join UNNEST(pizzas) b LEFT OUTER JOIN pizza_prices FOR SYSTEM_TIME AS OF orderProctime AS pp ON b.pizzaName = pp.pizza_name insert into order_price select id, b.pizzaName, base_price, count(*) nr_pizzas from pizza_orders cross join UNNEST(pizzas) b LEFT OUTER JOIN pizza_prices FOR SYSTEM_TIME AS OF orderProctime AS pp ON b.pizzaName = pp.pizza_name group by id, b.pizzaName, base_price;

Slide 30

Slide 30 text

@ftisiot

Slide 31

Slide 31 text

@ftisiot

Slide 32

Slide 32 text

@ftisiot Resources https://aiven.io http://flink.apache.org/ https://aiven.io/blog/create-your-own-data-stream-for- kafka-with-python-and-faker https://kafka.apache.org/ https://github.com/aiven/flink-sql-cli-docker