Slide 1

Slide 1 text

Build an Open Source Streaming Data Pipeline
Olena Kutsenko and Francesco Tisiot, Dev Advocates (@OlenaKutsenko, @ftisiot)

Slide 2

Slide 2 text

@OlenaKutsenko | @ftisiot | @aiven_io

Slide 3

Slide 3 text


Slide 4

Slide 4 text


Slide 5

Slide 5 text

What is Apache Kafka?

Slide 6

Slide 6 text


Slide 7

Slide 7 text

[Diagram: two topics, Topic A (entries 0 to 4) and Topic B (entries 0 to 3), written by producers and read by consumers]
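The diagram can be read as a log model: assuming the numbers under each topic are record offsets, a producer appends records to the end of the log, and each consumer reads from a position it tracks itself, without removing anything. The sketch below is a toy in-memory model of a single topic partition; `ToyTopic` and its methods are hypothetical names, not part of any Kafka client API, and real topics are split into partitions distributed across brokers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical toy model (not the Kafka API): one partition as an append-only log."""
    name: str
    records: list = field(default_factory=list)

    def append(self, value: bytes) -> int:
        """Producer side: append a record and return the offset it was written at."""
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset: int) -> list:
        """Consumer side: read every record from `offset` onwards; nothing is deleted."""
        return self.records[offset:]


topic_a = ToyTopic("Topic A")
for i in range(5):
    topic_a.append(f"event-{i}".encode())

# Two consumers keep independent positions in the same log.
consumer_1_offset = 0
consumer_2_offset = 3
print(len(topic_a.read_from(consumer_1_offset)))  # 5
print(topic_a.read_from(consumer_2_offset))  # [b'event-3', b'event-4']
```

Because reading does not consume the data, a new consumer can start from offset 0 and replay the whole log.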

Slide 8

Slide 8 text

[Diagram: brokers holding replicated copies of the data, shown with replication factor 3 and replication factor 2]
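With a replication factor of N, each partition is stored on N different brokers, so it can stay available if up to N-1 of those brokers fail. The sketch below places replicas with a simple round-robin rule; `assign_replicas` is a hypothetical helper, and Kafka's own placement logic differs in detail (for example, it can take broker racks into account).

```python
def assign_replicas(num_partitions: int, brokers: list, replication_factor: int) -> dict:
    """Place each partition's replicas on distinct brokers, round-robin.

    Simplified illustration only, not Kafka's actual assignment algorithm.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # In this sketch the first broker in each list plays the partition leader.
        assignment[p] = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
    return assignment


print(assign_replicas(3, ["broker-1", "broker-2", "broker-3"], 2))
# {0: ['broker-1', 'broker-2'], 1: ['broker-2', 'broker-3'], 2: ['broker-3', 'broker-1']}
```

Shifting the starting broker per partition spreads leaders across the cluster instead of putting them all on one broker.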

Slide 9

Slide 9 text

[Diagram: producers and consumers connected to the brokers]

Slide 10

Slide 10 text

Integrating Apache Kafka

Slide 11

Slide 11 text

[Diagram: Kafka Connect source connectors bring data into Apache Kafka; Kafka Connect sink connectors deliver it to other systems]
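Connectors are typically registered by posting a JSON document with a `name` and a `config` map to the Kafka Connect REST API (`POST /connectors`). The sketch below only builds that request with the standard library. The worker URL, connector name, topic and file path are placeholders, and `FileStreamSinkConnector` is the simple example sink from the Apache Kafka distribution, not a connector used in this talk.

```python
import json
import urllib.request


def build_connector_request(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build a POST /connectors request for a Kafka Connect worker (not sent here)."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder sink: copy records from a topic into a local file.
request = build_connector_request(
    "http://localhost:8083",
    "pizza-orders-to-file",
    {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "tasks.max": "1",
        "topics": "pizza-orders",
        "file": "/tmp/pizza-orders.txt",
    },
)
# urllib.request.urlopen(request) would submit it to a running Connect worker.
```

A source connector is registered the same way; only the `connector.class` and its settings change.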

Slide 12

Slide 12 text

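kafka-python

```python
from kafka import KafkaProducer

# Connect to the cluster (replace with your broker address)
producer = KafkaProducer(
    bootstrap_servers=['broker1:1234']
)

# send() is asynchronous: the message is buffered and sent in the background
producer.send(
    'my-topic-name',
    b'my-message'
)

# Block until all buffered messages have been delivered
producer.flush()
```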
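`send()` takes the message value as bytes by default, so structured events such as JSON orders need to be serialized first. kafka-python's `KafkaProducer` accepts a `value_serializer` callable for this. The sketch assumes JSON encoding, and `serialize_order` is a hypothetical helper name.

```python
import json


def serialize_order(order: dict) -> bytes:
    """Encode a dict as UTF-8 JSON bytes, the value type send() expects by default."""
    return json.dumps(order).encode("utf-8")


# With kafka-python, the serializer can be handed to the producer:
#   producer = KafkaProducer(bootstrap_servers=['broker1:1234'],
#                            value_serializer=serialize_order)
#   producer.send('my-topic-name', {"id": 1, "shop": "Mario's Pizza"})
```

Keeping serialization in one function means every producer in the pipeline writes the same format that downstream consumers expect.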

Slide 13

Slide 13 text

```json
{
  "id": 1,
  "shop": "Mario's Pizza",
  "name": "Arsenio Pisaroni-Boccaccio",
  "phoneNumber": "+39 51 0290746",
  "address": "Via Ugo 01, Montegrotto, 85639 Padova(PD)",
  "pizzas": [
    {"pizzaName": "Margherita", "additionalToppings": ["ham"]},
    {"pizzaName": "Diavola", "additionalToppings": ["mozzarella", "banana", "onion"]}
  ]
}
```

https://github.com/aiven/python-fake-data-producer-for-apache-kafka
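The sample message comes from a fake data producer. A minimal generator for messages of the same shape might look like the sketch below. It uses only the standard library `random` module with a fixed seed, whereas the linked repository uses the Faker library; `fake_order` is a hypothetical helper, and the customer details are placeholders.

```python
import random

# Illustrative values taken from the slides; customer details below are placeholders.
SHOPS = ["Mario's Pizza"]
PIZZAS = ["Marinara", "Diavola", "Mari & Monti", "Salami", "Peperoni", "Margherita"]
TOPPINGS = ["ham", "mozzarella", "banana", "onion"]


def fake_order(order_id: int, rng: random.Random) -> dict:
    """Build one order with the same fields as the sample message."""
    pizzas = []
    for _ in range(rng.randint(1, 3)):
        pizzas.append({
            "pizzaName": rng.choice(PIZZAS),
            # Zero to three distinct extra toppings per pizza.
            "additionalToppings": rng.sample(TOPPINGS, rng.randint(0, 3)),
        })
    return {
        "id": order_id,
        "shop": rng.choice(SHOPS),
        "name": f"Customer {order_id}",
        "phoneNumber": "+39 51 0000000",
        "address": "Via Example 1, Padova (PD)",
        "pizzas": pizzas,
    }


rng = random.Random(42)  # fixed seed so the sequence is reproducible
orders = [fake_order(i, rng) for i in range(1, 4)]
```

Each generated dict could then be passed to a producer configured with a JSON serializer.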

Slide 14

Slide 14 text

Compute State

Slide 15

Slide 15 text

Apache Flink

Slide 16

Slide 16 text


Slide 17

Slide 17 text

Filter, Join, Aggregate, Explode, Detect, Transform
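Several of the operations listed on this slide can be illustrated on a small, finite list of orders with Python generators. This is only an analogy: Flink applies such operators continuously to unbounded streams, distributed across workers and with managed state. Join and Detect are left out here, and the sample orders (including the second shop, "Luigi's Pizza") are made up.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical sample orders.
orders = [
    {"id": 1, "shop": "Mario's Pizza", "pizzas": [{"pizzaName": "Margherita"}, {"pizzaName": "Diavola"}]},
    {"id": 2, "shop": "Luigi's Pizza", "pizzas": [{"pizzaName": "Margherita"}]},
]


def filter_shop(stream: Iterable[dict], shop: str) -> Iterator[dict]:
    """Filter: keep only the orders for one shop."""
    return (o for o in stream if o["shop"] == shop)


def explode(stream: Iterable[dict]) -> Iterator[tuple]:
    """Explode: emit one (order id, pizza name) row per pizza in an order."""
    for o in stream:
        for p in o["pizzas"]:
            yield o["id"], p["pizzaName"]


def transform(stream: Iterable[tuple]) -> Iterator[tuple]:
    """Transform: normalise pizza names to upper case."""
    return ((oid, name.upper()) for oid, name in stream)


def aggregate(stream: Iterable[tuple]) -> Counter:
    """Aggregate: count pizzas by name."""
    return Counter(name for _, name in stream)


print(aggregate(transform(explode(filter_shop(orders, "Mario's Pizza")))))
```

Each stage consumes the previous one lazily, which loosely mirrors how records flow through a chain of streaming operators.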

Slide 18

Slide 18 text


Slide 19

Slide 19 text

Flink in Action

Slide 20

Slide 20 text

```json
{
  "id": 1,
  "shop": "Mario's Pizza",
  "name": "Arsenio Pisaroni-Boccaccio",
  "phoneNumber": "+39 51 0290746",
  "address": "Via Ugo 01, Montegrotto, 85639 Padova(PD)",
  "pizzas": [
    {"pizzaName": "Margherita", "additionalToppings": ["ham"]}
  ]
}
```

| pizza_name   | base_price |
|--------------|------------|
| Marinara     | 4          |
| Diavola      | 6          |
| Mari & Monti | 8          |
| Salami       | 7          |
| Peperoni     | 8          |
| Margherita   | 5          |

Slide 21

Slide 21 text

Kafka Source

```sql
CREATE TABLE pizza_orders (
    id INT,
    shop VARCHAR,
    name VARCHAR,
    phoneNumber VARCHAR,
    address VARCHAR,
    pizzas ARRAY<ROW(pizzaName VARCHAR, additionalToppings ARRAY<VARCHAR>)>,
    -- assumed: processing-time attribute referenced by the lookup join in the pipeline query
    orderProctime AS PROCTIME()
) WITH (
    'connector' = 'kafka',
    'properties.bootstrap.servers' = 'kafka:13041',
    'topic' = 'pizza-orders',
    'scan.startup.mode' = 'earliest-offset',
    -- assumed: messages are JSON-encoded, like the sample order
    'value.format' = 'json'
);
```
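With a JSON value format, the Kafka source maps each message's JSON fields onto the table columns by name. The sketch below mimics that mapping in plain Python for the `pizza_orders` schema. `PizzaOrderRow` and `to_row` are hypothetical, and returning `None` for missing fields is meant to resemble, not reproduce, how Flink's JSON format treats absent fields by default.

```python
import json
from typing import List, NamedTuple, Optional, Tuple


class PizzaOrderRow(NamedTuple):
    """Hypothetical Python mirror of the pizza_orders columns."""
    id: Optional[int]
    shop: Optional[str]
    name: Optional[str]
    phoneNumber: Optional[str]
    address: Optional[str]
    # ARRAY<ROW(pizzaName, additionalToppings)> as a list of tuples
    pizzas: Optional[List[Tuple[Optional[str], Optional[List[str]]]]]


def to_row(raw: bytes) -> PizzaOrderRow:
    """Decode a JSON message and match its fields to columns by name."""
    doc = json.loads(raw)
    pizzas = doc.get("pizzas")
    return PizzaOrderRow(
        id=doc.get("id"),
        shop=doc.get("shop"),
        name=doc.get("name"),
        phoneNumber=doc.get("phoneNumber"),
        address=doc.get("address"),
        # Missing fields become None, loosely like SQL NULL.
        pizzas=None if pizzas is None else [
            (p.get("pizzaName"), p.get("additionalToppings")) for p in pizzas
        ],
    )
```

Matching by name is why the JSON keys (such as `pizzaName`) must be spelled exactly as the column and ROW field names in the DDL.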

Slide 22

Slide 22 text

Pg Source

```sql
CREATE TEMPORARY TABLE pizza_prices (
    pizza_name VARCHAR,
    base_price INT,
    PRIMARY KEY (pizza_name) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:postgresql://pghost:13039/db',
    'username' = 'avnadmin',
    'password' = 'verysecurepassword123',
    'table-name' = 'pizza_price'
);
```
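In the pipeline, `pizza_prices` acts as a lookup table: rather than loading it once, the join can query it by `pizza_name` for each incoming pizza. The sketch below imitates such a keyed lookup, using an in-memory `sqlite3` database as a stand-in for PostgreSQL, filled with the prices from the "Flink in Action" slide. `lookup_price` is a hypothetical helper.

```python
import sqlite3
from typing import Optional

# In-memory SQLite database standing in for the PostgreSQL pizza_price table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pizza_price (pizza_name TEXT PRIMARY KEY, base_price INTEGER)")
conn.executemany(
    "INSERT INTO pizza_price VALUES (?, ?)",
    [("Marinara", 4), ("Diavola", 6), ("Mari & Monti", 8),
     ("Salami", 7), ("Peperoni", 8), ("Margherita", 5)],
)


def lookup_price(pizza_name: str) -> Optional[int]:
    """Fetch one price by key; None plays the role of NULL for unknown pizzas."""
    row = conn.execute(
        "SELECT base_price FROM pizza_price WHERE pizza_name = ?", (pizza_name,)
    ).fetchone()
    return None if row is None else row[0]
```

Because each lookup reads the current table, a price change in the database affects only orders processed after the change.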

Slide 23

Slide 23 text

Pg Target

```sql
CREATE TABLE order_price (
    id INT,
    total_price BIGINT,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:postgresql://pghost:13039/db',
    'username' = 'avnadmin',
    'password' = 'verysecurepassword123',
    'table-name' = 'order_price'
);
```
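Declaring `PRIMARY KEY (id)` on the JDBC sink makes Flink write in upsert mode: when the aggregated total for an `id` changes, the existing row is updated instead of a new row being appended. `NOT ENFORCED` means Flink does not check uniqueness itself. The sketch models the effect with a dict; `apply_upserts` and the update sequence are hypothetical, assuming a running sum that is emitted again as each pizza of an order is processed.

```python
from typing import Dict, Iterable, Tuple


def apply_upserts(changes: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Keep only the latest total_price per id, as an upsert on the primary key would."""
    table: Dict[int, int] = {}
    for order_id, total_price in changes:
        table[order_id] = total_price  # insert if new, overwrite if the key exists
    return table


# Hypothetical update stream: order 1's total is revised from 5 to 11.
print(apply_upserts([(1, 5), (2, 7), (1, 11)]))  # {1: 11, 2: 7}
```

Without a primary key, the same stream would leave two rows for order 1 in the target table.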

Slide 24

Slide 24 text

Create Pipeline

```sql
insert into order_price
select id, sum(base_price) total_price
from pizza_orders
-- one row per pizza in the order
cross join UNNEST(pizzas) b
-- look up each pizza's price in PostgreSQL at processing time
LEFT OUTER JOIN pizza_prices FOR SYSTEM_TIME AS OF orderProctime AS pp
    ON b.pizzaName = pp.pizza_name
group by id;
```
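The pipeline's logic can be replayed on a finite list of orders in plain Python to check the expected totals: `cross join UNNEST` yields one row per pizza, the `LEFT OUTER JOIN` attaches a price or NULL, and `sum` per `id` ignores NULLs. This is a batch sketch, not how Flink executes the query; `order_prices` is a hypothetical name, and additional toppings are not priced, matching the query. For the two-pizza sample order (Margherita and Diavola), the total is 5 + 6 = 11.

```python
from typing import Dict, Iterable, Optional

# Price table from the "Flink in Action" slide.
PRICES = {"Marinara": 4, "Diavola": 6, "Mari & Monti": 8,
          "Salami": 7, "Peperoni": 8, "Margherita": 5}


def order_prices(orders: Iterable[dict], prices: Dict[str, int]) -> Dict[int, Optional[int]]:
    """Batch equivalent of the query: unnest, left-join on price, sum per id."""
    totals: Dict[int, Optional[int]] = {}
    for order in orders:
        # CROSS JOIN UNNEST: an order with no pizzas produces no rows at all.
        for pizza in order["pizzas"]:
            totals.setdefault(order["id"], None)
            # LEFT OUTER JOIN: unknown pizzas get None (NULL) instead of being dropped.
            price = prices.get(pizza["pizzaName"])
            # SUM ignores NULLs; it stays None only if every price is NULL.
            if price is not None:
                current = totals[order["id"]]
                totals[order["id"]] = price if current is None else current + price
    return totals


orders = [
    {"id": 1, "pizzas": [{"pizzaName": "Margherita"}, {"pizzaName": "Diavola"}]},
    {"id": 2, "pizzas": [{"pizzaName": "Hawaiian"}]},  # hypothetical unknown pizza
    {"id": 3, "pizzas": []},
]
print(order_prices(orders, PRICES))  # {1: 11, 2: None}
```

The `LEFT OUTER JOIN` keeps orders with unpriced pizzas visible (as a NULL total) rather than silently dropping them, which makes missing price entries easy to spot.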

Slide 25

Slide 25 text


Slide 26

Slide 26 text

References
- https://aiven.io
- http://flink.apache.org/
- https://aiven.io/blog/create-your-own-data-stream-for-kafka-with-python-and-faker
- https://kafka.apache.org/
- https://github.com/aiven/sql-cli-for-apache-flink-docker