Slide 1

Rapid Deployment with Apache Flink and Ververica Platform
Alexey Novakov, Ververica
Alibaba Cloud Developer Summit 2023

Slide 2

Alexey Novakov
Product Solution Architect @ Ververica, Germany
- Last 6 years working in data management
- 17 years in Software Development
- Distributed Systems, Big/Small/Fast Data
- Astronomy, Music

Slide 3

Contents
01 Getting started with Apache Flink and VVP
02 Flink Application Lifecycle
03 Flink SQL Application Lifecycle
04 Summary

Slide 4

01 Getting started with Apache Flink and Ververica Platform

Slide 5

About Ververica
- Original Creators of Apache Flink®
- Enterprise Stream Processing with Ververica Platform
- Subsidiary of Alibaba Group

Slide 6

Apache Flink
Features:
- High availability, incremental checkpointing
- Sophisticated late data handling
- Low latency, high throughput
- Scala, Java, SQL and Python APIs
- … and much more

Slide 7

Some Apache Flink Users

Slide 8

What is Ververica Platform (VVP)?
VVP is purpose-built for stateful stream processing architectures and makes operating these powerful systems easier than ever before by offering an entirely new experience for developing, deploying, and managing stream processing applications. Ververica's mission is to ensure that developers invest their time in their core business objectives, not in maintenance and infrastructure.

Slide 9

Ververica Platform

Slide 10

VVP Components

Slide 11

Ververica Installation

Slide 12

VVP Prerequisites
Bring-your-own Kubernetes:
- From cloud providers:
  - AWS EKS
  - Azure AKS
  - Google GKE
  - Alibaba Cloud Kubernetes
- On-prem cluster, OpenShift
- Local development: minikube, k3s

Slide 13

VVP Helm Package
$ helm repo add ververica \
    https://charts.ververica.com
$ helm -n vvp \
    install vvp ververica/ververica-platform \
    --values values-vvp.yaml
*see more: https://docs.ververica.com/getting_started/installation.html#setting-up-the-playground

Slide 14

VVP Control Plane
$ kubectl get pod -n vvp -l app=vvp-ververica-platform
NAME                                      READY   STATUS    RESTARTS   AGE
vvp-ververica-platform-75c54fcd6d-95wgh   3/3     Running   0          1m
Now it is ready to run Flink applications.

Slide 15

VVP UI: Deployments
$ kubectl -n vvp port-forward services/vvp-ververica-platform 8080:80

Slide 16

02 Flink Application Lifecycle in VVP

Slide 17

Step 0: Build JAR file

case class Transaction(
    accountId: Long,
    timestamp: Long,
    amount: Double
)

@main def FraudDetectionJob =
  val env = StreamExecutionEnvironment.getExecutionEnvironment

  val transactions = env
    .addSource(TransactionsSource.iterator)
    .name("transactions")

  val alerts = transactions
    .keyBy(_.accountId)
    .process(FraudDetector()) // fraud detection logic
    .name("fraud-detector")

  alerts
    .addSink(AlertSink()) // print to console
    .name("send-alerts")

  env.execute("Fraud Detection")

full source code: https://github.com/novakov-alexey/flink-sandbox

Slide 18

Step 1: Upload JAR file

Slide 19

Step 2: Create New Deployment
1. Flink Session Cluster
2. Kubernetes Namespace

Slide 20

Step 2: Create New Deployment - Option 2: YAML
Can also be submitted via the REST API
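
As an illustration of Option 2, a minimal sketch of what such a Deployment spec could look like for the fraud-detection JAR from Step 0. The overall shape follows the VVP Deployment resource, but the concrete values (jarUri, entryClass, parallelism) are assumptions for illustration only; check the VVP documentation for the exact schema.

kind: Deployment
apiVersion: v1
metadata:
  name: fraud-detection
spec:
  state: RUNNING              # desired state of the job
  upgradeStrategy:
    kind: STATELESS           # how spec changes are applied to a running job
  template:
    spec:
      artifact:
        kind: JAR
        jarUri: s3://vvp/artifacts/namespaces/default/fraud-detection.jar  # hypothetical artifact location
        entryClass: FraudDetectionJob                                      # hypothetical main class
      parallelism: 1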

Slide 21

Step 3: Start Deployment

Slide 22

Step 4: Manage Deployment
- Save state for this or another Deployment
- Create a new Deployment with the same configuration
- Monitor the running Job

Slide 23

Familiar Tools for Monitoring
pod logs

Slide 24

03 Flink SQL Application Lifecycle in VVP

Slide 25

SQL Editor
- Rapid creation of Tables, Views, Functions
- Embedded Data Catalog with Schema Explorer
- SQL Validation
- Saving as Script

Slide 26

Interactive SQL Workflow

Slide 27

SQL App Example - Tables
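
The source table definitions appear on this slide as a screenshot. As a minimal sketch, the orders and shipments tables could be declared as below, matching the columns used by the query on the next slide; the Kafka connector, topic names, and CSV format are assumptions for illustration.

CREATE TABLE orders (          -- joined as "o" in the query
  id INT,
  order_time TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = '...',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'csv'
);

CREATE TABLE shipments (       -- joined as "s" in the query
  order_id INT,
  shipment_time TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'shipments',
  'properties.bootstrap.servers' = '...',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'csv'
);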

Slide 28

SQL App Example - Query

SELECT
  o.id AS order_id,
  o.order_time,
  s.shipment_time,
  TIMESTAMPDIFF(DAY, o.order_time, s.shipment_time) AS day_diff
FROM orders o
JOIN shipments s ON o.id = s.order_id
WHERE o.order_time BETWEEN s.shipment_time - INTERVAL '3' DAY AND s.shipment_time;

Slide 29

SQL App Example - Start/Choose Cluster

Slide 30

SQL App Example - Run Query
Real-time result view

Slide 31

Working with Scripts

Slide 32

VVP New App Workflow
Create SQL Script / Upload Flink JAR → Create Deployment → Start Deployment → Monitor → Reconfigure, if needed

Slide 33

Deploy SQL job as Deployment

Slide 34

Deployment from Script

Sink table:

CREATE TABLE order_shipments (
  order_id INT,
  order_time TIMESTAMP,
  shipment_time TIMESTAMP,
  day_diff INT,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'key.format' = 'csv',
  'properties.bootstrap.servers' = '...',
  'topic' = 'order_shipments',
  'value.format' = 'csv'
);

INSERT INTO order_shipments
SELECT
  o.id AS order_id,
  o.order_time,
  s.shipment_time,
  TIMESTAMPDIFF(DAY, o.order_time, s.shipment_time) AS day_diff
FROM orders o
JOIN shipments s ON o.id = s.order_id
WHERE o.order_time BETWEEN s.shipment_time - INTERVAL '3' DAY AND s.shipment_time;

Slide 35

Run Deployment

Slide 36

VVP Benefits
● Stability: well-tested, pre-configured Flink runtime
● Flexibility: deployable to your own K8s cluster
● Fast Dev Loop: use the SQL editor, use different image versions, etc.
● Easier Operations: practical UI to manage all possible Flink settings
● …

Slide 37

Thank you. Questions?
[email protected]
www.ververica.com
@VervericaData