Rapid Deployment with Apache Flink and Ververica Platform

As a developer, deploying a locally tested Apache Flink job to production can take a while, especially when you need to work with a DevOps team to provide a robust environment with a horizontally scalable Flink cluster. A Flink cluster must also provide the tooling to monitor and debug job issues easily. When deployment of each new streaming application is managed by someone other than the developers, it slows down the entire business, which wants to leverage streaming data and quickly enable new business use cases.

At Ververica, we take this issue seriously and have developed Ververica Platform (VVP), which runs Apache Flink with additional unique features. Creating a job deployment via the VVP UI has never been easier, and Flink SQL developers can create application jobs directly from the VVP SQL Editor. VVP can be installed on your Kubernetes cluster in minutes.

Alexey Novakov

January 10, 2023

Transcript

  1. Rapid Deployment
    with Apache Flink
    and Ververica
    Platform
    Alexey Novakov, Ververica
    Alibaba Cloud Developer Summit 2023

  2. Alexey Novakov
    Product Solution Architect
    @ Ververica, Germany
    - Last 6 years working in data management
    - 17 years in Software Development
    - Distributed Systems, Big/Small/Fast Data
    - Astronomy, Music

  3. Contents
    01 Getting started with Apache Flink and VVP
    02 Flink Application Lifecycle
    03 Flink SQL Application Lifecycle
    04 Summary

  4. 01
    Getting started with
    Apache Flink and
    Ververica Platform

  5. About Ververica
    - Original Creators of Apache Flink®
    - Enterprise Stream Processing with Ververica Platform
    - Subsidiary of Alibaba Group

  6. Apache Flink
    Features:
    - High availability, incremental checkpointing
    - Sophisticated late-data handling
    - Low latency, high throughput
    - Scala, Java, SQL and Python APIs
    - … and much more

  7. Some Apache Flink Users

  8. What is Ververica Platform (VVP)?
    VVP is purpose-built for stateful stream processing architectures and
    makes operating these powerful systems easier than ever, offering an
    entirely new experience for developing, deploying, and managing stream
    processing applications. Ververica's mission is to ensure that
    developers invest their time in their core business objectives, not in
    maintenance and infrastructure.

  9. Ververica Platform

  10. VVP Components

  11. Ververica Installation

  12. VVP Prerequisites
    Bring-your-own Kubernetes:
    - From cloud providers:
      - AWS EKS
      - Azure AKS
      - Google GKE
      - Alibaba Cloud Kubernetes
    - On-prem cluster, OpenShift
    - Local development: minikube, k3s

  13. VVP Helm Package
    $ helm repo add ververica \
        https://charts.ververica.com
    $ helm --namespace vvp \
        install vvp ververica/ververica-platform \
        --values values-vvp.yaml
    *see more: https://docs.ververica.com/getting_started/installation.html#setting-up-the-playground

  14. VVP Control Plane
    $ kubectl get pod -n vvp -l app=vvp-ververica-platform
    NAME                                      READY   STATUS    RESTARTS   AGE
    vvp-ververica-platform-75c54fcd6d-95wgh   3/3     Running   0          1m
    VVP is now ready to run Flink applications.

  15. VVP UI: Deployments
    $ kubectl -n vvp port-forward services/vvp-ververica-platform 8080:80

  16. 02
    Flink Application Lifecycle in VVP

  17. Step 0: Build JAR file
    case class Transaction(
      accountId: Long,
      timestamp: Long,
      amount: Double
    )

    @main def FraudDetectionJob =
      val env = StreamExecutionEnvironment.getExecutionEnvironment
      val transactions = env
        .addSource(TransactionsSource.iterator)
        .name("transactions")
      val alerts = transactions
        .keyBy(_.accountId)
        .process(FraudDetector()) // fraud detection logic
        .name("fraud-detector")
      alerts
        .addSink(AlertSink())     // print to console
        .name("send-alerts")
      env.execute("Fraud Detection")
    full source code: https://github.com/novakov-alexey/flink-sandbox
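    The `FraudDetector()` itself is not shown on the slide. As a rough illustration of the rule such a detector typically implements (the small-then-large heuristic from the official Flink fraud-detection walkthrough; the thresholds and all names below are illustrative assumptions, not taken from the linked repository), the per-account logic can be sketched in plain Scala without any Flink dependency:

    ```scala
    // Hypothetical, Flink-free sketch of a per-account fraud rule:
    // alert when a large transaction immediately follows a small one.
    // Thresholds and names are illustrative assumptions.
    final case class Txn(accountId: Long, timestamp: Long, amount: Double)

    object FraudRule:
      val SmallAmount = 1.00
      val LargeAmount = 500.00

      // Returns timestamps of transactions that should raise an alert,
      // given a single account's transaction history.
      def detect(txns: Seq[Txn]): Seq[Long] =
        txns
          .sortBy(_.timestamp)
          .sliding(2)
          .collect {
            case Seq(prev, curr)
                if prev.amount < SmallAmount && curr.amount > LargeAmount =>
              curr.timestamp
          }
          .toSeq
    ```

    In the real job this logic runs inside a keyed process function, so the "previous transaction" lives in Flink keyed state per `accountId` and survives failures via checkpoints.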

  18. Step 1: Upload JAR file

  19. Step 2: Create New Deployment
    1. Flink Session Cluster
    2. Kubernetes Namespace

  20. Step 2: Create New Deployment - Option 2 YAML
    It can also be submitted via the REST API.
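    A minimal example of the kind of spec the YAML editor accepts, sketched from memory of the VVP Deployment resource: field names and all values here are illustrative, so consult the VVP documentation for the authoritative schema.

    ```yaml
    # Illustrative VVP Deployment spec (not an exact schema reference).
    kind: Deployment
    apiVersion: v1
    metadata:
      name: fraud-detection            # hypothetical deployment name
    spec:
      state: RUNNING
      template:
        spec:
          artifact:
            kind: JAR
            jarUri: https://example.com/artifacts/fraud-detection.jar  # hypothetical location
          parallelism: 1
    ```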

  21. Step 3: Start Deployment

  22. Step 4: Manage Deployment
    - Save state for this or another Deployment
    - Create new Deployment with the same configuration
    - Monitor running Job

  23. Familiar Tools for Monitoring
    pod logs

  24. 03
    Flink SQL Application Lifecycle in VVP

  25. SQL Editor
    - Rapid creation of Tables, Views, Functions
    - Embedded Data Catalog with Schema Explorer
    - SQL Validation
    - Saving as Script

  26. Interactive SQL Workflow

  27. SQL App Example - Tables

  28. SQL App Example - Query
    SELECT
      o.id AS order_id,
      o.order_time,
      s.shipment_time,
      TIMESTAMPDIFF(DAY, o.order_time, s.shipment_time) AS day_diff
    FROM orders o
    JOIN shipments s ON o.id = s.order_id
    WHERE o.order_time
      BETWEEN s.shipment_time - INTERVAL '3' DAY AND s.shipment_time;
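    The query above is an interval join: an order matches a shipment with the same order id when the order was placed at most three days before the shipment. The same predicate can be sketched in plain Scala (no Flink dependency; `Order`, `Shipment`, and the helper below are illustrative names mirroring the tables in the query):

    ```scala
    import java.time.{Duration, Instant}

    // Illustrative, Flink-free sketch of the interval-join predicate:
    // keep (order, shipment) pairs where the order was placed within
    // the 3 days before the shipment, and compute the day difference.
    final case class Order(id: Int, orderTime: Instant)
    final case class Shipment(orderId: Int, shipmentTime: Instant)

    def joinWithin3Days(
        orders: Seq[Order],
        shipments: Seq[Shipment]
    ): Seq[(Int, Long)] =
      for
        o <- orders
        s <- shipments
        if s.orderId == o.id
        if !o.orderTime.isBefore(s.shipmentTime.minus(Duration.ofDays(3))) &&
          !o.orderTime.isAfter(s.shipmentTime)
      yield (o.id, Duration.between(o.orderTime, s.shipmentTime).toDays)
    ```

    In Flink SQL, this bounded time condition is what lets the planner expire each side's state after the interval passes, keeping the join feasible on unbounded streams.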

  29. SQL App Example - Start/Choose Cluster

  30. SQL App Example - Run Query
    Real-time
    result view

  31. Working with Scripts

  32. VVP New App Workflow
    1. Create SQL Script / Upload Flink JAR
    2. Create Deployment
    3. Start Deployment
    4. Monitor
    5. Reconfigure, if needed

  33. Deploy SQL job as Deployment

  34. Deployment from Script
    Sink table:
    CREATE TABLE order_shipments (
      order_id INT,
      order_time TIMESTAMP,
      shipment_time TIMESTAMP,
      day_diff INT,
      PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
      'connector' = 'upsert-kafka',
      'key.format' = 'csv',
      'properties.bootstrap.servers' = '...',
      'topic' = 'order_shipments',
      'value.format' = 'csv'
    );

    INSERT INTO order_shipments
    SELECT
      o.id AS order_id,
      o.order_time,
      s.shipment_time,
      TIMESTAMPDIFF(DAY, o.order_time, s.shipment_time) AS day_diff
    FROM orders o
    JOIN shipments s ON o.id = s.order_id
    WHERE o.order_time
      BETWEEN s.shipment_time - INTERVAL '3' DAY AND s.shipment_time;

  35. Run Deployment

  36. VVP Benefits
    ● Stability: well-tested, pre-configured Flink runtime
    ● Flexibility: deployable to your own K8s cluster
    ● Fast Dev Loop: use the SQL editor, use different image versions, etc.
    ● Easier Operations: practical UI to manage all possible Flink settings
    ● …

  37. Thank you.
    Questions?
    [email protected]
    www.ververica.com
    @VervericaData
