Slide 1

Rapid Deployment with Apache Flink and Ververica Platform
Alexey Novakov
Frankfurt Meetup, Frankfurt, May 2023

Slide 2

Alexey Novakov
Product Solution Architect @ Ververica, Germany
- Last 6 years working in data management
- 17 years in software development
- Distributed systems, Big/Small/Fast Data
- Astronomy, music

Slide 3

Contents
01 Getting started with Apache Flink and VVP
02 Flink Application Lifecycle
03 VVP Integrations
04 Summary

Slide 4

01 Getting started with Apache Flink and Ververica

Slide 5

About Ververica
- Original creators of Apache Flink®
- Enterprise stream processing with Ververica Platform
- Subsidiary of Alibaba Group

Slide 6

Apache Flink
Features:
- High availability, incremental checkpointing
- Sophisticated late data handling
- Low latency, high throughput
- Scala, Java, SQL and Python APIs
- … and much more

Slide 7

Some Apache Flink Users

Slide 8

What is Ververica Platform (VVP)? VVP is purpose-built for stateful stream processing architectures and makes operating these powerful systems easier than ever before by offering an entirely new experience for developing, deploying, and managing stream processing applications. Ververica’s mission is to ensure that developers invest their time in their core business objectives, not in maintenance and infrastructure.

Slide 9

Ververica Platform

Slide 10

VVP Components

Slide 11

Ververica Installation

Slide 12

VVP Prerequisites
Bring your own Kubernetes:
- From cloud providers: AWS EKS, Azure AKS, Google GKE
- On-prem cluster, OpenShift
- Local development: minikube, k3s

Slide 13

VVP Helm Package
$ helm repo add ververica https://charts.ververica.com
$ helm --namespace vvp \
    install vvp ververica/ververica-platform \
    --values values-vvp.yaml
*see more: https://docs.ververica.com/getting_started/installation.html#setting-up-the-playground
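For reference, a minimal sketch of what values-vvp.yaml might contain, reusing the blob storage settings shown later in this deck; the exact keys depend on your chart version, so treat this as an assumption and verify it against the chart's default values:

# values-vvp.yaml -- minimal sketch (assumed keys; check the chart defaults)
vvp:
  blobStorage:
    baseUri: s3://vvp                        # universal blob storage location
    s3:
      endpoint: http://minio.vvp.svc:9000    # in-cluster MinIO, as in the playground setup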

Slide 14

VVP Control Plane
$ kubectl get pod -n vvp -l app=vvp-ververica-platform
NAME                                      READY   STATUS    RESTARTS   AGE
vvp-ververica-platform-75c54fcd6d-95wgh   3/3     Running   0          1m
Now it is ready to run Flink applications.

Slide 15

VVP UI: Deployments
$ kubectl -n vvp port-forward services/vvp-ververica-platform 8080:80

Slide 16

02 Flink Application Lifecycle in VVP

Slide 17

User Workflow in VVP
New App Deployment → Upload Flink JAR / Python Script / Create SQL Script → Create Deployment → Start Deployment → Monitor → Reconfigure, if needed

Slide 18

Step 0: Build JAR file

case class Transaction(
  accountId: Long,
  timestamp: Long,
  amount: Double
)

@main def FraudDetectionJob =
  val env = StreamExecutionEnvironment.getExecutionEnvironment

  val transactions = env
    .addSource(TransactionsSource.iterator)
    .name("transactions")

  val alerts = transactions
    .keyBy(_.accountId)
    .process(FraudDetector()) // some fraud detection logic
    .name("fraud-detector")

  alerts
    .addSink(AlertSink())     // print to console
    .name("send-alerts")

  env.execute("Fraud Detection")

Full source code: https://github.com/novakov-alexey/flink-sandbox

Slide 19

Step 1: Upload JAR file

Slide 20

Step 2: Create New Deployment
1. Flink Session Cluster
2. Kubernetes Namespace

Slide 21

Step 2: Create New Deployment - Option 2: YAML
The YAML spec can also be submitted via the REST API (a K8s CRD is coming soon); see the sketch below.
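As an illustration, here is a minimal sketch of what such a Deployment spec might look like. The nesting under spec.template.spec follows the flinkConfiguration example later in this deck, but the resource name, deployment target name, jarUri, and some field names are assumptions, so verify them against the VVP documentation:

# vvp-resources/deployment.yaml -- minimal sketch (hypothetical names and URIs)
apiVersion: v1
kind: Deployment
metadata:
  name: fraud-detection
spec:
  state: RUNNING                     # desired state: VVP starts the Flink job once created
  deploymentTargetName: vvp-jobs     # must reference an existing Deployment Target
  template:
    spec:
      artifact:
        kind: JAR
        jarUri: s3://vvp/artifacts/namespaces/default/fraud-detection.jar
      parallelism: 1
      flinkConfiguration:
        state.backend: filesystem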

Slide 22

Step 2: Create New Deployment - Option 3: REST API
POST vvp-resources/deployment_target.yaml to the VVP REST API to create the Deployment Target:

$ curl localhost:8080/api/v1/namespaces/default/deployment-targets \
    -X POST \
    -H "Content-Type: application/yaml" \
    -H "Accept: application/yaml" \
    --data-binary @vvp-resources/deployment_target.yaml

Afterwards, you can POST vvp-resources/deployment.yaml to the REST API to create the Deployment:

$ curl localhost:8080/api/v1/namespaces/default/deployments \
    -X POST \
    -H "Content-Type: application/yaml" \
    -H "Accept: application/yaml" \
    --data-binary @vvp-resources/deployment.yaml
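For completeness, a minimal sketch of what vvp-resources/deployment_target.yaml might contain; the target and namespace names are hypothetical placeholders and should match whatever the Deployment spec references:

# vvp-resources/deployment_target.yaml -- minimal sketch (hypothetical names)
apiVersion: v1
kind: DeploymentTarget
metadata:
  name: vvp-jobs
spec:
  kubernetes:
    namespace: vvp-jobs    # K8s namespace where the Flink JobManager/TaskManager pods will run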

Slide 23

Step 3: Start Deployment

Slide 24

Step 4: Manage Deployment
- Monitor the running job
- Save state for another Deployment
- Create a new Deployment with the same configuration

Slide 25

Familiar Tools for Monitoring kubectl logs

Slide 26

All Pods in VVP Namespace
Optional Pods

Slide 27

03 VVP Integrations

Slide 28

VVP Universal Blob Storage
• MinIO is one of the options for Universal Blob Storage
• MinIO is useful during the development phase
• Blob Storage is used for:
  - Code artifacts
  - Flink checkpoints & savepoints
https://docs.ververica.com/getting_started/installation.html#setting-up-the-playground

Slide 29

Supported Storage Services
- AWS S3: s3://
- Microsoft ABS: wasbs://
- Microsoft ABS Workload Identity: wiaz://
- Apache Hadoop® HDFS: hdfs://
- Google GCS: gs://
- Alibaba OSS: oss://

values.yaml:
### Configure MinIO for Universal Blob Storage
vvp:
  blobStorage:
    baseUri: s3://vvp
    s3:
      endpoint: http://minio.vvp.svc:9000

Slide 30

Access Control
Authentication
• OpenID Connect
• SAML
Authorization
• Roles: viewer, editor, owner, admin

Resource            | Viewer            | Editor    | Owner       | Admin
Artifacts           | List, GetMetadata | All       | All         | None
ApiToken            | None              | None      | All         | None
DeploymentDefaults  | Get               | Get       | All         | None
DeploymentTarget    | List, Get         | List, Get | All         | None
Apache Flink® UI    | Get*              | All       | All         | None
Namespace           | None              | None      | Get, Update | All
SecretValue         | None              | All       | All         | None
All Others          | List, Get         | All       | All         | None

Slide 31

Metrics
Ververica Platform bundles metrics reporters for:
• Prometheus
• InfluxDB
• Datadog
• Prometheus Pushgateway
• Graphite
• StatsD
• SLF4J

spec:
  template:
    spec:
      flinkConfiguration:
        metrics.reporters: prometheus
        metrics.reporter.prometheus.class: org.apache.flink.metrics.prometheus.PrometheusReporter
        metrics.reporter.prometheus.port: 9249

Slide 32

VVP Benefits
● Stability: well-tested, pre-configured Flink runtime
● Flexibility: deployable to your own K8s cluster
● Integration: the most popular third-party systems are integrated (S3, AD, OAuth2, etc.)
● Easier Operations: practical UI and REST API; K8s Operator and CRD coming soon

Slide 33

To Learn More
Ververica Blog: https://www.ververica.com/blog
VVP Documentation: https://docs.ververica.com/index.html
Knowledge Base: https://ververica.zendesk.com/hc/en-us

Slide 34

Ververica Cloud (beta)
If you do not want to run Flink on K8s yourself, then try Ververica Cloud: an ultra-high performance cloud-native service for real-time data processing based on Apache Flink. Sign up for free.

Slide 35

Thank you
[email protected]
www.ververica.com
@VervericaData