Rapid Deployment with Apache Flink and Ververica Platform

As a developer, deploying a locally tested Apache Flink job to production can take a while, especially when you depend on a DevOps team to provide a robust environment with a horizontally scalable Flink cluster. Such a cluster must also offer the tooling to monitor and debug job issues easily. When deployment of the next streaming app is managed by someone other than its developers, it slows down the entire business that wants to leverage streaming data and quickly enable new use cases.

At Ververica, we take this issue seriously and have developed Ververica Platform (VVP), which runs Apache Flink with additional unique features. Creating a job deployment from the VVP UI has never been easier, and Flink SQL developers can generate application jobs directly from the VVP SQL Editor. VVP can be installed on your Kubernetes cluster in minutes.

Alexey Novakov

January 10, 2023

Transcript

  1. Alexey Novakov, Product Solution Architect @ Ververica, Germany
     - Last 6 years working in data management
     - 17 years in Software Development
     - Distributed Systems, Big/Small/Fast Data
     - Astronomy, Music
  2. Contents
     01 Getting started with Apache Flink and VVP
     02 Flink Application Lifecycle
     03 Flink SQL Application Lifecycle
     04 Summary
  3. About Ververica
     - Original Creators of Apache Flink®
     - Enterprise Stream Processing with Ververica Platform
     - Subsidiary of Alibaba Group
  4. Apache Flink features:
     - High-Availability, Incremental checkpointing
     - Sophisticated late data handling
     - Low latency, High throughput
     - Scala, Java, SQL and Python APIs
     - … and much more
  5. What is Ververica Platform (VVP)?
     Purpose-built for stateful stream processing architectures, VVP makes operating these powerful systems easier than ever before by offering an entirely new experience for developing, deploying, and managing stream processing applications. Ververica's mission is to ensure that developers invest their time in their core business objectives, not in maintenance and infrastructure.
  6. VVP Pre-requirements: Bring-your-own Kubernetes
     - From Cloud Providers: AWS EKS, Azure AKS, Google GKE, Alibaba Cloud Kubernetes
     - On-prem cluster, OpenShift
     - Local development: minikube, k3s
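     For a quick local try-out, a throwaway cluster can be created with minikube; the resource sizes below are only a guess at what a small VVP playground needs:

     $ minikube start --cpus 4 --memory 8g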
  7. VVP Helm Package
     $ helm repo add ververica https://charts.ververica.com
     $ helm install vvp ververica/ververica-platform \
         --namespace vvp \
         --values values-vvp.yaml
     * see more: https://docs.ververica.com/getting_started/installation.html#setting-up-the-playground
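     The referenced values-vvp.yaml is not shown on the slide. A minimal sketch, assuming the Community Edition of the chart (the key name below is an assumption; verify it against the linked docs):

     # values-vvp.yaml (hypothetical minimal content)
     acceptCommunityEditionLicense: true  # accept the Community Edition license terms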
  8. VVP Control Plane
     $ kubectl get pod -n vvp -l app=vvp-ververica-platform
     NAME                                      READY   STATUS    RESTARTS   AGE
     vvp-ververica-platform-75c54fcd6d-95wgh   3/3     Running   0          1m
     Now it is ready to run Flink applications.
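     To reach the VVP UI from a local machine, one option is port-forwarding the platform service; the service name and ports below are assumptions based on the release name used above:

     $ kubectl -n vvp port-forward service/vvp-ververica-platform 8080:80
     # then open http://localhost:8080 in a browser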
  9. Step 0: Build JAR file
     case class Transaction(accountId: Long, timestamp: Long, amount: Double)

     @main def FraudDetectionJob =
       val env = StreamExecutionEnvironment.getExecutionEnvironment
       val transactions = env
         .addSource(TransactionsSource.iterator)
         .name("transactions")
       val alerts = transactions
         .keyBy(_.accountId)
         .process(FraudDetector()) // fraud detection logic
         .name("fraud-detector")
       alerts
         .addSink(AlertSink()) // print to console
         .name("send-alerts")
       env.execute("Fraud Detection")

     full source code: https://github.com/novakov-alexey/flink-sandbox
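     FraudDetector itself is not shown on the slide (the linked repo has the full version). As a rough sketch, modeled on the official Flink fraud-detection walkthrough rather than the author's exact code, it could be a KeyedProcessFunction that keeps a per-account flag:

     import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
     import org.apache.flink.api.common.typeinfo.Types
     import org.apache.flink.configuration.Configuration
     import org.apache.flink.streaming.api.functions.KeyedProcessFunction
     import org.apache.flink.util.Collector

     case class Alert(accountId: Long) // hypothetical alert payload

     class FraudDetector extends KeyedProcessFunction[Long, Transaction, Alert]:
       // per-key state: was the previous transaction suspiciously small?
       @transient private var flagState: ValueState[java.lang.Boolean] = _

       override def open(parameters: Configuration): Unit =
         flagState = getRuntimeContext.getState(
           ValueStateDescriptor("small-tx-flag", Types.BOOLEAN))

       override def processElement(
           tx: Transaction,
           ctx: KeyedProcessFunction[Long, Transaction, Alert]#Context,
           out: Collector[Alert]): Unit =
         // a large amount right after a small one looks like card testing
         if flagState.value() != null && tx.amount > 500.0 then
           out.collect(Alert(tx.accountId))
         flagState.clear()
         if tx.amount < 1.0 then flagState.update(true)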
 10. Step 2: Create New Deployment - Option 2: YAML (can also be submitted via the REST API)
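     The deployment YAML itself is not shown. A minimal sketch of such a spec, with field names as I recall the VVP Deployment resource and placeholder artifact values (verify against the VVP docs):

     kind: Deployment
     apiVersion: v1
     metadata:
       name: fraud-detection
     spec:
       state: RUNNING                 # desired state: start the job right away
       template:
         spec:
           artifact:
             kind: JAR
             jarUri: s3://artifacts/fraud-detection-assembly-0.1.0.jar  # placeholder
             entryClass: FraudDetectionJob                              # placeholder
           parallelism: 1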
 11. Step 4: Manage Deployment
     - Save state for this or another Deployment
     - Create a new Deployment with the same configuration
     - Monitor the running Job
 12. SQL Editor
     - Rapid creation of Tables, Views, Functions
     - Embedded Data Catalog with Schema Explorer
     - SQL Validation
     - Saving as Script
 13. SQL App Example - Query
     SELECT
       o.id AS order_id,
       o.order_time,
       s.shipment_time,
       TIMESTAMPDIFF(DAY, o.order_time, s.shipment_time) AS day_diff
     FROM orders o
     JOIN shipments s ON o.id = s.order_id
     WHERE o.order_time BETWEEN s.shipment_time - INTERVAL '3' DAY
       AND s.shipment_time;
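     The orders and shipments source tables are not defined on the slides. A sketch of what their DDL might look like, with schemas inferred from the query and the datagen connector chosen purely as a convenient demo source:

     -- hypothetical source tables; schemas inferred from the query above
     CREATE TABLE orders (
       id INT,
       order_time TIMESTAMP(3)
     ) WITH (
       'connector' = 'datagen'  -- generates random rows for a quick demo
     );

     CREATE TABLE shipments (
       order_id INT,
       shipment_time TIMESTAMP(3)
     ) WITH (
       'connector' = 'datagen'
     );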
 14. VVP New App Workflow
     Create SQL Script / Upload Flink JAR -> Create Deployment -> Start Deployment -> Monitor -> Reconfigure, if needed
 15. Deployment from Script
     Sink table:
     CREATE TABLE order_shipments (
       order_id INT,
       order_time TIMESTAMP,
       shipment_time TIMESTAMP,
       day_diff INT,
       PRIMARY KEY (order_id) NOT ENFORCED
     ) WITH (
       'connector' = 'upsert-kafka',
       'key.format' = 'csv',
       'properties.bootstrap.servers' = '...',
       'topic' = 'order_shipments',
       'value.format' = 'csv'
     );

     INSERT INTO order_shipments
     SELECT
       o.id AS order_id,
       o.order_time,
       s.shipment_time,
       TIMESTAMPDIFF(DAY, o.order_time, s.shipment_time) AS day_diff
     FROM orders o
     JOIN shipments s ON o.id = s.order_id
     WHERE o.order_time BETWEEN s.shipment_time - INTERVAL '3' DAY
       AND s.shipment_time;
 16. VVP Benefits
     • Stability: Well-tested, pre-configured Flink runtime
     • Flexibility: Deployable to your own K8s cluster
     • Fast Dev Loop: use the SQL editor, different image versions, etc.
     • Easier Operations: practical UI to manage all possible Flink settings
     • …