Rapid Deployment with Apache Flink and Ververica Platform

As a developer, deploying a locally tested Apache Flink job to production can take a while, especially when you need to work with a DevOps team to provide a robust environment with a horizontally scalable Flink cluster. A Flink cluster must also provide the tooling to monitor and debug job issues easily. When deployment of each new streaming application is managed by someone other than the developers, it slows down the entire business, which wants to leverage streaming data and quickly enable new business use cases.

At Ververica, we take this issue seriously and have developed Ververica Platform (VVP), which runs Apache Flink with additional unique features. Creating a job deployment via the VVP UI has never been easier, and Flink SQL developers can create application jobs directly from the VVP SQL Editor. VVP can be installed on your Kubernetes cluster in minutes.

Alexey Novakov

January 10, 2023

Transcript

  1. Rapid Deployment
    with Apache Flink
    and Ververica
    Platform
    Alexey Novakov, Ververica
    Alibaba Cloud Developer Summit 2023

  2. Alexey Novakov
    Product Solution Architect
    @ Ververica, Germany
    - Last 6 years working in data management
    - 17 years in Software Development
    - Distributed Systems, Big/Small/Fast Data
    - Astronomy, Music

  3. Contents
    01 Getting started with Apache Flink and VVP
    02 Flink Application Lifecycle
    03 Flink SQL Application Lifecycle
    04 Summary

  4. 01
    Getting started with
    Apache Flink and
    Ververica Platform

  5. About Ververica
    - Original Creators of Apache Flink®
    - Enterprise Stream Processing with Ververica Platform
    - Subsidiary of Alibaba Group

  6. Apache Flink
    Features:
    - High availability, incremental checkpointing
    - Sophisticated late-data handling
    - Low latency, high throughput
    - Scala, Java, SQL and Python APIs
    - … and much more

  7. Some Apache Flink Users

  8. What is Ververica Platform (VVP)?
    VVP is purpose-built for stateful stream processing architectures and
    makes operating these powerful systems easier than ever, offering an
    entirely new experience for developing, deploying, and managing stream
    processing applications. Ververica's mission is to ensure that
    developers invest their time in their core business objectives, not in
    maintenance and infrastructure.

  9. Ververica Platform

  10. VVP Components

  11. Ververica Installation

  12. VVP Prerequisites
    Bring-your-own Kubernetes:
    - From cloud providers:
      - AWS EKS
      - Azure AKS
      - Google GKE
      - Alibaba Cloud Kubernetes
    - On-prem cluster, OpenShift
    - Local development: minikube, k3s

  13. VVP Helm Package
    $ helm repo add ververica \
        https://charts.ververica.com
    $ helm --namespace vvp \
        install vvp ververica/ververica-platform \
        --values values-vvp.yaml
    *see more: https://docs.ververica.com/getting_started/installation.html#setting-up-the-playground

  14. VVP Control Plane
    $ kubectl get pod -n vvp -l app=vvp-ververica-platform
    NAME                                      READY   STATUS    RESTARTS   AGE
    vvp-ververica-platform-75c54fcd6d-95wgh   3/3     Running   0          1m
    VVP is now ready to run Flink applications.

  15. VVP UI: Deployments
    $ kubectl -n vvp port-forward services/vvp-ververica-platform 8080:80

  16. 02
    Flink Application Lifecycle in VVP

  17. Step 0: Build JAR file
    case class Transaction(
      accountId: Long,
      timestamp: Long,
      amount: Double
    )

    @main def FraudDetectionJob =
      val env = StreamExecutionEnvironment.getExecutionEnvironment
      val transactions = env
        .addSource(TransactionsSource.iterator)
        .name("transactions")
      val alerts = transactions
        .keyBy(_.accountId)
        .process(FraudDetector()) // fraud detection logic
        .name("fraud-detector")
      alerts
        .addSink(AlertSink())     // print to console
        .name("send-alerts")
      env.execute("Fraud Detection")
    full source code: https://github.com/novakov-alexey/flink-sandbox
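    The `FraudDetector()` itself is not shown on the slide. As a rough illustration of the rule such a detector typically implements (the small-then-large heuristic from the official Flink fraud-detection walkthrough; the thresholds and all names below are illustrative assumptions, not taken from the linked repository), the per-account logic can be sketched in plain Scala without any Flink dependency:

    ```scala
    // Hypothetical, Flink-free sketch of a per-account fraud rule:
    // alert when a large transaction immediately follows a small one.
    // Thresholds and names are illustrative assumptions.
    final case class Txn(accountId: Long, timestamp: Long, amount: Double)

    object FraudRule:
      val SmallAmount = 1.00
      val LargeAmount = 500.00

      // Returns timestamps of transactions that should raise an alert,
      // given a single account's transaction history.
      def detect(txns: Seq[Txn]): Seq[Long] =
        txns
          .sortBy(_.timestamp)
          .sliding(2)
          .collect {
            case Seq(prev, curr)
                if prev.amount < SmallAmount && curr.amount > LargeAmount =>
              curr.timestamp
          }
          .toSeq
    ```

    In the real job this logic runs inside a keyed process function, so the "previous transaction" lives in Flink keyed state per `accountId` and survives failures via checkpoints.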

  18. Step 1: Upload JAR file

  19. Step 2: Create New Deployment
    1. Flink Session Cluster
    2. Kubernetes Namespace

  20. Step 2: Create New Deployment - Option 2 YAML
    It can also be submitted via the REST API.
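    A minimal example of the kind of spec the YAML editor accepts, sketched from memory of the VVP Deployment resource: field names and all values here are illustrative, so consult the VVP documentation for the authoritative schema.

    ```yaml
    # Illustrative VVP Deployment spec (not an exact schema reference).
    kind: Deployment
    apiVersion: v1
    metadata:
      name: fraud-detection            # hypothetical deployment name
    spec:
      state: RUNNING
      template:
        spec:
          artifact:
            kind: JAR
            jarUri: https://example.com/artifacts/fraud-detection.jar  # hypothetical location
          parallelism: 1
    ```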

  21. Step 3: Start Deployment

  22. Step 4: Manage Deployment
    - Save state for this or another Deployment
    - Create new Deployment with the same configuration
    - Monitor running Job

  23. Familiar Tools for Monitoring
    pod logs

  24. 03
    Flink SQL Application Lifecycle in VVP

  25. SQL Editor
    - Rapid creation of Tables, Views, Functions
    - Embedded Data Catalog with Schema Explorer
    - SQL Validation
    - Saving as Script

  26. Interactive SQL Workflow

  27. SQL App Example - Tables

  28. SQL App Example - Query
    SELECT
      o.id AS order_id,
      o.order_time,
      s.shipment_time,
      TIMESTAMPDIFF(DAY, o.order_time, s.shipment_time) AS day_diff
    FROM orders o
    JOIN shipments s ON o.id = s.order_id
    WHERE o.order_time
      BETWEEN s.shipment_time - INTERVAL '3' DAY AND s.shipment_time;
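    The query above is an interval join: an order matches a shipment with the same order id when the order was placed at most three days before the shipment. The same predicate can be sketched in plain Scala (no Flink dependency; `Order`, `Shipment`, and the helper below are illustrative names mirroring the tables in the query):

    ```scala
    import java.time.{Duration, Instant}

    // Illustrative, Flink-free sketch of the interval-join predicate:
    // keep (order, shipment) pairs where the order was placed within
    // the 3 days before the shipment, and compute the day difference.
    final case class Order(id: Int, orderTime: Instant)
    final case class Shipment(orderId: Int, shipmentTime: Instant)

    def joinWithin3Days(
        orders: Seq[Order],
        shipments: Seq[Shipment]
    ): Seq[(Int, Long)] =
      for
        o <- orders
        s <- shipments
        if s.orderId == o.id
        if !o.orderTime.isBefore(s.shipmentTime.minus(Duration.ofDays(3))) &&
          !o.orderTime.isAfter(s.shipmentTime)
      yield (o.id, Duration.between(o.orderTime, s.shipmentTime).toDays)
    ```

    In Flink SQL, this bounded time condition is what lets the planner expire each side's state after the interval passes, keeping the join feasible on unbounded streams.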

  29. SQL App Example - Start/Choose Cluster

  30. SQL App Example - Run Query
    Real-time
    result view

  31. Working with Scripts

  32. VVP New App Workflow
    1. Create SQL Script / Upload Flink JAR
    2. Create Deployment
    3. Start Deployment
    4. Monitor
    5. Reconfigure, if needed

  33. Deploy SQL job as Deployment

  34. Deployment from Script
    Sink table:
    CREATE TABLE order_shipments (
      order_id INT,
      order_time TIMESTAMP,
      shipment_time TIMESTAMP,
      day_diff INT,
      PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
      'connector' = 'upsert-kafka',
      'key.format' = 'csv',
      'properties.bootstrap.servers' = '...',
      'topic' = 'order_shipments',
      'value.format' = 'csv'
    );

    INSERT INTO order_shipments
    SELECT
      o.id AS order_id,
      o.order_time,
      s.shipment_time,
      TIMESTAMPDIFF(DAY, o.order_time, s.shipment_time) AS day_diff
    FROM orders o
    JOIN shipments s ON o.id = s.order_id
    WHERE o.order_time
      BETWEEN s.shipment_time - INTERVAL '3' DAY AND s.shipment_time;

  35. Run Deployment

  36. VVP Benefits
    ● Stability: well-tested, pre-configured Flink runtime
    ● Flexibility: deployable to your own K8s cluster
    ● Fast Dev Loop: use the SQL editor, use different image versions, etc.
    ● Easier Operations: practical UI to manage all possible Flink settings
    ● …

  37. Thank you.
    Questions?
    [email protected]
    www.ververica.com
    @VervericaData
