Stateful Functions – Building general-purpose Applications and Services on Apache Flink

The slides from my keynote at Flink Forward Europe 2019 in Berlin.

The presentation introduces "Stateful Functions", a new library for using Apache Flink to build general-purpose applications. It brings together ideas from Stateful Stream Processing and FaaS to create a new way of building Stateful Applications.

As an introduction, the talk shows the growth of the Flink community over the last year and recaps some of the work on streaming data processing.

Stephan Ewen

October 08, 2019

Transcript

  1. © 2019 Ververica
    Stephan Ewen
    Co-founder, CTO @ Ververica
    Apache Flink PMC Member
    Stateful Functions – Building general-purpose Applications and Services on Apache Flink

  2. The State of Apache Flink

  3. Passed 10,000 Stars on GitHub in August

  4. Top 3 Apache project by mailing list activity … and top 7 by commit activity
    Source: Apache Annual Report 2018, https://s3.amazonaws.com/files-dist/AnnualReports/FY2018%20Annual%20Report.pdf

  5. Flink 1.9 is the biggest Apache Flink release to date

  6. The last six months, feature-wise:
    State-of-the-Art Batch Processing on a Stream Processor

  7. Stand back! I’m going to run a batch job…
    Flink ≤ 1.8 vs. Flink 1.9 / 1.10
    Batching it like a pro there…

  8. Batch on Streaming in Apache Flink 1.9
    • Fine-grained batch fault tolerance
    • Table API Restructuring
    • Blink Query Engine
    • Python Table API
    • Catalogs
    • Hive Table Support
    • Unified Operator Runtime

  9. Batch / Streaming: Features in Progress (selection)
    • Full Hive compatibility
    • Python UDFs
    • Interactive Programs
    • Better Memory Management for Streaming State Backends
    • New Scheduler
    • Resource Profile Support
    • Machine Learning Pipelines
    • Unaligned Checkpoints
    • Unified Source API

  10. API Stack in Flink 1.9
    [Diagram, top to bottom: SQL / Table API; Old Flink Query Proc. | Blink Query Proc.; DataSet (batch, on the batch env.) | DataStream (streaming, on the stream env.); StreamTransformation; Flink Task Runtime. Several layers are labeled “batch & streaming”.]

  11. API Stack: Future Goal
    [Diagram, top to bottom: SQL / Table API; Blink Query Proc.; DataStream; StreamTransformation; Flink Task Runtime. Every layer above the runtime handles batch & streaming.]

  12. Let’s look at building Applications

  13. Stream Processing is at the Intersection of Data Processing and Applications
    [Diagram: Data Processing (offline | real-time) overlapping with Applications (event-driven | databases), with Stream Processing in the overlap]

  14. Building an Application Today

  15. Building an Application Today
    The big trend: Serverless (FaaS)

  16. Functions as a Service
    λ: an event-driven function

  17. Functions as a Service
    [Diagram: many λ instances]
    Elastically scalable, “lightweight resource footprint”

  18. Functions as a Service – Handling State in Applications
    [Diagram: many λ instances sharing external state]
    • Consistency?
    • Scaling the database?
    • Connections, request rates, …
    • Often bottlenecked by state access & I/O

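The I/O bottleneck and the consistency question above can be seen in a toy Python model (an illustration only, not any FaaS platform's API). `RemoteStore` and `handle_click` are hypothetical names; the store is an in-memory dict standing in for an external database, and it counts calls instead of making network requests.

```python
from typing import Dict


class RemoteStore:
    """Stand-in for an external database; counts network round trips."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.round_trips = 0

    def get(self, key: str) -> int:
        self.round_trips += 1
        return self.data.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self.round_trips += 1
        self.data[key] = value


def handle_click(event: Dict[str, str], store: RemoteStore) -> int:
    """Stateless FaaS-style handler: all of its state lives in the store.

    The read-modify-write below is not atomic: two concurrent invocations
    for the same user could read the same count and lose an update.
    """
    count = store.get(event["user"]) + 1
    store.put(event["user"], count)
    return count


store = RemoteStore()
for user in ["alice", "bob", "alice"]:
    handle_click({"user": user}, store)
```

Each event costs one read and one write, so three clicks cost six round trips; under real concurrency, the unguarded read-modify-write could also lose updates.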
  19. Handling state remains a challenge for applications, in the serverless world as well.

  20. Composition of Functions
    [Diagram: several λ functions]
    • Building more complex applications is not straightforward
    • Messaging / composition primitives are lacking
    • Workflows of functions are a workaround, but not a general solution

  21. State management, composable, event-driven… that sounds like… Stream Processing

  22. Stream Processing vs. F-a-a-S (λ)
    Properties compared: simplicity / generality, state management, composability, lightweight resources, performance, event-driven
    Can we combine some of these properties?

  23. Today… we announce…

  24. Bringing together ideas from Stateful Stream Processing and FaaS to create a new way of building Stateful Applications
    https://statefun.io/

  25. Stateful Functions
    [Diagram: an event ingress feeds several stateful functions f(a,b), which message each other and emit to an event egress; function state is snapshotted to mass storage (S3, GCF, ECS, HDFS, …)]

  26. Stateful Functions
    [Diagram as on slide 25]
    Event ingresses supply events that trigger functions

  27. Stateful Functions
    [Diagram as on slide 25]
    Multiple functions send events to each other
    Arbitrary addressing, no restriction to a DAG

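A minimal sketch of function-to-function messaging with arbitrary addressing, assuming a single-process runtime and an address made of a function type plus an instance id. `Address`, `Context.send`, and `Runtime` are hypothetical names for this toy model, not the Stateful Functions SDK (which is a Java library). Two function types message each other, forming a cycle that a DAG-shaped pipeline could not express.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, NamedTuple, Tuple


class Address(NamedTuple):
    """Logical address of a function instance."""
    function_type: str
    id: str


class Context:
    """Passed to a function on each invocation; can message any address."""

    def __init__(self, runtime: "Runtime", self_address: Address) -> None:
        self._runtime = runtime
        self.self_address = self_address

    def send(self, target: Address, message: Any) -> None:
        self._runtime.enqueue(target, message)


Function = Callable[[Context, Any], None]


class Runtime:
    def __init__(self) -> None:
        self._functions: Dict[str, Function] = {}
        self._mailbox: Deque[Tuple[Address, Any]] = deque()

    def register(self, function_type: str, fn: Function) -> None:
        self._functions[function_type] = fn

    def enqueue(self, target: Address, message: Any) -> None:
        self._mailbox.append((target, message))

    def run(self) -> None:
        # Deliver one message at a time until no function sends anything new.
        while self._mailbox:
            target, message = self._mailbox.popleft()
            self._functions[target.function_type](Context(self, target), message)


log: List[str] = []


def ping(ctx: Context, n: int) -> None:
    log.append(f"{ctx.self_address.id}:ping {n}")
    if n > 0:
        ctx.send(Address("pong", ctx.self_address.id), n - 1)


def pong(ctx: Context, n: int) -> None:
    log.append(f"{ctx.self_address.id}:pong {n}")
    if n > 0:
        ctx.send(Address("ping", ctx.self_address.id), n - 1)


runtime = Runtime()
runtime.register("ping", ping)
runtime.register("pong", pong)
runtime.enqueue(Address("ping", "a"), 3)
runtime.run()
```

The runtime only drains a queue; no topology is declared up front, so any function can message any address it can name.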
  28. Stateful Functions
    [Diagram as on slide 25]
    Functions have locally embedded state

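Locally embedded state can be sketched as follows, assuming the runtime keeps one state value per (function type, instance id) and hands it to the function on each invocation. `LocalStateRuntime` and `count_visits` are hypothetical; the point is only that the state sits next to the function instead of behind a database connection.

```python
from typing import Any, Callable, Dict, Tuple

Address = Tuple[str, str]                 # (function type, instance id)
StatefulFn = Callable[[Any, Any], Any]    # (current state, message) -> new state


class LocalStateRuntime:
    """Keeps each function instance's state inside the runtime process.

    An invocation reads and writes a local dict entry: there is no
    round trip to an external database.
    """

    def __init__(self) -> None:
        self._functions: Dict[str, StatefulFn] = {}
        self.state: Dict[Address, Any] = {}

    def register(self, function_type: str, fn: StatefulFn) -> None:
        self._functions[function_type] = fn

    def invoke(self, address: Address, message: Any) -> None:
        function_type, _instance_id = address
        current = self.state.get(address)  # None on the first invocation
        self.state[address] = self._functions[function_type](current, message)


def count_visits(state: Any, message: Any) -> int:
    # The state is already scoped to one user's instance, so no key lookup here.
    return (state or 0) + 1


rt = LocalStateRuntime()
rt.register("visits", count_visits)
for user in ["alice", "bob", "alice"]:
    rt.invoke(("visits", user), {"page": "/home"})
```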
  29. Stateful Functions
    [Diagram as on slide 25]
    State and messaging are consistent with exactly-once semantics

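Why snapshotting state together with the input position gives exactly-once state updates can be shown with a toy replay model. This sketch assumes a replayable ingress (a list here, like a log read by offset) and takes a synchronous snapshot every two events; Flink's real checkpoints are asynchronous and distributed, so treat `CountingJob` and `run` as a hypothetical simplification.

```python
import copy
from typing import Dict, Optional

# Replayable ingress: an append-only log that can be re-read from any offset.
EVENTS = ["a", "b", "a", "c", "a", "b"]


class CountingJob:
    def __init__(self) -> None:
        self.offset = 0                   # next ingress position to read
        self.counts: Dict[str, int] = {}  # function state

    def step(self) -> None:
        key = EVENTS[self.offset]
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset += 1

    def snapshot(self) -> dict:
        # State and input position are captured together, as one unit.
        return {"offset": self.offset, "counts": copy.deepcopy(self.counts)}

    @classmethod
    def restore(cls, snap: dict) -> "CountingJob":
        job = cls()
        job.offset = snap["offset"]
        job.counts = copy.deepcopy(snap["counts"])
        return job


def run(fail_after: Optional[int], checkpoint_every: int = 2) -> Dict[str, int]:
    job = CountingJob()
    last_snapshot = job.snapshot()
    processed = 0
    while job.offset < len(EVENTS):
        if fail_after is not None and processed == fail_after:
            # Simulated crash: progress since the last snapshot is lost, and
            # processing resumes from the snapshot's ingress offset.
            job = CountingJob.restore(last_snapshot)
            fail_after = None
        job.step()
        processed += 1
        if job.offset % checkpoint_every == 0:
            last_snapshot = job.snapshot()
    return job.counts
```

A crash after three or five events still yields the same counts as a failure-free run, because the events replayed after a restore were never counted in the restored state.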
  30. Stateful Functions
    [Diagram as on slide 25]
    No database required
    All persistence goes directly to blob storage

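Writing snapshots straight to blob storage needs little more than immutable, write-once objects. In this sketch a local temporary directory stands in for an S3 or HDFS bucket, and `BlobSnapshotStore` with its JSON layout is a hypothetical format, not how Flink actually lays out checkpoint files.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class BlobSnapshotStore:
    """Write-once snapshot objects in a bucket-like directory."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, checkpoint_id: int, snapshot: dict) -> None:
        # One immutable object per checkpoint; zero-padded ids sort by name.
        final = self.root / f"chk-{checkpoint_id:010d}.json"
        staging = final.with_suffix(".tmp")
        staging.write_text(json.dumps(snapshot))
        staging.rename(final)  # publish only once the object is complete

    def latest(self) -> Optional[dict]:
        objects = sorted(self.root.glob("chk-*.json"))
        return json.loads(objects[-1].read_text()) if objects else None


with tempfile.TemporaryDirectory() as bucket:
    store = BlobSnapshotStore(Path(bucket) / "checkpoints")
    store.write(1, {"offset": 2, "state": {"alice": 1}})
    store.write(2, {"offset": 4, "state": {"alice": 2, "bob": 1}})
    restored = store.latest()
```

Publishing each object under its final name only after it is fully written keeps readers from seeing a half-written snapshot on a local file system; on object stores without rename, a single PUT of the complete object plays the same role.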
  31. Stateful Functions
    [Diagram as on slide 25]
    Event egresses let functions respond via event streams

  32. Logical/Virtual Instances
    [Diagram: function virtual instances spread over two shards, each shard keeping some instances in memory and the rest in secondary storage; Shard 1: A, B, C, F, G, H, I; Shard 2: D, E, K, L, M, N]

  33. Logical/Virtual Instances
    [Diagram: Shard 1 and Shard 2 as on slide 32]
    message to "K" → load "K" → possibly evict another instance → K.invoke(message)

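The load / evict / invoke sequence on this slide behaves much like a least-recently-used cache in front of secondary storage. A sketch under that assumption: LRU is the eviction policy chosen here (the slide only says an instance may be evicted), and `Shard` with a running-sum function body is hypothetical.

```python
from collections import OrderedDict
from typing import Dict


class Shard:
    """Holds at most `capacity` function instances in memory.

    All other instances exist only as state in secondary storage; they stay
    'virtual' until a message arrives for them.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.memory: "OrderedDict[str, int]" = OrderedDict()  # id -> state, LRU first
        self.secondary: Dict[str, int] = {}                    # evicted state
        self.loads = 0

    def _load(self, instance_id: str) -> None:
        if instance_id in self.memory:
            self.memory.move_to_end(instance_id)  # mark as most recently used
            return
        if len(self.memory) >= self.capacity:
            # Evict another instance: spill the least recently used one.
            evicted_id, evicted_state = self.memory.popitem(last=False)
            self.secondary[evicted_id] = evicted_state
        # Load state from secondary storage (fresh instances start at 0).
        self.memory[instance_id] = self.secondary.pop(instance_id, 0)
        self.loads += 1

    def invoke(self, instance_id: str, amount: int) -> int:
        self._load(instance_id)
        self.memory[instance_id] += amount  # the function body: a running sum
        return self.memory[instance_id]


shard = Shard(capacity=2)
for target in ["K", "L", "K", "M", "L", "K"]:
    shard.invoke(target, 1)
```

With room for two instances, the messages to K, L, K, M, L, K evict L, then K, then M, so M ends up in secondary storage with its state intact, and six messages need five loads.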
  34. Apache Flink is the State and Event Streaming Fabric
    [Diagram: the conceptual dataflow Ingress/Router → Functions → Egress, mapped onto an Apache Flink dataflow graph with two parallel pipelines of Ingress & Router → (keyBy) → Function Dispatcher → Egress (side output), plus a Feedback Operator that loops messages back (loop)]

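The keyBy routing and the feedback loop can be modeled in one process: each address hashes to one of several dispatcher state tables, and a function's outputs either go to the egress (the side output) or are fed back into the loop. The CRC32-based `route` is only a stand-in for keyBy (Flink's own key-to-partition assignment differs), and `Pipeline`, `greeter`, and `counter` are hypothetical.

```python
import zlib
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Address = Tuple[str, str]        # (function type, instance id)
Envelope = Tuple[Address, Any]   # a message addressed to one instance
# A function takes (state, message) and returns (new state, outputs), where
# each output is (target, message) and target is an Address or "egress".
Fn = Callable[[Any, Any], Tuple[Any, List[Tuple[Any, Any]]]]


def route(address: Address, parallelism: int) -> int:
    """keyBy stand-in: the same address always maps to the same dispatcher."""
    return zlib.crc32(f"{address[0]}/{address[1]}".encode()) % parallelism


class Pipeline:
    def __init__(self, parallelism: int, functions: Dict[str, Fn]) -> None:
        self.functions = functions
        # One state table per parallel Function Dispatcher.
        self.dispatchers: List[Dict[Address, Any]] = [{} for _ in range(parallelism)]
        self.feedback: Deque[Envelope] = deque()  # loop edge back into routing
        self.egress: List[Any] = []               # side output

    def run(self, ingress: List[Envelope]) -> None:
        pending: Deque[Envelope] = deque(ingress)
        while pending or self.feedback:
            # Ingress events first, then messages that functions sent each other.
            address, message = (pending or self.feedback).popleft()
            table = self.dispatchers[route(address, len(self.dispatchers))]
            new_state, outputs = self.functions[address[0]](table.get(address), message)
            table[address] = new_state
            for target, out in outputs:
                if target == "egress":
                    self.egress.append(out)
                else:
                    self.feedback.append((target, out))


def greeter(state: Any, name: str):
    return state, [(("counter", name), name)]


def counter(state: Any, name: str):
    seen = (state or 0) + 1
    return seen, [("egress", f"{name} seen {seen}x")]


pipeline = Pipeline(parallelism=4, functions={"greeter": greeter, "counter": counter})
pipeline.run([(("greeter", "alice"), "alice"),
              (("greeter", "bob"), "bob"),
              (("greeter", "alice"), "alice")])
```

Because routing is deterministic, both messages for ("counter", "alice") land in the same dispatcher's table, so its count reaches 2 without any cross-dispatcher coordination.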
  35. Running Stateful Functions on Apache Flink
    Exactly-once checkpointing for streaming loops
    [Diagram: Function Dispatcher and Feedback Operator, connected by loop feedback]

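Loops need special care in checkpointing because messages sitting on the feedback edge at snapshot time are neither in the functions' state nor replayable from the ingress. The sketch below, with hypothetical `LoopJob` and `countdown`, compares a snapshot that records in-flight feedback messages with one that does not; it illustrates the idea only and is not Flink's checkpointing protocol.

```python
from collections import deque
from typing import List, Tuple


def countdown(state: int, n: int) -> Tuple[int, List[int]]:
    """Counts its invocations and messages itself until n reaches 0 (a loop)."""
    return state + 1, ([n - 1] if n > 0 else [])


class LoopJob:
    def __init__(self, start: int) -> None:
        self.state = 0
        self.feedback = deque([start])  # messages in flight on the loop edge

    def step(self) -> None:
        message = self.feedback.popleft()
        self.state, out = countdown(self.state, message)
        self.feedback.extend(out)

    def done(self) -> bool:
        return not self.feedback

    def snapshot(self, include_feedback: bool) -> dict:
        return {
            "state": self.state,
            # A consistent snapshot must also capture the loop's in-flight messages.
            "feedback": list(self.feedback) if include_feedback else [],
        }

    @classmethod
    def restore(cls, snap: dict) -> "LoopJob":
        job = cls(start=0)
        job.state = snap["state"]
        job.feedback = deque(snap["feedback"])
        return job


def run_with_failure(include_feedback: bool) -> int:
    job = LoopJob(start=5)
    job.step()
    job.step()
    snap = job.snapshot(include_feedback)  # checkpoint mid-loop
    job.step()                             # progress lost in the crash
    job = LoopJob.restore(snap)            # recover from the checkpoint
    while not job.done():
        job.step()
    return job.state
```

Restoring the complete snapshot ends with the same result as a failure-free run (six invocations), while dropping the in-flight message silently ends the loop after two.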
  36. Example: Ride Sharing App
    [Diagram: functions Driver, Ride, Passenger, and Geo-index exchanging messages (update, create, bill, inform / book, bid, lookup, update cell); event streams: driver status updates, passenger ride requests, ride status updates; states shown: seeking, confirmed, riding, free, bidding, booked]

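Each driver in this example can be modeled as a virtual instance whose state is a small state machine. The slide lists state names (seeking, confirmed, riding, free, bidding, booked) but not their transitions or which entity owns them, so the driver transitions and event names below are assumptions for illustration, and `DriverFunction` is hypothetical.

```python
from typing import Dict, Tuple

# Assumed driver life cycle, built from state names on the slide; the
# transitions and event names are illustrative, not taken from the talk.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("free", "bid"): "bidding",
    ("bidding", "book"): "booked",
    ("bidding", "reject"): "free",
    ("booked", "pickup"): "riding",
    ("riding", "dropoff"): "free",
}


class DriverFunction:
    """One virtual instance per driver id; its state is the driver's status."""

    def __init__(self) -> None:
        self.status = "free"

    def invoke(self, event: str) -> str:
        key = (self.status, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed while {self.status!r}")
        self.status = TRANSITIONS[key]
        return self.status


drivers: Dict[str, DriverFunction] = {}
for driver_id, event in [("d1", "bid"), ("d2", "bid"), ("d1", "book"), ("d2", "reject")]:
    # setdefault stores a new instance only on a driver's first message.
    drivers.setdefault(driver_id, DriverFunction()).invoke(event)
```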
  37. [Diagram: three approaches side by side]
    • Stream Processing / Streaming SQL (event/stream-centric): data preparation; combining knowledge/information; filtering, enriching, aggregating, joining events
    • Stateful Functions (state-centric): coordination, (interacting) state machines; complex event/state interactions
    • F-a-a-S (stateless / compute-centric): “occasional” actions or spiky loads; compute-intensive or blocking

  38. Putting it all together: Ridesharing again
    • FaaS: render map/route image, create a receipt PDF, send email
    • Stateful Functions: ride life-cycle, driver-to-ride matching
    • Stream Processing: traffic models, demand forecast & pricing
    Also in the diagram: Billing; Passenger updates; Driver position updates; Driver status updates

  39. Is Stateful Functions part of Apache Flink?
    Not yet, but we would like it to be!
    • Fully open source on Ververica’s GitHub under ASL 2
    • Propose a contribution to Apache Flink (Flink Improvement Proposal)
    • Community discussion about the project proposal
    • Upon acceptance, handover to the Flink project

  40. The Megastars behind the Stateful Functions Project
    Daryl, Robert, Ufuk, Konstantin, Holger, Olivia, Markos, Enrico, Charles, Jamie G., Thomas, Greg, Jamie C., Ricky, …
    And a big “Thank you!” to everyone who helped and tried it out!

  41. Learn more at
    • the technical deep-dive session
    • https://statefun.io/
    • https://github.com/ververica/stateful-functions/
    • https://ververica.com/blog/

  42. Enjoy the conference!