Introduction to Streaming with Apache Flink

After a quick description of event streams and stream processing, this presentation introduces Apache Flink:
- basic architecture
- sample code
- windowing and time concepts
- complex event processing (CEP)

This presentation was delivered at Devoxx France 2017.

Tugdual Grall

April 06, 2017

Transcript

  1. #DevoxxFR
    Stream Processing with Apache Flink
    Tugdual “Tug” Grall
    Technical Evangelist @ MapR
    [email protected]
    @tgrall
    1

  2. #DevoxxFR
    {“about” : “me”}
    2
    Tugdual “Tug” Grall
    • MapR : Technical Evangelist
    • MongoDB, Couchbase, eXo, Oracle
    • NantesJUG co-founder

    • @tgrall
    • http://tgrall.github.io
    [email protected] / [email protected]

  3. #DevoxxFR 3
    MapR Converged Data Platform
    [Architecture diagram]
    • Open Source Engines & Tools, Commercial Engines & Applications, Custom Apps, Cloud and Managed Services
    • Data Processing, Search and Others
    • APIs: HDFS API, POSIX, NFS, HBase API, JSON API, Kafka API
    • Enterprise-Grade Platform Services: Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
    • Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace, High Availability
    • Unified Management and Monitoring

  4. #DevoxxFR 4
    Streaming technology is enabling the obvious:
    continuous processing on data
    that is continuously produced
    Hint: you already have streaming data

  5. #DevoxxFR
    Decoupling
    5
    • State managed centrally (App A, App B, App C)
    • Applications build their own state (App A, App B, App C)

  6. #DevoxxFR 6
    Event Stream = Data Pipelines

  7. #DevoxxFR
    Streaming and Batch
    7
    [Figure: a timeline of hourly events from 2016-3-1 12:00 am to 2016-3-12 3:00 am, divided into partitions]

  8. #DevoxxFR
    Streaming and Batch
    8
    [Figure: a timeline of hourly events from 2016-3-1 12:00 am to 2016-3-12 3:00 am, divided into partitions]
    Labels: Stream (low latency), Stream (high latency)

  9. #DevoxxFR
    Streaming and Batch
    9
    [Figure: a timeline of hourly events from 2016-3-1 12:00 am to 2016-3-12 3:00 am, divided into partitions]
    Labels: Stream (low latency), Batch (bounded stream), Stream (high latency)

  10. #DevoxxFR
    Processing
    10
    • Request / Response

  11. #DevoxxFR
    Processing
    11
    • Request / Response
    • Batch

  12. #DevoxxFR
    Processing
    12
    • Request / Response
    • Batch
    • Stream Processing

  13. #DevoxxFR
    Processing
    13
    • Request / Response
    • Batch
    • Stream Processing
        • Real-time reaction to events
        • Continuous applications
        • Process both real-time and historical data

  14. #DevoxxFR 14

  15. #DevoxxFR
    Flink Architecture
    15

  16. #DevoxxFR
    Flink Architecture
    16
    Deployment
    • Local: Single JVM
    • Cluster: Standalone, YARN, Mesos
    • Cloud: AWS, Google

  17. #DevoxxFR
    Flink Architecture
    17
    Deployment
    • Local: Single JVM
    • Cluster: Standalone, YARN, Mesos
    • Cloud: AWS, Google
    Core
    • Runtime: Distributed Streaming Dataflow

  18. #DevoxxFR 18
    Deployment
    • Local: Single JVM
    • Cluster: Standalone, YARN, Mesos
    • Cloud: AWS, Google
    Core
    • Runtime: Distributed Streaming Dataflow
    API & Libraries
    • DataSet API: Batch Processing

  19. #DevoxxFR
    Flink Architecture
    19
    Deployment
    • Local: Single JVM
    • Cluster: Standalone, YARN, Mesos
    • Cloud: AWS, Google
    Core
    • Runtime: Distributed Streaming Dataflow
    API & Libraries
    • DataSet API: Batch Processing
        • FlinkML: Machine Learning
        • Gelly: Graph Processing
        • Table: Relational

  20. #DevoxxFR
    Flink Architecture
    20
    Deployment
    • Local: Single JVM
    • Cluster: Standalone, YARN, Mesos
    • Cloud: AWS, Google
    Core
    • Runtime: Distributed Streaming Dataflow
    API & Libraries
    • DataSet API: Batch Processing
        • FlinkML: Machine Learning
        • Gelly: Graph Processing
        • Table: Relational
    • DataStream API: Stream Processing

  21. #DevoxxFR
    Flink Architecture
    21
    Deployment
    • Local: Single JVM
    • Cluster: Standalone, YARN, Mesos
    • Cloud: AWS, Google
    Core
    • Runtime: Distributed Streaming Dataflow
    API & Libraries
    • DataSet API: Batch Processing
        • FlinkML: Machine Learning
        • Gelly: Graph Processing
        • Table: Relational
    • DataStream API: Stream Processing
        • CEP: Event Processing
        • Table: Relational

  22. #DevoxxFR 22
    Demonstration
    Flink Basics

  23. #DevoxxFR
    Batch & Stream
    23
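    case class Word(word: String, frequency: Int)

    // DataSet API - Batch
    // env is an ExecutionEnvironment; needs import org.apache.flink.api.scala._
    val lines: DataSet[String] = env.readTextFile(…)
    lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
      .groupBy("word").sum("frequency")
      .print()

    // DataStream API - Streaming
    // env is a StreamExecutionEnvironment; needs import org.apache.flink.streaming.api.scala._
    // and import org.apache.flink.streaming.api.windowing.time.Time
    val lines: DataStream[String] = env.socketTextStream(...)
    lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
      .keyBy("word")
      .timeWindow(Time.seconds(5), Time.seconds(1)) // 5-second window, evaluated every second
      .sum("frequency")
      .print()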

  24. #DevoxxFR
    Stream Processing
    24
    Source
    Filter / Transform
    Sink

  25. #DevoxxFR
    Flink Ecosystem
    25
    Sources: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Twitter, Apache Bahir
    Sinks: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Elasticsearch, HDFS/MapR-FS

  26. #DevoxxFR
    Stateful Stream Processing
    26
    Source
    Filter / Transform ↔ State (read/write)
    Sink
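
The difference from a stateless pipeline can be sketched in plain Scala, without any Flink dependency. The names `StatefulSketch` and `runningCounts` are hypothetical and do not come from the talk. The transform step reads and updates a per-key counter, which plays the role of the "State" box in the diagram. In Flink that state would be managed by the runtime rather than held in a local map, so this is only an illustration of the idea.

```scala
object StatefulSketch {
  // Source: any iterator of words.
  // Transform: keeps a per-word count as state.
  // Sink: the returned list of (word, countSoFar) pairs.
  def runningCounts(source: Iterator[String]): List[(String, Int)] = {
    // The operator's state, keyed by word (a local stand-in for managed state).
    val state = scala.collection.mutable.Map.empty[String, Int]
    source
      .filter(_.nonEmpty) // stateless filter: drop empty tokens
      .map { word =>
        val updated = state.getOrElse(word, 0) + 1 // read state
        state(word) = updated                      // write state
        (word, updated)
      }
      .toList
  }
}
```

Each output depends on everything seen so far for that key, which is why the state has to survive between events.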

  27. #DevoxxFR 27
    Is Flink used?

  28. #DevoxxFR
    Powered by Flink
    28

  29. #DevoxxFR 29
    10 Billion events/day
    2 TB of data/day
    30 Applications
    2 PB of storage and growing
    Source: Bouygues Telecom, http://berlin.flink-forward.org/wp-content/uploads/2016/07/Thomas-Lamirault_Mohamed-Amine-Abdessemed-A-brief-history-of-time-with-Apache-Flink.pdf

  30. #DevoxxFR 30
    Stream Processing
    Windowing

  31. #DevoxxFR
    Stream Windows
    31

  32. #DevoxxFR
    Stream Windows
    32

  33. #DevoxxFR
    Stream Windows
    33

  34. #DevoxxFR
    Stream Windows
    34

  35. #DevoxxFR
    Stream Windows
    35

  36. #DevoxxFR 36
    Demonstration
    Flink Windowing
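
The window slides in this transcript kept only their titles, so the following is a plain-Scala sketch of how time windows assign events, not the code shown in the demonstration. `WindowSketch`, `Event`, `windowStarts` and `countPerWindow` are hypothetical names. When the slide interval equals the window size, each event lands in exactly one (tumbling) window. With a smaller slide interval, each event belongs to several overlapping (sliding) windows. Windows that would start before time 0 are ignored for simplicity.

```scala
object WindowSketch {
  final case class Event(key: String, timestampMs: Long)

  // Start times of every window of length `size`, advancing by `slide`,
  // that contains the timestamp `ts`. Newest window first.
  def windowStarts(ts: Long, size: Long, slide: Long): List[Long] = {
    val lastStart = ts - (ts % slide)
    Iterator.iterate(lastStart)(_ - slide)
      .takeWhile(start => start > ts - size && start >= 0)
      .toList
  }

  // Count events per (window start, key).
  def countPerWindow(events: Seq[Event], size: Long, slide: Long): Map[(Long, String), Int] =
    events
      .flatMap(e => windowStarts(e.timestampMs, size, slide).map(start => (start, e.key)))
      .groupBy(identity)
      .map { case (windowAndKey, hits) => windowAndKey -> hits.size }
}
```

For example, `WindowSketch.countPerWindow(events, 5000, 1000)` corresponds to a 5-second window evaluated every second, the shape used in the word-count slide.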

  37. #DevoxxFR 37
    Time
    What about it?

  38. #DevoxxFR
    Demonstration
    38
    • Multiple notions of “Time” in Flink
        • Event Time
        • Ingestion Time
        • Processing Time

  39. #DevoxxFR
    What Is Event-Time Processing
    39
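    [Figure: the Star Wars films placed on two timelines]
    Processing Time (release order): Episode IV (1977), Episode V (1980), Episode VI (1983), Episode I (1999), Episode II (2002), Episode III (2005), Episode VII (2015)
    Event Time (story order): Episode I, Episode II, Episode III, Episode IV, Episode V, Episode VI, Episode VII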

  40. #DevoxxFR
    Time in Flink
    40
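
To make the distinction between event time and processing time concrete, here is a small plain-Scala sketch. It does not use Flink's API, and `TimeSketch`, `Reading` and `windowed` are hypothetical names. Each record carries the time it happened (`eventTime`) and the time the processor saw it (`arrivalTime`). Grouping the same records by one or the other gives different windows as soon as a record arrives late.

```scala
object TimeSketch {
  // eventTime: when the event happened; arrivalTime: when the processor saw it.
  final case class Reading(id: String, eventTime: Long, arrivalTime: Long)

  // Group record ids into fixed-size windows, using whichever notion of
  // time the caller selects. The map key is the window start.
  def windowed(rs: Seq[Reading], size: Long)(time: Reading => Long): Map[Long, Seq[String]] =
    rs.groupBy(r => time(r) - time(r) % size)
      .map { case (start, xs) => start -> xs.map(_.id) }
}
```

With a window size of 60, a record that happened at 55 but arrived at 61 falls in the first window under `windowed(rs, 60)(_.eventTime)` and in the second under `windowed(rs, 60)(_.arrivalTime)`.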

  41. #DevoxxFR 41
    Complex Event Processing

  42. #DevoxxFR
    Complex Event Processing
    42
    • Analyzing a stream of events and drawing conclusions
        • “if A and then B → infer event C”
    • Demanding requirements on stream processor
        • Low latency!
        • Exactly-once semantics & event-time support

  43. #DevoxxFR
    Stream Windows
    43

  44. #DevoxxFR
    Order Events
    44
    Process is reflected in a stream of order events
    Order(orderId, tStamp, “received”)
    Shipment(orderId, tStamp, “shipped”)
    Delivery(orderId, tStamp, “delivered”)
    orderId: Identifies the order
    tStamp: Time at which the event happened

  45. #DevoxxFR
    Real-time Warnings
    45

  46. #DevoxxFR
    CEP to the Rescue
    46
    Define processing and delivery intervals (SLAs)
    ProcessSucc(orderId, tStamp, duration)
    ProcessWarn(orderId, tStamp)
    DeliverySucc(orderId, tStamp, duration)
    DeliveryWarn(orderId, tStamp)
    orderId: Identifies the order
    tStamp: Time when the event happened
    duration: Duration of the processing/delivery
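
The processing SLA can be sketched in plain Scala over a finite list of events, without Flink's CEP library. `SlaSketch` and `check` are hypothetical names, and the event classes are simplified stand-ins for the ones named on the slides. One assumption to note: the warning here is stamped with the moment the SLA expired, whereas a streaming engine would supply its own timeout timestamp.

```scala
object SlaSketch {
  sealed trait Event { def orderId: String; def tStamp: Long }
  final case class Order(orderId: String, tStamp: Long) extends Event
  final case class Shipment(orderId: String, tStamp: Long) extends Event

  final case class ProcessSucc(orderId: String, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: String, tStamp: Long)

  // For each order, look for a shipment at most `slaMs` after it was received.
  def check(events: Seq[Event], slaMs: Long): Seq[Either[ProcessWarn, ProcessSucc]] = {
    // Shipment time per order id (the last shipment wins if there are several).
    val shippedAt: Map[String, Long] =
      events.collect { case Shipment(id, ts) => id -> ts }.toMap
    events.collect { case Order(id, received) =>
      shippedAt.get(id) match {
        case Some(shipped) if shipped >= received && shipped - received <= slaMs =>
          Right(ProcessSucc(id, shipped, shipped - received))
        case _ =>
          // Assumption: stamp the warning with the moment the SLA expired.
          Left(ProcessWarn(id, received + slaMs))
      }
    }
  }
}
```

The `Either` result mirrors the shape used later in the deck: the left side carries warnings, the right side carries successes with their duration.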

  47. #DevoxxFR
    CEP Example
    47

  48. #DevoxxFR
    Processing: Order → Shipment
    48

  49. #DevoxxFR 49
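    Processing: Order → Shipment
    // Needs org.apache.flink.cep.scala.pattern.Pattern and
    // org.apache.flink.streaming.api.windowing.time.Time
    val processingPattern = Pattern
      .begin[Event]("received").subtype(classOf[Order])
      .followedBy("shipped").where(_.status == "shipped")
      .within(Time.hours(1)) // the shipment must follow within one hour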

  50. #DevoxxFR 50
    Processing: Order → Shipment
    val processingPattern = Pattern
      .begin[Event]("received").subtype(classOf[Order])
      .followedBy("shipped").where(_.status == "shipped")
      .within(Time.hours(1))
    // CEP is org.apache.flink.cep.scala.CEP; the pattern is applied per order
    val processingPatternStream = CEP.pattern(
      input.keyBy("orderId"),
      processingPattern)

  51. #DevoxxFR 51
    Processing: Order → Shipment
    val processingPattern = Pattern
      .begin[Event]("received").subtype(classOf[Order])
      .followedBy("shipped").where(_.status == "shipped")
      .within(Time.hours(1))
    val processingPatternStream = CEP.pattern(
      input.keyBy("orderId"),
      processingPattern)
    val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
      processingPatternStream.select {
        (pP, timestamp) => // Timeout handler: no shipment within the hour
          ProcessWarn(pP("received").orderId, timestamp)
      } {
        fP => // Select function: the full pattern matched
          ProcessSucc(
            fP("received").orderId, fP("shipped").tStamp,
            fP("shipped").tStamp - fP("received").tStamp)
      }

  52. #DevoxxFR
    Count Delayed Shipments
    52

  53. #DevoxxFR
    Compute Avg Processing Time
    53
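
The last two slides kept only their titles, so how the talk computed these figures is not shown. As a rough plain-Scala sketch, both metrics can be derived from a collection of warn/success results. `MetricsSketch`, `delayedCount` and `avgProcessingTime` are hypothetical names, and the result classes are simplified stand-ins. In a streaming job these aggregations would presumably run per time window rather than over a complete list.

```scala
object MetricsSketch {
  final case class ProcessSucc(orderId: String, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: String, tStamp: Long)
  type Result = Either[ProcessWarn, ProcessSucc]

  // Number of orders that missed the processing SLA (the Left side).
  def delayedCount(results: Seq[Result]): Int = results.count(_.isLeft)

  // Mean duration of the orders processed in time; None when there are none.
  def avgProcessingTime(results: Seq[Result]): Option[Double] = {
    val durations = results.collect { case Right(ok) => ok.duration }
    if (durations.isEmpty) None else Some(durations.sum.toDouble / durations.size)
  }
}
```

Returning an `Option` avoids a division by zero when no order in the input met its SLA.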

  54. #DevoxxFR
    The End
    54
    • Process events in real time and/or batch
    • Complex Event Processing (CEP)
    • Many other things to discover
    • Deployment
    • High Availability
    • Table/Relational API
    • … https://mapr.com/ebooks/

  55. #DevoxxFR 55
    Flink Community
    &
    Thanks to
    Kostas Tzoumas
    Stephan Ewen
    Fabian Hueske
    Till Rohrmann
    Jamie Grier

  56. #DevoxxFR
    Stream Processing with Apache Flink
    Tugdual “Tug” Grall
    Technical Evangelist @ MapR
    [email protected]
    @tgrall
    56
