Introduction to stream processing with Apache Flink

After a brief description of event streams and stream processing, this presentation introduces Apache Flink:
- basic architecture
- sample code
- windowing and time concepts
- complex event processing (CEP)
- streaming analytics with Flink SQL

Tugdual Grall

June 29, 2017

Transcript

  1. © 2017 MapR Technologies
    MapR Confidential 1
    Introduction to
    Stream Processing
    with Apache Flink
    Tugdual Grall
    @tgrall

  2. © 2017 MapR Technologies
    {“about” : “me”}
    Tugdual “Tug” Grall
    • MapR : Technical Evangelist
    • MongoDB, Couchbase, eXo, Oracle
    • NantesJUG co-founder

    • @tgrall
    • http://tgrall.github.io

  3. © 2017 MapR Technologies
    MapR Converged Data Platform
    [Figure: platform architecture diagram. Top layer: Open Source Engines & Tools, Commercial Engines & Applications, Custom Apps, Cloud and Managed Services, Search and Others. Middle: Data Processing, Unified Management and Monitoring. Storage: Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams). Utility-Grade Platform Services: High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace]

  4. © 2017 MapR Technologies
    Streaming
    Streaming technology is enabling the obvious:
    continuous processing on data
    that is continuously produced
    Hint: you already have streaming data

  5. © 2017 MapR Technologies
    Decoupling
    [Figure: two diagrams of App A, App B, and App C. Captions: "State managed centralized" and "Applications build their own state"]

  6. © 2017 MapR Technologies
    Event Stream = Data Pipelines

  7. © 2017 MapR Technologies
    Streaming and Batch
    [Figure: timeline of hourly timestamps from 2016-3-1 12:00 am to 2016-3-12 3:00 am, divided into partitions]

  8. © 2017 MapR Technologies
    Streaming and Batch
    [Figure: timeline of hourly timestamps from 2016-3-1 12:00 am to 2016-3-12 3:00 am, divided into partitions. Labels: Stream (low latency), Stream (high latency)]

  9. © 2017 MapR Technologies
    Streaming and Batch
    [Figure: timeline of hourly timestamps from 2016-3-1 12:00 am to 2016-3-12 3:00 am, divided into partitions. Labels: Stream (low latency), Batch (bounded stream), Stream (high latency)]
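
The "Batch (bounded stream)" label can be illustrated with a minimal sketch in plain Scala (no Flink dependency). The object and function names are hypothetical; the point is only that one processing function can serve both a bounded and an unbounded input.

```scala
// Plain Scala sketch (no Flink): one processing function, two kinds of input.
object BoundedVsUnbounded {
  // Running total over any iterator of events, bounded or not.
  def runningTotal(events: Iterator[Int]): Iterator[Int] =
    events.scanLeft(0)(_ + _).drop(1)

  def main(args: Array[String]): Unit = {
    // Batch: a bounded stream, so the whole result can be materialized.
    val batch = runningTotal(Iterator(1, 2, 3, 4)).toList
    println(batch) // List(1, 3, 6, 10)

    // Stream: unbounded input, so only a prefix can ever be observed.
    val firstFive = runningTotal(Iterator.from(1)).take(5).toList
    println(firstFive) // List(1, 3, 6, 10, 15)
  }
}
```

The batch case terminates because its input ends; the streaming case only terminates because the caller stops reading.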

  10. © 2017 MapR Technologies
    Processing
    • Request / Response

  11. © 2017 MapR Technologies
    Processing
    • Request / Response
    • Batch

  12. © 2017 MapR Technologies
    Processing
    • Request / Response
    • Batch
    • Stream Processing

  13. © 2017 MapR Technologies
    Processing
    • Request / Response
    • Batch
    • Stream Processing
    • Real-time reaction to events
    • Continuous applications
    • Process both real-time and historical data

  14. © 2017 MapR Technologies

  15. © 2017 MapR Technologies
    Flink Architecture

  16. © 2017 MapR Technologies
    Flink Architecture
    • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)

  17. © 2017 MapR Technologies
    Flink Architecture
    • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
    • Core: Runtime (Distributed Streaming Dataflow)

  18. © 2017 MapR Technologies
    Flink Architecture
    • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
    • Core: Runtime (Distributed Streaming Dataflow)
    • API & Libraries: DataSet API (Batch Processing)

  19. © 2017 MapR Technologies
    Flink Architecture
    • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
    • Core: Runtime (Distributed Streaming Dataflow)
    • API & Libraries: DataSet API (Batch Processing), with FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational)

  20. © 2017 MapR Technologies
    Flink Architecture
    • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
    • Core: Runtime (Distributed Streaming Dataflow)
    • API & Libraries: DataSet API (Batch Processing), with FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational)
    • API & Libraries: DataStream API (Stream Processing)

  21. © 2017 MapR Technologies
    Flink Architecture
    • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
    • Core: Runtime (Distributed Streaming Dataflow)
    • API & Libraries: DataSet API (Batch Processing), with FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational)
    • API & Libraries: DataStream API (Stream Processing), with CEP (Event Processing), Table (Relational)

  22. © 2017 MapR Technologies
    Demonstration
    Flink Basics

  23. © 2017 MapR Technologies
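    Batch & Stream
    case class Word(word: String, frequency: Int)
    // DataSet API - Batch (needs: import org.apache.flink.api.scala._)
    val lines: DataSet[String] = env.readTextFile(…)
    lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
      .groupBy("word").sum("frequency")
      .print()
    // DataStream API - Streaming (needs: import org.apache.flink.streaming.api.scala._
    // and org.apache.flink.streaming.api.windowing.time.Time; env is a StreamExecutionEnvironment)
    val lines: DataStream[String] = env.socketTextStream(...)
    lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
      .keyBy("word")
      .timeWindow(Time.seconds(5), Time.seconds(1)) // Flink 1.x: 5-second window, sliding every second
      .sum("frequency")
      .print()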
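
What the 5-second window evaluated every second does to the word counts can be sketched without Flink. The following is a plain Scala simulation under stated assumptions: the names `Timed`, `windowStarts`, and `countPerWindow` are hypothetical, timestamps are non-negative milliseconds, and windows are aligned to the epoch as half-open intervals [start, start + size).

```scala
// Plain Scala sketch (no Flink): sliding-window word count over timestamped words.
object SlidingWindowSketch {
  final case class Timed(word: String, timestampMs: Long)

  // Start times of every window [start, start + size) that contains the timestamp.
  def windowStarts(ts: Long, sizeMs: Long, slideMs: Long): Seq[Long] = {
    val lastStart = ts - (ts % slideMs)        // latest window that contains ts
    lastStart until (ts - sizeMs) by -slideMs  // walk back while start > ts - size
  }

  // Count words per (windowStart, word), mirroring keyBy + sliding window + sum.
  def countPerWindow(events: Seq[Timed], sizeMs: Long, slideMs: Long): Map[(Long, String), Int] =
    events
      .flatMap(e => windowStarts(e.timestampMs, sizeMs, slideMs).map(start => (start, e.word)))
      .groupBy(identity)
      .map { case (key, hits) => key -> hits.size }

  def main(args: Array[String]): Unit = {
    val events = Seq(Timed("flink", 500), Timed("flink", 1500), Timed("kafka", 1700))
    val counts = countPerWindow(events, sizeMs = 5000, slideMs = 1000)
    counts.toSeq.sortBy(_._1).foreach(println)
  }
}
```

Each event lands in size / slide = 5 overlapping windows, which is why a sliding window costs more state than a tumbling one of the same size.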

  24. © 2017 MapR Technologies
    Stream Processing
    Source → Filter / Transform → Sink

  25. © 2017 MapR Technologies
    Flink Ecosystem
    • Source: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Twitter, Apache Bahir
    • Sink: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Elasticsearch, HDFS/MapR-FS

  26. © 2017 MapR Technologies
    Stateful Stream Processing
    Source → Filter / Transform (read/write State) → Sink
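
The source, stateful transform, and sink stages can be sketched in plain Scala (no Flink dependency). This is an illustration only: `countPerKey` is a hypothetical name, and the mutable map stands in for the keyed state that Flink would manage and checkpoint.

```scala
// Plain Scala sketch (no Flink): a transform that reads and writes keyed state.
object StatefulSketch {
  // State: running count per key. Each event reads the old count and writes the new one.
  def countPerKey(source: Iterator[String]): Iterator[(String, Int)] = {
    val state = scala.collection.mutable.Map.empty[String, Int]
    source
      .filter(_.nonEmpty)                      // filter stage
      .map { key =>                            // transform stage with state access
        val updated = state.getOrElse(key, 0) + 1
        state(key) = updated
        key -> updated
      }
  }

  def main(args: Array[String]): Unit = {
    // The list plays the role of the sink.
    val sink = countPerKey(Iterator("a", "b", "", "a", "a")).toList
    println(sink) // List((a,1), (b,1), (a,2), (a,3))
  }
}
```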

  27. © 2017 MapR Technologies
    Is Flink used?

  28. © 2017 MapR Technologies
    Powered by Flink

  29. © 2017 MapR Technologies
    10 billion events/day
    2 TB of data/day
    30 applications
    2 PB of storage and growing
    Source: Bouygues Telecom, http://berlin.flink-forward.org/wp-content/uploads/2016/07/Thomas-Lamirault_Mohamed-Amine-Abdessemed-A-brief-history-of-time-with-Apache-Flink.pdf

  30. © 2017 MapR Technologies
    Stream Processing
    Windowing

  31. © 2017 MapR Technologies
    Stream Windows

  32. © 2017 MapR Technologies
    Stream Windows

  33. © 2017 MapR Technologies
    Stream Windows

  34. © 2017 MapR Technologies
    Stream Windows

  35. © 2017 MapR Technologies
    Stream Windows

  36. © 2017 MapR Technologies
    Demonstration
    Flink Windowing

  37. © 2017 MapR Technologies
    What about it?
    Time

  38. © 2017 MapR Technologies
    Time in Flink
    • Multiple notions of “Time” in Flink
    • Event Time
    • Ingestion Time
    • Processing Time

  39. © 2017 MapR Technologies
    What Is Event-Time Processing
    [Figure: Star Wars timeline. Processing time (release year): 1977, 1980, 1983, 1999, 2002, 2005, 2015. Event time (episode): IV, V, VI, I, II, III, VII]
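
The Star Wars analogy can be written down as a small plain Scala sketch (no Flink dependency): the release year plays the role of processing time and the episode number the role of event time. The `Film` type and function names are hypothetical; the years and episodes come from the slide.

```scala
// Plain Scala sketch (no Flink): the same events ordered by processing time and by event time.
object EventTimeSketch {
  // episode = position in the story (event time);
  // released = year the film reached the audience (processing time).
  final case class Film(episode: Int, released: Int)

  val films: Seq[Film] = Seq(
    Film(4, 1977), Film(5, 1980), Film(6, 1983),
    Film(1, 1999), Film(2, 2002), Film(3, 2005), Film(7, 2015))

  def byProcessingTime: Seq[Int] = films.sortBy(_.released).map(_.episode)
  def byEventTime: Seq[Int] = films.sortBy(_.episode).map(_.episode)

  def main(args: Array[String]): Unit = {
    println(byProcessingTime) // List(4, 5, 6, 1, 2, 3, 7)
    println(byEventTime)      // List(1, 2, 3, 4, 5, 6, 7)
  }
}
```

An event-time processor has to produce the second ordering even though the events arrive in the first.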

  40. © 2017 MapR Technologies
    Time in Flink

  41. © 2017 MapR Technologies
    Complex Event Processing

  42. © 2017 MapR Technologies
    Complex Event Processing
    • Analyzing a stream of events and drawing conclusions
    • “if A and then B → infer event C”
    • Demanding requirements on stream processor
    • Low latency!
    • Exactly-once semantics & event-time support
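
The "A and then B" rule from this slide can be sketched as a tiny state machine in plain Scala (no Flink dependency). This is an illustration, not the Flink CEP library: `infer` is a hypothetical name, events are plain strings, and other events are allowed between A and B.

```scala
// Plain Scala sketch (no Flink CEP): "if A and then B, infer event C".
object FollowedBySketch {
  // Emits one "C" each time a "B" arrives after an unmatched "A".
  // Other events may occur in between (relaxed contiguity).
  def infer(events: Seq[String]): Seq[String] = {
    val (_, inferred) = events.foldLeft((false, Vector.empty[String])) {
      case ((_, acc), "A")    => (true, acc)         // remember that A was seen
      case ((true, acc), "B") => (false, acc :+ "C") // A then B: infer C and reset
      case ((seenA, acc), _)  => (seenA, acc)        // ignore everything else
    }
    inferred
  }

  def main(args: Array[String]): Unit = {
    println(infer(Seq("A", "X", "B", "B", "A", "B"))) // Vector(C, C)
    println(infer(Seq("B", "A")))                     // Vector()
  }
}
```

A real CEP engine additionally has to do this per key, under a time bound, and with out-of-order input, which is where the latency and event-time requirements listed above come from.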

  43. © 2017 MapR Technologies
    Use Case

  44. © 2017 MapR Technologies
    Order Events
    Process is reflected in a stream of order events
    Order(orderId, tStamp, “received”)
    Shipment(orderId, tStamp, “shipped”)
    Delivery(orderId, tStamp, “delivered”)
    orderId: Identifies the order
    tStamp: Time at which the event happened
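
One possible way to model these order events is sketched below in plain Scala. The field types (`Long` identifiers and millisecond timestamps), the common `Event` trait, and the `historyByOrder` helper are assumptions for illustration; the slide only names the events and their fields.

```scala
// Plain Scala sketch: a possible model for the order events (field types are assumed).
object OrderEventsSketch {
  sealed trait Event { def orderId: Long; def tStamp: Long; def status: String }
  final case class Order(orderId: Long, tStamp: Long, status: String = "received") extends Event
  final case class Shipment(orderId: Long, tStamp: Long, status: String = "shipped") extends Event
  final case class Delivery(orderId: Long, tStamp: Long, status: String = "delivered") extends Event

  // Status history per order, sorted by event time regardless of arrival order.
  def historyByOrder(events: Seq[Event]): Map[Long, Seq[String]] =
    events.groupBy(_.orderId).map { case (id, es) => id -> es.sortBy(_.tStamp).map(_.status) }

  def main(args: Array[String]): Unit = {
    val events = Seq(Shipment(1, 200), Order(1, 100), Order(2, 150), Delivery(1, 300))
    println(historyByOrder(events)(1L)) // List(received, shipped, delivered)
  }
}
```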

  45. © 2017 MapR Technologies
    Real-time Warnings

  46. © 2017 MapR Technologies
    CEP to the Rescue
    Define processing and delivery intervals (SLAs)
    ProcessSucc(orderId, tStamp, duration)
    ProcessWarn(orderId, tStamp)
    DeliverySucc(orderId, tStamp, duration)
    DeliveryWarn(orderId, tStamp)
    orderId: Identifies the order
    tStamp: Time when the event happened
    duration: Duration of the processing/delivery
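
The processing SLA can be sketched as a pure function in plain Scala (no Flink dependency). Assumptions, none of which come from the slides: timestamps are milliseconds stored as `Long`, `classify` and its parameters are hypothetical, and a warning is stamped with the moment the SLA expired.

```scala
// Plain Scala sketch (no Flink CEP): classify one order against a processing SLA.
object SlaSketch {
  final case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: Long, tStamp: Long)

  // shippedAt = None means no shipment has been seen yet; now is the current event time.
  // Returns None while the order is still within its SLA and nothing can be concluded.
  def classify(orderId: Long, receivedAt: Long, shippedAt: Option[Long],
               now: Long, slaMs: Long): Option[Either[ProcessWarn, ProcessSucc]] =
    shippedAt match {
      case Some(s) if s - receivedAt <= slaMs =>
        Some(Right(ProcessSucc(orderId, s, s - receivedAt)))  // shipped in time
      case _ if now - receivedAt > slaMs =>
        Some(Left(ProcessWarn(orderId, receivedAt + slaMs)))  // SLA expired
      case _ =>
        None                                                  // still pending
    }

  def main(args: Array[String]): Unit = {
    val oneHour = 60L * 60L * 1000L
    println(classify(1L, 0L, Some(1000L), 2000L, oneHour))
    println(classify(2L, 0L, None, oneHour + 1L, oneHour))
    println(classify(3L, 0L, None, 1000L, oneHour))
  }
}
```

The `Either` result keeps warnings and successes in one stream while still letting downstream code tell them apart.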

  47. © 2017 MapR Technologies
    CEP Example

  48. © 2017 MapR Technologies
    Processing: Order → Shipment

  49. © 2017 MapR Technologies
    Processing: Order → Shipment
    val processingPattern = Pattern
      .begin[Event]("received").subtype(classOf[Order])
      .followedBy("shipped").where(_.status == "shipped")
      .within(Time.hours(1))

  50. © 2017 MapR Technologies
    Processing: Order → Shipment
    val processingPattern = Pattern
      .begin[Event]("received").subtype(classOf[Order])
      .followedBy("shipped").where(_.status == "shipped")
      .within(Time.hours(1))
    val processingPatternStream = CEP.pattern(
      input.keyBy("orderId"),
      processingPattern)

  51. © 2017 MapR Technologies
    Processing: Order → Shipment
    val processingPattern = Pattern
      .begin[Event]("received").subtype(classOf[Order])
      .followedBy("shipped").where(_.status == "shipped")
      .within(Time.hours(1))
    val processingPatternStream = CEP.pattern(
      input.keyBy("orderId"),
      processingPattern)
    val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
      processingPatternStream.select {
        (pP, timestamp) => // Timeout handler
          ProcessWarn(pP("received").orderId, timestamp)
      } {
        fP => // Select function
          ProcessSucc(
            fP("received").orderId, fP("shipped").tStamp,
            fP("shipped").tStamp - fP("received").tStamp)
      }

  52. © 2017 MapR Technologies
    Count Delayed Shipments

  53. © 2017 MapR Technologies
    Compute Avg Processing Time
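
The slide content for these two analytics steps is not in the transcript, so the following is only a plain Scala sketch of the arithmetic, not the Flink SQL shown in the talk. It assumes results shaped like `Either[ProcessWarn, ProcessSucc]`, reduced here to `Left(orderId)` for a delayed shipment and `Right(durationMs)` for a successful one; the function names are hypothetical.

```scala
// Plain Scala sketch (no Flink): count delayed shipments and average the processing time.
object ShipmentStatsSketch {
  // Left(orderId) = delayed shipment (warning), Right(durationMs) = processed in time.
  def delayedCount(results: Seq[Either[Long, Long]]): Int =
    results.count(_.isLeft)

  // Average duration of the successful shipments, or None if there were none.
  def avgProcessingMs(results: Seq[Either[Long, Long]]): Option[Double] = {
    val durations = results.collect { case Right(d) => d }
    if (durations.isEmpty) None else Some(durations.sum.toDouble / durations.size)
  }

  def main(args: Array[String]): Unit = {
    val results: Seq[Either[Long, Long]] = Seq(Right(1000L), Left(7L), Right(3000L))
    println(delayedCount(results))    // 1
    println(avgProcessingMs(results)) // Some(2000.0)
  }
}
```

In a streaming job both aggregates would be computed per window rather than over a finite sequence.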

  54. © 2017 MapR Technologies
    Demonstration
    Streaming Analytics

  55. © 2017 MapR Technologies
    Demonstration
    • https://github.com/mapr-demos/mapr-streams-flink-demo
    • https://github.com/mapr-demos/wifi-sensor-demo
    • http://tgrall.github.io/blog/2016/10/12/getting-started-with-apache-flink-and-kafka/
    • http://tgrall.github.io/blog/2016/10/17/getting-started-with-apache-flink-and-mapr-streams/
    • more soon…

  56. © 2017 MapR Technologies
    Thanks to
    Kostas Tzoumas
    Stephan Ewen
    Fabian Hueske
    Till Rohrmann
    Jamie Grier

  57. © 2017 MapR Technologies
    Streaming Architecture
    http://mapr.com/ebooks/
    Free ebooks & Online training
    http://mapr.com/training/

  58. © 2017 MapR Technologies
    Stream Processing with Apache Flink
    Tugdual Grall
    @tgrall
