Introduction to stream processing with Apache Flink

After a quick description of event streams and stream processing, this presentation introduces Apache Flink:
- basic architecture
- sample code
- windowing and time concepts
- complex event processing (CEP)
- streaming analytics with Flink SQL

Tugdual Grall

June 29, 2017

Transcript

  1. © 2017 MapR Technologies. Introduction to Stream Processing with Apache Flink. Tugdual Grall, @tgrall
  2. {“about” : “me”} Tugdual “Tug” Grall
    • MapR: Technical Evangelist
    • MongoDB, Couchbase, eXo, Oracle
    • NantesJUG co-founder
    • @tgrall
    • http://tgrall.github.io
    • tug@mapr.com / tugdual@gmail.com
  3. MapR Converged Data Platform [diagram]
    • Open Source Engines & Tools; Commercial Engines & Applications
    • Cloud and Managed Services; Custom Apps; Unified Management and Monitoring
    • Data Processing: Search and Others
    • Web-Scale Storage: MapR-FS; Database: MapR-DB; Event Streaming: MapR Streams
    • Utility-Grade Platform Services: Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace, High Availability
  4. Streaming
    Streaming technology is enabling the obvious: continuous processing on data that is continuously produced.
    Hint: you already have streaming data.
  5. Decoupling [diagram]
    • App A, App B, App C: state managed centrally
    • App A, App B, App C: applications build their own state
  6. Event Stream = Data Pipelines

  7. Streaming and Batch [diagram: two partitions of timestamped events, from 2016-3-1 12:00 am to 2016-3-12 3:00 am]
  8. Streaming and Batch [same diagram, annotated: Stream (low latency), Stream (high latency)]
  9. Streaming and Batch [same diagram, annotated: Stream (low latency), Batch (bounded stream), Stream (high latency)]
  10. Processing
    • Request / Response
  11. Processing
    • Request / Response
    • Batch
  12. Processing
    • Request / Response
    • Batch
    • Stream Processing
  13. Processing
    • Request / Response
    • Batch
    • Stream Processing
      • Real-time reaction to events
      • Continuous applications
      • Process both real-time and historical data
  14. [slide without text]

  15. Flink Architecture
  16. Flink Architecture
    • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
  17. Flink Architecture
    • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
    • Core: Runtime (Distributed Streaming Dataflow)
  18. Flink Architecture
    • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
    • Core: Runtime (Distributed Streaming Dataflow)
    • API & Libraries: DataSet API (Batch Processing)
  19. Flink Architecture
    • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
    • Core: Runtime (Distributed Streaming Dataflow)
    • API & Libraries: DataSet API (Batch Processing) with FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational)
  20. Flink Architecture
    • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
    • Core: Runtime (Distributed Streaming Dataflow)
    • API & Libraries: DataSet API (Batch Processing) with FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational); DataStream API (Stream Processing)
  21. Flink Architecture
    • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
    • Core: Runtime (Distributed Streaming Dataflow)
    • API & Libraries: DataSet API (Batch Processing) with FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational); DataStream API (Stream Processing) with CEP (Event Processing), Table (Relational)
  22. Demonstration: Flink Basics
  23. Batch & Stream

    case class Word(word: String, frequency: Int)

    // DataSet API - Batch
    val lines: DataSet[String] = env.readTextFile(…)
    lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
      .groupBy("word").sum("frequency")
      .print()

    // DataStream API - Streaming
    val lines: DataStream[String] = env.socketTextStream(...)
    lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
      .keyBy("word")
      .timeWindow(Time.seconds(5), Time.seconds(1)) // 5-second window, sliding every second
      .sum("frequency")
      .print()
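The streaming variant on slide 23 counts words over a 5-second window that is re-evaluated every second. The sketch below illustrates that window assignment in plain Scala collections, with no Flink dependency. It is not the Flink API: `Event`, `windowsFor`, `slidingCounts` and the millisecond timestamps are hypothetical names chosen for illustration.

```scala
// Plain-Scala sketch of sliding-window counting (not the Flink API).
// All names here are hypothetical and only illustrate the idea.
object SlidingWindowSketch {
  final case class Event(word: String, timestampMs: Long)

  // Start times of every window [start, start + sizeMs) that contains ts,
  // assuming window starts are aligned to multiples of slideMs.
  def windowsFor(ts: Long, sizeMs: Long, slideMs: Long): List[Long] = {
    val lastStart = ts - (ts % slideMs)
    (lastStart until (ts - sizeMs) by -slideMs).toList
  }

  // Count occurrences of each word per window start.
  def slidingCounts(events: Seq[Event], sizeMs: Long, slideMs: Long): Map[(Long, String), Int] =
    events
      .flatMap(e => windowsFor(e.timestampMs, sizeMs, slideMs).map(start => (start, e.word)))
      .groupBy(identity)
      .map { case (key, hits) => key -> hits.size }
}
```

With a 5-second size and a 1-second slide, each event falls into five overlapping windows, which is why one word contributes to several counts.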
  24. Stream Processing: Source → Filter / Transform → Sink
  25. Flink Ecosystem
    • Source: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Twitter, Apache Bahir, …
    • Sink: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Elasticsearch, HDFS/MapR-FS, …
  26. Stateful Stream Processing: Source → Filter / Transform (State read/write) → Sink
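The stateful pipeline on slide 26 (source, a transform that reads and writes state, sink) can be sketched in plain Scala. This is a sketch under stated assumptions, not Flink's state API: `runningCounts` and the in-memory map standing in for keyed state are hypothetical.

```scala
// Plain-Scala sketch of a stateful transform (not the Flink API).
object StatefulSketch {
  // Source: an iterator of words.
  // Transform: drop empty words, then keep a running count per word.
  // Sink: the emitted (word, count) updates, collected into a list.
  def runningCounts(source: Iterator[String]): List[(String, Int)] = {
    // Hypothetical stand-in for keyed state: one counter per word.
    val state = scala.collection.mutable.Map.empty[String, Int]
    source
      .filter(_.nonEmpty)                       // filter step
      .map { word =>                            // transform step
        val updated = state.getOrElse(word, 0) + 1 // read state
        state(word) = updated                      // write state
        word -> updated
      }
      .toList                                   // sink
  }
}
```

Each output depends on the events seen before it, which is what distinguishes this from the stateless pipeline on slide 24.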
  27. Is Flink used?
  28. Powered by Flink
  29. 10 billion events/day; 2 TB of data/day; 30 applications; 2 PB of storage and growing
    Source: Bouygues Telecom, http://berlin.flink-forward.org/wp-content/uploads/2016/07/Thomas-Lamirault_Mohamed-Amine-Abdessemed-A-brief-history-of-time-with-Apache-Flink.pdf
  30. Stream Processing: Windowing
  31. Stream Windows
  32. Stream Windows
  33. Stream Windows
  34. Stream Windows
  35. Stream Windows
  36. Demonstration: Flink Windowing

  37. Time: What about it?
  38. Time in Flink
    • Multiple notions of “Time” in Flink
      • Event Time
      • Ingestion Time
      • Processing Time
  39. What Is Event-Time Processing [diagram]
    • Processing Time: 1977 Episode IV, 1980 Episode V, 1983 Episode VI, 1999 Episode I, 2002 Episode II, 2005 Episode III, 2015 Episode VII
    • Event Time: the order of the episodes
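The analogy on slide 39 can be written out as a small plain-Scala sketch: the release year plays the role of processing time (when the event was seen) and the episode number plays the role of event time (when it happened in the story). The `Episode` type and the value names are hypothetical; the years and episode numbers are those shown on the slide.

```scala
// Plain-Scala sketch of event time versus processing time (not the Flink API).
object EventTimeSketch {
  // number = event time (position in the story), releaseYear = processing time.
  final case class Episode(number: Int, releaseYear: Int)

  val episodes: List[Episode] = List(
    Episode(4, 1977), Episode(5, 1980), Episode(6, 1983),
    Episode(1, 1999), Episode(2, 2002), Episode(3, 2005),
    Episode(7, 2015)
  )

  // Order in which the events arrived.
  val byProcessingTime: List[Int] = episodes.sortBy(_.releaseYear).map(_.number)

  // Order in which the events actually happened.
  val byEventTime: List[Int] = episodes.sortBy(_.number).map(_.number)
}
```

Sorting by release year yields 4, 5, 6, 1, 2, 3, 7, while sorting by episode number yields 1 through 7: the same events, seen out of order.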
  40. Time in Flink
  41. Complex Event Processing
  42. Complex Event Processing
    • Analyzing a stream of events and drawing conclusions
      • “if A and then B → infer event C”
    • Demanding requirements on stream processor
      • Low latency!
      • Exactly-once semantics & event-time support
  43. Use Case
  44. Order Events
    Process is reflected in a stream of order events
    • Order(orderId, tStamp, “received”)
    • Shipment(orderId, tStamp, “shipped”)
    • Delivery(orderId, tStamp, “delivered”)
    orderId: Identifies the order
    tStamp: Time at which the event happened
  45. Real-time Warnings
  46. CEP to the Rescue
    Define processing and delivery intervals (SLAs)
    • ProcessSucc(orderId, tStamp, duration)
    • ProcessWarn(orderId, tStamp)
    • DeliverySucc(orderId, tStamp, duration)
    • DeliveryWarn(orderId, tStamp)
    orderId: Identifies the order
    tStamp: Time when the event happened
    duration: Duration of the processing/delivery
  47. CEP Example
  48. Processing: Order → Shipment
  49. Processing: Order → Shipment

    val processingPattern = Pattern
      .begin[Event]("received").subtype(classOf[Order])
      .followedBy("shipped").where(_.status == "shipped")
      .within(Time.hours(1))
  50. Processing: Order → Shipment

    val processingPattern = Pattern
      .begin[Event]("received").subtype(classOf[Order])
      .followedBy("shipped").where(_.status == "shipped")
      .within(Time.hours(1))

    val processingPatternStream = CEP.pattern(
      input.keyBy("orderId"),
      processingPattern)
  51. Processing: Order → Shipment

    val processingPattern = Pattern
      .begin[Event]("received").subtype(classOf[Order])
      .followedBy("shipped").where(_.status == "shipped")
      .within(Time.hours(1))

    val processingPatternStream = CEP.pattern(
      input.keyBy("orderId"),
      processingPattern)

    val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
      processingPatternStream.select {
        (pP, timestamp) => // Timeout handler
          ProcessWarn(pP("received").orderId, timestamp)
      } {
        fP => // Select function
          ProcessSucc(
            fP("received").orderId, fP("shipped").tStamp,
            fP("shipped").tStamp - fP("received").tStamp)
      }
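The logic behind the pattern on slide 51 can be approximated without Flink: for each order, a shipment within one hour yields a `ProcessSucc`, otherwise a `ProcessWarn`. The sketch below is plain Scala over a finite list of events, not the Flink CEP API. The field types, the millisecond timestamps, the `check` function, and using the deadline as the warning timestamp are all assumptions made for illustration.

```scala
// Plain-Scala sketch of the "order followed by shipment within 1 hour" check
// (not the Flink CEP API; field types and timestamps in ms are assumptions).
object OrderSlaSketch {
  sealed trait Event { def orderId: Long; def tStamp: Long; def status: String }
  final case class Order(orderId: Long, tStamp: Long, status: String = "received") extends Event
  final case class Shipment(orderId: Long, tStamp: Long, status: String = "shipped") extends Event

  final case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: Long, tStamp: Long)

  val OneHourMs: Long = 60L * 60L * 1000L

  // One result per order, sorted by orderId: Right on success, Left on timeout.
  def check(events: Seq[Event]): List[Either[ProcessWarn, ProcessSucc]] =
    events.groupBy(_.orderId).toList.sortBy(_._1).flatMap { case (orderId, perOrder) =>
      perOrder.collectFirst { case o: Order => o }.map { order =>
        val deadline = order.tStamp + OneHourMs
        // First shipment that arrives after the order and before the deadline.
        val shipped = perOrder.collectFirst {
          case s: Shipment if s.tStamp >= order.tStamp && s.tStamp <= deadline => s
        }
        val result: Either[ProcessWarn, ProcessSucc] = shipped match {
          case Some(s) => Right(ProcessSucc(orderId, s.tStamp, s.tStamp - order.tStamp))
          case None    => Left(ProcessWarn(orderId, deadline)) // assumed warning time
        }
        result
      }.toList
    }
}
```

A streaming engine has to produce the same answers incrementally and decide when the hour has elapsed, which is where event-time support comes in.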
  52. Count Delayed Shipments
  53. Compute Avg Processing Time
  54. Demonstration: Streaming Analytics
  55. Demonstration
    • https://github.com/mapr-demos/mapr-streams-flink-demo
    • https://github.com/mapr-demos/wifi-sensor-demo
    • http://tgrall.github.io/blog/2016/10/12/getting-started-with-apache-flink-and-kafka/
    • http://tgrall.github.io/blog/2016/10/17/getting-started-with-apache-flink-and-mapr-streams/
    • more soon…
  56. Thanks to Kostas Tzoumas, Stephan Ewen, Fabian Hueske, Till Rohrmann, Jamie Grier
  57. Streaming Architecture
    • Free ebooks: http://mapr.com/ebooks/
    • Online training: http://mapr.com/training/
  58. Stream Processing with Apache Flink. Tugdual Grall, @tgrall