Introduction to Streaming with Apache Flink

After a quick description of event streams and stream processing, this presentation introduces Apache Flink:
- basic architecture
- sample code
- windowing and time concepts
- complex event processing (CEP)

This presentation was delivered at Devoxx France 2017.

Tugdual Grall

April 06, 2017

Transcript

  1. #DevoxxFR Stream Processing with Apache Flink
     Tugdual “Tug” Grall, Technical Evangelist @ MapR
     tug@mapr.com, @tgrall
  2. {“about” : “me”}
     Tugdual “Tug” Grall
     • MapR: Technical Evangelist
     • MongoDB, Couchbase, eXo, Oracle
     • NantesJUG co-founder
     • @tgrall
     • http://tgrall.github.io
     • tug@mapr.com / tugdual@gmail.com
  3. MapR Converged Data Platform (architecture diagram; labels only)
     • Open Source Engines & Tools, Commercial Engines & Applications, Cloud and Managed Services, Custom Apps
     • Data Processing, Search and Others, Unified Management and Monitoring
     • Web-Scale Storage, Database, Event Streaming: MapR-FS, MapR-DB, MapR Streams
     • Enterprise-Grade Platform Services: Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace, High Availability
     • APIs: HDFS API, POSIX, NFS, HBase API, JSON API, Kafka API
  4. Streaming technology is enabling the obvious: continuous processing on data that is continuously produced
     Hint: you already have streaming data
  5. Decoupling
     • App A, App B, App C: state managed centrally
     • App A, App B, App C: applications build their own state
  6. Event Stream = Data Pipelines

  7. Streaming and Batch
     (diagram: a timeline of hourly timestamps from 2016-3-1 12:00 am to 2016-3-12 3:00 am …, split into partitions)
  8. Streaming and Batch
     (same diagram, labeled: Stream (low latency), Stream (high latency))
  9. Streaming and Batch
     (same diagram, labeled: Stream (low latency), Batch (bounded stream), Stream (high latency))
  10. Processing
      • Request / Response
  11. Processing
      • Request / Response
      • Batch
  12. Processing
      • Request / Response
      • Batch
      • Stream Processing
  13. Processing
      • Request / Response
      • Batch
      • Stream Processing
        • Real-time reaction to events
        • Continuous applications
        • Process both real-time and historical data
  14. #DevoxxFR 14

  15. Flink Architecture
  16. Flink Architecture
      • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
  17. Flink Architecture
      • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
      • Core: Runtime (Distributed Streaming Dataflow)
  18. • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
      • Core: Runtime (Distributed Streaming Dataflow)
      • API & Libraries: DataSet API (Batch Processing)
  19. Flink Architecture
      • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
      • Core: Runtime (Distributed Streaming Dataflow)
      • API & Libraries: DataSet API (Batch Processing) with FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational)
  20. Flink Architecture
      • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
      • Core: Runtime (Distributed Streaming Dataflow)
      • API & Libraries: DataSet API (Batch Processing) with FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational); DataStream API (Stream Processing)
  21. Flink Architecture
      • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
      • Core: Runtime (Distributed Streaming Dataflow)
      • API & Libraries: DataSet API (Batch Processing) with FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational); DataStream API (Stream Processing) with CEP (Event Processing), Table (Relational)
  22. Demonstration: Flink Basics

  23. Batch & Stream
      case class Word(word: String, frequency: Int)

      // DataSet API - Batch
      val lines: DataSet[String] = env.readTextFile(…)
      lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
        .groupBy("word").sum("frequency")
        .print()

      // DataStream API - Streaming
      // Calls are kept as shown on the slide; Flink 1.x spells them
      // socketTextStream(host, port) and timeWindow(Time.seconds(5), Time.seconds(1)).
      val lines: DataStream[String] = env.fromSocketStream(...)
      lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
        .keyBy("word").window(Time.of(5, SECONDS))
        .every(Time.of(1, SECONDS)).sum("frequency")
        .print()
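
The streaming variant on slide 23 sums word frequencies over a 5-second window that is evaluated every second. As a rough illustration of what such a sliding window computes, here is a plain-Scala sketch that does not use Flink at all. `Timed`, `slidingCounts`, and the sample data are hypothetical names introduced for this example, and timestamps are assumed to be non-negative milliseconds.

```scala
object SlidingWordCount {
  // A word tagged with its arrival time in milliseconds (hypothetical type).
  final case class Timed(word: String, timestampMs: Long)

  /** Word counts per sliding window.
    * Windows are [start, start + sizeMs), with starts aligned to multiples of slideMs.
    * Returns window start -> (word -> count); empty windows are not present.
    */
  def slidingCounts(events: Seq[Timed], sizeMs: Long, slideMs: Long): Map[Long, Map[String, Int]] = {
    val perWindow = for {
      e <- events
      // latest aligned window start that is <= the event timestamp
      lastStart = e.timestampMs - (e.timestampMs % slideMs)
      // every aligned start whose window still covers the event
      start <- (lastStart - sizeMs + slideMs) to lastStart by slideMs
      if start >= 0
    } yield (start, e.word)

    perWindow
      .groupBy { case (start, _) => start }
      .map { case (start, pairs) =>
        start -> pairs.groupBy(_._2).map { case (w, ws) => w -> ws.size }
      }
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(Timed("flink", 500), Timed("stream", 1200), Timed("flink", 4800))
    slidingCounts(events, sizeMs = 5000, slideMs = 1000).toSeq.sortBy(_._1).foreach {
      case (start, counts) => println(s"[$start, ${start + 5000}) -> $counts")
    }
  }
}
```

Each event lands in several overlapping windows (five here, since the size is five times the slide), which is why a sliding window emits a fresh result every second while still covering the last five seconds of data.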
  24. Stream Processing
      Source → Filter / Transform → Sink

  25. Flink Ecosystem
      • Source: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Twitter, Apache Bahir, …
      • Sink: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Elasticsearch, HDFS/MapR-FS, …
  26. Stateful Stream Processing
      Source → Filter / Transform (read/write State) → Sink
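
To make the source, filter/transform with state, and sink stages of slide 26 concrete, here is a minimal plain-Scala sketch. It does not use Flink: `Reading`, `newMaxima`, and the sample values are hypothetical, and the state is a local mutable map, whereas a stream processor such as Flink manages keyed state on the application's behalf.

```scala
object StatefulPipeline {
  // Hypothetical event type for this sketch.
  final case class Reading(sensor: String, value: Double)

  /** Filter / Transform step with per-key state: tracks the running maximum
    * per sensor and lets a reading through only when it sets a new maximum.
    */
  def newMaxima(source: Iterator[Reading]): Iterator[Reading] = {
    // Keyed state: sensor -> highest value seen so far.
    val state = scala.collection.mutable.Map.empty[String, Double]
    source
      .filter(r => !r.value.isNaN) // stateless filter: drop invalid readings
      .filter { r =>
        val isNewMax = state.get(r.sensor).forall(r.value > _) // read state
        if (isNewMax) state(r.sensor) = r.value                 // write state
        isNewMax
      }
  }

  def main(args: Array[String]): Unit = {
    // Source: a fixed in-memory iterator standing in for a real stream.
    val source = Iterator(Reading("a", 1.0), Reading("b", 5.0), Reading("a", 0.5), Reading("a", 2.0))
    // Sink: print each emitted reading.
    newMaxima(source).foreach(println)
  }
}
```

The second filter is what makes the pipeline stateful: its decision for one event depends on earlier events with the same key.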
  27. Is Flink used?

  28. Powered by Flink

  29. • 10 billion events/day
      • 2 TB of data/day
      • 30 applications
      • 2 PB of storage and growing
      Source: Bouygues Telecom, http://berlin.flink-forward.org/wp-content/uploads/2016/07/Thomas-Lamirault_Mohamed-Amine-Abdessemed-A-brief-history-of-time-with-Apache-Flink.pdf
  30. Stream Processing: Windowing

  31. Stream Windows

  32. Stream Windows

  33. Stream Windows

  34. Stream Windows

  35. Stream Windows

  36. Demonstration: Flink Windowing

  37. Time: What about it?

  38. Demonstration
      • Multiple notions of “Time” in Flink
        • Event Time
        • Ingestion Time
        • Processing Time
  39. What Is Event-Time Processing
      • Processing Time: 1977, 1980, 1983, 1999, 2002, 2005, 2015
      • Event Time: Episode IV, Episode V, Episode VI, Episode I, Episode II, Episode III, Episode VII
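
The difference between the two orderings on slide 39 can be sketched in a few lines of plain Scala. The years and episode numbers come from the slide; pairing them position by position is an assumption based on the order in which they appear, and `Episode`, `byProcessingTime`, and `byEventTime` are hypothetical names.

```scala
object EventVsProcessingTime {
  // processingTime: the year from the slide; eventTime: the episode number.
  final case class Episode(processingTime: Int, eventTime: Int)

  // Years paired with episodes in slide order (assumed pairing).
  val arrivals: Seq[Episode] = Seq(
    Episode(1977, 4), Episode(1980, 5), Episode(1983, 6),
    Episode(1999, 1), Episode(2002, 2), Episode(2005, 3),
    Episode(2015, 7))

  // Order in which a processor sees the events if it goes by arrival.
  def byProcessingTime(es: Seq[Episode]): Seq[Int] =
    es.sortBy(_.processingTime).map(_.eventTime)

  // Order the events have according to the timestamp they carry.
  def byEventTime(es: Seq[Episode]): Seq[Int] =
    es.sortBy(_.eventTime).map(_.eventTime)

  def main(args: Array[String]): Unit = {
    println(s"processing-time order: ${byProcessingTime(arrivals).mkString(", ")}")
    println(s"event-time order:      ${byEventTime(arrivals).mkString(", ")}")
  }
}
```

Processing-time order yields 4, 5, 6, 1, 2, 3, 7, while event-time order yields 1 through 7: the same events, arranged differently depending on which notion of time is used.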
  40. Time in Flink

  41. Complex Event Processing

  42. Complex Event Processing
      • Analyzing a stream of events and drawing conclusions
        • “if A and then B → infer event C”
      • Demanding requirements on stream processor
        • Low latency!
        • Exactly-once semantics & event-time support
  43. Stream Windows

  44. Order Events
      Process is reflected in a stream of order events
      • Order(orderId, tStamp, “received”)
      • Shipment(orderId, tStamp, “shipped”)
      • Delivery(orderId, tStamp, “delivered”)
      orderId: Identifies the order
      tStamp: Time at which the event happened
  45. Real-time Warnings

  46. CEP to the Rescue
      Define processing and delivery intervals (SLAs)
      • ProcessSucc(orderId, tStamp, duration)
      • ProcessWarn(orderId, tStamp)
      • DeliverySucc(orderId, tStamp, duration)
      • DeliveryWarn(orderId, tStamp)
      orderId: Identifies the order
      tStamp: Time when the event happened
      duration: Duration of the processing/delivery
  47. CEP Example

  48. Processing: Order → Shipment

  49. Processing: Order → Shipment
      val processingPattern = Pattern
        .begin[Event]("received").subtype(classOf[Order])
        .followedBy("shipped").where(_.status == "shipped")
        .within(Time.hours(1))
  50. Processing: Order → Shipment
      val processingPattern = Pattern
        .begin[Event]("received").subtype(classOf[Order])
        .followedBy("shipped").where(_.status == "shipped")
        .within(Time.hours(1))

      val processingPatternStream = CEP.pattern(
        input.keyBy("orderId"),
        processingPattern)
  51. Processing: Order → Shipment
      val processingPattern = Pattern
        .begin[Event]("received").subtype(classOf[Order])
        .followedBy("shipped").where(_.status == "shipped")
        .within(Time.hours(1))

      val processingPatternStream = CEP.pattern(
        input.keyBy("orderId"),
        processingPattern)

      val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
        processingPatternStream.select {
          (pP, timestamp) => // Timeout handler
            ProcessWarn(pP("received").orderId, timestamp)
        } {
          fP => // Select function
            ProcessSucc(
              fP("received").orderId, fP("shipped").tStamp,
              fP("shipped").tStamp - fP("received").tStamp)
        }
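
The pattern on slides 49 to 51 pairs each order with a shipment that follows within one hour and reports a warning otherwise. The plain-Scala sketch below reproduces that logic over a finished list of events, without Flink or its CEP library. The event and result types mirror the names on the slides, but their exact fields, the `process` function, and the choice of the deadline as the warning timestamp are assumptions made for this illustration.

```scala
object OrderShipmentSketch {
  sealed trait Event { def orderId: Long; def tStamp: Long }
  final case class Order(orderId: Long, tStamp: Long) extends Event
  final case class Shipment(orderId: Long, tStamp: Long) extends Event

  final case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: Long, tStamp: Long)

  val OneHourMs: Long = 60L * 60L * 1000L

  /** For every order, find the first shipment with the same orderId that
    * happens before the deadline (order time + withinMs).
    * Right = shipped in time, Left = timed out.
    */
  def process(events: Seq[Event], withinMs: Long = OneHourMs): Seq[Either[ProcessWarn, ProcessSucc]] = {
    // Equivalent of keyBy("orderId"): group shipments per order.
    val shipments = events.collect { case s: Shipment => s }.groupBy(_.orderId)
    events.collect { case o: Order => o }.map { o =>
      val deadline = o.tStamp + withinMs
      shipments.getOrElse(o.orderId, Nil)
        .filter(s => s.tStamp >= o.tStamp && s.tStamp < deadline)
        .sortBy(_.tStamp)
        .headOption match {
          case Some(s) => Right(ProcessSucc(o.orderId, s.tStamp, s.tStamp - o.tStamp))
          case None    => Left(ProcessWarn(o.orderId, deadline)) // assumed: warn at the deadline
        }
    }
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(Order(1, 0), Order(2, 0), Shipment(1, 30 * 60 * 1000L), Shipment(2, 2 * OneHourMs))
    process(events).foreach(println)
  }
}
```

Unlike this batch-style sketch, a CEP engine evaluates the pattern incrementally as events arrive and relies on event time to decide when the one-hour window has expired.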
  52. Count Delayed Shipments

  53. Compute Avg Processing Time

  54. The End
      • Process events in real time and/or batch
      • Complex Event Processing (CEP)
      • Many other things to discover
        • Deployment
        • High Availability
        • Table/Relational API
        • …
      https://mapr.com/ebooks/
  55. Flink Community & Thanks to
      Kostas Tzoumas, Stephan Ewen, Fabian Hueske, Till Rohrmann, Jamie Grier
  56. Stream Processing with Apache Flink
      Tugdual “Tug” Grall, Technical Evangelist @ MapR
      tug@mapr.com, @tgrall