
Introduction to stream processing with Apache Flink

After a quick description of event streams and stream processing, this presentation introduces Apache Flink:
- basic architecture
- sample code
- windowing and time concepts
- complex event processing (CEP)
- streaming analytics with Flink SQL

Tugdual Grall

June 29, 2017

Transcript

  1. © 2017 MapR Technologies MapR Confidential
    Introduction to Stream Processing with Apache Flink
    Tugdual Grall @tgrall
  2. {"about" : "me"}
    Tugdual "Tug" Grall
    • MapR: Technical Evangelist
    • MongoDB, Couchbase, eXo, Oracle
    • NantesJUG co-founder
    • @tgrall
    • http://tgrall.github.io
  3. [Figure: MapR Converged Data Platform. Labels: Open Source Engines & Tools; Commercial Engines & Applications; Custom Apps; Data Processing; Utility-Grade Platform Services; Web-Scale Storage (MapR-FS); Database (MapR-DB); Event Streaming (MapR Streams); Search and Others; Real Time; Unified Security; Multi-tenancy; Disaster Recovery; Global Namespace; High Availability; Unified Management and Monitoring; Cloud and Managed Services]
  4. Streaming
    Streaming technology is enabling the obvious: continuous processing on data that is continuously produced.
    Hint: you already have streaming data
  5. Decoupling
    [Figure: Apps A, B and C shown twice: once with state managed centrally, once where the applications build their own state]
  8. Streaming and Batch
    [Figure: a timeline of events stored in time-based partitions, from 2016-3-1 12:00 am to 2016-3-12 3:00 am, with spans labeled "Stream (low latency)", "Batch (bounded stream)" and "Stream (high latency)"]
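
The idea on this slide, that a batch is just a bounded stream, can be sketched in plain Scala without Flink. This is an illustration only: the `Event` type, its `hour` and `value` fields and the running sum are hypothetical. One incremental computation runs either over an unbounded iterator (stream) or over a bounded slice of the same log (batch), and the batch result is simply the last value the stream emits once its input ends.

```scala
object BoundedStreamSketch {
  // Hypothetical log entry: the hour partition it belongs to and a value to aggregate.
  final case class Event(hour: Int, value: Int)

  // Streaming view: emit an updated total after every event.
  // Works on unbounded input because it never waits for the end.
  def runningTotals(events: Iterator[Event]): Iterator[Int] =
    events.scanLeft(0)(_ + _.value).drop(1)

  // Batch view: a bounded slice [fromHour, untilHour) of the same log.
  // Its result is the last value the streaming computation emits (0 if empty).
  def batch(log: Seq[Event], fromHour: Int, untilHour: Int): Int = {
    val bounded = log.iterator.filter(e => e.hour >= fromHour && e.hour < untilHour)
    runningTotals(bounded).foldLeft(0)((_, latest) => latest)
  }

  def main(args: Array[String]): Unit = {
    val log = Seq(Event(0, 1), Event(1, 2), Event(2, 3))
    println(batch(log, 0, 2))                   // total of hours 0 and 1
    println(runningTotals(log.iterator).toList) // running totals over the whole log
    val unbounded = Iterator.from(0).map(h => Event(h, 1))
    println(runningTotals(unbounded).take(3).toList) // first results of an endless stream
  }
}
```

`runningTotals` never needs its input to end, which is what lets it run on `Iterator.from(0)`; `batch` only terminates because its slice is finite.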
  9. Processing
    • Request / Response
    • Batch
    • Stream Processing
    • Real-time reaction to events
    • Continuous applications
    • Process both real-time and historical data
  15. Flink Architecture
    • Deployment: Local (single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
    • Core: Runtime (distributed streaming dataflow)
    • APIs & Libraries:
      • DataSet API (batch processing), with FlinkML (machine learning), Gelly (graph processing) and Table (relational)
      • DataStream API (stream processing), with CEP (event processing) and Table (relational)
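  16. Batch & Stream
    case class Word(word: String, frequency: Int)

    // DataSet API - Batch (separate program)
    import org.apache.flink.api.scala._
    val env = ExecutionEnvironment.getExecutionEnvironment
    val lines: DataSet[String] = env.readTextFile(…)
    lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
      .groupBy("word").sum("frequency")
      .print() // print() also triggers execution of a DataSet program

    // DataStream API - Streaming (separate program)
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows
    import org.apache.flink.streaming.api.windowing.time.Time
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val lines: DataStream[String] = env.socketTextStream(...)
    lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
      .keyBy("word")
      .window(SlidingProcessingTimeWindows.of(Time.seconds(5), Time.seconds(1))) // 5 s windows, every 1 s
      .sum("frequency")
      .print()
    env.execute() // a streaming job only starts on execute()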
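
To make the windowed word count concrete without running a Flink job, here is a plain-Scala sketch of how a 5-second window sliding every second assigns each line to several overlapping windows. It is not Flink code: timestamps are assumed to be non-negative seconds attached to each line, windows are aligned to multiples of the slide (similar to Flink's sliding window assigners with a zero offset), and `windowStarts` and `slidingCounts` are hypothetical names.

```scala
object SlidingWordCountSketch {
  // Start times of all sliding windows [start, start + size) that contain t.
  // Windows are aligned to multiples of `slide` (zero offset); t >= 0 assumed.
  def windowStarts(t: Long, size: Long, slide: Long): Seq[Long] = {
    val lastStart = t - (t % slide)
    lastStart until (t - size) by (-slide) // descending, stops before t - size
  }

  // Word counts per (window start, word) for timestamped lines.
  def slidingCounts(lines: Seq[(Long, String)], size: Long, slide: Long): Map[(Long, String), Int] =
    lines
      .flatMap { case (t, line) =>
        for {
          start <- windowStarts(t, size, slide)
          word <- line.split(" ").toSeq
          if word.nonEmpty
        } yield (start, word)
      }
      .groupBy(identity)
      .map { case (key, occurrences) => key -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // Two lines, at second 0 and second 3; 5-second windows sliding every second.
    val counts = slidingCounts(Seq(0L -> "a b", 3L -> "a"), size = 5L, slide = 1L)
    counts.toSeq.sortBy(_._1).foreach(println)
  }
}
```

Each line lands in `size / slide` windows (five here), which is why a sliding window does more work than a tumbling window of the same size.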
  17. Flink Ecosystem
    • Sources: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Twitter, Apache Bahir, …
    • Sinks: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Elasticsearch, HDFS/MapR-FS, …
  18. 10 Billion events/day • 2Tb of data/day • 30 Applications • 2Pb of storage and growing
    Source: Bouygues Telecom, http://berlin.flink-forward.org/wp-content/uploads/2016/07/Thomas-Lamirault_Mohamed-Amine-Abdessemed-A-brief-history-of-time-with-Apache-Flink.pdf
  19. Time in Flink
    • Multiple notions of "Time" in Flink
    • Event Time
    • Ingestion Time
    • Processing Time
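
The practical difference between these notions shows up in windowing. The plain-Scala sketch below (not Flink code; the `Reading` type and both of its timestamps are hypothetical) counts the same three readings in 10-second tumbling windows, once by event time and once by processing time. Reading `c` happened at second 8 but reaches the operator at second 12, so which window it lands in depends on the clock used. Ingestion time is not modelled here.

```scala
object TimeNotionsSketch {
  // Hypothetical reading with two clocks (seconds):
  // eventTime = when it happened, processingTime = when the operator saw it.
  final case class Reading(id: String, eventTime: Long, processingTime: Long)

  // Start of the tumbling window of the given size that contains t (t >= 0).
  def windowStart(t: Long, size: Long): Long = t - (t % size)

  // Count readings per tumbling window, using whichever clock `time` selects.
  def countsBy(readings: Seq[Reading], size: Long)(time: Reading => Long): Map[Long, Int] =
    readings
      .groupBy(r => windowStart(time(r), size))
      .map { case (start, rs) => start -> rs.size }

  def main(args: Array[String]): Unit = {
    val readings = Seq(
      Reading("a", eventTime = 1, processingTime = 2),
      Reading("b", eventTime = 9, processingTime = 11),
      Reading("c", eventTime = 8, processingTime = 12) // delayed on its way in
    )
    println(countsBy(readings, 10)(_.eventTime))      // all three in window [0, 10)
    println(countsBy(readings, 10)(_.processingTime)) // split across [0, 10) and [10, 20)
  }
}
```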
  20. What Is Event-Time Processing
    [Figure: the Star Wars films in release order (Episode IV, V, VI, I, II, III, VII), with release years 1977, 1980, 1983, 1999, 2002, 2005, 2015 labeled "Processing Time" and episode numbers labeled "Event Time"]
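
Using only the episode numbers and release years from this slide, a small plain-Scala sketch (all names are hypothetical) shows why event-time processing needs more than arrival order: sorting by processing time gives IV, V, VI, I, II, III, VII, sorting by event time gives I to VII, and three films "arrive" after a film that comes later in the story.

```scala
object StarWarsTimeSketch {
  // From the slide: (episode number, release year).
  // Episode number plays the role of event time, release year of processing time.
  val films: Seq[(Int, Int)] =
    Seq(4 -> 1977, 5 -> 1980, 6 -> 1983, 1 -> 1999, 2 -> 2002, 3 -> 2005, 7 -> 2015)

  // Order in which episodes arrived (processing time).
  def processingOrder: Seq[Int] = films.sortBy(_._2).map(_._1)

  // Order in which the story happens (event time).
  def eventOrder: Seq[Int] = films.sortBy(_._1).map(_._1)

  // Episodes that arrived after an episode set later in the story,
  // i.e. out-of-order events that event-time processing must put back in place.
  def outOfOrder: Seq[Int] = {
    val arrived = processingOrder
    arrived.zipWithIndex.collect { case (ep, i) if arrived.take(i).exists(_ > ep) => ep }
  }

  def main(args: Array[String]): Unit = {
    println(processingOrder)
    println(eventOrder)
    println(outOfOrder)
  }
}
```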
  21. Complex Event Processing
    • Analyzing a stream of events and drawing conclusions
    • "if A and then B → infer event C"
    • Demanding requirements on the stream processor
    • Low latency!
    • Exactly-once semantics & event-time support
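
The "if A and then B → infer event C" rule can be illustrated with a tiny state machine in plain Scala. This is a sketch under simplifying assumptions, not Flink CEP: the oldest pending A is paired with the next B, unrelated events in between (the hypothetical `X`) are skipped, a second A while one is pending is ignored, and there is no time limit.

```scala
object InferenceSketch {
  sealed trait Ev
  case object A extends Ev
  case object B extends Ev
  case object X extends Ev // hypothetical unrelated event
  // Inferred event C, remembering the positions of the A and B that produced it.
  final case class C(aIndex: Int, bIndex: Int)

  // "if A and then B, infer C": fold over the stream, keeping at most one pending A.
  def infer(events: Seq[Ev]): Seq[C] =
    events.zipWithIndex
      .foldLeft((Option.empty[Int], Vector.empty[C])) {
        case ((pending, out), (A, i)) => (pending.orElse(Some(i)), out)
        case ((Some(a), out), (B, i)) => (None, out :+ C(a, i))
        case (state, _)               => state // B without a pending A, or X
      }
      ._2

  def main(args: Array[String]): Unit =
    println(infer(Seq(A, X, B, B, A, B)))
}
```

Skipping `X` between A and B is the same idea as relaxed contiguity in a `followedBy` pattern; a real CEP engine would also track several partial matches and enforce a time window.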
  22. Order Events
    The order process is reflected in a stream of order events:
    • Order(orderId, tStamp, "received")
    • Shipment(orderId, tStamp, "shipped")
    • Delivery(orderId, tStamp, "delivered")
    orderId: identifies the order
    tStamp: time at which the event happened
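
The CEP code on slide 26 relies on a common `Event` supertype with `orderId`, `tStamp` and `status`, and on an `Order` subtype. A minimal plain-Scala model of these events might look like this; the field types (`String` ids, `Long` millisecond timestamps) and the default `status` values are assumptions, and `byOrder` is a hypothetical helper that groups each order's events together, as keying the stream by `orderId` does.

```scala
object OrderEventsSketch {
  // Common supertype assumed by the CEP code (orderId, tStamp, status).
  sealed trait Event {
    def orderId: String
    def tStamp: Long // assumed: epoch milliseconds of when the event happened
    def status: String
  }
  final case class Order(orderId: String, tStamp: Long, status: String = "received") extends Event
  final case class Shipment(orderId: String, tStamp: Long, status: String = "shipped") extends Event
  final case class Delivery(orderId: String, tStamp: Long, status: String = "delivered") extends Event

  // Hypothetical helper: group events per order (like keying by orderId);
  // sorting by tStamp is an extra step added here for readability.
  def byOrder(events: Seq[Event]): Map[String, Seq[Event]] =
    events.groupBy(_.orderId).map { case (id, es) => id -> es.sortBy(_.tStamp) }

  def main(args: Array[String]): Unit = {
    val events = Seq(Shipment("o1", 30L), Order("o2", 5L), Order("o1", 10L), Delivery("o1", 90L))
    byOrder(events).toSeq.sortBy(_._1).foreach { case (id, es) => println(s"$id: ${es.map(_.status)}") }
  }
}
```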
  23. CEP to the Rescue
    Define processing and delivery intervals (SLAs):
    • ProcessSucc(orderId, tStamp, duration)
    • ProcessWarn(orderId, tStamp)
    • DeliverySucc(orderId, tStamp, duration)
    • DeliveryWarn(orderId, tStamp)
    orderId: identifies the order
    tStamp: time when the event happened
    duration: duration of the processing/delivery
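
Each SLA outcome is a small record. Below is a plain-Scala sketch of the four result types and a hypothetical `checkDelivery` helper that turns a shipment time and an optional delivery time into either a `DeliverySucc` or a `DeliveryWarn`. The deck does not state a delivery SLA, so the limit is a parameter; using `shipped + slaMillis` as the warning timestamp and the 24-hour value in `main` are assumptions for illustration.

```scala
object SlaSketch {
  // The four SLA results from the slide (field types are assumptions).
  final case class ProcessSucc(orderId: String, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: String, tStamp: Long)
  final case class DeliverySucc(orderId: String, tStamp: Long, duration: Long)
  final case class DeliveryWarn(orderId: String, tStamp: Long)

  // Hypothetical helper: classify one shipment against a delivery SLA.
  // `delivered` is None when no Delivery event has arrived.
  def checkDelivery(orderId: String, shipped: Long, delivered: Option[Long], slaMillis: Long): Either[DeliveryWarn, DeliverySucc] =
    delivered match {
      case Some(d) if d - shipped <= slaMillis =>
        Right(DeliverySucc(orderId, d, d - shipped)) // on time: report the duration
      case _ =>
        Left(DeliveryWarn(orderId, shipped + slaMillis)) // late or missing: warn at the deadline
    }

  def main(args: Array[String]): Unit = {
    val sla = 24L * 60 * 60 * 1000 // assumed 24-hour delivery SLA, for illustration only
    println(checkDelivery("o1", 0L, Some(3600000L), sla))
    println(checkDelivery("o2", 0L, None, sla))
  }
}
```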
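  26. Processing: Order → Shipment
    import org.apache.flink.cep.scala.CEP
    import org.apache.flink.cep.scala.pattern.Pattern
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    // input: DataStream[Event]; Order is a subtype of Event
    val processingPattern = Pattern
      .begin[Event]("received").subtype(classOf[Order])
      .followedBy("shipped").where(_.status == "shipped")
      .within(Time.hours(1))
    val processingPatternStream = CEP.pattern(input.keyBy("orderId"), processingPattern)
    // Since Flink 1.3 each pattern name maps to an Iterable of events, hence .head
    val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
      processingPatternStream.select { (pP, timestamp) => // Timeout handler
        ProcessWarn(pP("received").head.orderId, timestamp)
      } { fP => // Select function
        ProcessSucc(
          fP("received").head.orderId,
          fP("shipped").head.tStamp,
          fP("shipped").head.tStamp - fP("received").head.tStamp)
      }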
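
The behaviour of the "received" followed by "shipped" pattern with a one-hour `within` can be simulated in plain Scala to see when each side of the `Either` is produced. This sketch is not how Flink CEP works internally: it assumes a finite batch of events, a `watermark` value standing in for event-time progress, an inclusive one-hour bound, and `received.tStamp + within` as the timeout timestamp; `evaluate` is a hypothetical name.

```scala
object ProcessingPatternSketch {
  final case class Event(orderId: String, tStamp: Long, status: String)
  final case class ProcessSucc(orderId: String, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: String, tStamp: Long)

  val OneHour: Long = 60L * 60 * 1000 // milliseconds

  // Per order: "received" followed by "shipped" within `within` ms.
  // `watermark` stands in for event-time progress: a pending order whose
  // deadline is at or before it is reported as timed out.
  def evaluate(events: Seq[Event], watermark: Long, within: Long = OneHour): Seq[Either[ProcessWarn, ProcessSucc]] =
    events.groupBy(_.orderId).toSeq.sortBy(_._1).flatMap { case (id, es) =>
      es.find(_.status == "received").flatMap { received =>
        val deadline = received.tStamp + within
        val shipped = es.find(e => e.status == "shipped" && e.tStamp >= received.tStamp && e.tStamp <= deadline)
        val result: Option[Either[ProcessWarn, ProcessSucc]] = shipped match {
          case Some(s)                       => Some(Right(ProcessSucc(id, s.tStamp, s.tStamp - received.tStamp)))
          case None if watermark >= deadline => Some(Left(ProcessWarn(id, deadline)))
          case None                          => None // still waiting: neither matched nor timed out
        }
        result
      }
    }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event("o1", 0L, "received"),
      Event("o2", 0L, "received"),
      Event("o1", 1000L, "shipped"),
      Event("o3", 5000000L, "received"))
    evaluate(events, watermark = 2 * OneHour).foreach(println)
  }
}
```

Orders that are neither shipped nor past their deadline produce nothing yet, mirroring a partial match that is still waiting for more events or for time to advance.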
  27. Demonstration
    • https://github.com/mapr-demos/mapr-streams-flink-demo
    • https://github.com/mapr-demos/wifi-sensor-demo
    • http://tgrall.github.io/blog/2016/10/12/getting-started-with-apache-flink-and-kafka/
    • http://tgrall.github.io/blog/2016/10/17/getting-started-with-apache-flink-and-mapr-streams/
    • more soon…