Slide 1
Slide 1 text
Large scale stream processing with Apache Flink. Nikolay Stoitsev, Sr. Software Engineer at Uber Tech Sofia
Slide 2
Slide 2 text
Stream Processing?
Slide 3
Slide 3 text
Stream Processing? User Interaction Logs
Slide 4
Slide 4 text
Stream Processing? User Interaction Logs Application Logs
Slide 5
Slide 5 text
Stream Processing? User Interaction Logs Application Logs Sensor Data
Slide 6
Slide 6 text
Stream Processing? User Interaction Logs Application Logs Sensor Data Database Commit Logs
Slide 7
Slide 7 text
Infinite Dataset
Slide 8
Slide 8 text
Producer Stream
Slide 9
Slide 9 text
Producer Stream HDFS
Slide 10
Slide 10 text
Producer Stream HDFS Hive
Slide 11
Slide 11 text
Producer Stream HDFS Hive Big Latency
Slide 12
Slide 12 text
Producer Stream HDFS Real-time service
Slide 13
Slide 13 text
Apache Storm storm.apache.org
Slide 14
Slide 14 text
High-latency & accurate vs. Low-latency & approximation
Slide 15
Slide 15 text
Lambda architecture
Slide 16
Slide 16 text
https://www.oreilly.com/ideas/questioning-the-lambda-architecture
Slide 17
Slide 17 text
Kappa Architecture
Slide 18
Slide 18 text
Use Apache Kafka Durable, scalable, fault-tolerant
Slide 19
Slide 19 text
Producer Kafka Stream Processor
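A minimal sketch of the Kappa idea on this slide, assuming an in-memory list as a stand-in for a durable Kafka topic. The `Log` class and `count_by_key` function are hypothetical names for illustration and are not Kafka APIs. The point it shows: when the log is retained, reprocessing is just running the same stream processor again from an earlier offset, so no separate batch path is needed.

```python
from collections import defaultdict


class Log:
    """In-memory stand-in for a durable, append-only topic (assumption)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the newly appended record

    def read_from(self, offset=0):
        # Consumers can start reading at any retained offset.
        return iter(self._records[offset:])


def count_by_key(log, offset=0):
    """A toy stream processor: fold every record into per-key counts."""
    counts = defaultdict(int)
    for key, _value in log.read_from(offset):
        counts[key] += 1
    return dict(counts)


log = Log()
for record in [("orders", 1), ("payments", 1), ("orders", 2)]:
    log.append(record)

live_view = count_by_key(log)         # regular processing
replayed_view = count_by_key(log, 0)  # reprocessing is a replay from offset 0
```

Both views are produced by the same code path, which is the simplification Kappa offers over maintaining separate batch and speed layers.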
Slide 20
Slide 20 text
No content
Slide 21
Slide 21 text
No content
Slide 22
Slide 22 text
Metrics we want to track: Net payout, Daily items sold, Weekly items sold, Order acceptance rate, Order preparation speed, Item rating
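As an illustration of one metric from this slide, daily items sold can be viewed as a sum over one-day tumbling windows keyed by restaurant. The sketch below is plain Python, not Flink code; the event layout `(epoch_seconds, restaurant_id, items)` and the name `daily_items_sold` are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_items_sold(order_events):
    """Sum items per (restaurant, UTC day), i.e. a one-day tumbling window."""
    totals = defaultdict(int)
    for ts, restaurant, items in order_events:
        # Assign each event to the window (calendar day) its timestamp falls in.
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        totals[(restaurant, day)] += items
    return dict(totals)


# Hypothetical order events: (epoch seconds, restaurant id, items in the order)
events = [
    (1538352000, "r1", 2),  # 2018-10-01T00:00:00Z
    (1538395200, "r1", 1),  # 2018-10-01T12:00:00Z
    (1538438400, "r1", 4),  # 2018-10-02T00:00:00Z
]

totals = daily_items_sold(events)
```

A weekly metric would follow the same shape with a different window assignment.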
Slide 23
Slide 23 text
Real time
Slide 24
Slide 24 text
Scalable
Slide 25
Slide 25 text
Granular
Slide 26
Slide 26 text
Highly available
Slide 27
Slide 27 text
Order Stream Payment Stream User Rating Stream
Slide 28
Slide 28 text
Order Stream Payment Stream User Rating Stream Stream Processor OLAP
Slide 29
Slide 29 text
samza.apache.org
Slide 30
Slide 30 text
Apache Flink flink.apache.org
Slide 31
Slide 31 text
Everything is a batch vs. Everything is a stream
Slide 32
Slide 32 text
Single JVM Cluster Cloud Runtime DataSet API DataStream API
Slide 33
Slide 33 text
Dataflow graph
Slide 34
Slide 34 text
Source Source Operator Operator Operator Sink OLAP
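The dataflow graph on this slide (sources feeding operators feeding a sink) can be sketched with Python generators. This is a conceptual model only and does not use the Flink API; the operator names, the order fields, and the dictionary standing in for the OLAP store are all assumptions.

```python
from itertools import chain


def source(records):
    """A source emits records one at a time."""
    for record in records:
        yield record


def keep_completed(orders):
    """Operator: filter out orders that were not completed."""
    for order in orders:
        if order["status"] == "completed":
            yield order


def to_item_count(orders):
    """Operator: map each order to a (restaurant, items) pair."""
    for order in orders:
        yield (order["restaurant"], order["items"])


def sink(pairs, store):
    """Sink: accumulate results into a dict standing in for an OLAP store."""
    for key, items in pairs:
        store[key] = store.get(key, 0) + items
    return store


# Two hypothetical sources, merged before the first operator.
source_a = [{"restaurant": "r1", "items": 2, "status": "completed"},
            {"restaurant": "r2", "items": 1, "status": "cancelled"}]
source_b = [{"restaurant": "r1", "items": 3, "status": "completed"}]

merged = chain(source(source_a), source(source_b))
olap = sink(to_item_count(keep_completed(merged)), {})
```

Each stage pulls from the previous one lazily, so records flow through the graph one at a time rather than as a materialized batch.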
Slide 35
Slide 35 text
https://ci.apache.org/projects/flink/flink-docs-release-1.6/concepts/programming-model.html
Slide 36
Slide 36 text
https://ci.apache.org/projects/flink/flink-docs-release-1.6/concepts/programming-model.html
Slide 37
Slide 37 text
https://ci.apache.org/projects/flink/flink-docs-release-1.6/concepts/programming-model.html
Slide 38
Slide 38 text
Flink Program Optimizer Graph Builder Client
Slide 39
Slide 39 text
Flink Program Optimizer Graph Builder Client Job Manager Task Manager Task Manager
Slide 40
Slide 40 text
Flink Program Optimizer Graph Builder Client Job Manager Task Manager Task Manager Snapshot Store
Slide 41
Slide 41 text
Fault tolerant
Slide 42
Slide 42 text
Flink Program Optimizer Graph Builder Client Job Manager Task Manager Task Manager Snapshot Store
Slide 43
Slide 43 text
Lightweight Asynchronous Snapshots for Distributed Dataflows. Paris Carbone, Gyula Fóra, Stephan Ewen, Seif Haridi, Kostas Tzoumas
Slide 44
Slide 44 text
Barrier Msg Msg Barrier Msg Msg Barrier Operator
Slide 45
Slide 45 text
Barrier Msg Msg Barrier Msg Msg Operator Msg Snapshot Store
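The barrier mechanism pictured on the last two slides can be sketched as follows. This is a deliberately simplified, single-input, synchronous model: the real algorithm also aligns barriers across multiple input channels and writes snapshots asynchronously. `CountingOperator`, `BARRIER`, and the dict used as a snapshot store are hypothetical names for illustration.

```python
BARRIER = object()  # marker injected into the stream between messages


class CountingOperator:
    """Stateful operator that counts messages and snapshots on each barrier."""

    def __init__(self, snapshot_store):
        self.count = 0
        self.snapshot_store = snapshot_store
        self.next_checkpoint_id = 1

    def process(self, element):
        if element is BARRIER:
            # State here reflects exactly the messages that preceded the barrier.
            self.snapshot_store[self.next_checkpoint_id] = self.count
            self.next_checkpoint_id += 1
            return BARRIER  # forward the barrier downstream
        self.count += 1
        return element

    def restore(self, checkpoint_id):
        self.count = self.snapshot_store[checkpoint_id]


stream = ["m1", "m2", BARRIER, "m3", "m4", BARRIER, "m5"]
store = {}
op = CountingOperator(store)
for element in stream:
    op.process(element)

# Simulated failure: a fresh operator restores checkpoint 2 and the source
# replays only the messages that came after the second barrier.
recovered = CountingOperator(store)
recovered.restore(2)
for element in stream[6:]:
    recovered.process(element)
```

Because the restored state covers exactly the messages before the barrier and the replay covers exactly those after it, each message affects the state once.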
Slide 46
Slide 46 text
Exactly Once Processing
Slide 47
Slide 47 text
Can handle very large state
Slide 48
Slide 48 text
Flink Program Optimizer Graph Builder Client Job Manager Task Manager Task Manager Snapshot Store
Slide 49
Slide 49 text
Flink Program Optimizer Graph Builder Client Job Manager Task Manager Task Manager Snapshot Store Job Manager Job Manager Zookeeper
Slide 50
Slide 50 text
Flink Program Optimizer Graph Builder Client Job Manager Task Manager Task Manager Snapshot Store Job Manager Job Manager Zookeeper
Slide 51
Slide 51 text
Flink Program Optimizer Graph Builder Client Task Manager Task Manager Snapshot Store Job Manager Job Manager Zookeeper
Slide 52
Slide 52 text
Joining Streams
Slide 53
Slide 53 text
Order Stream User Rating Stream
Slide 54
Slide 54 text
Order Stream User Rating Stream
Slide 55
Slide 55 text
Order Stream User Rating Stream Local Join Local Join
Slide 56
Slide 56 text
Order Stream User Rating Stream Local Join Local Join
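One way to read the "Local Join" boxes: both streams are partitioned by the same key, so matching records always meet in the same parallel task and can be joined from local state. The sketch below assumes the join key is an order id and uses a CRC32-based partitioner; `LocalJoin`, `partition_for`, and the event tuples are hypothetical, and state is kept forever here, whereas a real job would normally bound it with a window.

```python
import zlib
from collections import defaultdict

PARALLELISM = 2


def partition_for(order_id):
    """Deterministically route a key to one of the parallel join tasks."""
    return zlib.crc32(order_id.encode("utf-8")) % PARALLELISM


class LocalJoin:
    """Symmetric hash join over the records routed to this task."""

    def __init__(self):
        self.orders = defaultdict(list)
        self.ratings = defaultdict(list)

    def on_order(self, order_id, order):
        self.orders[order_id].append(order)
        return [(order_id, order, rating) for rating in self.ratings[order_id]]

    def on_rating(self, order_id, rating):
        self.ratings[order_id].append(rating)
        return [(order_id, order, rating) for order in self.orders[order_id]]


# Interleaved events from the two streams: (stream, order id, payload)
events = [
    ("order", "o1", "pizza"),
    ("rating", "o2", 4),
    ("order", "o2", "sushi"),
    ("rating", "o1", 5),
]

tasks = [LocalJoin() for _ in range(PARALLELISM)]
joined = []
for kind, order_id, payload in events:
    task = tasks[partition_for(order_id)]
    if kind == "order":
        joined.extend(task.on_order(order_id, payload))
    else:
        joined.extend(task.on_rating(order_id, payload))
```

The join emits a result whichever side arrives second, so the outcome does not depend on the arrival order of the two streams.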
Slide 57
Slide 57 text
Apache Flink ● Can join streams ● Fault tolerant ● Exactly Once Processing ● Combines stream and batch processing
Slide 58
Slide 58 text
… but it requires Java/Scala code
Slide 59
Slide 59 text
Scalable, efficient and robust
Slide 60
Slide 60 text
github.com/uber/AthenaX
Slide 61
Slide 61 text
SQL → what data to analyze Flink → how to analyze it
Slide 62
Slide 62 text
No content
Slide 63
Slide 63 text
No content
Slide 64
Slide 64 text
No content
Slide 65
Slide 65 text
No content
Slide 66
Slide 66 text
No content
Slide 67
Slide 67 text
No content
Slide 68
Slide 68 text
No content
Slide 69
Slide 69 text
Resource estimation and auto scaling
Slide 70
Slide 70 text
Monitoring and automatic failure recovery
Slide 71
Slide 71 text
eng.uber.com/athenax
Slide 72
Slide 72 text
Thanks! Nikolay Stoitsev @ Uber
Slide 73
Slide 73 text
No content