Stream Processing with Apache Flink
Kristian Kottke
September 13, 2018
Transcript
Java Forum Nord
Kristian Kottke
From one Stream: Stream Processing with Apache Flink
©iteratec
Whoami
Kristian Kottke
› Senior Software Engineer at iteratec
Interests
› Software Architecture
› Big Data Technologies
[email protected]
github.com/kkottke
xing.to/kkottke
speakerdeck.com/kkottke
Batch Processing
Stream Processor
Lambda Architecture
Streaming Architecture
Stream Processing
Streams
[Figure: timeline labelled "start of the stream", "past", "now", "future"; two bounded streams shown as segments of an unbounded stream]
following: https://flink.apache.org/flink-architecture.html
State
[Figure labels: Local State, Remote State, Periodic Checkpoint]
following: https://ci.apache.org/projects/flink/flink-docs-release-1.6/
Time
› Event Time
› Processing Time
› Ingestion Time
Windows
Window: Tumbling
[Figure: tumbling windows for Key 1, Key 2 and Key 3 on a time axis from 12:00 to 12:50 in 10-minute steps]
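A tumbling window assigns every event to exactly one fixed-size, non-overlapping window. The following plain-Java sketch illustrates that assignment only; it does not use the Flink API. The class and method names (TumblingWindowSketch, windowStart, windowEnd) are hypothetical, and timestamps are assumed to be milliseconds.

```java
final class TumblingWindowSketch {

    /** Inclusive start of the window that contains the timestamp. */
    static long windowStart(long timestampMillis, long windowSizeMillis) {
        // floorMod keeps the result correct for negative timestamps as well
        return timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
    }

    /** Exclusive end of the window that contains the timestamp. */
    static long windowEnd(long timestampMillis, long windowSizeMillis) {
        return windowStart(timestampMillis, windowSizeMillis) + windowSizeMillis;
    }

    public static void main(String[] args) {
        long tenMinutes = 10 * 60 * 1000L;
        long event = 7 * 60 * 1000L; // seven minutes after time zero
        System.out.println("window = [" + windowStart(event, tenMinutes)
                + ", " + windowEnd(event, tenMinutes) + ")");
    }
}
```

With 10-minute windows as on the slide, an event at minute 7 would fall into the window covering minutes 0 to 10, and an event exactly at minute 10 would open the next window.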
Window: Sliding
[Figure: sliding windows for Key 1, Key 2 and Key 3 on a time axis from 12:00 to 12:50 in 10-minute steps]
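Sliding windows overlap, so one event can belong to several windows at once. The sketch below, again in plain Java without the Flink API, lists the start of every window that contains a given timestamp. SlidingWindowSketch and windowStarts are hypothetical names, and the sketch assumes windows aligned to time zero with a slide no larger than the window size.

```java
import java.util.ArrayList;
import java.util.List;

final class SlidingWindowSketch {

    /**
     * Starts of all windows [start, start + size) that contain the timestamp,
     * newest window first. Windows begin at multiples of the slide.
     */
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Walk backwards while the window still covers the timestamp
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Window size 20, slide 10: each event lands in two windows
        System.out.println(windowStarts(25L, 20L, 10L));
    }
}
```

When the slide equals the window size, the loop runs once and the result matches a tumbling window.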
Window: Session
[Figure: session windows for Key 1, Key 2 and Key 3 on a time axis from 12:00 to 12:50 in 10-minute steps]
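Session windows have no fixed size; a window stays open as long as events keep arriving and closes after a period of inactivity. The plain-Java sketch below groups the sorted timestamps of a single key into sessions. SessionWindowSketch and sessions are hypothetical names, and the boundary rule (a new session starts once the distance to the previous event is at least the gap) is an assumption of this sketch, not a statement about Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class SessionWindowSketch {

    /**
     * Groups ascending timestamps into sessions. Each result entry is
     * {start, end}, where end = last timestamp of the session + gap.
     */
    static List<long[]> sessions(long[] sortedTimestamps, long gap) {
        List<long[]> result = new ArrayList<>();
        boolean open = false;
        long start = 0L;
        long last = 0L;
        for (long ts : sortedTimestamps) {
            if (open && ts - last < gap) {
                last = ts; // still active: extend the current session
            } else {
                if (open) {
                    result.add(new long[] {start, last + gap});
                }
                start = ts; // inactivity gap reached: open a new session
                last = ts;
                open = true;
            }
        }
        if (open) {
            result.add(new long[] {start, last + gap});
        }
        return result;
    }

    public static void main(String[] args) {
        for (long[] s : sessions(new long[] {0L, 3L, 5L, 20L, 22L}, 10L)) {
            System.out.println("[" + s[0] + ", " + s[1] + ")");
        }
    }
}
```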
Window
› Watermark
› Trigger
› Late Data
  › Discard
  › Redirect into separate Stream
  › Update result
[Figure label: Key 1]
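To make the interplay of watermarks and late data more concrete, here is a simplified plain-Java sketch of the "redirect into separate stream" option. It is not Flink code: LateDataSketch and all of its members are hypothetical, the watermark is assumed to trail the highest timestamp seen by a fixed out-of-orderness bound, and an event counts as late when the watermark has already passed the last timestamp of its tumbling window.

```java
import java.util.ArrayList;
import java.util.List;

final class LateDataSketch {
    private final long maxOutOfOrderness;
    private final long windowSize;
    private long maxTimestampSeen = Long.MIN_VALUE;

    final List<Long> onTime = new ArrayList<>(); // main output
    final List<Long> late = new ArrayList<>();   // separate stream for late events

    LateDataSketch(long maxOutOfOrderness, long windowSize) {
        this.maxOutOfOrderness = maxOutOfOrderness;
        this.windowSize = windowSize;
    }

    /** Watermark: no events with a timestamp at or below this value are expected. */
    long currentWatermark() {
        return maxTimestampSeen == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxTimestampSeen - maxOutOfOrderness;
    }

    void onEvent(long timestamp) {
        long windowEnd = timestamp - Math.floorMod(timestamp, windowSize) + windowSize;
        // The window is complete once the watermark reaches its last timestamp
        if (windowEnd - 1 <= currentWatermark()) {
            late.add(timestamp);
        } else {
            onTime.add(timestamp);
        }
        maxTimestampSeen = Math.max(maxTimestampSeen, timestamp);
    }

    public static void main(String[] args) {
        LateDataSketch sketch = new LateDataSketch(5L, 10L);
        for (long ts : new long[] {3L, 12L, 21L, 4L}) {
            sketch.onEvent(ts);
        }
        System.out.println("on time: " + sketch.onTime + ", late: " + sketch.late);
    }
}
```

Discarding would simply drop the events collected in late, while updating the result would re-evaluate the affected window and emit a corrected value.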
Guarantees
› At most once
› At least once
› Exactly once
  › Processor State
› End-to-End Exactly once
  › Resettable / Replayable Source & Sink
  › Idempotency
[Figure labels: Source, State, Sink]
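The idempotency point can be illustrated with a small plain-Java sketch: if a replayable source re-delivers records after a failure, a sink that upserts under a deterministic key ends up in the same state as if each record had been written once. IdempotentSinkSketch, its in-memory map, and the key layout (id plus window start) are assumptions made for this illustration, not part of the talk or of any Flink connector.

```java
import java.util.HashMap;
import java.util.Map;

final class IdempotentSinkSketch {
    // Stand-in for an external store that supports upserts
    private final Map<String, Double> store = new HashMap<>();

    /** Upsert keyed by (id, windowStart): a replayed write overwrites itself. */
    void write(String id, long windowStart, double maxValue) {
        store.put(id + "@" + windowStart, maxValue);
    }

    Double get(String id, long windowStart) {
        return store.get(id + "@" + windowStart);
    }

    int size() {
        return store.size();
    }

    public static void main(String[] args) {
        IdempotentSinkSketch sink = new IdempotentSinkSketch();
        sink.write("a", 0L, 1.5);
        sink.write("b", 0L, 2.0);
        // Simulated replay after a failure: the same results arrive again
        sink.write("a", 0L, 1.5);
        sink.write("b", 0L, 2.0);
        System.out.println("rows stored: " + sink.size());
    }
}
```

An append-only sink would hold duplicates after the replay, which is why at-least-once delivery combined with idempotent writes is one common route to end-to-end exactly-once results.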
Apache Flink
"...framework and distributed processing engine for stateful computations over unbounded and bounded data streams"
[Figure labels: Transactions, Logs, IoT, Clicks, ...; Databases, Stream Storage, Historic Data; Application; Streams]
following: https://ci.apache.org/projects/flink/flink-docs-release-1.6/
Apache Flink
Libraries: CEP, Table & SQL (on the DataStream API); FlinkML, Gelly, Table & SQL (on the DataSet API)
API: DataStream API, DataSet API
Runtime
Deployment: Local, Cluster, Cloud
Storage: Files, HDFS, S3, JDBC, Kafka, ...
following: https://ci.apache.org/projects/flink/flink-docs-release-1.6/
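Apache Flink
    // Source
    DataStream<String> messages = env.addSource(
        new FlinkKafkaConsumer<>(...));

    // Transformation: parse each message into a Tick
    DataStream<Tick> ticks = messages.map(Tick::parse);

    // Transformation: maximum value per id in 10-second windows
    DataStream<Tick> maxValues = ticks
        .keyBy("id")
        .timeWindow(Time.seconds(10))
        .maxBy("value");

    // Sink (the slide wrote to an undefined "stats" stream; maxValues is assumed here)
    maxValues.addSink(new BucketingSink<>("/path/to/dir"));
[Figure: operator chain Source -> Transformation -> Transformation -> Sink, each step an operator (OP)]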
Code
Window Functions
    DataStream<String> inputStream = env.addSource(new FlinkKafkaConsumer<>(...));

    DataStream<Tick> ticks = inputStream
        .map(Tick::parse)
        .assignTimestampsAndWatermarks(new PeriodicAssigner(Time.seconds(5)));

    DataStream<Tick> maxValues = ticks
        .keyBy("id")
        .timeWindow(Time.seconds(10))
        .maxBy("value");
Window Functions
    DataStream<Tick> performanceValues = ticks
        .keyBy("id")
        .timeWindow(Time.seconds(10))
        .trigger(new ThresholdTrigger(10d))
        .process(new PerformanceFunction());

    public void process(Tuple key, Context ctx, Iterable<Tick> ticks, Collector<Tick> out) {
        /* calculate min / max value */
        // "tick" stands for the result derived from ticks; its computation is elided on the slide
        out.collect(tick);
    }
Timer Service
    public void processElement(Tick tick, Context ctx, Collector<Tick> out) {
        ...
        ctx.timerService().registerEventTimeTimer(timerTimestamp);
        ...
    }

    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Tick> out) {
        ...
        ctx.output(outputTag, ctx.getCurrentKey());
        ...
    }
Value State
    DataStream<Tick> priceAlerts = ticks
        .keyBy("id")
        .flatMap(new PriceAlertFunction(10d));

    public void open(Configuration parameters) {
        // ...
        previousPriceState = getRuntimeContext().getState(previousPriceDescriptor);
    }

    public void flatMap(Tick tick, Collector<Tick> out) throws Exception {
        // Assumes ValueState<Double>; value() is null for the first tick of a key
        Double previousPrice = previousPriceState.value();
        if (previousPrice != null && Math.abs(tick.value - previousPrice) > threshold) {
            out.collect(tick);
        }
        previousPriceState.update(tick.value);
    }
Broadcast State
    DataStream<Threshold> thresholds = env.addSource(...);
    BroadcastStream<Threshold> thresholdBroadcast = thresholds.broadcast(thresholdsDescriptor);

    DataStream<Tick> priceAlerts = ticks
        .keyBy("id")
        .connect(thresholdBroadcast)
        .process(new UpdatablePriceDiffFunction());
Queryable State
[Figure labels: TaskManager, TaskManager, TaskManager]
Complex Event Processing
[Figure: Stream -> Pattern -> Pattern Stream]
Table & SQL
[Figure labels: Stream, Dynamic Table, Continuous Query, State, Dynamic Table, Stream]
Alternatives
source: https://commons.wikimedia.org
Wrap Up
› Data usually occurs in streams
› Batch processing doesn't meet modern requirements for continuous data streams
› Stream Processing
  › Powerful
  › Higher / manageable complexity
  › Real-time / low latency
  › Intuitiveness
www.iteratec.de
Contact
Kristian Kottke
[email protected]
github.com/kkottke
xing.to/kkottke
speakerdeck.com/kkottke