Stream Processing with Apache Flink
Kristian Kottke
September 13, 2018
Transcript
Java Forum Nord
Kristian Kottke
From one Stream
Stream Processing with Apache Flink
©iteratec
Whoami
Kristian Kottke
› Senior Software Engineer at iteratec
Interests
› Software Architecture
› Big Data Technologies
[email protected]
github.com/kkottke
xing.to/kkottke
speakerdeck.com/kkottke
Batch Processing

Stream Processor

Lambda Architecture

Streaming Architecture

Stream Processing
Streams
[Figure: a timeline running from the start of the stream through the past and now into the future. The timeline as a whole is an unbounded stream; two delimited sections of it are bounded streams.]
following: https://flink.apache.org/flink-architecture.html
State
[Figure labels: Local State, Remote State, Periodic Checkpoint]
following: https://ci.apache.org/projects/flink/flink-docs-release-1.6/
Time
› Event Time
› Processing Time
› Ingestion Time

Windows
Window: Tumbling
[Figure: windows for Key 1, Key 2 and Key 3 on a time axis from 12:00 to 12:50, with ticks every 10 minutes]
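The tumbling window slide shows fixed-size windows that do not overlap, so every event belongs to exactly one window. As a rough illustration of that assignment, here is a minimal sketch in plain Java. It does not use the Flink API; the class `TumblingWindowSketch` and its methods are hypothetical names chosen for this example, and windows are assumed to be aligned to the epoch.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper, not part of Flink: maps an event timestamp to its tumbling window. */
final class TumblingWindowSketch {

    /** Returns the inclusive start of the epoch-aligned window of the given size that contains the timestamp. */
    static Instant windowStart(Instant timestamp, Duration size) {
        long sizeMs = size.toMillis();
        long ts = timestamp.toEpochMilli();
        // floorMod keeps the result correct for timestamps before the epoch as well.
        return Instant.ofEpochMilli(ts - Math.floorMod(ts, sizeMs));
    }

    /** Returns the exclusive end of that window. */
    static Instant windowEnd(Instant timestamp, Duration size) {
        return windowStart(timestamp, size).plus(size);
    }

    public static void main(String[] args) {
        Duration tenMinutes = Duration.ofMinutes(10);
        Instant event = Instant.parse("2018-09-13T12:07:30Z");
        System.out.println(windowStart(event, tenMinutes) + " .. " + windowEnd(event, tenMinutes));
    }
}
```

With a 10-minute window, an event at 12:07:30 falls into the window from 12:00 (inclusive) to 12:10 (exclusive), matching the tick spacing on the slide's time axis.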
Window: Sliding
[Figure: windows for Key 1, Key 2 and Key 3 on a time axis from 12:00 to 12:50, with ticks every 10 minutes]
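Sliding windows have a fixed size but start every slide interval, so when the size exceeds the slide, windows overlap and one event lands in several of them. The following is a minimal plain-Java sketch of that assignment, not Flink code. `SlidingWindowSketch` and `windowStarts` are hypothetical names, and windows are assumed to be aligned to timestamp 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Flink: lists the sliding windows an event belongs to. */
final class SlidingWindowSketch {

    /**
     * Returns the start of every window [start, start + size) that contains the timestamp,
     * latest window first. Window starts are multiples of the slide.
     */
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Walk backwards while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Size 20 and slide 10, in arbitrary time units: an event at 25 is in two windows.
        System.out.println(windowStarts(25, 20, 10));
    }
}
```

With size 20 and slide 10, an event at time 25 belongs to the windows starting at 20 and at 10, but not to the one starting at 0, which ends at 20.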
Window: Session
[Figure: windows for Key 1, Key 2 and Key 3 on a time axis from 12:00 to 12:50, with ticks every 10 minutes]
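Session windows have no fixed size: a session for a key stays open while events keep arriving and closes after a period of inactivity. A minimal plain-Java sketch of that grouping for a single key follows. It is not Flink code; `SessionWindowSketch` is a hypothetical name, and the rule that a pause strictly longer than the gap starts a new session is a simplification chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Flink: groups the timestamps of one key into sessions. */
final class SessionWindowSketch {

    /** Splits the timestamps into sessions; a pause longer than the gap starts a new session. */
    static List<List<Long>> sessions(long[] timestamps, long gap) {
        long[] sorted = timestamps.clone();
        Arrays.sort(sorted); // events may arrive out of order
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sorted) {
            boolean pauseTooLong = !current.isEmpty()
                    && ts - current.get(current.size() - 1) > gap;
            if (pauseTooLong) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sessions(new long[] {10, 1, 2, 30, 11}, 5));
    }
}
```

For the timestamps 1, 2, 10, 11 and 30 with a gap of 5, the sketch yields three sessions: 1 and 2, then 10 and 11, then 30 on its own.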
Window
› Watermark
› Trigger
› Late Data
  › Discard
  › Redirect into separate Stream
  › Update result
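A watermark is the processor's estimate of how far event time has progressed; an event counts as late once the watermark has passed the end of the window it belongs to. The sketch below illustrates one common strategy in plain Java: the watermark trails the highest timestamp seen so far by a fixed bound. It is a simplified model, not Flink's implementation. `WatermarkSketch`, its methods, and the exact lateness rule are assumptions made for this example.

```java
/** Hypothetical helper, not part of Flink: a watermark that trails the highest timestamp seen. */
final class WatermarkSketch {

    private final long maxOutOfOrderness;
    private long maxTimestamp = Long.MIN_VALUE;

    WatermarkSketch(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    /** Records an event timestamp and returns the updated watermark. */
    long observe(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
        return currentWatermark();
    }

    /** Watermark = highest timestamp seen minus the tolerated out-of-orderness. */
    long currentWatermark() {
        return maxTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : maxTimestamp - maxOutOfOrderness;
    }

    /** Simplified rule: an event is late if the watermark has reached the end of its tumbling window. */
    boolean isLate(long eventTimestamp, long windowSize) {
        long windowEnd = eventTimestamp - Math.floorMod(eventTimestamp, windowSize) + windowSize;
        return windowEnd <= currentWatermark();
    }

    public static void main(String[] args) {
        WatermarkSketch watermark = new WatermarkSketch(5);
        watermark.observe(12);
        System.out.println(watermark.isLate(3, 10)); // window [0, 10) is still open
        watermark.observe(20);
        System.out.println(watermark.isLate(3, 10)); // watermark has passed 10
    }
}
```

Once `isLate` returns true, the application has to pick one of the options listed on the slide: discard the event, redirect it into a separate stream, or update the already emitted result.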
Guarantees
› At most once
› At least once
› Exactly once
  › Processor State
› End-2-End Exactly once
  › Resettable / Replayable Source & Sink
  › Idempotency
[Figure labels: Source, State, Sink]
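The idempotency bullet can be illustrated with a small plain-Java sketch. When a replayable source re-delivers records after a failure, a sink that writes by a deterministic record id ends up in the same state no matter how often a record arrives, which turns at-least-once delivery into an effectively exactly-once result. `IdempotentSinkSketch` and the record ids are hypothetical; a real sink would write to an external store instead of an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sink, not part of Flink: writes are upserts keyed by a deterministic record id. */
final class IdempotentSinkSketch {

    // Stand-in for an external key-value store.
    private final Map<String, Double> store = new HashMap<>();

    /** Writing the same record again overwrites it with the same value, so replays are harmless. */
    void write(String recordId, double value) {
        store.put(recordId, value);
    }

    /** Returns the stored value, or null if the id was never written. */
    Double get(String recordId) {
        return store.get(recordId);
    }

    int size() {
        return store.size();
    }

    public static void main(String[] args) {
        IdempotentSinkSketch sink = new IdempotentSinkSketch();
        sink.write("tick-1", 10.0);
        sink.write("tick-2", 12.5);
        sink.write("tick-1", 10.0); // replay after a simulated failure
        System.out.println(sink.size() + " records stored");
    }
}
```

An append-only sink would hold three records after the replay; the keyed upsert holds two.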
Apache Flink
"... framework and distributed processing engine for stateful computations over unbounded and bounded data streams"
[Figure labels: Transactions, Logs, IoT, Clicks, ...; Databases, Stream Storage, Application, Streams, Historic Data]
following: https://ci.apache.org/projects/flink/flink-docs-release-1.6/
Apache Flink
Libraries: CEP, Table & SQL (on the DataStream API); FlinkML, Gelly, Table & SQL (on the DataSet API)
API: DataStream API, DataSet API
Runtime
Deployment: Local, Cluster, Cloud
Storage: Files, HDFS, S3, JDBC, Kafka, ...
following: https://ci.apache.org/projects/flink/flink-docs-release-1.6/
Apache Flink

    // Source
    DataStream<String> messages = env.addSource(
        new FlinkKafkaConsumer<>(...));

    // Transformation
    DataStream<Tick> ticks = messages.map(Tick::parse);

    // Transformation
    DataStream<Tick> maxValues = ticks
        .keyBy("id")
        .timeWindow(Time.seconds(10))
        .maxBy("value");

    // Sink (the slide calls addSink on an undefined `stats`; maxValues is assumed here)
    maxValues.addSink(new BucketingSink("/path/to/dir"));

[Figure: Source → Transformation → Transformation → Sink, each step running as an operator (OP)]
Code
Window Functions

    DataStream<String> inputStream = env.addSource(new FlinkKafkaConsumer<>(...));

    // Parse the raw messages, then assign event-time timestamps and watermarks.
    DataStream<Tick> ticks = inputStream
        .map(Tick::parse)
        .assignTimestampsAndWatermarks(new PeriodicAssigner(Time.seconds(5)));

    // Per key, keep the tick with the highest value in each 10-second window.
    DataStream<Tick> maxValues = ticks
        .keyBy("id")
        .timeWindow(Time.seconds(10))
        .maxBy("value");
Window Functions

    DataStream<Tick> performanceValues = ticks
        .keyBy("id")
        .timeWindow(Time.seconds(10))
        .trigger(new ThresholdTrigger(10d))
        .process(new PerformanceFunction());

    // process method, presumably of PerformanceFunction
    public void process(
            Tuple key,
            Context ctx,
            Iterable<Tick> ticks,
            Collector<Tick> out) {
        /* calculate min / max value */
        out.collect(tick); // tick: result of the calculation elided above
    }
Timer Service

    public void processElement(Tick tick, Context ctx, Collector<Tick> out) {
        ...
        ctx.timerService().registerEventTimeTimer(timerTimestamp);
        ...
    }

    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Tick> out) {
        ...
        ctx.output(outputTag, ctx.getCurrentKey());
        ...
    }
Value State

    DataStream<Tick> priceAlerts = ticks
        .keyBy("id")
        .flatMap(new PriceAlertFunction(10d));

    public void open(Configuration parameters) {
        // ...
        previousPriceState = getRuntimeContext().getState(previousPriceDescriptor);
    }

    public void flatMap(Tick tick, Collector<Tick> out) throws Exception {
        // value() is null for the first tick of a key unless the descriptor sets a default;
        // the slide omits this check. Assumes previousPriceState is a ValueState<Double>.
        Double previousPrice = previousPriceState.value();
        if (previousPrice != null && Math.abs(tick.value - previousPrice) > threshold) {
            out.collect(tick);
        }
        previousPriceState.update(tick.value);
    }
Broadcast State

    DataStream<Threshold> thresholds = env.addSource(...);
    BroadcastStream<Threshold> thresholdBroadcast =
        thresholds.broadcast(thresholdsDescriptor);

    DataStream<Tick> priceAlerts = ticks
        .keyBy("id")
        .connect(thresholdBroadcast)
        .process(new UpdatablePriceDiffFunction());
Queryable State
[Figure: three TaskManagers]

Complex Event Processing
[Figure labels: Stream, Pattern, Pattern Stream]

Table & SQL
[Figure: Stream → Dynamic Table → Continuous Query (with State) → Dynamic Table → Stream]

Alternatives
source: https://commons.wikimedia.org
Wrap Up
› Data usually occurs in streams
› Batch Processing doesn’t meet modern requirements regarding continuous data streams
› Stream Processing
  › Powerful
  › Higher / manageable complexity
  › Real-time / low latency
  › Intuitiveness
Contact
Kristian Kottke
[email protected]
github.com/kkottke
xing.to/kkottke
speakerdeck.com/kkottke
www.iteratec.de