$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Stream Processing with Apache Flink
Search
Kristian Kottke
September 13, 2018
Programming
0
170
Stream Processing with Apache Flink
Kristian Kottke
September 13, 2018
Tweet
Share
More Decks by Kristian Kottke
See All by Kristian Kottke
Jeder wie er will, aber so nicht
kkottke
0
36
Turmbau_zu_Babel.pdf
kkottke
0
110
Reactive Microservices based on Vert.x
kkottke
0
220
Graph Processing using Apache Flink
kkottke
0
110
Other Decks in Programming
See All in Programming
バックエンドエンジニアによる Amebaブログ K8s 基盤への CronJobの導入・運用経験
sunabig
0
150
tsgolintはいかにしてtypescript-goの非公開APIを呼び出しているのか
syumai
6
2.2k
手軽に積ん読を増やすには?/読みたい本と付き合うには?
o0h
PRO
1
170
從冷知識到漏洞,你不懂的 Web,駭客懂 - Huli @ WebConf Taiwan 2025
aszx87410
2
2.5k
エディターってAIで操作できるんだぜ
kis9a
0
720
S3 VectorsとStrands Agentsを利用したAgentic RAGシステムの構築
tosuri13
6
310
Giselleで作るAI QAアシスタント 〜 Pull Requestレビューに継続的QAを
codenote
0
170
AIの誤りが許されない業務システムにおいて“信頼されるAI” を目指す / building-trusted-ai-systems
yuya4
6
3.2k
tparseでgo testの出力を見やすくする
utgwkk
2
210
非同期処理の迷宮を抜ける: 初学者がつまづく構造的な原因
pd1xx
1
710
Rubyで鍛える仕組み化プロヂュース力
muryoimpl
0
110
まだ間に合う!Claude Code元年をふりかえる
nogu66
5
820
Featured
See All Featured
Code Reviewing Like a Champion
maltzj
527
40k
The World Runs on Bad Software
bkeepers
PRO
72
12k
A Tale of Four Properties
chriscoyier
162
23k
How STYLIGHT went responsive
nonsquared
100
6k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
37
2.6k
Agile that works and the tools we love
rasmusluckow
331
21k
Java REST API Framework Comparison - PWX 2021
mraible
34
9k
Build your cross-platform service in a week with App Engine
jlugia
234
18k
Automating Front-end Workflow
addyosmani
1371
200k
How To Stay Up To Date on Web Technology
chriscoyier
791
250k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
128
54k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
46
2.6k
Transcript
Java Forum Nord Kristian Kottke From one Stream Stream Processing
with Apache Flink
©iteratec Whoami Kristian Kottke › Senior Software Engineer -> iteratec
Interests › Software Architecture › Big Data Technologies
[email protected]
github.com/kkottke xing.to/kkottke speakerdeck.com/kkottke 2
©iteratec 4
©iteratec Batch Processing 5
©iteratec Stream Processor
©iteratec Lambda Architecture 7
©iteratec Lambda Architecture 8
©iteratec Streaming Architecture 9
©iteratec Streaming Architecture 10
©iteratec Stream Processing
©iteratec Streams following: https://flink.apache.org/flink-architecture.html ← bounded stream → ← bounded
stream → now start of the stream past future unbounded stream 12
©iteratec State following: https://ci.apache.org/projects/flink/flink-docs-release-1.6/ Local State Remote State Periodic Checkpoint
13
©iteratec Time
©iteratec Time Event Time Processing Time Ingestion Time 15
©iteratec Windows
©iteratec Window Tumbling Key 1 12:00 12:10 12:20 12:30 12:40
12:50 Key 2 Key 3 17
©iteratec Window Sliding Key 1 12:00 12:10 12:20 12:30 12:40
12:50 Key 2 Key 3 18
©iteratec Window Session Key 1 12:00 12:10 12:20 12:30 12:40
12:50 Key 2 Key 3 19
©iteratec 20 20 Window › Watermark › Trigger › Late
Data › Discard › Redirect into separate Stream › Update result Key 1
©iteratec 22 22 Guarantees › At most once › At
least once › Exactly once › Processor State › End-2-End Exactly once › Resettable / Replayable Source & Sink › Idempotency Source Sink State
©iteratec 24
©iteratec Apache Flink Databases Stream following: https://ci.apache.org/projects/flink/flink-docs-release-1.6/ Storage Application Streams
Historic Data Transactions Logs IoT Clicks ..... ...framework and distributed processing engine for stateful computations over unbounded and bounded data streams 25
©iteratec Apache Flink Files, HDFS, S3, JDBC, Kafka, ... Local
Cluster Cloud DataStream API FlinkML Gelly Table & SQL CEP Table & SQL Storage Deployment Runtime API Libraries following: https://ci.apache.org/projects/flink/flink-docs-release-1.6/ DataSet API 26
©iteratec Apache Flink DataStream<String> messages = env.addSource( new FlinkKafkaConsumer<>(...)); DataStream<Tick>
ticks = messages.map( Tick::parse); DataStream<Tick> maxValues = ticks .keyBy(„id“) .timeWindow(Time.seconds(10)) .maxBy(„value“); stats.addSink(new BucketingSink(„/path/to/dir“)); OP OP OP OP Transformation Transformation Source Sink 28
©iteratec Code
©iteratec DataStream<String> inputStream = env.addSource(new FlinkKafkaConsumer<>(...)); DataStream<Tick> ticks = inputStream
.map(Tick::parse) .assignTimestampsAndWatermarks(new PeriodicAssigner(Time.seconds(5))); DataStream<Tick> maxValues = ticks .keyBy("id") .timeWindow(Time.seconds(10)) .maxBy("value"); Window Functions 33
©iteratec DataStream<Tick> performanceValues = ticks .keyBy("id") .timeWindow(Time.seconds(10)) .trigger(new ThresholdTrigger(10d)) .process(new
PerformanceFunction()); public void process( Tuple key, Context ctx, Iterable<Tick> ticks, Collector<Tick> out) { /* calculate min / max value */ out.collect(tick); } Window Functions 34
©iteratec public void processElement(Tick tick, Context ctx, Collector<Tick> out) {
... ctx.timerService().registerEventTimeTimer(timerTimestamp); ... } public void onTimer(long timestamp, OnTimerContext ctx, Collector<Tick> out) { ... ctx.output(outputTag, ctx.getCurrentKey()); ... } Timer Service 36
©iteratec DataStream<Tick> priceAlerts = ticks .keyBy("id") .flatMap(new PriceAlertFunction(10d)); public void
open(Configuration parameters) { // ... previousPriceState = getRuntimeContext().getState(previousPriceDescriptor); } public void flatMap(Tick tick, Collector<Tick> out) throws Exception { if (Math.abs(tick.value - previousPriceState.value()) > threshold) { out.collect(tick); } previousPriceState.update(tick.value); } Value State 38
©iteratec DataStream<Threshold> thresholds = env.addSource(...); BroadcastStream<Threshold> thresholdBroadcast = thresholds.broadcast(thresholdsDescriptor); DataStream<Tick>
priceAlerts = ticks .keyBy("id") .connect(thresholdBroadcast) .process(new UpdatablePriceDiffFunction()); Broadcast State 39
©iteratec
©iteratec Queryable State 43 TaskManager TaskManager TaskManager
©iteratec Complex Event Processing Stream Pattern Pattern Stream 44
©iteratec Table & SQL Dynamic Table Dynamic Table Stream Stream
Continuous Query State 45
©iteratec Alternatives source: https://commons.wikimedia.org 46
©iteratec Wrap Up › Data usually occur in streams ›
Batch Processing doesn’t meet the modern requirements regarding continuous data streams › Stream Processing › Powerful › Higher / manageable complexity › Real-time / low latency › Intuitiveness 47
www.iteratec.de Contact Kristian Kottke
[email protected]
github.com/kkottke xing.to/kkottke speakerdeck.com/kkottke