Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
AWS Summit Amsterdam 2023 - SVS204
Search
Pubudu
June 18, 2023
Technology
1
22
AWS Summit Amsterdam 2023 - SVS204
Large scale parallel data processing with AWS Step Functions Distributed Maps
Pubudu
June 18, 2023
Tweet
Share
More Decks by Pubudu
See All by Pubudu
Moving from single tenant to multi tenant
pubudusj
0
39
COM202 Dev Chat at re:Invent 2022
pubudusj
1
83
Manage webhooks at scale with AWS Serverless
pubudusj
0
54
Smart Doorbell with AWS Serverless - AWS UG Coimbatore
pubudusj
0
66
Smart Doorbell with AWS Serverless - Serverless Summit 21
pubudusj
0
94
Other Decks in Technology
See All in Technology
研究開発×プロダクトマネジメントへの挑戦 / ly_mlpm_meetup
sansan_randd
0
110
Ruby で作る大規模イベントネットワーク構築・運用支援システム TTDB
taketo1113
1
260
Reinforcement Fine-tuning 基礎〜実践まで
ch6noota
0
170
大企業でもできる!ボトムアップで拡大させるプラットフォームの作り方
findy_eventslides
1
710
Lambdaの常識はどう変わる?!re:Invent 2025 before after
iwatatomoya
1
450
re:Invent 2025 ふりかえり 生成AI版
takaakikakei
1
190
AWSを使う上で最低限知っておきたいセキュリティ研修を社内で実施した話 ~みんなでやるセキュリティ~
maimyyym
2
310
MapKitとオープンデータで実現する地図情報の拡張と可視化
zozotech
PRO
1
130
最近のLinux普段づかいWaylandデスクトップ元年
penguin2716
1
690
[JAWS-UG 横浜支部 #91]DevOps Agent vs CloudWatch Investigations -比較と実践-
sh_fk2
1
250
20251209_WAKECareer_生成AIを活用した設計・開発プロセス
syobochim
6
1.5k
学習データって増やせばいいんですか?
ftakahashi
2
310
Featured
See All Featured
jQuery: Nuts, Bolts and Bling
dougneiner
65
8.2k
Documentation Writing (for coders)
carmenintech
76
5.2k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
14k
How GitHub (no longer) Works
holman
316
140k
Designing for Performance
lara
610
69k
Making Projects Easy
brettharned
120
6.5k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.2k
It's Worth the Effort
3n
187
29k
Writing Fast Ruby
sferik
630
62k
Mobile First: as difficult as doing things right
swwweet
225
10k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
659
61k
Large-scale JavaScript Application Architecture
addyosmani
515
110k
Transcript
Large scale parallel data processing with Step Functions Distributed Map
SVS204
About Me Pubudu Jayawardana @pubudusj From Amsterdam, the Netherlands Senior
Backend Developer at starred.com AWS Community Builder (Serverless) AWS Certified - SA Pro https://medium.com/@pubudusj https://pubudu.dev https://dev.to/pubudusj
AWS Community Builders
Step Functions Distributed Map
▪ To iterate over an array ▪ Limitations • 40
parallel iterations at a time • Max payload size - 256KB • Execution history - 25,000 events Map State
▪ Totally separated child executions • 25,000 events each •
10,000 executions at a time • S3 as a source ▪ Result output to S3 ▪ Only applicable for Standard flows Distributed Map
Distributed Map
▪ Source types: • S3 object list • JSON file
in S3 • CSV file in S3 • S3 manifest file ▪ Limit no of items ▪ ItemSelector Source
▪ Batching based on: • No of items • Size
▪ Modify input with Batch input Item Batching
▪ Concurrency limit ▪ Child execution types: • Standard •
Express ▪ Error threshold: • Percentage • No of items Runtime Settings
▪ S3 location ▪ Logs • manifest.json • SUCCEEDED_n.json •
FAILED_n.json • PENDING_n.json Export Result
Execution Details - Parent Event Log
Execution Details - Map Run
Execution Details - Single Child Execution
Process
▪ SAAS application to measure candidate experience ▪ Send surveys
▪ Record the feedback ▪ Visualize in a dashboard (benchmark, filter, comparison) ▪ Transform / Enrich data Process
Source from MySQL Transform Save to S3 Load to Postgres
Hourly ETL
▪ Amazon Managed Workflows for Apache Airflow ▪ Amazon EMR
Hourly ETL
None
Problem
▪ Less visibility ▪ Cannot retry single table load ▪
Takes avg 20 minutes ▪ EC2 cost Data Load Step
Solution
None
None
Demo
▪ Reduced time to avg 5 minutes ▪ Load data
parallelly ▪ Better insights ▪ Retry individual table data load ▪ Cost effective Benefits
▪ Use batching ▪ Set concurrency ▪ Set error threshold
▪ Use express child executions Tips / Lesson Learned
https://bit.ly/s3-to-postgres Read more about this
Thank You! @pubudusj