Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
AWS Summit Amsterdam 2023 - SVS204
Search
Pubudu
June 18, 2023
Technology
1
22
AWS Summit Amsterdam 2023 - SVS204
Large scale parallel data processing with AWS Step Functions Distributed Maps
Pubudu
June 18, 2023
Tweet
Share
More Decks by Pubudu
See All by Pubudu
Moving from single tenant to multi tenant
pubudusj
0
41
COM202 Dev Chat at re:Invent 2022
pubudusj
1
84
Manage webhooks at scale with AWS Serverless
pubudusj
0
54
Smart Doorbell with AWS Serverless - AWS UG Coimbatore
pubudusj
0
66
Smart Doorbell with AWS Serverless - Serverless Summit 21
pubudusj
0
94
Other Decks in Technology
See All in Technology
AWS re:Invent 2025~初参加の成果と学び~
kubomasataka
0
180
AWSインフルエンサーへの道 / load of AWS Influencer
whisaiyo
0
210
New Relic 1 年生の振り返りと Cloud Cost Intelligence について #NRUG
play_inc
0
220
20251219 OpenIDファウンデーション・ジャパン紹介 / OpenID Foundation Japan Intro
oidfj
0
490
Knowledge Work の AI Backend
kworkdev
PRO
0
210
SQLだけでマイグレーションしたい!
makki_d
0
1.2k
20251203_AIxIoTビジネス共創ラボ_第4回勉強会_BP山崎.pdf
iotcomjpadmin
0
130
[2025-12-12]あの日僕が見た胡蝶の夢 〜人の夢は終わらねェ AIによるパフォーマンスチューニングのすゝめ〜
tosite
0
170
アラフォーおじさん、はじめてre:Inventに行く / A 40-Something Guy’s First re:Invent Adventure
kaminashi
0
130
ペアーズにおけるAIエージェント 基盤とText to SQLツールの紹介
hisamouna
2
1.6k
ハッカソンから社内プロダクトへ AIエージェント「ko☆shi」開発で学んだ4つの重要要素
sonoda_mj
6
1.6k
Amazon Bedrock Knowledge Bases × メタデータ活用で実現する検証可能な RAG 設計
tomoaki25
6
2.3k
Featured
See All Featured
It's Worth the Effort
3n
187
29k
Un-Boring Meetings
codingconduct
0
160
The SEO Collaboration Effect
kristinabergwall1
0
310
Primal Persuasion: How to Engage the Brain for Learning That Lasts
tmiket
0
190
Imperfection Machines: The Place of Print at Facebook
scottboms
269
13k
Testing 201, or: Great Expectations
jmmastey
46
7.8k
Navigating Team Friction
lara
191
16k
The Spectacular Lies of Maps
axbom
PRO
1
400
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
What does AI have to do with Human Rights?
axbom
PRO
0
1.9k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.5k
Money Talks: Using Revenue to Get Sh*t Done
nikkihalliwell
0
120
Transcript
Large scale parallel data processing with Step Functions Distributed Map
SVS204
About Me Pubudu Jayawardana @pubudusj From Amsterdam, the Netherlands Senior
Backend Developer at starred.com AWS Community Builder (Serverless) AWS Certified - SA Pro https://medium.com/@pubudusj https://pubudu.dev https://dev.to/pubudusj
AWS Community Builders
Step Functions Distributed Map
▪ To iterate over an array ▪ Limitations • 40
parallel iterations at a time • Max payload size - 256KB • Execution history - 25,000 events Map State
▪ Totally separated child executions • 25,000 events each •
10,000 executions at a time • S3 as a source ▪ Result output to S3 ▪ Only applicable for Standard flows Distributed Map
Distributed Map
▪ Source types: • S3 object list • JSON file
in S3 • CSV file in S3 • S3 manifest file ▪ Limit no of items ▪ ItemSelector Source
▪ Batching based on: • No of items • Size
▪ Modify input with Batch input Item Batching
▪ Concurrency limit ▪ Child execution types: • Standard •
Express ▪ Error threshold: • Percentage • No of items Runtime Settings
▪ S3 location ▪ Logs • manifest.json • SUCCEEDED_n.json •
FAILED_n.json • PENDING_n.json Export Result
Execution Details - Parent Event Log
Execution Details - Map Run
Execution Details - Single Child Execution
Process
▪ SAAS application to measure candidate experience ▪ Send surveys
▪ Record the feedback ▪ Visualize in a dashboard (benchmark, filter, comparison) ▪ Transform / Enrich data Process
Source from MySQL Transform Save to S3 Load to Postgres
Hourly ETL
▪ Amazon Managed Workflows for Apache Airflow ▪ Amazon EMR
Hourly ETL
None
Problem
▪ Less visibility ▪ Cannot retry single table load ▪
Takes avg 20 minutes ▪ EC2 cost Data Load Step
Solution
None
None
Demo
▪ Reduced time to avg 5 minutes ▪ Load data
parallelly ▪ Better insights ▪ Retry individual table data load ▪ Cost effective Benefits
▪ Use batching ▪ Set concurrency ▪ Set error threshold
▪ Use express child executions Tips / Lesson Learned
https://bit.ly/s3-to-postgres Read more about this
Thank You! @pubudusj