AWS Summit Amsterdam 2023 - SVS204
Pubudu
June 18, 2023
Large scale parallel data processing with AWS Step Functions Distributed Maps
Transcript
Large scale parallel data processing with Step Functions Distributed Map
SVS204
About Me
▪ Pubudu Jayawardana (@pubudusj)
▪ From Amsterdam, the Netherlands
▪ Senior Backend Developer at starred.com
▪ AWS Community Builder (Serverless)
▪ AWS Certified - SA Pro
▪ https://medium.com/@pubudusj
▪ https://pubudu.dev
▪ https://dev.to/pubudusj
AWS Community Builders
Step Functions Distributed Map
Map State
▪ To iterate over an array
▪ Limitations
  • 40 parallel iterations at a time
  • Max payload size: 256 KB
  • Execution history: 25,000 events
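To make the inline Map limits concrete, here is a minimal sketch that builds an Amazon States Language (ASL) definition of an inline Map state as a plain Python dict and checks whether a state input fits the 256 KB payload quota. The state name `ProcessItem`, the `$.items` path and both helper functions are hypothetical; the payload check assumes the quota is measured on the UTF-8 encoded JSON.

```python
import json

# Limits quoted on the slide for the inline Map state.
MAX_INLINE_CONCURRENCY = 40
MAX_PAYLOAD_BYTES = 256 * 1024  # assumed: 256 KB measured on the UTF-8 encoded JSON


def inline_map_state(items_path: str, concurrency: int) -> dict:
    """Build an inline Map state that runs a placeholder Pass state for each array element."""
    # This sketch rejects values above the 40-iteration cap quoted on the slide.
    if not 1 <= concurrency <= MAX_INLINE_CONCURRENCY:
        raise ValueError(f"inline Map runs at most {MAX_INLINE_CONCURRENCY} iterations at a time")
    return {
        "Type": "Map",
        "ItemsPath": items_path,  # where the array lives in the state input
        "MaxConcurrency": concurrency,
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "INLINE"},
            "StartAt": "ProcessItem",
            "States": {"ProcessItem": {"Type": "Pass", "End": True}},
        },
        "End": True,
    }


def fits_payload_limit(state_input: object) -> bool:
    """Return True if the serialized state input stays within the 256 KB payload quota."""
    return len(json.dumps(state_input).encode("utf-8")) <= MAX_PAYLOAD_BYTES


print(json.dumps(inline_map_state("$.items", 10), indent=2))
print(fits_payload_limit({"items": list(range(10))}))  # True
print(fits_payload_limit({"blob": "x" * (300 * 1024)}))  # False
```

Because every iteration shares the parent's 256 KB payload and 25,000-event history, large arrays run into these limits quickly, which is the motivation for the Distributed Map.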
Distributed Map
▪ Fully separate child executions
  • 25,000 events each
  • Up to 10,000 executions at a time
▪ S3 as a source
▪ Results exported to S3
▪ Only available in Standard workflows
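The difference from the inline Map shows up in the state definition. The sketch below builds a minimal Distributed Map state as a Python dict: it reads items from an S3 prefix, runs each item in its own child execution (`"Mode": "DISTRIBUTED"`) and exports results to S3 with a `ResultWriter`. The bucket names, prefix, state names and the `MaxConcurrency` value are hypothetical placeholders.

```python
import json


def distributed_map_state(bucket: str, prefix: str, results_bucket: str) -> dict:
    """Build a Distributed Map state that lists S3 objects and writes results to S3."""
    return {
        "Type": "Map",
        # Items come from S3 (ListObjectsV2 on a prefix), not from the state input.
        "ItemReader": {
            "Resource": "arn:aws:states:::s3:listObjectsV2",
            "Parameters": {"Bucket": bucket, "Prefix": prefix},
        },
        "ItemProcessor": {
            # DISTRIBUTED mode: each item runs in its own child workflow execution
            # with its own event history.
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
            "StartAt": "ProcessObject",
            "States": {"ProcessObject": {"Type": "Pass", "End": True}},
        },
        "MaxConcurrency": 1000,  # example value; the slide quotes 10,000 as the ceiling
        # Export aggregated results to S3 instead of returning them in the payload.
        "ResultWriter": {
            "Resource": "arn:aws:states:::s3:putObject",
            "Parameters": {"Bucket": results_bucket, "Prefix": "map-results"},
        },
        "End": True,
    }


# The parent state machine must be a Standard workflow.
definition = {
    "StartAt": "ProcessAllObjects",
    "States": {"ProcessAllObjects": distributed_map_state("my-input-bucket", "data/", "my-results-bucket")},
}
print(json.dumps(definition, indent=2))
```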
Distributed Map
Source
▪ Source types:
  • S3 object list
  • JSON file in S3
  • CSV file in S3
  • S3 inventory manifest file
▪ Limit the number of items
▪ ItemSelector (reshape each item's input)
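The four source types map onto the `ItemReader` field of the Map state. This sketch returns an `ItemReader` block for each type and shows an `ItemSelector`; the bucket, key and `$.runId` field are hypothetical, and it assumes `MaxItems` in `ReaderConfig` is accepted for every source type, which is worth confirming against the ItemReader documentation.

```python
import json
from typing import Optional

GET_OBJECT = "arn:aws:states:::s3:getObject"
LIST_OBJECTS = "arn:aws:states:::s3:listObjectsV2"


def item_reader(source: str, bucket: str, key_or_prefix: str, max_items: Optional[int] = None) -> dict:
    """Return an ItemReader block for one of the four S3 source types on the slide."""
    config: dict = {}
    if source == "S3_OBJECTS":
        # S3 object list: every object under the prefix becomes one item.
        reader = {"Resource": LIST_OBJECTS, "Parameters": {"Bucket": bucket, "Prefix": key_or_prefix}}
    elif source in ("JSON", "CSV", "MANIFEST"):
        # A single S3 object: a JSON array, CSV rows, or an S3 Inventory manifest.
        reader = {"Resource": GET_OBJECT, "Parameters": {"Bucket": bucket, "Key": key_or_prefix}}
        config["InputType"] = source
        if source == "CSV":
            config["CSVHeaderLocation"] = "FIRST_ROW"  # column names come from the first row
    else:
        raise ValueError(f"unknown source type: {source}")
    if max_items is not None:
        config["MaxItems"] = max_items  # stop after this many items
    if config:
        reader["ReaderConfig"] = config
    return reader


# ItemSelector reshapes each item before it reaches a child execution.
# "$$.Map.Item.Value" is the current item (an S3 object list item carries the object's Key);
# "$.runId" is a hypothetical field of the Map state's input.
item_selector = {
    "key.$": "$$.Map.Item.Value.Key",
    "runId.$": "$.runId",
}

print(json.dumps(item_reader("CSV", "my-input-bucket", "exports/users.csv", max_items=100), indent=2))
```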
Item Batching
▪ Batching based on:
  • Number of items
  • Size (bytes)
▪ Add shared input to each batch with BatchInput
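Batching by count and by size can be illustrated with a small grouping function. The sketch below is an approximation of the idea behind the `MaxItemsPerBatch` and `MaxInputBytesPerBatch` settings, not the service's exact packing algorithm; `make_batches` and the `$.tableGroup` field are hypothetical.

```python
import json
from typing import Any, List


def make_batches(items: List[Any], max_items: int, max_bytes: int) -> List[List[Any]]:
    """Group items into batches capped by item count and by serialized size.

    Illustrative only: mimics the idea behind MaxItemsPerBatch and
    MaxInputBytesPerBatch. An item larger than max_bytes on its own
    still ends up in a batch by itself.
    """
    batches: List[List[Any]] = []
    current: List[Any] = []
    for item in items:
        candidate = current + [item]
        too_many = len(candidate) > max_items
        too_big = len(json.dumps(candidate).encode("utf-8")) > max_bytes
        if current and (too_many or too_big):
            batches.append(current)  # close the current batch and start a new one
            current = [item]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


# ItemBatcher fragment for a Map state; "$.tableGroup" is a hypothetical input field.
# Each child execution then receives {"BatchInput": {...}, "Items": [...]}.
item_batcher = {
    "MaxItemsPerBatch": 10,
    "MaxInputBytesPerBatch": 262144,
    "BatchInput": {"tableGroup.$": "$.tableGroup"},
}

print([len(b) for b in make_batches(list(range(25)), max_items=10, max_bytes=10_000)])  # [10, 10, 5]
```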
Runtime Settings
▪ Concurrency limit
▪ Child execution types:
  • Standard
  • Express
▪ Error threshold:
  • Percentage
  • Number of items
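The error threshold decides when a whole Map Run is marked as failed. This sketch models that decision under stated assumptions: the run fails as soon as either configured threshold (`ToleratedFailurePercentage` or `ToleratedFailureCount`) is exceeded, and on any failure when neither is set. `map_run_fails` is a hypothetical helper, and the fragment values are examples.

```python
from typing import Optional


def map_run_fails(total_items: int, failed_items: int,
                  tolerated_percentage: Optional[float] = None,
                  tolerated_count: Optional[int] = None) -> bool:
    """Decide whether a Map Run fails.

    Assumes the run fails once either configured threshold is exceeded,
    and on any failed item when neither threshold is set.
    """
    if tolerated_percentage is None and tolerated_count is None:
        return failed_items > 0  # no threshold: any failed item fails the run
    if tolerated_count is not None and failed_items > tolerated_count:
        return True
    if tolerated_percentage is not None and total_items > 0:
        return 100.0 * failed_items / total_items > tolerated_percentage
    return False


# Runtime settings to merge into a Distributed Map state (partial fragment, example values).
runtime_settings = {
    "MaxConcurrency": 100,
    "ToleratedFailurePercentage": 5,
    "ItemProcessor": {"ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"}},
}

print(map_run_fails(200, 8, tolerated_percentage=5))   # False: 4% failed
print(map_run_fails(200, 12, tolerated_percentage=5))  # True: 6% failed
```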
Export Result
▪ S3 location
▪ Result files
  • manifest.json
  • SUCCEEDED_n.json
  • FAILED_n.json
  • PENDING_n.json
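Because the exported files are split by status, a follow-up job can pick out, for example, only the failed items to inspect or reprocess. This sketch groups result-file keys by the name patterns on the slide; the key layout under the `ResultWriter` prefix is a hypothetical example, and `group_result_files` is not part of any AWS SDK.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

STATUSES = ("SUCCEEDED", "FAILED", "PENDING")


def group_result_files(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group exported result-file keys by status, based on the file names on the slide."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        name = key.rsplit("/", 1)[-1]  # file name without the S3 prefix
        if name == "manifest.json":
            groups["MANIFEST"].append(key)
            continue
        for status in STATUSES:
            if name.startswith(status + "_") and name.endswith(".json"):
                groups[status].append(key)
                break
    return dict(groups)


# Hypothetical keys under a hypothetical ResultWriter prefix.
example_keys = [
    "map-results/run-1/manifest.json",
    "map-results/run-1/SUCCEEDED_0.json",
    "map-results/run-1/SUCCEEDED_1.json",
    "map-results/run-1/FAILED_0.json",
]
print(group_result_files(example_keys)["FAILED"])  # ['map-results/run-1/FAILED_0.json']
```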
Execution Details - Parent Event Log
Execution Details - Map Run
Execution Details - Single Child Execution
Process
▪ SaaS application to measure candidate experience
▪ Send surveys
▪ Record the feedback
▪ Visualize in a dashboard (benchmark, filter, comparison)
▪ Transform / enrich data
Hourly ETL
Source from MySQL → Transform → Save to S3 → Load to Postgres
▪ Amazon Managed Workflows for Apache Airflow
▪ Amazon EMR
Problem
Data Load Step
▪ Less visibility
▪ Cannot retry a single table load
▪ Takes 20 minutes on average
▪ EC2 cost
Solution
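The solution slides were diagrams that did not survive in this transcript, so the following is only a hedged sketch of how the data load step could be expressed with a Distributed Map, not the author's actual state machine. It assumes each table's hourly export is a separate object under an S3 prefix and that a Lambda function (hypothetical name `load-table-to-postgres`) loads one object into Postgres. Each table then runs in its own Express child execution, so a `Retry` on the task re-runs only the table that failed.

```python
import json


def etl_load_definition(bucket: str, prefix: str, function_name: str) -> dict:
    """Build a state machine that loads every S3 object under `prefix` with one Lambda call each."""
    load_table = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        # Retrying inside the child execution re-runs only the table that failed.
        "Retry": [
            {"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 5, "MaxAttempts": 3, "BackoffRate": 2.0}
        ],
        "End": True,
    }
    return {
        "StartAt": "LoadTables",
        "States": {
            "LoadTables": {
                "Type": "Map",
                "ItemReader": {
                    "Resource": "arn:aws:states:::s3:listObjectsV2",
                    "Parameters": {"Bucket": bucket, "Prefix": prefix},
                },
                # Pass only what the loader needs: the bucket and the object key.
                "ItemSelector": {"bucket": bucket, "key.$": "$$.Map.Item.Value.Key"},
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"},
                    "StartAt": "LoadTable",
                    "States": {"LoadTable": load_table},
                },
                "MaxConcurrency": 20,  # example value, e.g. to protect the Postgres instance
                "End": True,
            }
        },
    }


print(json.dumps(etl_load_definition("my-etl-bucket", "hourly/", "load-table-to-postgres"), indent=2))
```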
Demo
Benefits
▪ Reduced the average time from 20 minutes to 5 minutes
▪ Load data in parallel
▪ Better insights
▪ Retry individual table data loads
▪ Cost effective
Tips / Lessons Learned
▪ Use batching
▪ Set concurrency
▪ Set an error threshold
▪ Use Express child executions
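To see why batching and a concurrency limit work together, a back-of-the-envelope estimate helps: batching reduces the number of child executions, and the concurrency limit determines how many of them run at once. This sketch assumes every child takes about the same time, so the run proceeds in rounds ("waves"); `plan_map_run` and the numbers are hypothetical.

```python
import math
from typing import Tuple


def plan_map_run(total_items: int, items_per_batch: int, max_concurrency: int) -> Tuple[int, int]:
    """Estimate child executions and waves of concurrent children for a Distributed Map run.

    Simplification: assumes every child execution takes about the same time,
    so the run proceeds in ceil(children / concurrency) rounds.
    """
    child_executions = math.ceil(total_items / items_per_batch)
    waves = math.ceil(child_executions / max_concurrency)
    return child_executions, waves


print(plan_map_run(100_000, 1, 1_000))    # (100000, 100): one child per item
print(plan_map_run(100_000, 100, 1_000))  # (1000, 1): batching cuts child executions 100-fold
```

Fewer, larger child executions also mean fewer state transitions overall, which is one reason the batching and Express tips tend to go together.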
Read more about this: https://bit.ly/s3-to-postgres
Thank You! @pubudusj