Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
AWS Summit Amsterdam 2023 - SVS204
Search
Pubudu
June 18, 2023
Technology
25
1
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
AWS Summit Amsterdam 2023 - SVS204
Large scale parallel data processing with AWS Step Functions Distributed Maps
Pubudu
June 18, 2023
More Decks by Pubudu
See All by Pubudu
Moving from single tenant to multi tenant
pubudusj
0
47
COM202 Dev Chat at re:Invent 2022
pubudusj
1
89
Manage webhooks at scale with AWS Serverless
pubudusj
0
60
Smart Doorbell with AWS Serverless - AWS UG Coimbatore
pubudusj
0
67
Smart Doorbell with AWS Serverless - Serverless Summit 21
pubudusj
0
98
Other Decks in Technology
See All in Technology
GoとSIMDとWasmの今。
askua
3
520
2026.06.13_AI時代に事業会社が「SIer出身エンジニア」を求める理由 / Why Businesses Seek Engineers with a System Integrator Background in the AI Era
jumtech
0
950
データ基盤をDataformで整えた話 〜 開発環境を添えて 〜
takapy
0
130
【Gen-AX】20260530開催_JJUG CCC 2026 Spring
genax
1
440
マーケットプレイス版Oracle WebCenter Content For OCI
oracle4engineer
PRO
5
1.8k
React、まだ楽しくて草
uhyo
7
4.2k
作って終わりにしない タイミーのセマンティックレイヤー育成の現在地
chanyou0311
1
1.4k
noUncheckedIndexedAccess、3時間、1万円。 / noUncheckedIndexedAccess, 3 Hours, 10,000 JPY.
kaonavi
1
340
Rubyで音を視る
ydah
1
160
Platform Engineering as a Product: Criteria for Improvement and Multi-Tenant Design
kumorn5s
0
530
NAB Show 2026 動画技術関連レポート / NAB Show 2026 Report
cyberagentdevelopers
PRO
0
130
MIERUNE JCT 発表資料「宇宙から伊能忠敬ごっこ」
syuchimu
0
190
Featured
See All Featured
The Cost Of JavaScript in 2023
addyosmani
55
10k
Visual Storytelling: How to be a Superhuman Communicator
reverentgeek
2
550
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
3
150
Navigating the Design Leadership Dip - Product Design Week Design Leaders+ Conference 2024
apolaine
1
340
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
254
22k
Leadership Guide Workshop - DevTernity 2021
reverentgeek
1
300
A designer walks into a library…
pauljervisheath
211
24k
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
A Modern Web Designer's Workflow
chriscoyier
698
190k
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
Digital Ethics as a Driver of Design Innovation
axbom
PRO
1
310
Stewardship and Sustainability of Urban and Community Forests
pwiseman
0
220
Transcript
Large scale parallel data processing with Step Functions Distributed Map
SVS204
About Me Pubudu Jayawardana @pubudusj From Amsterdam, the Netherlands Senior
Backend Developer at starred.com AWS Community Builder (Serverless) AWS Certified - SA Pro https://medium.com/@pubudusj https://pubudu.dev https://dev.to/pubudusj
AWS Community Builders
Step Functions Distributed Map
▪ To iterate over an array ▪ Limitations • 40
parallel iterations at a time • Max payload size - 256KB • Execution history - 25,000 events Map State
▪ Totally separated child executions • 25,000 events each •
10,000 executions at a time • S3 as a source ▪ Result output to S3 ▪ Only applicable for Standard flows Distributed Map
Distributed Map
▪ Source types: • S3 object list • JSON file
in S3 • CSV file in S3 • S3 manifest file ▪ Limit no of items ▪ ItemSelector Source
▪ Batching based on: • No of items • Size
▪ Modify input with Batch input Item Batching
▪ Concurrency limit ▪ Child execution types: • Standard •
Express ▪ Error threshold: • Percentage • No of items Runtime Settings
▪ S3 location ▪ Logs • manifest.json • SUCCEEDED_n.json •
FAILED_n.json • PENDING_n.json Export Result
Execution Details - Parent Event Log
Execution Details - Map Run
Execution Details - Single Child Execution
Process
▪ SAAS application to measure candidate experience ▪ Send surveys
▪ Record the feedback ▪ Visualize in a dashboard (benchmark, filter, comparison) ▪ Transform / Enrich data Process
Source from MySQL Transform Save to S3 Load to Postgres
Hourly ETL
▪ Amazon Managed Workflows for Apache Airflow ▪ Amazon EMR
Hourly ETL
None
Problem
▪ Less visibility ▪ Cannot retry single table load ▪
Takes avg 20 minutes ▪ EC2 cost Data Load Step
Solution
None
None
Demo
▪ Reduced time to avg 5 minutes ▪ Load data
parallelly ▪ Better insights ▪ Retry individual table data load ▪ Cost effective Benefits
▪ Use batching ▪ Set concurrency ▪ Set error threshold
▪ Use express child executions Tips / Lesson Learned
https://bit.ly/s3-to-postgres Read more about this
Thank You! @pubudusj