AWS Summit Amsterdam 2023 - SVS204
Pubudu
June 18, 2023
Technology
Large scale parallel data processing with AWS Step Functions Distributed Maps
Transcript
Large scale parallel data processing with Step Functions Distributed Map
SVS204
About Me
Pubudu Jayawardana (@pubudusj)
▪ From Amsterdam, the Netherlands
▪ Senior Backend Developer at starred.com
▪ AWS Community Builder (Serverless)
▪ AWS Certified - SA Pro
▪ https://medium.com/@pubudusj
▪ https://pubudu.dev
▪ https://dev.to/pubudusj
AWS Community Builders
Step Functions Distributed Map
Map State
▪ To iterate over an array
▪ Limitations
  • 40 parallel iterations at a time
  • Max payload size - 256KB
  • Execution history - 25,000 events
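
The payload limit above is the one most likely to surprise when an input array grows. A minimal sketch of a pre-flight check, assuming the limit is 256 KiB of compact UTF-8 JSON; the helper names are hypothetical and not part of any AWS SDK:

```python
import json

# Assumption: the 256KB limit from the slide, taken as 256 KiB.
MAX_PAYLOAD_BYTES = 256 * 1024


def payload_size_bytes(items):
    """Return the UTF-8 size of the items serialized as compact JSON."""
    return len(json.dumps(items, separators=(",", ":")).encode("utf-8"))


def fits_inline_map(items):
    """True when the serialized array stays within the assumed payload limit."""
    return payload_size_bytes(items) <= MAX_PAYLOAD_BYTES


if __name__ == "__main__":
    small = [{"table": f"table_{i}"} for i in range(10)]
    large = ["x" * 1024] * 300  # roughly 300 KiB once serialized
    print(fits_inline_map(small), fits_inline_map(large))
```

An array that fails this check is a candidate for a Distributed Map reading its items from S3 instead of from the state input.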
Distributed Map
▪ Totally separated child executions
  • 25,000 events each
  • 10,000 executions at a time
  • S3 as a source
▪ Result output to S3
▪ Only applicable for Standard flows
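
To make the shape of such a state concrete, here is a sketch that builds an Amazon States Language definition as a Python dictionary. The field names (`ItemReader`, `ItemProcessor`, `ProcessorConfig`, `ResultWriter`) come from Amazon States Language; the bucket, prefixes, Lambda ARN and the `ProcessItem` state name are placeholders chosen for illustration, not taken from the talk:

```python
import json


def distributed_map_state(bucket, prefix, lambda_arn):
    """Build a minimal Distributed Map state that iterates over S3 objects."""
    return {
        "Type": "Map",
        # Items come from an S3 object listing rather than the state input.
        "ItemReader": {
            "Resource": "arn:aws:states:::s3:listObjectsV2",
            "Parameters": {"Bucket": bucket, "Prefix": prefix},
        },
        # Each item is handled by a separate child workflow execution.
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
            "StartAt": "ProcessItem",
            "States": {
                "ProcessItem": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Parameters": {"FunctionName": lambda_arn, "Payload.$": "$"},
                    "End": True,
                }
            },
        },
        # Child execution results are written back to S3.
        "ResultWriter": {
            "Resource": "arn:aws:states:::s3:putObject",
            "Parameters": {"Bucket": bucket, "Prefix": "results/"},
        },
        "End": True,
    }


if __name__ == "__main__":
    state = distributed_map_state(
        "example-bucket",  # placeholder bucket
        "input/",
        "arn:aws:lambda:eu-west-1:123456789012:function:example",  # placeholder ARN
    )
    print(json.dumps(state, indent=2))
```

The parent state machine holding this state has to be a Standard workflow, matching the last bullet above.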
Distributed Map
Source
▪ Source types:
  • S3 object list
  • JSON file in S3
  • CSV file in S3
  • S3 manifest file
▪ Limit number of items
▪ ItemSelector
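
The source options above map onto the `ItemReader` and `ItemSelector` fields of the Map state. The sketch below builds those fragments as Python dictionaries; the helper names, bucket and key values are hypothetical, and the CSV header setting assumes the header is the first row of the file:

```python
def s3_object_list_reader(bucket, prefix, max_items=None):
    """ItemReader that iterates over the objects under an S3 prefix."""
    reader = {
        "Resource": "arn:aws:states:::s3:listObjectsV2",
        "Parameters": {"Bucket": bucket, "Prefix": prefix},
    }
    if max_items is not None:
        reader["ReaderConfig"] = {"MaxItems": max_items}  # limit number of items
    return reader


def s3_file_reader(bucket, key, input_type, max_items=None):
    """ItemReader for a single S3 file; input_type is JSON, CSV or MANIFEST."""
    if input_type not in ("JSON", "CSV", "MANIFEST"):
        raise ValueError(f"unsupported input type: {input_type}")
    config = {"InputType": input_type}
    if input_type == "CSV":
        config["CSVHeaderLocation"] = "FIRST_ROW"  # assumption: header in row 1
    if max_items is not None:
        config["MaxItems"] = max_items
    return {
        "Resource": "arn:aws:states:::s3:getObject",
        "ReaderConfig": config,
        "Parameters": {"Bucket": bucket, "Key": key},
    }


# ItemSelector reshapes each item before it reaches the child execution.
# The keys "index" and "item" are illustrative names.
ITEM_SELECTOR = {
    "index.$": "$$.Map.Item.Index",
    "item.$": "$$.Map.Item.Value",
}

if __name__ == "__main__":
    print(s3_file_reader("example-bucket", "tables.csv", "CSV", max_items=100))
```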
Item Batching
▪ Batching based on:
  • Number of items
  • Size
▪ Modify input with Batch input
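
To show what batching does to the items a child execution receives, here is a local simulation. It is an approximation for illustration, not the service's algorithm: it assumes each batch is delivered as an object with an `Items` array plus an optional `BatchInput` object, and it measures size as compact UTF-8 JSON. The function and parameter names are hypothetical:

```python
import json


def _envelope(items, batch_input):
    """Shape of one batch as the child execution would see it (assumed)."""
    payload = {"Items": items}
    if batch_input is not None:
        payload["BatchInput"] = batch_input
    return payload


def _size(payload):
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def batch_items(items, max_items=None, max_bytes=None, batch_input=None):
    """Group items into batches bounded by item count and/or serialized size."""
    batches, current = [], []
    for item in items:
        candidate = current + [item]
        too_many = max_items is not None and len(candidate) > max_items
        too_big = (
            max_bytes is not None
            and _size(_envelope(candidate, batch_input)) > max_bytes
        )
        if current and (too_many or too_big):
            # Close the current batch and start a new one with this item.
            batches.append(_envelope(current, batch_input))
            current = [item]
        else:
            current = candidate
    if current:
        batches.append(_envelope(current, batch_input))
    return batches


if __name__ == "__main__":
    for batch in batch_items(list(range(5)), max_items=2, batch_input={"run": "hourly"}):
        print(batch)
```

In the state definition, the corresponding `ItemBatcher` fields are `MaxItemsPerBatch`, `MaxInputBytesPerBatch` and `BatchInput`.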
Runtime Settings
▪ Concurrency limit
▪ Child execution types:
  • Standard
  • Express
▪ Error threshold:
  • Percentage
  • Number of items
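
These settings correspond to the `MaxConcurrency`, `ExecutionType`, `ToleratedFailurePercentage` and `ToleratedFailureCount` fields of the Map state. The sketch below applies them to a state dictionary and adds a rough local model of the threshold check. The helper names are hypothetical, and the threshold logic is an assumption: it treats the run as failed when either configured threshold is exceeded, and when any item fails if no threshold is set:

```python
import copy


def with_runtime_settings(state, max_concurrency, execution_type="EXPRESS",
                          tolerated_failure_percentage=None,
                          tolerated_failure_count=None):
    """Return a copy of a Map state with the runtime settings applied."""
    if execution_type not in ("STANDARD", "EXPRESS"):
        raise ValueError(f"unknown execution type: {execution_type}")
    updated = copy.deepcopy(state)
    updated["MaxConcurrency"] = max_concurrency
    config = updated.setdefault("ItemProcessor", {}).setdefault("ProcessorConfig", {})
    config["Mode"] = "DISTRIBUTED"
    config["ExecutionType"] = execution_type
    if tolerated_failure_percentage is not None:
        updated["ToleratedFailurePercentage"] = tolerated_failure_percentage
    if tolerated_failure_count is not None:
        updated["ToleratedFailureCount"] = tolerated_failure_count
    return updated


def exceeds_error_threshold(failed, total, tolerated_percentage=None,
                            tolerated_count=None):
    """Approximate check: would this many failed items fail the whole run?"""
    if tolerated_percentage is None and tolerated_count is None:
        return failed > 0  # assumption: no tolerance configured
    over_pct = (
        tolerated_percentage is not None
        and total > 0
        and failed / total * 100 > tolerated_percentage
    )
    over_count = tolerated_count is not None and failed > tolerated_count
    return over_pct or over_count


if __name__ == "__main__":
    state = with_runtime_settings({"Type": "Map"}, 50, tolerated_failure_percentage=5)
    print(state)
    print(exceeds_error_threshold(6, 100, tolerated_percentage=5))
```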
Export Result
▪ S3 location
▪ Logs
  • manifest.json
  • SUCCEEDED_n.json
  • FAILED_n.json
  • PENDING_n.json
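
Once a Map Run finishes, `manifest.json` is the entry point for locating the per-status result files. The sketch below reads such a manifest from an already-parsed dictionary. The layout used here (`DestinationBucket`, `MapRunArn`, and a `ResultFiles` object keyed by status with `Key`/`Size` entries) is an assumption about the manifest format, and the sample values are made up:

```python
def result_file_keys(manifest, status):
    """Return the S3 keys of the result files recorded for one status."""
    if status not in ("SUCCEEDED", "FAILED", "PENDING"):
        raise ValueError(f"unknown status: {status}")
    files = manifest.get("ResultFiles", {}).get(status, [])
    return [entry["Key"] for entry in files]


def summarize(manifest):
    """Count the result files per status."""
    return {
        status: len(result_file_keys(manifest, status))
        for status in ("SUCCEEDED", "FAILED", "PENDING")
    }


# Illustrative manifest; bucket, ARN and keys are placeholders.
SAMPLE_MANIFEST = {
    "DestinationBucket": "example-bucket",
    "MapRunArn": "arn:aws:states:eu-west-1:123456789012:mapRun:example/run:id",
    "ResultFiles": {
        "FAILED": [{"Key": "results/run/FAILED_0.json", "Size": 512}],
        "PENDING": [],
        "SUCCEEDED": [
            {"Key": "results/run/SUCCEEDED_0.json", "Size": 2048},
            {"Key": "results/run/SUCCEEDED_1.json", "Size": 1024},
        ],
    },
}

if __name__ == "__main__":
    print(summarize(SAMPLE_MANIFEST))
    print(result_file_keys(SAMPLE_MANIFEST, "FAILED"))
```

Listing the FAILED keys this way gives a starting point for retrying only the items that did not succeed.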
Execution Details - Parent Event Log
Execution Details - Map Run
Execution Details - Single Child Execution
Process
▪ SaaS application to measure candidate experience
▪ Send surveys
▪ Record the feedback
▪ Visualize in a dashboard (benchmark, filter, comparison)
▪ Transform / Enrich data
Hourly ETL
▪ Source from MySQL
▪ Transform
▪ Save to S3
▪ Load to Postgres
Hourly ETL
▪ Amazon Managed Workflows for Apache Airflow
▪ Amazon EMR
Problem
Data Load Step
▪ Less visibility
▪ Cannot retry a single table load
▪ Takes 20 minutes on average
▪ EC2 cost
Solution
Demo
Benefits
▪ Reduced time to 5 minutes on average
▪ Load data in parallel
▪ Better insights
▪ Retry individual table data load
▪ Cost effective
Tips / Lessons Learned
▪ Use batching
▪ Set concurrency
▪ Set error threshold
▪ Use Express child executions
Read more about this: https://bit.ly/s3-to-postgres
Thank You! @pubudusj