AWS Summit Amsterdam 2023 - SVS204
Pubudu
June 18, 2023
Technology
Large scale parallel data processing with AWS Step Functions Distributed Maps
Transcript
Large scale parallel data processing with Step Functions Distributed Map
SVS204
About Me
Pubudu Jayawardana (@pubudusj)
▪ From Amsterdam, the Netherlands
▪ Senior Backend Developer at starred.com
▪ AWS Community Builder (Serverless)
▪ AWS Certified - SA Pro
▪ https://medium.com/@pubudusj
▪ https://pubudu.dev
▪ https://dev.to/pubudusj
AWS Community Builders
Step Functions Distributed Map
Map State
▪ Iterates over an array
▪ Limitations
  • 40 parallel iterations at a time
  • Max payload size: 256 KB
  • Execution history: 25,000 events
Distributed Map
▪ Completely separate child executions
  • 25,000 events each
  • 10,000 executions at a time
  • S3 as a source
▪ Result output to S3
▪ Only available in Standard workflows
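As a rough illustration of the points above, a Distributed Map state can be written as a plain Python dict and serialized to Amazon States Language JSON. This is only a sketch: the helper name `distributed_map_state`, the child state name `ProcessItem`, the `map-results` prefix, and the Lambda ARN passed in are all hypothetical and not taken from the talk.

```python
import json


def distributed_map_state(bucket: str, prefix: str, child_lambda_arn: str) -> dict:
    """Build a minimal Distributed Map state as a dict (hypothetical helper)."""
    return {
        "Type": "Map",
        # Items come from S3 rather than from the state input.
        "ItemReader": {
            "Resource": "arn:aws:states:::s3:listObjectsV2",
            "Parameters": {"Bucket": bucket, "Prefix": prefix},
        },
        # Mode DISTRIBUTED runs each iteration as a separate child execution.
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
            "StartAt": "ProcessItem",
            "States": {
                "ProcessItem": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Parameters": {
                        "FunctionName": child_lambda_arn,
                        "Payload.$": "$",
                    },
                    "End": True,
                }
            },
        },
        # Results are written to S3 instead of being returned inline.
        "ResultWriter": {
            "Resource": "arn:aws:states:::s3:putObject",
            "Parameters": {"Bucket": bucket, "Prefix": "map-results"},
        },
        "End": True,
    }


if __name__ == "__main__":
    state = distributed_map_state(
        "example-bucket",
        "incoming/",
        "arn:aws:lambda:eu-west-1:123456789012:function:example",
    )
    print(json.dumps(state, indent=2))
```

The dict can be embedded under `States` in a Standard workflow definition; the parent workflow itself cannot be an Express workflow.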
Distributed Map
Source
▪ Source types:
  • S3 object list
  • JSON file in S3
  • CSV file in S3
  • S3 manifest file
▪ Limit number of items
▪ ItemSelector
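The four source types map onto the `ItemReader` field of the Map state. The sketch below builds one reader per source type; the helper name `item_reader`, the short source labels (`s3_objects`, `json`, `csv`, `manifest`), and the `tableName` field in the `ItemSelector` example are assumptions made for illustration.

```python
from typing import Optional


def item_reader(source: str, bucket: str, key_or_prefix: str,
                max_items: Optional[int] = None) -> dict:
    """Return an ItemReader dict for one of the four source types (hypothetical helper)."""
    if source == "s3_objects":
        # Iterate over the objects under a prefix.
        reader = {
            "Resource": "arn:aws:states:::s3:listObjectsV2",
            "Parameters": {"Bucket": bucket, "Prefix": key_or_prefix},
        }
        config = {}
    elif source in ("json", "csv", "manifest"):
        # Iterate over the contents of a single file in S3.
        reader = {
            "Resource": "arn:aws:states:::s3:getObject",
            "Parameters": {"Bucket": bucket, "Key": key_or_prefix},
        }
        config = {"InputType": source.upper()}
        if source == "csv":
            # Use the first row of the file as column names.
            config["CSVHeaderLocation"] = "FIRST_ROW"
    else:
        raise ValueError(f"unknown source type: {source}")

    if max_items is not None:
        # "Limit number of items": stop reading after this many items.
        config["MaxItems"] = max_items
    if config:
        reader["ReaderConfig"] = config
    return reader


# ItemSelector reshapes each item before it reaches the child execution.
# "$$.Map.Item.Value" is the current item; "tableName" and "table" are made-up field names.
ITEM_SELECTOR = {"tableName.$": "$$.Map.Item.Value.table"}
```

For example, `item_reader("csv", "example-bucket", "tables.csv", max_items=100)` yields a reader that stops after the first 100 rows.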
Item Batching
▪ Batching based on:
  • Number of items
  • Size
▪ Modify input with Batch input
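To make the batching options concrete, here is a small local simulation of how items might be grouped by count and by size, and of the payload shape a child execution receives (an `Items` array, plus `BatchInput` when one is configured). It is an approximation only: the function name `batch_items` is hypothetical, and the size calculation counts just the serialized items, which may differ from how the service measures a batch.

```python
import json
from typing import Any, Optional


def batch_items(items: list, max_items: Optional[int] = None,
                max_bytes: Optional[int] = None,
                batch_input: Optional[dict] = None) -> list:
    """Group items into batch payloads by count and/or serialized size."""
    batches: list = []
    current: list = []
    current_bytes = 0
    for item in items:
        size = len(json.dumps(item).encode("utf-8"))
        too_many = max_items is not None and len(current) >= max_items
        too_big = (max_bytes is not None and bool(current)
                   and current_bytes + size > max_bytes)
        if too_many or too_big:
            # Close the current batch and start a new one.
            batches.append(current)
            current, current_bytes = [], 0
        current.append(item)
        current_bytes += size
    if current:
        batches.append(current)

    payloads = []
    for batch in batches:
        payload: dict = {"Items": batch}
        if batch_input is not None:
            # The same static input is attached to every batch.
            payload["BatchInput"] = batch_input
        payloads.append(payload)
    return payloads


if __name__ == "__main__":
    tables = [{"table": f"t{i}"} for i in range(5)]
    for p in batch_items(tables, max_items=2, batch_input={"run": "hourly"}):
        print(p)
```

In the state definition, the corresponding settings are `MaxItemsPerBatch`, `MaxInputBytesPerBatch`, and `BatchInput` under `ItemBatcher`.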
Runtime Settings
▪ Concurrency limit
▪ Child execution types:
  • Standard
  • Express
▪ Error threshold:
  • Percentage
  • Number of items
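The runtime settings correspond to `MaxConcurrency`, `ExecutionType`, `ToleratedFailurePercentage`, and `ToleratedFailureCount` in the state definition. The sketch below builds that fragment and models the error threshold locally. The helper names are hypothetical, and the failure rule (the run fails once either configured threshold is exceeded, and on any failure when none is configured) is a reading of the documented behavior, so treat it as an assumption rather than the service's exact logic.

```python
from typing import Optional


def runtime_settings(max_concurrency: int, execution_type: str = "EXPRESS",
                     tolerated_pct: Optional[float] = None,
                     tolerated_count: Optional[int] = None) -> dict:
    """Return the Map state fields that hold the runtime settings."""
    if execution_type not in ("STANDARD", "EXPRESS"):
        raise ValueError("execution_type must be STANDARD or EXPRESS")
    state: dict = {
        "MaxConcurrency": max_concurrency,
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": execution_type}
        },
    }
    if tolerated_pct is not None:
        state["ToleratedFailurePercentage"] = tolerated_pct
    if tolerated_count is not None:
        state["ToleratedFailureCount"] = tolerated_count
    return state


def map_run_fails(failed: int, total: int,
                  tolerated_pct: Optional[float] = None,
                  tolerated_count: Optional[int] = None) -> bool:
    """Assumed rule: fail when any configured threshold is exceeded."""
    if failed == 0:
        return False
    if tolerated_pct is None and tolerated_count is None:
        # No threshold configured: a single failure fails the run.
        return True
    over_pct = (tolerated_pct is not None and total > 0
                and 100 * failed / total > tolerated_pct)
    over_count = tolerated_count is not None and failed > tolerated_count
    return over_pct or over_count
```

With a 5% threshold, 5 failures out of 100 items are tolerated under this model, while 6 are not.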
Export Result
▪ S3 location
▪ Logs
  • manifest.json
  • SUCCEEDED_n.json
  • FAILED_n.json
  • PENDING_n.json
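Once a Map Run has exported its results, `manifest.json` can be used to locate the per-status result files. The sketch below works on an in-memory sample rather than calling S3. The manifest layout shown (`DestinationBucket`, `MapRunArn`, and `ResultFiles` keyed by status) is written from memory of the documented format and should be checked against a real export; the bucket, keys, and sizes are invented.

```python
# Assumed manifest layout; all values below are made up for illustration.
SAMPLE_MANIFEST = {
    "DestinationBucket": "example-bucket",
    "MapRunArn": "arn:aws:states:eu-west-1:123456789012:mapRun:example/run:id",
    "ResultFiles": {
        "FAILED": [{"Key": "results/run/FAILED_0.json", "Size": 512}],
        "PENDING": [],
        "SUCCEEDED": [
            {"Key": "results/run/SUCCEEDED_0.json", "Size": 2048},
            {"Key": "results/run/SUCCEEDED_1.json", "Size": 1024},
        ],
    },
}


def result_keys(manifest: dict, status: str) -> list:
    """Return the S3 keys of the result files for one status."""
    files = manifest.get("ResultFiles", {}).get(status, [])
    return [f["Key"] for f in files]


def summarize(manifest: dict) -> dict:
    """Count result files per status (files, not individual child executions)."""
    return {status: len(files)
            for status, files in manifest.get("ResultFiles", {}).items()}


if __name__ == "__main__":
    print(summarize(SAMPLE_MANIFEST))
    print(result_keys(SAMPLE_MANIFEST, "FAILED"))
```

Listing the `FAILED` keys is a natural starting point for retrying only the items that did not succeed.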
Execution Details - Parent Event Log
Execution Details - Map Run
Execution Details - Single Child Execution
Process
Process
▪ SaaS application to measure candidate experience
▪ Send surveys
▪ Record the feedback
▪ Visualize in a dashboard (benchmark, filter, comparison)
▪ Transform / Enrich data
Hourly ETL
Source from MySQL → Transform → Save to S3 → Load to Postgres
Hourly ETL
▪ Amazon Managed Workflows for Apache Airflow
▪ Amazon EMR
Problem
Data Load Step
▪ Less visibility
▪ Cannot retry a single table load
▪ Takes 20 minutes on average
▪ EC2 cost
Solution
Demo
Benefits
▪ Reduced time to 5 minutes on average
▪ Load data in parallel
▪ Better insights
▪ Retry individual table data load
▪ Cost effective
Tips / Lessons Learned
▪ Use batching
▪ Set concurrency
▪ Set error threshold
▪ Use express child executions
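One way to keep these tips in view is a small check over a Map state definition that reports which of them have not been applied. This is a sketch under assumptions: `missing_tips` is a hypothetical helper, it only inspects the literal field names `ItemBatcher`, `MaxConcurrency`, `ToleratedFailurePercentage`, `ToleratedFailureCount`, and `ExecutionType`, and it ignores any variants of those fields that take their values from the state input.

```python
def missing_tips(map_state: dict) -> list:
    """Return the tips from the list above that a Map state does not apply."""
    missing = []
    if "ItemBatcher" not in map_state:
        missing.append("Use batching")
    if "MaxConcurrency" not in map_state:
        missing.append("Set concurrency")
    if not ({"ToleratedFailurePercentage", "ToleratedFailureCount"} & map_state.keys()):
        missing.append("Set error threshold")
    config = map_state.get("ItemProcessor", {}).get("ProcessorConfig", {})
    if config.get("ExecutionType") != "EXPRESS":
        missing.append("Use express child executions")
    return missing


if __name__ == "__main__":
    example = {
        "Type": "Map",
        "MaxConcurrency": 100,
        "ItemBatcher": {"MaxItemsPerBatch": 10},
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"}
        },
    }
    print(missing_tips(example))
```

Express child executions are not always appropriate (they have a much shorter maximum duration than Standard ones), so the last check is a prompt to reconsider rather than a hard rule.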
Read more about this: https://bit.ly/s3-to-postgres
Thank You! @pubudusj