AWS Summit Amsterdam 2023 - SVS204
Pubudu
June 18, 2023
Technology
Large scale parallel data processing with AWS Step Functions Distributed Maps
Transcript
Large scale parallel data processing with Step Functions Distributed Map
SVS204
About Me
Pubudu Jayawardana (@pubudusj)
From Amsterdam, the Netherlands
Senior Backend Developer at starred.com
AWS Community Builder (Serverless)
AWS Certified - SA Pro
https://medium.com/@pubudusj
https://pubudu.dev
https://dev.to/pubudusj
AWS Community Builders
Step Functions Distributed Map
Map State
▪ To iterate over an array
▪ Limitations
  • 40 parallel iterations at a time
  • Max payload size - 256KB
  • Execution history - 25,000 events
Distributed Map
▪ Totally separated child executions
  • 25,000 events each
  • 10,000 executions at a time
  • S3 as a source
▪ Result output to S3
▪ Only applicable for Standard flows
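As a rough illustration of the points above, a Distributed Map is an ordinary Map state whose item processor runs in DISTRIBUTED mode. The sketch below builds such a state as a Python dict and prints it as Amazon States Language JSON. The helper name `distributed_map_state` and the one-step `ProcessItem` child workflow are hypothetical and not taken from the talk.

```python
import json


def distributed_map_state(start_at, child_states, execution_type="STANDARD"):
    """Return a Map state (as a dict) whose item processor runs in distributed mode."""
    if execution_type not in ("STANDARD", "EXPRESS"):
        raise ValueError("execution_type must be STANDARD or EXPRESS")
    return {
        "Type": "Map",
        "ItemProcessor": {
            # DISTRIBUTED mode runs each item (or batch) as a separate child execution.
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": execution_type},
            "StartAt": start_at,
            "States": child_states,
        },
        "End": True,
    }


# Hypothetical one-step child workflow, used only for illustration.
child_states = {"ProcessItem": {"Type": "Pass", "End": True}}
state = distributed_map_state("ProcessItem", child_states)
print(json.dumps(state, indent=2))
```

The printed JSON is only the Map state itself. It still has to be placed inside a Standard parent state machine definition.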
Distributed Map
Source
▪ Source types:
  • S3 object list
  • JSON file in S3
  • CSV file in S3
  • S3 manifest file
▪ Limit number of items
▪ ItemSelector
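The source options above map onto the `ItemReader` field of the Map state. The following is a minimal sketch, assuming the standard S3 reader resources and `ReaderConfig` fields. The function name `item_reader`, the label `S3_OBJECT_LIST`, and the bucket and key values are hypothetical placeholders.

```python
from typing import Optional


def item_reader(source_type: str, bucket: str, key_or_prefix: str,
                max_items: Optional[int] = None) -> dict:
    """Build an ItemReader dict for one of the four S3 source types."""
    reader_config = {}
    if source_type == "S3_OBJECT_LIST":  # label used by this sketch only
        resource = "arn:aws:states:::s3:listObjectsV2"
        parameters = {"Bucket": bucket, "Prefix": key_or_prefix}
    elif source_type in ("JSON", "CSV", "MANIFEST"):
        resource = "arn:aws:states:::s3:getObject"
        parameters = {"Bucket": bucket, "Key": key_or_prefix}
        reader_config["InputType"] = source_type
        if source_type == "CSV":
            # Assumes the first row of the CSV file holds the column names.
            reader_config["CSVHeaderLocation"] = "FIRST_ROW"
    else:
        raise ValueError(f"unsupported source type: {source_type}")
    if max_items is not None:
        reader_config["MaxItems"] = max_items  # "Limit number of items"
    reader = {"Resource": resource, "Parameters": parameters}
    if reader_config:
        reader["ReaderConfig"] = reader_config
    return reader


# ItemSelector reshapes each item before it reaches the child workflow.
item_selector = {
    "index.$": "$$.Map.Item.Index",
    "row.$": "$$.Map.Item.Value",
}

csv_reader = item_reader("CSV", "example-bucket", "exports/tables.csv", max_items=100)
print(csv_reader)
```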
Item Batching
▪ Batching based on:
  • Number of items
  • Size
▪ Modify input with Batch input
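To make the batching options concrete, the sketch below builds an `ItemBatcher` block and simulates, locally, the input each child execution would receive when batching by item count. The helper names, table names, and the `{"run": "hourly"}` batch input are hypothetical. Size-based batching is only configured here, not simulated.

```python
def item_batcher(max_items=None, max_bytes=None, batch_input=None):
    """Build an ItemBatcher dict; at least one batching limit is required."""
    if max_items is None and max_bytes is None:
        raise ValueError("set max_items, max_bytes, or both")
    batcher = {}
    if max_items is not None:
        batcher["MaxItemsPerBatch"] = max_items
    if max_bytes is not None:
        batcher["MaxInputBytesPerBatch"] = max_bytes
    if batch_input is not None:
        batcher["BatchInput"] = batch_input  # fixed data added to every batch
    return batcher


def simulate_batches(items, max_items_per_batch, batch_input=None):
    """Local illustration of the input passed to each child execution."""
    batches = []
    for start in range(0, len(items), max_items_per_batch):
        child_input = {"Items": items[start:start + max_items_per_batch]}
        if batch_input is not None:
            child_input["BatchInput"] = batch_input
        batches.append(child_input)
    return batches


tables = [f"table_{i}" for i in range(25)]
batches = simulate_batches(tables, 10, {"run": "hourly"})
print(len(batches))  # 3 child executions instead of 25
```

Fewer, larger child executions mean fewer state transitions and less per-execution overhead, which is presumably why batching appears among the tips later in the deck.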
Runtime Settings
▪ Concurrency limit
▪ Child execution types:
  • Standard
  • Express
▪ Error threshold:
  • Percentage
  • Number of items
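The runtime settings above correspond to a handful of fields on the Map state. A minimal sketch follows, assuming the field names `MaxConcurrency`, `ExecutionType`, `ToleratedFailurePercentage`, and `ToleratedFailureCount`. The helper `apply_runtime_settings` and the chosen values are hypothetical.

```python
import copy


def apply_runtime_settings(map_state, max_concurrency, execution_type,
                           tolerated_failure_percentage=None,
                           tolerated_failure_count=None):
    """Return a copy of a Map state dict with runtime settings applied."""
    if execution_type not in ("STANDARD", "EXPRESS"):
        raise ValueError("execution_type must be STANDARD or EXPRESS")
    if max_concurrency < 0:
        raise ValueError("max_concurrency must be zero or positive")
    updated = copy.deepcopy(map_state)  # leave the caller's dict untouched
    updated["MaxConcurrency"] = max_concurrency
    processor = updated.setdefault("ItemProcessor", {})
    config = processor.setdefault("ProcessorConfig", {"Mode": "DISTRIBUTED"})
    config["ExecutionType"] = execution_type
    if tolerated_failure_percentage is not None:
        if not 0 <= tolerated_failure_percentage <= 100:
            raise ValueError("percentage must be between 0 and 100")
        updated["ToleratedFailurePercentage"] = tolerated_failure_percentage
    if tolerated_failure_count is not None:
        updated["ToleratedFailureCount"] = tolerated_failure_count
    return updated


base_state = {"Type": "Map", "ItemProcessor": {"ProcessorConfig": {"Mode": "DISTRIBUTED"}}}
tuned = apply_runtime_settings(base_state, 50, "EXPRESS", tolerated_failure_percentage=5)
print(tuned)
```

A concurrency limit like this can protect a downstream system, such as a database with a fixed connection pool, from being flooded by parallel child executions.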
Export Result
▪ S3 location
▪ Logs
  • manifest.json
  • SUCCEEDED_n.json
  • FAILED_n.json
  • PENDING_n.json
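The export settings can be sketched as a `ResultWriter` block plus a small helper that reads the exported manifest.json. This is a sketch under assumptions: the manifest layout used here (a `ResultFiles` object keyed by status, each entry carrying `Key` and `Size`) should be checked against a real export, and the bucket, prefix, and object keys are hypothetical.

```python
def result_writer(bucket: str, prefix: str) -> dict:
    """Build a ResultWriter dict that exports child execution results to S3."""
    return {
        "Resource": "arn:aws:states:::s3:putObject",
        "Parameters": {"Bucket": bucket, "Prefix": prefix},
    }


def result_keys(manifest: dict, status: str) -> list:
    """Return the S3 keys of the result files recorded for one status."""
    return [entry["Key"] for entry in manifest.get("ResultFiles", {}).get(status, [])]


# Hypothetical manifest content, shaped like the assumed export format.
sample_manifest = {
    "DestinationBucket": "example-bucket",
    "ResultFiles": {
        "SUCCEEDED": [{"Key": "results/run-1/SUCCEEDED_0.json", "Size": 2048}],
        "FAILED": [{"Key": "results/run-1/FAILED_0.json", "Size": 512}],
        "PENDING": [],
    },
}

writer = result_writer("example-bucket", "results")
print(result_keys(sample_manifest, "FAILED"))
```

Listing the FAILED files this way is one route to finding the individual items that need a retry.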
Execution Details - Parent Event Log
Execution Details - Map Run
Execution Details - Single Child Execution
Process
▪ SaaS application to measure candidate experience
▪ Send surveys
▪ Record the feedback
▪ Visualize in a dashboard (benchmark, filter, comparison)
▪ Transform / Enrich data
Hourly ETL
Source from MySQL -> Transform -> Save to S3 -> Load to Postgres
▪ Amazon Managed Workflows for Apache Airflow
▪ Amazon EMR
Problem
Data Load Step
▪ Less visibility
▪ Cannot retry a single table load
▪ Takes 20 minutes on average
▪ EC2 cost
Solution
Demo
Benefits
▪ Reduced time to 5 minutes on average
▪ Load data in parallel
▪ Better insights
▪ Retry individual table data load
▪ Cost effective
Tips / Lessons Learned
▪ Use batching
▪ Set concurrency
▪ Set error threshold
▪ Use Express child executions
Read more about this: https://bit.ly/s3-to-postgres
Thank You! @pubudusj