AWS Summit Amsterdam 2023 - SVS204
Large scale parallel data processing with AWS Step Functions Distributed Maps
Pubudu
June 18, 2023
Transcript
Large scale parallel data processing with Step Functions Distributed Map
SVS204
About Me
Pubudu Jayawardana (@pubudusj)
From Amsterdam, the Netherlands
Senior Backend Developer at starred.com
AWS Community Builder (Serverless)
AWS Certified - SA Pro
https://medium.com/@pubudusj
https://pubudu.dev
https://dev.to/pubudusj
AWS Community Builders
Step Functions Distributed Map
Map State
▪ To iterate over an array
▪ Limitations:
  • 40 parallel iterations at a time
  • Max payload size: 256 KB
  • Execution history: 25,000 events
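As a local analogy only (plain Python, not Step Functions), the inline Map behavior on this slide can be sketched with a bounded thread pool: every element of the array is processed, at most 40 at a time, and the results come back in input order, as a Map state's output array does. The function name `inline_map` and the hard cap constant are assumptions chosen to mirror the slide.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Assumed cap mirroring the slide's "40 parallel iterations at a time".
INLINE_MAP_MAX_CONCURRENCY = 40


def inline_map(items: Iterable[T], fn: Callable[[T], R],
               max_concurrency: int = INLINE_MAP_MAX_CONCURRENCY) -> List[R]:
    """Apply fn to every item with at most max_concurrency calls in flight.

    Results keep the order of the input array; an exception raised by fn
    propagates to the caller, failing the whole "map".
    """
    workers = max(1, min(max_concurrency, INLINE_MAP_MAX_CONCURRENCY))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


if __name__ == "__main__":
    print(inline_map([1, 2, 3], lambda x: x * 10))
```

The hard cap is the point of the analogy: no matter how large the array, the inline Map never runs more than 40 iterations at once, and everything shares one execution history, which is what the Distributed Map lifts.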
Distributed Map
▪ Fully separate child executions:
  • 25,000 events each
  • 10,000 executions at a time
▪ S3 as a source
▪ Result output to S3
▪ Only available in Standard workflows
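A minimal sketch of what switches a Map state into distributed mode in Amazon States Language: `ItemProcessor.ProcessorConfig` with `"Mode": "DISTRIBUTED"` and an `ExecutionType` of `STANDARD` or `EXPRESS` for the child executions. The helper `distributed_map_state` and the pass-through child state `ProcessItem` are hypothetical names used for illustration.

```python
import json


def distributed_map_state(child_states: dict, start_at: str,
                          execution_type: str = "STANDARD") -> dict:
    """Return a minimal ASL Map state that runs in DISTRIBUTED mode.

    Each item is handled by a separate child workflow execution of the
    given type; the parent workflow itself must be a Standard workflow.
    """
    if execution_type not in ("STANDARD", "EXPRESS"):
        raise ValueError("execution_type must be STANDARD or EXPRESS")
    return {
        "Type": "Map",
        "ItemProcessor": {
            "ProcessorConfig": {
                "Mode": "DISTRIBUTED",
                "ExecutionType": execution_type,
            },
            "StartAt": start_at,
            "States": child_states,
        },
        "End": True,
    }


# Hypothetical single-step child workflow that passes its input through.
state = distributed_map_state({"ProcessItem": {"Type": "Pass", "End": True}}, "ProcessItem")
print(json.dumps(state, indent=2))
```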
Distributed Map
Source
▪ Source types:
  • S3 object list
  • JSON file in S3
  • CSV file in S3
  • S3 inventory manifest file
▪ Limit the number of items (MaxItems)
▪ ItemSelector (reshape each item's input)
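The source options above map onto the Map state's `ItemReader` field. The sketch below builds two readers as Python dicts: one that reads rows of a CSV object (`arn:aws:states:::s3:getObject` with `InputType: CSV`), optionally capped with `MaxItems`, and one that iterates over the objects under a prefix (`arn:aws:states:::s3:listObjectsV2`). The bucket, key, and selector field names are hypothetical; the `ItemSelector` uses the `$$.Map.Item.Value` and `$$.Map.Item.Index` context fields to reshape each item before it reaches a child execution.

```python
import json


def csv_item_reader(bucket: str, key: str, max_items: int = 0) -> dict:
    """ItemReader that reads the rows of a CSV object in S3 (header in first row)."""
    reader_config = {"InputType": "CSV", "CSVHeaderLocation": "FIRST_ROW"}
    if max_items > 0:
        # Only the first max_items rows are handed to child executions.
        reader_config["MaxItems"] = max_items
    return {
        "Resource": "arn:aws:states:::s3:getObject",
        "ReaderConfig": reader_config,
        "Parameters": {"Bucket": bucket, "Key": key},
    }


def object_list_item_reader(bucket: str, prefix: str) -> dict:
    """ItemReader that iterates over the objects stored under an S3 prefix."""
    return {
        "Resource": "arn:aws:states:::s3:listObjectsV2",
        "Parameters": {"Bucket": bucket, "Prefix": prefix},
    }


# ItemSelector: $$.Map.Item.Value is the current item, $$.Map.Item.Index its position.
# For an S3 object list, each item carries the object's Key.
item_selector = {
    "index.$": "$$.Map.Item.Index",
    "s3_key.$": "$$.Map.Item.Value.Key",
}

print(json.dumps(csv_item_reader("example-etl-bucket", "exports/tables.csv", 100), indent=2))
```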
Item Batching
▪ Batching based on:
  • Number of items (MaxItemsPerBatch)
  • Size (MaxInputBytesPerBatch)
▪ Modify input with BatchInput
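To make the two batching limits concrete, here is a local simulation (not the service's actual algorithm) that groups items so that no batch exceeds a maximum item count or a maximum JSON size, and wraps each batch in the `{"BatchInput": ..., "Items": [...]}` shape a batched child execution receives. Measuring size as compact UTF-8 JSON is an assumption; the service's exact byte accounting may differ. `make_batches` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List, Optional


def make_batches(items: List[Any], max_items: int, max_bytes: int,
                 batch_input: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Group items into batches limited by item count and approximate JSON size."""
    batches: List[List[Any]] = []
    current: List[Any] = []
    current_bytes = 0
    for item in items:
        # Approximate the item's size by its compact UTF-8 JSON encoding (assumption).
        size = len(json.dumps(item, separators=(",", ":")).encode("utf-8"))
        if size > max_bytes:
            raise ValueError("a single item is larger than max_bytes")
        # Close the current batch if adding this item would break either limit.
        if current and (len(current) >= max_items or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(item)
        current_bytes += size
    if current:
        batches.append(current)
    # Every batch gets the same BatchInput, mirroring the ItemBatcher option.
    return [{"BatchInput": dict(batch_input or {}), "Items": batch} for batch in batches]


print(make_batches(list(range(5)), max_items=2, max_bytes=1000))
```

Batching trades fewer child executions (and state transitions) for larger units of work; the child then has to loop over `Items` itself.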
Runtime Settings
▪ Concurrency limit (MaxConcurrency)
▪ Child execution types:
  • Standard
  • Express
▪ Error threshold:
  • Percentage (ToleratedFailurePercentage)
  • Number of items (ToleratedFailureCount)
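The error threshold decides when a whole Map Run is marked as failed. The sketch below expresses that decision as a function, assuming that the run fails once the failed items strictly exceed either configured threshold, that the percentage is failed items over total items, and that with no threshold set a single failure fails the run. `map_run_fails` is a hypothetical name; check the Step Functions documentation for the authoritative semantics.

```python
from typing import Optional


def map_run_fails(total_items: int, failed_items: int,
                  tolerated_count: Optional[int] = None,
                  tolerated_percentage: Optional[float] = None) -> bool:
    """Return True if failed_items exceeds either configured threshold (assumed rules)."""
    # Assumption: with no threshold configured, any single failure fails the run.
    if tolerated_count is None and tolerated_percentage is None:
        return failed_items > 0
    # ToleratedFailureCount: fail once the count is strictly exceeded.
    if tolerated_count is not None and failed_items > tolerated_count:
        return True
    # ToleratedFailurePercentage, compared without float division:
    # failed / total * 100 > pct  <=>  failed * 100 > pct * total
    if tolerated_percentage is not None and total_items > 0:
        return failed_items * 100 > tolerated_percentage * total_items
    return False


print(map_run_fails(1000, 30, tolerated_percentage=5))
```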
Export Result
▪ S3 location
▪ Logs:
  • manifest.json
  • SUCCEEDED_n.json
  • FAILED_n.json
  • PENDING_n.json
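Results are exported through the Map state's `ResultWriter` (`arn:aws:states:::s3:putObject` with a bucket and prefix). The sketch builds that fragment and reads a `manifest.json` to list the result files for one status, for example the `FAILED_n.json` files to inspect before retrying. The manifest layout used here (a `ResultFiles` object keyed by status, each entry holding a `Key`) is an assumption based on the file names above, and the bucket, prefix, and keys are illustrative.

```python
from typing import Any, Dict, List


def result_writer(bucket: str, prefix: str) -> Dict[str, Any]:
    """ResultWriter that exports Map Run results as JSON files under bucket/prefix."""
    return {
        "Resource": "arn:aws:states:::s3:putObject",
        "Parameters": {"Bucket": bucket, "Prefix": prefix},
    }


def result_keys(manifest: Dict[str, Any], status: str) -> List[str]:
    """Return the S3 keys manifest.json lists for SUCCEEDED, FAILED or PENDING."""
    return [entry["Key"] for entry in manifest.get("ResultFiles", {}).get(status, [])]


# Hypothetical manifest.json contents (assumed layout, illustrative values).
sample_manifest = {
    "DestinationBucket": "example-results-bucket",
    "ResultFiles": {
        "SUCCEEDED": [{"Key": "map-results/run-1/SUCCEEDED_0.json", "Size": 2048}],
        "FAILED": [{"Key": "map-results/run-1/FAILED_0.json", "Size": 512}],
        "PENDING": [],
    },
}

print(result_keys(sample_manifest, "FAILED"))
```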
Execution Details - Parent Event Log
Execution Details - Map Run
Execution Details - Single Child Execution
Process
Process
▪ SaaS application to measure candidate experience
▪ Send surveys
▪ Record the feedback
▪ Visualize in a dashboard (benchmark, filter, comparison)
▪ Transform / enrich data
Hourly ETL
Source from MySQL → Transform → Save to S3 → Load to Postgres
Hourly ETL
▪ Amazon Managed Workflows for Apache Airflow (MWAA)
▪ Amazon EMR
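The "Save to S3" step of the hourly pipeline can be pictured as writing one object per table, which is the kind of layout an S3 object list source can later hand out one table per child execution. The sketch below is a hypothetical illustration of that staging step using only the standard library: it serializes each transformed table to CSV and returns the payloads keyed by an assumed `etl/<YYYY-MM-DD-HH>/<table>.csv` scheme. The key scheme, prefix, and function name are not from the talk.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Dict, List


def stage_tables(tables: Dict[str, List[dict]], run_at: datetime,
                 prefix: str = "etl") -> Dict[str, str]:
    """Serialize each transformed table to CSV under an hourly, S3-style key.

    run_at should be timezone-aware; it is normalized to UTC for the key.
    """
    hour = run_at.astimezone(timezone.utc).strftime("%Y-%m-%d-%H")
    staged: Dict[str, str] = {}
    for table, rows in tables.items():
        buf = io.StringIO()
        if rows:
            # Header comes from the first row's columns.
            writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        staged[f"{prefix}/{hour}/{table}.csv"] = buf.getvalue()
    return staged


print(stage_tables({"surveys": [{"id": 1, "score": 9}]},
                   datetime(2023, 6, 1, 14, tzinfo=timezone.utc)))
```

Keeping one object per table is what makes a per-table retry possible later: a failed load maps to exactly one item.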
Problem
Data Load Step
▪ Less visibility
▪ Cannot retry a single table load
▪ Takes 20 minutes on average
▪ EC2 cost
Solution
Demo
Benefits
▪ Reduced time to 5 minutes on average
▪ Load data in parallel
▪ Better insights
▪ Retry an individual table's data load
▪ Cost effective
Tips / Lessons Learned
▪ Use batching
▪ Set concurrency
▪ Set an error threshold
▪ Use Express child executions
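A sketch of a state machine definition that applies all four tips at once: an S3 object list source, an `ItemBatcher`, a `MaxConcurrency` cap, a `ToleratedFailurePercentage`, and Express child executions that invoke a loader Lambda function, with results exported to S3. The bucket name, Lambda ARN, the `$.prefix` input field, and every numeric limit are placeholders, not values from the talk. Because batching is on, the Lambda receives `{"BatchInput": ..., "Items": [...]}` and must loop over the items; Express child executions are limited to five minutes, so each batch has to finish within that window.

```python
import json

# Hypothetical names: replace with your own bucket and loader function.
BUCKET = "example-etl-bucket"
LOADER_FUNCTION_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:load-to-postgres"

definition = {
    "Comment": "Load staged table files into Postgres with a Distributed Map (sketch)",
    "StartAt": "LoadTables",
    "States": {
        "LoadTables": {
            "Type": "Map",
            # Source: every object under the prefix passed in the execution input.
            "ItemReader": {
                "Resource": "arn:aws:states:::s3:listObjectsV2",
                "Parameters": {"Bucket": BUCKET, "Prefix.$": "$.prefix"},
            },
            "ItemBatcher": {"MaxItemsPerBatch": 5},  # tip: use batching
            "MaxConcurrency": 10,                      # tip: set concurrency
            "ToleratedFailurePercentage": 10,          # tip: set an error threshold
            "ItemProcessor": {
                # tip: Express child executions
                "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"},
                "StartAt": "LoadBatch",
                "States": {
                    "LoadBatch": {
                        "Type": "Task",
                        "Resource": "arn:aws:states:::lambda:invoke",
                        "Parameters": {"FunctionName": LOADER_FUNCTION_ARN, "Payload.$": "$"},
                        "End": True,
                    }
                },
            },
            # Export SUCCEEDED / FAILED / PENDING result files to S3.
            "ResultWriter": {
                "Resource": "arn:aws:states:::s3:putObject",
                "Parameters": {"Bucket": BUCKET, "Prefix": "map-results"},
            },
            "End": True,
        }
    },
}

print(json.dumps(definition, indent=2))
```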
Read more about this: https://bit.ly/s3-to-postgres
Thank You! @pubudusj