AWS Summit Amsterdam 2023 - SVS204

Large scale parallel data processing with AWS Step Functions Distributed Maps

Pubudu

June 18, 2023

Transcript

  1. About Me
    Pubudu Jayawardana
    @pubudusj
    From Amsterdam, the Netherlands
    Senior Backend Developer at starred.com
    AWS Community Builder (Serverless)
    AWS Certified - SA Pro
    https://medium.com/@pubudusj
    https://pubudu.dev
    https://dev.to/pubudusj
  2. Map State
    ▪ To iterate over an array
    ▪ Limitations
      • 40 parallel iterations at a time
      • Max payload size - 256KB
      • Execution history - 25,000 events
  3. Distributed Map
    ▪ Totally separated child executions
      • 25,000 events each
      • 10,000 executions at a time
      • S3 as a source
    ▪ Result output to S3
    ▪ Only applicable for Standard flows
  4. Source
    ▪ Source types:
      • S3 object list
      • JSON file in S3
      • CSV file in S3
      • S3 manifest file
    ▪ Limit number of items
    ▪ ItemSelector
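
The source options on the Source slide map onto the `ItemReader` and `ItemSelector` fields of a Distributed Map state. Below is a minimal sketch, assuming a CSV file with a header row; the bucket name, object key, the `table_name` column, and the helper `build_csv_source` are hypothetical and only for illustration.

```python
import json


def build_csv_source(bucket: str, key: str, max_items: int) -> dict:
    """Return the source-related fields of a Distributed Map state.

    Assumption: the CSV file has a header row with a 'table_name' column.
    """
    return {
        "ItemReader": {
            # Read a single object from S3 and treat its rows as items.
            "Resource": "arn:aws:states:::s3:getObject",
            "ReaderConfig": {
                "InputType": "CSV",
                "CSVHeaderLocation": "FIRST_ROW",
                # "Limit number of items": stop reading after this many rows.
                "MaxItems": max_items,
            },
            "Parameters": {"Bucket": bucket, "Key": key},
        },
        # Reshape each item before it reaches the child execution.
        "ItemSelector": {
            "table.$": "$$.Map.Item.Value.table_name",
        },
    }


# Hypothetical bucket and key, for illustration only.
source_fields = build_csv_source("example-bucket", "tables.csv", 500)
print(json.dumps(source_fields, indent=2))
```

For a JSON file the `InputType` would be `JSON`, and for an S3 object list the resource would be `arn:aws:states:::s3:listObjectsV2` with a `Bucket` and `Prefix`.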
  5. Item Batching
    ▪ Batching based on:
      • Number of items
      • Size
    ▪ Modify input with Batch input
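
To make the batching idea concrete, here is a small local simulation of count-based batching. It is a sketch, not the service's implementation: the function `simulate_batches` and the sample table names are hypothetical, and the real service may split items differently. The `Items` / `BatchInput` payload shape mirrors what a child execution receives when `ItemBatcher` is configured with `MaxItemsPerBatch` and `BatchInput`.

```python
from typing import Any, Optional


def simulate_batches(
    items: list,
    max_items_per_batch: int,
    batch_input: Optional[dict] = None,
) -> list:
    """Split items into batches shaped like child execution inputs."""
    if max_items_per_batch < 1:
        raise ValueError("max_items_per_batch must be at least 1")
    batches = []
    for start in range(0, len(items), max_items_per_batch):
        payload: dict[str, Any] = {"Items": items[start:start + max_items_per_batch]}
        if batch_input is not None:
            # The same static input is attached to every batch.
            payload["BatchInput"] = batch_input
        batches.append(payload)
    return batches


# Hypothetical table names, for illustration only.
tables = ["surveys", "responses", "benchmarks", "filters", "comparisons"]
batches = simulate_batches(tables, 2, {"run_date": "2023-06-18"})
for batch in batches:
    print(batch)
```

Batching by size works the same way in principle, with `MaxInputBytesPerBatch` as the limit instead of an item count.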
  6. Runtime Settings
    ▪ Concurrency limit
    ▪ Child execution types:
      • Standard
      • Express
    ▪ Error threshold:
      • Percentage
      • Number of items
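
The runtime settings above correspond to a handful of fields on the Map state. The following is a minimal sketch under stated assumptions: the helper `build_distributed_map`, the state name `LoadTable`, and the chosen numbers are hypothetical, and the child workflow is reduced to a single `Pass` state to keep the example self-contained.

```python
import json


def build_distributed_map(max_concurrency: int, tolerated_failure_pct: int) -> dict:
    """Return a Distributed Map state with runtime settings applied."""
    return {
        "Type": "Map",
        # Concurrency limit: how many child executions run at once.
        "MaxConcurrency": max_concurrency,
        # Error threshold as a percentage of failed items.
        # "ToleratedFailureCount" is the item-count alternative.
        "ToleratedFailurePercentage": tolerated_failure_pct,
        "ItemProcessor": {
            "ProcessorConfig": {
                "Mode": "DISTRIBUTED",
                # Child execution type: "STANDARD" or "EXPRESS".
                "ExecutionType": "EXPRESS",
            },
            "StartAt": "LoadTable",
            # Placeholder child workflow (hypothetical state name).
            "States": {"LoadTable": {"Type": "Pass", "End": True}},
        },
        "End": True,
    }


map_state = build_distributed_map(max_concurrency=100, tolerated_failure_pct=5)
print(json.dumps(map_state, indent=2))
```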
  7. Export Result
    ▪ S3 location
    ▪ Logs
      • manifest.json
      • SUCCEEDED_n.json
      • FAILED_n.json
      • PENDING_n.json
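
As a rough illustration of the export step, the sketch below shows a `ResultWriter` configuration and a helper that lists result files per status from a parsed `manifest.json`. The bucket, prefix, run id, file sizes, and the helper `result_keys` are hypothetical; the sample manifest is hand-written for illustration and its field names (`DestinationBucket`, `ResultFiles`) reflect my understanding of the manifest layout, so check them against a real export before relying on them.

```python
import json

# ResultWriter field of a Distributed Map state (hypothetical bucket and prefix).
RESULT_WRITER = {
    "Resource": "arn:aws:states:::s3:putObject",
    "Parameters": {"Bucket": "example-results-bucket", "Prefix": "map-runs"},
}

# Hand-written sample, not real output; keys and sizes are made up.
SAMPLE_MANIFEST = json.loads("""
{
  "DestinationBucket": "example-results-bucket",
  "ResultFiles": {
    "SUCCEEDED": [{"Key": "map-runs/example-run-id/SUCCEEDED_0.json", "Size": 2048}],
    "FAILED": [{"Key": "map-runs/example-run-id/FAILED_0.json", "Size": 512}],
    "PENDING": []
  }
}
""")


def result_keys(manifest: dict, status: str) -> list:
    """Return the S3 keys of result files for one status, or an empty list."""
    entries = manifest.get("ResultFiles", {}).get(status, [])
    return [entry["Key"] for entry in entries]


failed_keys = result_keys(SAMPLE_MANIFEST, "FAILED")
print(failed_keys)
```

Reading the `FAILED` files first is one way to find the items that need a retry.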
  8. Process
    ▪ SaaS application to measure candidate experience
    ▪ Send surveys
    ▪ Record the feedback
    ▪ Visualize in a dashboard (benchmark, filter, comparison)
    ▪ Transform / Enrich data
  9. Data Load Step
    ▪ Less visibility
    ▪ Cannot retry single table load
    ▪ Takes avg 20 minutes
    ▪ EC2 cost
  10. Benefits
    ▪ Reduced time to avg 5 minutes
    ▪ Load data in parallel
    ▪ Better insights
    ▪ Retry individual table data load
    ▪ Cost effective
  11. Tips / Lessons Learned
    ▪ Use batching
    ▪ Set concurrency
    ▪ Set error threshold
    ▪ Use express child executions