Building And Automating Serverless Auto-Scaling Data Pipelines In AWS (2024-11-20: AWS User Group Manchester)

The modern data professional navigates a dynamic data landscape, handling high-velocity raw data at ever-changing volumes. In this 30-minute intermediate session, I demonstrate a fully serverless auto-scaling data pipeline using AWS services.

This session includes:

- Getting and storing API data with AWS Lambda and Amazon S3.
- Transforming the API data with AWS Glue & Amazon Athena.
- Pipeline automation and orchestration with AWS Step Functions and Amazon EventBridge.
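The first bullet covers getting API data with AWS Lambda and storing it in Amazon S3. The part of a handler that prepares the upload might look roughly like the sketch below. It rests on several assumptions: the `raw/<source>/year=YYYY/month=MM/day=DD/` key layout, the `weather` source and all helper names are hypothetical and not taken from the session's repository. The code builds a date-partitioned key, a Hive-style layout that Glue crawlers can register as partitions. It also computes the base64-encoded SHA-256 digest that S3 accepts as an upload integrity check.

```python
import base64
import hashlib
import json
from datetime import datetime, timezone


def build_raw_key(source: str, fetched_at: datetime) -> str:
    """Hypothetical Hive-style partitioned S3 key for one raw API response."""
    if fetched_at.tzinfo is None:
        raise ValueError("fetched_at must be timezone-aware")
    ts = fetched_at.astimezone(timezone.utc)
    return (
        f"raw/{source}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{source}_{ts:%Y%m%dT%H%M%SZ}.json"
    )


def sha256_checksum_b64(body: bytes) -> str:
    """Base64-encoded SHA-256 digest, the encoding S3 expects for ChecksumSHA256."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def prepare_upload(source: str, payload: dict, fetched_at: datetime) -> dict:
    """Keyword arguments for an S3 PutObject call; the bucket is supplied separately."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return {
        "Key": build_raw_key(source, fetched_at),
        "Body": body,
        "ContentType": "application/json",
        "ChecksumSHA256": sha256_checksum_b64(body),
    }


if __name__ == "__main__":
    when = datetime(2024, 11, 20, 18, 30, tzinfo=timezone.utc)
    print(prepare_upload("weather", {"temp_c": 9.5}, when)["Key"])
```

A handler could then call `boto3.client("s3").put_object(Bucket=..., **args)`. S3 rejects the upload if the body does not match the supplied checksum. The bucket name would come from your own configuration.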

Ideal for Data, DevOps, and Architecture professionals, this session offers practical insights into building efficient serverless data pipelines. Join to enhance your skills and explore the latest in Data Engineering and AWS.

Resources at https://github.com/MrDamienJones/Community-Sessions/tree/main/BuildingAndAutomatingServerlessAutoScalingDataPipelinesInAWS

Damien Jones

November 20, 2024

Transcript

  1. Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

    Damien Jones (he/him), AWS Consultant @ Steamhaus. 2024-11-20, AWS User Group Manchester. @amazonwebshark, MrDamienJones, amazonwebshark.com, https://www.flaticon.com/
  2. Damien Jones

    Consultant @ Steamhaus. Using AWS since 2019. Creator @ amazonwebshark.com. Runner; Keen Gardener; Dog Dad. He/Him. Manchester, UK. Fin Fan.
  3. The 4 Vs Of Big Data

    Characteristics of Big Data… …and events… …and API requests… …metrics …traces …logs ...
  4. Velocity

    “The speed at which something is moving in a given direction.” Streaming or Batch. Synchronous or Asynchronous. Scheduling.
  5. Veracity

    “The quality of being true or the habit of telling the truth.” External Security. Validation & Health. Internal Security.
  6. Amazon S3

    Serverless object storage. Store anything for any reason. 1000s of requests per second. Object protection & integrity checks.
  7. Amazon Athena

    Serverless interactive query service. Query & access controls. Create derived tables. Read-Only & Open Table support.
  8. AWS Glue

    Fully managed serverless ETL service. Crawlers discover data automatically. Up to 2000 concurrent ETL job runs. Data Catalog indexes data assets.
  9. AWS Step Functions

    Serverless task orchestration. Invoke over 220 AWS services / 10k API calls. Standard & Express workflows. Design workflows visually and as code.
  10. Amazon EventBridge Scheduler

    Automate recurring & one-off tasks. Invoke over 220 AWS services. Set times or fixed-rate schedules. Checks target response.
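Slide 5 places Validation & Health under Veracity. A minimal sketch of that idea checks the fields and types of each API record, then decides whether the batch as a whole looks healthy enough to store. The three-field record schema, the accepted types and the 5% rejection threshold are all hypothetical assumptions for illustration, not the session's rules.

```python
from dataclasses import dataclass, field

# Hypothetical schema: field name -> tuple of accepted Python types.
# Ints are accepted for "value" because JSON parsers return 5 rather than 5.0.
EXPECTED_FIELDS = {
    "id": (int,),
    "name": (str,),
    "value": (int, float),
}


@dataclass
class ValidationResult:
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)


def validate_records(records):
    """Split records into valid and rejected lists, noting why each was rejected."""
    result = ValidationResult()
    for record in records:
        problems = []
        for name, accepted in EXPECTED_FIELDS.items():
            if name not in record:
                problems.append(f"missing {name}")
                continue
            value = record[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, accepted):
                problems.append(f"{name} has unexpected type {type(value).__name__}")
        if problems:
            result.rejected.append({"record": record, "problems": problems})
        else:
            result.valid.append(record)
    return result


def is_batch_healthy(result, max_reject_ratio=0.05):
    """Fail empty batches and batches whose rejection ratio exceeds the threshold."""
    total = len(result.valid) + len(result.rejected)
    if total == 0:
        return False  # assumption: an empty API response signals a problem
    return len(result.rejected) / total <= max_reject_ratio
```

Rejected records keep their reasons, so they could be written to a separate quarantine prefix rather than silently dropped.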
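Slide 7 mentions that Athena can create derived tables. One way to do that is a CREATE TABLE AS SELECT (CTAS) statement. The builder below is a sketch with several assumptions: the database, table and bucket names in the test are hypothetical, and Snappy-compressed Parquet is an assumed output choice. Athena expects the `external_location` prefix to be empty, and partition columns must come last in the SELECT list.

```python
from typing import Optional, Sequence


def build_ctas_query(
    database: str,
    table: str,
    select_sql: str,
    output_location: str,
    partition_columns: Optional[Sequence[str]] = None,
) -> str:
    """Return an Athena CTAS statement that writes Snappy-compressed Parquet."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    properties = [
        "format = 'PARQUET'",
        "write_compression = 'SNAPPY'",
        f"external_location = '{output_location}'",
    ]
    if partition_columns:
        # Athena requires partition columns to be the last columns in the SELECT list.
        quoted = ", ".join(f"'{col}'" for col in partition_columns)
        properties.append(f"partitioned_by = ARRAY[{quoted}]")
    props_sql = ",\n    ".join(properties)
    return f"CREATE TABLE {database}.{table}\nWITH (\n    {props_sql}\n) AS\n{select_sql}"
```

The resulting string could be submitted with `boto3.client("athena").start_query_execution(QueryString=query, QueryExecutionContext={"Database": "curated"}, WorkGroup="primary")`, where `curated` is a hypothetical database name.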
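Slide 8 notes that Glue crawlers discover data automatically. The sketch below builds the keyword arguments for `boto3.client("glue").create_crawler(...)`. The crawler name, role ARN, database and `raw_` table prefix are hypothetical. No `Schedule` is set because this sketch assumes the orchestration workflow starts the crawler, rather than the crawler running on its own timer.

```python
def build_crawler_config(name, role_arn, database, s3_paths, table_prefix="raw_"):
    """Keyword arguments for the Glue CreateCrawler API (boto3 glue.create_crawler)."""
    if not s3_paths:
        raise ValueError("at least one S3 path is required")
    for path in s3_paths:
        if not path.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {path}")
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": p} for p in s3_paths]},
        "TablePrefix": table_prefix,
        # Update changed table schemas in the Data Catalog, but only log deletions
        # instead of removing tables.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }
```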
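Slide 9 mentions designing Step Functions workflows as code. The sketch below builds a hypothetical Amazon States Language definition with four stages:

- invoke an ingestion Lambda function;
- start a Glue crawler;
- poll the crawler every 30 seconds until it returns to `READY`;
- run an Athena query.

The state names, polling interval and ARNs are assumptions. `check_transitions` is a hypothetical helper that catches typos in `Next` and `Default` targets before deployment.

```python
import json


def build_pipeline_definition(lambda_arn, crawler_name, query, workgroup="primary"):
    """Hypothetical ASL definition: fetch -> start crawler -> poll -> Athena query."""
    return {
        "Comment": "Hypothetical API-to-Athena pipeline",
        "StartAt": "FetchApiData",
        "States": {
            "FetchApiData": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": lambda_arn, "Payload.$": "$"},
                "ResultPath": "$.fetch",
                "Next": "StartCrawler",
            },
            "StartCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "ResultPath": None,  # JSON null: discard the result, keep the input
                "Next": "WaitForCrawler",
            },
            "WaitForCrawler": {"Type": "Wait", "Seconds": 30, "Next": "GetCrawler"},
            "GetCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Parameters": {"Name": crawler_name},
                "ResultSelector": {"State.$": "$.Crawler.State"},
                "ResultPath": "$.crawler",
                "Next": "CrawlerFinished",
            },
            "CrawlerFinished": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.crawler.State", "StringEquals": "READY",
                     "Next": "RunAthenaQuery"}
                ],
                "Default": "WaitForCrawler",
            },
            "RunAthenaQuery": {
                "Type": "Task",
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {"QueryString": query, "WorkGroup": workgroup},
                "End": True,
            },
        },
    }


def check_transitions(definition):
    """Return Next/Default targets that do not name an existing state."""
    states = definition["States"]
    missing = []
    for state in states.values():
        targets = [state.get("Next"), state.get("Default")]
        targets += [choice["Next"] for choice in state.get("Choices", [])]
        missing += [t for t in targets if t is not None and t not in states]
    return missing
```

The definition could be deployed with `boto3.client("stepfunctions").create_state_machine(name=..., definition=json.dumps(d), roleArn=..., type="STANDARD")`. A Standard workflow is used because Express workflows do not support the `.sync` integration pattern that the Athena task relies on. A CTAS query fails if its table already exists, so repeated runs would need either a drop step or an `INSERT INTO` query.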
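Slide 10 describes EventBridge Scheduler running set-time or fixed-rate schedules. The sketch below builds two kinds of expression: `rate(...)` for recurring runs and `at(...)` for one-off runs. It also builds the keyword arguments for `boto3.client("scheduler").create_schedule(...)`, targeting a Step Functions state machine. The schedule name, ARNs, the Europe/London time zone and the retry values are assumptions. The `RetryPolicy` loosely mirrors the slide's point about checking the target response.

```python
from datetime import datetime

_UNITS = {"minute", "hour", "day"}


def rate_expression(value: int, unit: str) -> str:
    """Fixed-rate expression; the unit is singular for 1 and plural otherwise."""
    if unit not in _UNITS:
        raise ValueError(f"unit must be one of {sorted(_UNITS)}")
    if value < 1:
        raise ValueError("value must be a positive integer")
    return f"rate({value} {unit if value == 1 else unit + 's'})"


def at_expression(when: datetime) -> str:
    """One-off expression; the time zone is set separately on the schedule."""
    return f"at({when.strftime('%Y-%m-%dT%H:%M:%S')})"


def build_schedule(name, expression, state_machine_arn, role_arn,
                   timezone="Europe/London"):
    """Keyword arguments for EventBridge Scheduler CreateSchedule."""
    return {
        "Name": name,
        "ScheduleExpression": expression,
        "ScheduleExpressionTimezone": timezone,
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            # Templated target: Scheduler starts an execution of this state machine.
            "Arn": state_machine_arn,
            "RoleArn": role_arn,
            "Input": "{}",
            # Retry failed target calls, but drop events older than one hour.
            "RetryPolicy": {"MaximumRetryAttempts": 3, "MaximumEventAgeInSeconds": 3600},
        },
    }
```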