Slide 1

Slide 1 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS Damien Jones (he/him) Data Engineer 2024-04-24 AWS Summit London @amazonwebshark MrDamienJones amazonwebshark.com [email protected] MrDamienJones https://www.flaticon.com/

Slide 2

Slide 2 text

Here For Data & Analytics?

Slide 3

Slide 3 text

Here For Development & Operations?

Slide 4

Slide 4 text

Here For Free Stuff?

Slide 5

Slide 5 text

Damien Jones Data Engineer Using AWS since 2019 Creator @ amazonwebshark.com Runner; Keen Gardener; Dog Dad He/Him Manchester UK Fin Fan

Slide 6

Slide 6 text

Agenda Problem Definition Solution Architecture Demo Summary & Questions github.com/MrDamienJones/Community-Sessions

Slide 7

Slide 7 text

The 4 Vs Of Big Data Characteristics of Big Data… …and events… …and API requests… …metrics …traces …logs ...

Slide 8

Slide 8 text

Variety “The state of being diverse or varied.”

Slide 9

Slide 9 text

Variety “The state of being diverse or varied.” Structure Purpose Sensitivity

Slide 10

Slide 10 text

Velocity “The speed at which something is moving in a given direction.”

Slide 11

Slide 11 text

Velocity “The speed at which something is moving in a given direction.” Streaming or Batch Synchronous or Asynchronous Scheduling

Slide 12

Slide 12 text

Veracity “The quality of being true or the habit of telling the truth.”

Slide 13

Slide 13 text

Veracity “The quality of being true or the habit of telling the truth.” External Security Validation & Health Internal Security

Slide 14

Slide 14 text

Volume “The amount of space occupied.”

Slide 15

Slide 15 text

Volume “The amount of space occupied.” Size or Amount Access Patterns Backups

Slide 16

Slide 16 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

Slide 17

Slide 17 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

Slide 18

Slide 18 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS AWS Lambda Amazon S3

Slide 19

Slide 19 text

AWS Lambda Serverless compute service Supports multiple languages Auto-scales on demand 1,000 concurrent executions by default (per-Region quota, can be raised)

Slide 20

Slide 20 text

Amazon S3 Serverless object storage Store anything for any reason 1000s of requests per second Object protection & integrity checks
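
To make the Lambda and S3 pairing a little more concrete, here is a minimal sketch of a Lambda handler that reads an S3 event notification. It is not taken from the talk's demo: the handler body and the returned summary are assumptions for illustration. Only the event layout (Records, s3.bucket.name, s3.object.key) follows the standard S3 notification format.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Summarise the objects named in an S3 event notification.

    Illustrative only: a real pipeline step would read or transform
    each object rather than just listing it.
    """
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        objects.append(
            {
                "bucket": s3["bucket"]["name"],
                # Object keys arrive URL-encoded, so a space shows up as '+'.
                "key": unquote_plus(s3["object"]["key"]),
            }
        )
    return {"object_count": len(objects), "objects": objects}
```

Because the handler is a plain function, it can be exercised locally with a hand-written event dictionary before it is deployed.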

Slide 21

Slide 21 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

Slide 22

Slide 22 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS AWS Glue Amazon Athena

Slide 23

Slide 23 text

AWS Glue Fully managed serverless ETL service Crawlers discover data automatically Up to 2000 concurrent ETL job runs ML-backed data quality checks
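
As a rough illustration of the crawler point, the sketch below builds the request for a crawler that scans one S3 path. The function name and every value passed to it are hypothetical. The dictionary keys (Name, Role, DatabaseName, Targets, Schedule) are the parameter names used by the Glue CreateCrawler API, so the result could be passed to boto3 as `glue.create_crawler(**request)`. No AWS call is made here.

```python
def crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Build keyword arguments for a Glue crawler over a single S3 path.

    Hypothetical helper: it only assembles the request, it does not
    create anything in AWS.
    """
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        # Glue crawler schedules use cron syntax, e.g. "cron(15 12 * * ? *)".
        request["Schedule"] = schedule
    return request
```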

Slide 24

Slide 24 text

Amazon Athena Serverless interactive query service Analyse Amazon S3 data with standard SQL Source data is read-only Create derived tables
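
"Create derived tables" most likely refers to Athena's CREATE TABLE AS SELECT (CTAS) statements, which write query results to a new S3 location and leave the source data untouched. The sketch below assembles one such statement. The helper name, table names and bucket are assumptions for illustration, and the name check is deliberately strict rather than a complete SQL identifier parser.

```python
import re

# Accepts "table" or "database.table" made of letters, digits and underscores.
_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_ctas(target_table, source_table, external_location, file_format="PARQUET"):
    """Return an Athena CTAS statement that copies source_table into target_table."""
    for name in (target_table, source_table):
        if not _NAME.match(name):
            raise ValueError(f"unexpected table name: {name!r}")
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = '{file_format}', external_location = '{external_location}')\n"
        f"AS SELECT * FROM {source_table}"
    )
```

For example, `build_ctas("analytics.daily_summary", "raw.events", "s3://example-bucket/derived/daily_summary/")` yields a statement that could be submitted through the Athena console or API.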

Slide 25

Slide 25 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

Slide 26

Slide 26 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS AWS Step Functions Amazon EventBridge Scheduler

Slide 27

Slide 27 text

Amazon EventBridge Scheduler Automate recurring & one-off tasks Invoke over 220 AWS services Set times or fixed-rate schedules Checks target response
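
The "set times or fixed-rate schedules" point maps onto EventBridge Scheduler's schedule expressions: `at(...)` for a one-off time and `rate(...)` for a recurring interval. The helpers below are a small sketch of how those strings are formed. The function names are made up for this example, and the unit list assumes the plural forms given in the Scheduler documentation.

```python
from datetime import datetime

# Units assumed from the EventBridge Scheduler rate expression documentation.
VALID_RATE_UNITS = {"minutes", "hours", "days"}


def rate_expression(value, unit):
    """Return a fixed-rate schedule expression such as 'rate(15 minutes)'."""
    if value < 1 or unit not in VALID_RATE_UNITS:
        raise ValueError(f"unsupported rate: {value} {unit}")
    return f"rate({value} {unit})"


def at_expression(when):
    """Return a one-off schedule expression for the given datetime."""
    return f"at({when.strftime('%Y-%m-%dT%H:%M:%S')})"
```

Either string would be supplied as the schedule expression when a schedule is created, alongside a target such as a state machine.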

Slide 28

Slide 28 text

AWS Step Functions Serverless task orchestration Invoke over 220 AWS services Design workflows visually and as code Standard & Express workflows
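
To show what "workflows as code" can look like, here is a minimal sketch of an Amazon States Language definition built as a Python dictionary. The two-step shape (a Lambda task followed by a Glue job) and the state names are assumptions for illustration and are not the demo's actual workflow. The Resource values are the standard Step Functions service integration ARNs for Lambda invoke and Glue StartJobRun.

```python
import json


def build_definition(function_name, glue_job_name):
    """Return a two-state workflow: invoke a Lambda function, then run a Glue job."""
    return {
        "Comment": "Illustrative pipeline, not the workflow from the demo",
        "StartAt": "IngestData",
        "States": {
            "IngestData": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
                "Next": "TransformData",
            },
            "TransformData": {
                "Type": "Task",
                # The .sync suffix makes the state wait for the job to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "End": True,
            },
        },
    }


def dangling_targets(definition):
    """List StartAt/Next targets that do not name a defined state."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    targets += [state["Next"] for state in states.values() if "Next" in state]
    return [target for target in targets if target not in states]


definition_json = json.dumps(build_definition("example-ingest", "example-transform"), indent=2)
```

The JSON string is what would be supplied as the state machine definition. The small `dangling_targets` check is only a local sanity test and does not replace validation by the service.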

Slide 29

Slide 29 text

Demo

Slide 30

Slide 30 text

Build an AWS CodeBuild Project Description Build a CodeBuild project and send a notification based on the test results. Documentation Link https://docs.aws.amazon.com/step-functions/latest/dg/sample-project-codebuild.html Services CodeBuild, SNS

Slide 31

Slide 31 text

Tune a machine learning model Description Tune hyperparameters of a machine learning model and batch transform a test dataset. Documentation Link https://docs.aws.amazon.com/step-functions/latest/dg/sample-hyper-tuning.html Services Lambda, S3, SageMaker

Slide 32

Slide 32 text

Summary Problem Definition Solution Architecture Demo Summary & Questions github.com/MrDamienJones/Community-Sessions

Slide 33

Slide 33 text

Thanks! github.com/MrDamienJones/Community-Sessions @amazonwebshark MrDamienJones amazonwebshark.com [email protected] MrDamienJones