Slide 1

Slide 1 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS
Damien Jones (he/him), AWS Consultant @ Steamhaus
2024-09-26, AWS Community Summit Manchester
@amazonwebshark, MrDamienJones, amazonwebshark.com
https://www.flaticon.com/

Slide 2

Slide 2 text

Here For Data & Analytics?

Slide 3

Slide 3 text

Here For Development & Operations?

Slide 4

Slide 4 text

Here For The Free Stuff?

Slide 5

Slide 5 text

Damien Jones (He/Him)
Consultant @ Steamhaus
Using AWS since 2019
Creator @ amazonwebshark.com
Runner; Keen Gardener; Dog Dad
Manchester UK
Fin Fan

Slide 6

Slide 6 text

Agenda Problem Definition Solution Architecture Demos Summary & Questions github.com/MrDamienJones/Community-Sessions

Slide 7

Slide 7 text

The 4 Vs Of Big Data Characteristics of Big Data… …and events… …and API requests… …metrics …traces …logs ...

Slide 8

Slide 8 text

Variety “The state of being diverse or varied.”

Slide 9

Slide 9 text

Variety “The state of being diverse or varied.”
Structure
Intent
Sensitivity

Slide 10

Slide 10 text

Velocity “The speed at which something is moving in a given direction.”

Slide 11

Slide 11 text

Velocity “The speed at which something is moving in a given direction.”
Streaming or Batch
Synchronous or Asynchronous
Scheduling

Slide 12

Slide 12 text

Veracity “The quality of being true or the habit of telling the truth.”

Slide 13

Slide 13 text

Veracity “The quality of being true or the habit of telling the truth.”
External Security
Validation & Health
Internal Security

Slide 14

Slide 14 text

Volume “The amount of space occupied.”

Slide 15

Slide 15 text

Volume “The amount of space occupied.”
Size or Amount
Access Patterns
Backups

Slide 16

Slide 16 text

Value “The importance, worth or usefulness.”

Slide 17

Slide 17 text

Value “The importance, worth or usefulness.”
Return On Investment
Reconciliation
Liquidity

Slide 18

Slide 18 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

Slide 19

Slide 19 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

Slide 20

Slide 20 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS AWS Lambda Amazon S3

Slide 21

Slide 21 text

AWS Lambda
Serverless compute service
Supports multiple languages
Auto-scales on demand
Up to 1,000 concurrent executions by default (a per-Region quota that can be raised)

Slide 22

Slide 22 text

Amazon S3
Serverless object storage
Store anything for any reason
1000s of requests per second
Object protection & integrity checks

Slide 23

Slide 23 text

Amazon S3
Auditing with Inventories
Map value with Lifecycles
Monetise datasets with Data Exchange

Slide 24

Slide 24 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

Slide 25

Slide 25 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS Amazon Athena AWS Glue

Slide 26

Slide 26 text

Amazon Athena
Serverless interactive query service
Query & access controls
Read-Only & Open Table support
Create derived tables
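
As a rough illustration of "Create derived tables", Athena supports CREATE TABLE AS SELECT (CTAS). The sketch below only builds the SQL text. The function name, table names and S3 location are hypothetical and not taken from the talk, and the inputs are not sanitised, so this is a sketch for trusted values only.

```python
def build_ctas_query(new_table, source_table, external_location, where_clause="1 = 1"):
    """Return an Athena CTAS statement that writes a Parquet-backed derived table.

    All arguments are illustrative placeholders; nothing here is validated.
    """
    return (
        f"CREATE TABLE {new_table}\n"
        f"WITH (format = 'PARQUET', external_location = '{external_location}')\n"
        f"AS SELECT * FROM {source_table}\n"
        f"WHERE {where_clause}"
    )


# Hypothetical names, used only to show the shape of the statement.
ctas_sql = build_ctas_query(
    new_table="sales_recent",
    source_table="sales_raw",
    external_location="s3://example-bucket/derived/sales_recent/",
    where_clause="year = '2024'",
)
print(ctas_sql)
```

The resulting string could be passed to Athena as a query; how it is submitted (console, SDK or a Step Functions task) is left open here.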

Slide 27

Slide 27 text

AWS Glue
Fully managed serverless ETL service
Crawlers discover data automatically
Data Catalog indexes data assets
Up to 2000 concurrent ETL job runs

Slide 28

Slide 28 text

AWS Glue
Add value with ETL jobs
Add efficiencies with ETL jobs
Prevent downtime with Data Quality checks
Prove value with Data Quality scores
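
Glue Data Quality checks are written in Data Quality Definition Language (DQDL). The sketch below assembles a small ruleset string from column lists. The helper name and the column names are assumptions for illustration, not the ruleset used in the talk's demo.

```python
def build_dqdl_ruleset(required_columns, unique_columns, min_row_count=0):
    """Return a DQDL ruleset string using the RowCount, IsComplete and IsUnique rule types."""
    rules = [f"RowCount > {min_row_count}"]
    rules += [f'IsComplete "{col}"' for col in required_columns]
    rules += [f'IsUnique "{col}"' for col in unique_columns]
    body = ",\n    ".join(rules)
    return f"Rules = [\n    {body}\n]"


# Hypothetical columns: every row needs an id and a timestamp, and ids must not repeat.
ruleset = build_dqdl_ruleset(
    required_columns=["id", "created_at"],
    unique_columns=["id"],
)
print(ruleset)
```

Each rule passes or fails on evaluation, which is what feeds the Data Quality score mentioned above.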

Slide 29

Slide 29 text

AWS Glue Data Quality Demo Summary Data Quality Chart Recent Run Run History

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

Slide 32

Slide 32 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS Amazon EventBridge Scheduler AWS Step Functions

Slide 33

Slide 33 text

AWS Step Functions
Serverless task orchestration
Invoke over 220 AWS services / 10k API calls
Design workflows visually and as code
Standard & Express workflows

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

AWS Step Functions Demo
Lambda Function: API Call
Glue Job: ETL
Athena Query: MSCK REPAIR TABLE
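
The three demo steps could be expressed as an Amazon States Language definition along these lines. This is a minimal sketch, not the workflow shown in the talk: the state names, function name, job name, database, table and S3 output location are all hypothetical, and retries and error handling are omitted.

```python
import json


def build_pipeline_definition(function_name, glue_job_name, database, table, output_location):
    """Return a state machine definition: Lambda call, then Glue job, then Athena repair."""
    return {
        "Comment": "Illustrative pipeline: API call, ETL, partition repair",
        "StartAt": "CallApi",
        "States": {
            "CallApi": {
                "Type": "Task",
                # Optimised integration that invokes a Lambda function.
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": function_name},
                "Next": "RunEtl",
            },
            "RunEtl": {
                "Type": "Task",
                # The .sync suffix makes the workflow wait for the Glue job to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Next": "RepairTable",
            },
            "RepairTable": {
                "Type": "Task",
                # Runs the Athena query and waits for it to complete.
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {
                    "QueryString": f"MSCK REPAIR TABLE {table}",
                    "QueryExecutionContext": {"Database": database},
                    "ResultConfiguration": {"OutputLocation": output_location},
                },
                "End": True,
            },
        },
    }


# Hypothetical resource names, used only to render the JSON.
definition = build_pipeline_definition(
    function_name="example-api-call",
    glue_job_name="example-etl-job",
    database="example_db",
    table="example_table",
    output_location="s3://example-bucket/athena-results/",
)
print(json.dumps(definition, indent=2))
```

The printed JSON is the "as code" form of a workflow that could equally be drawn in the visual designer.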

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Amazon EventBridge Scheduler
Automate recurring & one-off tasks
Invoke over 220 AWS services
Set times or fixed-rate schedules
Checks target response

Slide 41

Slide 41 text

Amazon EventBridge Scheduler Demo
Set schedule
Link Step Function workflow
Set configuration
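
The same three demo steps can be sketched with boto3's `scheduler` client and its `create_schedule` call. The schedule name, cron expression, ARNs and input payload below are placeholders rather than values from the talk, and the IAM role is assumed to already allow EventBridge Scheduler to start the state machine.

```python
import json


def build_schedule_request(name, expression, state_machine_arn, role_arn, payload):
    """Return keyword arguments for the EventBridge Scheduler create_schedule call."""
    return {
        "Name": name,
        # Set schedule: a cron expression; one-off at(...) expressions are also possible.
        "ScheduleExpression": expression,
        "FlexibleTimeWindow": {"Mode": "OFF"},
        # Link Step Function workflow: a state machine ARN as the target starts an execution.
        "Target": {
            "Arn": state_machine_arn,
            "RoleArn": role_arn,
            "Input": json.dumps(payload),
        },
    }


def create_schedule(request):
    """Send the request to AWS; needs boto3 and credentials, so it is not called here."""
    import boto3

    return boto3.client("scheduler").create_schedule(**request)


# Hypothetical values: run the pipeline every day at 06:00.
request = build_schedule_request(
    name="example-daily-pipeline",
    expression="cron(0 6 * * ? *)",
    state_machine_arn="arn:aws:states:eu-west-2:123456789012:stateMachine:example-pipeline",
    role_arn="arn:aws:iam::123456789012:role/example-scheduler-role",
    payload={"source": "scheduler"},
)
print(json.dumps(request, indent=2))
```

Building the request separately from sending it keeps the example runnable without AWS access.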

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

Build an AWS CodeBuild Project
Description: Build a CodeBuild project and send a notification based on the test results.
Documentation Link: https://docs.aws.amazon.com/step-functions/latest/dg/sample-project-codebuild.html
Services: CodeBuild, SNS

Slide 44

Slide 44 text

Tune a machine learning model
Description: Tune hyperparameters of a machine learning model and batch transform a test dataset.
Documentation Link: https://docs.aws.amazon.com/step-functions/latest/dg/sample-hyper-tuning.html
Services: Lambda, S3, SageMaker

Slide 45

Slide 45 text

Summary Problem Definition Solution Architecture Demos Summary & Questions github.com/MrDamienJones/Community-Sessions

Slide 46

Slide 46 text

Thanks!
github.com/MrDamienJones/Community-Sessions
@amazonwebshark, MrDamienJones, amazonwebshark.com