Slide 1

Slide 1 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS Damien Jones (he/him) AWS Consultant @ Steamhaus 2024-11-27 AWS User Group Sheffield @amazonwebshark MrDamienJones amazonwebshark.com [email protected] MrDamienJones https://www.flaticon.com/ @amazonwebshark

Slide 2

Slide 2 text

Agenda Problem Definition Solution Architecture Demos Summary & Questions github.com/MrDamienJones/Community-Sessions

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Damien Jones Consultant @ Steamhaus Using AWS since 2019 Creator @ amazonwebshark.com Runner; Keen Gardener; Dog Dad He/Him Manchester UK Fin Fan

Slide 5

Slide 5 text

The 4 Vs Of Big Data Characteristics of Big Data… …and events… …and API requests… …metrics …traces …logs …

Slide 6

Slide 6 text

Variety “The state of being diverse or varied.”

Slide 7

Slide 7 text

Variety “The state of being diverse or varied.” Structure Intent Sensitivity

Slide 8

Slide 8 text

Velocity “The speed at which something is moving in a given direction.”

Slide 9

Slide 9 text

Velocity “The speed at which something is moving in a given direction.” Streaming or Batch Synchronous or Asynchronous Scheduling

Slide 10

Slide 10 text

Veracity “The quality of being true or the habit of telling the truth.”

Slide 11

Slide 11 text

Veracity “The quality of being true or the habit of telling the truth.” External Security Validation & Health Internal Security

Slide 12

Slide 12 text

Volume “The amount of space occupied.”

Slide 13

Slide 13 text

Volume “The amount of space occupied.” Size or Amount Storage Options Backups

Slide 14

Slide 14 text

Volume Veracity Variety Velocity

Slide 15

Slide 15 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

Slide 16

Slide 16 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

Slide 17

Slide 17 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS AWS Lambda Amazon S3

Slide 18

Slide 18 text

AWS Lambda Serverless compute service Supports multiple languages Auto-scales on demand Up to 1000 concurrent executions

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Amazon S3 Serverless object storage Store anything for any reason 1000s of requests per second Object protection & integrity checks

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

S3: 1 million buckets

Slide 25

Slide 25 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

Slide 26

Slide 26 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS AWS Glue Amazon Athena

Slide 27

Slide 27 text

AWS Glue Fully managed serverless ETL service Crawlers discover data automatically Up to 2000 concurrent ETL job runs Data Catalog indexes data assets

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

Amazon Athena Serverless interactive query service Query & access controls Read-Only & Open Table support Reads Variety Of Objects

Slide 30

Slide 30 text

Wolfie Says Hi

Slide 31

Slide 31 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS

Slide 32

Slide 32 text

Building And Automating Serverless Auto-Scaling Data Pipelines In AWS Amazon EventBridge Scheduler AWS Step Functions

Slide 33

Slide 33 text

AWS Step Functions Serverless task orchestration Invoke over 220 AWS services / 10k API calls Standard & Express workflows Design workflows visually and as code

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

AWS Step Functions Demo Lambda Function: API Call Glue Job: ETL Athena Query: MSCK REPAIR TABLE
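
The three demo steps above could be chained in a single state machine. The following is a minimal sketch only, not the definition used in the talk: it builds an Amazon States Language document as a Python dict, and the function name, Glue job name, table name and S3 output location are all hypothetical placeholders.

```python
import json


def build_pipeline_definition(function_name, glue_job_name, athena_table, athena_output):
    """Return an Amazon States Language definition as a plain dict.

    All four arguments are illustrative placeholders, not values from the talk.
    """
    return {
        "Comment": "Sketch: API call, then ETL, then partition refresh",
        "StartAt": "CallApi",
        "States": {
            # Step 1: Lambda function that makes the API call
            "CallApi": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": function_name},
                "Next": "RunEtl",
            },
            # Step 2: Glue ETL job; ".sync" waits for the job run to finish
            "RunEtl": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Next": "RepairTable",
            },
            # Step 3: Athena query that reloads the table's partitions
            "RepairTable": {
                "Type": "Task",
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {
                    "QueryString": f"MSCK REPAIR TABLE {athena_table}",
                    "WorkGroup": "primary",
                    "ResultConfiguration": {"OutputLocation": athena_output},
                },
                "End": True,
            },
        },
    }


definition = build_pipeline_definition(
    function_name="example-api-call",            # hypothetical
    glue_job_name="example-etl-job",             # hypothetical
    athena_table="example_table",                # hypothetical
    athena_output="s3://example-bucket/athena/", # hypothetical
)
print(json.dumps(definition, indent=2))
```

The printed JSON could be pasted into the Step Functions console's code view or passed to an infrastructure-as-code tool. The execution role would also need permission to invoke each service.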

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

AWS Step Functions Alternatives Lambda function: + Cheaper - Less observability

Slide 42

Slide 42 text

AWS Step Functions Alternatives Glue Workflow: + Free! - Glue resources only

Slide 43

Slide 43 text

AWS Step Functions Alternatives Managed Airflow: + Customisable - Complexity

Slide 44

Slide 44 text

Amazon EventBridge Scheduler Automate recurring & one-off tasks Invoke over 220 AWS services Set times or fixed-rate schedules Checks target response

Slide 45

Slide 45 text

Amazon EventBridge Scheduler Demo Set schedule Link Step Function workflow Set configuration
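
The demo steps above (set schedule, link the workflow, set configuration) map roughly onto one `create_schedule` request. The following is a hedged sketch that only builds the request arguments and makes no AWS call; the schedule name, cron expression, ARNs and input payload are hypothetical and not taken from the talk.

```python
import json


def build_schedule_request(name, expression, state_machine_arn, role_arn, payload):
    """Build keyword arguments for EventBridge Scheduler's create_schedule call.

    Every value passed in below is an illustrative placeholder.
    """
    return {
        "Name": name,
        # Accepts rate(...), cron(...) or at(...) expressions
        "ScheduleExpression": expression,
        # "OFF" means the target is invoked at the scheduled time, with no flexible window
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": state_machine_arn,       # the Step Functions workflow to start
            "RoleArn": role_arn,            # role EventBridge Scheduler assumes to start it
            "Input": json.dumps(payload),   # execution input, passed as a JSON string
        },
    }


request = build_schedule_request(
    name="example-daily-pipeline",  # hypothetical
    expression="cron(0 6 * * ? *)",  # every day at 06:00
    state_machine_arn="arn:aws:states:eu-west-1:123456789012:stateMachine:example",  # hypothetical
    role_arn="arn:aws:iam::123456789012:role/example-scheduler-role",  # hypothetical
    payload={"source": "scheduler"},  # hypothetical
)
print(json.dumps(request, indent=2))

# With AWS credentials configured, the request could then be submitted with:
#   import boto3
#   boto3.client("scheduler").create_schedule(**request)
```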

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

Summary Problem Definition Solution Architecture Demos Summary & Questions github.com/MrDamienJones/Community-Sessions

Slide 48

Slide 48 text

Thanks! github.com/MrDamienJones/Community-Sessions @amazonwebshark MrDamienJones amazonwebshark.com [email protected] MrDamienJones @amazonwebshark