Slide 1

Slide 1 text

Big data and serverless
Marek Kuczynski, Sr. Solutions Architect – Startups, @marekq
AWS User Group Netherlands Meetup

Slide 2

Slide 2 text

Various choices for compute on AWS
• Amazon EC2: virtual server instances in the cloud
• Amazon ECS, EKS, and Fargate: container management services for running Docker containers, either on a managed cluster of EC2 instances or, with Fargate, without managing servers
• AWS Lambda: serverless compute for stateless code execution in response to triggers

Slide 3

Slide 3 text

Event-based architectures
• Event source: changes in data state, requests to endpoints, changes in resource state
• Function: Node.js, Python, Java, C#, Go, Ruby, PowerShell, or bring your own runtime
• Services: anything

Slide 4

Slide 4 text

Common Lambda use cases
• Web applications: static websites, complex web apps, packages for Flask and Express
• Data processing: real time, MapReduce, batch
• Chatbots: powering chatbot logic
• Backends: apps and services, mobile, IoT
• Amazon Alexa: powering voice-enabled apps, Alexa Skills Kit
• IT automation: policy engines, extending AWS services, infrastructure management

Slide 5

Slide 5 text

AWS serverless portfolio
• Compute: AWS Lambda, AWS Fargate
• Datastores: Amazon Aurora Serverless, Amazon S3, Amazon DynamoDB
• Application integration: Amazon API Gateway, Amazon SNS, Amazon MQ, Amazon SQS, AWS Step Functions, AWS AppSync, Amazon Kinesis
• Developer tools: AWS CloudFormation, AWS Cloud9, AWS X-Ray, AWS CodePipeline, AWS Serverless Application Repository
• Security and administration: AWS IAM, Amazon Cognito, Amazon Inspector, Amazon VPC, Amazon GuardDuty, AWS CloudTrail, Amazon CloudWatch, AWS Config, AWS SSO, AWS Shield, AWS WAF

Slide 6

Slide 6 text

Serverless is a spectrum, ranging from more operations to less operations.

Slide 7

Slide 7 text

Build well-architected
• Scalability: is scaling seamless, semi-automatic, or a manual process?
• Resilience: to what degree can we (automatically) recover from infrastructure issues?
• Cost: can we control cost through pricing per operation or invocation?
• Maintenance and operations: how much OS and software maintenance will be needed going forward?
• Security: how do we keep the infrastructure secure and handle authentication and authorization?

Slide 8

Slide 8 text

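A serverless three-tier application
• Presentation: the browser loads static content from Amazon S3 through Amazon CloudFront; Amazon Cognito handles user authentication
• Logic: Amazon API Gateway routes requests to AWS Lambda, which generates the dynamic content
• Data: stored in Amazon DynamoDB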
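A minimal sketch of the Lambda tier, assuming API Gateway's Lambda proxy integration (the function returns `statusCode`, `headers`, and a string `body`). The `POSTS` dictionary and the `id` path parameter are hypothetical stand-ins for the DynamoDB table and the blog's real routes.

```python
import json

# Hypothetical in-memory stand-in for the DynamoDB table in the diagram.
POSTS = {"hello-world": {"title": "Hello world", "body": "First post"}}


def handler(event, context):
    """Return one blog post in the API Gateway Lambda proxy response format."""
    # With proxy integration, path parameters arrive under "pathParameters" (may be None).
    post_id = (event.get("pathParameters") or {}).get("id")
    post = POSTS.get(post_id)
    if post is None:
        return {
            "statusCode": 404,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "not found"}),
        }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(post),
    }
```

In a real deployment the dictionary lookup would become a DynamoDB `GetItem`; the response shape stays the same.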

Slide 9

Slide 9 text

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. My Demo of a serverless blog – https://marek.rocks

Slide 10

Slide 10 text

My Demo of a serverless blog – https://marek.rocks

Slide 11

Slide 11 text

Monthly costs of running the blog
The website has been running stably for 3+ years with a few hundred visitors every month.
• Route 53 hosted zone: $0.50
• Lambda function cost: $0.30
• DynamoDB costs: $0.20
• API Gateway costs: $0.10
• Email costs: $0.02
• Domain name: $1
No maintenance (patching, scaling, backups) is required. TCO is at least 10x lower than running this on EC2.
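The line items above add up as follows. This sketch treats every item, including the domain name, as a monthly cost because that is how the slide lists them; `Decimal` avoids floating-point rounding noise on currency values.

```python
from decimal import Decimal

# Monthly cost items from the slide, in USD.
MONTHLY_COSTS = {
    "Route 53 hosted zone": Decimal("0.50"),
    "Lambda": Decimal("0.30"),
    "DynamoDB": Decimal("0.20"),
    "API Gateway": Decimal("0.10"),
    "Email": Decimal("0.02"),
    "Domain name": Decimal("1.00"),
}

total = sum(MONTHLY_COSTS.values(), Decimal("0"))
print(f"Total per month: ${total}")  # Total per month: $2.12
```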

Slide 12

Slide 12 text

Building and orchestrating a serverless data lake

Slide 13

Slide 13 text

AWS solutions to build a serverless data lake
• Storage: Amazon S3 bucket(s)
• Catalog & search: Amazon ES, AWS Glue, Amazon DynamoDB
• Security & auditing: AWS Key Management Service (AWS KMS), AWS CloudTrail, IAM, Amazon Macie
• API/UI: Amazon Cognito, Amazon API Gateway, IAM
• Analytics & processing: Amazon Athena, Amazon QuickSight, Aurora Serverless or Redshift, AWS Glue, AWS Lambda
• Ingest: Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, AWS Direct Connect

Slide 14

Slide 14 text

Ingestion using Kinesis
• Record producers (for example, the Kinesis Agent) send raw records to a Kinesis Data Firehose delivery stream.
• AWS Lambda transforms and enriches the records, using lookup tables in Amazon DynamoDB.
• Firehose delivers the transformed records to Amazon S3 (buffered files), Amazon Redshift or Aurora (table loads), or Amazon Elasticsearch Service (domain loads).
• Raw records can be backed up to Amazon S3 (source record backup).
• Amazon CloudWatch provides delivery metrics.
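A sketch of the "transformations & enrichment" step, assuming JSON records and the Firehose data-transformation contract: each incoming record has a `recordId` and base64 `data`, and the function must return every `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The `CUSTOMER_REGION` dictionary and the `customer_id` field are hypothetical; on the slide a DynamoDB lookup table plays that role.

```python
import base64
import json

# Hypothetical lookup table; on the slide this role is played by DynamoDB.
CUSTOMER_REGION = {"c-1": "EU", "c-2": "US"}


def handler(event, context):
    """Kinesis Data Firehose transformation: enrich each JSON record with a region."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["region"] = CUSTOMER_REGION.get(payload.get("customer_id"), "unknown")
            # Newline-delimit records so the buffered S3 files stay line-oriented.
            encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8")).decode("ascii")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": encoded})
        except (ValueError, KeyError, AttributeError):
            # Malformed records are returned unchanged and marked as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record.get("data", ""),
            })
    return {"records": output}
```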

Slide 15

Slide 15 text

Architecture patterns to push or pull data
• S3: 1. A file is put into a bucket. 2. Lambda is invoked.
• SNS: 1. Data is published to a topic. 2. Lambda is invoked.
• SQS: 1. A message is inserted into a queue. 2. Lambda polls the queue and invokes the function. 3. The message is removed from the queue after the function processes it successfully.
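One function can serve all three patterns if it normalizes the incoming records. The sketch below relies on the documented event shapes: S3 and SQS records carry `eventSource` (`aws:s3`, `aws:sqs`), while SNS records carry `EventSource` (`aws:sns`) with the message under `Sns.Message`. S3 object keys arrive URL-encoded. The `extract_payloads` helper name is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def extract_payloads(event):
    """Normalize S3, SNS and SQS event records into (source, payload) pairs."""
    results = []
    for record in event.get("Records", []):
        # S3 and SQS use "eventSource"; SNS uses "EventSource".
        source = record.get("eventSource") or record.get("EventSource")
        if source == "aws:s3":
            bucket = record["s3"]["bucket"]["name"]
            key = unquote_plus(record["s3"]["object"]["key"])  # keys are URL-encoded
            results.append(("s3", f"{bucket}/{key}"))
        elif source == "aws:sns":
            results.append(("sns", record["Sns"]["Message"]))
        elif source == "aws:sqs":
            results.append(("sqs", record["body"]))
    return results


def handler(event, context):
    for source, payload in extract_payloads(event):
        print(json.dumps({"source": source, "payload": payload}))
    # Returning without an exception lets Lambda delete a processed SQS batch.
```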

Slide 16

Slide 16 text

Recent launch: richer workflows using Step Functions
• Simplify building workloads such as order processing, report generation, and data analysis
• Add services in minutes
• Write and maintain less code
Integrations: AWS Lambda, Amazon ECS, AWS Fargate, AWS Batch, Amazon SageMaker, AWS Glue, Amazon DynamoDB, Amazon SNS, Amazon SQS

Slide 17

Slide 17 text

Simpler integration, less code
• With serverless polling: AWS Lambda functions
• With new service integration: no Lambda functions
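To make "no Lambda functions" concrete, here is a sketch of an Amazon States Language definition that writes to DynamoDB and publishes to SNS through the service-integration resources (`arn:aws:states:::dynamodb:putItem`, `arn:aws:states:::sns:publish`) instead of calling Lambda. It is built as a Python dict and serialized to JSON; the table name `Orders`, the topic ARN, and the `orderId` input field are hypothetical.

```python
import json

TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:order-created"  # hypothetical topic

definition = {
    "Comment": "Store an order and notify, with no Lambda function in between",
    "StartAt": "SaveOrder",
    "States": {
        "SaveOrder": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "Parameters": {
                "TableName": "Orders",  # hypothetical table
                "Item": {"orderId": {"S.$": "$.orderId"}},
            },
            "ResultPath": None,  # JSON null: discard the result, pass the input on
            "Next": "Notify",
        },
        "Notify": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {"TopicArn": TOPIC_ARN, "Message.$": "$.orderId"},
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```

The printed JSON is what you would pass as the state machine definition; each `.$` suffix tells Step Functions to resolve the value from the execution input.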

Slide 18

Slide 18 text

Serverless data lakes - Analytics

Slide 19

Slide 19 text

Analytics
Various choices for analytics of your data:
• S3 Select on CSV, JSON, and Apache Parquet objects
• Amazon Athena
• AWS Lambda
• Predictions with Amazon SageMaker
• Amazon EMR
• AWS Glue (ETL)
(Analytics & processing)

Slide 20

Slide 20 text

S3 Select – selecting fields from individual files
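A sketch of an S3 Select request on a single CSV object with a header row, using the boto3 S3 client's `select_object_content` call. The bucket, key, and column names are hypothetical, and `boto3` plus AWS credentials are only needed for `run_select`, so the request builder can be tried offline.

```python
def build_select_request(bucket, key, columns, where):
    """Keyword arguments for select_object_content on a CSV object with a header row."""
    # With FileHeaderInfo=USE, columns are referenced by header name via the S3Object alias.
    # The WHERE clause is inserted verbatim, so never pass untrusted input here.
    expression = f"SELECT {', '.join('s.' + c for c in columns)} FROM S3Object s WHERE {where}"
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
        "OutputSerialization": {"CSV": {}},
    }


def run_select(params):
    """Stream matching rows from S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    response = boto3.client("s3").select_object_content(**params)
    for event in response["Payload"]:
        if "Records" in event:
            # Each chunk may hold several rows, or only part of a row.
            yield event["Records"]["Payload"].decode("utf-8")
```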

Slide 21

Slide 21 text

S3 Select – selecting fields from individual files

Slide 22

Slide 22 text

Athena – running a query on files in S3 buckets
SELECT custid, year, sum(count) FROM sales WHERE custid = '157231' GROUP BY custid, year ORDER BY year ASC;
Run time: 44.66 seconds. Data scanned: 169.53 GB. Cost: $5/TB, or $0.005/GB, so 169.53 GB × $0.005 ≈ $0.85.
(Analytics & processing)
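The cost figure follows directly from the amount of data scanned. A small sketch of the same arithmetic, using the slide's rate of $0.005 per GB and rounding to whole cents:

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_GB = Decimal("0.005")  # $5 per TB, as on the slide


def athena_query_cost(scanned_gb):
    """Estimate the cost of one Athena query from the amount of data scanned."""
    cost = Decimal(str(scanned_gb)) * PRICE_PER_GB  # 169.53 * 0.005 = 0.84765
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(athena_query_cost("169.53"))  # 0.85
```

Because the price depends only on bytes scanned, reducing the data a query touches reduces its cost proportionally.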

Slide 23

Slide 23 text

Run a data science framework in Lambda
• pandas
• SciPy
• NumPy
• matplotlib

Slide 24

Slide 24 text

Just released: S3 Batch Operations
(Diagram: Amazon S3 invoking many Lambda functions in parallel.)
This new feature can:
• Modify the ACLs or tags of objects in S3 at scale.
• Copy objects to a new bucket while preserving their properties.
• Let Lambda (re)process all your files stored in S3.
AWS takes care of running the operations, even if your bucket has billions of objects.
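A sketch of a Lambda function invoked by an S3 Batch Operations job, assuming invocation schema version 1.0: the event lists `tasks` (each with `taskId`, `s3Key`, `s3BucketArn`), and the function must answer every `taskId` with a `resultCode`. It also assumes keys arrive URL-encoded, as in the AWS documentation example. `process_object` is a hypothetical placeholder for the per-file work.

```python
from urllib.parse import unquote


def process_object(bucket, key):
    """Hypothetical per-object work, e.g. re-processing one file."""
    return f"processed s3://{bucket}/{key}"


def handler(event, context):
    """Handle the tasks sent by an S3 Batch Operations job (schema 1.0)."""
    results = []
    for task in event["tasks"]:
        # The bucket ARN looks like arn:aws:s3:::bucket-name.
        bucket = task["s3BucketArn"].split(":")[-1]
        key = unquote(task["s3Key"])  # assumes URL-encoded keys
        try:
            code, message = "Succeeded", process_object(bucket, key)
        except Exception as exc:  # report per-task failure instead of failing the invocation
            code, message = "PermanentFailure", str(exc)
        results.append({"taskId": task["taskId"], "resultCode": code, "resultString": message})
    return {
        "invocationSchemaVersion": "1.0",
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }
```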

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Why relational and not NoSQL?
Sometimes it’s not possible to use a NoSQL database:
• You need to integrate with other backend applications that run on a relational database (e.g. WordPress) or are hard to modify.
• You need complex queries that are harder to do with NoSQL (e.g. multiple joins, fuzzy searches).
• Your application requires other database features (logging, ACID compliance).

Slide 27

Slide 27 text

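How does Aurora Serverless work?
(Diagram, within a Region and Availability Zone 1:)
• An app on EC2 or Lambda connects to a multi-tenant proxy layer.
• The proxy layer connects to a warm pool of Aurora instances.
• The instances use a shared distributed storage volume.
• A monitoring service watches the setup.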

Slide 28

Slide 28 text

Introducing Amazon Relational Database Service Data API
• Simple web service protocol for database access
• SQL statements packaged as HTTP requests
• Access your database from Lambda and AppSync
• Access your database from the AWS SDK and CLI
(Diagram: the Data API service in front of Aurora Serverless.)
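A sketch of calling the Data API from Lambda with the boto3 `rds-data` client's `execute_statement`, which takes the cluster ARN, a Secrets Manager secret ARN holding the credentials, the database name, the SQL, and typed named parameters. The ARNs, the `shop` database, the `sales` table, and the `custid` event field are hypothetical.

```python
def build_execute_statement(cluster_arn, secret_arn, database, customer_id):
    """Keyword arguments for rds-data execute_statement."""
    return {
        "resourceArn": cluster_arn,
        "secretArn": secret_arn,  # database credentials stored in AWS Secrets Manager
        "database": database,
        # Named parameters (:custid) keep values out of the SQL string.
        "sql": "SELECT custid, year FROM sales WHERE custid = :custid",
        "parameters": [{"name": "custid", "value": {"stringValue": customer_id}}],
    }


def handler(event, context):
    import boto3  # included in the Lambda Python runtime

    params = build_execute_statement(
        "arn:aws:rds:eu-west-1:123456789012:cluster:my-cluster",  # hypothetical
        "arn:aws:secretsmanager:eu-west-1:123456789012:secret:my-secret",  # hypothetical
        "shop",
        event["custid"],
    )
    response = boto3.client("rds-data").execute_statement(**params)
    # Each record is a list of typed fields such as {"stringValue": "157231"}.
    return response.get("records", [])
```

No database driver or connection pool is involved: every statement is a single HTTPS request, which suits short-lived Lambda invocations.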

Slide 29

Slide 29 text

Introducing RDS Console Query Editor
• Access your database from the AWS Management Console
• No database client application or terminal required
• The same requests can be made using the AWS SDK or CLI

Slide 30

Slide 30 text

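A serverless, relational three-tier application
• Presentation: the browser loads static content from Amazon S3 through Amazon CloudFront; Amazon Cognito handles user authentication
• Logic: Amazon API Gateway routes requests to AWS Lambda, which generates the dynamic content
• Data: stored in Aurora Serverless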

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Search and Data Catalog
• Use DynamoDB as a metadata repository
• Optionally use Amazon Elasticsearch Service for more complex queries
Flow: ObjectCreated and ObjectRemoved events from the S3 bucket trigger AWS Lambda, which writes to the metadata index in DynamoDB (PutItem); the search index in Amazon ES is then updated.
https://aws.amazon.com/answers/big-data/data-lake-solution/
(Catalog & search)
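A sketch of the Lambda step in this flow: it turns one S3 event record into an index operation based on the S3 event name (`ObjectCreated:*` or `ObjectRemoved:*`). The item schema (`objectId`, `size`, `eventTime`) and the `to_index_operation` name are hypothetical; a real function would issue the DynamoDB `PutItem` or `DeleteItem` calls for the returned operations.

```python
from urllib.parse import unquote_plus


def to_index_operation(record):
    """Map one S3 event record to a metadata-index operation (hypothetical schema)."""
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])  # S3 event keys are URL-encoded
    object_id = f"{bucket}/{key}"
    if record["eventName"].startswith("ObjectCreated:"):
        item = {
            "objectId": object_id,
            "size": record["s3"]["object"].get("size", 0),
            "eventTime": record["eventTime"],
        }
        return ("put", item)  # e.g. DynamoDB PutItem
    if record["eventName"].startswith("ObjectRemoved:"):
        return ("delete", {"objectId": object_id})
    return ("ignore", {"objectId": object_id})
```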

Slide 33

Slide 33 text

Use Glue crawlers to build a data catalog
AWS Glue crawlers scan the S3 bucket(s) and populate the AWS Glue Data Catalog, which Amazon Athena, Amazon Redshift Spectrum, and Amazon QuickSight can use.
(Catalog & search)

Slide 34

Slide 34 text

AWS Lake Formation (in preview): build, secure, and manage data lakes, reducing the setup time from months to days.

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

AWS CodeStar
• Quickly develop, build, and deploy applications on AWS
• Start developing on AWS in minutes
• Work across your team, securely
• Manage software delivery easily
• Choose from a variety of project templates

Slide 37

Slide 37 text

AWS CodeStar Project templates for EC2, AWS Lambda, and Elastic Beanstalk

Slide 38

Slide 38 text

Services deployed for you when using CodeStar
• Source: AWS CodeCommit
• Build: AWS CodeBuild
• Test: AWS CodeBuild + third party
• Deploy: AWS CodeDeploy
• Monitor: AWS X-Ray, Amazon CloudWatch
AWS CodePipeline runs the stages from source to deploy.

Slide 39

Slide 39 text

SAM template (figure: “this becomes this”, comparing a SAM template with what it expands into)
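A minimal AWS SAM template, sketched as a Python dict and printed as JSON (SAM accepts JSON as well as YAML). The `Transform: AWS::Serverless-2016-10-31` line tells CloudFormation to expand the short `AWS::Serverless::Function` resource into the underlying resources, such as the Lambda function, its IAM role, and the API Gateway API. The function name, handler, code path, runtime choice, and API path are hypothetical.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Transform": "AWS::Serverless-2016-10-31",
    "Resources": {
        "BlogFunction": {  # hypothetical function
            "Type": "AWS::Serverless::Function",
            "Properties": {
                "Handler": "app.handler",
                "Runtime": "python3.12",
                "CodeUri": "src/",
                "Events": {
                    "GetPost": {  # creates an API Gateway route that invokes the function
                        "Type": "Api",
                        "Properties": {"Path": "/posts/{id}", "Method": "get"},
                    }
                },
            },
        }
    },
}

print(json.dumps(template, indent=2))
```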

Slide 40

Slide 40 text

Use AWS X-Ray to debug functions

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

Further reading and events
• Well-Architected Lens for serverless: https://d1.awsstatic.com/whitepapers/architecture/AWS-Serverless-Applications-Lens.pdf
• Serverless Application Repository: https://serverlessrepo.aws.amazon.com/
• Free developer event: AWS DevDay on June 19th in Utrecht, https://aws.amazon.com/events/Devdays-Utrecht/

Slide 43

Slide 43 text

"No server is easier to manage than no server." Werner Vogels, Amazon CTO

Slide 44

Slide 44 text

Thank you!
Marek Kuczynski, Sr. Solutions Architect – Startups, @marekq

Slide 45

Slide 45 text

AWS Amplify – serverless mobile development

Slide 46

Slide 46 text

Lambda: the execution lifecycle
• Cold start: download your code, start a new container, bootstrap the runtime, then start your code.
• Warm start: start your code (the container and runtime are reused).
Learn more about Lambda under the hood on YouTube:
AWS re:Invent 2018: A Serverless Journey: AWS Lambda Under the Hood (SRV409) https://www.youtube.com/watch?v=QdzV04T_kec
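The lifecycle has a practical consequence: module-level code runs once per execution environment during the cold start, while the handler runs on every invocation. A sketch that makes the difference visible; `CONFIG` is a hypothetical stand-in for expensive setup such as SDK clients or loaded models.

```python
# Module-level code runs once per execution environment, during the cold start.
# Put expensive setup here so that warm starts can reuse it.
CONFIG = {"table": "posts"}  # hypothetical configuration
invocations = 0


def handler(event, context):
    global invocations
    invocations += 1
    # The first invocation in an environment follows a cold start; later ones are warm.
    return {"cold_start": invocations == 1, "invocations": invocations}
```

Lambda may create new environments at any time, so this state is a cache, never a source of truth.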

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

What drives our priorities?
• Excelling in service fundamentals: availability, latency, security, scale, and associated controls
• Enabling new application development patterns: new patterns through events, workflows, functions, and APIs
• Minimizing undifferentiated code: eliminating duplicate code, increasing reuse
• Empowering serverless developers and operations: meeting developers and operations where they are, and leading them where they need to be

Slide 49

Slide 49 text

Service fundamentals: 2018 recap
Enhanced compliance regime
• FedRAMP for API Gateway and Lambda
• HIPAA for Step Functions and Serverless Application Repository
• GDPR for all services
Scale, availability, and pricing improvements
• AWS Lambda SLA (99.95%), API Gateway SLA (99.95%)
• API Gateway tiered pricing (as low as $1.51/million requests)
• Increased Step Functions throughput (1,000 transitions/sec)
aws.amazon.com/compliance
aws.amazon.com/lambda/sla, aws.amazon.com/api-gateway/sla

Slide 50

Slide 50 text

Firecracker architecture and benefits
(Diagram: a host running KVM; each microVM holds a guest OS and container workload; Firecracker provides a RESTful API, networking, storage, rate limiting, and a metadata service.)
• Firecracker microVMs have the same security as KVM VMs
• Designed for low overhead, high density, and fast start times
• Built-in fair sharing
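The RESTful API in the diagram is how a microVM is configured and started. A sketch, assuming Firecracker's API served over a Unix domain socket and the endpoints from its getting-started guide (`/machine-config`, `/boot-source`, `/drives/{id}`, `/actions` with `InstanceStart`); the kernel path, rootfs path, and sizes are hypothetical. Only the standard library is used.

```python
import http.client
import json
import socket


def boot_plan(kernel_path, rootfs_path, vcpus=1, mem_mib=128):
    """Ordered (method, path, body) requests that configure and start one microVM."""
    return [
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source", {"kernel_image_path": kernel_path,
                                 "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"}),
        ("PUT", "/drives/rootfs", {"drive_id": "rootfs", "path_on_host": rootfs_path,
                                   "is_root_device": True, "is_read_only": False}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over the Unix domain socket that Firecracker listens on."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def send(socket_path, plan):
    """Send each request in order; stop at the first non-2xx response."""
    for method, path, body in plan:
        conn = UnixHTTPConnection(socket_path)
        conn.request(method, path, body=json.dumps(body),
                     headers={"Content-Type": "application/json", "Accept": "application/json"})
        resp = conn.getresponse()
        resp.read()
        conn.close()
        if resp.status >= 300:
            raise RuntimeError(f"{method} {path} failed with HTTP {resp.status}")
```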

Slide 51

Slide 51 text

Firecracker benefits
• Security: lightweight container encapsulated with a VM barrier and strong process isolation
• Speed by design: accelerates kernel loading to reduce cold start times (150 microVMs/second)
• Greater efficiency: more processes can run per instance, with more efficient use of compute resources
firecracker-microvm.io

Slide 52

Slide 52 text

Launch: ALB integration with Lambda
• Enables easier transition from existing compute stacks
• Mix and match compute options to build your backends
• Robust load balancer controls (health checks, programmable rules engine, traffic shaping)
(Diagram: an Application Load Balancer in front of Amazon EC2, AWS Fargate, and AWS Lambda.)

Slide 53

Slide 53 text

Launch: serverless WebSockets
• Build real-time two-way communication apps (chat, dashboards, etc.)
• Handle connections and message transfer between users and backend services
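A sketch of a Lambda function behind an API Gateway WebSocket API, routing on `requestContext.routeKey` (`$connect`, `$disconnect`, `$default`) and tracking `requestContext.connectionId`. The in-memory `CONNECTIONS` set is a hypothetical simplification: separate invocations do not reliably share memory, so a real app would store connection IDs in a database such as DynamoDB and push replies with the API Gateway Management API (`post_to_connection` in boto3), which this sketch leaves out.

```python
import json

# Hypothetical in-memory registry; use a database in a real deployment.
CONNECTIONS = set()


def handler(event, context):
    """Route API Gateway WebSocket events by their route key."""
    ctx = event["requestContext"]
    route, connection_id = ctx["routeKey"], ctx["connectionId"]
    if route == "$connect":
        CONNECTIONS.add(connection_id)
    elif route == "$disconnect":
        CONNECTIONS.discard(connection_id)
    else:
        # $default (or a custom route): the client's message is in the body.
        message = json.loads(event.get("body") or "{}")
        recipients = sorted(CONNECTIONS - {connection_id})
        return {"statusCode": 200,
                "body": json.dumps({"recipients": recipients, "message": message})}
    return {"statusCode": 200}
```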

Slide 54

Slide 54 text

Launch: custom runtimes
• Bring any Linux-compatible language runtime
• Powered by the new Runtime API
• Already powering Ruby support
• More runtimes from partners (PHP and Erlang)
• Runtimes distributed as “layers”
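A custom runtime is a loop against the Runtime API: fetch the next event from `/2018-06-01/runtime/invocation/next` (the request ID is in the `Lambda-Runtime-Aws-Request-Id` header), run the handler, then POST the result to `.../invocation/{id}/response` or an error to `.../invocation/{id}/error`. The sketch below is written in Python only for illustration, since a real `bootstrap` can be any executable; the I/O functions are injected so the loop logic can be tried without Lambda.

```python
import json
import os
import urllib.request

API_VERSION = "2018-06-01"


def process_one(fetch_next, post, api, handler):
    """Handle one invocation: fetch the next event, run the handler, report the outcome."""
    request_id, event = fetch_next(f"http://{api}/{API_VERSION}/runtime/invocation/next")
    base = f"http://{api}/{API_VERSION}/runtime/invocation/{request_id}"
    try:
        result = handler(event)
    except Exception as exc:
        post(f"{base}/error", {"errorMessage": str(exc), "errorType": type(exc).__name__})
        return
    post(f"{base}/response", result)


def http_fetch_next(url):
    with urllib.request.urlopen(url) as resp:  # blocks until an event is available
        return resp.headers["Lambda-Runtime-Aws-Request-Id"], json.loads(resp.read())


def http_post(url, payload):
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"), method="POST")
    urllib.request.urlopen(req).close()


def run_forever(handler):
    """Event loop for a bootstrap; only works inside a Lambda execution environment."""
    api = os.environ["AWS_LAMBDA_RUNTIME_API"]  # host:port set by Lambda
    while True:
        process_one(http_fetch_next, http_post, api, handler)
```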

Slide 55

Slide 55 text

Launch: Lambda Layers
• Easily share code: upload a layer once, reference it within any function
• Promote separation of responsibilities
• Built-in support for secure sharing by the ecosystem
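Layers are zip archives that Lambda extracts under `/opt`; for Python runtimes, code placed in a top-level `python/` folder ends up on `sys.path`. A sketch that builds such an archive in memory; the module name and contents are hypothetical.

```python
import io
import zipfile


def build_python_layer(modules):
    """Zip source files under python/, the folder Python runtimes import from in /opt."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for filename, source in modules.items():
            zf.writestr(f"python/{filename}", source)
    return buf.getvalue()


layer_zip = build_python_layer({"shared_utils.py": "def greet():\n    return 'hi'\n"})
```

Written to disk, the archive can be published with `aws lambda publish-layer-version --layer-name <name> --zip-file fileb://layer.zip`; functions that reference the layer can then `import shared_utils`.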

Slide 56

Slide 56 text

Launch: cross-toolchain app view
• Application views in the Lambda console: view and monitor all resources of the same app
• New IDE toolkits for IntelliJ, PyCharm, and VS Code

Slide 58

Slide 58 text

Just released: S3 Batch Operations
(Diagram: Amazon S3 invoking many Lambda functions in parallel.)

Slide 59

Slide 59 text

Serverless enables:
• Greater agility
• Less overhead
• Better focus
• Increased scale
• More flexibility
• Faster time to market

Slide 60

Slide 60 text

Why are we here today?
Image: https://secure.flickr.com/photos/mgifford/4525333972

Slide 61

Slide 61 text

Lambda and Fargate use Firecracker
• Security: lightweight container encapsulated with a VM barrier and strong process isolation
• Speed by design: accelerates kernel loading to reduce cold start times (150 microVMs/second)
• Greater efficiency: more processes can run per instance, with more efficient use of compute resources
firecracker-microvm.io
