Big data and serverless - AWS User Group NL

This presentation was given at the DAZN office in Amsterdam on the 3rd of June 2019. More information about the meetup can be found here: https://awsug.nl/events/2019/06/03/dealing-with-bigdata-on-serverless/

Marek Kuczynski

June 03, 2019

Transcript

  1. Big data and serverless

    Marek Kuczynski, Sr. Solutions Architect – Startups, @marekq. AWS User Group Netherlands Meetup
  2. Various choices for compute on AWS Amazon EC2 Virtual server

    instances in the cloud Amazon ECS, EKS, and Fargate Container management service for running Docker on a managed cluster of EC2 AWS Lambda Serverless compute for stateless code execution in response to triggers
  3. Event based architectures SERVICES (ANYTHING) Changes in data state Requests

    to endpoints Changes in resource state EVENT SOURCE FUNCTION Node.js Python Java C# Go Ruby PowerShell Bring your own runtime
  4. Common Lambda use cases Web Applications • Static websites •

    Complex web apps • Packages for Flask and Express Data Processing • Real time • MapReduce • Batch Chatbots • Powering chatbot logic Backends • Apps & services • Mobile • IoT </> </> Amazon Alexa • Powering voice-enabled apps • Alexa Skills Kit IT Automation • Policy engines • Extending AWS services • Infrastructure management
  5. AWS serverless portfolio COMPUTE AND DATASTORES AWS Lambda AWS Fargate

    Amazon API Gateway Amazon SNS Amazon MQ Amazon SQS AWS Step Functions APPLICATION INTEGRATION DEVELOPER TOOLS SECURITY AND ADMINISTRATION Amazon Aurora Serverless Amazon S3 Amazon DynamoDB AWS AppSync AWS IAM Amazon Cognito Amazon Inspector Amazon VPC Amazon GuardDuty AWS CloudFormation AWS Cloud9 AWS CloudTrail Amazon CloudWatch AWS X-Ray AWS CodePipeline AWS Config AWS SSO AWS Shield AWS WAF Amazon Kinesis AWS Serverless Application Repository
  6. Build well architected • Scalability • Is scalability seamless, semi-automatic

    or a manual process? • Resilience • To what degree can we (automatically) recover from issues on infrastructure? • Cost • Can we control cost based on pricing per operation/invocation? • Maintenance and operations • How much OS/software maintenance will be needed going forward? • Security • How do I keep infrastructure secure and handle authentication/authorization?
  7. A serverless, three tier application Data stored in DynamoDB Dynamic

    content in AWS Lambda Amazon API Gateway Browser Amazon CloudFront Amazon S3 Amazon Cognito
  8. My Demo of a serverless blog – https://marek.rocks

    © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  10. Monthly costs of running the blog

    The website has been running stably for 3+ years with a few hundred visitors every month. • Route 53 hosted zone $0.50 • Lambda function cost $0.30 • DynamoDB costs $0.20 • API Gateway costs $0.10 • Email costs $0.02 • Domain name $1 No maintenance (patching, scaling, backups) is required. TCO is at least 10x cheaper than running this on EC2.
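
As a quick check on the monthly figures for the blog, the line items can be summed in a few lines of Python. This is only a sketch: the dictionary labels and the function name are illustrative, and the prices are the ones listed on the slide.

```python
# Monthly line items as listed on the slide, in US dollars.
# The labels are illustrative; only the amounts come from the deck.
MONTHLY_COSTS_USD = {
    "Route 53 hosted zone": 0.50,
    "Lambda function": 0.30,
    "DynamoDB": 0.20,
    "API Gateway": 0.10,
    "Email": 0.02,
    "Domain name": 1.00,
}


def monthly_total(costs):
    """Return the sum of all line items, rounded to whole cents."""
    return round(sum(costs.values()), 2)


print(f"Total per month: ${monthly_total(MONTHLY_COSTS_USD):.2f}")
```

With the listed amounts the total comes to $2.12 per month, roughly half of which is the domain name.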
  11. AWS solutions to build a serverless data lake Amazon S3

    bucket(s) Amazon ES AWS Glue Amazon DynamoDB Catalog & search AWS Key Management Service (AWS KMS) AWS CloudTrail IAM Amazon Macie Security & auditing Amazon Cognito Amazon API Gateway IAM API/UI Amazon Athena Amazon QuickSight Aurora Serverless or Redshift Analytics & processing AWS Glue AWS Lambda Amazon Kinesis Data Streams Amazon Kinesis Data Firehose AWS Direct Connect Ingest
  12. Ingestion using Kinesis Amazon CloudWatch: Delivery metrics Amazon S3: Buffered

    files Kinesis Agent Record producers Amazon Redshift or Aurora: Table loads Amazon Elasticsearch Service: Domain loads Amazon S3: Source record backup AWS Lambda: Transformations & enrichment Amazon DynamoDB: Lookup tables Raw records Lookup Transformed records Transformed records Raw records Kinesis Data Firehose: Delivery stream
  13. Architecture patterns to push or pull data

    S3 bucket to Lambda function: 1. File put into bucket 2. Lambda invoked. SNS topic to Lambda function: 1. Data published to a topic 2. Lambda invoked. Amazon SQS to Lambda function: 1. Message inserted into a queue 2. Lambda polls queue and invokes function 3. Function removes message from queue.
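
The three trigger patterns on this slide deliver differently shaped events to a function. The sketch below shows one way a single Python handler might tell them apart. The field names follow the documented S3, SNS and SQS event shapes; the function names, the return format, and the assumption that SQS message bodies are JSON are illustrative choices, not something from the deck.

```python
import json
from urllib.parse import unquote_plus


def handle_s3(records):
    """Return (bucket, key) pairs; S3 URL-encodes object keys in events."""
    return [
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in records
    ]


def handle_sns(records):
    """Return the message string of each SNS notification."""
    return [r["Sns"]["Message"] for r in records]


def handle_sqs(records):
    """Return each SQS message body, assumed here to be JSON."""
    return [json.loads(r["body"]) for r in records]


def lambda_handler(event, context):
    """Dispatch on the shape of the first record in the batch."""
    records = event.get("Records", [])
    if not records:
        return {"source": "unknown", "items": []}
    first = records[0]
    if "s3" in first:
        return {"source": "s3", "items": handle_s3(records)}
    if "Sns" in first:
        return {"source": "sns", "items": handle_sns(records)}
    if first.get("eventSource") == "aws:sqs":
        return {"source": "sqs", "items": handle_sqs(records)}
    return {"source": "unknown", "items": []}
```

For the SQS pattern the function does not delete messages itself: when the handler returns without raising, Lambda removes the batch from the queue, which is the third step on the slide.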
  14. Recent launch: richer workflows using Step Functions Simplify building workloads

    such as order processing, report generation, and data analysis Add services in minutes Write and maintain less code AWS Step Functions AWS Lambda Amazon ECS AWS Fargate AWS Batch Amazon SageMaker AWS Glue Amazon DynamoDB Amazon SNS Amazon SQS
  15. Simpler integration, less code With serverless polling With new service

    integration AWS Lambda functions No Lambda functions
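
To illustrate the "no Lambda functions" side of this slide, here is a minimal sketch of a state machine definition in which a Task state publishes to SNS directly through a Step Functions service integration. The state name, the `$.report` input path and the topic ARN are hypothetical; the `arn:aws:states:::sns:publish` resource is the service integration identifier for SNS.

```python
import json


def build_definition(topic_arn):
    """Return an Amazon States Language definition as a Python dict.

    The single Task state calls SNS directly, so no Lambda function
    is needed just to forward the message.
    """
    return {
        "Comment": "Publish a report to SNS without a Lambda function",
        "StartAt": "NotifyTeam",
        "States": {
            "NotifyTeam": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                    "TopicArn": topic_arn,
                    # "$.report" is a hypothetical string field
                    # in the execution input.
                    "Message.$": "$.report",
                },
                "End": True,
            }
        },
    }


# Hypothetical topic ARN, used only to render the example.
definition = build_definition("arn:aws:sns:eu-west-1:123456789012:example-topic")
print(json.dumps(definition, indent=2))
```

The printed JSON is what would be supplied as the state machine definition when creating it.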
  16. Analytics Various choices for analytics of your data • S3

    Select on CSV, JSON and Apache Parquet objects • Amazon Athena • AWS Lambda • Predictions with Amazon SageMaker • Amazon EMR • AWS Glue (ETL) Analytics & processing
  17. Athena – running a query on files in S3 buckets

    SELECT custid, year, sum(count) FROM sales WHERE custid = '157231' GROUP BY custid, year ORDER BY year ASC; Run time: 44.66 seconds. Data scanned: 169.53 GB. Cost: $5/TB or $0.005/GB = $0.85. Analytics & processing
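
The cost on this slide follows directly from the amount of data scanned. A small sketch of the arithmetic, assuming the slide's own conversion of $5/TB to $0.005/GB (that is, 1 TB counted as 1000 GB); the function and constant names are illustrative.

```python
PRICE_PER_TB_USD = 5.0
GB_PER_TB = 1000  # the slide's $0.005/GB implies 1 TB = 1000 GB


def athena_query_cost(gb_scanned):
    """Return the query cost in USD for the given amount of data scanned."""
    return gb_scanned * PRICE_PER_TB_USD / GB_PER_TB


cost = athena_query_cost(169.53)
print(f"169.53 GB scanned costs ${cost:.2f}")
```

Because the price depends only on bytes scanned, anything that reduces the scan (columnar formats such as Parquet, partitioning, compression) reduces the bill by the same factor.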
  18. Run a data science framework in Lambda

    • pandas • SciPy • NumPy • matplotlib
  19. Just released: S3 Batch Operations

    Amazon S3 → Lambda functions (diagram). This new feature can: • Modify the ACLs or tags of objects on S3 at scale. • Copy objects to a new bucket while preserving properties. • Let Lambda (re)process all your files stored on S3. AWS takes care of running the operations, even if your bucket has billions of objects.
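
For the "let Lambda (re)process all your files" case, S3 Batch Operations invokes the function with a list of tasks and expects a result per task. The sketch below follows the 1.0 invocation schema as I understand it, so the field names should be checked against the current S3 documentation before use. `process_object` is a hypothetical placeholder for the real per-object work.

```python
from urllib.parse import unquote_plus


def process_object(bucket_arn, key):
    """Hypothetical per-object work; replace with real processing."""
    return f"processed {key}"


def lambda_handler(event, context):
    """Process every task in the batch and report a result for each."""
    results = []
    for task in event["tasks"]:
        key = unquote_plus(task["s3Key"])  # keys arrive URL-encoded
        try:
            message = process_object(task["s3BucketArn"], key)
            code = "Succeeded"
        except Exception as exc:
            # Report the failure instead of failing the whole invocation.
            message, code = str(exc), "PermanentFailure"
        results.append(
            {"taskId": task["taskId"], "resultCode": code, "resultString": message}
        )
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }
```

Returning a per-task result code lets the batch job's completion report show which objects succeeded and which did not.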
  20. Why relational and not NoSQL? Sometimes it’s not possible to

    use a NoSQL database: • You need to integrate with other backend applications that run on a relational database (e.g. WordPress) or are hard to modify. • You need access to complex queries that are harder to do with NoSQL (e.g. multiple joins, fuzzy searches). • There may be other database features that your application requires (logging, ACID compliance).
  21. How does Serverless Aurora work? Availability zone 1 Region App

    on EC2 or Lambda Shared distributed storage volume Multi-tenant proxy layer Warm-pool of Aurora instances Monitoring service
  22. Introducing Amazon Relational Database Service Data API • Simple web

    service protocol for database access • SQL statements packaged as HTTP requests • Access your database from Lambda and AppSync • Access your database from the AWS SDK & CLI Data API Service Aurora Serverless
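
As a sketch of what "SQL statements packaged as HTTP requests" looks like from Python, the function below calls the Data API's `execute_statement` operation through a client that is passed in, normally `boto3.client("rds-data")`. The table, column names and ARNs are hypothetical, and the row-flattening helper is an illustrative convenience rather than part of the API.

```python
def fetch_posts(client, cluster_arn, secret_arn, database, author):
    """Run a parameterised query through the RDS Data API.

    `client` is expected to behave like boto3.client("rds-data").
    The `posts` table and its columns are hypothetical.
    """
    response = client.execute_statement(
        resourceArn=cluster_arn,
        secretArn=secret_arn,
        database=database,
        sql="SELECT id, title FROM posts WHERE author = :author",
        parameters=[{"name": "author", "value": {"stringValue": author}}],
    )
    # Each field arrives as a one-entry dict such as {"stringValue": "x"}
    # or {"longValue": 1}; keep only the value. A NULL field arrives as
    # {"isNull": True} and would need separate handling.
    return [
        [next(iter(field.values())) for field in row]
        for row in response["records"]
    ]


# Usage inside a Lambda function (requires boto3 and AWS credentials):
#   import boto3
#   rows = fetch_posts(boto3.client("rds-data"), CLUSTER_ARN, SECRET_ARN,
#                      "blog", "marek")
```

Since every call is a plain HTTPS request, the function needs no database driver and holds no persistent connection.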
  23. Introducing RDS Console Query Editor • Access your database from

    AWS Management Console • No database client application or terminal required • The same requests can be made using the AWS SDK or CLI.
  24. A serverless, relational three tier application Data stored in Aurora

    serverless Dynamic content in AWS Lambda Amazon API Gateway Browser Amazon CloudFront Amazon S3 Amazon Cognito
  25. Search and Data Catalog • Use DynamoDB as a metadata

    repository • Optionally use Amazon Elasticsearch Service for more complex queries AWS Lambda Metadata Index (DynamoDB) Search Index (Amazon ES) ObjectCreated ObjectDeleted PutItem Update Index S3 Bucket https://aws.amazon.com/answers/big-data/data-lake-solution/ Catalog & Search
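
The metadata index in this diagram could be maintained by a function along the following lines. This is a sketch under assumptions: the table is passed in and is expected to behave like a boto3 DynamoDB `Table` resource, and the `bucket`/`key` key schema and the stored attributes are hypothetical. Note that S3 reports deletions with event names starting with `ObjectRemoved`.

```python
from urllib.parse import unquote_plus


def index_record(table, record):
    """Mirror one S3 event record into the metadata table.

    Returns "put", "delete" or "ignored" so callers can count outcomes.
    """
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])
    item_key = {"bucket": bucket, "key": key}  # hypothetical key schema
    event_name = record["eventName"]

    if event_name.startswith("ObjectCreated"):
        table.put_item(
            Item={
                **item_key,
                "size": record["s3"]["object"].get("size", 0),
                "modified": record["eventTime"],
            }
        )
        return "put"
    if event_name.startswith("ObjectRemoved"):
        table.delete_item(Key=item_key)
        return "delete"
    return "ignored"


def make_handler(table):
    """Build a Lambda handler bound to the given table."""
    def lambda_handler(event, context):
        return [index_record(table, r) for r in event.get("Records", [])]
    return lambda_handler
```

Passing the table in keeps the indexing logic testable without AWS access; in a deployed function it would typically be created once at module level.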
  26. AWS Glue Crawlers AWS Glue Data Catalog Amazon QuickSight Amazon

    Redshift Spectrum Amazon Athena S3 Bucket(s) Catalog & Search Use Glue Crawlers to build a data catalogue
  27. AWS Lake Formation (in preview) Build, secure, and manage data

    lakes, reducing the set up time from months to days
  28. AWS CodeStar • Quickly develop, build, and deploy applications on

    AWS • Start developing on AWS in minutes • Work across your team, securely • Manage software delivery easily • Choose from a variety of project templates
  29. Services deployed for you when using CodeStar Source Build Test

    Deploy Monitor AWS CodeBuild + Third Party AWS CodeCommit AWS CodeBuild AWS CodeDeploy AWS CodePipeline AWS X-Ray Amazon CloudWatch
  30. Further reading and events

    Well-Architected Lens for serverless https://d1.awsstatic.com/whitepapers/architecture/AWS-Serverless-Applications-Lens.pdf Serverless Application Repository https://serverlessrepo.aws.amazon.com/ Free developer event - AWS DevDay on June 19th in Utrecht https://aws.amazon.com/events/Devdays-Utrecht/
  31. Lambda: The execution lifecycle

    Cold start: Download your code, Start new container, Bootstrap the runtime. Warm start: Start your code. (Steps shown along a time axis.) Learn more about Lambda under the hood on YouTube: AWS re:Invent 2018: A Serverless Journey: AWS Lambda Under the Hood (SRV409) https://www.youtube.com/watch?v=QdzV04T_kec
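
The difference between a cold and a warm start is visible from inside a function: module-level code runs once when a new container is bootstrapped, while the handler runs on every invocation. A minimal sketch, in which the configuration dict stands in for any expensive setup and all names are illustrative:

```python
import time

# Module-level code runs once per container, during the cold start.
# Expensive setup (SDK clients, config loading) belongs here so that
# warm invocations can reuse it. CONFIG is a hypothetical stand-in.
_CONTAINER_STARTED = time.monotonic()
CONFIG = {"table": "example-table"}
_invocations = 0


def lambda_handler(event, context):
    """Report whether this invocation hit a fresh container."""
    global _invocations
    _invocations += 1
    return {
        "cold_start": _invocations == 1,
        "invocation": _invocations,
        "container_age_seconds": time.monotonic() - _CONTAINER_STARTED,
    }
```

Only the first invocation in a container reports `cold_start` as true; later ones skip the download, container start and runtime bootstrap steps and go straight to the handler.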
  32. What drives our priorities? Excelling in service fundamentals Availability, latency,

    security, scale and associated controls Enabling new application development patterns New patterns through events, workflows, functions, and APIs Minimizing undifferentiated code Eliminating duplicate code, increasing reuse Empowering serverless developers and operations Meet developers and operations where they are, lead them where they need to be
  33. Service fundamentals: 2018 recap

    Enhanced compliance regime ✓ FedRAMP for API Gateway and Lambda ✓ HIPAA for Step Functions, Serverless Application Repository ✓ GDPR for all services Scale, availability, and pricing improvements ✓ AWS Lambda SLA (99.95%), API Gateway SLA (99.95%) ✓ API Gateway tiered pricing (as low as $1.51/million) ✓ Increased Step Functions throughput (1,000 transitions/sec) aws.amazon.com/compliance aws.amazon.com/lambda/sla - aws.amazon.com/api-gateway/sla
  34. Host KVM MicroVM (Guest OS & Container Workload) Firecracker RESTful

    API Networks Storage Rate Limiting Metadata Service • Firecracker microVMs have the same security as KVM VMs • Designed for low overhead, high density, and fast start times • Built-in fair sharing Firecracker Architecture and Benefits
  35. Firecracker benefits Security Lightweight container encapsulated with VM barrier and

    strong process isolation Greater efficiency Speed by design Accelerates kernel loading to reduce cold start times (150 microVMs/second) More processes can be run per instance and more efficient use of compute resources. firecracker-microvm.io
  36. Launch: ALB integration with Lambda Enables easier transition from existing

    compute stacks Mix and match compute options to build your backends Robust load balancer controls (health checks, programmable rules engine, traffic shaping) Amazon ALB Amazon EC2 AWS Fargate AWS Lambda
  37. Launch: serverless websockets Build real-time two-way communication apps (chat, dashboards,

    etc.) Handle connections and message transfer between users and backend services
  38. Launch: Custom Runtimes Bring any Linux-compatible language runtime Powered by

    new Runtime API Already powering Ruby support More runtimes from partners (PHP and Erlang) Runtimes distributed as “layers” Layers Rule Stack
  39. Launch: Lambda Layers Easily share code Upload layer once, reference

    within any function Promote separation of responsibilities Built-in support for secure sharing by ecosystem
  40. Launch: cross toolchain app view Application Views on the Lambda

    Console - view and monitor all resources of the same app New IDE toolkits for IntelliJ, PyCharm, and VS Code
  42. Why are we here today?

    https://secure.flickr.com/photos/mgifford/4525333972
  43. Lambda and Fargate use Firecracker Security Lightweight container encapsulated with

    VM barrier and strong process isolation Greater efficiency Speed by design Accelerates kernel loading to reduce cold start times (150 microVMs/second) More processes can be run per instance and more efficient use of compute resources. firecracker-microvm.io