Advanced Serverless Architectural Patterns on AWS [Devoxx Poland]

Alex Casalboni

June 25, 2019

Transcript

  1. About me: Software Engineer & Web Developer • Worked in a startup for 4.5 years • ServerlessDays Organizer • AWS customer since 2013
  2. Agenda: Serverless foundations (quickly, I promise!). Advanced serverless patterns: 1. Web application / API 2. Stream processing 3. Data lakes 4. Machine learning
  3. Compute Spectrum (categories: “On Amazon EC2”, Managed, Serverless). Services shown: AWS Lambda, Amazon Kinesis, Amazon S3, Amazon API Gateway, Amazon SQS, Amazon DynamoDB, AWS IoT, Amazon EMR, Amazon ElastiCache, Amazon RDS, Amazon Redshift, Amazon Elasticsearch, Amazon EC2, Microsoft SQL Server, Amazon Cognito, Amazon CloudWatch, Amazon Athena, AWS X-Ray, AWS Step Functions, Amazon MQ, Amazon SageMaker, Amazon Neptune, AWS Fargate, Amazon DocumentDB
  4. Lambda: The execution lifecycle. Steps over time: download your code, start new container, bootstrap the runtime, start your code. Phases: cold start, warm start
  5. Tune your function’s resources: Memory is the only control. The share of CPU cores and network capacity allocated to a function grows proportionally with memory (> Memory, > Cores, > Network). Is your code CPU-, network- or memory-bound? If so, it could be cheaper to choose more memory
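
The point of slide 5 (more memory can be cheaper) can be checked with a little arithmetic. The sketch below assumes a compute price of $0.0000166667 per GB-second and uses made-up duration measurements; it ignores the per-request charge and any billing rounding, so treat it as an illustration and not as a pricing reference.

```python
# Assumed on-demand compute price per GB-second (check current AWS pricing).
PRICE_PER_GB_SECOND = 0.0000166667


def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Compute cost of one invocation: memory (GB) x duration (s) x price."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND


# Hypothetical average durations (ms) measured at each memory setting (MB)
# for a CPU-bound function: doubling memory more than halves the duration.
measurements = {128: 1000.0, 256: 400.0, 512: 210.0}

costs = {mb: invocation_cost(mb, ms) for mb, ms in measurements.items()}
cheapest = min(costs, key=costs.get)

for mb in sorted(costs):
    print(f"{mb} MB: ${costs[mb]:.10f} per invocation")
print("cheapest memory setting:", cheapest, "MB")
```

With these hypothetical numbers the 256 MB configuration is both faster and cheaper than 128 MB, which is the kind of result a data-driven tuning run is meant to surface.
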
  6. “AWS Lambda Power Tuning”: Data-driven cost & performance optimization for AWS Lambda. Don’t guesstimate! github.com/alexcasalboni/aws-lambda-power-tuning
  7. Lambda best practices: Minimize your package size & use only needed SDK modules • Put your dependencies (e.g. jar files) in a separate directory • Improve dependency injection with smaller and simpler IoC frameworks that load quickly on startup, like Dagger2 • Leverage smaller and faster frameworks like jackson-jr for Java data binding • Use environment variables to modify operational behavior • Secure secrets/tokens/passwords with Parameter Store and AWS Secrets Manager
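
One of the best practices on slide 7 is using environment variables to modify operational behavior. A minimal sketch of what that might look like in a Python handler follows; the variable names `LOG_LEVEL`, `TABLE_NAME` and `BATCH_SIZE` and their defaults are hypothetical, and secrets would go to Parameter Store or Secrets Manager, not into plain variables.

```python
import os


def load_settings(env=os.environ):
    """Read operational settings from the environment, with safe defaults.

    All variable names here are hypothetical examples.
    """
    return {
        "log_level": env.get("LOG_LEVEL", "INFO"),
        "table_name": env.get("TABLE_NAME", "example-table"),
        "batch_size": int(env.get("BATCH_SIZE", "100")),
    }


# Loaded once per container, outside the handler, so warm invocations reuse it.
SETTINGS = load_settings()


def handler(event, context):
    # Behavior changes by redeploying configuration, not code.
    return {
        "table": SETTINGS["table_name"],
        "batch_size": SETTINGS["batch_size"],
    }
```

Passing the environment as a parameter keeps `load_settings` easy to exercise locally with a plain dictionary.
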
  8. AWS Serverless Application Model (SAM): AWS CloudFormation extension (Macro) to simplify serverless apps • New serverless resource types: functions, APIs, and tables • Local testing with SAM CLI • github.com/awslabs/serverless-application-model
  9. AWS code services: Source (AWS CodeCommit), Build (AWS CodeBuild), Test (third-party tooling), Deploy (AWS CodeDeploy); AWS CodePipeline; AWS CodeStar
  10. Choose the right API endpoint type: Edge optimized. Reduce latency from anywhere on the Internet. Diagram: Internet, edge locations, CloudFront distribution (API Gateway managed), API Gateway in an AWS Region
  11. Choose the right API endpoint type: Regional. Diagram: Internet, Route 53, then API Gateway, Lambda and DynamoDB in each of AWS us-east-2 and AWS us-west-2, with DynamoDB Global Tables
  12. Choose the right API endpoint type: Regional. Diagram: Internet, CloudFront with Lambda@Edge, Route 53, then API Gateway, Lambda and DynamoDB in each of AWS us-east-2 and AWS us-west-2, with DynamoDB Global Tables
  13. Choose the right API endpoint type: Private. Expose APIs only inside your VPC. Diagram: on-premises, AWS Direct Connect, your VPC, API Gateway in an AWS Region
  14. Serverless web app security. Components: Browser, CloudFront, S3, Cognito, API Gateway, Lambda, DynamoDB. Controls: Static Content • Geo-Restrictions • Signed Cookies • Signed URLs • DDoS Protection • Bucket Policies • ACLs • AuthZ • Cross Account • Throttling per method • Resource Policies • Usage Plans • Encryption at Rest • VPC Endpoint • Function policies • Env Variables • Parameters/Secrets
  15. Streaming with Amazon Kinesis: Collect, process, and analyze video and data streams in real time. Kinesis Data Streams • Kinesis Data Firehose • Kinesis Data Analytics (SQL) • Kinesis Video Streams
  16. Streaming data ingestion: Record producers (Kinesis Agent) put records into a Kinesis Firehose delivery stream. AWS Lambda (transformations & enrichment) turns raw records into transformed records, using Amazon DynamoDB lookup tables. Destinations: Amazon S3 (buffered files), Amazon Redshift (table loads), Amazon Elasticsearch Service (domain loads). Amazon S3 also holds the source record backup
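
The "transformations & enrichment" Lambda on slide 16 receives base64-encoded records from Firehose and must return each `recordId` with a `result` and the re-encoded `data`. Below is a minimal sketch of such a handler. The in-memory `DEVICE_NAMES` dictionary is a hypothetical stand-in for the DynamoDB lookup table, and the `device_id` field is an assumed record shape.

```python
import base64
import json

# Hypothetical lookup data; in the slide this role is played by DynamoDB.
DEVICE_NAMES = {"d-1": "boiler-room"}


def handler(event, context):
    """Enrich each Firehose record and report a per-record result."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["device_name"] = DEVICE_NAMES.get(
                payload.get("device_id"), "unknown"
            )
            # Newline-delimited JSON keeps buffered S3 objects easy to query.
            encoded = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": encoded}
            )
        except (ValueError, AttributeError, TypeError):
            # Failed records are returned unchanged and flagged for Firehose.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Records marked `ProcessingFailed` are where the source record backup mentioned in the deck becomes useful for recovery.
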
  17. Streaming data ingestion (HTTP): Browser sends HTTP POST/PUT to API Gateway, which feeds a Kinesis Firehose delivery stream. AWS Lambda (transformations & enrichment) turns raw records into transformed records, using Amazon DynamoDB lookup tables. Destinations: Amazon S3 (buffered files), Amazon Redshift (table loads), Amazon Elasticsearch Service (domain loads). Amazon S3 also holds the source record backup
  18. Streaming data ingestion (at the edge): Browser sends HTTP POST/PUT to CloudFront with Lambda@Edge, which feeds a Kinesis Firehose delivery stream. AWS Lambda (transformations & enrichment) turns raw records into transformed records, using Amazon DynamoDB lookup tables. Destinations: Amazon S3 (buffered files), Amazon Redshift (table loads), Amazon Elasticsearch Service (domain loads). Amazon S3 also holds the source record backup
  19. Kinesis best practices: Tune Firehose buffer size and buffer interval (larger objects = fewer Lambda invocations & Amazon S3 PUTs) • Enable compression to reduce storage costs • Enable Parquet format transformation (columnar) • Enable Source Record Backup for transformations (recover from transformation errors)
  20. Kinesis Data Streams and Lambda: # of shards corresponds to concurrent invocations of the Lambda function • Batch size sets maximum # of records per invocation (min 1, max 10K). Diagram: streaming source, data stream, processor function, other AWS services
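
For the processor function on slide 20, each invocation receives up to "batch size" records from one shard, with the payload base64-encoded under `Records[i].kinesis.data`. A minimal sketch of such a handler follows; the JSON payload with a `measurement` field is an assumed record shape, not something the slides specify.

```python
import base64
import json


def handler(event, context):
    """Process one batch of Kinesis records delivered to Lambda."""
    total = 0.0
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded inside the event.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        total += payload["measurement"]  # hypothetical field
    return {"records": len(event["Records"]), "measurement_sum": total}
```
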
  21. Fan-out pattern: Trade strict message ordering for higher throughput & lower latency. Diagram: streaming source, Kinesis Data Streams (stream), Lambda (dispatcher function), Lambda (processor function). Increase throughput, reduce processing latency. github.com/aws-samples/aws-lambda-fanout
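
The dispatcher function in the fan-out pattern splits an incoming batch and hands the pieces to processor functions without waiting for them. The sketch below is not the implementation from the aws-lambda-fanout repository; it is a simplified illustration in which `invoke_async` is a hypothetical callable. In a real dispatcher it could wrap boto3's Lambda client `invoke` call with `InvocationType="Event"`.

```python
import json
from typing import Callable, Iterable, List


def chunk(records: List[dict], size: int) -> Iterable[List[dict]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def dispatch(
    records: List[dict],
    invoke_async: Callable[[str], None],
    chunk_size: int = 25,
) -> int:
    """Send each chunk to a processor via `invoke_async`; return the call count.

    Chunks are processed independently, so ordering across chunks is no
    longer guaranteed. That is the trade-off named on the slide.
    """
    invocations = 0
    for group in chunk(records, chunk_size):
        invoke_async(json.dumps({"Records": group}))
        invocations += 1
    return invocations


# Local usage: collect payloads in a list instead of invoking Lambda.
sent = []
count = dispatch([{"id": i} for i in range(60)], sent.append, chunk_size=25)
print(count, "asynchronous invocations")
```
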
  22. Real-time analytics: Record producers, Data Stream, Kinesis Data Analytics (time window aggregation), Lambda (alert function), DynamoDB, SNS (notifications), Kinesis Data Firehose (error stream), S3 (error records)
  23. Kinesis Data Analytics: time window aggregation, from source stream to destination stream(s). Aggregation over a 10-minute tumbling window:

        CREATE OR REPLACE PUMP "STREAM_PUMP" AS
        INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM
            "device_id",
            STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' MINUTE) AS "window_ts",
            SUM("measurement") AS "sample_sum",
            COUNT(*) AS "sample_count"
        FROM "SOURCE_SQL_STREAM_001"
        GROUP BY
            "device_id",
            STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' MINUTE);
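
To make the tumbling-window semantics of the query on slide 23 concrete, here is a small Python sketch that computes the same sum and count per device and 10-minute window over an in-memory list. It only illustrates the grouping logic, assuming rows of `(device_id, epoch_seconds, measurement)`; it is not how Kinesis Data Analytics executes the query.

```python
from collections import defaultdict

WINDOW_SECONDS = 10 * 60  # 10-minute tumbling window, as in the SQL


def window_start(epoch_seconds: int) -> int:
    """Round a timestamp down to the start of its window (like STEP ... BY)."""
    return epoch_seconds - (epoch_seconds % WINDOW_SECONDS)


def aggregate(rows):
    """Return {(device_id, window_ts): (sample_sum, sample_count)}."""
    acc = defaultdict(lambda: [0.0, 0])
    for device_id, ts, measurement in rows:
        key = (device_id, window_start(ts))
        acc[key][0] += measurement
        acc[key][1] += 1
    return {key: (total, count) for key, (total, count) in acc.items()}


# Hypothetical sample rows: (device_id, epoch_seconds, measurement)
rows = [("a", 0, 1.0), ("a", 599, 2.0), ("a", 600, 4.0), ("b", 10, 8.0)]
result = aggregate(rows)
```

Windows never overlap: the reading at second 599 falls into the first window of device `a`, the one at second 600 opens the next.
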
  24. Data lake characteristics: Collect, store, process, consume, and analyze organizational data • Structured, semi-structured, and unstructured data • Decoupled compute and storage • Fast automated ingestion • Schema on-read • Complementary to data warehouses
  25. Serverless data lake: S3. Catalog & search: Elasticsearch, Glue, DynamoDB. API/UI: Cognito, API Gateway. Analytics & processing: Athena, QuickSight, Redshift Spectrum, Lambda. Ingest: Kinesis Streams, Kinesis Firehose, Direct Connect, AWS IoT. Security & auditing: KMS, CloudTrail, IAM, Macie
  26. How to “serverlessly” query your data lake: S3 bucket(s), Glue Crawlers, Glue Data Catalog, Athena, Redshift Spectrum, QuickSight
  27. Athena: just SQL (Presto). Query duration: 44.66 seconds. Data scanned: 169.53 GB. Cost*: $0.85 (* $5/TB or $0.005/GB)

        SELECT gram, year, sum(count)
        FROM ngram
        WHERE gram = 'just say no'
        GROUP BY gram, year
        ORDER BY year ASC;
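
The cost figure on slide 27 follows directly from the amount of data scanned. A quick check in Python, using the $0.005/GB rate quoted on the slide (the function name is illustrative):

```python
PRICE_PER_GB = 0.005  # rate quoted on the slide ($5/TB or $0.005/GB)


def query_cost(gb_scanned: float) -> float:
    """Cost in USD of a query that scans `gb_scanned` gigabytes, to the cent."""
    return round(gb_scanned * PRICE_PER_GB, 2)


print(query_cost(169.53))  # 0.85, matching the slide
```

Because cost scales with bytes scanned, anything that reduces the scan (partitioning, columnar formats, compression) reduces the bill in the same proportion.
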
  28. Athena best practices: Partition data, e.g. s3://my-bucket/my-data/parquet/year=2018/month=11/day=25/ • Use optimized file formats: columnar Apache Parquet or ORC (Avro is row-oriented) • Compress files with splittable compression (bzip2) • Optimize file sizes • aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena
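
The partitioned layout on slide 28 uses `key=value` path segments. A minimal sketch of a helper that builds such a prefix for a given day is shown below; the function name is hypothetical and the bucket path is the example one from the slide.

```python
from datetime import date


def partition_prefix(base: str, day: date) -> str:
    """Build a year=/month=/day= partition prefix under `base`."""
    return (
        f"{base.rstrip('/')}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
    )


prefix = partition_prefix("s3://my-bucket/my-data/parquet", date(2018, 11, 25))
print(prefix)  # s3://my-bucket/my-data/parquet/year=2018/month=11/day=25/
```

Writers that place objects under these prefixes let a query filtered on `year`, `month` and `day` skip every other partition.
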
  29. PyWren (http://pywren.io): Python library developed at UC Berkeley (University of California, Berkeley) • Up to 40 TFLOPS of peak compute power • Over 700 GB/sec of read and 500 GB/sec of write performance using S3 • “numpywren: Serverless Linear Algebra” https://arxiv.org/pdf/1810.09679.pdf
  30. The Amazon ML Stack: Broadest & Deepest Set of Capabilities (several items marked NEW). AI Services (Vision, Speech, Language, Chatbots, Forecasting, Recommendations): Rekognition Image, Rekognition Video, Textract, Polly, Transcribe, Translate, Comprehend, Comprehend Medical, Lex, Forecast, Personalize. ML Services: Amazon SageMaker (Build, Train, Deploy) with pre-built algorithms & notebooks, data labeling (Ground Truth), one-click model training & tuning, optimization (Neo), one-click deployment & hosting, models without training data (reinforcement learning), algorithms & models (AWS Marketplace). ML Frameworks & Infrastructure: Frameworks, Interfaces, Infrastructure (EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference)
  31. Image processing with Amazon Rekognition (components: Image, Step Functions, Lambda, DynamoDB, Elasticsearch). Steps: 1. Upload 2. Submit image 3. Store image 4. DetectFaces 5. DetectLabels 6. DetectModeration 7. DetectText 8. Store metadata & analysis
  32. Media analysis solution: S3 (web interface) • Cognito • API Gateway (REST APIs) • Step Functions (orchestrate analysis) • AWS Elemental MediaConvert (transcode videos) • Amazon Rekognition Video (detect objects, scenes, faces, & celebrities) • Transcribe • Comprehend • Elasticsearch (search index) • S3 (media storage) • https://aws.amazon.com/answers/media-entertainment/media-analysis-solution/
  33. Amazon Connect (serverless contact center): Real-time and historical analytics • High-quality voice capability • Call recording • Skills-based routing [Automatic Call Distribution (ACD)]
  34. Intelligent call center chatbot (components: Customer, Amazon Connect, Amazon Lex, Lambda for chatbot processing, DynamoDB for customer data, SNS for SMS messaging). Flow: Customer calls Connect to reschedule an appointment • Connect calls Lex chatbot • Lex chatbot calls Lambda function to get customer preferences and fulfill intents • Lambda function writes updates to DynamoDB • Lambda function sends text message confirmation via SNS • Customer receives appointment confirmation text message
  35. Call center analytics: Customers and agents talk through Amazon Connect • Call recordings are stored in S3 • Step Functions, Lambda and Transcribe produce call transcripts in S3 • S3 notifications for call transcripts trigger Step Functions, Lambda and Comprehend, which store sentiment, key phrases and entities in S3 • Contact trace records (CTRs) flow through Kinesis Data Streams and Kinesis Data Firehose to S3 • Athena and QuickSight for analysis
  36. Thank you! Alex Casalboni, Technical Evangelist, AWS, @alex_casalboni. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved