
Advanced Serverless Architectural Patterns on AWS [Devoxx Poland]


Alex Casalboni

June 25, 2019


Transcript

  1. @alex_casalboni
    #DevoxxPL
    Advanced Serverless Architectural
    Patterns on AWS
    Alex Casalboni
    Sr. Technical Evangelist
    Amazon Web Services


  2. About me
    • Software Engineer & Web Developer
    • Worked in a startup for 4.5 years
    • ServerlessDays Organizer
    • AWS customer since 2013


  3. Agenda
    Serverless foundations (quickly, I promise!)
    Advanced serverless patterns:
    1. Web application / API
    2. Stream processing
    3. Data lakes
    4. Machine learning


  4. Compute Spectrum
    Spectrum labels: “On Amazon EC2”, Managed, Serverless
    Services shown: AWS Lambda, Amazon Kinesis, Amazon S3, Amazon API Gateway,
    Amazon SQS, Amazon DynamoDB, AWS IoT, Amazon EMR, Amazon ElastiCache,
    Amazon RDS, Amazon Redshift, Amazon Elasticsearch, Amazon EC2,
    Microsoft SQL Server, Amazon Cognito, Amazon CloudWatch, Amazon Athena,
    AWS X-Ray, AWS Step Functions, Amazon MQ, Amazon SageMaker, Amazon Neptune,
    AWS Fargate, Amazon DocumentDB


  5. Serverless means…
    No server or container
    management
    Flexible scaling
    No idle capacity
    $
    High availability


  6. Lambda: The execution lifecycle
    Timeline: Download your code, Start new container, Bootstrap the runtime, Start your code
    Phases: Cold start, Warm start


  7. Tune your function’s resources
    Memory is the only control: the share of CPU cores and network capacity
    allocated to a function scales proportionally with it
    Is your code CPU-, network- or memory-bound? If so, it
    could be cheaper to choose more memory
    > Memory, > Cores, > Network
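
The claim that more memory can be cheaper follows from the billing arithmetic. The sketch below assumes a price per GB-second (check the current Lambda pricing page), ignores the per-request fee and duration rounding, and uses made-up duration measurements for a CPU-bound function.

```python
# Assumed price per GB-second; not taken from the slides.
PRICE_PER_GB_SECOND = 0.0000166667


def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Cost of one invocation: GB-seconds consumed times the assumed price."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND


def cheapest(measurements: dict) -> int:
    """Return the memory size (MB) with the lowest cost per invocation."""
    return min(measurements, key=lambda mb: invocation_cost(mb, measurements[mb]))


# Hypothetical measurements: memory (MB) -> average duration (ms).
measured = {128: 2400.0, 512: 550.0, 1024: 290.0}
best = cheapest(measured)  # 512 MB wins here: 4x the memory, but more than 4x faster
```

With these numbers the 128 MB configuration consumes 0.300 GB-seconds per call, while 512 MB consumes 0.275, so the larger setting is both faster and cheaper.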


  8. “AWS Lambda Power Tuning”
    Data-driven cost & performance
    optimization for AWS Lambda
    github.com/alexcasalboni/aws-lambda-power-tuning
    Don’t guesstimate!


  9. Lambda best practices
    Minimize your package size & use only needed SDK modules
    Put your dependencies (e.g. JAR files) in a separate directory
    Improve dependency injection with smaller and simpler IoC
    frameworks that load quickly on startup, like Dagger2
    Leverage smaller and faster frameworks like jackson-jr for
    Java data binding
    Use environment variables to modify operational behavior
    Secure secrets/tokens/passwords with Parameter Store and
    AWS Secrets Manager
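
As an illustration of the environment-variable practice, here is a minimal handler sketch. The variable names `LOG_LEVEL` and `GREETING` are hypothetical; configuration is read once at module load so that warm invocations reuse it.

```python
import os

# Read configuration outside the handler so it is evaluated once per container.
# LOG_LEVEL and GREETING are hypothetical variable names.
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")
GREETING = os.environ.get("GREETING", "Hello")


def handler(event, context):
    """Behavior changes with the environment, not with a redeploy of the code."""
    name = event.get("name", "world")
    if LOG_LEVEL == "DEBUG":
        print(f"event={event}")
    return {"message": f"{GREETING}, {name}!"}
```

Secrets should not be placed in plain environment variables; the slide points to Parameter Store and Secrets Manager for those.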


  10. AWS Serverless Application Model (SAM)
    AWS CloudFormation extension
    (Macro) to simplify serverless apps
    New serverless resource types:
    functions, APIs, and tables
    Local testing with SAM CLI
    github.com/awslabs/serverless-application-model


  11. AWS code services
    Source: AWS CodeCommit
    Build: AWS CodeBuild
    Test: Third-party tooling
    Deploy: AWS CodeDeploy
    AWS CodePipeline
    AWS CodeStar


  12. Pattern 1
    Web app / microservice / API


  13. Web application (1)
    DynamoDB
    Lambda
    API Gateway
    Browser
    CloudFront Amazon S3 Cognito


  14. Choose the right API endpoint type
    Edge optimized: reduce latency from anywhere on the Internet
    AWS Region
    API Gateway
    Internet
    edge location
    edge location
    edge location
    CloudFront
    Distribution
    API Gateway Managed


  15. Web application (2)
    DynamoDB
    Lambda
    API Gateway
    Browser CloudFront
    S3 Cognito
    Lambda@Edge


  16. Choose the right API endpoint type
    Regional AWS us-east-2
    API Gateway
    Internet AWS us-west-2
    API Gateway
    Route 53
    Lambda DynamoDB
    Lambda DynamoDB
    Global Tables


  17. Choose the right API endpoint type
    Regional
    API Gateway
    Internet
    API Gateway
    Route 53
    Lambda DynamoDB
    Lambda DynamoDB
    Global Tables
    Lambda@Edge
    CloudFront
    AWS us-east-2
    AWS us-west-2


  18. Choose the right API endpoint type
    Private: expose APIs only inside your VPC
    AWS Region
    API Gateway
    Your VPC
    AWS Direct Connect
    On-premises


  19. Serverless web app security
    DynamoDB
    Lambda
    API Gateway
    Browser
    CloudFront Amazon S3 Cognito


  20. Serverless web app security
    DynamoDB
    Lambda
    API Gateway
    Browser
    CloudFront S3 Cognito
    Static Content
    • Geo-Restrictions
    • Signed Cookies
    • Signed URLs
    • DDoS Protection
    • Bucket Policies
    • ACLs
    AuthZ
    • Cross Account
    • Throttling per method
    • Resource Policies
    • Usage Plans
    • Encryption at Rest
    • VPC Endpoint
    • Function policies
    • Env Variables
    • Parameters/Secrets


  21. Lambda authorizers
    Lambda Authorizer
    Client
    Lambda
    API Gateway
    DynamoDB
    IAM
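
A minimal sketch of a token-based Lambda authorizer, for illustration only. It assumes the token authorizer event shape (`authorizationToken`, `methodArn`) and returns an IAM policy document; the `VALID_TOKENS` table is hypothetical and stands in for real validation such as verifying a JWT.

```python
# Hypothetical token table; a real authorizer would validate a signed token.
VALID_TOKENS = {"allow-me": "user-123"}


def build_policy(principal_id, effect, resource):
    """Build the response API Gateway expects from an authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": resource,
                }
            ],
        },
    }


def handler(event, context):
    """Allow the call only if the bearer token is known."""
    token = event.get("authorizationToken", "")
    resource = event["methodArn"]
    principal = VALID_TOKENS.get(token)
    if principal is None:
        return build_policy("anonymous", "Deny", resource)
    return build_policy(principal, "Allow", resource)
```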


  22. Pattern 2
    Data processing (stream)


  23. Streaming with Amazon Kinesis
    Collect, process, and analyze video and data streams in real time
    Kinesis Data Firehose
    Kinesis Data Analytics (SQL)
    Kinesis Data Streams
    Kinesis Video Streams


  24. Streaming data ingestion
    Record producers, Kinesis Agent
    Put Records
    Kinesis Firehose: Delivery stream
    AWS Lambda: Transformations & enrichment
    Amazon DynamoDB: Lookup tables
    Amazon S3: Source record backup
    Amazon S3: Buffered files
    Amazon Redshift: Table loads
    Amazon Elasticsearch Service: Domain loads
    Flow labels: Raw, Lookup, Transformed, Transformed records
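
The transformation and enrichment step can be sketched as a Firehose data-transformation handler. This assumes the Firehose transformation contract (base64 `data`, `recordId`, and a `result` of `Ok` or `ProcessingFailed`); the `DEVICE_LOOKUP` dict is a hypothetical stand-in for the DynamoDB lookup table, and the `device_id` field is an assumed payload key.

```python
import base64
import json

# Hypothetical stand-in for the DynamoDB lookup table in the diagram.
DEVICE_LOOKUP = {"d-1": {"site": "warsaw"}}


def handler(event, context):
    """Decode each record, enrich it from the lookup table, and re-encode it."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload.update(DEVICE_LOOKUP.get(payload.get("device_id"), {}))
            data = base64.b64encode((json.dumps(payload) + "\n").encode()).decode()
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, AttributeError, TypeError):
            # Hand the raw record back so it can be routed to the error output.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```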


  25. Streaming data ingestion (HTTP)
    Browser
    HTTP POST/PUT
    API Gateway
    Kinesis Firehose: Delivery stream
    AWS Lambda: Transformations & enrichment
    Amazon DynamoDB: Lookup tables
    Amazon S3: Source record backup
    Amazon S3: Buffered files
    Amazon Redshift: Table loads
    Amazon Elasticsearch Service: Domain loads
    Flow labels: Raw, Lookup, Transformed, Transformed records


  26. Streaming data ingestion (at the edge)
    Browser
    HTTP POST/PUT
    CloudFront
    Lambda@Edge
    Kinesis Firehose: Delivery stream
    AWS Lambda: Transformations & enrichment
    Amazon DynamoDB: Lookup tables
    Amazon S3: Source record backup
    Amazon S3: Buffered files
    Amazon Redshift: Table loads
    Amazon Elasticsearch Service: Domain loads
    Flow labels: Raw, Lookup, Transformed, Transformed records


  27. Kinesis Best practices
    Tune Firehose buffer size and buffer interval
    • Larger objects = fewer Lambda invocations & Amazon S3 PUTs
    Enable compression to reduce storage costs
    Enable Parquet format transformation (columnar)
    Enable Source Record Backup for transformations
    • Recover from transformation errors
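
The effect of buffer tuning can be estimated with simple arithmetic. The sketch assumes a steady ingest rate and that a flush happens when either the buffer size or the buffer interval is reached, whichever comes first; the throughput and settings in the example are hypothetical.

```python
import math


def objects_per_hour(throughput_mb_per_s, buffer_size_mb, buffer_interval_s):
    """Estimate how many objects are delivered per hour at a steady ingest rate."""
    seconds_to_fill = buffer_size_mb / throughput_mb_per_s
    flush_every = min(seconds_to_fill, buffer_interval_s)
    return math.ceil(3600 / flush_every)


# Hypothetical stream ingesting 0.1 MB/s.
small_buffer = objects_per_hour(0.1, buffer_size_mb=1, buffer_interval_s=60)
large_buffer = objects_per_hour(0.1, buffer_size_mb=128, buffer_interval_s=900)
```

In this example the small buffer flushes every 10 seconds (360 objects per hour), while the large one is bounded by the 900-second interval (4 objects per hour).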


  28. Kinesis Data Streams and Lambda
    # of shards corresponds to concurrent invocations of Lambda function
    Batch size sets maximum # of records per invocation (min 1, max 10K)
    Data Stream Processor Function
    Streaming source Other AWS services
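
A minimal sketch of a processor function for a Kinesis Data Streams batch. It assumes the standard event shape, where each record carries a base64-encoded payload under `kinesis.data`; the JSON field `measurement` is a hypothetical payload key.

```python
import base64
import json


def handler(event, context):
    """Process one batch; the batch holds at most `batch size` records."""
    total = 0.0
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        total += payload["measurement"]
    return {"records": len(event["Records"]), "sum": total}
```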


  29. Fan-out pattern
    Trade strict message ordering for higher throughput & lower latency
    Kinesis Data Streams:
    Stream
    Lambda:
    Dispatcher function
    Lambda:
    Processor function
    Increase throughput, reduce processing latency
    Streaming source
    github.com/aws-samples/aws-lambda-fanout
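
One way to picture the dispatcher function, as a sketch and not the linked project's implementation: group the incoming batch by partition key and hand each group to a processor. The `invoke` callable is a hypothetical hook; in a deployment it might wrap an asynchronous Lambda invocation.

```python
from collections import defaultdict


def dispatch(event, invoke):
    """Group records by partition key and pass each group to `invoke`.

    Ordering is kept within a key but not across keys, which is the
    trade-off the pattern makes for throughput.
    """
    groups = defaultdict(list)
    for record in event["Records"]:
        kinesis = record["kinesis"]
        groups[kinesis["partitionKey"]].append(kinesis["data"])
    for key, batch in groups.items():
        invoke(key, batch)
    return len(groups)
```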


  30. Real-time analytics
    Record producers
    Data Stream
    Kinesis Data Analytics: Time window aggregation
    Kinesis Data Firehose: Error stream
    S3: Error records
    Lambda: Alert function
    DynamoDB
    SNS: Notifications


  31. Kinesis Data Analytics: Time window aggregation
    Aggregation over a 10-minute tumbling window
    Source stream, Kinesis Data Analytics, Destination stream(s)

    CREATE OR REPLACE PUMP "STREAM_PUMP" AS
      INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM
          "device_id",
          STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' MINUTE) AS "window_ts",
          SUM("measurement") AS "sample_sum",
          COUNT(*) AS "sample_count"
        FROM "SOURCE_SQL_STREAM_001"
        GROUP BY
          "device_id",
          STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' MINUTE);
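
For readers less familiar with streaming SQL, the same 10-minute tumbling-window aggregation can be sketched in Python over an in-memory list. This is only an illustration of the windowing logic: it runs as a batch, not as a continuous query, and the tuple layout of the input is an assumption.

```python
from collections import defaultdict

WINDOW_SECONDS = 10 * 60


def tumbling_window(records):
    """Aggregate (epoch_seconds, device_id, measurement) tuples.

    Returns {(device_id, window_start): (sample_sum, sample_count)}.
    """
    agg = defaultdict(lambda: [0.0, 0])
    for ts, device_id, measurement in records:
        # Round the timestamp down to the start of its 10-minute window.
        window_ts = ts - (ts % WINDOW_SECONDS)
        bucket = agg[(device_id, window_ts)]
        bucket[0] += measurement
        bucket[1] += 1
    return {key: (total, count) for key, (total, count) in agg.items()}
```

Because the windows are tumbling, each record lands in exactly one window per device.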


  32. Pattern 3
    Data Lakes


  33. Data lake characteristics
    Collect, store, process, consume, and analyze organizational data
    Structured, semi-structured, and unstructured data
    Decoupled compute and storage
    Fast automated ingestion
    Schema on-read
    Complementary to data warehouses


  34. Serverless data lake
    S3
    Elasticsearch
    Glue
    DynamoDB
    Catalog & search
    Cognito
    API Gateway
    API/UI
    Athena QuickSight
    Redshift Spectrum
    Analytics & processing
    Lambda
    Kinesis Streams
    Kinesis Firehose
    Direct Connect
    Ingest
    AWS IoT
    KMS CloudTrail
    IAM Macie
    Security & auditing


  35. How to “serverlessly” query your data lake
    Glue Crawlers
    Glue Data Catalog
    QuickSight
    Redshift Spectrum
    Athena
    S3 Bucket(s)


  36. Athena – Just SQL (Presto)
    Query duration: 44.66 seconds
    Data scanned: 169.53GB
    Cost*: $0.85
    * $5/TB or $0.005/GB
    SELECT gram, year, sum(count)
    FROM ngram
    WHERE gram = 'just say no'
    GROUP BY gram, year
    ORDER BY year ASC;
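
The cost figure follows directly from the data scanned. A short check, using the slide's own rate of $5/TB (equivalently $0.005/GB):

```python
PRICE_PER_TB_USD = 5.0  # rate quoted on the slide


def query_cost(gb_scanned: float) -> float:
    """Cost in USD, using the slide's conversion of $5/TB = $0.005/GB."""
    return gb_scanned * PRICE_PER_TB_USD / 1000


cost = round(query_cost(169.53), 2)  # matches the $0.85 shown on the slide
```

Since cost scales with bytes scanned, anything that reduces the scan (partitioning, columnar formats, compression) reduces the bill.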


  37. Athena best practices
    Partition data
    s3://my-bucket/my-data/parquet/year=2018/month=11/day=25/
    Use columnar formats – Apache Parquet or ORC
    Compress files with splittable compression (bzip2)
    Optimize file sizes
    aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena
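
A small helper sketch for writing objects under the Hive-style partition layout shown above. The zero-padding of month and day is an assumption; the slide's example only shows two-digit values.

```python
from datetime import date


def partition_prefix(base: str, day: date) -> str:
    """Build a year=/month=/day= prefix so queries can prune partitions."""
    return f"{base}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


prefix = partition_prefix("s3://my-bucket/my-data/parquet", date(2018, 11, 25))
```

A query that filters on `year`, `month`, and `day` then only scans the objects under the matching prefix.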


  38. Serverless MapReduce
    S3 Object
    Lambda: Splitter
    Lambda: Mappers
    DynamoDB: Mapper Results
    Lambda: Reducer
    S3 Results
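
The splitter, mapper, and reducer roles can be sketched locally with a word count. This is only an in-process illustration: in the architecture above each mapper call would be a separate Lambda invocation, and the partial results would be stored in DynamoDB instead of a Python list.

```python
from collections import Counter


def split(lines, chunk_size):
    """Splitter: cut the input object into chunks, one per mapper."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def mapper(chunk):
    """Mapper: count words in one chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def reducer(partials):
    """Reducer: merge the partial counts from all mappers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run(lines, chunk_size=2):
    """Run the three stages in sequence (mappers would run in parallel)."""
    partials = [mapper(chunk) for chunk in split(lines, chunk_size)]
    return reducer(partials)
```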


  39. Pywren - http://pywren.io
    Python library developed at UC Berkeley (University of California, Berkeley)
    Up to 40 TFLOPS of peak compute power
    Over 700 GB/sec of read and 500 GB/sec of write performance using S3
    “numpywren: Serverless Linear Algebra”
    https://arxiv.org/pdf/1810.09679.pdf


  40. Pattern 4
    Machine Learning


  41. The Amazon ML Stack: Broadest & Deepest Set of Capabilities
    AI Services
    • Categories: Vision, Speech, Language, Chatbots, Forecasting, Recommendations
    • Services: Rekognition Image, Rekognition Video, Polly, Transcribe, Translate,
      Comprehend, Comprehend Medical, Lex, Forecast, Textract, Personalize
    ML Services: Amazon SageMaker (Build, Train, Deploy)
    • Pre-built algorithms & notebooks
    • Data labeling (Ground Truth)
    • One-click model training & tuning
    • Optimization (Neo)
    • One-click deployment & hosting
    • Models without training data (reinforcement learning)
    • Algorithms & models (AWS Marketplace)
    ML Frameworks & Infrastructure
    • Frameworks, Interfaces, Infrastructure
    • EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference


  42. Image processing with Amazon Rekognition Image
    1. Upload
    2. Submit image
    3. Store image
    4. DetectFaces
    5. DetectLabels
    6. DetectModeration
    7. DetectText
    8. Store metadata & analysis
    Step Functions, Lambda, DynamoDB, Elasticsearch


  43. Media analysis solution
    S3:
    Web interface
    Cognito
    Amazon Rekognition Video:
    Detect objects, scenes,
    faces, & celebrities
    Elasticsearch:
    Search index
    API Gateway:
    REST APIs
    https://aws.amazon.com/answers/media-entertainment/media-analysis-solution/
    AWS Elemental MediaConvert:
    Transcode videos
    S3:
    Media storage
    Step Functions:
    Orchestrate
    analysis
    Transcribe Comprehend


  44. Amazon Connect
    (Serverless contact center)
    Real time and
    historical analytics
    High-quality
    voice capability
    Call
    recording
    Skills-based routing
    [Automatic Call Distribution (ACD)]


  45. Intelligent call center chatbot
    Customer
    Amazon Connect
    Amazon Lex
    Lambda: Chatbot Processing
    DynamoDB: Customer Data
    SNS: SMS Messaging
    • Customer calls Connect to reschedule an appointment
    • Connect calls Lex chatbot
    • Lex chatbot calls Lambda function to get customer preferences and fulfil intents
    • Lambda function sends text message confirmation via SNS
    • Customer receives appointment confirmation text message
    • Lambda function writes updates to DynamoDB
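
A rough sketch of the chatbot-processing function in this flow. It assumes the Lex (V1) Lambda event and response shapes (`currentIntent.slots` in, a `Close` dialog action out). The slot names `CustomerId` and `NewDate` are hypothetical, and two in-memory containers stand in for the DynamoDB table and the SNS text message.

```python
APPOINTMENTS = {}   # hypothetical stand-in for the DynamoDB customer table
SENT_MESSAGES = []  # hypothetical stand-in for SNS SMS delivery


def handler(event, context):
    """Fulfil a reschedule intent: store the update, confirm by text, close."""
    slots = event["currentIntent"]["slots"]
    customer, when = slots["CustomerId"], slots["NewDate"]

    APPOINTMENTS[customer] = when  # write the update
    text = f"Your appointment is rescheduled to {when}."
    SENT_MESSAGES.append((customer, text))  # send the confirmation

    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": text},
        }
    }
```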


  46. Call center analytics
    Amazon
    Connect
    Customers
    Agents
    Call
    recordings
    S3: Call
    recordings
    S3: Call
    transcripts
    Step Functions Transcribe
    Lambda
    S3: Sentiment,
    key phrases,
    entities
    Step Functions
    S3 Notifications
    for call
    transcripts
    Comprehend
    Lambda
    Athena
    QuickSight
    Contact trace records (CTRs)
    Kinesis Data
    Streams
    Kinesis Data
    Firehose
    S3: CTRs


  47. Go Build!
    Here to help you build


  48. Alex Casalboni
    Technical Evangelist, AWS
    @alex_casalboni
    Thank you!
    © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved
