Serverless architectures on AWS Lambda

OpsGenie has been running serverless architectures on AWS Lambda for the last 3 years. In this presentation, we’ll explain the reasoning behind adopting AWS Lambda and show some real-life examples. Of course, nothing just works, so we’ll also cover some of the challenges and explain how we overcame them.

Mors Alfabesi Bileklik

March 15, 2023

Transcript

  1. @srhtcn
    Serverless Architectures on AWS Lambda:
    The Opsgenie Experience
    Serhat Can @srhtcn

  2. @srhtcn
    Who am I?
    ● Ex-Software Engineer
    Technical Evangelist at
    ● DevOpsDays core team member
    ● AWS Community Hero
    ● @srhtcn on Twitter

  3. @srhtcn
    You want to run code in the cloud. Your options:
    (more control, more code)
    Bare metal
    IaaS (VM)
    CaaS (container)
    PaaS (app)
    Serverless (function)
    (less control, less code)

  4. @srhtcn
    Making thoughtful decisions about tools and architecture can help;
    well-considered constraints can free us from the decisions that
    aren't bringing us distinguishable benefit.
    Bridget Kromhout
    https://queue.acm.org/detail.cfm?id=3185224

  5. @srhtcn
    What is Serverless?

  6. @srhtcn
    Defining Serverless
    Serverless is an event driven, utility based, stateless, code execution environment.
    Simon Wardley @swardley

  7. @srhtcn
    Defining Serverless
    Event driven: Code is started and run when an event, such as an HTTP request or a file being stored, triggers it.

  8. @srhtcn
    Defining Serverless
    Event driven: Code is started and run when an event, such as an HTTP request or a file being stored, triggers it.
    Utility based: No payment for idle time or hosting. You pay only for the resources you use when your code is triggered.

  9. @srhtcn
    Defining Serverless
    Event driven: Code is started and run when an event, such as an HTTP request or a file being stored, triggers it.
    Utility based: No payment for idle time or hosting. You pay only for the resources you use when your code is triggered.
    Stateless: The code execution environment is torn down after some time. No information is guaranteed to stay in the environment after the function execution completes.

  10. @srhtcn
    Defining Serverless
    Event driven: Code is started and run when an event, such as an HTTP request or a file being stored, triggers it.
    Utility based: No payment for idle time or hosting. You pay only for the resources you use when your code is triggered.
    Stateless: The code execution environment is torn down after some time. No information is guaranteed to stay in the environment after the function execution completes.
    Code execution: Just code, not servers, VMs, containers, etc.
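
To make the four properties concrete, here is a minimal sketch of a Python Lambda handler reacting to an S3 "object created" event. It is not taken from the talk: the function name `handler` and the counter `_invocations_seen` are illustrative assumptions, and the counter exists only to show that in-memory state is best-effort.

```python
import json

# Module-level state lives only as long as this execution environment does.
# It may survive between invocations, but nothing guarantees that it will.
_invocations_seen = 0


def handler(event, context):
    """Entry point that the platform calls once per triggering event."""
    global _invocations_seen
    _invocations_seen += 1  # best effort only: lost when the environment is torn down

    # An S3 notification carries one record per stored object.
    stored = [
        {"bucket": r["s3"]["bucket"]["name"], "key": r["s3"]["object"]["key"]}
        for r in event.get("Records", [])
    ]
    return {"statusCode": 200, "body": json.dumps({"stored": stored})}
```

Anything that has to outlive a single invocation belongs in an external store rather than in a variable like `_invocations_seen`.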

  11. @srhtcn
    Why should I go Serverless?
    Two main reasons

  12. @srhtcn
    Less is more
    Less code to maintain, less ops, less toil (work tied to running a production service that tends to be manual and repetitive)
    - Scaling
    - Provisioning
    - OS or language updates
    - Resource utilization
    - Network monitoring
    - Fault tolerance
    - Shipping logs
    https://landing.google.com/sre/book/chapters/eliminating-toil.html

  13. @srhtcn

  14. @srhtcn
    Economics
    - No payment for idle time or hosting
    - Easy to get started
    - Faster time to market

  15. @srhtcn
    AWS Lambda

  16. @srhtcn
    How does it work?

  17. @srhtcn
    How does it work internally?
    https://engineering.opsgenie.com/what-is-different-in-the-serverless-world-b9e0f68de191

  18. @srhtcn
    Language support
    Node.js (JavaScript)
    Python
    Java (Java 8 compatible)
    C# (.NET Core)
    Go
    NEW: Bring your own language!

  19. @srhtcn
    Pricing
    You choose memory size
    CPU share and network capacity increase proportionally with memory
    More memory doesn’t always mean you pay more
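
A rough worked example of why more memory does not always cost more: the compute charge is proportional to memory multiplied by duration, so a larger setting that finishes sooner can come out cheaper. The rates and the two durations below are assumptions for illustration only (check the current AWS Lambda pricing page), and billing-granularity rounding is ignored.

```python
# Assumed illustrative rates, not authoritative figures.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def invocation_cost(memory_mb, duration_ms):
    """Approximate cost of a single invocation in USD."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST


# Hypothetical measurements: doubling memory also gives more CPU,
# so the same work finishes in well under half the time.
slow_small = invocation_cost(memory_mb=512, duration_ms=1000)   # 0.5 GB-seconds
fast_large = invocation_cost(memory_mb=1024, duration_ms=400)   # 0.4 GB-seconds

print(f"512 MB for 1000 ms: ${slow_small:.10f}")
print(f"1024 MB for 400 ms: ${fast_large:.10f}")
```

Whether a real function speeds up enough to offset the larger memory setting has to be measured; CPU-bound code tends to benefit more than code that mostly waits on I/O.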

  20. @srhtcn
    https://www.slideshare.net/ChrisMunns/aws-startup-day-boston-2018-the-best-practices-and-hard-lessons-learned-of-serverless-applications

  21. @srhtcn
    Supported event sources
    20 different services can trigger AWS Lambda functions, including:
    Event sources that aren't stream-based:
    Synchronous invocation: AWS SDK, Cognito, Alexa, API Gateway
    Asynchronous invocation: S3, SNS, CloudWatch Logs, CloudWatch Events
    Poll-based (or pull model) event sources that are stream-based: Kinesis, DynamoDB Streams
    Poll-based event sources that are not stream-based: SQS
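
When one function is wired to several of these sources, the event payload itself shows where it came from. The sketch below covers only a few of the listed sources; the helper name `event_source` and the fallback labels are assumptions for illustration.

```python
def event_source(event):
    """Best-effort guess at which service produced a Lambda event."""
    records = event.get("Records")
    if records:
        first = records[0]
        # S3, SQS, Kinesis and DynamoDB Streams use "eventSource";
        # SNS capitalises the key as "EventSource".
        return first.get("eventSource") or first.get("EventSource") or "unknown"
    if "httpMethod" in event:
        # Shape of an API Gateway REST proxy integration event.
        return "api-gateway"
    # Anything else is treated as a direct SDK invocation with a custom payload.
    return "direct-invoke"
```

For example, `event_source({"Records": [{"eventSource": "aws:sqs"}]})` returns `"aws:sqs"`, while an arbitrary payload sent through the AWS SDK falls through to `"direct-invoke"`.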

  22. @srhtcn
    Toolkit around AWS Lambda
    Orchestration: Step Functions
    Deployment: SAM, Serverless.js, CloudFormation, Apex, Terraform
    Monitoring: X-Ray, Thundra
    Marketplace: AWS Serverless Application Repository

  23. @srhtcn
    AWS Lambda at OpsGenie
    AWS Lambda with Java 8
    DynamoDB
    SQS
    SNS
    VPC
    Serverless.js

  24. @srhtcn
    Why did we consider AWS Lambda?
    Fast scaling under immediate high load
    Under-utilized machines
    Pricing (still not a huge concern)
    Operational complexity
    Learning curve - Kubernetes?
    AWS Fargate - YES

  25. @srhtcn
    OpsGenie’s Serverless journey
    2015: Writing small-scale custom integrations
    At this point, we started leveraging AWS Lambda to help our customers run custom code.

  26. @srhtcn
    OpsGenie’s Serverless journey
    2015: Writing small-scale custom integrations
    At this point, we started leveraging AWS Lambda to help our customers run custom code.
    2016: First production usage
    Started using AWS Lambda for async, non-business-critical jobs such as DynamoDB autoscaling.

  27. @srhtcn
    OpsGenie’s Serverless journey
    2015: Writing small-scale custom integrations
    At this point, we started leveraging AWS Lambda to help our customers run custom code.
    2016: First production usage
    Started using AWS Lambda for async, non-business-critical jobs such as DynamoDB autoscaling.
    2017: Service and Incident Management
    A new customer-facing feature running on AWS Lambda, integrated with the rest of the code base.

  28. @srhtcn
    OpsGenie’s Serverless journey
    2015: Writing small-scale custom integrations
    At this point, we started leveraging AWS Lambda to help our customers run custom code.
    2016: First production usage
    Started using AWS Lambda for async, non-business-critical jobs such as DynamoDB autoscaling.
    2017: Service and Incident Management
    A new customer-facing feature running on AWS Lambda, integrated with the rest of the code base.
    2018: A spinoff: Thundra
    Observability for AWS Lambda

  29. @srhtcn
    Fixing “it is slow” is harder in AWS Lambda
    Too many moving pieces
    No way to attach an agent
    Even how to send the monitoring data is a discussion point

  30. @srhtcn
    What we needed was
    Determine the latency at different levels
    Automatic instrumentation
    GC, thread counts & durations, CPU usage details
    Get the stack trace in case of an error and drill down
    See logs, traces, and metrics in one view
    thundra.io

  31. @srhtcn
    Serverless
    Architectures

  32. @srhtcn
    Custom integrations
    AWS Lambda is a lifesaver for custom solutions because:
    ○ Customers do not need to manage servers
    ○ Easy to get started and deploy (upload a .zip file)
    ○ True pay-for-what-you-use pricing

  33. @srhtcn
    Create Alerts from Slack Messages
    Source: https://github.com/opsgenie/slack-to-opsgenie-alert-creator

  34. @srhtcn
    Alert Enrichment

  35. @srhtcn
    Elasticsearch data indexing

  36. @srhtcn
    DynamoDB Cross Region Replication

  37. @srhtcn
    DynamoDB Auto Scale

  38. @srhtcn
    Service and Incident Management

  39. @srhtcn
    An incident of $40,000

  40. @srhtcn
    Lessons learned: An incident of $40,000
    Avoid infinite retries
    Monitor and alert for pricing (no pricing metric for AWS Lambda)
    Account for CloudWatch costs; sample logs & metrics
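
One way to avoid infinite retries is to carry an attempt counter with each message and stop re-queuing once a budget is spent. This is a sketch under stated assumptions, not the fix OpsGenie applied: `MAX_ATTEMPTS`, the message shape, and the `work`, `requeue` and `dead_letter` callables are all hypothetical.

```python
MAX_ATTEMPTS = 3  # assumed retry budget


def process_with_retry_budget(message, work, requeue, dead_letter):
    """Run `work` once; on failure either re-queue or park the message.

    `message` is a dict with a "payload" and an optional "attempts" counter.
    `requeue` and `dead_letter` are callables that accept the updated message.
    """
    attempts = message.get("attempts", 0) + 1
    try:
        work(message["payload"])
        return "done"
    except Exception as exc:
        if attempts >= MAX_ATTEMPTS:
            # Budget exhausted: stop the loop and keep the failure for inspection.
            dead_letter({**message, "attempts": attempts, "error": str(exc)})
            return "dead-lettered"
        requeue({**message, "attempts": attempts})
        return "requeued"
```

Because the counter travels with the message, the bound holds even when every attempt runs in a fresh execution environment.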

  41. @srhtcn
    Challenges
    ● Cold start
    ● Local development
    ● Concurrent execution limit
    ● No well-known good practices

  42. @srhtcn
    Concurrent Executions
    Lambda concurrent execution count for non-stream-based events:
    events (or requests) per second * function duration (in seconds)
    Hard to deal with peaks in request numbers
    Takes time to increase the limit
    Functions affect each other’s scalability
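
The formula above can be turned into a quick capacity check. The traffic figures, function names, and the account limit of 1000 are illustrative assumptions; the point is that all functions draw from one shared pool, so a peak in one can starve the others.

```python
import math


def required_concurrency(events_per_second, avg_duration_ms):
    """Concurrent executions = events per second * function duration (seconds)."""
    return math.ceil(events_per_second * avg_duration_ms / 1000)


def total_concurrency(functions):
    """Sum the demand of every function that shares the same account limit."""
    return sum(required_concurrency(rps, ms) for rps, ms in functions.values())


ACCOUNT_LIMIT = 1000  # assumed account-wide concurrent execution limit

# Hypothetical workloads: (events per second, average duration in ms).
normal = {"api": (200, 300), "indexer": (50, 2000)}
peak = {"api": (4000, 300), "indexer": (50, 2000)}

print(total_concurrency(normal), total_concurrency(normal) <= ACCOUNT_LIMIT)
print(total_concurrency(peak), total_concurrency(peak) <= ACCOUNT_LIMIT)
```

In the peak case the `api` function alone needs more than the whole limit, so `indexer` would be throttled as well even though its own load did not change.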

  43. @srhtcn
    Cold start
    Cold start duration depends on:
    - memory size
    - code size
    - VPC
    - the language
    https://read.acloud.guru/does-coding-language-memory-or-package-size-affect-cold-starts-of-aws-lambda-a15e26d12c76

  44. @srhtcn
    Solving the cold start problem
    Wait for AWS to improve it
    Increase memory (and pay more)
    Lightweight application framework instead of Spring
    Do some smart warm-up
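
A "smart warm-up" usually means pinging the function on a schedule so an initialized environment stays available, and returning early from those pings. The sketch below assumes a hypothetical `warmup` marker in the event and a placeholder `do_work` function; neither comes from the talk.

```python
_cold = True  # module scope runs once per execution environment


def do_work(event):
    """Placeholder for the real business logic."""
    return f"hello {event.get('name', 'world')}"


def handler(event, context):
    global _cold
    was_cold = _cold
    _cold = False

    # Scheduled pings carry the (hypothetical) "warmup" marker and must
    # return immediately so they cost as little as possible.
    if event.get("warmup"):
        return {"warmed": True, "cold_start": was_cold}

    return {"result": do_work(event), "cold_start": was_cold}
```

A single ping keeps only one environment warm; serving concurrent requests from warm environments would need several pings in flight at the same time.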

  45. @srhtcn
    Thank you!
    Serhat Can
    twitter.com/srhtcn
    linkedin.com/in/serhatcan
    medium.com/@serhatcan
