Serhat Can
August 07, 2018

Serverless Architectures on AWS Lambda: The OpsGenie Experience

OpsGenie has been running serverless architectures on AWS Lambda for the last 3 years. In this presentation, we’ll explain the reasoning behind adopting AWS Lambda and show some real-life examples. Of course, nothing just works, so we’ll also cover some of the challenges and explain how we overcame them.

Transcript

  1. @srhtcn Who am I?

    • Ex-Software Engineer, Technical Evangelist at OpsGenie
    • Speak, code, write on DevOps, On-Call, Incident Response, Serverless
    • Co-organizer
      ◦ Serverless Turkey Meetup
      ◦ DevOpsDays İstanbul
      ◦ DevOps Turkey Meetup
  2. @srhtcn Modern incident management platform for operating always-on services

    • Plan and prepare for incidents
    • Ensure issues are never missed, and the right people are notified
    • Gain insights to improve your operational efficiency
    • 200+ integrations
  3. @srhtcn You want to run code in the cloud. Your options:

    Bare metal, IaaS (VM), CaaS (container), PaaS (app), Serverless (function)
    The options range from more control, more code to less control, less code.
  4. @srhtcn "Making thoughtful decisions about tools and architecture can help; well-considered constraints can free us from the decisions that aren't bringing us distinguishable benefit."

    Bridget Kromhout, https://queue.acm.org/detail.cfm?id=3185224
  5. @srhtcn Defining Serverless

    "Serverless is an event driven, utility based, stateless, code execution environment." Simon Wardley (@swardley)
  9. @srhtcn Defining Serverless

    • Event driven: Code is initiated and run after an event, such as an HTTP request or the storage of a file, triggers it.
    • Utility based: No payment for idle time or hosting. You pay for the resources you use when your code is triggered.
    • Stateless: The code execution environment is torn down after some time. No information is guaranteed to stay in the environment after function execution is completed.
    • Code execution: Just code, not servers, VMs, containers, etc.
  10. @srhtcn Less is more

    Less code to maintain, less ops, less toil (work tied to running a production service that tends to be manual and repetitive):
    - Scaling
    - Provisioning
    - OS or language updates
    - Resource utilization
    - Network monitoring
    - Fault tolerance
    - Shipping logs
    https://landing.google.com/sre/book/chapters/eliminating-toil.html
  11. @srhtcn Economics

    - No payment for idle time or hosting
    - Easy to get started
    - Faster time to market
  12. @srhtcn Pricing

    - You choose the memory size
    - The share of a CPU core and the network capacity increase proportionally with memory
    - More memory doesn’t always mean you pay more
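A rough worked example of why more memory doesn't always cost more. The prices and the 100 ms billing step below are assumptions based on AWS's public list prices around the time of the talk, not figures from the slides. The memory sizes and durations are hypothetical, and the free tier is ignored.

```python
import math

# Assumed list prices (not from the slides).
PRICE_PER_GB_SECOND = 0.00001667
PRICE_PER_MILLION_REQUESTS = 0.20
BILLING_STEP_MS = 100  # assumed: duration rounded up to the next 100 ms


def invocation_cost(memory_mb, duration_ms, invocations):
    """Estimated cost in USD for `invocations` runs of one function."""
    # Round the duration up to the billing step.
    billed_ms = math.ceil(duration_ms / BILLING_STEP_MS) * BILLING_STEP_MS
    # Compute is billed in GB-seconds: memory size times billed time.
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000) * invocations
    request_cost = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return gb_seconds * PRICE_PER_GB_SECOND + request_cost


# Hypothetical function: doubling the memory also gives it more CPU,
# so in this made-up case it finishes in less than half the time.
small = invocation_cost(memory_mb=512, duration_ms=400, invocations=1_000_000)
large = invocation_cost(memory_mb=1024, duration_ms=180, invocations=1_000_000)
print(f"512 MB: ${small:.3f}  1024 MB: ${large:.3f}")
```

Under these assumptions both configurations are billed for the same number of GB-seconds, so the larger one returns faster at no extra cost. Whether a real function speeds up that much with more memory has to be measured.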
  13. @srhtcn Supported event sources

    20 different services can trigger AWS Lambda functions, including:
    - Event sources that aren't stream-based
      - Synchronous invocation: AWS SDK, Cognito, Alexa, API Gateway
      - Asynchronous invocation: S3, SNS, CloudWatch Logs, CloudWatch Events
    - Poll-based (pull model) event sources that are stream-based: Kinesis, DynamoDB Streams
    - Poll-based event sources that are not stream-based: SQS
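The grouping on the event sources slide can be sketched as a small lookup that inspects an incoming event. This is a minimal sketch: `classify_event` and `invocation_model` are hypothetical helper names, the labels simply restate the slide, and only a handful of sources are covered. It relies on the documented payload shapes, where record-based sources carry an `eventSource` field (`EventSource` for SNS) and API Gateway proxy events carry `httpMethod`.

```python
# Invocation model per source, as grouped on the slide.
INVOCATION_MODEL = {
    "apigateway": "synchronous",
    "aws:s3": "asynchronous",
    "aws:sns": "asynchronous",
    "aws:kinesis": "poll-based, stream-based",
    "aws:dynamodb": "poll-based, stream-based",
    "aws:sqs": "poll-based, not stream-based",
}


def classify_event(event):
    """Return a short label for the service that produced `event`."""
    records = event.get("Records")
    if records:
        first = records[0]
        # S3, SQS, Kinesis and DynamoDB Streams use "eventSource";
        # SNS capitalizes it as "EventSource".
        return first.get("eventSource") or first.get("EventSource") or "unknown"
    if "httpMethod" in event:
        # API Gateway proxy integration.
        return "apigateway"
    return "unknown"


def invocation_model(event):
    """Map an event to the invocation model named on the slide."""
    return INVOCATION_MODEL.get(classify_event(event), "unknown")
```

For example, `invocation_model({"Records": [{"eventSource": "aws:sqs"}]})` yields the poll-based, not stream-based label. A function that serves several sources could branch on this to decide how to report failures, since retry behavior differs between the models.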
  14. @srhtcn Toolkit around AWS Lambda

    - Orchestration: Step Functions
    - Deployment: SAM, Serverless.js, CloudFormation, Apex, Terraform
    - Monitoring: CloudWatch, X-Ray, Thundra
    - Marketplace: AWS Serverless Application Repository
  15. @srhtcn AWS Lambda at OpsGenie

    - AWS Lambda with Java 8
    - DynamoDB
    - SQS
    - SNS
    - VPC
    - Serverless.js
  16. @srhtcn Why did we consider AWS Lambda?

    - Fast scaling under immediate high load
    - Under-utilized machines
    - Operational complexity
    - Learning curve - Kubernetes?
    - AWS Fargate - YES!
  20. @srhtcn OpsGenie’s Serverless journey

    - 2015: Writing small scale custom integrations. At this point, we started leveraging AWS Lambda to help our customers run custom code.
    - 2016: First production usage. Started using AWS Lambda for async, non-business-critical jobs such as DynamoDB autoscaling.
    - 2017: Service and Incident Management. A new customer-facing feature running on AWS Lambda, integrated with the rest of the code base.
    - 2018: A spinoff, Thundra. Observability for AWS Lambda.
  21. @srhtcn Fixing “it is slow” is harder in AWS Lambda

    - Too many moving pieces
    - No way to attach an agent
    - Even how to send the monitoring data is a discussion point
  22. @srhtcn What we needed was

    - Determine the latency at different levels
    - Automatic instrumentation
    - GC, thread counts & durations, CPU usage details
    - Get the stack trace in case of an error and drill down
    - See logs, traces, and metrics in one view
    thundra.io