The Rocky Path To Migrating Production Applications To Serverless Architecture

Challenges of running Serverless applications in production, experiences at OpsGenie

Serhat Can

June 28, 2018

Transcript

  1. The Rocky Path To Migrating Production Applications To Serverless Architecture

    Serhat Can (@srhtcn), Technical Evangelist @OpsGenie
  2. Disclaimer: We still love AWS Lambda

    • Serverless Turkey meetup
    • Advocate for Serverless tech
    • Use AWS Lambda in production
    • A brand-new spinoff: Thundra
  3. Modern incident management platform for operating always-on services

    • Plan and prepare for incidents
    • Ensure issues are never missed, and the right people are notified
    • Gain insights to improve your operational efficiency
  7. OpsGenie’s Serverless journey

    • 2015, writing small-scale custom integrations: at this point, we started leveraging AWS Lambda to help our customers run custom code.
    • 2016, first production usage: started using AWS Lambda for asynchronous, non-business-critical jobs such as DynamoDB autoscaling.
    • 2017, Service and Incident Management: a new customer-facing feature running on AWS Lambda, integrated with the rest of the code base.
    • 2018, a spinoff: Thundra, observability for AWS Lambda.
  8. Cold start, the effect

    • Caring about an operational concern which has nothing to do with you
    • Frustrated users because of slow responses
    • Paying more money
    • Timeouts in the calling function
  9. Cold start, the solutions

    • Wait for AWS to improve it
    • Increase memory (and pay more)
    • Use a lightweight application framework instead of Spring
    • Do some smart warm-up: https://medium.com/thundra/dealing-with-cold-starts-in-aws-lambda-a5e3aa8f532
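
The "smart warm-up" bullet could be sketched roughly as below: a scheduled ping keeps a container alive, and the handler returns early when it sees one. The `warmup` event key and the `do_work` helper are hypothetical names chosen for illustration; they are not part of AWS Lambda or of the approach described in the linked article.

```python
# Module-level state survives between invocations of the same warm container.
_invocations = 0


def do_work(event):
    # Placeholder for the real business logic (hypothetical).
    return {"echo": event}


def handler(event, context=None):
    """Return early for warm-up pings; run the real work otherwise."""
    global _invocations
    _invocations += 1
    # The first call in a container is the one that paid the cold start.
    cold_start = _invocations == 1

    # Assumed convention: a scheduled rule sends {"warmup": true}.
    if isinstance(event, dict) and event.get("warmup"):
        return {"warmed": True, "cold_start": cold_start}

    return {"warmed": False, "cold_start": cold_start, "result": do_work(event)}
```

Returning before any business logic runs keeps the warm-up invocation short, so the ping itself adds little to the bill.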
  11. Account level concurrent execution limit

    Lambda concurrent execution count for non-stream-based events: events (or requests) per second * function duration

    • Hard to deal with peaks in request numbers
    • Takes time to increase the limit
    • Functions affect each other’s scalability
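
The formula on this slide can be worked through with a small calculation. This is only a sketch: the function names, the traffic figures, and the account limit of 1000 are illustrative assumptions, not numbers from the talk.

```python
def concurrent_executions(requests_per_second, avg_duration_seconds):
    """Slide formula: events (or requests) per second * function duration."""
    return requests_per_second * avg_duration_seconds


def remaining_headroom(account_limit, functions):
    """Concurrency left in the shared account pool.

    `functions` is a list of (requests_per_second, avg_duration_seconds)
    pairs, one per function sharing the account-level limit.
    """
    used = sum(concurrent_executions(rps, dur) for rps, dur in functions)
    return account_limit - used


# Assumed workload: an API at 100 req/s taking 0.5 s, a batch job at 400 req/s taking 2 s.
workload = [(100, 0.5), (400, 2.0)]
print(remaining_headroom(1000, workload))
```

Because every function draws from the same pool, the slower, busier function in this example consumes most of the limit, which is how functions end up affecting each other's scalability.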
  12. Latency in a third party can bring your whole system down

    https://read.acloud.guru/does-aws-lambda-keep-its-serverless-marketing-promise-of-continuous-scaling-e990114bb379
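
One way to see why third-party latency matters: at a fixed request rate, concurrency grows linearly with function duration, so a slow dependency can push a function past the concurrency limit. The sketch below uses assumed figures (200 req/s, a limit of 1000) and hypothetical helper names.

```python
def is_throttled(requests_per_second, duration_seconds, concurrency_limit):
    """True when the required concurrency exceeds the limit."""
    return requests_per_second * duration_seconds > concurrency_limit


def max_duration_before_throttling(requests_per_second, concurrency_limit):
    """Longest average duration the limit can absorb at this request rate."""
    return concurrency_limit / requests_per_second


# Assumed traffic: 200 req/s against a limit of 1000 concurrent executions.
print(is_throttled(200, 0.2, 1000))  # dependency healthy, calls take 0.2 s
print(is_throttled(200, 6.0, 1000))  # dependency degraded, calls take 6 s
print(max_duration_before_throttling(200, 1000))
```

Under these assumptions the function tolerates an average duration of up to 5 seconds; anything slower starts throttling, and with an account-level limit that throttling spills over to unrelated functions.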
  13. Function level concurrent execution limit

    • Limit the scalability of non-critical functions
    • Reserved capacity is subtracted from the global limit
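
The bookkeeping behind "reserved capacity is subtracted from the global limit" might look like the sketch below. The function names and numbers are made up for illustration, and the helper only models the arithmetic; it does not call any AWS API.

```python
def unreserved_concurrency(account_limit, reserved_by_function):
    """Concurrency left for functions without a reservation.

    `reserved_by_function` maps a function name to its reserved concurrency.
    """
    total_reserved = sum(reserved_by_function.values())
    if total_reserved > account_limit:
        raise ValueError("reservations exceed the account-level limit")
    return account_limit - total_reserved


# Assumed setup: cap two non-critical functions so they cannot starve the rest.
reservations = {"report-generator": 50, "thumbnailer": 100}
print(unreserved_concurrency(1000, reservations))
```

Capping the non-critical functions bounds how much of the pool they can take, at the cost of shrinking the shared pool for everything else by the same amount.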
  14. Don’t put your functions in a VPC unless you have to

    • You need sufficient IP addresses in your subnet and enough ENI capacity to scale: https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
    • Determine the ENI capacity you need: Concurrent executions * (Memory in GB / 3 GB)
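
The ENI formula quoted on this slide can be checked against a subnet's size with a short calculation. This is a sketch under assumptions: the helper names, the concurrency and memory figures, and the example CIDR block are illustrative, and the formula is taken as stated on the slide. The five addresses subtracted per subnet are the ones AWS reserves in every VPC subnet.

```python
import ipaddress
import math


def required_enis(peak_concurrent_executions, memory_mb):
    """Slide formula: concurrent executions * (memory in GB / 3 GB), rounded up."""
    memory_gb = memory_mb / 1024
    return math.ceil(peak_concurrent_executions * memory_gb / 3)


def usable_ips(cidr):
    """Addresses in the subnet minus the 5 that AWS reserves per subnet."""
    return ipaddress.ip_network(cidr).num_addresses - 5


# Assumed function: 500 concurrent executions at 1536 MB, in a /24 subnet.
needed = required_enis(500, 1536)
available = usable_ips("10.0.0.0/24")
print(needed, available, needed <= available)
```

With these numbers the function barely fits; anything else using addresses in the same subnet would leave it unable to scale to its peak.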
  15. Use your 6th sense to debug a scaling issue

    https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
  16. Fixing “it is slow” is harder in AWS Lambda

    • Too many moving pieces
    • No way to attach an agent
    • Even how to send the monitoring data is a discussion point
  17. What we needed was

    • Determine the latency at different levels
    • Automatic instrumentation
    • GC, thread counts & durations, CPU usage details
    • Get the stack trace in case of an error and drill down
    • See logs, traces, and metrics in one view

    thundra.io
  18. Lessons learned: A $40,000 incident

    • Avoid infinite retries
    • Monitor and alert on cost (there is no pricing metric for AWS Lambda)
    • Keep CloudWatch cost in mind and sample logs & metrics
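
Two of these lessons, bounded retries and sampled logging, can be sketched in a few lines. The helper names, the default of three attempts, and the sample rate are assumptions made for illustration; the talk does not describe how OpsGenie implemented either.

```python
import random


def call_with_bounded_retries(operation, max_attempts=3):
    """Call `operation` up to `max_attempts` times, then give up loudly."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return operation()
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


def should_log(sample_rate, rng=random.random):
    """Decide whether to emit a log line, keeping roughly `sample_rate` of them."""
    return rng() < sample_rate


def flaky():
    raise ConnectionError("downstream unavailable")


try:
    call_with_bounded_retries(flaky)
except RuntimeError as error:
    if should_log(0.1):  # keep about 10% of failure logs
        print(error)
```

A hard cap on attempts turns a runaway retry loop into a single visible failure, and sampling keeps log volume roughly proportional to the sample rate rather than to traffic.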
  19. “Tools can and do help, but they can't make us care.”

    Bridget Kromhout, Containers Will Not Fix Your Broken Culture (and Other Hard Truths): https://queue.acm.org/detail.cfm?id=3185224