
The Rocky Path To Migrating Production Applications To Serverless Architecture

Challenges of running Serverless applications in production, experiences at OpsGenie

Serhat Can

June 28, 2018

Transcript

  1. @srhtcn The Rocky Path To Migrating Production Applications To Serverless Architecture

    Serhat Can, @srhtcn, Technical Evangelist @OpsGenie
  2. @srhtcn Disclaimer: We still love AWS Lambda

    • Serverless Turkey meetup
    • Advocate for Serverless tech
    • Use AWS Lambda in production
    • A brand-new spinoff: Thundra
  3. @srhtcn Modern incident management platform for operating always-on services

    • Plan and prepare for incidents
    • Ensure issues are never missed, and the right people are notified
    • Gain insights to improve your operational efficiency
  4. @srhtcn Engineering grew from 3 to 60 people

  5. @srhtcn https://twitter.com/kelseyhightower/status/998977286895423489

  6. @srhtcn https://twitter.com/kelseyhightower/status/998977286895423489

  7. @srhtcn The promise of AWS Lambda

  8. @srhtcn OpsGenie’s Serverless journey

    • 2015, Writing small-scale custom integrations: At this point, we started leveraging AWS Lambda to help our customers run custom code.
  9. @srhtcn OpsGenie’s Serverless journey

    • 2015, Writing small-scale custom integrations: At this point, we started leveraging AWS Lambda to help our customers run custom code.
    • 2016, First production usage: Started using AWS Lambda for async, non-business-critical jobs such as DynamoDB autoscaling.
  10. @srhtcn OpsGenie’s Serverless journey

    • 2015, Writing small-scale custom integrations: At this point, we started leveraging AWS Lambda to help our customers run custom code.
    • 2016, First production usage: Started using AWS Lambda for async, non-business-critical jobs such as DynamoDB autoscaling.
    • 2017, Service and Incident Management: A new customer-facing feature running on AWS Lambda, integrated with the rest of the code base.
  11. @srhtcn OpsGenie’s Serverless journey

    • 2015, Writing small-scale custom integrations: At this point, we started leveraging AWS Lambda to help our customers run custom code.
    • 2016, First production usage: Started using AWS Lambda for async, non-business-critical jobs such as DynamoDB autoscaling.
    • 2017, Service and Incident Management: A new customer-facing feature running on AWS Lambda, integrated with the rest of the code base.
    • 2018, A Spinoff, Thundra: Observability for AWS Lambda.
  12. @srhtcn OpsGenie stack Java 8

  13. @srhtcn OpsGenie stack

    • Java 8
    • AWS services including “Serverless” DynamoDB, SQS
  14. @srhtcn

  15. @srhtcn Cold start (Photo by Chris Marquardt on Unsplash)

  16. @srhtcn Cold start, why https://engineering.opsgenie.com/what-is-different-in-the-serverless-world-b9e0f68de191

  17. @srhtcn Cold start, when

    • Memory size
    • Code size
    • VPC
    • Classpath scan
    • The language choice
  18. @srhtcn Cold start https://read.acloud.guru/does-coding-language-memory-or-package-size-affect-cold-starts-of-aws-lambda-a15e26d12c76

  19. @srhtcn Cold start, the effect

    • Caring about an operational concern which has nothing to do with you
    • Frustrated users because of slow response
    • Paying more money
    • Timeouts in the calling function
  20. @srhtcn Cold start, the solutions

    • Wait for AWS to improve it
    • Increase memory (and pay more)
    • Lightweight application framework instead of Spring
    • Do some smart warm-up: https://medium.com/thundra/dealing-with-cold-starts-in-aws-lambda-a5e3aa8f532
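The "smart warm-up" idea can be sketched roughly as below. This is a minimal Python illustration, not the approach described in the linked article: it assumes a scheduled trigger invokes the function with a payload carrying a hypothetical `warmup` flag, and the handler returns early for those invocations so the container stays warm without running business logic. The names `handler`, `process`, and the `warmup` key are assumptions made for this example.

```python
import time

# Module-level code runs once per container, i.e. during a cold start.
# Expensive initialisation (clients, config) would live here.
CONTAINER_STARTED_AT = time.time()


def process(event):
    """Hypothetical business logic; stands in for the real work."""
    return {"echo": event}


def handler(event, context):
    """Lambda-style entry point that short-circuits warm-up invocations."""
    # Assumption: the scheduled warm-up trigger sends {"warmup": True}.
    if isinstance(event, dict) and event.get("warmup"):
        # Return immediately: the goal is only to keep the container alive.
        return {"warmed": True}
    return {"warmed": False, "result": process(event)}
```

A warm-up invocation such as `handler({"warmup": True}, None)` skips `process` entirely, while any other payload takes the normal path.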
  21. @srhtcn Scaling (Photo by Vladimir Riabinin on Unsplash)

  22. @srhtcn Functions scale nicely

  23. @srhtcn Functions scale nicely until they don’t

  24. @srhtcn Account level concurrent execution limit

    Lambda concurrent execution count for non-stream-based events: events (or requests) per second * function duration
  25. @srhtcn Account level concurrent execution limit

    Lambda concurrent execution count for non-stream-based events: events (or requests) per second * function duration

    • Hard to deal with peaks in request numbers
    • Takes time to increase the limit
    • Functions affect each other’s scalability
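The formula on this slide can be worked through with a small Python sketch. The function name, the request rate, the durations, and the account limit of 1000 are illustrative assumptions, not figures from the talk; rounding up to a whole execution is also a choice made for this example.

```python
import math


def estimated_concurrency(requests_per_second, avg_duration_seconds):
    """Concurrent executions for non-stream-based events.

    Implements the slide's formula: requests per second * function duration,
    rounded up to a whole number of executions (rounding is an assumption).
    """
    return math.ceil(requests_per_second * avg_duration_seconds)


# Illustrative numbers only.
ACCOUNT_LIMIT = 1000                           # assumed account-level limit
normal = estimated_concurrency(200, 0.5)       # 200 req/s at 0.5 s each
degraded = estimated_concurrency(200, 5.0)     # same traffic, 5 s each

print(normal, degraded, degraded >= ACCOUNT_LIMIT)
```

With the request rate unchanged, a tenfold increase in duration multiplies the required concurrency by ten, which is how a slowdown alone can exhaust a shared limit.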
  26. @srhtcn Latency in a third party can bring your whole system down

    https://read.acloud.guru/does-aws-lambda-keep-its-serverless-marketing-promise-of-continuous-scaling-e990114bb379
  27. @srhtcn Function level concurrent execution limit

    • Limit the scalability of non-critical functions
    • Reserved capacity is subtracted from the global limit
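The subtraction described on this slide can be made concrete with a short Python sketch. The function names and all numbers below are hypothetical; the code only does the arithmetic and does not call any AWS API.

```python
from typing import Dict


def unreserved_concurrency(account_limit: int, reserved: Dict[str, int]) -> int:
    """Concurrency left for functions without a function-level limit.

    Every function-level reservation is subtracted from the account-level
    (global) limit, as the slide states.
    """
    total_reserved = sum(reserved.values())
    if total_reserved > account_limit:
        raise ValueError("reservations exceed the account-level limit")
    return account_limit - total_reserved


# Hypothetical functions and limits, for illustration only.
reservations = {"report-generator": 50, "thumbnailer": 150}
print(unreserved_concurrency(1000, reservations))
```

Capping the non-critical functions keeps them from starving critical ones, at the price of a smaller shared pool for everything else.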
  28. @srhtcn Don’t put your functions in a VPC unless you have to

    • You need sufficient IP addresses in your subnet and ENI to scale: https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
    • Determine the ENI capacity you need: Concurrent executions * (Memory in GB / 3 GB)
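As a worked example of the ENI capacity formula on this slide, here is a small Python sketch. The function name and the sample values are assumptions; rounding up to a whole ENI is a conservative choice made for this example rather than something the slide specifies.

```python
import math


def eni_capacity(peak_concurrent_executions: int, memory_mb: int) -> int:
    """ENI capacity per the slide: concurrent executions * (memory in GB / 3 GB)."""
    memory_gb = memory_mb / 1024
    # Rounded up so that a fractional result still counts as one ENI (assumption).
    return math.ceil(peak_concurrent_executions * memory_gb / 3)


# Illustrative values: 1000 concurrent executions of a 1536 MB function.
print(eni_capacity(1000, 1536))
```

The result is also a lower bound on the free IP addresses the subnets need, since each ENI takes one address.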
  29. @srhtcn Use your 6th sense to debug a scaling issue

    https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
  30. @srhtcn Observability (Photo by Anna on Unsplash)

  31. @srhtcn Fixing “it is slow” is harder in AWS Lambda

  32. @srhtcn Fixing “it is slow” is harder in AWS Lambda

    • Too many moving pieces
    • No way to attach an agent
    • Even how to send the monitoring data is a discussion point
  33. @srhtcn What we needed was

    • Determine the latency in different levels
    • Automatic instrumentation
    • GC, Thread counts & durations, CPU usage details
    • Get the stack trace in case of an error and drill down
    • See logs, traces, and metrics in one view

    thundra.io
  34. @srhtcn Event driven (Photo by Ian Froome on Unsplash)

  35. @srhtcn You got an unexpected bill from AWS?

  36. @srhtcn An incident of $40,000

  37. @srhtcn Lessons learned: An incident of $40,000

    • Avoid infinite retries
    • Monitor and alert for pricing (no pricing metric for AWS Lambda)
    • Think of CloudWatch cost and sample logs & metrics
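One way to read "avoid infinite retries" is to cap the number of attempts and park the failing payload somewhere for later inspection. The Python sketch below illustrates that idea only; it is not what OpsGenie did. The function name, the `max_attempts` default, and the plain list standing in for a dead-letter queue are all assumptions.

```python
from typing import Any, Callable, List, Optional


def run_with_bounded_retries(
    task: Callable[[Any], Any],
    payload: Any,
    max_attempts: int = 3,
    dead_letters: Optional[List[Any]] = None,
) -> Any:
    """Call task(payload) at most max_attempts times, then give up."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return task(payload)
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
    # Out of attempts: park the payload instead of retrying forever.
    if dead_letters is not None:
        dead_letters.append(payload)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The cap bounds the cost of a poisoned event: it is paid for `max_attempts` times and then stops generating invocations.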
  38. @srhtcn Functions will be triggered more than once

    Design idempotent functions considering the trigger type
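A minimal Python sketch of an idempotent handler follows, assuming each event carries a unique `id` field. The `InMemoryDedupStore` class is a stand-in invented for this example: a real function would need a durable store with an atomic "claim" operation, because an in-memory set does not survive across containers.

```python
class InMemoryDedupStore:
    """Hypothetical stand-in for a durable store of already-seen event ids."""

    def __init__(self):
        self._seen = set()

    def claim(self, event_id):
        """Return True the first time an id is claimed, False afterwards."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True


def make_handler(store, side_effect):
    """Build a Lambda-style handler that applies side_effect once per event id."""

    def handler(event, context):
        # Assumption: the trigger delivers a stable, unique id with each event.
        if not store.claim(event["id"]):
            return "duplicate"
        side_effect(event)
        return "processed"

    return handler


sent = []
handler = make_handler(InMemoryDedupStore(), sent.append)
print(handler({"id": "a1"}, None), handler({"id": "a1"}, None), len(sent))
```

Which field can serve as the id depends on the trigger type, which is why the slide ties the design to it.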
  39. @srhtcn https://www.stackery.io/blog/self-healing-serverless-applications-part-1-of-3/

  40. @srhtcn “Tools can and do help, but they can't make us care.”

    Bridget Kromhout, Containers Will Not Fix Your Broken Culture (and Other Hard Truths): https://queue.acm.org/detail.cfm?id=3185224
  41. @srhtcn Thank you! Serhat Can @srhtcn