The Rocky Path To Migrating Production Applications To Serverless Architecture

@srhtcn The Rocky Path To Migrating Production Applications To Serverless Architecture
Serhat Can (@srhtcn), Technical Evangelist @OpsGenie

@srhtcn Disclaimer: We still love AWS Lambda
• Serverless Turkey meetup
• Advocate for Serverless tech
• Use AWS Lambda in production
• A brand-new spinoff: Thundra

@srhtcn Modern incident management platform for operating always-on services
• Plan and prepare for incidents
• Ensure issues are never missed, and the right people are notified
• Gain insights to improve your operational efficiency

@srhtcn Engineering grew from 3 to 60 people

@srhtcn https://twitter.com/kelseyhightower/status/998977286895423489

@srhtcn The promise of AWS Lambda

@srhtcn OpsGenie’s Serverless journey
• 2015: Writing small scale custom integrations. At this point, we started leveraging AWS Lambda to help our customers run custom code.
• 2016: First production usage. Started using AWS Lambda for async, not business critical jobs such as DynamoDB autoscale.
• 2017: Service and Incident Management. A new customer facing feature running on AWS Lambda, integrated with the rest of the code base.
• 2018: A spinoff: Thundra. Observability for AWS Lambda.

@srhtcn OpsGenie stack
• Java 8
• AWS services including “Serverless” DynamoDB, SQS

@srhtcn Cold start Photo by Chris Marquardt on Unsplash

@srhtcn Cold start, why https://engineering.opsgenie.com/what-is-different-in-the-serverless-world-b9e0f68de191

@srhtcn Cold start, when
• Memory size
• Code size
• VPC
• Classpath scan
• The language choice

@srhtcn Cold start https://read.acloud.guru/does-coding-language-memory-or-package-size-affect-cold-starts-of-aws-lambda-a15e26d12c76

@srhtcn Cold start, the effect
• Caring about an operational concern which has nothing to do with you
• Frustrated users because of slow response
• Paying more money
• Timeouts in the calling function

@srhtcn Cold start, the solutions
• Wait for AWS to improve it
• Increase memory (and pay more)
• Lightweight application framework instead of Spring
• Do some smart warm-up
https://medium.com/thundra/dealing-with-cold-starts-in-aws-lambda-a5e3aa8f532
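
The "smart warm-up" idea can be sketched roughly as follows, assuming a scheduled event invokes the function with a marker payload so that a container stays alive. The `warmup` field name and the `process` helper are hypothetical, chosen for illustration only; the talk does not specify them.

```python
# Counts invocations seen by this container; module state survives
# between invocations as long as the container is reused.
_invocations = 0


def is_warmup(event):
    """True when the event carries the (hypothetical) warm-up marker."""
    return isinstance(event, dict) and event.get("warmup") is True


def process(event):
    # Placeholder for the real business logic (assumption).
    return event.get("payload", "").upper()


def handler(event, context=None):
    global _invocations
    _invocations += 1
    if is_warmup(event):
        # Return early: the only goal is to keep the container warm,
        # so no business logic and no downstream calls run here.
        return {"warmed": True, "cold": _invocations == 1}
    return {"warmed": False, "result": process(event)}
```

Returning early keeps warm-up invocations short, which matters because they are billed like any other invocation.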

@srhtcn Scaling Photo by Vladimir Riabinin on Unsplash

@srhtcn Functions scale nicely until they don’t

@srhtcn Account level concurrent execution limit
Lambda concurrent execution count for non stream based events: events (or requests) per second * function duration
• Hard to deal with peaks in request numbers
• Takes time to increase the limit
• Functions affect each other’s scalability
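
As a rough worked example of the formula on this slide: the request rate, the durations, and the account limit of 1000 below are illustrative assumptions, not figures from the talk.

```python
import math


def required_concurrency(requests_per_second, avg_duration_seconds):
    """Concurrent executions = events per second * function duration."""
    return math.ceil(requests_per_second * avg_duration_seconds)


ACCOUNT_LIMIT = 1000  # assumed account level limit for this example

# Same traffic, two different function durations.
normal = required_concurrency(200, 0.5)  # fast downstream calls
slow = required_concurrency(200, 6.0)    # a dependency became slow

print(normal, normal <= ACCOUNT_LIMIT)
print(slow, slow <= ACCOUNT_LIMIT)
```

The request rate never changed in this sketch; only the duration did. That is why a slow dependency can push an account over its limit and throttle unrelated functions.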

@srhtcn Latency in a third party can bring your whole system down
https://read.acloud.guru/does-aws-lambda-keep-its-serverless-marketing-promise-of-continuous-scaling-e990114bb379

@srhtcn Function level concurrent execution limit
• Limit the scalability of non-critical functions
• Reserved capacity is subtracted from the global limit
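
A small sketch of the bookkeeping behind "reserved capacity is subtracted from the global limit". The function names and numbers are made up for illustration.

```python
def unreserved_pool(account_limit, reserved):
    """Concurrency left for all functions without a reservation.

    `reserved` maps a function name to its function level limit.
    """
    total_reserved = sum(reserved.values())
    if total_reserved > account_limit:
        raise ValueError("reservations exceed the account level limit")
    return account_limit - total_reserved


# Hypothetical non-critical functions capped so they cannot starve the rest.
reservations = {"report-export": 50, "thumbnailer": 150}
print(unreserved_pool(1000, reservations))
```

Every reservation shrinks the shared pool, so capping too many functions can itself cause throttling for the unreserved ones.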

@srhtcn Don’t put your functions in a VPC unless you have to
• You need sufficient IP addresses in your subnet and enough ENI capacity to scale
• Determine the ENI capacity you need: Concurrent executions * (Memory in GB / 3 GB)
https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
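
The ENI formula from this slide, worked through with illustrative numbers. The concurrency, memory size, and free IP count are assumptions, and the formula is the one quoted on the slide, so it is worth checking against the current AWS documentation before relying on it.

```python
import math


def eni_capacity(peak_concurrent_executions, memory_mb):
    """ENI capacity = concurrent executions * (memory in GB / 3 GB)."""
    memory_gb = memory_mb / 1024
    return math.ceil(peak_concurrent_executions * memory_gb / 3)


def subnet_can_scale(available_ips, needed_enis):
    """Each ENI takes one IP address from the subnet."""
    return available_ips >= needed_enis


needed = eni_capacity(300, 1536)  # 300 concurrent executions at 1.5 GB
print(needed, subnet_can_scale(available_ips=120, needed_enis=needed))
```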

@srhtcn Use your 6th sense to debug a scaling issue
https://docs.aws.amazon.com/lambda/latest/dg/vpc.html

@srhtcn Observability Photo by Anna on Unsplash

@srhtcn Fixing “it is slow” is harder in AWS Lambda
• Too many moving pieces
• No way to attach an agent
• Even how to send the monitoring data is a discussion point

@srhtcn What we needed was
• Determine the latency at different levels
• Automatic instrumentation
• GC, thread counts & durations, CPU usage details
• Get the stack trace in case of an error and drill down
• See logs, traces, and metrics in one view
thundra.io
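
To make "latency at different levels" concrete, here is a minimal manual timing sketch. It is not how Thundra works, which the talk describes as automatic instrumentation. The `Timings` class and the span names are hypothetical.

```python
import time
from contextlib import contextmanager


class Timings:
    """Collects (name, seconds) pairs for named sections of a handler."""

    def __init__(self):
        self.spans = []

    @contextmanager
    def span(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the wrapped code raises.
            self.spans.append((name, time.perf_counter() - start))


timings = Timings()
with timings.span("handler"):
    with timings.span("db_query"):
        time.sleep(0.01)  # stand-in for a real downstream call

for name, seconds in timings.spans:
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Nested spans show where the time goes inside one invocation, which helps when no agent can be attached to the runtime.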

@srhtcn Event driven Photo by Ian Froome on Unsplash

@srhtcn You got an unexpected bill from AWS?

@srhtcn An incident of $40,000
Lessons learned:
• Avoid infinite retries
• Monitor and alert for pricing (no pricing metric for AWS Lambda)
• Think of CloudWatch cost and sample logs & metrics
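
One way to read "avoid infinite retries" is to cap the attempts and park the failed payload somewhere visible. The sketch below uses a plain list as a stand-in for a dead-letter queue. The helper name and the default of three attempts are assumptions.

```python
def run_with_retries(task, payload, max_attempts=3, dead_letters=None):
    """Call task(payload) at most max_attempts times, then give up."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task(payload)
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
    if dead_letters is not None:
        # Park the payload for inspection instead of retrying forever.
        dead_letters.append({"payload": payload, "error": str(last_error)})
    return None


calls = []


def always_fails(payload):
    calls.append(payload)
    raise RuntimeError("downstream unavailable")


parked = []
run_with_retries(always_fails, {"id": 1}, max_attempts=3, dead_letters=parked)
print(len(calls), len(parked))
```

A bounded retry turns a runaway loop into a fixed, predictable cost plus a record that can trigger an alert.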

@srhtcn Functions will be triggered more than once. Design idempotent functions considering the trigger type
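
A minimal idempotency sketch, assuming each event carries a unique `id` field. An in-memory dict stands in for a durable store here. In a real function that state would need to live outside the container, for example in a database with a conditional write, and both of those choices are assumptions rather than something the talk prescribes.

```python
class IdempotentProcessor:
    """Runs the side effect once per event id, even on duplicate delivery."""

    def __init__(self):
        self._results = {}     # event id -> stored result
        self.side_effects = 0  # how many times real work was done

    def handle(self, event):
        event_id = event["id"]
        if event_id in self._results:
            # Duplicate delivery: return the stored result, do no work.
            return self._results[event_id]
        self.side_effects += 1
        result = {"id": event_id, "status": "processed"}
        self._results[event_id] = result
        return result


processor = IdempotentProcessor()
first = processor.handle({"id": "evt-1"})
second = processor.handle({"id": "evt-1"})  # same event delivered again
print(first == second, processor.side_effects)
```

What counts as the event id depends on the trigger type, which is why the slide asks to design with the trigger in mind.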

@srhtcn https://www.stackery.io/blog/self-healing-serverless-applications-part-1-of-3/

@srhtcn “Tools can and do help, but they can't make us care.”
Bridget Kromhout, Containers Will Not Fix Your Broken Culture (and Other Hard Truths) https://queue.acm.org/detail.cfm?id=3185224

@srhtcn Thank you! Serhat Can @srhtcn
