Monitoring Serverless Architectures on AWS Lambda - Nordics Lego themed

Serhat Can
February 19, 2019

Serverless is different in many ways. Operating software built from small, focused microservices, in a world where we have no access to the servers, is changing traditional monitoring approaches. In this presentation, I’ll talk about the challenges and new ways to monitor Serverless architectures.

Transcript

  Slide 2

    Monitoring Serverless Architectures on AWS Lambda
    Serhat Can (@srhtcn)
    Technical Evangelist at Atlassian, AWS Community Hero
    LEGO brand and images belong to LEGO® (lego.com)
  Slide 11

    Opsgenie’s Serverless journey
    • 2015: Small-scale custom integrations
    • 2016: First production usage with async workloads, such as DynamoDB auto scaling
    • 2017: Customer-facing feature “Service and Incident Management”
    • 2018: Thundra: Observability for AWS Lambda
  Slide 16

    Agenda
    • Introduction to Serverless computing and AWS Lambda
    • What is available and what is not for monitoring
    • Monitoring challenges and our solutions
  Slide 19

    What is Serverless?
    From IaaS → CaaS → PaaS → FaaS

    “Serverless is an event driven, utility based, stateless, code execution environment.” (Simon Wardley, @swardley)
  Slide 21

    ServerLess is more
    Less code to maintain, less ops, less toil:
    • Scaling
    • Provisioning
    • OS or language updates
    • Resource utilization
    • Network monitoring
    • Fault tolerance
    • Shipping logs
    https://landing.google.com/sre/book/chapters/eliminating-toil.html
  Slide 22

    Economics
    • No payment for idle time or hosting
    • Easy to get started
    • Faster time to market
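
A rough cost model makes the “no payment for idle time” point concrete: Lambda bills per request plus per GB-second of compute, so a function that is never invoked costs nothing. This is a sketch for illustration only. The default prices are assumed us-east-1 list prices (check current pricing), the 100 ms rounding matches billing at the time of this talk (AWS later moved to 1 ms), the free tier is ignored, and the average duration is rounded instead of each invocation.

```python
import math


def monthly_lambda_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_million_requests: float = 0.20,   # assumed list price; verify before use
    price_per_gb_second: float = 0.0000166667,  # assumed list price; verify before use
    billing_granularity_ms: int = 100,          # 2019-era rounding; now 1 ms
) -> float:
    """Estimate a month's Lambda bill in USD, ignoring the free tier."""
    # Round the average duration up to the billing granularity. This is an
    # approximation: AWS rounds every invocation individually.
    billed_ms = math.ceil(avg_duration_ms / billing_granularity_ms) * billing_granularity_ms
    # Compute is charged in GB-seconds: billed seconds x allocated memory in GB.
    gb_seconds = invocations * (billed_ms / 1000) * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return request_cost + gb_seconds * price_per_gb_second


if __name__ == "__main__":
    # 1M invocations at ~120 ms on 128 MB: 200 ms billed each, 25,000 GB-seconds.
    print(f"${monthly_lambda_cost(1_000_000, 120, 128):.2f}")  # prints $0.62
    # An idle function costs nothing.
    print(f"${monthly_lambda_cost(0, 120, 128):.2f}")  # prints $0.00
```

The duration and memory parameters dominate the compute term, which is why right-sizing memory is one of the first levers for Lambda cost.
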
  Slide 24

    AWS Lambda
    • Can be triggered by 20+ different services
    • Native support for many languages, including Java, Node.js, .NET Core, Go, and Python
    • Bring any language with the Runtime API
    • Layers for sharing code
    • Also take a look at: SAM, AWS Serverless Application Repository, Step Functions, X-Ray
  Slide 29

    Be aware of some challenges
    • Cold starts
    • Local development
    • Concurrent execution limit
    • Well-known good practices
    • Debugging and monitoring
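
Cold starts can at least be measured. A common Python pattern, sketched below, relies on module-level code running once per execution environment: a module-level flag is `True` only for the first invocation a container serves. The handler and the log field names are illustrative; `aws_request_id` is a real attribute of the Lambda context object, read with `getattr` so the sketch also runs locally with `context=None`.

```python
import json
import time

# Module-level code runs once per execution environment, i.e. on a cold start.
_IS_COLD_START = True
_ENV_CREATED_AT = time.time()


def handler(event, context):
    global _IS_COLD_START
    cold_start = _IS_COLD_START
    _IS_COLD_START = False  # every later invocation in this container is warm

    # Emit a structured log line that a log-based pipeline can turn into a metric.
    print(json.dumps({
        "coldStart": cold_start,
        "envAgeSeconds": round(time.time() - _ENV_CREATED_AT, 3),
        "requestId": getattr(context, "aws_request_id", None),
    }))
    return {"coldStart": cold_start}
```

Counting `coldStart: true` lines per function over time gives a cold-start rate that can be compared before and after a warm-up strategy.
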
  Slide 33

    Out-of-the-box basic monitoring
    • CloudWatch metrics: invocations, errors (permission, timeout, out of memory, etc.), concurrent executions
    • CloudWatch Logs: fast, easy, simple. Now easier to search with the new interactive log analytics (CloudWatch Logs Insights)!
    • Distributed tracing with AWS X-Ray: easy-to-use, end-to-end tracing of requests in distributed applications
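
The out-of-the-box metrics can also be pulled programmatically. The sketch below computes an error rate from the `Invocations` and `Errors` metrics in the `AWS/Lambda` namespace (dimension `FunctionName`) using boto3's `get_metric_statistics`. It assumes boto3 and AWS credentials are available; boto3 is imported lazily so the request builder can be tried without them.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_metric_request(function_name: str, metric_name: str, hours: int = 1,
                         period_s: int = 300, now: Optional[datetime] = None) -> dict:
    """Keyword arguments for CloudWatch's GetMetricStatistics call."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_s,
        "Statistics": ["Sum"],
    }


def error_rate(function_name: str, hours: int = 1) -> float:
    """Errors / Invocations over the last `hours` (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_metric_request works without boto3

    cloudwatch = boto3.client("cloudwatch")
    totals = {}
    for metric in ("Invocations", "Errors"):
        response = cloudwatch.get_metric_statistics(
            **build_metric_request(function_name, metric, hours))
        totals[metric] = sum(point["Sum"] for point in response["Datapoints"])
    return totals["Errors"] / totals["Invocations"] if totals["Invocations"] else 0.0
```

Usage would look like `error_rate("my-function")`, where `my-function` is a placeholder name.
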
  Slide 34

    What data do we need in Serverless monitoring?
    • Logs: application logs
    • Metrics: CPU, memory, etc.
    • Traces
      ◦ Local: debugging inside the functions
      ◦ Distributed: tracing external service calls
    • Aggregated logs, metrics, and traces
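
Memory usage is not among the standard `AWS/Lambda` metrics listed on the previous slide. However, after every invocation Lambda writes a `REPORT` line to CloudWatch Logs with the duration, billed duration, memory size, and max memory used. The parser below is a minimal sketch that assumes this tab-separated `REPORT` format. Newer runtimes append extra fields such as `Init Duration`, and the regex simply ignores them.

```python
import re
from typing import Optional

REPORT_RE = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>\d+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<max_memory_mb>\d+) MB"
)


def parse_report(line: str) -> Optional[dict]:
    """Turn a Lambda REPORT log line into a metrics record, or None for other lines."""
    match = REPORT_RE.search(line)
    if match is None:
        return None
    memory_mb = int(match["memory_mb"])
    max_memory_mb = int(match["max_memory_mb"])
    return {
        "requestId": match["request_id"],
        "durationMs": float(match["duration_ms"]),
        "billedDurationMs": int(match["billed_ms"]),
        "memorySizeMb": memory_mb,
        "maxMemoryUsedMb": max_memory_mb,
        # Share of the configured memory actually used; useful for right-sizing.
        "memoryUtilization": max_memory_mb / memory_mb,
    }
```
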
  Slide 35

    Why are current solutions not enough?
    Existing solutions do not play well with serverless environments:
    1. Only sync data senders
    2. Only manual instrumentation
    3. Distributed tracing for distributed systems
  Slide 36

    1. Only sync data senders
    • Longer request duration
    • Stateless environment
    • Data publish failures
    • Access within VPC
    https://medium.com/thundra/4-reasons-why-you-should-publish-monitoring-data-async-in-aws-lambda-fd1e56473941
    https://twitter.com/legobatmanmovie
  Slide 37

    Our approach: sync for the development environment
    • Faster to send the data
    • Easier to set up and debug
    • Cheaper
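
The split between sync publishing in development and log-based publishing in production can be expressed as a reporter chosen per stage. In this sketch, the `STAGE` variable, the collector URL, and the `MONITORING_DATA` prefix are hypothetical. The sync reporter blocks the invocation on an HTTP POST with a short timeout and swallows failures, so a monitoring outage never fails the user's function. The production reporter only writes a log line.

```python
import json
import os
import urllib.request
from typing import Callable, Optional

Transport = Callable[[bytes], None]


def http_transport(url: str, timeout_s: float = 1.0) -> Transport:
    """Blocking POST to a collector; the invocation waits until it returns."""
    def send(body: bytes) -> None:
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}, method="POST")
        with urllib.request.urlopen(request, timeout=timeout_s):
            pass
    return send


class SyncReporter:
    """Development: send immediately, which is simple to set up and debug."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport

    def report(self, record: dict) -> bool:
        try:
            self.transport(json.dumps(record).encode("utf-8"))
            return True
        except Exception:
            # A failed publish must never fail the monitored function.
            return False


class LogReporter:
    """Production: write a log line and let CloudWatch Logs carry it."""

    def report(self, record: dict) -> bool:
        print("MONITORING_DATA " + json.dumps(record))
        return True


def make_reporter(stage: Optional[str] = None, transport: Optional[Transport] = None):
    stage = stage or os.environ.get("STAGE", "dev")  # hypothetical variable name
    if stage == "dev":
        # Hypothetical local collector endpoint.
        return SyncReporter(transport or http_transport("http://localhost:8080/collect"))
    return LogReporter()
```

Injecting the transport keeps the sync path testable without a running collector.
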
  Slide 38

    Our approach: async first
    • Doesn’t block Lambda functions
    • The actual function is not affected
    • Best option for production
    Flow: write data as log → data in CloudWatch → consumer Lambda → Thundra
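
The async flow (write data as a log line, let CloudWatch Logs collect it, process it in a consumer Lambda) can be sketched as follows. The consumer side relies on the CloudWatch Logs subscription payload, which arrives base64-encoded and gzip-compressed under `event["awslogs"]["data"]`. The `MONITORING_DATA` prefix and the handler names are hypothetical, and forwarding to a monitoring backend is left out.

```python
import base64
import gzip
import json
import time

# Hypothetical marker that separates monitoring records from ordinary log lines.
MONITORING_PREFIX = "MONITORING_DATA "


def publish_async(record: dict) -> None:
    """Write the record to stdout; Lambda ships stdout to CloudWatch Logs,
    so the function never waits on the monitoring backend."""
    print(MONITORING_PREFIX + json.dumps(record, separators=(",", ":")))


def handler(event, context):
    """The monitored function: its only monitoring cost is one print call."""
    start = time.perf_counter()
    response = {"statusCode": 200, "body": json.dumps({"ok": True})}
    publish_async({
        "requestId": getattr(context, "aws_request_id", None),
        "durationMs": round((time.perf_counter() - start) * 1000, 3),
    })
    return response


def consumer_handler(event, context):
    """Consumer Lambda attached to the log group by a subscription filter."""
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
    records = []
    for log_event in payload["logEvents"]:
        message = log_event["message"]
        position = message.find(MONITORING_PREFIX)
        if position != -1:
            records.append(json.loads(message[position + len(MONITORING_PREFIX):]))
    # Forwarding `records` to the monitoring backend is omitted here.
    return records
```

Searching for the prefix with `find` rather than `startswith` tolerates log lines that a logging library has prefixed with a level, timestamp, or request ID.
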
  Slide 39

    2. Only manual instrumentation
    • No way to attach an agent
    • Pollutes the code
    • Error-prone
    • Maintenance burden
  Slide 41

    Our approach: local tracing for more context
    • High-resolution local tracing
    • Debugging, in a sense
    • Trace external dependencies and 3rd-party libraries
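
In Python, tracing a third-party library without touching the function's own code can be approximated by wrapping (monkey-patching) the library call once at startup, a technique many Python instrumentation libraries use. The `trace_attribute` helper and the in-memory `SPANS` list below are hypothetical; a real tracer would attach spans to the current invocation and report them asynchronously.

```python
import functools
import time
from typing import Any, Callable, List, Optional

# Hypothetical in-memory span store; a real tracer would scope spans per invocation.
SPANS: List[dict] = []


def trace_attribute(owner: Any, attr_name: str, span_name: Optional[str] = None) -> Callable:
    """Replace owner.attr_name with a wrapper that records one span per call.

    Returns the original callable so the patch can be undone with setattr.
    """
    original = getattr(owner, attr_name)
    name = span_name or f"{getattr(owner, '__name__', type(owner).__name__)}.{attr_name}"

    @functools.wraps(original)
    def traced(*args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return original(*args, **kwargs)
        except Exception as exc:
            error = type(exc).__name__
            raise  # tracing must not change the library's behavior
        finally:
            SPANS.append({
                "name": name,
                "durationMs": (time.perf_counter() - start) * 1000,
                "error": error,
            })

    setattr(owner, attr_name, traced)
    return original
```

For example, calling `trace_attribute(urllib.request, "urlopen")` during initialization would record every call made through `urllib.request.urlopen`. Code that already ran `from urllib.request import urlopen` keeps a reference to the unpatched function, which is why such patching is usually done as early as possible.
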
  Slide 43

    3. Distributed tracing for distributed systems
    • Functions talk with other services and functions
    • AWS X-Ray does a good job of tracing external calls
    • Topology detection matters for cost monitoring and optimization
  Slide 44

    Our approach
    • Local traces with automatic or manual instrumentation
    • Integrates with AWS X-Ray to enrich distributed tracing data
    • Service map (topology) and distributed tracing are coming this April!
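
One way to tie local traces to AWS X-Ray is to read the trace header that Lambda exposes to the function through the `_X_AMZN_TRACE_ID` environment variable. The header holds `Root`, `Parent`, and `Sampled` fields separated by semicolons, and its IDs can be stamped onto every local span. The sketch below only parses the header; how the IDs are attached to spans is left to the tracer.

```python
import os
from typing import Dict, Optional


def parse_trace_header(header: str) -> Dict[str, str]:
    """Split an X-Ray trace header like 'Root=...;Parent=...;Sampled=1' into fields."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip empty or malformed segments
            fields[key] = value
    return fields


def current_trace_context(header: Optional[str] = None) -> dict:
    """Trace IDs for the running invocation, read from _X_AMZN_TRACE_ID by default."""
    if header is None:
        header = os.environ.get("_X_AMZN_TRACE_ID", "")
    fields = parse_trace_header(header)
    return {
        "traceId": fields.get("Root"),
        "parentId": fields.get("Parent"),
        "sampled": fields.get("Sampled") == "1",
    }
```

Because the variable is set per invocation, the context should be read inside the handler rather than cached at module level.
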
  Slide 46

    We also suggest:
    • A plugin for reducing cold starts
    • An OpenTracing-compatible API and data model
    • Metrics, traces, and logs all in one view
    • Looking at outliers and going deep into what actually happened
    • Highlighting outliers in a heat map, especially for external resources
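
The heat-map and outlier suggestions can be prototyped by bucketing span durations per external resource and time window, and by flagging spans slower than a per-resource percentile. The span shape (`resource`, `ts`, `durationMs`), the power-of-two latency buckets, and the nearest-rank 95th-percentile threshold are all illustrative choices.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def latency_bucket(duration_ms: float) -> int:
    """Power-of-two buckets: 0 = under 1 ms, 1 = [1, 2) ms, 2 = [2, 4) ms, ..."""
    if duration_ms < 1:
        return 0
    return int(math.log2(duration_ms)) + 1


def heat_map(spans: List[dict], window_s: int = 60) -> Dict[Tuple[str, int], Counter]:
    """Count spans per (resource, time window) cell and latency bucket."""
    grid: Dict[Tuple[str, int], Counter] = defaultdict(Counter)
    for span in spans:
        cell = (span["resource"], int(span["ts"] // window_s))
        grid[cell][latency_bucket(span["durationMs"])] += 1
    return grid


def nearest_rank_percentile(values: List[float], q: float) -> float:
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def find_outliers(spans: List[dict], q: float = 0.95) -> List[dict]:
    """Spans slower than the q-th percentile of their own resource."""
    durations = defaultdict(list)
    for span in spans:
        durations[span["resource"]].append(span["durationMs"])
    thresholds = {r: nearest_rank_percentile(v, q) for r, v in durations.items()}
    return [s for s in spans if s["durationMs"] > thresholds[s["resource"]]]
```

Computing thresholds per resource keeps a normally slow dependency from hiding spikes in a normally fast one.
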
  Slide 47

    “Serverless computing will become the default computing paradigm of the Cloud Era.”
    https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-3.pdf