Distributed Tracing in Serverless Systems - KubeCon 2018.pdf

Serverless and FaaS naturally fit microservices architectures. Observability of such systems is difficult, since each microservice runs separately and works asynchronously from the others. Distributed tracing is a key approach to understanding such systems. Serverless brings new challenges and opportunities that make distributed tracing an especially interesting and useful technique for observability.

I will quickly go over the history of tracing and the popular tools, then focus on the key differences between observability using distributed tracing in generic microservices environments and in serverless. Examples from the popular cloud vendors will be shown, including full visualization of asynchronous transactions in a highly distributed serverless system, and detection of business flows across multiple asynchronous communication resources (e.g. SNS, Kinesis, and more).

Nitzan Shapira

December 13, 2018

Transcript

  1. 3 Things to discuss: What is serverless? How is it different? What is observability for serverless? How can distributed tracing help? How will it help my job?
  2. What is serverless? [Compute-as-a-Service] (FaaS: Function-as-a-Service, CaaS: Container-as-a-Service) + managed services (APIs) = don’t manage infrastructure; focus on business logic.
  3. Why serverless? Pay-per-use: reduces cloud compute cost by 90%; out-of-the-box auto-scaling; DevOps → LowOps; ++developer velocity; focus on business logic – iterate faster. (Chart: Server Utilization)
  4. The limitations of FaaS: limited memory; limited running time; cold starts; stateless + concurrency limit + some others…
  5. The properties of serverless applications: Serverless is micro-services. Serverless applications are highly distributed and highly event-driven. Utilizing managed services via APIs is key.
  6. Observability – why do we need it? Track system health; troubleshoot and fix; optimize performance and cost.
  7. Distributed tracing: “…a trace tells the story of a transaction or workflow as it propagates through a (potentially distributed) system.” Distributed tracing is a method used to profile and monitor applications.
  8. Implementing distributed tracing: manual tracing/instrumentation, before/after calls and at the end of each micro-service. High maintenance; high potential for errors.
  9. Serverless apps are very distributed: complex systems have thousands of functions. What about the developer velocity?
  10. Scanning functions: scanning CloudWatch using AWS Lambda every 5 minutes, save to RDS. A new Lambda is spawned for every customer’s function. (Diagram labels: Poll, Spawn (async), CloudWatch)
  11. As time flies… CloudWatch became highly throttled. Requests took too much time. 5K concurrent Lambdas, for 5 minutes, timing out, every 5 minutes!
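
The manual approach named on the "Implementing distributed tracing" slide (instrumenting before and after each call) could look roughly like the sketch below. It is not taken from the talk: `traced`, `SPANS`, `order_handler`, and `billing_handler` are hypothetical names, the span store is an in-memory list rather than a tracing backend, and the message is a plain dict standing in for an asynchronous resource such as SNS or Kinesis.

```python
import time
import uuid
from contextlib import contextmanager

# Hypothetical in-memory span store; a real system would ship spans
# to a tracing backend instead.
SPANS = []


@contextmanager
def traced(trace_id, name):
    """Record one span around a block of code (the before/after pattern)."""
    span = {"trace_id": trace_id, "name": name, "start": time.time()}
    try:
        yield span
    except Exception as exc:
        # Keep the failure on the span, then let the error propagate.
        span["error"] = repr(exc)
        raise
    finally:
        span["end"] = time.time()
        SPANS.append(span)


def order_handler(event):
    """Hypothetical first function: starts a trace and emits a message."""
    trace_id = event.get("trace_id") or uuid.uuid4().hex
    with traced(trace_id, "order_handler"):
        with traced(trace_id, "publish_message"):
            # Stand-in for publishing to a topic or stream. The trace ID
            # travels inside the message so the consumer can join the trace.
            message = {"trace_id": trace_id, "body": {"order": event["order"]}}
    return message


def billing_handler(message):
    """Hypothetical downstream function: continues the same trace."""
    with traced(message["trace_id"], "billing_handler"):
        return {"charged": message["body"]["order"]}
```

Calling `billing_handler(order_handler({"order": 42}))` leaves three spans in `SPANS` that share one trace ID. Every handler and every outgoing call needs its own wrapper and has to remember to forward `trace_id`, which suggests why the slides describe this approach as high-maintenance and error-prone once a system has thousands of functions.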