Error Handling in Stateless Environments

Errors happen in every application. Serverless adds failure modes of its own: timeouts, out-of-memory conditions, and more. In many cases, the cloud vendor handles errors with a retry mechanism. Since the code is stateless, how can you be sure that the application actually works?

Nitzan Shapira

December 19, 2018

Transcript

  1. 3 Things to discuss: What is serverless? How is it different? What is stateless? Why is it relevant today? How to handle errors in such environments? How will it help my job?
  2. What is serverless? [Compute-as-a-Service] FaaS: Function-as-a-Service, CaaS: Container-as-a-Service + Managed services (APIs) = Don’t manage infrastructure, focus on business logic.
  3. Why serverless? Pay-per-use: reduces cloud compute cost by 90%. Out-of-the-box auto-scaling. DevOps → LowOps. ++Developer velocity. Focus on business logic – iterate faster. Server Utilization
  4. The limitations of FaaS: Limited memory. Limited running time. Cold starts. Stateless. + concurrency limit + some others…
  5. The properties of serverless applications: Serverless is micro-services. Serverless applications are highly distributed and highly event-driven. Utilizing managed services via APIs is key.
  6. What do you do when something goes wrong? Take a look at the log – it will tell the story. Connect to the host! Run a debugger!
  7. Stateless environment challenges: Event-driven design. No server to connect to – difficult to troubleshoot. No current state – difficult to determine system health.
  8. Types of failures in serverless: Unhandled exception – from your own code. Timeout – due to your code or an external service. Out-of-memory – due to your code or misconfiguration.
  9. Retry behavior: Synchronous events • The invoking application is in charge of the error. Asynchronous events • A retry mechanism is triggered for a certain period of time. Stream-based events • A retry mechanism is triggered until the data expires.
  10. Retry behavior consequences: Retries might change the logical flow of the application. In order for retries to succeed, the code must be idempotent: “Idempotence is the property of certain operations in mathematics and computer science that they can be applied multiple times without changing the result beyond the initial application” (Wikipedia).
  11. Examples of idempotent operations: Update the same DB entry to the same value multiple times. Authenticate a user. Check if a file exists, and if not, create an empty file with that name. Confusing to design, difficult to implement!
  12. Practical methods for retries: Write idempotent code. Tough! Use a proper service. Example: AWS Step Functions (Source: AWS)
  13. Observability – why do we need it? Track system health. Troubleshoot and fix. Optimize performance and cost.
  14. Distributed tracing: …a trace tells the story of a transaction or workflow as it propagates through a (potentially distributed) system. Distributed tracing is a method used to profile and monitor applications.
  15. Implementing distributed tracing: Manual tracing/instrumentation. Before/after calls. At the end of each micro-service. High maintenance. High potential for errors.
  16. Serverless apps are very distributed. Complex systems have thousands of functions. What about the developer velocity?
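
The slides on retry consequences and idempotent operations ask for code that tolerates the same event being delivered more than once. One common way to approach this is to deduplicate by event ID, sketched below in Python. The `IdempotentProcessor` class, the `id` field on the event, and the in-memory dictionary are hypothetical illustrations and do not come from the talk. In a real stateless function the dictionary would not survive between invocations, so a durable store with a conditional write would have to take its place.

```python
from typing import Callable, Dict


class IdempotentProcessor:
    """Runs a side effect at most once per event ID.

    The dictionary below is an in-memory stand-in (an assumption for this
    sketch); a stateless function would need a durable store instead.
    """

    def __init__(self, side_effect: Callable[[dict], str]) -> None:
        self._side_effect = side_effect
        self._results: Dict[str, str] = {}

    def handle(self, event: dict) -> str:
        event_id = event["id"]  # hypothetical unique ID carried by the event
        if event_id in self._results:
            # A retry of an event we already processed: return the stored
            # result instead of repeating the side effect.
            return self._results[event_id]
        result = self._side_effect(event)
        self._results[event_id] = result
        return result
```

With this shape, a vendor-triggered retry of the same event returns the earlier result and the side effect (for example, charging a customer) is not repeated.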
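
For synchronous events, the retry behavior slide says the invoking application is in charge of the error. A minimal sketch of what that could look like on the caller's side is a bounded retry loop with exponential backoff. The function name `call_with_retries` and its parameters are assumptions made for this example, and it only makes sense when the invoked operation is idempotent.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    invoke: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `invoke`, retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting. Catching every exception is a simplification; a real caller would likely retry only errors it knows to be transient.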
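
The manual instrumentation described in the distributed tracing slides (recording something before and after each call) can be sketched as a decorator. Everything named here is hypothetical: the `traced` decorator, the `SPANS` list standing in for a tracing backend, and the `trace_id` field propagated inside the event. It is meant to show why the approach is high-maintenance, since every function has to be wrapped and every event has to carry the ID forward.

```python
import time
import uuid
from functools import wraps
from typing import Any, Callable, Dict, List

# Hypothetical in-memory collector; a real system would ship spans elsewhere.
SPANS: List[Dict[str, Any]] = []


def traced(name: str) -> Callable:
    """Record one span per call, keyed by a trace ID carried in the event."""

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(event: dict, *args: Any, **kwargs: Any) -> Any:
            # Reuse the caller's trace ID, or start a new trace.
            trace_id = event.setdefault("trace_id", uuid.uuid4().hex)
            start = time.perf_counter()
            error = None
            try:
                return func(event, *args, **kwargs)
            except Exception as exc:
                error = repr(exc)
                raise
            finally:
                # Runs after the call whether it succeeded or failed.
                SPANS.append({
                    "trace_id": trace_id,
                    "name": name,
                    "duration_s": time.perf_counter() - start,
                    "error": error,
                })

        return wrapper

    return decorator


@traced("store_order")
def store_order(event: dict) -> str:
    return "stored"


@traced("receive_order")
def receive_order(event: dict) -> str:
    # Passing the same event forward is what propagates the trace ID.
    return store_order(event)
```

Calling `receive_order({"order": 1})` records two spans that share one trace ID, with the inner `store_order` span recorded first because it finishes first.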