Resilient architecture is crucial for all cloud implementations. In this talk, we explore different design patterns to make a distributed application more resilient.

As part of this journey, for any process we need to ask: what if something goes wrong? We then plan a course of action so that the process auto-heals without human intervention, and lower risk by performing canary deployments. Design starts with understanding the requirements and performing an empathy map and a value chain analysis.

Treating the application as stateless for all API calls keeps the system available most of the time, but it requires creating a cache for common distributed data. Next, we examine how to deal with cascading failures and timeout scenarios. As part of auto-healing, applications need to Detect, Prevent, Recover, Mitigate, and Complement so that the service stays resilient.

Key takeaways for the audience are as follows:

Resiliency is essential for any feature in the cloud.
Understanding the value chain is critical to identifying failure points.
The challenges lie in determining whether there is a failure and in designing the system for auto-healing.
The first focus should be on preventing a failure from occurring.
Identify the key challenges in your company, and the tools and techniques to auto-heal and provide a sustainable solution.


Rohit Bhardwaj

March 14, 2019