What have we learned while building resilient apps? How cultural and technical aspects matter, and how they are reflected in the solutions you develop
Some context
What is resiliency?
How are failures introduced?
• Human error
• Application error
• External sources
Changing the mindset: accept failures instead of avoiding them
Designing for resiliency
Ephemeral everything
Automation is an asymptotic phenomenon
• See if you can do it manually
• Build tools to adopt semi-automatic workflows
• Remember not to introduce a dead end
Resiliency in different application tiers
• Application tier
• Persistence tier
• Distributed systems (2PC, Raft, Paxos, etc.)
Patterns for distributed systems
Avoid cascading failures
• Automation also increases the risk of failures
• Components: allow independent failures
• Contain failures
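One common way to contain a failure in a single component is a circuit breaker. The talk does not name this pattern, so the sketch below is only an illustration under that assumption; `CircuitBreaker`, `CircuitOpenError`, `max_failures` and `reset_after` are hypothetical names, not from the slides.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected."""


class CircuitBreaker:
    """Minimal circuit breaker (illustrative sketch, not production code)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable clock, handy in tests
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling load onto a broken dependency.
                raise CircuitOpenError("dependency is marked unhealthy")
            # Cool-down elapsed: allow a trial call; one failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

Callers wrap each call to a dependency in `breaker.call(...)` and treat `CircuitOpenError` as a signal to skip or degrade, so one failing component does not drag its callers down with it.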
Supporting degraded modes
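As a rough illustration of a degraded mode, the sketch below serves the last known value when a dependency fails and flags the result as stale. The helper `fetch_with_fallback` and its parameters are hypothetical, and the slides do not prescribe this particular approach.

```python
def fetch_with_fallback(fetch, cache, key, default=None):
    """Return (value, degraded).

    `fetch` is any callable that may raise; `cache` is a dict-like store
    holding the last successful value per key. Both are assumptions made
    for this sketch.
    """
    try:
        value = fetch(key)
    except Exception:
        # Degraded mode: serve stale (or default) data instead of failing.
        return cache.get(key, default), True
    cache[key] = value  # remember the last good value
    return value, False
```

The `degraded` flag lets the caller decide what to do with stale data, for example show a banner or disable writes, rather than hiding the failure.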
Metrics for everything
• Metrics from app, tools and integrations
• Logging and metrics
The cultural aspects
Everyone can be on call
Build resiliency on the human side
Build safety nets
• Test apps
• Test infrastructure
• Test integrations
Inject failures
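Failure injection can start very small. The sketch below wraps a function so that it fails at a configurable rate, which is one way to exercise error-handling paths in tests or staging. `inject_failures`, `InjectedFailure` and `rate` are hypothetical names chosen for illustration; the talk does not describe a specific tool.

```python
import functools
import random


class InjectedFailure(Exception):
    """Marks an error that was raised on purpose."""


def inject_failures(rate, rng=None):
    """Decorator: raise InjectedFailure on roughly `rate` of calls (0.0 to 1.0)."""
    rng = rng or random.Random()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # random() returns a float in [0.0, 1.0), so rate=0.0 never
            # fails and rate=1.0 always fails.
            if rng.random() < rate:
                raise InjectedFailure(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Passing a seeded `random.Random` as `rng` makes a run reproducible, which helps when an injected failure uncovers a bug that needs to be replayed.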
Concluding thoughts
• Failures are inevitable, accept them
• It’s a mindset that needs to be reflected in tools and culture
Thank You
@RanjibDey