Attaining Resiliency - Culture, Tools and Practices

Ranjib Dey
November 17, 2013

What have we learned while building resilient apps? How cultural and technical aspects matter, and how they are reflected in the solutions you develop.

Transcript

  1. None
  2. Some context

  3. What is resiliency?

  4. How are failures introduced?

  5. • Human error

  6. • Human error • Application error

  7. • Human error • Application error • External sources

  8. Changing the mindset: accept failures instead of trying to avoid them

  9. Designing for resiliency

  10. Ephemeral everything

  11. Automation is an asymptotic phenomenon

  12. • See if you can do it manually

  13. • See if you can do it manually • Build tools to adopt semi-automatic workflows

  14. • See if you can do it manually • Build tools to adopt semi-automatic workflows • Remember not to introduce a dead end

  15. Resiliency in different application tiers

  16. • Application tier

  17. • Application tier • Persistence tier

  18. • Application tier • Persistence tier

  19. • Application tier • Persistence tier • Distributed systems (2PC, Raft, Paxos, etc.)

  20. Patterns for distributed systems

  21. Avoid cascading failures

  22. • Automation will also increase the risk of failures

  23. • Automation will also increase the risk of failures • Allow components to fail independently

  24. • Automation will also increase the risk of failures • Allow components to fail independently • Contain failures

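
One common way to contain failures and keep them from cascading is a circuit breaker. The slides do not name a specific technique, so the following is only a minimal sketch under that assumption; `CircuitBreaker`, `CircuitOpenError`, and the `max_failures` and `reset_after` parameters are hypothetical names chosen for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call a failing dependency."""


class CircuitBreaker:
    """Hypothetical helper: stops calling a dependency after repeated failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling load onto a broken component.
                raise CircuitOpenError("dependency marked unhealthy")
            # Cool-down elapsed: allow one trial call; one more failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success resets the failure count
        return result
```

Callers wrap each call to a remote component in `breaker.call(...)` and treat `CircuitOpenError` as a signal to skip or substitute that component, so one failing dependency does not tie up the rest of the system.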
  25. Supporting degraded modes
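
A degraded mode can be as simple as serving the last known value when a dependency is down. This is an illustrative sketch, not the approach from the talk; `fetch_with_fallback` and its arguments are hypothetical.

```python
def fetch_with_fallback(fetch, cache, key):
    """Return (value, degraded). Falls back to a cached value if fetch fails.

    `fetch` is any callable taking a key; `cache` is a plain dict.
    Both are assumptions made for this example.
    """
    try:
        value = fetch(key)
    except Exception:
        if key in cache:
            # Degraded mode: possibly stale data is better than an error page.
            return cache[key], True
        raise  # nothing to fall back to, so surface the failure
    cache[key] = value  # remember the last good value for future fallbacks
    return value, False
```

The `degraded` flag lets the caller tell users or metrics that the answer may be stale.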

  26. Metrics for everything

  27. • Metrics from app, tools and integrations

  28. • Metrics from app, tools and integrations • Logging and metrics

  29. The cultural aspects

  30. Everyone can be on call

  31. Build resiliency on the human side

  32. Build safety nets

  33. • Test apps

  34. • Test apps • Test infrastructure

  35. • Test apps • Test infrastructure • Test integrations

  36. Inject failures
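
Failure injection can start small, for example with a decorator that makes a function fail at a configurable rate in a test environment. The sketch below is an assumption about how that might look; `inject_failures`, `InjectedFailure`, and the `rate` parameter are hypothetical names, not tools mentioned in the talk.

```python
import functools
import random


class InjectedFailure(Exception):
    """Marks a failure that was injected on purpose."""


def inject_failures(rate, rng=None):
    """Decorator: raise InjectedFailure on roughly `rate` of the calls (0.0 to 1.0)."""
    rng = rng or random.Random()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # rng.random() is in [0.0, 1.0), so rate=0.0 never fails
            # and rate=1.0 always fails.
            if rng.random() < rate:
                raise InjectedFailure(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Passing a seeded `random.Random` as `rng` makes a test run reproducible.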

  37. Concluding thoughts

  38. • Failures are inevitable, accept them

  39. • Failures are inevitable, accept them • It’s a mindset that needs to be reflected in tools and culture

  40. Thank You @RanjibDey