Attaining Resiliency - Culture, Tools and Practices

Ranjib Dey
November 17, 2013

What we have learned while building resilient apps: how cultural and technical aspects matter, and how they are reflected in the solutions you develop.

Transcript

  1. View Slide

  2. Some context

  3. What is resiliency?

  4. How are failures introduced?

  5. • Human error

  6. • Human error
    • Application error

  7. • Human error
    • Application error
    • External sources

  8. Changing the mindset: accept failures
    instead of avoiding them

  9. Designing for resiliency

  10. Ephemeral everything

  11. Automation is an asymptotic
    phenomenon

  12. • See if you can do it
    manually

  13. • See if you can do it manually
    • Build tools to adopt semi-automatic
    workflows

  14. • See if you can do it manually
    • Build tools to adopt semi-automatic
    workflows
    • Remember not to introduce a
    dead end

  15. Resiliency in different application
    tiers

  16. • Application tier

  17. • Application tier
    • Persistence tier

  18. • Application tier
    • Persistence tier

  19. • Application tier
    • Persistence tier
    • Distributed systems (2PC,
    Raft, Paxos, etc.)

  20. Patterns for distributed systems

  21. Avoid cascading failures

  22. • Automation also increases
    the risk of failures

  23. • Automation also increases the risk of
    failures
    • Components: allow independent
    failures

  24. • Automation also increases the risk of
    failures
    • Components: allow independent
    failures
    • Contain failures
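
One common way to contain a failure so it does not cascade is a circuit breaker. The slides do not name a specific technique, so the sketch below is only an illustration: `CircuitBreaker`, `CircuitOpenError`, and the `max_failures` and `reset_after` parameters are hypothetical names, not something taken from the talk.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Hypothetical breaker: stop calling a dependency that keeps failing."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling more load on a sick dependency.
                raise CircuitOpenError("dependency is marked unhealthy")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

Callers wrap each remote call as `breaker.call(fetch_profile, user_id)` (with `fetch_profile` standing in for any dependency call) and treat `CircuitOpenError` as a signal to skip that dependency for now, so one failing component does not tie up the components that depend on it.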

  25. Supporting degraded modes
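
As a rough illustration of a degraded mode, the sketch below serves the last known value when a dependency is unavailable and tells the caller that the answer may be stale. The function name `fetch_with_fallback` and its parameters are assumptions made for this example; the slide itself does not prescribe a mechanism.

```python
def fetch_with_fallback(fetch, cache, key, default=None):
    """Return (value, degraded).

    `fetch` is any callable that may raise when the dependency is down;
    `cache` is a plain dict holding the last successful results.
    """
    try:
        value = fetch(key)
    except Exception:
        # Dependency is down: serve the last known value, or the default,
        # and flag the response as degraded so the caller can show it.
        return cache.get(key, default), True
    cache[key] = value  # remember the fresh value for future outages
    return value, False
```

Returning the `degraded` flag alongside the value lets the application keep working with reduced quality (for example, showing a "data may be out of date" note) instead of failing the whole request.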

  26. Metrics for everything

  27. • Metrics from app, tools and
    integrations

  28. • Metrics from app, tools and
    integrations
    • Logging and metrics

  29. The cultural aspects

  30. Everyone can be on call

  31. Build resiliency on the human side

  32. Build safety nets

  33. • Test apps

  34. • Test apps
    • Test infrastructure

  35. • Test apps
    • Test infrastructure
    • Test integrations

  36. Inject failures
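
Failure injection can start very small, for instance by wrapping a call so that it fails some fraction of the time in a test environment. The sketch below assumes that approach; `flaky`, `InjectedFailure`, and `failure_rate` are hypothetical names, and the talk may well have had infrastructure-level tooling in mind instead.

```python
import random


class InjectedFailure(Exception):
    """Marks an error raised on purpose, not by the wrapped code."""


def flaky(func, failure_rate, rng=None):
    """Wrap `func` so that roughly `failure_rate` of calls raise.

    `rng` is an optional random.Random instance; passing a seeded one
    makes a test run reproducible.
    """
    if rng is None:
        rng = random.Random()

    def wrapper(*args, **kwargs):
        # random() returns a float in [0.0, 1.0), so a rate of 0.0 never
        # fails and a rate of 1.0 always fails.
        if rng.random() < failure_rate:
            name = getattr(func, "__name__", "call")
            raise InjectedFailure(f"injected failure in {name}")
        return func(*args, **kwargs)

    return wrapper
```

Wrapping a dependency call this way in a staging or test setup shows whether retries, fallbacks, and alerts behave as intended before a real outage exercises them.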

  37. Concluding thoughts

  38. • Failures are inevitable,
    accept them

  39. • Failures are inevitable,
    accept them
    • It’s a mindset that needs
    to be reflected in tools and
    culture

  40. Thank You
    @RanjibDey
