Ignite Talk San Jose

The Case for Chaos: Thinking About Failure Holistically


Pat Higgins

June 12, 2018


  1. THE CASE FOR CHAOS: THINKING ABOUT FAILURE HOLISTICALLY
  2. None
  3. ~ WHOAMI

    Patrick Higgins, @higgyCodes, UI Engineer @ Gremlin
  4. GREMLIN

    • We are practitioners of Chaos Engineering
    • We build tools that help people develop resilient systems
    • We offer “resiliency as a service”
  5. “Thoughtful, planned experiments designed to reveal the weaknesses in our system”

    – KOLTON ANDRUS
  6. NEW EMPLOYEE

    • Wanted to prove myself and ship lots of features.
    • First iteration of support for containers.
    • Metrics dashboard for attacks.
  7. DOGFOODING

    • The process of using your own product.
    • In our case it meant introducing failure in our own system.
    • “Failure Fridays”
  8. FAILURE FRIDAY WAS A GAMEDAY

    Dedicated time for teams to collaboratively focus on using Chaos Engineering practices to reveal weaknesses in your services

    – HO MING LI
  9. RUNNING GAMEDAYS

    • We were setting hypotheses regarding potential outcomes of our experiments.
    • We had mapped out our system architecture
    • Picked specific areas to test, started small, and iterated
  10. OUTCOME => =>

  11. REMEDIATION

  12. REALIZATION

    • Failure Fridays were a forcing function
    • I was beginning to uncover brittleness in the UI
    • I was also beginning to understand our system at a deeper level

    if ( ) {
  13. FAILURE AND UI

    • Data and functionality can be categorized:
      ✴ Critical User Paths
      ✴ Auxiliary Paths
  14. FAILURE AND UI ❤
  15. None
  16. CHAOS ENGINEERING AND UI

    • Graceful degradation currently pertains to browser compatibility or bandwidth.
    • OSS tooling around failure mitigation in UI is underdeveloped.
    • Tooling is regularly company specific.
  17. CHAOS ENGINEERING AND PRODUCT

    • Mapping out potential alternative states (reroute, retry)
    • Product specs that include comprehensive responses to failure scenarios are (anecdotally) rare
  18. EVERYBODY SHOULD GAMEDAY

    • Everybody benefits from observing failure
    • Find your champions across the company
    • Encourages varied perspectives on failure mitigation
  19. THANKS!

    Patrick Higgins, @higgyCodes, UI Engineer @ Gremlin
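
Slides 13 and 17 split a UI into critical user paths and auxiliary paths, and name retry and reroute as alternative states when something fails. The sketch below is one possible way to express that idea, not code from the talk: the `Panel` type, the `render_panel` and `render_page` helpers, and the "attacks" and "metrics" panel names are all hypothetical, chosen only for illustration. Python is used for brevity; the same shape would apply in a browser UI.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class CriticalPathError(Exception):
    """Raised when a critical user path cannot be served."""


@dataclass
class Panel:
    # All fields are illustrative assumptions, not part of any real library.
    name: str
    load: Callable[[], str]          # loader for the panel; raises on failure
    critical: bool                   # True: critical user path, False: auxiliary
    fallback: Optional[str] = None   # placeholder shown if an auxiliary panel fails
    retries: int = 1                 # extra attempts after the first failure


def render_panel(panel: Panel) -> str:
    """Retry the loader, then either fail loudly or degrade gracefully."""
    last_error: Optional[Exception] = None
    for _ in range(panel.retries + 1):
        try:
            return panel.load()
        except Exception as exc:     # broad on purpose: any failure counts here
            last_error = exc
    if panel.critical:
        # Critical path: surface the failure instead of hiding it.
        raise CriticalPathError(panel.name) from last_error
    # Auxiliary path: "reroute" to a placeholder so the page still renders.
    return panel.fallback if panel.fallback is not None else ""


def render_page(panels: list) -> dict:
    """Render every panel and return a mapping of panel name to content."""
    return {p.name: render_panel(p) for p in panels}


# Usage: one flaky critical panel and one auxiliary panel whose backend is down.
calls = {"n": 0}


def flaky_attacks() -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("first attempt timed out")
    return "attack list"


def failing_metrics() -> str:
    raise ConnectionError("metrics service unavailable")


page = render_page([
    Panel("attacks", flaky_attacks, critical=True),
    Panel("metrics", failing_metrics, critical=False,
          fallback="Metrics unavailable"),
])
print(page)  # {'attacks': 'attack list', 'metrics': 'Metrics unavailable'}
```

In this sketch the critical panel recovers on its single retry, while the auxiliary panel degrades to its placeholder rather than breaking the page. Injecting the failure deliberately, as a GameDay would, is what shows which of the two behaviours a given panel actually has.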