Engineering Resilient Systems through Cross-Disciplinary Insight

What do I know?
•  Trolltech, Nokia, CFEngine
•  Team lead, Facilitator
•  Couchsurfer
•  Taste-Discoverer
•  @vhilsheimer

resilience [noun]
•  The physical property of a material that can return to its original shape or position after deformation that does not exceed its elastic limit
•  The ability of an ecosystem to return to its original state after being disturbed
•  The ability to recover quickly from illness, change, or misfortune

The Deception

The Reality

Design vs Reality
•  Systems as Designed
   –  State diagrams, flowcharts, models
   –  Static and deterministic
•  Systems in Reality
   –  Complex, non-linear
   –  Dynamic, stochastic, non-deterministic
•  And yet, things work most of the time

Why do things NOT go wrong?
•  Negatives have only limited use for improvement
   –  Stability is not about the absence of something
   –  It is not something a system has
   –  It is something a system does
•  People hold the inherent imperfection together
   –  Able to adjust the system beyond its limitations
   –  Can anticipate, recognize, respond and learn

How empowered are the people in your system?

Reliable and Resilient

Resilience is the ability to
•  Anticipate
•  Recognize
•  Respond
•  Learn

To engineer resilient systems we must therefore encourage the principles, methods and behaviors by which these qualities can be brought about.

Resilience in Organizations

Patterns
•  Try to understand
•  Establish a shared purpose
•  Engaging dialog, respect
•  Blame-free retrospection
•  Autonomy and self-organization
•  Promote collaboration, agreement
•  Create transparency
•  Encourage leadership at all levels
•  Take risks and fail fast

Anti-Patterns
•  Focus on org-charts and hierarchy
•  Internal competition
•  Weed out failures
•  Over-commit, lack focus
•  Apply micro-management
•  Push change through top-down
•  Take past success for granted
•  Local optimization
•  Don’t take risks

Resilience in Software Projects

Patterns
•  Meritocratic, responsibility
•  Explicit policies, coding style
•  Short cycles creating user value
•  Modularity, testable code
•  Continuous integration
•  Time for refactoring, iterations
•  Fix bugs before writing new code
•  Build knowledge through code reviews and commit logs

Anti-Patterns
•  Chief Architect and Planners
•  Change Control mechanisms
•  Painful release process
•  Spaghetti code, complex dependencies
•  (Waterfall) Plans over results
•  QA organization, late testing
•  Knowledge exists outside the code (if at all)

Engineering Resilient IT Systems
•  Allow operators to anticipate and recognize
   –  Automation frees the operators from mundane tasks
   –  Connect monitoring to the ontology of the system
•  Enable reasoning about the system
   –  Avoid black boxes – they block mental simulation
   –  Make knowledge about the system part of the system
•  Enable fast failure and fast recovery
   –  Infrastructure is testable code, specification, documentation
•  Remove bottlenecks and dependencies
   –  Loose coupling and voluntary cooperation
   –  Autonomy, agility
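One way to read "infrastructure is testable code" is as a desired state that can be compared against the actual state and repaired. The following is a minimal sketch of that idea only, not CFEngine's implementation; the function `converge` and the service names are assumptions made up for illustration.

```python
def converge(actual, desired):
    """Compare actual state to desired state and return (new_state, repairs).

    Only keys whose values differ are repaired, so a second run is a no-op.
    """
    repairs = {key: value for key, value in desired.items()
               if actual.get(key) != value}
    new_state = {**actual, **repairs}
    return new_state, repairs


# Hypothetical desired state for one host; the keys are invented for this sketch.
desired = {"apache": "running", "ntp": "running"}
actual = {"apache": "stopped", "ntp": "running", "sshd": "running"}

state, repairs = converge(actual, desired)
print(repairs)          # only the drifted service needs a repair

_, second_repairs = converge(state, desired)
print(second_repairs)   # nothing is left to repair on the second run
```

Because the desired state is plain data and the repair step is a pure function, both can be exercised in an ordinary unit test before they touch a real host, and the list of repairs doubles as a record of what the system knows about itself.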

Engineering Resilient IT Systems
•  Cross-functional teams with shared goals
   –  DevOps, Kanban, Agile
•  Design for continuous maintenance
   –  Not a periodic, planned activity
   –  Make visible how components can safely be replaced
•  Empower the operators
   –  We must trust them with the controls to our systems
   –  Use tools and workflows that increase confidence, not control

Can we measure resilience?

Your system has a resilience score of 37. To increase the score, you can
•  Add more comments and use self-explanatory variable names in your policy code for service “webshop”
•  Share more knowledge – only one user has made 39 changes to the configuration of “apache” in the last 7 months
•  Increase redundancy – you have a linear dependency between hosts ws1, as1 and db1 for your mission critical service “webshop”
•  Increase capacity for host db1 – CPU and disk IO are high at the same time as service “payment service” peaks
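Two of the hints above, knowledge concentration and missing redundancy, could be derived from data a system already has. The sketch below is a hypothetical illustration and does not compute the score from the slide; the function names, the user name "alice", the role names and the 0.8 threshold are all assumptions.

```python
from collections import Counter


def knowledge_concentration(change_authors):
    """Return (author, share) for the author with the most changes."""
    counts = Counter(change_authors)
    author, changes = counts.most_common(1)[0]
    return author, changes / len(change_authors)


def unreplicated_roles(role_hosts):
    """Return the roles served by fewer than two hosts (no redundancy)."""
    return sorted(role for role, hosts in role_hosts.items() if len(hosts) < 2)


# Hypothetical inputs modelled on the slide: 39 changes to "apache" by one
# user, and a webshop served by the linear chain ws1 -> as1 -> db1.
apache_changes = ["alice"] * 39
webshop_roles = {"web": ["ws1"], "app": ["as1"], "db": ["db1"]}

author, share = knowledge_concentration(apache_changes)
if share > 0.8:  # the threshold is an arbitrary assumption
    print(f"Share more knowledge: {author} made {share:.0%} of the changes")

for role in unreplicated_roles(webshop_roles):
    print(f"Increase redundancy: role '{role}' has a single host")
```

Whether such signals can be folded into one meaningful number is the open question the slide asks; the sketch only shows that the individual hints are computable.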

Commercial Break

We’re hiring! Visit http://cfengine.com/jobs


Volker Hilsheimer
