Lessons Learned: Auto-Remediation & Event Driven Automation

At StackStorm we have had the chance to speak with many different organizations about auto-remediation and the difficulties they encountered when trying to implement it. This talk was an overview of some of the most common pitfalls.


Patrick Hoolboom

July 16, 2015


  1. Event Driven Automation & Auto-Remediation
    July 2015

  2. Who Am I?
    Patrick Hoolboom
    Twitter: @DoriftoShoes
    IRC (Freenode): doriftoshoes
    https://linkedin.com/in/pwhoolboom
    • 10+ Years IT Operations Experience
    • Firm Believer in Automating All The Things
  3. What is Event Driven Automation?
    Event driven automation is using system events from your environment to initiate automatic responses.
    [Diagram labels: Complexity; Handler Scripts Fired from Monitoring; Complex Long Running Workflows]
    • Opaque
    • Hard to standardize
    • High visibility
    • Workflow patterns
  4. What is Auto-Remediation?
    Auto-Remediation builds on event driven automation. Outages and failures are automatically remediated based on events emitted by various systems within the environment.
    [Diagram labels: Host/Service; Alerting; Diagnostic Workflow; Remediation Workflow]
  5. Event Driven Automation Adoption Blockers
    Not many!
    • Most organizations are open to the concept of event driven automation at a high level.
    Yet Another Tool/Process
    • Well-established teams do not want to add yet another tool/process to their already full plate (just one more wiki article to maintain).
  6. Auto-Remediation Adoption Blockers
    Fear of the Automation (#badauto)
    • Everyone has a #badauto story (automation run amok).
    Yet Another Tool/Process
    • Well-established teams do not want to add yet another tool/process to their already full plate (just one more wiki article to maintain).
    Fear of the Change (in process)
    • The status quo is comfortable.
    Fear of Becoming Obsolete
    • There is a common misconception that automating IT Operations tasks will eliminate the need for certain jobs.
  7. Fear of the Automation
    When in a firefight, what is the first thing most operators do?
    Turn Off The Automation
    But, Why?
  8. Trust (not Fear of) the Automation
    We must learn to trust the automation, not suspect it first. Automation needs to earn our trust. How?
    • Facilitated Troubleshooting
    • Full Collaboration with the end users
    • Human Controls (a.k.a. Nuclear Launch Codes)
    • Audit Audit Audit
  9. Trust in (not Fear of) the Change
    Our processes are only as good as the effort put into them. How?
    • Living Documentation
    • Peer Review
    • Participate
    • Be Ready to Change Again
  10. Fear of Becoming Obsolete
    Just Don't.

  11. Let's Talk Details
    How to Get Started
    • Find the right tools.
    • Peer Review Everything
    • Don't Overcomplicate Things
    • Don't throw away your old tools, but be ready to throw away (most of) your process
  12. Find the Right Tools
    • Workflow is critical: look for a good workflow engine with an easy-to-use DSL.
    • Inventory and service discovery will be critical in workflow construction.
    • All tools must allow you to version control their content and configuration. Version control the world!
  13. Peer Review Everything
    • Proper tooling that can be version controlled makes peer review far easier.
    • Just like code, all processes need to be peer reviewed. Formalized reviews allow organizations to iterate on their processes and improve much faster than they could otherwise.
  14. Don't Overcomplicate Things
    • If there is an easier route to the remediation, take it.
    • Let the automation take care of the low-hanging fruit.
    • The automations will be refined and improved over time.
  15. Don't throw away your old tools, but be ready to throw away (most of) your process
    • Your new fancy workflow tool isn't meant to replace your existing tools (configuration management, for example).
    • Let the workflow orchestrate the tools you already have.
    • Your processes will change.
  16. StackStorm
    • StackStorm is an open source project aimed at providing a unified control plane for all of your tools.
    • Simple Workflow construction
    • Easy integration with existing scripts & tools (1,000+ integrations in the StackStorm community repos)
    • http://docs.stackstorm.com
    • IRC: Freenode#stackstorm
    • support@stackstorm.com
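
The flow described in slide 4 (an event triggers a diagnostic workflow, which may trigger a remediation workflow) and the trust measures from slide 8 (human controls, audit) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not StackStorm's API and not how the talk implemented anything: `Event`, `AuditLog`, `diagnose`, `REMEDIATIONS`, and `handle_event` are all hypothetical names, the disk-full rule is an invented example, and the remediation is a stub that only reports what it would do.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class Event:
    """A system event emitted by monitoring (hypothetical shape)."""
    host: str
    check: str
    status: str


@dataclass
class AuditLog:
    """Append-only record of every step the automation takes."""
    entries: List[str] = field(default_factory=list)

    def record(self, message: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.entries.append(f"{stamp} {message}")


def diagnose(event: Event) -> str:
    """Diagnostic workflow: map an event to a probable cause (stubbed rule)."""
    if event.check == "disk" and event.status == "critical":
        return "disk_full"
    return "unknown"


# Remediation workflows keyed by diagnosis. Each one is a stub that returns
# a description of the action instead of touching a real host.
REMEDIATIONS: Dict[str, Callable[[Event], str]] = {
    "disk_full": lambda e: f"rotated logs on {e.host}",
}


def handle_event(
    event: Event,
    audit: AuditLog,
    approve: Callable[[Event, str], bool],
) -> str:
    """Run diagnosis, then remediation, gated by a human-control hook."""
    audit.record(f"event received: {event}")
    diagnosis = diagnose(event)
    audit.record(f"diagnosis: {diagnosis}")

    remediation = REMEDIATIONS.get(diagnosis)
    if remediation is None:
        audit.record("no known remediation; escalating to a human")
        return "escalated"

    # Human control: the operator (or a policy) can veto the action.
    if not approve(event, diagnosis):
        audit.record("remediation declined by human control")
        return "declined"

    result = remediation(event)
    audit.record(f"remediated: {result}")
    return "remediated"


if __name__ == "__main__":
    log = AuditLog()
    outcome = handle_event(
        Event(host="web01", check="disk", status="critical"),
        log,
        approve=lambda event, diagnosis: True,  # auto-approve in this demo
    )
    print(outcome)
    for line in log.entries:
        print(line)
```

Keeping the approval hook and the audit log as explicit arguments means the same handler can start out requiring a human decision for every action and relax that policy per diagnosis as the automation earns trust, while every decision stays reviewable.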