
Lessons Learned: Auto-Remediation & Event Driven Automation


At StackStorm we have had the chance to speak with many different organizations about auto-remediation and some of the difficulties they encountered when trying to implement it. This talk was an overview of some of the most common pitfalls.

Patrick Hoolboom

July 16, 2015



Transcript

  1. 1
    Event Driven Automation & Auto-Remediation
    July 2015


  2. 2
    Who Am I?
    @DoriftoShoes
    #irc Freenode: doriftoshoes
    https://linkedin.com/in/pwhoolboom
    Patrick Hoolboom

    10+ Years IT Operations Experience

    Firm Believer in Automating All The Things


  3. 3
    What is Event Driven Automation?
    Event driven automation is using system events from your environment
    to initiate automatic responses.
    [Diagram: a "Complexity" axis running from "Handler Scripts Fired from
    Monitoring" to "Complex Long Running Workflows". Labels: Opaque; Hard to
    standardize; High visibility; Workflow patterns.]

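
The definition on this slide can be sketched in a few lines of Python. This is only an illustration, assuming a hypothetical in-memory rule table; the names `on`, `dispatch`, and the `disk_full` event are invented for the example and are not taken from the talk or from any particular tool.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], str]

# Hypothetical in-memory rule table: event type -> automatic responses.
_rules: Dict[str, List[Handler]] = {}


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Register a handler to run whenever an event of this type arrives."""
    def register(handler: Handler) -> Handler:
        _rules.setdefault(event_type, []).append(handler)
        return handler
    return register


def dispatch(event: Event) -> List[str]:
    """Run every handler registered for the event's type and collect results."""
    return [handler(event) for handler in _rules.get(event["type"], [])]


@on("disk_full")
def clean_tmp(event: Event) -> str:
    # A real handler would call an existing tool; this one only reports.
    return f"would clean /tmp on {event['host']}"


results = dispatch({"type": "disk_full", "host": "web01"})
unmatched = dispatch({"type": "unknown_event", "host": "web01"})
```

Events with no matching rule simply produce no response, which keeps the default behavior safe.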

  4. 4
    What is Auto-Remediation?
    Auto-Remediation builds on event driven automation. Outages and
    failures are automatically remediated based on events emitted by
    various systems within the environment.
    [Diagram: Host/Service; Alerting; Diagnostic Workflow; Remediation
    Workflow]

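
One way to picture the flow on this slide is an alert that first triggers a diagnostic workflow and only then a remediation workflow. The sketch below is an assumption-laden toy: `SERVICE_STATE`, `diagnose`, `remediate`, and `handle` are hypothetical names, and the "service" is simulated with a dictionary rather than a real host.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    host: str
    service: str


@dataclass
class Outcome:
    alert: Alert
    steps: List[str] = field(default_factory=list)
    remediated: bool = False


# Hypothetical service state standing in for real host checks.
SERVICE_STATE: Dict[str, str] = {"web01/nginx": "stopped", "web02/nginx": "running"}


def diagnose(alert: Alert) -> str:
    """Diagnostic workflow: gather facts before changing anything."""
    return SERVICE_STATE.get(f"{alert.host}/{alert.service}", "unknown")


def remediate(alert: Alert) -> None:
    """Remediation workflow: here, just 'restart' the simulated service."""
    SERVICE_STATE[f"{alert.host}/{alert.service}"] = "running"


def handle(alert: Alert) -> Outcome:
    """Diagnose first, remediate only known failures, otherwise escalate."""
    outcome = Outcome(alert)
    state = diagnose(alert)
    outcome.steps.append(f"diagnosed: {state}")
    if state == "stopped":
        remediate(alert)
        outcome.steps.append("restarted service")
        # Verify the fix by running the diagnostic again.
        outcome.remediated = diagnose(alert) == "running"
    else:
        outcome.steps.append("no known fix; escalate to a human")
    return outcome


fixed = handle(Alert("web01", "nginx"))
escalated = handle(Alert("db01", "postgres"))
```

Anything the diagnostic does not recognize is escalated rather than acted on.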

  5. 5
    Event Driven Automation Adoption Blockers
    Not many!

    Most organizations are open to the concept of event driven
    automation on a high level.
    Yet Another Tool/Process

    Well-established teams do not want to add yet another
    tool/process to their already full plate (Just one more wiki
    article to maintain).


  6. 6
    Auto-Remediation Adoption Blockers
    Fear of the Automation (#badauto)

    Everyone has a #badauto story (automation run amok).
    Yet Another Tool/Process

    Well-established teams do not want to add yet another
    tool/process to their already full plate (Just one more wiki
    article to maintain).
    Fear of the Change (in process)

    Status quo is comfortable
    Fear of becoming Obsolete

    There is a common misconception that automating IT
    Operations tasks will eliminate the need for certain jobs.


  7. 7
    Fear of the Automation
    When in a firefight, what is the first thing most operators do?
    Turn Off The Automation
    But, Why?


  8. 8
    Trust (not Fear of) the Automation
    We must learn to trust the automation, not suspect it first.
    Automation needs to earn our trust.
    How?

    Facilitated Troubleshooting

    Full Collaboration with the end users

    Human Controls (a.k.a. Nuclear Launch Codes)

    Audit Audit Audit

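
The "Human Controls" and "Audit Audit Audit" points could look something like the sketch below: risky actions are refused unless a named human approves them, and every attempt is recorded. The function `run_action`, the `AuditEntry` record, and the action names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AuditEntry:
    action: str
    approved_by: Optional[str]
    status: str


# Hypothetical in-memory audit trail; a real one would be durable storage.
AUDIT_LOG: List[AuditEntry] = []


def run_action(name: str, action: Callable[[], str], risky: bool = False,
               approved_by: Optional[str] = None) -> Optional[str]:
    """Run an action, refusing risky ones without a named human approver."""
    if risky and approved_by is None:
        AUDIT_LOG.append(AuditEntry(name, None, "blocked: approval required"))
        return None
    result = action()
    AUDIT_LOG.append(AuditEntry(name, approved_by, "executed"))
    return result


blocked = run_action("failover_database", lambda: "failed over", risky=True)
allowed = run_action("failover_database", lambda: "failed over",
                     risky=True, approved_by="oncall-lead")
routine = run_action("rotate_logs", lambda: "rotated")
```

Because blocked attempts are logged too, the audit trail shows what the automation wanted to do as well as what it did.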

  9. 9
    Trust in (not Fear of) the Change
    Our processes are only as good as the effort put into them.
    How?

    Living Documentation

    Peer Review

    Participate

    Be Ready to Change Again


  10. 10
    Fear of Becoming Obsolete
    Just Don't.


  11. 11
    Let's Talk Details
    How to Get Started

    Find the right tools.

    Peer Review Everything

    Don't Overcomplicate Things

    Don't throw away your old tools, but be ready to throw away
    (most of) your process


  12. 12
    Find the Right Tools

    Workflow is critical. Look for a good workflow engine with an
    easy-to-use DSL.

    Inventory and service discovery will be critical in workflow
    construction

    All tools must allow you to version control their content and
    configuration. Version control the world!


  13. 13
    Peer Review Everything

    Proper tooling that can be version controlled makes peer
    review infinitely easier

    Just like code, all processes need to be peer reviewed.
    Formalized reviews allow organizations to iterate across their
    process and improve much faster than they could otherwise


  14. 14
    Don't Overcomplicate Things

    If there is an easier route to the remediation, take it.

    Let the automation take care of the low hanging fruit.

    The automations will refine and improve over time.


  15. 15
    Don't throw away your old tools
    *but be ready to throw away (most of) your process

    Your new fancy workflow tool isn't meant to replace your
    existing tools (configuration management for example)

    Let the workflow orchestrate the tools you already have

    Your processes will change.

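
As a rough illustration of letting a workflow orchestrate existing tools, the sketch below runs a list of external commands in order and stops at the first failure. The step names are hypothetical, and the commands are harmless Python one-liners standing in for real tools such as a configuration management run.

```python
import subprocess
import sys
from typing import List, Tuple

Step = Tuple[str, List[str]]

# Hypothetical stand-ins for existing tools; real steps might invoke a
# configuration management run or a monitoring CLI instead.
STEPS: List[Step] = [
    ("check_service", [sys.executable, "-c", "print('service ok')"]),
    ("apply_config", [sys.executable, "-c", "print('config applied')"]),
]


def run_workflow(steps: List[Step]) -> List[Tuple[str, int, str]]:
    """Run each tool in order, stopping at the first non-zero exit code."""
    results = []
    for name, command in steps:
        proc = subprocess.run(command, capture_output=True, text=True)
        results.append((name, proc.returncode, proc.stdout.strip()))
        if proc.returncode != 0:
            break
    return results


results = run_workflow(STEPS)
```

The workflow only sequences the tools and records their results; the tools themselves stay unchanged.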

  16. StackStorm

    StackStorm is an open source project aimed at providing a unified
    control plane for all of your tools.

    Simple Workflow construction

    Easy integration with existing scripts & tools (1,000+ integrations in
    the StackStorm community repos)

    http://docs.stackstorm.com

    IRC: Freenode#stackstorm

    [email protected]
