ChatOps for Incidents

A culmination of many previous presentations, given for Rx Savings Solutions in Overland Park, a company currently going through explosive growth.

Aaron Blythe

October 05, 2018

Transcript

  1. 6.
  2. 8.
  3. 10.

    Best Practices
    • Say who you are and what team you are on
    • Threaded Conversations
    • Set Away Settings
    • Star your favorite Channels
    • @ mention people
    • Manage Notifications - Preferences
  4. 11.

    Simple Features
    • Star vs. Pinned
    • View Members
    • Reactions
    • Activity (the @ sign in upper right)
    • Snooze notifications
    • All Threads
  5. 12.

    Neat Features
    • Simple Slack Bot - Wifi, Apples -> How do you like them apples?
    • Integrations - GitLab pipeline
    • I would like to add the Jira and Dynatrace integrations soon
    • Alexa connection for PagerDuty and New Relic
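A keyword bot like the one on this slide can be sketched in a few lines. This is a minimal illustration, not the bot from the talk: the `RESPONSES` table and the `reply_for` function are hypothetical names, the Wifi reply is a placeholder, and the Slack connection itself is left out.

```python
import re
from typing import Optional

# Hypothetical trigger -> reply table. Only the "apples" pair comes from the
# slide; the wifi reply is a placeholder, not real content from the talk.
RESPONSES = {
    "apples": "How do you like them apples?",
    "wifi": "(wifi details would go here)",
}


def reply_for(message: str) -> Optional[str]:
    """Return the canned reply for the first trigger word found, else None."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for trigger, reply in RESPONSES.items():
        if trigger in words:
            return reply
    return None
```

Matching on whole words rather than substrings keeps the bot from answering when a trigger appears inside a longer word.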
  6. 14.

    Maturity with Chat
    1. Some teams start getting accustomed to chat
    2. Design and/or long-term technical topic channels form
    3. Useful apps start to be installed that behave like command line apps
    4. Incidents are being run on public channels
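One way to picture step 3, apps that behave like command line apps, is a chat message parsed the same way a shell parses a command line. The sketch below assumes a hypothetical `!` prefix and a hypothetical `parse_command` helper. It uses only the standard library and does not talk to any chat service.

```python
import shlex
from typing import List, Optional, Tuple


def parse_command(text: str) -> Optional[Tuple[str, List[str]]]:
    """Split a chat message into (command, args) using shell-style quoting.

    Returns None for ordinary chat messages. The "!" prefix is an assumption
    made for this sketch, not a convention taken from the talk.
    """
    try:
        tokens = shlex.split(text)
    except ValueError:
        # Unbalanced quotes (for example a stray apostrophe): treat as plain chat.
        return None
    if not tokens or not tokens[0].startswith("!") or len(tokens[0]) == 1:
        return None
    return tokens[0][1:], tokens[1:]
```

For example, `parse_command('!incident open "database is down"')` yields the command `incident` with the arguments `open` and `database is down`.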
  7. 18.

    Things that annoy me in Incidents
    • Trying to figure out what has been done so far (if I don’t know yet)
    • Re-hashing what we know (if I am the one that knows)
  8. 20.

    Matty Stratton - PagerDuty, Arrested DevOps Podcast
    Incidents and Accidents: https://noti.st/mattstratton/rZ8NCv
    • Have clearly defined roles
    • Rules change when you go from Normal to Emergency
    • Post incident criteria widely -> Do not litigate during the call

  9. 22.

    Service Level (from Google SRE Book)
    • SLI - Service Level Indicator
      • Example: Status code is 200
    • SLO - Service Level Objective
      • Example: Service is available 99.9%
    • SLA - Service Level Agreement
      • Example: Partial subscription fee refunded if 99% availability is not met
      • NOTE: The SLA threshold should be lower than the SLO so you have a buffer
    • See: https://twitter.com/rakyll/status/974826146343788544?lang=en
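The three terms can be tied together with a small calculation using the slide's example numbers (status code 200 as the indicator, 99.9% as the objective, 99% as the agreement). The function names and the three status labels below are hypothetical and chosen only for this sketch.

```python
from typing import Sequence

SLO = 0.999  # objective from the slide: service is available 99.9%
SLA = 0.99   # agreement from the slide: refund if 99% availability is not met


def availability_sli(status_codes: Sequence[int]) -> float:
    """SLI: fraction of responses whose status code is 200."""
    if not status_codes:
        raise ValueError("no requests observed")
    good = sum(1 for code in status_codes if code == 200)
    return good / len(status_codes)


def classify(status_codes: Sequence[int]) -> str:
    """Compare the measured SLI against the SLO and the looser SLA."""
    sli = availability_sli(status_codes)
    if sli < SLA:
        return "sla_breached"  # contractual consequence (e.g. partial refund)
    if sli < SLO:
        return "slo_breached"  # internal target missed, SLA still met
    return "ok"
```

The gap between 99.9% and 99% is the space the note refers to: a team can miss its own objective and react before the contractual threshold is crossed.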
  10. 23.
  11. 24.
  12. 28.
  13. 30.

    Slack Integrations
    • Benefits
      • Quick Setup (minimal configuration)
      • Often managed by company that owns integration
    • Drawbacks
      • Often simplistic workflow
  14. 31.
  15. 35.
  16. 47.
  17. 52.

    SLAPI Bot (Slack API)
    • Why?
      • Take advantage of Slack API (Hubot is least common denominator)
      • Language agnostic plugins
      • Docker as packaging system
    • https://github.com/ImperialLabs/slapi
  18. 54.

    • Practice in a non-stressful situation
    • In an incident - don’t use “can someone…?”
    • In a post-mortem - stop on each “could have”, “should have” or “would have”
    • Continuous Improvement just like any other part of Software Engineering
    • Automate the Mundane
    • Spending time saves time
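In the spirit of automating the mundane, the post-mortem rule above could be checked mechanically on a written draft. This is only a sketch under that assumption: `flag_counterfactuals` is a hypothetical helper, and the pattern also matches the contracted forms ("should've"), which the slide does not mention.

```python
import re
from typing import List, Tuple

# Matches "could have", "should have", "would have" and their 've contractions.
COUNTERFACTUAL = re.compile(
    r"\b(?:could|should|would)(?:\s+have|['’]ve)\b", re.IGNORECASE
)


def flag_counterfactuals(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs worth stopping on in a post-mortem draft."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if COUNTERFACTUAL.search(line)
    ]
```

Each flagged line is a prompt for discussion rather than an error: the point is to pause and restate what was actually known at the time.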