ChatOps at Shopify: Inviting Bots in Our Day-to-day Operations

ChatOps has already been identified as instrumental for DevOps success. In this talk, I will describe how we use chatbots to accelerate developer onboarding, increase developer productivity, and manage service disruption incidents. ChatOps is about bringing tools into your conversations and using them to interact with the infrastructure. It traditionally combines a chatbot, key plugins, and scripts. I will describe how we integrate these to perform infrastructure actions such as rebalancing traffic and querying the infrastructure state.

Daniella Niyonkuru

November 01, 2017

Transcript

  1. Adding Commands

    # defining a new command
    command :find_answer, 'answer', help: 'the answer to life, universe, and everything'
    def find_answer
      reply(42)
    end

    $ spy find_answer
    42
    $ spy help find_answer
    the answer to life, universe, and everything
  2. A Global Scale Resilient Web App

    [Architecture diagram of two regions. Labels: The Internet, CDN, Edge Router, Load Balancers, Web Server hosts, Job Server hosts, DB Writer, DB Reader, DB Standby.]
  3. [Architecture diagram. Labels: Region, Load Balancers, Web Server hosts, Pods (Pod 2, Pod 5, Pod 9, Pod N), Shared Workers, Dedicated Workers, Redis, Memcache, MySQL, Active, Passive.]
  4. Incident Response

    ➔ Shit breaks
    ➔ Detection
    ➔ Start Incident
    ➔ Communicate
    ➔ Fix
    ➔ Stop Incident
    ➔ Document (Service Disruption)
    ➔ Investigation
    ➔ Root Cause Analysis (RCA)
    ➔ Action Items
    ➔ Resolution
  5. Third Party Services

    ➔ spy status
    ➔ spy status :provider :status for :feature
    ➔ spy pager imoc res 123
  6. Reminders

    - when: [30, stop]
      command: :check_status_page
    - when: 120
      command: :notify_support_atc
      message: 'Spy has notified the Support Response Manager (SRM) on your behalf.'
    - when: 120
      command: :srm_fill_out_doc
    - when: 300
      message: 'You should coordinate external comms with the support incident responder.'
    - when: 600
      command: :srm_checking_in
    - when: [3600]
      command: :notify_imoc_team
    - when: stop
      message: 'Please create a Service Disruptions report.'
  7. Hit The Ground Running

    • spy github add user :user :team
    • spy circle add my_new_shiny_project
    • spy buildkite add my_new_shiny_repo
    • spy shipit lock :stack *message
  8. • Increased sharing and focus
    • Shortened feedback loop
    • Eliminated manual toil
    • Smoother incident handling
    • Faster onboarding experience

    But, we have also learned ...
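
The command definition on the "Adding Commands" slide can be made concrete with a small, self-contained sketch. This is not the bot's real implementation: `MiniSpy`, its `handle` method, and the way replies are collected are all hypothetical, and the slide does not say what the second argument (`'answer'`) means, so the sketch simply stores it. It only illustrates how a class-level `command` macro could register a handler and its help text so that both `spy find_answer` and `spy help find_answer` work.

```ruby
# Toy stand-in for a chat bot. MiniSpy and all of its internals are
# hypothetical; only the `command ... help:` shape comes from the slide.
class MiniSpy
  Command = Struct.new(:name, :label, :help)

  # Registry of commands declared with the class-level `command` macro.
  def self.commands
    @commands ||= {}
  end

  # Mirrors the slide's `command :find_answer, 'answer', help: '...'` call.
  # The meaning of the second argument is not stated, so it is stored as-is.
  def self.command(name, label, help:)
    commands[name] = Command.new(name, label, help)
  end

  command :find_answer, 'answer', help: 'the answer to life, universe, and everything'

  def find_answer
    reply(42)
  end

  # Handle one chat line (without the leading "spy") and return the replies.
  def handle(line)
    @replies = []
    word, arg = line.split
    return @replies if word.nil?

    if word == 'help'
      cmd = self.class.commands[arg&.to_sym]
      reply(cmd ? cmd.help : "unknown command: #{arg}")
    elsif self.class.commands.key?(word.to_sym)
      # Only registered names are dispatched, never arbitrary methods.
      public_send(word)
    else
      reply("unknown command: #{word}")
    end
    @replies
  end

  private

  # Collect a reply; a real bot would post it to the chat room instead.
  def reply(message)
    @replies << message.to_s
  end
end

bot = MiniSpy.new
puts bot.handle('find_answer')
puts bot.handle('help find_answer')
```

Keeping the help text next to the handler means the bot can answer `help` requests from the same registry it uses for dispatch, so the two cannot drift apart.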
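
The schedule on the "Reminders" slide can be read as a list of triggers, each paired with either a command to run or a message to post. The sketch below is one possible interpretation, not the bot's actual logic: it assumes `when` values are seconds since the incident started and that `stop` fires when the incident is stopped (the slide states neither), and the names `REMINDERS`, `reminders_due`, and `describe` are hypothetical. Only a subset of the slide's entries is reproduced.

```ruby
# Subset of the slide's schedule as plain Ruby data. Assumption: numeric
# `when` values are seconds since the incident started, and :stop means
# "when the incident is stopped".
REMINDERS = [
  { when: [30, :stop], command: :check_status_page },
  { when: 120, command: :notify_support_atc },
  { when: 120, command: :srm_fill_out_doc },
  { when: 300, message: 'You should coordinate external comms with the support incident responder.' },
  { when: :stop, message: 'Please create a Service Disruptions report.' }
].freeze

# Return the reminders whose trigger list contains `trigger`.
# `trigger` is either an elapsed time in seconds or :stop.
def reminders_due(reminders, trigger)
  reminders.select { |reminder| Array(reminder[:when]).include?(trigger) }
end

# Render a reminder as the action a bot would take for it.
def describe(reminder)
  if reminder[:command]
    "run #{reminder[:command]}"
  else
    "say #{reminder[:message]}"
  end
end

[30, 120, :stop].each do |trigger|
  reminders_due(REMINDERS, trigger).each do |reminder|
    puts "#{trigger}: #{describe(reminder)}"
  end
end
```

Wrapping `when` with `Array(...)` lets a single value and a list of values share one code path, which matches the slide's mix of `when: 120` and `when: [30, stop]`.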