RailsConf 2016 Talk - Your software is broken — pay attention: Rethinking production monitoring

Your team has been tasked with releasing new and better versions of your product at record speed. But the risk of moving quickly is that things break in production and users abandon your buggy app. To stay competitive, you can't just ship fast - you also have to solve for quality.

We'll rethink what it means to actively monitor your application in production so your team can ship fast with confidence. With the right tooling, workflow, and organizational structures, you don't have to sacrifice release times or stability. When things break, you'll be able to fix errors before they impact your users.

James Smith

May 06, 2016

Transcript

  1. RETHINKING PRODUCTION MONITORING YOUR SOFTWARE IS BROKEN — PAY ATTENTION

  2. JAMES SMITH @loopj loopj

  3. None

  4. CODE TEST DEPLOY YOLO ¯\_(ツ)_/¯

  5. CODE TEST DEPLOY YOLO ¯\_(ツ)_/¯ CODE TEST DEPLOY CONFIDENCE :)

  6. STABILITY PERFORMANCE AVAILABILITY

  7. DELIVERING AN AWESOME EXPERIENCE TO CUSTOMERS

  8. WHY MONITORING MATTERS

  9. YOUR APP WILL LIVE OR DIE BASED ON ITS QUALITY - CUSTOMERS HAVE A CHOICE

  10. 84% OF USERS ABANDON AFTER TWO CRASHES

  11. 49% OF ENGINEERING TIME FINDING & FIXING BUGS

  12. SINS OF PRODUCTION MONITORING: WHAT AM I DOING WRONG?

  13. 1. PRETENDING NOTHING IS WRONG

  14. “But I’ve written tests!” “The QA Team will check that!” “Works great for me!”

  15. 2. WAITING FOR CUSTOMERS TO COMPLAIN

  16. “Nobody complained so everything must be OK”

  17. 3. LACK OF VISIBILITY

  18. “We’ll just check the logs” “Did you remember to add a log statement?”

  19. 4. LACK OF OWNERSHIP

  20. “Not my problem!” “I’ve got a feature to ship” “My code works fine”

  21. HOW CAN WE DO BETTER?

  22. CORE PRINCIPLES OF PRODUCTION MONITORING: ACCEPT, AUTOMATE, AGGREGATE, NOTIFY, PRIORITIZE, DIAGNOSE, TEND

  23. 1. ACCEPT: ACCEPT THAT YOUR SOFTWARE WILL BREAK AFTER SHIPPING

  24. 2. AUTOMATE: ADD HOOKS TO DETECT CRASHES/ERRORS/ISSUES IN PRODUCTION

  25. 3. AGGREGATE: DON'T JUST HAVE A STREAM OF EVENTS - GROUP LIKE ISSUES TOGETHER

  26. 4. NOTIFY: ALERT YOUR DEV TEAM WHERE THEY ALREADY COMMUNICATE

  27. 5. PRIORITIZE: YOU CAN'T FIX EVERY ERROR - SO FOCUS ON THE MOST HARMFUL ONES

  28. 6. DIAGNOSE: KNOWING ABOUT ISSUES ISN'T ENOUGH - THEY MUST BE ACTIONABLE

  29. 7. TEND: MAKE AN ORGANIZATIONAL CHANGE - SOMEONE NEEDS TO CARE ABOUT ERRORS

  30. TAKING ACTION

  31. TOOLS

  32. USES “FAILURE” HOOKS

  33. ASSESS IMPACT

  34. ASSESS SEVERITY

  35. CAPTURES DIAGNOSTIC DATA

  36. WORKFLOW

  37. USE TEAM CHAT

  38. EMBRACE COLLABORATION

  39. TRACK PROGRESS OF FIXES

  40. TEAM STRUCTURES

  41. EMBRACE RAPID ITERATION

  42. CREATE A “BUG TEAM”

  43. OR CREATE A “BUG ROTATION”

  44. OR KNOW “WHO LAST TOUCHED THIS CODE”?

  45. TL;DR

  46. AVOID THE SINS

  47. EMBRACE CORE PRINCIPLES

  48. TAKE ACTION

  49. THANK YOU!

  50. QUESTIONS? @loopj 2 months free bugsnag @ bugsnag.com/p/railsconf2016 come and say hi at our booth - grab a rails pin or sticker
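
The AUTOMATE, AGGREGATE, and PRIORITIZE principles (slides 24, 25, and 27) can be illustrated with a small Ruby sketch. This is not code from the talk and not how any particular monitoring product works: `ErrorAggregator`, `monitor`, `notify`, and `prioritized` are hypothetical names, and grouping by exception class plus top stack frame, then ranking by affected users, is only one assumed heuristic.

```ruby
# Hypothetical in-memory error aggregator (illustrative only).
class ErrorAggregator
  # One group per distinct (exception class, top stack frame) pair.
  Group = Struct.new(:error_class, :location, :count, :users)

  def initialize
    @groups = {}
  end

  # AUTOMATE: a "failure hook" that wraps a unit of work, records any
  # StandardError it raises, and re-raises so normal handling still runs.
  def monitor(user_id: nil)
    yield
  rescue StandardError => e
    notify(e, user_id: user_id)
    raise
  end

  # AGGREGATE: errors with the same class and top stack frame are counted
  # in the same group instead of being kept as a flat stream of events.
  def notify(exception, user_id: nil)
    location = Array(exception.backtrace).first.to_s
    key = [exception.class.name, location]
    group = (@groups[key] ||= Group.new(exception.class.name, location, 0, []))
    group.count += 1
    group.users << user_id unless user_id.nil? || group.users.include?(user_id)
    group
  end

  # PRIORITIZE: groups affecting the most distinct users come first,
  # with total occurrences as the tie-breaker.
  def prioritized
    @groups.values.sort_by { |g| [-g.users.size, -g.count] }
  end
end
```

In a request handler this might be called as `aggregator.monitor(user_id: current_user_id) { handle_request }`, where `current_user_id` and `handle_request` are placeholders for application code. A real setup would also persist groups and send notifications, which this sketch omits.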