A Practical Taxonomy of Bugs and How to Squash Them - RubyConf Italy 2016

Kylie
November 24, 2016

The slides I used to accompany my presentation "A Practical Taxonomy of Bugs and How to Squash Them" at RubyConf Italy 2016. Happy squashing!

Transcript

  1. 4.
  2. 6.

    “Whenever I see something like this happening, the first thing I do is scan the logs to see if this process is completing or is sending a weird message.”
  3. 8.
  4. 10.

    Research Methods
    • containment sometimes takes priority over squashing
    • we can only work with facts
    • we can’t squash every bug in this talk
  5. 19.

    Observable Attributes
    is the bug observable in production?
    can it be reproduced locally?
    does it seem to be restricted to one area?
  6. 24.

    Reproduction & Resolution
    replicate locally and in test
    write the simple solution
    rewrite to be highly readable and extendable
    UPSETTINGLY OBSERVABLE
  7. 28.

    Observable Attributes
    how does this work?
    does this work?
    wait, what is this even testing?
    did this ever work?
  8. 30.

    Schrödinbug
    Likes to pretend to be working code. On close inspection, reveals itself to be a bug.
    UPSETTINGLY OBSERVABLE
  9. 35.

    Reproduction & Resolution
    reproduce the “broken” state locally and in test
    add log statements until you can verify what causes the broken state
    if the bug did work at some point, find the point at which it did work
    write tests to represent the configuration and flow of the fixed state
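The "add log statements" step can be sketched in Ruby with the standard library's `Logger`. Everything in this example is hypothetical and not taken from the talk: the `shipping_cost` method, the `FLAT_RATE` constant, and the string-versus-symbol key mix-up are made up to show code that returns a plausible value on every call and only reveals itself as broken once its inputs are logged.

```ruby
require "logger"
require "stringio"

FLAT_RATE = 5.0 # hypothetical fallback price

# Hypothetical method under suspicion. It returns a plausible number for
# every order, which is why it could pass for working code.
def shipping_cost(order, logger:)
  weight = order[:weight]
  logger.debug { "shipping_cost: keys=#{order.keys.inspect} weight=#{weight.inspect}" }

  if weight.nil?
    logger.warn { "shipping_cost: no weight found, falling back to flat rate" }
    return FLAT_RATE
  end

  FLAT_RATE + weight * 0.5
end

# Capture the log in memory so a test can inspect it.
log_output = StringIO.new
logger = Logger.new(log_output)
logger.level = Logger::DEBUG

# Reproduce the "broken" state: the caller passes string keys,
# so the symbol lookup silently finds nothing.
cost = shipping_cost({ "weight" => 12 }, logger: logger)
puts cost
puts log_output.string
```

Once the log shows `weight=nil` for an order that clearly has a weight, the cause of the broken state is verified, and a test with a symbol-keyed order can pin down the intended behaviour.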
  10. 45.

    Reproduction & Resolution
    use profiling to find the trigger state
    use the app (not fixtures or DB manipulation) to get the data in this state
    recreate that state in test
    follow Bohrbug instructions
  11. 54.

    “The bug is huge and everywhere at once. SQL: could not connect to server: Connection refused was bubbling up all over the place. Jobs won’t run, emails won’t send, every submit button on the site fatal errored.”
    on-call log, 24 June 2014
    WILDLY CHAOTIC
  12. 56.

    Reproduction & Resolution
    attempt to connect to server & view logs
    use df -h to find if all the storage is being used
    can that be restarted, rotated or killed at this time?
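The storage check above can also be scripted. This is a minimal sketch, not something shown in the talk: it assumes the POSIX output format of `df -P` (easier to parse than the human-readable `df -h`), and the `full_filesystems` helper, the 90% threshold, and the sample output are all hypothetical. Filesystem names containing spaces would break the simple column split used here.

```ruby
# Hypothetical helper: takes the text printed by `df -P` and returns the
# mount points whose usage is at or above the threshold (in percent).
def full_filesystems(df_output, threshold: 90)
  df_output.lines.drop(1).filter_map do |line| # drop(1) skips the header row
    fields = line.split
    next if fields.size < 6 # ignore blank or malformed lines

    percent = fields[4].delete("%").to_i
    mount = fields[5..-1].join(" ") # mount points may contain spaces
    { mount: mount, percent: percent } if percent >= threshold
  end
end

# Made-up sample output in the `df -P` column layout.
SAMPLE_DF = <<~DF
  Filesystem     1024-blocks     Used Available Capacity Mounted on
  /dev/sda1         41152736 41152736         0     100% /
  tmpfs              2014548        0   2014548       0% /dev/shm
DF

p full_filesystems(SAMPLE_DF)
```

On a live server the input could come from running the command itself, for example by passing the result of `df -P` captured with backticks, assuming the shell is still responsive enough to run it.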
  13. 59.
  14. 67.

    Observe & Classify
    Verify with logging and time travel
    Verify without changing state by profiling
    Use Linux server tools to observe entire process
  15. 69.

    Resources & Further Study
    • “Linux Debugging Tools I Love”, Julia Evans
    • Systems Performance, Brendan Gregg
    • Site Reliability Engineering, Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Richard Murphy
    • “Why Do Computers Stop and What Can Be Done About It?”, Jim Gray
    • “Debug Patterns for Efficient High-level SystemC Debugging”, Frank Rogin, Erhard Fehlauer, Christian Haufe, Sebastian Ohnewald
  16. 70.