A Practical Taxonomy of Bugs and How to Squash Them - We Rise 2017

These are the slides I used for We Rise Tech, Women Who Code Atlanta's first conference.

Kylie

June 23, 2017

Transcript

  1. “Whenever I see something like this happening, the first thing I do is scan the logs to see if this process is completing or is sending a weird message.”
  3. Research Methods
    • containment sometimes takes priority over squashing
    • we can only work with facts
    • we can’t squash every bug in this talk
  4. Observable Attributes
    • is the bug observable in production?
    • can it be reproduced locally?
    • does it seem to be restricted to one area?
  5. Reproduction & Resolution
    • replicate locally and in test
    • write the simple solution
    • rewrite to be highly readable and extendable
    UPSETTINGLY OBSERVABLE
  6. Observable Attributes
    • how does this work?
    • does this work?
    • wait, what is this even testing?
    • did this ever work?
  7. Schrödinbug
    Likes to pretend to be working code. On close inspection, reveals itself to be a bug.
    UPSETTINGLY OBSERVABLE
  8. Schrödinbug, Type I
    In the wild: UI shows update but database entry not updated.
    UPSETTINGLY OBSERVABLE
  9. Basic Reproduction & Resolution
    • replicate locally and in test
    • write the simple solution
    • rewrite to be highly readable and extendable
    UPSETTINGLY OBSERVABLE
  12. Reproduction & Resolution
    • reproduce the “broken” state locally and in test
    • add log statements until you can verify what causes the broken state
    • if the bug did work at some point, find the point at which it did work
    • write tests to represent the configuration and flow of the fixed state
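The steps on this slide can be sketched with the Type I Schrödinbug from slide 8 (UI shows the update, database entry not updated). This is only an illustration: the `users` table, the `rename_user_buggy` and `rename_user_fixed` handlers, and the in-memory SQLite database are all hypothetical stand-ins, not code from the talk.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("rename_user")


def make_db():
    """Create a throwaway in-memory database with one user (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'old name')")
    conn.commit()
    return conn


def stored_name(conn, user_id):
    """Read the name back from the database, bypassing whatever the UI was told."""
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0]


def rename_user_buggy(conn, user_id, new_name):
    """Hypothetical broken handler: returns the new value for display, never writes it."""
    log.debug("rename requested: id=%s name=%r", user_id, new_name)
    return {"id": user_id, "name": new_name}


def rename_user_fixed(conn, user_id, new_name):
    """Hypothetical fixed handler: persists the change, then returns it for display."""
    log.debug("rename requested: id=%s name=%r", user_id, new_name)
    cur = conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    conn.commit()
    log.debug("rows updated: %s", cur.rowcount)
    return {"id": user_id, "name": new_name}


def test_rename_is_persisted():
    """Test that represents the fixed state: what is shown must match what is stored."""
    conn = make_db()
    shown = rename_user_fixed(conn, 1, "new name")
    assert shown["name"] == stored_name(conn, 1)
```

Running the same check against `rename_user_buggy` reproduces the broken state: the returned value says "new name" while `stored_name` still returns "old name".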
  15. Reproduction & Resolution
    • use profiling to find the trigger state
    • use the app (not fixtures or DB manipulation) to get the data in this state
    • recreate that state in test
    • follow bohrbug instructions
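The slide does not say which profiler was used. As one possible reading of "use profiling", here is a minimal sketch with Python's standard `cProfile` and `pstats` modules; `handle_request` and `slow_lookup` are hypothetical functions standing in for whatever code path is under suspicion.

```python
import cProfile
import io
import pstats


def slow_lookup(items, target):
    # Hypothetical hot spot: a linear scan over the whole list.
    return [item for item in items if item == target]


def handle_request(n):
    # Hypothetical code path under investigation.
    items = list(range(n))
    return sum(len(slow_lookup(items, t)) for t in range(0, n, 100))


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile_call(handle_request, 2000)
    print(report)
```

The report shows where time is spent without adding log statements to the code being observed, which is the appeal of profiling when changing the code may hide the bug.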
  16. “The bug is huge and everywhere at once. SQL: could not connect to server: Connection refused was bubbling up all over the place. Jobs won’t run, emails won’t send, every submit button on the site fatal errored.”
    on-call log
    WILDLY CHAOTIC
  18. Reproduction & Resolution
    • use df -h to find if all the storage is being used
    • attempt to connect to server & view logs
    • can that be restarted, rotated or killed at this time?
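The `df -h` check can also be scripted, for example as part of an on-call health check. The sketch below uses `shutil.disk_usage` from the Python standard library; the 90% threshold and the function names are assumptions made for illustration, and the percentage may differ slightly from what `df` prints because `df` accounts for reserved blocks.

```python
import shutil


def disk_report(path="/"):
    """Return total, used and free bytes for the filesystem holding path, plus percent used."""
    usage = shutil.disk_usage(path)
    percent_used = (usage.used / usage.total * 100) if usage.total else 0.0
    return {
        "total": usage.total,
        "used": usage.used,
        "free": usage.free,
        "percent_used": percent_used,
    }


def is_nearly_full(path="/", threshold_percent=90.0):
    """True when the filesystem is at or above the (hypothetical) alert threshold."""
    return disk_report(path)["percent_used"] >= threshold_percent


if __name__ == "__main__":
    report = disk_report("/")
    print(f"{report['percent_used']:.1f}% used, {report['free']} bytes free")
```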
  19. Observe & Classify
    • Verify with logging and time travel
    • Verify without changing state by profiling
    • Use server tools to observe entire process
  20. Resources & Further Study
    • “Linux Debugging Tools I Love”, Julia Evans
    • Systems Performance, Brendan Gregg
    • “Why Do Computers Stop and What Can Be Done About It?”, Jim Gray
    • “Debug Patterns for Efficient High-level SystemC Debugging”, Frank Rogin, Erhard Fehlauer, Christian Haufe, Sebastian Ohnewald