A Practical Taxonomy of Bugs and How to Squash Them - RubyConf Italy 2016

Kylie
November 24, 2016

The slides I used to accompany my presentation "A Practical Taxonomy of Bugs and How to Squash Them" at RubyConf Italy 2016. Happy squashing!

Transcript

  1. A PRACTICAL
    TAXONOMY OF BUGS
    AND HOW TO SQUASH THEM

  2. Table of Contents
    Instinctual Indications
    Research Methods
    Practical Taxonomy
    Bohrbug
    Schrödinbug
    Heisenbug
    Mandelbug
    Resources

  3. Debugging Skills

  4. “As you familiarize yourself with the
    application, you’ll build up some
    debugging instincts.”

  5. “Whenever I see something like this
    happening, the first thing I do is scan
    the logs to see if this process is
    completing or is sending a weird
    message.”

  6. Debugging Instincts

  7. “Whenever I see #{x}, I always check
    #{y}”

  8. Research Methods
    • containment sometimes takes priority over squashing
    • we can only work with facts
    • we can’t squash every bug in this talk

  9. Observable Attributes

  10. Warning: Contrived
    Scenarios Ahead

  11. A Practical Taxonomy of Bugs
    Upsettingly Observable
    Wildly Chaotic

  12. Upsettingly Observable

  13. Wildly Chaotic

  14. How to Squash Them

  15. upsettingly
    observable bug #1
    UPSETTINGLY
    OBSERVABLE

  16. Observable Attributes
    is the bug observable in production?
    can it be reproduced locally?
    does it seem to be restricted to one area?

  17. Bohrbug
    deterministic,
    highly
    reproducible

  18. Bohrbug
    Commonly found in
    code, sometimes on
    the server

  19. Bohrbug
    likes to hide in
    complex branching in
    functions, classes or
    config

  20. Bohrbug
    In the wild:
    validation

  21. Reproduction & Resolution
    replicate locally and in test
    write the simple solution
    rewrite to be highly readable and
    extendable
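Because a Bohrbug fails the same way on every run, the steps on this slide can be sketched in plain Ruby. This is a minimal illustration, not code from the talk: `valid_quantity_buggy?`, `valid_quantity?` and the 1 to 100 rule are hypothetical, standing in for validation logic with the kind of branching these bugs hide in.

```ruby
# Intended rule (hypothetical): integer quantities from 1 to 100 inclusive.
def valid_quantity_buggy?(qty)
  return false unless qty.is_a?(Integer)
  if qty > 0
    qty < 100 # bug: off-by-one, so exactly 100 is always rejected
  else
    false
  end
end

# Simple fix, rewritten so the code reads like the rule it enforces.
VALID_QUANTITIES = (1..100)

def valid_quantity?(qty)
  qty.is_a?(Integer) && VALID_QUANTITIES.cover?(qty)
end

# Reproduce: the same input fails the same way on every run.
[1, 50, 100].each do |qty|
  puts "#{qty}: buggy=#{valid_quantity_buggy?(qty)} fixed=#{valid_quantity?(qty)}"
end
```

The range constant is one way to make the rule readable and easy to extend; the same checks translate directly into a test-framework assertion once the bug is replicated in test.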

  22. Bohrbug

  23. Bohrbug

  24. upsettingly
    observable bug #2

  25. Observable Attributes
    how does this work?
    does this work?
    wait, what is this even testing?
    did this ever work?

  26. Schrödinbug
    stick-like body
    appendages
    look like
    twigs

  27. Schrödinbug
    Likes to
    pretend to be
    working code.
    On close
    inspection,
    reveals itself
    to be a bug.

  28. Schrödinbug
    In the wild:
    code that never
    worked

  29. Schrödinbug
    In the wild:
    it didn’t
    work how you
    thought it
    did
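One Ruby-specific way to write code that never worked is to rely on a value being falsy when it is not: only `nil` and `false` are falsy, so an empty array passes an `if`. A hypothetical sketch (`errors_for`, `submit_buggy` and `submit` are invented names) of a guard that looks correct until you inspect it closely:

```ruby
# Hypothetical form handling. In Ruby only nil and false are falsy,
# so an empty errors array still counts as "true" in an if.
def errors_for(params)
  errors = []
  errors << "name is required" if params[:name].to_s.empty?
  errors
end

def submit_buggy(params)
  if errors_for(params) # always truthy, even when it returns []
    :rejected
  else
    :accepted # never reached: this branch never worked
  end
end

# Fixed: ask the question the code meant to ask.
def submit(params)
  errors_for(params).empty? ? :accepted : :rejected
end

p submit_buggy({ name: "Ada" }) # => :rejected
p submit({ name: "Ada" })       # => :accepted
```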

  30. Logging as Verification
    Tool
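A minimal sketch of logging as a verification tool, using Ruby's standard `Logger` writing to a `StringIO` so the output can be read back. `apply_discount` and the order hash are hypothetical; the idea is that logging inputs and outputs turns a guess about the broken state into something you can read.

```ruby
require "logger"
require "stringio"

# Log to an in-memory buffer so the output can be inspected below.
LOG_OUTPUT = StringIO.new
LOG = Logger.new(LOG_OUTPUT)
LOG.formatter = proc { |severity, _time, _progname, msg| "#{severity} #{msg}\n" }

# Hypothetical code under suspicion, with log statements on the way in and out.
def apply_discount(order)
  LOG.debug("apply_discount in: total=#{order[:total]} discount=#{order[:discount]}")
  order[:total] -= order[:discount] if order[:discount]
  LOG.debug("apply_discount out: total=#{order[:total]}")
  order
end

apply_discount({ total: 100, discount: 120 })
puts LOG_OUTPUT.string
# The second line shows total=-20: the logs verify that a discount larger
# than the total is what produces the broken state.
```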

  31. Git Bisect
    Tool
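`git bisect` binary-searches history: you mark a good commit and a bad one (`git bisect good`, `git bisect bad`), and it checks out midpoints until it names the first bad commit; `git bisect run` can automate the checks with a script. The sketch below models only that search in Ruby over a hypothetical list of commit names, assuming the oldest entry is good and the newest is bad.

```ruby
# Models the search `git bisect` performs. commits is ordered oldest to
# newest; commits.first is assumed good and commits.last bad. The block
# answers "is this commit good?" and is called O(log n) times.
def first_bad_commit(commits)
  good = 0                 # index known to be good
  bad = commits.length - 1 # index known to be bad
  while bad - good > 1
    mid = (good + bad) / 2
    if yield(commits[mid])
      good = mid
    else
      bad = mid
    end
  end
  commits[bad]
end

# Hypothetical history in which commit "c5" introduced the bug.
history = %w[c1 c2 c3 c4 c5 c6 c7 c8]
checks = 0
culprit = first_bad_commit(history) do |commit|
  checks += 1
  history.index(commit) < history.index("c5")
end
puts "first bad commit: #{culprit} (#{checks} checks)"
```

Eight commits need only three checks here; this is why bisecting is worth it once you know the code did work at some point.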

  32. Reproduction & Resolution
    reproduce the “broken” state locally and
    in test
    add log statements until you can verify
    what causes the broken state
    if the code did work at some point, find
    the last point at which it worked
    write tests to represent the
    configuration and flow of the fixed
    state

  33. Schrödinbug

  34. Schrödinbug

  35. wildly chaotic bug #1
    WILDLY
    CHAOTIC

  36. Observable Attributes
    Does it appear non-deterministic?
    Does it seem to disappear once you
    observe or debug it?

  37. Heisenbug
    “now you see it, now you don’t”

  38. Heisenbug
    In the wild: a
    heisenbug that
    lives in code
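Randomness is one source of apparent non-determinism (timing and thread scheduling are others). A hypothetical sketch: `pick_featured_buggy` has an off-by-one that only bites when the random index lands one past the end. Passing the random source in as `rng:` means a crashing run can be replayed with the same seed instead of waiting for it to happen again. All names and data here are invented for illustration.

```ruby
# Hypothetical data; the bug is in the code, not in these records.
ITEMS = [{ name: "mug" }, { name: "sticker" }, { name: "tote" }].freeze

# Bug: rand(n + 1) can return n, one past the last index, so items[n] is
# nil and #fetch raises NoMethodError on roughly 1 call in 4 here.
def pick_featured_buggy(items, rng: Random.new)
  items[rng.rand(items.size + 1)].fetch(:name)
end

# Fix: let Array#sample choose, still using the injected random source.
def pick_featured(items, rng: Random.new)
  items.sample(random: rng).fetch(:name)
end

# True if a given seed makes the buggy version crash.
def crashes?(seed)
  pick_featured_buggy(ITEMS, rng: Random.new(seed))
  false
rescue NoMethodError
  true
end

# Search seeds for a crash, then replay it on demand.
failing_seed = (0..200).find { |seed| crashes?(seed) }
puts "seed #{failing_seed} crashes every time" if failing_seed
```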

  39. Heisenbug
    In the wild: a
    heisenbug that
    lives in data

  40. Profiling for Verification
    https://kcachegrind.github.io/html/CallgrindFormat.html
    Tool
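Callgrind-format output like the one linked is produced by full profilers and viewed in tools such as KCachegrind. As a much cruder stand-in using only Ruby core, `TracePoint` can count which Ruby-defined methods run without editing the code under study. `count_calls` and `PriceCalculator` are hypothetical names, and tracing adds overhead, so timings taken while it is enabled are not representative.

```ruby
# Crude call counter built on TracePoint (Ruby core). It observes which
# Ruby-defined methods run, and how often, without editing them.
def count_calls
  counts = Hash.new(0)
  trace = TracePoint.new(:call) do |tp|
    counts["#{tp.defined_class}##{tp.method_id}"] += 1
  end
  trace.enable { yield }
  counts
end

# Hypothetical code under observation.
class PriceCalculator
  def total(items)
    items.sum { |item| line_price(item) }
  end

  def line_price(item)
    item[:price] * item[:qty]
  end
end

calls = count_calls do
  PriceCalculator.new.total([{ price: 2, qty: 3 }, { price: 5, qty: 1 }])
end
calls.each { |name, n| puts "#{n}\t#{name}" }
```

The `:call` event only fires for methods written in Ruby, so built-ins such as `Array#sum` do not appear in the counts.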

  41. FLAME GRAPHS
    http://www.brendangregg.com/FlameGraphs/cpu-mysql-updated.svg
    Tool
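Flame graphs are commonly generated from "folded" stacks: one line per unique stack, frames joined root to leaf with `;`, then a space and the number of samples, which is the input format of Brendan Gregg's `flamegraph.pl`. A small Ruby sketch of that folding step; the stack samples are hypothetical, and collecting real samples (for example with a sampling profiler) is out of scope here.

```ruby
# Fold sampled stacks (each listed root first) into flamegraph.pl's input
# format: "frame;frame;frame count", one line per unique stack.
def fold_stacks(samples)
  samples.tally.sort.map { |stack, count| "#{stack.join(';')} #{count}" }
end

# Hypothetical samples: each array is one snapshot of the call stack.
samples = [
  %w[main handle_request render_view],
  %w[main handle_request render_view],
  %w[main handle_request query_db],
  %w[main gc],
]
puts fold_stacks(samples)
```

In the rendered graph each frame's width is proportional to how many samples include it, so `render_view` would be twice as wide as `query_db`.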

  42. Reproduction & Resolution
    use profiling to find the trigger state
    use the app (not fixtures or DB
    manipulation) to get the data in this
    state
    recreate that state in test
    follow the Bohrbug instructions
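Once the app itself has produced the trigger state, that exact state can be pinned down in a test before following the Bohrbug steps. A hypothetical sketch: `User`, `display_name_buggy` and `display_name` are invented, and the nil `last_name` stands in for whatever state profiling pointed at.

```ruby
# Hypothetical record; keyword_init lets the test spell out each field.
User = Struct.new(:first_name, :last_name, keyword_init: true)

# Buggy: assumes every user has a last name.
def display_name_buggy(user)
  "#{user.first_name} #{user.last_name[0]}." # NoMethodError when nil
end

# Fixed: fall back to the first name when there is no last name.
def display_name(user)
  initial = user.last_name.to_s[0]
  initial ? "#{user.first_name} #{initial}." : user.first_name
end

# The trigger state, copied from a record the app itself produced.
TRIGGER_USER = User.new(first_name: "Ada", last_name: nil)

p display_name(TRIGGER_USER)                                       # => "Ada"
p display_name(User.new(first_name: "Ada", last_name: "Lovelace")) # => "Ada L."
```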

  43. Heisenbug

  44. Heisenbug

  45. wildly chaotic bug #2

  46. Observable Attributes
    is everything broken?
    all of it?
    send help??

  47. Mandelbug

  48. Mandelbug
    seems like
    everything
    is broken
    at once

  49. Mandelbug
    people are
    very upset
    with you

  50. Mandelbug
    likely an
    issue with
    your
    system, not
    code

  51. “The bug is huge and everywhere at once.
    SQL: could not connect to server: Connection refused was
    bubbling up all over the place.
    Jobs won’t run, emails won’t send, every submit button on
    the site fatal errored.”
    on-call log 24 June 2014

  52. Disk Usage
    Tool
    df -h

  53. Reproduction & Resolution
    attempt to connect to the server & view the logs
    use df -h to check whether all the storage is
    being used
    if so, can whatever is filling it be restarted,
    rotated or killed at this time?
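Before restarting, rotating or killing anything, the disk check itself can be scripted. A minimal sketch, assuming a POSIX `df`: it parses `df -P` output (used instead of `df -h` because `-P` selects the POSIX format, one line per filesystem with a fixed column order) and reports mounts at or above a threshold. The method name, the 90% threshold and the sample output are illustrative, not from the talk.

```ruby
# Parse POSIX-format `df -P` output; return [mount, percent_used] pairs for
# filesystems at or above the threshold (90% is an arbitrary example).
def full_filesystems(df_output, threshold: 90)
  df_output.lines.drop(1).filter_map do |line| # drop the header row
    fields = line.split
    next if fields.size < 6
    used_pct = fields[4].delete("%").to_i
    mount = fields[5..].join(" ") # mount points may contain spaces
    [mount, used_pct] if used_pct >= threshold
  end
end

# On a live server you might feed it real output:
#   full_filesystems(`df -P`)
# Hypothetical sample output for illustration:
sample = <<~DF
  Filesystem     1024-blocks     Used Available Capacity Mounted on
  /dev/sda1         41152736 41152736         0     100% /
  /dev/sdb1        103081248 20616250  77205752      22% /data
DF
p full_filesystems(sample) # => [["/", 100]]
```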

  54. Mandelbug

  55. Mandelbug

  56. A Practical Taxonomy of Bugs
    Upsettingly Observable: bohrbug, schrödinbug
    Wildly Chaotic: heisenbug, mandelbug

  57. “Debugging Instincts”

  58. “Debugging Instincts”

  59. Debugging Skills

  60. Observe & Classify

  61. Verify with logging and time travel

  62. Verify without
    changing state by
    profiling

  63. Use Linux server tools to observe the entire
    process

  64. Observe & Classify
    Verify with logging and time travel
    Verify without
    changing state by
    profiling
    Use Linux server tools to observe the entire
    process

  65. Build Up Your Own
    Toolkit and Share it

  66. Resources & Further Study
    • “Linux Debugging Tools I Love”, Julia Evans
    • Systems Performance, Brendan Gregg
    • Site Reliability Engineering, Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Richard Murphy
    • “Why Do Computers Stop and What Can Be Done About It?”, Jim Gray
    • “Debug Patterns for Efficient High-level SystemC Debugging”, Frank Rogin, Erhard Fehlauer, Christian Haufe, Sebastian Ohnewald
