How to Debug Anything - DevOpsDay PGH

Sam Kottler

August 13, 2015

Transcript

  1. How to Debug
    Anything
    @samkottler

  2. About me
    • Engineering manager @ DigitalOcean
    • Formerly of Red Hat’s virtualization team
    • Ruby core security
    • CentOS/Icinga/Bundler/Rubygems/Ansible

  3. tl;dr I professionally debug
    software and systems.

  4. Most software is
    pretty bad.

  5. Operating systems
    are no exception.

  6. Verifying correctness
    on changing systems
    is impossible.

  7. Turtles all the way down:
    • Unreliable hardware
    • Unreliable devices
    • Unreliable kernels
    • Unreliable drivers
    • Unreliable libraries
    • Unreliable runtimes

  8. 8


  9. Bugs happen. Tools help you find
    and fix them quickly.

  10. But first, let's talk
    process.

  11. 1. Suspend your disbelief
    "This isn't even possible!"

  12. 2. Reproduce the issue
    Without being able to create the bad condition, it's impossible to
    fix it.

  13. 3. Compare with healthy systems
    Sometimes simple or seemingly insignificant changes can lead to
    major issues.

  14. 4. Figure out what's wrong.
    This is probably gonna take a while.

  15. 5. Fix the problem
    Almost always easier than figuring out the problem because of
    the understanding built while debugging.

  16. Let's jump back to step #4

  17. Bugs happen. Tools help you find
    and fix them quickly.

  18. require "otherthing"
    module Derp
      class Yerp
        def self.string
          "derping in the USA"
        end
      end
    end
    puts Derp::Yerp.string

  19. `require': cannot load such file -- otherthing (LoadError)
    from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from test.rb:1:in `<main>'

  20. open("/usr/lib/ruby/1.9.1/rubygems/path_support.rb", O_RDONLY) = 5
    fstat(5, {st_mode=S_IFREG|0644, st_size=1538, ...}) = 0
    fstat(5, {st_mode=S_IFREG|0644, st_size=1538, ...}) = 0
    close(5) = 0
    getuid() = 10100
    geteuid() = 10100
    getgid() = 10000
    getegid() = 10000
    open("/usr/lib/ruby/1.9.1/rubygems/path_support.rb", O_RDONLY) = 5
    fstat(5, {st_mode=S_IFREG|0644, st_size=1538, ...}) = 0
    ioctl(5, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7ffc21fe8df0) = -1 ENOTTY (Inappropriate ioctl for device)
    fstat(5, {st_mode=S_IFREG|0644, st_size=1538, ...}) = 0
    read(5, "##\n# Gem::PathSupport facilitate"..., 8192) = 1538
    read(5, "", 8192) = 0
    close(5) = 0
    lstat("/usr", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    lstat("/usr/lib", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
    lstat("/usr/lib/ruby", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    lstat("/usr/lib/ruby/1.9.1", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    lstat("/usr/lib/ruby/1.9.1/rubygems", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    lstat("/usr/lib/ruby/1.9.1/rubygems/path_support.rb", {st_mode=S_IFREG|0644, st_size=1538, ...}) = 0
    brk(0x1719000) = 0x1719000
    stat("/home/skottler", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    openat(AT_FDCWD, "/var/lib/gems/1.9.1/specifications", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/home/skottler/.gem/ruby/1.9.1/specifications", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    open("/proc/self/maps", O_RDONLY|O_CLOEXEC) = 5

  21. Reading strace output is easy!
    fstat(5, {st_mode=S_IFREG|0644, st_size=1538, ...}) = 0

  22. Let's break it down:
    syscall name(file descriptor, {...args...}) = return value

  23. System calls are a unified
    interface to debug userspace

  24. Debugging a crash

  25. #include <string.h>
    int main () {
        int foo[5], n;
        memset((char *)0x0, 1, 100);
        return 0;
    }

  26. gcc -g -o foo foo.c

  27. $ gdb foo

  28. (gdb) list
    1 #include <string.h>
    2
    3 int main () {
    4     int foo[5], n;
    5
    6     memset((char *)0x0, 1, 100);
    7
    8     return 0;
    9 }

  29. (gdb) run
    Starting program: /home/skottler/foo
    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff7aa1614 in memset () from /lib/x86_64-linux-gnu/libc.so.6

  30. (gdb) where
    #0 0x00007ffff7aa1614 in memset () from /lib/x86_64-linux-gnu/libc.so.6
    #1 0x0000000000400599 in main () at foo.c:6

  31. (gdb) list
    1 #include <string.h>
    2
    3 int main () {
    4     int foo[5], n;
    5
    6     memset(foo, 0, sizeof(foo));
    7
    8     return 0;
    9 }

  32. (gdb) run
    Starting program: /home/skottler/foo
    [Inferior 1 (process 44204) exited normally]

  33. 33


  34. Profiling

  35. The anatomy of a
    memory leak

  36. A brief detour to
    redis-server

  37. #include <stdlib.h>
    void f(void) {
        int* x = malloc(10 * sizeof(int));
        x[10] = 0;   /* problem 1: writes one element past the block */
    }                /* problem 2: x is never freed */
    int main(void) {
        f();
        return 0;
    }

  38. gcc -g -o heap heap.c

  39. valgrind --leak-check=yes heap

  40. ==18809== Invalid write of size 4
    ==18809== at 0x40054B: f (heap.c:5)
    ==18809== by 0x40055B: main (heap.c:9)
    ==18809== Address 0x51fc068 is 0 bytes after a block of size 40 alloc'd
    ==18809== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==18809== by 0x40053E: f (heap.c:4)
    ==18809== by 0x40055B: main (heap.c:9)
    ==18809==
    ==18809==
    ==18809== HEAP SUMMARY:
    ==18809== in use at exit: 40 bytes in 1 blocks
    ==18809== total heap usage: 1 allocs, 0 frees, 40 bytes allocated
    ==18809==
    ==18809== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1
    ==18809== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==18809== by 0x40053E: f (heap.c:4)
    ==18809== by 0x40055B: main (heap.c:9)

  41. void f(void) {
        int* x = malloc(10 * sizeof(int));
        x[10] = 0;
    }
    ==18809== Invalid write of size 4
    ==18809== at 0x40054B: f (heap.c:5)
    ==18809== by 0x40055B: main (heap.c:9)

  42. void f(void) {
        int* x = malloc(10 * sizeof(int));
        x[9] = 0;
    }

  43. void f(void) {
        int* x = malloc(10 * sizeof(int));
        x[10] = 0;
    }
    ==18809== Address 0x51fc068 is 0 bytes after a block of size 40 alloc'd
    ==18809== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==18809== by 0x40053E: f (heap.c:4)
    ==18809== by 0x40055B: main (heap.c:9)

  44. void f(void) {
        int* x = malloc(10 * sizeof(int));
        x[9] = 0;
        free(x);
    }

  45. ==18855== HEAP SUMMARY:
    ==18855== in use at exit: 0 bytes in 0 blocks
    ==18855== total heap usage: 1 allocs, 1 frees, 40 bytes allocated
    ==18855==
    ==18855== All heap blocks were freed -- no leaks are possible
    ==18855==
    ==18855== For counts of detected and suppressed errors, rerun with: -v
    ==18855== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

  46. 1. Suspend your disbelief
    2. Reproduce the issue
    3. Compare with healthy systems
    4. Figure out what's wrong.
    5. Fix the problem

  47. Thanks!
    @samkottler
    github.com/skottler