Upgrade to Pro — share decks privately, control downloads, hide ads and more …

How I debugged debugger #RubyKaigi 2015

How I debugged debugger #RubyKaigi 2015

RubyKaigi 2015 Lightning Talk
"How I debugged debugger"

http://rubykaigi.org/2015

Takashi Kokubun

December 12, 2015
Tweet

More Decks by Takashi Kokubun

Other Decks in Programming

Transcript

  1. How I debugged debugger
    @k0kubun

    View Slide

  2. Hi, I'm @k0kubun
    • Takashi Kokubun
    • Twitter, GitHub: @k0kubun
    • Developer Productivity Group,
    Cookpad Inc.

    View Slide

  3. Debugger

    View Slide

  4. What's debugger?
    • "A tool to test and debug other programs"
    • Typically it has:
    • breakpoint
    • step execution
    • tracking values of variables

    View Slide

  5. Ruby's debuggers
    • Ruby 1.9.2+
    • debugger, pry-debugger
    • Ruby 2.0+
    • byebug, pry-byebug
    • partially working but not supported in 2.0

    View Slide

  6. byebug's
    performance issue

    View Slide

  7. Ruby process got over
    10x slower after using it
    • Rails' response: 41ms -> 685ms

    View Slide

  8. But why did it matter for us?
    • We also use it in rails
    • The issue is just 40ms -> 685ms, isn't it?
    • Why don't you restart rails server?

    View Slide

  9. But why did it matter for us?
    • We also use it in rails
    • The issue is just 40ms -> 685ms, isn't it?
    • Why don't you restart rails server?
    No

    View Slide

  10. Our rails application is
    "World's Largest Rails Monolith"
    https://speakerdeck.com/a_matsuda/the-recipe-for-the-worlds-largest-rails-monolith

    View Slide

  11. Our rails application is
    "World's Largest Rails Monolith"

    View Slide

  12. Our rails application is
    "World's Largest Rails Monolith"
    • It was TOO hard for us...
    • Some heavy req: 6,000ms -> 80,000ms
    • Restarting `rails s` takes 40,000ms

    View Slide

  13. Our rails application is
    "World's Largest Rails Monolith"
    80,053ms

    View Slide

  14. Let's optimize

    View Slide

  15. Optimization idea
    • byebug author commented about how to
    optimize on the issue
    • "disable the debugger when user hits continue"

    View Slide

  16. Optimization idea
    • byebug author commented about how to
    optimize on the issue
    • but he hadn't got to it for a long time
    • so I got to it

    View Slide

  17. First enemy:
    uncertain backtrace

    View Slide

  18. Many test failures
    • Just disabling tracepoints (breakpoints for Ruby)
    after `continue` made byebug test suite broken
    and it had never finished running
    # Running:
    ...........FFEE.......EEE.E...EE.......F.........................F...........
    .........F............................................................S..F..F
    .............................E.EFEEEEE....S....E..E..E...F...............F.F.
    ........................EE...FF..F.FF.........E.EEEE...........E..EE.EEE..EE.
    ...F........................S......EE.....E.E..........
    No output has been received in the last 10 minutes, this potentially
    https://travis-ci.org/deivid-rodriguez/byebug/jobs/74734077

    View Slide

  19. Reproduce a test failure
    1. module Byebug
    2. Context.interface = TestInterface.new
    3. Context.interface.input << 'continue'
    4. byebug
    5. b = 5
    5. end
    • Minimum code to reproduce failure of a
    test case

    View Slide

  20. Execution result
    1. module Byebug
    2. Context.interface = TestInterface.new
    3. Context.interface.input << 'continue'
    4. byebug
    5. b = 5
    5. end
    $ bundle exec ruby -Ilib -rbyebug test.rb
    test.rb:5:in `': wrong argument type nil
    (expected Data) (TypeError)
    from test.rb:1:in `'

    View Slide

  21. ????
    1. module Byebug
    2. Context.interface = TestInterface.new
    3. Context.interface.input << 'continue'
    4. byebug
    5. b = 5
    5. end
    $ bundle exec ruby -Ilib -rbyebug test.rb
    test.rb:5:in `': wrong argument type nil
    (expected Data) (TypeError)
    from test.rb:1:in `'

    View Slide

  22. No information in backtrace
    • I could understand nothing in Ruby-level
    backtrace...

    View Slide

  23. I tried to debug with gdb
    • Since byebug is written in C, byebug can't
    debug itself
    • I had to use gdb to debug byebug

    View Slide

  24. Still no luck
    (gdb) run -Ilib -rbyebug test.rb
    Starting program: /usr/local/bin/ruby -Ilib -rbyebug
    test.rb
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-
    gnu/libthread_db.so.1".
    [New Thread 0x7ffff7ff8700 (LWP 14327)]
    test.rb:5:in `': wrong argument type
    nil (expected Data) (TypeError)
    from test.rb:1:in `'
    [Thread 0x7ffff7ff8700 (LWP 14327) exited]
    [Inferior 1 (process 14324) exited with code 01]

    View Slide

  25. Debugging C extension
    • I could understand nothing with gdb too
    • Since it's not SEGV, gdb does not stop
    • Data? But where is it raised from?

    View Slide

  26. Break with rb_raise
    • My solution was to break in rb_raise
    • This was my first experience of using gdb and
    debugging C extension, so please tell me if
    there is smarter solution

    View Slide

  27. Breakpoint 1, rb_raise (exc=93824997653520, fmt=0x555555760d70 "wrong argument type %s
    (expected %s)") at error.c:1939
    1939 va_start(args, fmt);
    (gdb) bt
    #0 rb_raise (exc=93824997653520, fmt=0x555555760d70 "wrong argument type %s (expected
    %s)") at error.c:1939
    #1 0x000055555571e598 in rb_check_type (x=8, t=12) at error.c:567
    #2 0x00007ffff5fa4d79 in rb_data_object_get (obj=8) at /usr/local/include/ruby-2.2.0/
    ruby/ruby.h:1191
    #3 0x00007ffff5fa4f91 in cleanup_dead_threads () at ../../../../ext/byebug/threads.c:105
    #4 0x00007ffff5fa50c5 in release_lock () at ../../../../ext/byebug/threads.c:157
    #5 0x00007ffff5fa0e96 in cleanup (dc=0x555556204310) at ../../../../ext/byebug/byebug.c:
    132
    Break with rb_raise
    • A place raising an error is found!!

    View Slide

  28. I could see proper C-level
    backtrace by breaking in rb_raise
    • Cause:
    • Unexpected nil reference in Data_Get_Struct
    • I fixed to return before Data_Get_Struct in
    hooked function
    • I managed to pass all tests!

    View Slide

  29. Second enemy:
    dead locks

    View Slide

  30. Dead locks
    • My fix was not enough
    • While all tests are passing, it didn't work with
    Rails app

    View Slide

  31. Dead locks
    • I became crazy

    View Slide

  32. Debug hard!
    • Using gdb, I checked the state of not-
    debugging threads
    • I found that it breaked inside hooked
    function

    View Slide

  33. What's wrong?
    • Since byebug supports multi-thread
    debugging, it locks not-debugging threads
    • byebug releases it after debugging
    • My patch prevented thread release by
    byebug...

    View Slide

  34. How to release locks safely?
    • I asked byebug author how to release locks safely

    View Slide

  35. How to release locks safely?
    • NO HELP FROM AUTHOR

    View Slide

  36. How to release locks safely?
    • So I read code deeply and managed to
    find a way to achieve it

    View Slide

  37. THE SOLUTION
    • ALL WE NEED WAS JUST DELETING ONE LINE

    View Slide

  38. THE SOLUTION
    • ALL WE NEED WAS JUST DELETING ONE LINE

    View Slide

  39. Final boss:
    Random SEGV by ruby's bug

    View Slide

  40. • It worked with our Rails app and
    sometimes it could pass all tests
    • But randomly crashed
    Random SEGV

    View Slide

  41. SEGV debugging
    • Load core file with gdb
    • Search which variable caused a crash

    View Slide

  42. SEGV debugging
    • How to do?: http://techlife.cookpad.com

    View Slide

  43. See backtrace
    #4 0x00007f7e2c74a452 in sigsegv (sig=11, info=0x7f7e2e0ccbb0,
    ctx=0x7f7e2e0cca80) at signal.c:879
    #5
    #6 0x0000000000000000 in ?? ()
    #7 0x00007f7e2c7db126 in exec_hooks_body (th=0x7f7e2e05e5e0,
    list=0x7f7e2e05e4e8, trace_arg=0x7fff6f732880)
    at vm_trace.c:256
    #8 0x00007f7e2c7db327 in exec_hooks_protected (th=0x7f7e2e05e5e0,
    list=0x7f7e2e05e4e8, trace_arg=0x7fff6f732880)
    at vm_trace.c:299

    View Slide

  44. Debug and find unexpected xfree

    View Slide

  45. Sent patch to fix it

    View Slide

  46. Sent patch to fix it

    View Slide

  47. Merged!!!

    View Slide

  48. All problems are resolved!!!
    • Released as byebug v7.0.0!

    View Slide

  49. Conclusion
    • I made byebug over 10x faster in v7.0.0
    • Use latest byebug and pry-byebug

    View Slide