How I debugged debugger #RubyKaigi 2015

How I debugged debugger #RubyKaigi 2015

RubyKaigi 2015 Lightning Talk
"How I debugged debugger"

http://rubykaigi.org/2015

08d5432a5bc31e6d9edec87b94cb1db1?s=128

Takashi Kokubun

December 12, 2015
Tweet

Transcript

  1. How I debugged debugger @k0kubun

  2. Hi, I'm @k0kubun • Takashi Kokubun • Twitter, GitHub: @k0kubun

    • Developer Productivity Group, Cookpad Inc.
  3. Debugger

  4. What's debugger? • "A tool to test and debug other

    programs" • Typically it has: • breakpoint • step execution • tracking values of variables
  5. Ruby's debuggers • Ruby 1.9.2+ • debugger, pry-debugger • Ruby

    2.0+ • byebug, pry-byebug • partially working but not supported in 2.0
  6. byebug's performance issue

  7. Ruby process got over 10x slower after using it •

    Rails' response: 41ms -> 685ms
  8. But why did it matter for us? • We also

    use it in rails • The issue is just 40ms -> 685ms, isn't it? • Why don't you restart rails server?
  9. But why did it matter for us? • We also

    use it in rails • The issue is just 40ms -> 685ms, isn't it? • Why don't you restart rails server? No
  10. Our rails application is "World's Largest Rails Monolith" https://speakerdeck.com/a_matsuda/the-recipe-for-the-worlds-largest-rails-monolith

  11. Our rails application is "World's Largest Rails Monolith"

  12. Our rails application is "World's Largest Rails Monolith" • It

    was TOO hard for us... • Some heavy req: 6,000ms -> 80,000ms • Restarting `rails s` takes 40,000ms
  13. Our rails application is "World's Largest Rails Monolith" 80,053ms

  14. Let's optimize

  15. Optimization idea • byebug author commented about how to optimize

    on the issue • "disable the debugger when user hits continue"
  16. Optimization idea • byebug author commented about how to optimize

    on the issue • but he hadn't got to it for a long time • so I got to it
  17. First enemy: uncertain backtrace

  18. Many test failures • Just disabling tracepoints (breakpoints for Ruby)

    after `continue` made byebug test suite broken and it had never finished running # Running: ...........FFEE.......EEE.E...EE.......F.........................F........... .........F............................................................S..F..F .............................E.EFEEEEE....S....E..E..E...F...............F.F. ........................EE...FF..F.FF.........E.EEEE...........E..EE.EEE..EE. ...F........................S......EE.....E.E.......... No output has been received in the last 10 minutes, this potentially https://travis-ci.org/deivid-rodriguez/byebug/jobs/74734077
  19. Reproduce a test failure 1. module Byebug 2. Context.interface =

    TestInterface.new 3. Context.interface.input << 'continue' 4. byebug 5. b = 5 5. end • Minimum code to reproduce failure of a test case
  20. Execution result 1. module Byebug 2. Context.interface = TestInterface.new 3.

    Context.interface.input << 'continue' 4. byebug 5. b = 5 5. end $ bundle exec ruby -Ilib -rbyebug test.rb test.rb:5:in `<module:Byebug>': wrong argument type nil (expected Data) (TypeError) from test.rb:1:in `<main>'
  21. ???? 1. module Byebug 2. Context.interface = TestInterface.new 3. Context.interface.input

    << 'continue' 4. byebug 5. b = 5 5. end $ bundle exec ruby -Ilib -rbyebug test.rb test.rb:5:in `<module:Byebug>': wrong argument type nil (expected Data) (TypeError) from test.rb:1:in `<main>'
  22. No information in backtrace • I could understand nothing in

    Ruby-level backtrace...
  23. I tried to debug with gdb • Since byebug is

    written in C, byebug can't debug itself • I had to use gdb to debug byebug
  24. Still no luck (gdb) run -Ilib -rbyebug test.rb Starting program:

    /usr/local/bin/ruby -Ilib -rbyebug test.rb [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux- gnu/libthread_db.so.1". [New Thread 0x7ffff7ff8700 (LWP 14327)] test.rb:5:in `<module:Byebug>': wrong argument type nil (expected Data) (TypeError) from test.rb:1:in `<main>' [Thread 0x7ffff7ff8700 (LWP 14327) exited] [Inferior 1 (process 14324) exited with code 01]
  25. Debugging C extension • I could understand nothing with gdb

    too • Since it's not SEGV, gdb does not stop • Data? But where is it raised from?
  26. Break with rb_raise • My solution was to break in

    rb_raise • This was my first experience of using gdb and debugging C extension, so please tell me if there is smarter solution
  27. Breakpoint 1, rb_raise (exc=93824997653520, fmt=0x555555760d70 "wrong argument type %s (expected

    %s)") at error.c:1939 1939 va_start(args, fmt); (gdb) bt #0 rb_raise (exc=93824997653520, fmt=0x555555760d70 "wrong argument type %s (expected %s)") at error.c:1939 #1 0x000055555571e598 in rb_check_type (x=8, t=12) at error.c:567 #2 0x00007ffff5fa4d79 in rb_data_object_get (obj=8) at /usr/local/include/ruby-2.2.0/ ruby/ruby.h:1191 #3 0x00007ffff5fa4f91 in cleanup_dead_threads () at ../../../../ext/byebug/threads.c:105 #4 0x00007ffff5fa50c5 in release_lock () at ../../../../ext/byebug/threads.c:157 #5 0x00007ffff5fa0e96 in cleanup (dc=0x555556204310) at ../../../../ext/byebug/byebug.c: 132 Break with rb_raise • A place raising an error is found!!
  28. I could see proper C-level backtrace by breaking in rb_raise

    • Cause: • Unexpected nil reference in Data_Get_Struct • I fixed to return before Data_Get_Struct in hooked function • I managed to pass all tests!
  29. Second enemy: dead locks

  30. Dead locks • My fix was not enough • While

    all tests are passing, it didn't work with Rails app
  31. Dead locks • I became crazy

  32. Debug hard! • Using gdb, I checked the state of

    not- debugging threads • I found that it breaked inside hooked function
  33. What's wrong? • Since byebug supports multi-thread debugging, it locks

    not-debugging threads • byebug releases it after debugging • My patch prevented thread release by byebug...
  34. How to release locks safely? • I asked byebug author

    how to release locks safely
  35. How to release locks safely? • NO HELP FROM AUTHOR

  36. How to release locks safely? • So I read code

    deeply and managed to find a way to achieve it
  37. THE SOLUTION • ALL WE NEED WAS JUST DELETING ONE

    LINE
  38. THE SOLUTION • ALL WE NEED WAS JUST DELETING ONE

    LINE
  39. Final boss: Random SEGV by ruby's bug

  40. • It worked with our Rails app and sometimes it

    could pass all tests • But randomly crashed Random SEGV
  41. SEGV debugging • Load core file with gdb • Search

    which variable caused a crash
  42. SEGV debugging • How to do?: http://techlife.cookpad.com

  43. See backtrace #4 0x00007f7e2c74a452 in sigsegv (sig=11, info=0x7f7e2e0ccbb0, ctx=0x7f7e2e0cca80) at

    signal.c:879 #5 <signal handler called> #6 0x0000000000000000 in ?? () #7 0x00007f7e2c7db126 in exec_hooks_body (th=0x7f7e2e05e5e0, list=0x7f7e2e05e4e8, trace_arg=0x7fff6f732880) at vm_trace.c:256 #8 0x00007f7e2c7db327 in exec_hooks_protected (th=0x7f7e2e05e5e0, list=0x7f7e2e05e4e8, trace_arg=0x7fff6f732880) at vm_trace.c:299
  44. Debug and find unexpected xfree

  45. Sent patch to fix it

  46. Sent patch to fix it

  47. Merged!!!

  48. All problems are resolved!!! • Released as byebug v7.0.0!

  49. Conclusion • I made byebug over 10x faster in v7.0.0

    • Use latest byebug and pry-byebug