Slide 1

Slide 1 text

How I debugged debugger @k0kubun

Slide 2

Slide 2 text

Hi, I'm @k0kubun • Takashi Kokubun • Twitter, GitHub: @k0kubun • Developer Productivity Group, Cookpad Inc.

Slide 3

Slide 3 text

Debugger

Slide 4

Slide 4 text

What's debugger? • "A tool to test and debug other programs" • Typically it has: • breakpoint • step execution • tracking values of variables

Slide 5

Slide 5 text

Ruby's debuggers • Ruby 1.9.2+ • debugger, pry-debugger • Ruby 2.0+ • byebug, pry-byebug • partially working but not supported in 2.0

Slide 6

Slide 6 text

byebug's performance issue

Slide 7

Slide 7 text

Ruby process got over 10x slower after using it • Rails' response: 41ms -> 685ms

Slide 8

Slide 8 text

But why did it matter for us? • We also use it in rails • The issue is just 40ms -> 685ms, isn't it? • Why don't you restart rails server?

Slide 9

Slide 9 text

But why did it matter for us? • We also use it in rails • The issue is just 40ms -> 685ms, isn't it? • Why don't you restart rails server? No

Slide 10

Slide 10 text

Our rails application is "World's Largest Rails Monolith" https://speakerdeck.com/a_matsuda/the-recipe-for-the-worlds-largest-rails-monolith

Slide 11

Slide 11 text

Our rails application is "World's Largest Rails Monolith"

Slide 12

Slide 12 text

Our rails application is "World's Largest Rails Monolith" • It was TOO hard for us... • Some heavy req: 6,000ms -> 80,000ms • Restarting `rails s` takes 40,000ms

Slide 13

Slide 13 text

Our rails application is "World's Largest Rails Monolith" 80,053ms

Slide 14

Slide 14 text

Let's optimize

Slide 15

Slide 15 text

Optimization idea • byebug author commented about how to optimize on the issue • "disable the debugger when user hits continue"

Slide 16

Slide 16 text

Optimization idea • byebug author commented about how to optimize on the issue • but he hadn't got to it for a long time • so I got to it

Slide 17

Slide 17 text

First enemy: uncertain backtrace

Slide 18

Slide 18 text

Many test failures • Just disabling tracepoints (breakpoints for Ruby) after `continue` made byebug test suite broken and it had never finished running # Running: ...........FFEE.......EEE.E...EE.......F.........................F........... .........F............................................................S..F..F .............................E.EFEEEEE....S....E..E..E...F...............F.F. ........................EE...FF..F.FF.........E.EEEE...........E..EE.EEE..EE. ...F........................S......EE.....E.E.......... No output has been received in the last 10 minutes, this potentially https://travis-ci.org/deivid-rodriguez/byebug/jobs/74734077

Slide 19

Slide 19 text

Reproduce a test failure 1. module Byebug 2. Context.interface = TestInterface.new 3. Context.interface.input << 'continue' 4. byebug 5. b = 5 5. end • Minimum code to reproduce failure of a test case

Slide 20

Slide 20 text

Execution result 1. module Byebug 2. Context.interface = TestInterface.new 3. Context.interface.input << 'continue' 4. byebug 5. b = 5 5. end $ bundle exec ruby -Ilib -rbyebug test.rb test.rb:5:in `': wrong argument type nil (expected Data) (TypeError) from test.rb:1:in `'

Slide 21

Slide 21 text

???? 1. module Byebug 2. Context.interface = TestInterface.new 3. Context.interface.input << 'continue' 4. byebug 5. b = 5 5. end $ bundle exec ruby -Ilib -rbyebug test.rb test.rb:5:in `': wrong argument type nil (expected Data) (TypeError) from test.rb:1:in `'

Slide 22

Slide 22 text

No information in backtrace • I could understand nothing in Ruby-level backtrace...

Slide 23

Slide 23 text

I tried to debug with gdb • Since byebug is written in C, byebug can't debug itself • I had to use gdb to debug byebug

Slide 24

Slide 24 text

Still no luck (gdb) run -Ilib -rbyebug test.rb Starting program: /usr/local/bin/ruby -Ilib -rbyebug test.rb [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux- gnu/libthread_db.so.1". [New Thread 0x7ffff7ff8700 (LWP 14327)] test.rb:5:in `': wrong argument type nil (expected Data) (TypeError) from test.rb:1:in `' [Thread 0x7ffff7ff8700 (LWP 14327) exited] [Inferior 1 (process 14324) exited with code 01]

Slide 25

Slide 25 text

Debugging C extension • I could understand nothing with gdb too • Since it's not SEGV, gdb does not stop • Data? But where is it raised from?

Slide 26

Slide 26 text

Break with rb_raise • My solution was to break in rb_raise • This was my first experience of using gdb and debugging C extension, so please tell me if there is smarter solution

Slide 27

Slide 27 text

Breakpoint 1, rb_raise (exc=93824997653520, fmt=0x555555760d70 "wrong argument type %s (expected %s)") at error.c:1939 1939 va_start(args, fmt); (gdb) bt #0 rb_raise (exc=93824997653520, fmt=0x555555760d70 "wrong argument type %s (expected %s)") at error.c:1939 #1 0x000055555571e598 in rb_check_type (x=8, t=12) at error.c:567 #2 0x00007ffff5fa4d79 in rb_data_object_get (obj=8) at /usr/local/include/ruby-2.2.0/ ruby/ruby.h:1191 #3 0x00007ffff5fa4f91 in cleanup_dead_threads () at ../../../../ext/byebug/threads.c:105 #4 0x00007ffff5fa50c5 in release_lock () at ../../../../ext/byebug/threads.c:157 #5 0x00007ffff5fa0e96 in cleanup (dc=0x555556204310) at ../../../../ext/byebug/byebug.c: 132 Break with rb_raise • A place raising an error is found!!

Slide 28

Slide 28 text

I could see proper C-level backtrace by breaking in rb_raise • Cause: • Unexpected nil reference in Data_Get_Struct • I fixed to return before Data_Get_Struct in hooked function • I managed to pass all tests!

Slide 29

Slide 29 text

Second enemy: dead locks

Slide 30

Slide 30 text

Dead locks • My fix was not enough • While all tests are passing, it didn't work with Rails app

Slide 31

Slide 31 text

Dead locks • I became crazy

Slide 32

Slide 32 text

Debug hard! • Using gdb, I checked the state of not- debugging threads • I found that it breaked inside hooked function

Slide 33

Slide 33 text

What's wrong? • Since byebug supports multi-thread debugging, it locks not-debugging threads • byebug releases it after debugging • My patch prevented thread release by byebug...

Slide 34

Slide 34 text

How to release locks safely? • I asked byebug author how to release locks safely

Slide 35

Slide 35 text

How to release locks safely? • NO HELP FROM AUTHOR

Slide 36

Slide 36 text

How to release locks safely? • So I read code deeply and managed to find a way to achieve it

Slide 37

Slide 37 text

THE SOLUTION • ALL WE NEED WAS JUST DELETING ONE LINE

Slide 38

Slide 38 text

THE SOLUTION • ALL WE NEED WAS JUST DELETING ONE LINE

Slide 39

Slide 39 text

Final boss: Random SEGV by ruby's bug

Slide 40

Slide 40 text

• It worked with our Rails app and sometimes it could pass all tests • But randomly crashed Random SEGV

Slide 41

Slide 41 text

SEGV debugging • Load core file with gdb • Search which variable caused a crash

Slide 42

Slide 42 text

SEGV debugging • How to do?: http://techlife.cookpad.com

Slide 43

Slide 43 text

See backtrace #4 0x00007f7e2c74a452 in sigsegv (sig=11, info=0x7f7e2e0ccbb0, ctx=0x7f7e2e0cca80) at signal.c:879 #5 #6 0x0000000000000000 in ?? () #7 0x00007f7e2c7db126 in exec_hooks_body (th=0x7f7e2e05e5e0, list=0x7f7e2e05e4e8, trace_arg=0x7fff6f732880) at vm_trace.c:256 #8 0x00007f7e2c7db327 in exec_hooks_protected (th=0x7f7e2e05e5e0, list=0x7f7e2e05e4e8, trace_arg=0x7fff6f732880) at vm_trace.c:299

Slide 44

Slide 44 text

Debug and find unexpected xfree

Slide 45

Slide 45 text

Sent patch to fix it

Slide 46

Slide 46 text

Sent patch to fix it

Slide 47

Slide 47 text

Merged!!!

Slide 48

Slide 48 text

All problems are resolved!!! • Released as byebug v7.0.0!

Slide 49

Slide 49 text

Conclusion • I made byebug over 10x faster in v7.0.0 • Use latest byebug and pry-byebug