Dive into CRuby

9361878d459f1709feec780518946ee5?s=47 NARUSE, Yui
September 10, 2016

Dive into CRuby

RubyKaigi 2016

9361878d459f1709feec780518946ee5?s=128

NARUSE, Yui

September 10, 2016
Tweet

Transcript

  1. 2.

    Who is naruse? • twitter: @nalsh • GitHub: @nurse •

    Ruby committer (M17N, release engineering, etc.) • nkf maintainer • working at Treasure Data, Inc. 2
  2. 3.

    झࢫ • ϢʔεέʔεͷॏཁੑΛ ஌Β͠ΊɺօʹऩूΛ ख఻ͬͯ΋Β͏ • RubyΑΓ௿͍ϨΠϠͷ ཧղΛਂΊΔ Goal •

    Understand how use case is important, and make people gather real examples • Understand layers lower than Ruby 3
  3. 4.

    Through RubyKaigi 2016, You may want to • write a

    new application • improve JRuby or mruby or something • improve CRuby 4
  4. 6.

    Motivations • New features • New methods or extend exist

    one • New platforms • Optimizations • speed • memory • Instruments to help trouble shooting • to ease debugging 6
  5. 7.

    New features • New features • New methods or extend

    exist one • String#scrub • Regexp#match? • etc. • New platforms • clang • Microsoft Visual C++ 2015 7
  6. 8.

    Why new feature? • Existing methods doesn’t fit my use

    case • This “use case” must be a real example. • It shall not be an artificial one. • Current method is too complicated, and it must be more simple for my use case 8
  7. 9.

    –Jonathan Swift, Travels into Several Remote Nations of the World

    (1726) “Necessity is the mother of invention.” but... 9
  8. 11.

    [Feature #6752] Replacing ill- formed subsequence • Repair invalid bytes

    in the given string. • Above snippet didn’t work as expected if `str` is already UTF-8. (it simply returned the original string) • Anyway what kind of strings are used? str = "\xe3\x81\x82\x80\x80" str.encode("utf-8", invalid: :replace) 11
  9. 12.

    Case: Web Forms • If strings come from Web Forms

    are broken, the web application should simply raise error: 400 Bad Request. • Clients should retry with correct request. 12
  10. 13.

    Case: Files • If strings come from files are broken,

    it’s usually because it doesn't specify correct encoding. • Retry reading with correct encoding. 13
  11. 14.

    Case: Web Crawler • They can’t retry. • They need

    to handle broken strings easily and secure. • There’s real pages, for example twitter... • mechanism: • wrongly get substring as bytes (though it must do as chars) • insert such invalid bytes into web page template 14
  12. 15.

    How do I design the interface • transcode base •

    Use Ruby’s encoding conversion engine. • Some people are used to use this way with iconv. • regexp base • Use regular expression engine (Ruby usually uses regexp engine on handling characters) • Better performance and meaning. • naming issue str.encode("utf-8", invalid: :replace) 15
  13. 16.

    Added both APIs • String#encode • tables under enc/trans •

    CSI (Code Set Independent; they doesn’t convert into Unicode if it doesn’t need) • but in 21st century, people rarely convert encodings… • String#scrub • regexp base 16
  14. 17.

    CRuby files • Ruby class name/module name • CRuby component

    name (+suffix) • utility functions • etc. 17
  15. 23.
  16. 28.

    the end of string.c void Init_String(void) { rb_cString = rb_define_class("String",

    rb_cObject); rb_include_module(rb_cString, rb_mComparable); … rb_define_method(rb_cString, "empty?", rb_str_empty, 0); rb_define_method(rb_cString, "=~", rb_str_match, 1); rb_define_method(rb_cString, "getbyte", rb_str_getbyte, 1); rb_define_method(rb_cString, "setbyte", rb_str_setbyte, 2); … rb_define_method(rb_cString, "scrub", str_scrub, -1); rb_define_method(rb_cString, "scrub!", str_scrub_bang, -1); … } define method in String class named “scrub” whose implementation is `str_scrub` function, and the number of its arguments is -1 (variable arguments). 28
  17. 29.

    /* * call-seq: * str.scrub -> new_str * str.scrub(repl) ->

    new_str * str.scrub{|bytes|} -> new_str * * If the string is invalid byte sequence then replace invalid bytes * with given replacement character, else returns self. * If block is given, replace invalid bytes with returned value of the block. * * "abc\u3042\x81".scrub #=> "abc\u3042\uFFFD" * "abc\u3042\x81".scrub("*") #=> "abc\u3042*" * "abc\u3042\xE3\x80".scrub{|bytes| '<'+bytes.unpack('H*')[0]+'>' } #=> "abc \u3042<e380>" */ static VALUE str_scrub(int argc, VALUE *argv, VALUE str) { VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil; VALUE new = rb_str_scrub(str, repl); return NIL_P(new) ? rb_str_dup(str): new; } 29
  18. 30.

    static VALUE str_scrub(int argc, VALUE *argv, VALUE str) { VALUE

    repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil; VALUE s = rb_str_scrub(str, repl); return NIL_P(s) ? rb_str_dup(str): s; } • VALUE: a pointer type for Ruby object. • remind that Ruby’s data is always an object.
 (there’s no primitive type) • argv: C array of VALUEs • argc: the number of VALUEs in argv • arity: the number of arguments • rb_check_arity(argc,min,max): raise exception if arity doesn’t match. • Qnil: `nil` in CRuby source code • NIL_P(x): `x.nil?`; Like Lisp CRuby source uses `_p` instead of `?` 30
  19. 31.

    /** * @param str the string to be scrubbed *

    @param repl the replacement character * @return If given string is invalid, returns a new string. Otherwise, returns Qnil. */ VALUE rb_str_scrub(VALUE str, VALUE repl) { return rb_enc_str_scrub(STR_ENC_GET(str), str, repl); } • STR_ENC_GET(): returns `rb_encoding *` which is C structure to represent Ruby’s Encoding. • There’s some similar relations between C structures and Ruby class like Hash and st_table, Thread class and rb_thread_t and so on. 31
  20. 32.

    VALUE rb_enc_str_scrub(rb_encoding *enc, VALUE str, VALUE repl) { VALUE buf

    = rb_str_buf_new(RSTRING_LEN(str)); const char *rep = /* replacement char */; long replen = strlen(rep); const char *p = RSTRING_PTR(str); const char *e = RSTRING_END(str); const char *p1 = p; while (p < e) { int ret = rb_enc_precise_mbclen(p, e, enc); if (MBCLEN_NEEDMORE_P(ret)) { break; } else if (MBCLEN_CHARFOUND_P(ret)) { p += MBCLEN_CHARFOUND_LEN(ret); } else if (MBCLEN_INVALID_P(ret)) { ... 32
  21. 33.

    [Feature #7361] Pathname#touch • touch(1) • change file access and

    modification times • NOTE: This ticket is rejected. new_file = Pathname.new('location/for/new/file') FileUtils.touch(new_file) 33
  22. 34.

    NAME touch - change file access and modification times SYNOPSIS

    touch [-acm] [-r ref_file|-t time|-d date_time] file… OPTIONS -a: Change the access time of file. -c: Do not create a specified file if it does not exist. -m: Change the modification time of file. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/touch.html 34
  23. 35.

    But… What is this method? • “Behave like touch(1)” •

    This can’t be a correct answer. • This has two different features: • “change file access and modification times” • “create empty file” • It describes the spec, but doesn’t describe what users truly want to do. 35
  24. 36.

    Conclusion of suggesting a new feature • Collect use cases:

    • take a memo if you find a case which should be written shorter. (only this can be a good feature request) • Time class and Integer bit manipulation seems to be improved… • Design API for the group of use cases • and find a good name ;-) • Read ʮAPIσβΠϯέʔεελσΟ ――Rubyͷ࣮ྫ͔ ΒֶͿɻ໰୊ʹଈͨ͠σβΠϯͱීวͷߟ͑ํʯ • http://gihyo.jp/book/2016/978-4-7741-7802-8 36
  25. 37.

    New platforms • On a brand new environment, Ruby sometimes

    doesn’t work. For example clang (a C language family frontend for LLVM), and Microsoft Visual C++ 2015 (VC14). • clang (including llvm-gcc): • It usually behaves like GCC, but different in some points. • Visual C++ 2015 • It introduced the Universal C Runtime, which breaks many compatibility. 37
  26. 38.

    clang (including llvm-gcc) • It optimizes code more aggressive than

    gcc at some point. It breaks Ruby’s GC (because Ruby uses conservative mark & sweep GC). • The most tough one was r34278, in which llvm optimized out the memory allocation for continuation. • You may think no one uses continuation, but actually uses through enumerator. 38
  27. 39.

    Visual C++ 2015 • It introduced the Universal C Runtime,

    which breaks many compatibility. • The most tough part is around _pioinfo: • _pioinfo is an internal array of IO structure. It stores underlying OS file HANDLE, file position (cursor), file attributes, text mode, and so on. Ruby handles the structure • to associate socket and fd: CRuby creates fd with dummy file handle and set socket • to implement overlapped I/O for Windows 2000/XP • to emulate fcntl(2) • But VC2015 make it private. • NOTE: overlapped I/O is a Microsoft’s term similar to nonblocking I/O. 39
  28. 40.

    static void set_pioinfo_extra(void) { # define FUNCTION_RET 0xc3 /* ret

    */ /* get __pioinfo addr with _isatty */ char *p = (char*)get_proc_address("ucrtbase.dll", "_isatty", NULL); char *pend = p; /* _osfile(fh) & FDEV */ int32_t rel; char *rip; /* add rsp, _ */ # define FUNCTION_BEFORE_RET_MARK "\x48\x83\xc4" # define FUNCTION_SKIP_BYTES 1 /* lea rdx,[__pioinfo's addr in RIP-relative 32bit addr] */ # define PIOINFO_MARK "\x48\x8d\x15" if (p) { for (pend += 10; pend < p + 300; pend++) { // find end of function if (memcmp(pend, FUNCTION_BEFORE_RET_MARK, sizeof(FUNCTION_BEFORE_RET_MARK) - 1) == 0 && *(pend + (sizeof(FUNCTION_BEFORE_RET_MARK) - 1) + FUNCTION_SKIP_BYTES) & FUNCTION_RET == FUNCTION_RET) { // search backwards from end of function for (pend -= (sizeof(PIOINFO_MARK) - 1); pend > p; pend--) { if (memcmp(pend, PIOINFO_MARK, sizeof(PIOINFO_MARK) - 1) == 0) { p = pend; goto found; } } break; } } } fprintf(stderr, "unexpected " UCRTBASE "\n"); _exit(1); found: p += sizeof(PIOINFO_MARK) - 1; rel = *(int32_t*)(p); rip = p + sizeof(int32_t); Even if _pioinfo is private, it exists on the memory, and we can get it. The problem is how to reach it. I use the fact _isatty() refers _pioinfo. I can read the function body of _isatty through the function pointer of it, find the instruction whose argument is _pioinfo, and I get the address of _pioinfo. 40
  29. 41.

    after story • Receive a patch to fix the code

    to support Visual C++ 2015’s DEBUG build. • I felt a deep emotion by the fact there’s a person who use such dirty code, and moreover read the code and contribute a fix. What a great thing OSS is! 41
  30. 42.

    Performance improvements • Speeding up • Optimize Ruby Applications •

    Optimize CRuby • some parts (hash, string, etc.) • Ruby VM • Profiling for speeding up • stackprof • perf(1) 42
  31. 43.

    Theory of improving performance 1.Take a benchmark • Take a

    profile of the target application (better) • for example stackprof, derailed_benchmarks. 2.Find a bottle neck (hotspot). 3.Optimize the bottle neck. 4.win! 43
  32. 45.

    % ls benchmark bm_app_answer.rb bm_vm1_gc_short_lived.rb bm_app_aobench.rb bm_vm1_gc_short_with_complex_long.rb bm_app_erb.rb bm_vm1_gc_short_with_long.rb bm_app_factorial.rb

    bm_vm1_gc_short_with_symbol.rb bm_app_fib.rb bm_vm1_gc_wb_ary.rb bm_app_lc_fizzbuzz.rb bm_vm1_gc_wb_ary_promoted.rb bm_app_mandelbrot.rb bm_vm1_gc_wb_obj.rb bm_app_pentomino.rb bm_vm1_gc_wb_obj_promoted.rb bm_app_raise.rb bm_vm1_ivar.rb bm_app_strconcat.rb bm_vm1_ivar_set.rb bm_app_tak.rb bm_vm1_length.rb bm_app_tarai.rb bm_vm1_lvar_init.rb bm_app_uri.rb bm_vm1_lvar_set.rb bm_array_shift.rb bm_vm1_neq.rb bm_hash_aref_dsym.rb bm_vm1_not.rb bm_hash_aref_dsym_long.rb bm_vm1_rescue.rb bm_hash_aref_fix.rb bm_vm1_simplereturn.rb bm_hash_aref_flo.rb bm_vm1_swap.rb bm_hash_aref_miss.rb bm_vm1_yield.rb bm_hash_aref_str.rb bm_vm2_array.rb bm_hash_aref_sym.rb bm_vm2_bigarray.rb bm_hash_aref_sym_long.rb bm_vm2_bighash.rb bm_hash_flatten.rb bm_vm2_case.rb bm_hash_ident_flo.rb bm_vm2_case_lit.rb bm_hash_ident_num.rb bm_vm2_defined_method.rb bm_hash_ident_obj.rb bm_vm2_dstr.rb bm_hash_ident_str.rb bm_vm2_eval.rb bm_hash_ident_sym.rb bm_vm2_method.rb bm_hash_keys.rb bm_vm2_method_missing.rb bm_hash_shift.rb bm_vm2_method_with_block.rb bm_hash_shift_u16.rb bm_vm2_mutex.rb … … 45
  33. 47.
  34. 51.

    No bottle neck… I wish you try with your applications,

    and confirm there’s no bottle neck. Anyway I talk about String#blank? 51
  35. 52.

    class String BLANK_RE = /\A[[:space:]]*\z/ # A string is blank

    if it's empty or contains whitespaces only: # # ''.blank? # => true # ' '.blank? # => true # "\t\n\r".blank? # => true # ' blah '.blank? # => false # # Unicode whitespace is supported: # # "\u00a0".blank? # => true # # @return [true, false] def blank? # The regexp that matches blank strings is expensive. For the case of empty # strings we can speed up this method (~3.5x) with an empty? call. The # penalty for the rest of strings is marginal. empty? || BLANK_RE.match?(self) end end String#blank? Check Unicode aware string emptiness 52
  36. 53.

    Regexp#match? • It was used Regexp#===. • Usually Regular Expression

    match stores $& (MatchData.last_match); it is slow because of creating MatchData object. • → Regexp#match? don’t update $&
 (Ruby 2.4 feature) • Note: Perl has similar optimization 53
  37. 54.

    SSE 4.2 STTNI • String and Text New Instructions •

    Handle 16 bytes at once. • But all the 16 bytes must be available. If not, it will cause SEGV. • ARM SVE (Scalable Vector Extension), an instruction set extension for HPC, which will be used by Post-K computer. VALUE rb_str_blank(VALUE str) { const unsigned char *p = (const unsigned char *)RSTRING_PTR(str); const unsigned char *e = (const unsigned char *)RSTRING_END(str); intptr_t pe = (intptr_t)e; const __m128i mask = _mm_set_epi8(0,0,0,0,0,0,0,0,0,0,9,10,11,12,13,32); const int masksize = 6; const int mode = _SIDD_CMP_EQUAL_ANY|_SIDD_UBYTE_OPS|_SIDD_MASKED_NEGATIVE_POLARITY; if (RSTRING_LEN(str) == 0) return Qtrue; /* set the edge of a page before the end of string */ if (pe & 0xfff > 0xff1) { pe &= ~0xfff; pe |= 0xff1; } for (; (intptr_t)p < pe; p += sizeof(__m128i)) { int idx, len; ptrdiff_t sz; __m128i m; retry: sz = e - p; len = (int)((sz&INT_MAX) | (sz >> 27)); m = _mm_loadu_si128((__m128i const *)p); /* CF: 1 if there's non spaces * ZF: 1 if reached the end */ if (_mm_cmpestra(mask, masksize, m, len, mode)) { /* CF=0 ZF=0 */ continue; } #if 0 /* GCC 6 wrongly generates cmpestri and cmpestri... */ if (_mm_cmpestrc(mask, masksize, m, len, mode)) { /* CF=0 ZF=1 */ return Qtrue; } idx = _mm_cmpestri(mask, masksize, m, len, mode); #else idx = _mm_cmpestri(mask, masksize, m, len, mode); if (sz < idx) return Qtrue; #endif p += idx; if (!str_blank0(&p, e, FALSE)) return Qfalse; goto retry; } if (!str_blank0(&p, e, TRUE)) return Qfalse; return Qtrue; } 54
  38. 55.

    Improve RubyVM • Anyway if there’s no bottle neck in

    Ruby layer, it needs to dive into C layer • Get a profile with C level profiler • OProfile • perf(1) 55
  39. 56.

    # Samples: 32K of event 'cycles:u' # Event count (approx.):

    19800054829 # # Overhead Command Shared Object Symbol # ........ .............. .................. .................................................... # 66.81% miniruby miniruby [.] vm_exec_core 7.03% miniruby miniruby [.] vm_search_method.isra.97 3.62% miniruby miniruby [.] vm_call_iseq_setup_normal_0start_0params_0locals 2.95% miniruby miniruby [.] vm_call_cfunc 2.18% miniruby miniruby [.] rb_ary_entry 0.95% miniruby miniruby [.] invoke_iseq_block_from_c 0.92% miniruby miniruby [.] vm_call_iseq_setup_normal_0start_1params_1locals 0.91% miniruby miniruby [.] rb_ary_push 0.85% miniruby miniruby [.] vm_call_opt_send 0.82% miniruby miniruby [.] ary_ensure_room_for_push 0.73% miniruby miniruby [.] rb_ary_modify 0.71% miniruby miniruby [.] vm_call_iseq_setup_normal_0start_0params_2locals 0.66% miniruby miniruby [.] vm_call_iseq_setup 0.58% miniruby miniruby [.] rb_ary_rotate_bang 0.56% miniruby miniruby [.] vm_call_iseq_setup_normal_0start_0params_1locals 0.52% miniruby miniruby [.] opt_eq_func 0.41% miniruby miniruby [.] method_entry_get 0.38% miniruby miniruby [.] vm_callee_setup_block_arg 0.37% miniruby miniruby [.] rb_ary_transpose 0.34% miniruby miniruby [.] invoke_block_from_c_splattable.constprop.136 0.34% miniruby miniruby [.] vm_exec 0.33% miniruby miniruby [.] rb_yield perf report of optcarrot 56
  40. 58.

    vm_exec_core • The core function of RubyVM (YARV) • bytecode

    interpreter’s method dispatch routine • very large (30KB~40KB) switch-case • Strict to say, “switch-case” is optimized to jump instructions (Direct Threaded Code) 58
  41. 60.

    static VALUE vm_exec_core(rb_thread_t *th, VALUE initial) { register rb_control_frame_t *reg_cfp;

    const VALUE *reg_pc; #include "vmtc.inc" if (UNLIKELY(th == 0)) { return (VALUE)insns_address_table; } reg_cfp = th->cfp; reg_pc = reg_cfp->pc; first: INSN_DISPATCH(); /*****************/ #include "vm.inc" /*****************/ END_INSNS_DISPATCH(); /* unreachable */ rb_bug("vm_eval: unreachable"); goto first; } vm_exec_core (some macros are removed) keeps method/block local information program counter, What bytecode it executes now 60
  42. 61.

    How bytecode interpreter behave v1 PUSH = 1; ADD =

    2; MINUS = 3 pc = 0 # program counter stack = [] iseq = [PUSH, 1, PUSH, 2, ADD] # instruction sequence while (true) { switch (iseq[pc++]) { case PUSH: stack.push(iseq[pc++]); break; case ADD: stack.push(stack.pop + stack.pop); break; case MINUS: stack.push(stack.pop - stack.pop); break; /* ... */ } } 61
  43. 62.

    How bytecode interpreter behave v1 PUSH = 1; ADD =

    2; MINUS = 3 pc = 0 # program counter stack = [] iseq = [PUSH, 1, PUSH, 2, ADD] # instruction sequence while (true) { switch (iseq[pc++]) { case PUSH: stack.push(iseq[pc++]); break; case ADD: stack.push(stack.pop + stack.pop); break; case MINUS: stack.push(stack.pop - stack.pop); break; /* ... */ } } instruction body 62
  44. 63.

    How bytecode interpreter behave v1 PUSH = 1; ADD =

    2; MINUS = 3 pc = 0 # program counter stack = [] iseq = [PUSH, 1, PUSH, 2, ADD] # instruction sequence while (true) { switch (iseq[pc++]) { case PUSH: stack.push(iseq[pc++]); break; case ADD: stack.push(stack.pop + stack.pop); break; case MINUS: stack.push(stack.pop - stack.pop); break; /* ... */ } } 1. loop overhead 2. fetch an instruction 3. jump to the instruction body 63
  45. 64.

    How bytecode interpreter behave v1 PUSH = 1; ADD =

    2; MINUS = 3 pc = 0 # program counter stack = [] iseq = [PUSH, 1, PUSH, 2, ADD] # instruction sequence while (true) { switch (iseq[pc++]) { # indirect branch case PUSH: stack.push(iseq[pc++]); break; case ADD: stack.push(stack.pop + stack.pop); break; case MINUS: stack.push(stack.pop - stack.pop); break; /* ... */ } } branch misprediction 64
  46. 65.

    How bytecode interpreter behave v2: threading code pc = 0

    # program counter stack = [] iseq = [&&label_PUSH, 1, &&label_PUSH, 2, &&label_ADD] goto iseq[pc++] label_PUSH: stack.push(iseq[pc++]); goto iseq[pc++] # indirect branch label_ADD: stack.push(stack.pop + stack.pop); goto iseq[pc++] # indirect branch label_MINUS: stack.push(stack.pop - stack.pop); goto iseq[pc++] # indirect branch /* ... */ 65
  47. 66.

    How bytecode interpreter behave v2: threading code pc = 0

    # program counter stack = [] iseq = [&&label_PUSH, 1, &&label_PUSH, 2, &&label_ADD] goto iseq[pc++] label_PUSH: stack.push(iseq[pc++]); goto iseq[pc++] # indirect branch label_ADD: stack.push(stack.pop + stack.pop); goto iseq[pc++] # indirect branch label_MINUS: stack.push(stack.pop - stack.pop); goto iseq[pc++] # indirect branch /* ... */• No loop overhead • branch prediction friendly 66
  48. 67.
  49. 68.

    perf stat result of optcarrot on SandyBridge $ perf stat

    ./miniruby-gcc611 -v -Ioptcarrot/lib -r./optcarrot/tools/shim optcarrot/bin/ optcarrot --benchmark optcarrot/examples/Lan_Master.nes ruby 2.4.0dev (2016-08-26 trunk 56013) [x86_64-linux] fps: 13.027601466360375 checksum: 59662 Performance counter stats for './miniruby-gcc611 -v -Ioptcarrot/lib -r./optcarrot/tools/ shim optcarrot/bin/optcarrot --benchmark optcarrot/examples/Lan_Master.nes': 14796.375448 task-clock (msec) # 0.998 CPUs utilized 185 context-switches # 0.013 K/sec 0 cpu-migrations # 0.000 K/sec 14,404 page-faults # 0.973 K/sec 33,417,895,565 cycles # 2.259 GHz [83.29%] 11,061,511,903 stalled-cycles-frontend # 33.10% frontend cycles idle [83.34%] 7,225,689,220 stalled-cycles-backend # 21.62% backend cycles idle [66.70%] 51,166,479,085 instructions # 1.53 insns per cycle # 0.22 stalled cycles per insn [83.38%] 9,698,878,651 branches # 655.490 M/sec [83.35%] 507,996,700 branch-misses # 5.24% of all branches [83.34%] 14.820478694 seconds time elapsed 68
  50. 69.

    perf stat result of optcarrot on Haswell % perf stat

    ./miniruby-gcc611 -v -Ioptcarrot/lib -r./optcarrot/tools/shim optcarrot/bin/ optcarrot --benchmark optcarrot/examples/Lan_Master.nes ruby 2.4.0dev (2016-08-26 trunk 56013) [x86_64-linux] fps: 27.269785989915373 checksum: 59662 Performance counter stats for './miniruby-gcc611 -v -Ioptcarrot/lib -r./optcarrot/tools/ shim optcarrot/bin/optcarrot --benchmark optcarrot/examples/Lan_Master.nes': 7214.329827 task-clock:u (msec) # 0.999 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 14849 page-faults:u # 0.002 M/sec 19775865899 cycles:u # 2.741 GHz 50975216372 instructions:u # 2.58 insn per cycle 9662536352 branches:u # 1339.353 M/sec 67884021 branch-misses:u # 0.70% of all branches 7.218712639 seconds time elapsed 69
  51. 70.

    Free lunch by Moore’s law is still alive $ perf

    stat ./miniruby-gcc611 -v -Ioptcarrot/lib -r./optcarrot/tools/shim optcarrot/bin/ optcarrot --benchmark optcarrot/examples/Lan_Master.nes ruby 2.4.0dev (2016-08-26 trunk 56013) [x86_64-linux] Sandy Bridge: fps: 13.027601466360375 14796.375448 task-clock (msec) # 0.998 CPUs utilized 185 context-switches # 0.013 K/sec 14,404 page-faults # 0.973 K/sec 33,417,895,565 cycles # 2.259 GHz [83.29%] 51,166,479,085 instructions # 1.53 insns per cycle 9,698,878,651 branches # 655.490 M/sec [83.35%] 507,996,700 branch-misses # 5.24% of all branches [83.34%] 14.820478694 seconds time elapsed Haswell: fps: 27.269785989915373 7214.329827 task-clock:u (msec) # 0.999 CPUs utilized 0 context-switches:u # 0.000 K/sec 14849 page-faults:u # 0.002 M/sec 19775865899 cycles:u # 2.741 GHz 50975216372 instructions:u # 2.58 insn per cycle 9662536352 branches:u # 1339.353 M/sec 67884021 branch-misses:u # 0.70% of all branches 7.218712639 seconds time elapsed 70
  52. 71.

    /* -*-c-*- ******************************************************/ /****************************************************************/ /****************************************************************/ /** This file is VM

    main loop. ---- This file is auto generated by insns2vm.rb DO NOT TOUCH! If you want to fix something, you must edit 'insns.def' */ INSN_ENTRY(nop){ { DEBUG_ENTER_INSN("nop"); ADD_PC(1+0); PREFETCH(GET_PC()); #define CURRENT_INSN_nop 1 #define INSN_IS_SC() 0 #define INSN_LABEL(lab) LABEL_nop_##lab #define LABEL_IS_SC(lab) LABEL_##lab##_##t COLLECT_USAGE_INSN(BIN(nop)); { #line 40 "insns.def" /* none */ #line 32 "vm.inc" #undef CURRENT_INSN_nop #undef INSN_IS_SC #undef INSN_LABEL #undef LABEL_IS_SC END_INSN(nop);}}} INSN_ENTRY(getlocal){ { VALUE val; rb_num_t level = (rb_num_t)GET_OPERAND(2); lindex_t idx = (lindex_t)GET_OPERAND(1); DEBUG_ENTER_INSN("getlocal"); ADD_PC(1+2); PREFETCH(GET_PC()); #define CURRENT_INSN_getlocal 1 #define INSN_IS_SC() 0 #define INSN_LABEL(lab) LABEL_getlocal_##lab #define LABEL_IS_SC(lab) LABEL_##lab##_##t COLLECT_USAGE_INSN(BIN(getlocal)); COLLECT_USAGE_OPERAND(BIN(getlocal), 0, idx); COLLECT_USAGE_OPERAND(BIN(getlocal), 1, level); { #line 60 "insns.def" int i, lev = (int)level; const VALUE *ep = GET_EP(); for (i = 0; i < lev; i++) { ep = GET_PREV_EP(ep); } val = *(ep - idx); #line 65 "vm.inc" CHECK_VM_STACK_OVERFLOW_FOR_INSN(REG_CFP, 1); PUSH(val); #undef CURRENT_INSN_getlocal #undef INSN_IS_SC #undef INSN_LABEL #undef LABEL_IS_SC END_INSN(getlocal);}}} 71
  53. 72.

    /* -*-c-*- ******************************************************/ /****************************************************************/ /****************************************************************/ /** This file is VM

    main loop. ---- This file is auto generated by insns2vm.rb DO NOT TOUCH! If you want to fix something, you must edit 'insns.def' */ INSN_ENTRY(nop){ { DEBUG_ENTER_INSN("nop"); ADD_PC(1+0); PREFETCH(GET_PC()); #define CURRENT_INSN_nop 1 #define INSN_IS_SC() 0 #define INSN_LABEL(lab) LABEL_nop_##lab #define LABEL_IS_SC(lab) LABEL_##lab##_##t COLLECT_USAGE_INSN(BIN(nop)); { #line 40 "insns.def" /* none */ #line 32 "vm.inc" #undef CURRENT_INSN_nop #undef INSN_IS_SC #undef INSN_LABEL #undef LABEL_IS_SC END_INSN(nop);}}} /* r14: pc (program counter) */ /* r15: cfp (call frame pointer) */ /* NOTE: cop */ INSN_ENTRY(nop){ { ADD_PC(1+0); /* $rdx = $r14 + 8byte */ 209bb0: lea 0x8(%r14),%rdx /* $rax = $r14 */ 209bb4: mov %r14,%rax /* *$r15 = $rdx */ 209bb7: mov %rdx,(%r15) /* $r14 = $rdx */ 209bba: mov %rdx,%r14 END_INSN(nop);}}} /* goto *($rax + 8byte) */ 209bbd: jmpq *0x8(%rax) typedef struct rb_control_frame_struct { const VALUE *pc; /* cfp[0] */ VALUE *sp; /* cfp[1] */ const rb_iseq_t *iseq; /* cfp[2] */ VALUE self; /* cfp[3] */ const VALUE *ep; /* cfp[4] */ const void *block_code; /* cfp[5] */ } rb_control_frame_t; 72
  54. 73.

    opt_plus INSN_ENTRY(opt_plus){ { VALUE val; CALL_CACHE cc = (CALL_CACHE)GET_OPERAND(2); CALL_INFO

    ci = (CALL_INFO)GET_OPERAND(1); VALUE recv = TOPN(1); VALUE obj = TOPN(0); DEBUG_ENTER_INSN("opt_plus"); ADD_PC(1+2); PREFETCH(GET_PC()); POPN(2); #define CURRENT_INSN_opt_plus 1 #define INSN_IS_SC() 0 #define INSN_LABEL(lab) LABEL_opt_plus_##lab #define LABEL_IS_SC(lab) LABEL_##lab##_##t COLLECT_USAGE_INSN(BIN(opt_plus)); COLLECT_USAGE_OPERAND(BIN(opt_plus), 0, ci); COLLECT_USAGE_OPERAND(BIN(opt_plus), 1, cc); { #line 1377 "insns.def" if (FIXNUM_2_P(recv, obj) && BASIC_OP_UNREDEFINED_P(BOP_PLUS,INTEGER_REDEFINED_OP_FLAG)) { /* fixnum + fixnum */ #ifndef LONG_LONG_VALUE VALUE msb = (VALUE)1 << ((sizeof(VALUE) * CHAR_BIT) - 1); val = recv - 1 + obj; if ((~(recv ^ obj) & (recv ^ val)) & msb) { val = rb_int2big((SIGNED_VALUE)((val>>1) | (recv & msb))); } #else val = LONG2NUM(FIX2LONG(recv) + FIX2LONG(obj)); #endif } else if (FLONUM_2_P(recv, obj) && BASIC_OP_UNREDEFINED_P(BOP_PLUS, FLOAT_REDEFINED_OP_FLAG)) { val = DBL2NUM(RFLOAT_VALUE(recv) + RFLOAT_VALUE(obj)); } else if (!SPECIAL_CONST_P(recv) && !SPECIAL_CONST_P(obj)) { if (RBASIC_CLASS(recv) == rb_cFloat && RBASIC_CLASS(obj) == rb_cFloat && BASIC_OP_UNREDEFINED_P(BOP_PLUS, FLOAT_REDEFINED_OP_FLAG)) { val = DBL2NUM(RFLOAT_VALUE(recv) + RFLOAT_VALUE(obj)); } else if (RBASIC_CLASS(recv) == rb_cString && RBASIC_CLASS(obj) == rb_cString && BASIC_OP_UNREDEFINED_P(BOP_PLUS, STRING_REDEFINED_OP_FLAG)) { val = rb_str_plus(recv, obj); } else if (RBASIC_CLASS(recv) == rb_cArray && BASIC_OP_UNREDEFINED_P(BOP_PLUS, ARRAY_REDEFINED_OP_FLAG)) { val = rb_ary_plus(recv, obj); } else { goto INSN_LABEL(normal_dispatch); } } else { INSN_LABEL(normal_dispatch): PUSH(recv); PUSH(obj); CALL_SIMPLE_METHOD(recv); } #line 2073 "vm.inc" CHECK_VM_STACK_OVERFLOW_FOR_INSN(REG_CFP, 1); PUSH(val); #undef CURRENT_INSN_opt_plus #undef INSN_IS_SC #undef INSN_LABEL #undef LABEL_IS_SC END_INSN(opt_plus);}}} 73
  55. 74.

    opt_plus • Optimized case is a home study • It’s

    a bit pattern magic DEFINE_INSN opt_plus (CALL_INFO ci, CALL_CACHE cc) (VALUE recv, VALUE obj) (VALUE val) { if (FIXNUM_2_P(recv, obj) && BASIC_OP_UNREDEFINED_P(BOP_PLUS,INTEGER_REDEFINED_OP_FLAG)) { /* fixnum + fixnum */ #ifndef LONG_LONG_VALUE VALUE msb = (VALUE)1 << ((sizeof(VALUE) * CHAR_BIT) - 1); val = recv - 1 + obj; if ((~(recv ^ obj) & (recv ^ val)) & msb) { val = rb_int2big((SIGNED_VALUE)((val>>1) | (recv & msb))); } #else val = LONG2NUM(FIX2LONG(recv) + FIX2LONG(obj)); #endif } 74
  56. 75.

    opt_plus | Inst | Resource Delay In Cycles | |

    | Num | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | FE | | ------------------------------------------------------------------------- | 0 | | | | | | | | | | | mov rax, qword ptr [r15+0x8] | 1 | | | | | | | | | | | mov r12, qword ptr [r14+0x10] | 2 | | | | | | | | | | | lea rbx, ptr [r14+0x18] | 3 | | | 1 | | | | | | | | mov r13, qword ptr [r14+0x8] | 4 | | | | | | | | | 1 | | mov rbp, r15 | 5 | | | | | | | | | | | mov r14, rbx | 6 | | | | | | | | | | | mov rdi, qword ptr [rax-0x10] | 7 | | | | | | | | | | | mov rsi, qword ptr [rax-0x8] | 8 | | | | | | | | | 2 | | lea rdx, ptr [rax-0x10] | 9 | | | | | | | | | 3 | | mov qword ptr [r15], rbx | 10 | | | | | | | | 1 | 2 | | mov qword ptr [r15+0x8], rdx | 11 | | | | | | | | | | | mov rdx, rdi | 12 | | | | | | | | | 3 | | mov qword ptr [rsp+0x10], rsi | 13 | | | | | | | | | | | and rdx, rsi | 14 | | | | | | | | | | | and edx, 0x1 | 15 | | | | | | | | | | | jz 0x12f8 | 16 | | | | 1 | | | | | 3 | CP | mov rdx, qword ptr [rip+0x2fe3a2] | 17 | | | | | | | | | | CP | test byte ptr [rdx+0x668], 0x1 | 18 | | | | | | | | | | CP | jnz 0x12e4 | 19 | | | | | | | | | 5 | | lea rax, ptr [rdi+rsi*1-0x1] | 20 | | | | | | | | | | | mov rdx, rdi | 21 | | | | | | | | | | | xor rsi, rdi | 22 | | | | | | | | | | | not rsi | 23 | | | | | | | | | | | xor rdx, rax | 24 | | | | | | | | | | | test rsi, rdx | 25 | | | | | | | | | | | js 0x438d Resource Conflict on Critical Paths: ----------------------------------------------------------------- | Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | ----------------------------------------------------------------- | Cycles | 0 0 | 0 | 0 0 | 1 0 | 0 | 0 | 0 | 0 | ----------------------------------------------------------------- List Of Delays On Critical Paths ------------------------------- 12 --> 16 1 Cycles Delay On PORT3_AGU Intel(R) Architecture Code Analyzer Version - 2.1 Analyzed File - ./miniruby Binary Format - 64Bit Architecture - HSW Analysis Type - Latency Latency Analysis Report --------------------------- Latency: 16 Cycles N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0) D - Data fetch pipe (on ports 2 and 3), CP - on a critical path F - Macro Fusion with the previous instruction occurred * - instruction micro-ops not bound to a port ^ - Micro Fusion happened # - ESP Tracking sync uop was issued @ - Intel(R) AVX to Intel(R) SSE code switch, dozens of cycles penalty is expected ! - instruction not supported, was not accounted in Analysis The Resource delay is counted since all the sources of the instructions are ready and until the needed resource becomes available https://software.intel.com/en-us/articles/intel-architecture-code-analyzer 75
  57. 76.

    opt_plus | Inst | Resource Delay In Cycles | |

    | Num | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | FE | | ------------------------------------------------------------------------- | 0 | | | | | | | | | | | mov rax, qword ptr [r15+0x8] | 1 | | | | | | | | | | | mov r12, qword ptr [r14+0x10] | 2 | | | | | | | | | | | lea rbx, ptr [r14+0x18] | 3 | | | 1 | | | | | | | | mov r13, qword ptr [r14+0x8] | 4 | | | | | | | | | 1 | | mov rbp, r15 | 5 | | | | | | | | | | | mov r14, rbx | 6 | | | | | | | | | | | mov rdi, qword ptr [rax-0x10] | 7 | | | | | | | | | | | mov rsi, qword ptr [rax-0x8] | 8 | | | | | | | | | 2 | | lea rdx, ptr [rax-0x10] | 9 | | | | | | | | | 3 | | mov qword ptr [r15], rbx | 10 | | | | | | | | 1 | 2 | | mov qword ptr [r15+0x8], rdx | 11 | | | | | | | | | | | mov rdx, rdi | 12 | | | | | | | | | 3 | | mov qword ptr [rsp+0x10], rsi | 13 | | | | | | | | | | | and rdx, rsi | 14 | | | | | | | | | | | and edx, 0x1 | 15 | | | | | | | | | | | jz 0x12f8 | 16 | | | | 1 | | | | | 3 | CP | mov rdx, qword ptr [rip+0x2fe3a2] | 17 | | | | | | | | | | CP | test byte ptr [rdx+0x668], 0x1 | 18 | | | | | | | | | | CP | jnz 0x12e4 | 19 | | | | | | | | | 5 | | lea rax, ptr [rdi+rsi*1-0x1] | 20 | | | | | | | | | | | mov rdx, rdi | 21 | | | | | | | | | | | xor rsi, rdi | 22 | | | | | | | | | | | not rsi | 23 | | | | | | | | | | | xor rdx, rax | 24 | | | | | | | | | | | test rsi, rdx | 25 | | | | | | | | | | | js 0x438d Resource Conflict on Critical Paths: ----------------------------------------------------------------- | Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | ----------------------------------------------------------------- | Cycles | 0 0 | 0 | 0 0 | 1 0 | 0 | 0 | 0 | 0 | ----------------------------------------------------------------- List Of Delays On Critical Paths ------------------------------- 12 --> 16 1 Cycles Delay On PORT3_AGU It happens delay because there’s too many instructions. • Increment PC • Set PC to call frame • Get value from stack To avoid that • JIT • compilation overhead; it’s trade-off with its hotness. (Just I said there’s no hotspot…) • Register machine • [Feature #12589] • tough work https://software.intel.com/en-us/articles/intel-architecture-code-analyzer 76
  58. 77.

    Debugging / Profiling / Tracing • Program works but something

    wrong: • unexpected behavior: normal debugging • slow or too much memory • the case you can kill it with SIGTERM • Get stuck • user level: stuck but can kill with SIGKILL • kernel level: cannot kill even with SIGKILL • Crash • Ruby level crash • SEGV (Segmentation Fault) 77
  59. 78.

    Slow or too much memory • perf-top • can check

    what function is running. • ps(1) • procfs (Linux) • ObjectSpace / objspace • count_objects • GC.stat • frsyuki/sigdump: show thread backtrace and memory usage. This needs to be installed in Gemfile before an incident. 78
  60. 79.

    Get stuck • tool • perf top • pid2line •

    Reason • infinite loop (?) in kernel • regular expression match’s backtrack explosion 79
  61. 80.

    strace with CRuby futex(0x555e283e4464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4460, {FUTEX_OP_SET, 0,

    FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4498, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e449c, FUTEX_WAIT_PRIVATE, 91, NULL) = -1 EAGAIN (Resource temporarily unavailable) write(6, "!", 1) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x555e283e4464, FUTEX_WAIT_PRIVATE, 95, NULL) = 0 futex(0x555e283e449c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4498, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4498, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e449c, FUTEX_WAIT_PRIVATE, 95, NULL) = -1 EAGAIN (Resource temporarily unavailable) write(6, "!", 1) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x555e283e4464, FUTEX_WAIT_PRIVATE, 99, NULL) = 0 futex(0x555e283e449c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4498, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4498, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e449c, FUTEX_WAIT_PRIVATE, 99, NULL) = -1 EAGAIN (Resource temporarily unavailable) It sometimes shows like below: 80
  62. 81.

    strace with CRuby futex(0x555e283e4464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4460, {FUTEX_OP_SET, 0,

    FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4498, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e449c, FUTEX_WAIT_PRIVATE, 91, NULL) = -1 EAGAIN (Resource temporarily unavailable) write(6, "!", 1) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x555e283e4464, FUTEX_WAIT_PRIVATE, 95, NULL) = 0 futex(0x555e283e449c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4498, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4498, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e449c, FUTEX_WAIT_PRIVATE, 95, NULL) = -1 EAGAIN (Resource temporarily unavailable) write(6, "!", 1) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x555e283e4464, FUTEX_WAIT_PRIVATE, 99, NULL) = 0 futex(0x555e283e449c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4498, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4498, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e449c, FUTEX_WAIT_PRIVATE, 99, NULL) = -1 EAGAIN (Resource temporarily unavailable) It sometimes shows like below: This is by timer thread: There’s another reason. It needs to a way to get information
 from a living process. 81
  63. 82.

    procfs(5) • You can get many information (30) kstkeip %lu

    The current EIP (instruction pointer). PROC - PROCESS INFORMATION PSEUDO-FILESYSTEM It is running at 140308168160605 = 0x7f9c0a85b55d 82
  64. 83.

    procfs • Read /proc/<pid>/maps and find which library EIP is

    in. • Convert the address into the library local address considering offsets. • 0x7f9c0a85b55d - 0x7f9c0a84b000 = 0x1055d • See symbol table of the library and find it!!! • (You can also use addr2line) 83
  65. 84.

    pid2line.rb • It does such investigation. • https://gist.github.com/nurse/0619b6af90df140508c2 • Note

    that recent Ubuntu requires privilege even if the target process is mine. • It is a good tool because it can get information even if you cannot attach the process into gdb. (= get stuck in kernel) • It doesn’t require install instruction. easy to use. • Can run even on RHEL5/CentOS5 84
  66. 85.

    An example # /opt/td-agent/embedded/bin/ruby pid2line.rb 11010 LWPID: 11010 EIP: 7faa4e58a462

    -> 142462 Elf vaddr: 142462 -> 142462 match_at /var/cache/omnibus/src/ruby/ruby-2.1.10/regexec.c:2627 LWPID: 11011 EIP: 7faa4d81969d -> ec69d Elf vaddr: ec69d -> ec69d __poll_nocancel :? LWPID: 11014 EIP: 7faa4e237a82 -> ba82 Elf vaddr: ba82 -> ba82 __pthread_cond_timedwait :? LWPID: 11015 EIP: 7faa4e237a82 -> ba82 Elf vaddr: ba82 -> ba82 __pthread_cond_timedwait :? LWPID: 11021 EIP: 7faa4d8242c3 -> f72c3 Elf vaddr: f72c3 -> f72c3 __epoll_wait_nocancel :? LWPID: 11022 EIP: 7faa4d8242c3 -> f72c3 Elf vaddr: f72c3 -> f72c3 __epoll_wait_nocancel :? LWPID: 11023 EIP: 7faa4d81975f -> ec75f Elf vaddr: ec75f -> ec75f __GI_ppoll :? the core of regular expression (Oniguruma/Onigmo) It seems regexp’s backtrack explosion. 85
  67. 86.

    SEGV Segmentation fault • Useful information to identify the issue.

    • Ruby level or C level backtrace • loaded libraries • core file 86
  68. 87.

    Ruby level backtrace • You can read it. • But

    in SEGV case it often far from the cause. 87
  69. 88.

    C level backtrace • It shows both public/static function names.

    • It’s actually difficult to get static function’s name. Ruby fetches it from DWARF by itself. • (it needs to compile `gcc -g`) • Tracing back beyond signal trampoline on OS X was also difficult. • Just before _sigtramp maybe buggy. 88
  70. 89.

    core file • the file named ruby.core (or managed by

    coredumpctl) • Generated on SEGV or when manually dumped • The memory dump of Ruby at that time • You can easily debug C layer with gdb • You can debug Ruby layer with gdb… 89
  71. 91.

    as C structures ruby_current_vm pthread lwp living_threads Ruby
 Thread Ruby


    Thread Ruby
 Thread Ruby call frame glibc Linux Kernel call frame call frame … Ruby VM and threads are represented as shown. If you can track this by gdb, you can get Ruby level information… call frame: a structure which saves that Ruby executes in the function, local variables, previous call frame, and so on 91
  72. 93.

    93

  73. 94.

    .gdbinit for CRuby • It is hard because it needs

    to re- implement Ruby’s C functions with gdb’s tiny scripts… • Slow; rb_ps is enough fast but rb_count_objects, which counts Ruby objects by categories, takes some seconds even if nearly empty process. (it should be written in Python…) • It needs to follow Ruby’s changes. • It can read core file. It means you can get Ruby level backtrace of each Ruby threads even if the process is crashed. (note that you can get core file manually from running process with stopping it short time) 94