Dive into CRuby

Dive into CRuby naruse

Who is naruse? • twitter: @nalsh • GitHub: @nurse •
Ruby committer (M17N, release engineering, etc.) • nkf maintainer • working at Treasure Data, Inc. 2

झࢫ • ϢʔεέʔεͷॏཁੑΛ ஌Β͠ΊɺօʹऩूΛ ख఻ͬͯ΋Β͏ • RubyΑΓ௿͍ϨΠϠͷ ཧղΛਂΊΔ Goal •
Understand how use case is important, and make people gather real examples • Understand layers lower than Ruby 3

Through RubyKaigi 2016, You may want to • write a
new application • improve JRuby or mruby or something • improve CRuby 4

But... What should I do... 5

Motivations • New features • New methods or extend exist
one • New platforms • Optimizations • speed • memory • Instruments to help trouble shooting • to ease debugging 6

New features • New features • New methods or extend
exist one • String#scrub • Regexp#match? • etc. • New platforms • clang • Microsoft Visual C++ 2015 7

Why new feature? • Existing methods doesn’t ﬁt my use
case • This “use case” must be a real example. • It shall not be an artiﬁcial one. • Current method is too complicated, and it must be more simple for my use case 8

–Jonathan Swift, Travels into Several Remote Nations of the World
(1726) “Necessity is the mother of invention.” but... 9

What is the truly necessary one? http://dic.nicovideo.jp/a/ސ٬͕ຊ౰ʹඞཁͩͬͨ΋ͷ 10

[Feature #6752] Replacing ill- formed subsequence • Repair invalid bytes
in the given string. • Above snippet didn’t work as expected if `str` is already UTF-8. (it simply returned the original string) • Anyway what kind of strings are used? str = "\xe3\x81\x82\x80\x80" str.encode("utf-8", invalid: :replace) 11

Case: Web Forms • If strings come from Web Forms
are broken, the web application should simply raise error: 400 Bad Request. • Clients should retry with correct request. 12

Case: Files • If strings come from ﬁles are broken,
it’s usually because it doesn't specify correct encoding. • Retry reading with correct encoding. 13

Case: Web Crawler • They can’t retry. • They need
to handle broken strings easily and secure. • There’s real pages, for example twitter... • mechanism: • wrongly get substring as bytes (though it must do as chars) • insert such invalid bytes into web page template 14

How do I design the interface • transcode base •
Use Ruby’s encoding conversion engine. • Some people are used to use this way with iconv. • regexp base • Use regular expression engine (Ruby usually uses regexp engine on handling characters) • Better performance and meaning. • naming issue str.encode("utf-8", invalid: :replace) 15

Added both APIs • String#encode • tables under enc/trans •
CSI (Code Set Independent; they doesn’t convert into Unicode if it doesn’t need) • but in 21st century, people rarely convert encodings… • String#scrub • regexp base 16

CRuby ﬁles • Ruby class name/module name • CRuby component
name (+sufﬁx) • utility functions • etc. 17

Ruby class/module 18

documents 19

encodings 20

regular expressions Onigmo (Oniguruma fork) 21

Ruby VM (YARV) 22

build a working ﬁeld of nobu 24

platforms NOTE: cygwin has no maintainer 25

tests & benchmarks 26

Impl: String#scrub • This is a method of `String` class. 
→ string.c 27

the end of string.c void Init_String(void) { rb_cString = rb_define_class("String",
rb_cObject); rb_include_module(rb_cString, rb_mComparable); … rb_define_method(rb_cString, "empty?", rb_str_empty, 0); rb_define_method(rb_cString, "=~", rb_str_match, 1); rb_define_method(rb_cString, "getbyte", rb_str_getbyte, 1); rb_define_method(rb_cString, "setbyte", rb_str_setbyte, 2); … rb_define_method(rb_cString, "scrub", str_scrub, -1); rb_define_method(rb_cString, "scrub!", str_scrub_bang, -1); … } deﬁne method in String class named “scrub” whose implementation is `str_scrub` function, and the number of its arguments is -1 (variable arguments). 28

/* * call-seq: * str.scrub -> new_str * str.scrub(repl) ->
new_str * str.scrub{|bytes|} -> new_str * * If the string is invalid byte sequence then replace invalid bytes * with given replacement character, else returns self. * If block is given, replace invalid bytes with returned value of the block. * * "abc\u3042\x81".scrub #=> "abc\u3042\uFFFD" * "abc\u3042\x81".scrub("*") #=> "abc\u3042*" * "abc\u3042\xE3\x80".scrub{|bytes| '<'+bytes.unpack('H*')[0]+'>' } #=> "abc \u3042<e380>" */ static VALUE str_scrub(int argc, VALUE *argv, VALUE str) { VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil; VALUE new = rb_str_scrub(str, repl); return NIL_P(new) ? rb_str_dup(str): new; } 29

static VALUE str_scrub(int argc, VALUE *argv, VALUE str) { VALUE
repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil; VALUE s = rb_str_scrub(str, repl); return NIL_P(s) ? rb_str_dup(str): s; } • VALUE: a pointer type for Ruby object. • remind that Ruby’s data is always an object.  (there’s no primitive type) • argv: C array of VALUEs • argc: the number of VALUEs in argv • arity: the number of arguments • rb_check_arity(argc,min,max): raise exception if arity doesn’t match. • Qnil: `nil` in CRuby source code • NIL_P(x): `x.nil?`; Like Lisp CRuby source uses `_p` instead of `?` 30

/** * @param str the string to be scrubbed *
@param repl the replacement character * @return If given string is invalid, returns a new string. Otherwise, returns Qnil. */ VALUE rb_str_scrub(VALUE str, VALUE repl) { return rb_enc_str_scrub(STR_ENC_GET(str), str, repl); } • STR_ENC_GET(): returns `rb_encoding *` which is C structure to represent Ruby’s Encoding. • There’s some similar relations between C structures and Ruby class like Hash and st_table, Thread class and rb_thread_t and so on. 31

VALUE rb_enc_str_scrub(rb_encoding *enc, VALUE str, VALUE repl) { VALUE buf
= rb_str_buf_new(RSTRING_LEN(str)); const char *rep = /* replacement char */; long replen = strlen(rep); const char *p = RSTRING_PTR(str); const char *e = RSTRING_END(str); const char *p1 = p; while (p < e) { int ret = rb_enc_precise_mbclen(p, e, enc); if (MBCLEN_NEEDMORE_P(ret)) { break; } else if (MBCLEN_CHARFOUND_P(ret)) { p += MBCLEN_CHARFOUND_LEN(ret); } else if (MBCLEN_INVALID_P(ret)) { ... 32

[Feature #7361] Pathname#touch • touch(1) • change ﬁle access and
modiﬁcation times • NOTE: This ticket is rejected. new_file = Pathname.new('location/for/new/file') FileUtils.touch(new_file) 33

NAME touch - change file access and modification times SYNOPSIS
touch [-acm] [-r ref_file|-t time|-d date_time] file… OPTIONS -a: Change the access time of file. -c: Do not create a specified file if it does not exist. -m: Change the modification time of file. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/touch.html 34

But… What is this method? • “Behave like touch(1)” •
This can’t be a correct answer. • This has two different features: • “change file access and modification times” • “create empty file” • It describes the spec, but doesn’t describe what users truly want to do. 35

Conclusion of suggesting a new feature • Collect use cases:
• take a memo if you ﬁnd a case which should be written shorter. (only this can be a good feature request) • Time class and Integer bit manipulation seems to be improved… • Design API for the group of use cases • and ﬁnd a good name ;-) • Read ʮAPIσβΠϯέʔεελσΟ ――Rubyͷ࣮ྫ͔ ΒֶͿɻ໰୊ʹଈͨ͠σβΠϯͱීวͷߟ͑ํʯ • http://gihyo.jp/book/2016/978-4-7741-7802-8 36

New platforms • On a brand new environment, Ruby sometimes
doesn’t work. For example clang (a C language family frontend for LLVM), and Microsoft Visual C++ 2015 (VC14). • clang (including llvm-gcc): • It usually behaves like GCC, but different in some points. • Visual C++ 2015 • It introduced the Universal C Runtime, which breaks many compatibility. 37

clang (including llvm-gcc) • It optimizes code more aggressive than
gcc at some point. It breaks Ruby’s GC (because Ruby uses conservative mark & sweep GC). • The most tough one was r34278, in which llvm optimized out the memory allocation for continuation. • You may think no one uses continuation, but actually uses through enumerator. 38

Visual C++ 2015 • It introduced the Universal C Runtime,
which breaks many compatibility. • The most tough part is around _pioinfo: • _pioinfo is an internal array of IO structure. It stores underlying OS file HANDLE, file position (cursor), file attributes, text mode, and so on. Ruby handles the structure • to associate socket and fd: CRuby creates fd with dummy file handle and set socket • to implement overlapped I/O for Windows 2000/XP • to emulate fcntl(2) • But VC2015 make it private. • NOTE: overlapped I/O is a Microsoft’s term similar to nonblocking I/O. 39

static void set_pioinfo_extra(void) { # define FUNCTION_RET 0xc3 /* ret
*/ /* get __pioinfo addr with _isatty */ char *p = (char*)get_proc_address("ucrtbase.dll", "_isatty", NULL); char *pend = p; /* _osfile(fh) & FDEV */ int32_t rel; char *rip; /* add rsp, _ */ # define FUNCTION_BEFORE_RET_MARK "\x48\x83\xc4" # define FUNCTION_SKIP_BYTES 1 /* lea rdx,[__pioinfo's addr in RIP-relative 32bit addr] */ # define PIOINFO_MARK "\x48\x8d\x15" if (p) { for (pend += 10; pend < p + 300; pend++) { // find end of function if (memcmp(pend, FUNCTION_BEFORE_RET_MARK, sizeof(FUNCTION_BEFORE_RET_MARK) - 1) == 0 && *(pend + (sizeof(FUNCTION_BEFORE_RET_MARK) - 1) + FUNCTION_SKIP_BYTES) & FUNCTION_RET == FUNCTION_RET) { // search backwards from end of function for (pend -= (sizeof(PIOINFO_MARK) - 1); pend > p; pend--) { if (memcmp(pend, PIOINFO_MARK, sizeof(PIOINFO_MARK) - 1) == 0) { p = pend; goto found; } } break; } } } fprintf(stderr, "unexpected " UCRTBASE "\n"); _exit(1); found: p += sizeof(PIOINFO_MARK) - 1; rel = *(int32_t*)(p); rip = p + sizeof(int32_t); Even if _pioinfo is private, it exists on the memory, and we can get it. The problem is how to reach it. I use the fact _isatty() refers _pioinfo. I can read the function body of _isatty through the function pointer of it, find the instruction whose argument is _pioinfo, and I get the address of _pioinfo. 40

after story • Receive a patch to ﬁx the code
to support Visual C++ 2015’s DEBUG build. • I felt a deep emotion by the fact there’s a person who use such dirty code, and moreover read the code and contribute a ﬁx. What a great thing OSS is! 41

Performance improvements • Speeding up • Optimize Ruby Applications •
Optimize CRuby • some parts (hash, string, etc.) • Ruby VM • Proﬁling for speeding up • stackprof • perf(1) 42

Theory of improving performance 1.Take a benchmark • Take a
proﬁle of the target application (better) • for example stackprof, derailed_benchmarks. 2.Find a bottle neck (hotspot). 3.Optimize the bottle neck. 4.win! 43

benchmark 44

% ls benchmark bm_app_answer.rb bm_vm1_gc_short_lived.rb bm_app_aobench.rb bm_vm1_gc_short_with_complex_long.rb bm_app_erb.rb bm_vm1_gc_short_with_long.rb bm_app_factorial.rb
bm_vm1_gc_short_with_symbol.rb bm_app_fib.rb bm_vm1_gc_wb_ary.rb bm_app_lc_fizzbuzz.rb bm_vm1_gc_wb_ary_promoted.rb bm_app_mandelbrot.rb bm_vm1_gc_wb_obj.rb bm_app_pentomino.rb bm_vm1_gc_wb_obj_promoted.rb bm_app_raise.rb bm_vm1_ivar.rb bm_app_strconcat.rb bm_vm1_ivar_set.rb bm_app_tak.rb bm_vm1_length.rb bm_app_tarai.rb bm_vm1_lvar_init.rb bm_app_uri.rb bm_vm1_lvar_set.rb bm_array_shift.rb bm_vm1_neq.rb bm_hash_aref_dsym.rb bm_vm1_not.rb bm_hash_aref_dsym_long.rb bm_vm1_rescue.rb bm_hash_aref_fix.rb bm_vm1_simplereturn.rb bm_hash_aref_flo.rb bm_vm1_swap.rb bm_hash_aref_miss.rb bm_vm1_yield.rb bm_hash_aref_str.rb bm_vm2_array.rb bm_hash_aref_sym.rb bm_vm2_bigarray.rb bm_hash_aref_sym_long.rb bm_vm2_bighash.rb bm_hash_flatten.rb bm_vm2_case.rb bm_hash_ident_flo.rb bm_vm2_case_lit.rb bm_hash_ident_num.rb bm_vm2_defined_method.rb bm_hash_ident_obj.rb bm_vm2_dstr.rb bm_hash_ident_str.rb bm_vm2_eval.rb bm_hash_ident_sym.rb bm_vm2_method.rb bm_hash_keys.rb bm_vm2_method_missing.rb bm_hash_shift.rb bm_vm2_method_with_block.rb bm_hash_shift_u16.rb bm_vm2_mutex.rb … … 45

Take a proﬁle with New Relic 46

Result 47

Result… 48

bundler rack railties actionpack activesupport actionpack activesupport activerecord 49

String#blank? includes IO wait 50

No bottle neck… I wish you try with your applications,
and conﬁrm there’s no bottle neck. Anyway I talk about String#blank? 51

class String BLANK_RE = /\A[[:space:]]*\z/ # A string is blank
if it's empty or contains whitespaces only: # # ''.blank? # => true # ' '.blank? # => true # "\t\n\r".blank? # => true # ' blah '.blank? # => false # # Unicode whitespace is supported: # # "\u00a0".blank? # => true # # @return [true, false] def blank? # The regexp that matches blank strings is expensive. For the case of empty # strings we can speed up this method (~3.5x) with an empty? call. The # penalty for the rest of strings is marginal. empty? || BLANK_RE.match?(self) end end String#blank? Check Unicode aware string emptiness 52

Regexp#match? • It was used Regexp#===. • Usually Regular Expression
match stores $& (MatchData.last_match); it is slow because of creating MatchData object. • → Regexp#match? don’t update $&  (Ruby 2.4 feature) • Note: Perl has similar optimization 53

SSE 4.2 STTNI • String and Text New Instructions •
Handle 16 bytes at once. • But all the 16 bytes must be available. If not, it will cause SEGV. • ARM SVE (Scalable Vector Extension), an instruction set extension for HPC, which will be used by Post-K computer. VALUE rb_str_blank(VALUE str) { const unsigned char *p = (const unsigned char *)RSTRING_PTR(str); const unsigned char *e = (const unsigned char *)RSTRING_END(str); intptr_t pe = (intptr_t)e; const __m128i mask = _mm_set_epi8(0,0,0,0,0,0,0,0,0,0,9,10,11,12,13,32); const int masksize = 6; const int mode = _SIDD_CMP_EQUAL_ANY|_SIDD_UBYTE_OPS|_SIDD_MASKED_NEGATIVE_POLARITY; if (RSTRING_LEN(str) == 0) return Qtrue; /* set the edge of a page before the end of string */ if (pe & 0xfff > 0xff1) { pe &= ~0xfff; pe |= 0xff1; } for (; (intptr_t)p < pe; p += sizeof(__m128i)) { int idx, len; ptrdiff_t sz; __m128i m; retry: sz = e - p; len = (int)((sz&INT_MAX) | (sz >> 27)); m = _mm_loadu_si128((__m128i const *)p); /* CF: 1 if there's non spaces * ZF: 1 if reached the end */ if (_mm_cmpestra(mask, masksize, m, len, mode)) { /* CF=0 ZF=0 */ continue; } #if 0 /* GCC 6 wrongly generates cmpestri and cmpestri... */ if (_mm_cmpestrc(mask, masksize, m, len, mode)) { /* CF=0 ZF=1 */ return Qtrue; } idx = _mm_cmpestri(mask, masksize, m, len, mode); #else idx = _mm_cmpestri(mask, masksize, m, len, mode); if (sz < idx) return Qtrue; #endif p += idx; if (!str_blank0(&p, e, FALSE)) return Qfalse; goto retry; } if (!str_blank0(&p, e, TRUE)) return Qfalse; return Qtrue; } 54

Improve RubyVM • Anyway if there’s no bottle neck in
Ruby layer, it needs to dive into C layer • Get a profile with C level profiler • OProfile • perf(1) 55

# Samples: 32K of event 'cycles:u' # Event count (approx.):
19800054829 # # Overhead Command Shared Object Symbol # ........ .............. .................. .................................................... # 66.81% miniruby miniruby [.] vm_exec_core 7.03% miniruby miniruby [.] vm_search_method.isra.97 3.62% miniruby miniruby [.] vm_call_iseq_setup_normal_0start_0params_0locals 2.95% miniruby miniruby [.] vm_call_cfunc 2.18% miniruby miniruby [.] rb_ary_entry 0.95% miniruby miniruby [.] invoke_iseq_block_from_c 0.92% miniruby miniruby [.] vm_call_iseq_setup_normal_0start_1params_1locals 0.91% miniruby miniruby [.] rb_ary_push 0.85% miniruby miniruby [.] vm_call_opt_send 0.82% miniruby miniruby [.] ary_ensure_room_for_push 0.73% miniruby miniruby [.] rb_ary_modify 0.71% miniruby miniruby [.] vm_call_iseq_setup_normal_0start_0params_2locals 0.66% miniruby miniruby [.] vm_call_iseq_setup 0.58% miniruby miniruby [.] rb_ary_rotate_bang 0.56% miniruby miniruby [.] vm_call_iseq_setup_normal_0start_0params_1locals 0.52% miniruby miniruby [.] opt_eq_func 0.41% miniruby miniruby [.] method_entry_get 0.38% miniruby miniruby [.] vm_callee_setup_block_arg 0.37% miniruby miniruby [.] rb_ary_transpose 0.34% miniruby miniruby [.] invoke_block_from_c_splattable.constprop.136 0.34% miniruby miniruby [.] vm_exec 0.33% miniruby miniruby [.] rb_yield perf report of optcarrot 56

perf report of ﬂuentd You can conﬁrm you application 57

vm_exec_core • The core function of RubyVM (YARV) • bytecode
interpreter’s method dispatch routine • very large (30KB~40KB) switch-case • Strict to say, “switch-case” is optimized to jump instructions (Direct Threaded Code) 58

vm_exec.c 59

static VALUE vm_exec_core(rb_thread_t *th, VALUE initial) { register rb_control_frame_t *reg_cfp;
const VALUE *reg_pc; #include "vmtc.inc" if (UNLIKELY(th == 0)) { return (VALUE)insns_address_table; } reg_cfp = th->cfp; reg_pc = reg_cfp->pc; first: INSN_DISPATCH(); /*****************/ #include "vm.inc" /*****************/ END_INSNS_DISPATCH(); /* unreachable */ rb_bug("vm_eval: unreachable"); goto first; } vm_exec_core (some macros are removed) keeps method/block local information program counter, What bytecode it executes now 60

How bytecode interpreter behave v1 PUSH = 1; ADD =
2; MINUS = 3 pc = 0 # program counter stack = [] iseq = [PUSH, 1, PUSH, 2, ADD] # instruction sequence while (true) { switch (iseq[pc++]) { case PUSH: stack.push(iseq[pc++]); break; case ADD: stack.push(stack.pop + stack.pop); break; case MINUS: stack.push(stack.pop - stack.pop); break; /* ... */ } } 61

2; MINUS = 3 pc = 0 # program counter stack = [] iseq = [PUSH, 1, PUSH, 2, ADD] # instruction sequence while (true) { switch (iseq[pc++]) { case PUSH: stack.push(iseq[pc++]); break; case ADD: stack.push(stack.pop + stack.pop); break; case MINUS: stack.push(stack.pop - stack.pop); break; /* ... */ } } instruction body 62

2; MINUS = 3 pc = 0 # program counter stack = [] iseq = [PUSH, 1, PUSH, 2, ADD] # instruction sequence while (true) { switch (iseq[pc++]) { case PUSH: stack.push(iseq[pc++]); break; case ADD: stack.push(stack.pop + stack.pop); break; case MINUS: stack.push(stack.pop - stack.pop); break; /* ... */ } } 1. loop overhead 2. fetch an instruction 3. jump to the instruction body 63

2; MINUS = 3 pc = 0 # program counter stack = [] iseq = [PUSH, 1, PUSH, 2, ADD] # instruction sequence while (true) { switch (iseq[pc++]) { # indirect branch case PUSH: stack.push(iseq[pc++]); break; case ADD: stack.push(stack.pop + stack.pop); break; case MINUS: stack.push(stack.pop - stack.pop); break; /* ... */ } } branch misprediction 64

How bytecode interpreter behave v2: threading code pc = 0
# program counter stack = [] iseq = [&&label_PUSH, 1, &&label_PUSH, 2, &&label_ADD] goto iseq[pc++] label_PUSH: stack.push(iseq[pc++]); goto iseq[pc++] # indirect branch label_ADD: stack.push(stack.pop + stack.pop); goto iseq[pc++] # indirect branch label_MINUS: stack.push(stack.pop - stack.pop); goto iseq[pc++] # indirect branch /* ... */ 65

How bytecode interpreter behave v2: threading code pc = 0
# program counter stack = [] iseq = [&&label_PUSH, 1, &&label_PUSH, 2, &&label_ADD] goto iseq[pc++] label_PUSH: stack.push(iseq[pc++]); goto iseq[pc++] # indirect branch label_ADD: stack.push(stack.pop + stack.pop); goto iseq[pc++] # indirect branch label_MINUS: stack.push(stack.pop - stack.pop); goto iseq[pc++] # indirect branch /* ... */• No loop overhead • branch prediction friendly 66

But some people learned that branch prediction for indirect branches
are hard work… Let’s get a proﬁle 67

perf stat result of optcarrot on SandyBridge $ perf stat
./miniruby-gcc611 -v -Ioptcarrot/lib -r./optcarrot/tools/shim optcarrot/bin/ optcarrot --benchmark optcarrot/examples/Lan_Master.nes ruby 2.4.0dev (2016-08-26 trunk 56013) [x86_64-linux] fps: 13.027601466360375 checksum: 59662 Performance counter stats for './miniruby-gcc611 -v -Ioptcarrot/lib -r./optcarrot/tools/ shim optcarrot/bin/optcarrot --benchmark optcarrot/examples/Lan_Master.nes': 14796.375448 task-clock (msec) # 0.998 CPUs utilized 185 context-switches # 0.013 K/sec 0 cpu-migrations # 0.000 K/sec 14,404 page-faults # 0.973 K/sec 33,417,895,565 cycles # 2.259 GHz [83.29%] 11,061,511,903 stalled-cycles-frontend # 33.10% frontend cycles idle [83.34%] 7,225,689,220 stalled-cycles-backend # 21.62% backend cycles idle [66.70%] 51,166,479,085 instructions # 1.53 insns per cycle # 0.22 stalled cycles per insn [83.38%] 9,698,878,651 branches # 655.490 M/sec [83.35%] 507,996,700 branch-misses # 5.24% of all branches [83.34%] 14.820478694 seconds time elapsed 68

perf stat result of optcarrot on Haswell % perf stat
./miniruby-gcc611 -v -Ioptcarrot/lib -r./optcarrot/tools/shim optcarrot/bin/ optcarrot --benchmark optcarrot/examples/Lan_Master.nes ruby 2.4.0dev (2016-08-26 trunk 56013) [x86_64-linux] fps: 27.269785989915373 checksum: 59662 Performance counter stats for './miniruby-gcc611 -v -Ioptcarrot/lib -r./optcarrot/tools/ shim optcarrot/bin/optcarrot --benchmark optcarrot/examples/Lan_Master.nes': 7214.329827 task-clock:u (msec) # 0.999 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 14849 page-faults:u # 0.002 M/sec 19775865899 cycles:u # 2.741 GHz 50975216372 instructions:u # 2.58 insn per cycle 9662536352 branches:u # 1339.353 M/sec 67884021 branch-misses:u # 0.70% of all branches 7.218712639 seconds time elapsed 69

Free lunch by Moore’s law is still alive $ perf
stat ./miniruby-gcc611 -v -Ioptcarrot/lib -r./optcarrot/tools/shim optcarrot/bin/ optcarrot --benchmark optcarrot/examples/Lan_Master.nes ruby 2.4.0dev (2016-08-26 trunk 56013) [x86_64-linux] Sandy Bridge: fps: 13.027601466360375 14796.375448 task-clock (msec) # 0.998 CPUs utilized 185 context-switches # 0.013 K/sec 14,404 page-faults # 0.973 K/sec 33,417,895,565 cycles # 2.259 GHz [83.29%] 51,166,479,085 instructions # 1.53 insns per cycle 9,698,878,651 branches # 655.490 M/sec [83.35%] 507,996,700 branch-misses # 5.24% of all branches [83.34%] 14.820478694 seconds time elapsed Haswell: fps: 27.269785989915373 7214.329827 task-clock:u (msec) # 0.999 CPUs utilized 0 context-switches:u # 0.000 K/sec 14849 page-faults:u # 0.002 M/sec 19775865899 cycles:u # 2.741 GHz 50975216372 instructions:u # 2.58 insn per cycle 9662536352 branches:u # 1339.353 M/sec 67884021 branch-misses:u # 0.70% of all branches 7.218712639 seconds time elapsed 70

/* -*-c-*- ******************************************************/ /****************************************************************/ /****************************************************************/ /** This file is VM
main loop. ---- This file is auto generated by insns2vm.rb DO NOT TOUCH! If you want to fix something, you must edit 'insns.def' */ INSN_ENTRY(nop){ { DEBUG_ENTER_INSN("nop"); ADD_PC(1+0); PREFETCH(GET_PC()); #define CURRENT_INSN_nop 1 #define INSN_IS_SC() 0 #define INSN_LABEL(lab) LABEL_nop_##lab #define LABEL_IS_SC(lab) LABEL_##lab##_##t COLLECT_USAGE_INSN(BIN(nop)); { #line 40 "insns.def" /* none */ #line 32 "vm.inc" #undef CURRENT_INSN_nop #undef INSN_IS_SC #undef INSN_LABEL #undef LABEL_IS_SC END_INSN(nop);}}} INSN_ENTRY(getlocal){ { VALUE val; rb_num_t level = (rb_num_t)GET_OPERAND(2); lindex_t idx = (lindex_t)GET_OPERAND(1); DEBUG_ENTER_INSN("getlocal"); ADD_PC(1+2); PREFETCH(GET_PC()); #define CURRENT_INSN_getlocal 1 #define INSN_IS_SC() 0 #define INSN_LABEL(lab) LABEL_getlocal_##lab #define LABEL_IS_SC(lab) LABEL_##lab##_##t COLLECT_USAGE_INSN(BIN(getlocal)); COLLECT_USAGE_OPERAND(BIN(getlocal), 0, idx); COLLECT_USAGE_OPERAND(BIN(getlocal), 1, level); { #line 60 "insns.def" int i, lev = (int)level; const VALUE *ep = GET_EP(); for (i = 0; i < lev; i++) { ep = GET_PREV_EP(ep); } val = *(ep - idx); #line 65 "vm.inc" CHECK_VM_STACK_OVERFLOW_FOR_INSN(REG_CFP, 1); PUSH(val); #undef CURRENT_INSN_getlocal #undef INSN_IS_SC #undef INSN_LABEL #undef LABEL_IS_SC END_INSN(getlocal);}}} 71

/* -*-c-*- ******************************************************/ /****************************************************************/ /****************************************************************/ /** This file is VM
main loop. ---- This file is auto generated by insns2vm.rb DO NOT TOUCH! If you want to fix something, you must edit 'insns.def' */ INSN_ENTRY(nop){ { DEBUG_ENTER_INSN("nop"); ADD_PC(1+0); PREFETCH(GET_PC()); #define CURRENT_INSN_nop 1 #define INSN_IS_SC() 0 #define INSN_LABEL(lab) LABEL_nop_##lab #define LABEL_IS_SC(lab) LABEL_##lab##_##t COLLECT_USAGE_INSN(BIN(nop)); { #line 40 "insns.def" /* none */ #line 32 "vm.inc" #undef CURRENT_INSN_nop #undef INSN_IS_SC #undef INSN_LABEL #undef LABEL_IS_SC END_INSN(nop);}}} /* r14: pc (program counter) */ /* r15: cfp (call frame pointer) */ /* NOTE: cop */ INSN_ENTRY(nop){ { ADD_PC(1+0); /* $rdx = $r14 + 8byte */ 209bb0: lea 0x8(%r14),%rdx /* $rax = $r14 */ 209bb4: mov %r14,%rax /* *$r15 = $rdx */ 209bb7: mov %rdx,(%r15) /* $r14 = $rdx */ 209bba: mov %rdx,%r14 END_INSN(nop);}}} /* goto *($rax + 8byte) */ 209bbd: jmpq *0x8(%rax) typedef struct rb_control_frame_struct { const VALUE *pc; /* cfp[0] */ VALUE *sp; /* cfp[1] */ const rb_iseq_t *iseq; /* cfp[2] */ VALUE self; /* cfp[3] */ const VALUE *ep; /* cfp[4] */ const void *block_code; /* cfp[5] */ } rb_control_frame_t; 72

opt_plus INSN_ENTRY(opt_plus){ { VALUE val; CALL_CACHE cc = (CALL_CACHE)GET_OPERAND(2); CALL_INFO
ci = (CALL_INFO)GET_OPERAND(1); VALUE recv = TOPN(1); VALUE obj = TOPN(0); DEBUG_ENTER_INSN("opt_plus"); ADD_PC(1+2); PREFETCH(GET_PC()); POPN(2); #define CURRENT_INSN_opt_plus 1 #define INSN_IS_SC() 0 #define INSN_LABEL(lab) LABEL_opt_plus_##lab #define LABEL_IS_SC(lab) LABEL_##lab##_##t COLLECT_USAGE_INSN(BIN(opt_plus)); COLLECT_USAGE_OPERAND(BIN(opt_plus), 0, ci); COLLECT_USAGE_OPERAND(BIN(opt_plus), 1, cc); { #line 1377 "insns.def" if (FIXNUM_2_P(recv, obj) && BASIC_OP_UNREDEFINED_P(BOP_PLUS,INTEGER_REDEFINED_OP_FLAG)) { /* fixnum + fixnum */ #ifndef LONG_LONG_VALUE VALUE msb = (VALUE)1 << ((sizeof(VALUE) * CHAR_BIT) - 1); val = recv - 1 + obj; if ((~(recv ^ obj) & (recv ^ val)) & msb) { val = rb_int2big((SIGNED_VALUE)((val>>1) | (recv & msb))); } #else val = LONG2NUM(FIX2LONG(recv) + FIX2LONG(obj)); #endif } else if (FLONUM_2_P(recv, obj) && BASIC_OP_UNREDEFINED_P(BOP_PLUS, FLOAT_REDEFINED_OP_FLAG)) { val = DBL2NUM(RFLOAT_VALUE(recv) + RFLOAT_VALUE(obj)); } else if (!SPECIAL_CONST_P(recv) && !SPECIAL_CONST_P(obj)) { if (RBASIC_CLASS(recv) == rb_cFloat && RBASIC_CLASS(obj) == rb_cFloat && BASIC_OP_UNREDEFINED_P(BOP_PLUS, FLOAT_REDEFINED_OP_FLAG)) { val = DBL2NUM(RFLOAT_VALUE(recv) + RFLOAT_VALUE(obj)); } else if (RBASIC_CLASS(recv) == rb_cString && RBASIC_CLASS(obj) == rb_cString && BASIC_OP_UNREDEFINED_P(BOP_PLUS, STRING_REDEFINED_OP_FLAG)) { val = rb_str_plus(recv, obj); } else if (RBASIC_CLASS(recv) == rb_cArray && BASIC_OP_UNREDEFINED_P(BOP_PLUS, ARRAY_REDEFINED_OP_FLAG)) { val = rb_ary_plus(recv, obj); } else { goto INSN_LABEL(normal_dispatch); } } else { INSN_LABEL(normal_dispatch): PUSH(recv); PUSH(obj); CALL_SIMPLE_METHOD(recv); } #line 2073 "vm.inc" CHECK_VM_STACK_OVERFLOW_FOR_INSN(REG_CFP, 1); PUSH(val); #undef CURRENT_INSN_opt_plus #undef INSN_IS_SC #undef INSN_LABEL #undef LABEL_IS_SC END_INSN(opt_plus);}}} 73

opt_plus • Optimized case is a home study • It’s
a bit pattern magic DEFINE_INSN opt_plus (CALL_INFO ci, CALL_CACHE cc) (VALUE recv, VALUE obj) (VALUE val) { if (FIXNUM_2_P(recv, obj) && BASIC_OP_UNREDEFINED_P(BOP_PLUS,INTEGER_REDEFINED_OP_FLAG)) { /* fixnum + fixnum */ #ifndef LONG_LONG_VALUE VALUE msb = (VALUE)1 << ((sizeof(VALUE) * CHAR_BIT) - 1); val = recv - 1 + obj; if ((~(recv ^ obj) & (recv ^ val)) & msb) { val = rb_int2big((SIGNED_VALUE)((val>>1) | (recv & msb))); } #else val = LONG2NUM(FIX2LONG(recv) + FIX2LONG(obj)); #endif } 74

opt_plus | Inst | Resource Delay In Cycles | |
| Num | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | FE | | ------------------------------------------------------------------------- | 0 | | | | | | | | | | | mov rax, qword ptr [r15+0x8] | 1 | | | | | | | | | | | mov r12, qword ptr [r14+0x10] | 2 | | | | | | | | | | | lea rbx, ptr [r14+0x18] | 3 | | | 1 | | | | | | | | mov r13, qword ptr [r14+0x8] | 4 | | | | | | | | | 1 | | mov rbp, r15 | 5 | | | | | | | | | | | mov r14, rbx | 6 | | | | | | | | | | | mov rdi, qword ptr [rax-0x10] | 7 | | | | | | | | | | | mov rsi, qword ptr [rax-0x8] | 8 | | | | | | | | | 2 | | lea rdx, ptr [rax-0x10] | 9 | | | | | | | | | 3 | | mov qword ptr [r15], rbx | 10 | | | | | | | | 1 | 2 | | mov qword ptr [r15+0x8], rdx | 11 | | | | | | | | | | | mov rdx, rdi | 12 | | | | | | | | | 3 | | mov qword ptr [rsp+0x10], rsi | 13 | | | | | | | | | | | and rdx, rsi | 14 | | | | | | | | | | | and edx, 0x1 | 15 | | | | | | | | | | | jz 0x12f8 | 16 | | | | 1 | | | | | 3 | CP | mov rdx, qword ptr [rip+0x2fe3a2] | 17 | | | | | | | | | | CP | test byte ptr [rdx+0x668], 0x1 | 18 | | | | | | | | | | CP | jnz 0x12e4 | 19 | | | | | | | | | 5 | | lea rax, ptr [rdi+rsi*1-0x1] | 20 | | | | | | | | | | | mov rdx, rdi | 21 | | | | | | | | | | | xor rsi, rdi | 22 | | | | | | | | | | | not rsi | 23 | | | | | | | | | | | xor rdx, rax | 24 | | | | | | | | | | | test rsi, rdx | 25 | | | | | | | | | | | js 0x438d Resource Conflict on Critical Paths: ----------------------------------------------------------------- | Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | ----------------------------------------------------------------- | Cycles | 0 0 | 0 | 0 0 | 1 0 | 0 | 0 | 0 | 0 | ----------------------------------------------------------------- List Of Delays On Critical Paths ------------------------------- 12 --> 16 1 Cycles Delay On PORT3_AGU Intel(R) Architecture Code Analyzer Version - 2.1 Analyzed File - ./miniruby Binary Format - 64Bit Architecture - HSW Analysis Type - Latency Latency Analysis Report --------------------------- Latency: 16 Cycles N - port number or number of cycles resource conﬂict caused delay, DV - Divider pipe (on port 0) D - Data fetch pipe (on ports 2 and 3), CP - on a critical path F - Macro Fusion with the previous instruction occurred * - instruction micro-ops not bound to a port ^ - Micro Fusion happened # - ESP Tracking sync uop was issued @ - Intel(R) AVX to Intel(R) SSE code switch, dozens of cycles penalty is expected ! - instruction not supported, was not accounted in Analysis The Resource delay is counted since all the sources of the instructions are ready and until the needed resource becomes available https://software.intel.com/en-us/articles/intel-architecture-code-analyzer 75

opt_plus | Inst | Resource Delay In Cycles | |
| Num | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | FE | | ------------------------------------------------------------------------- | 0 | | | | | | | | | | | mov rax, qword ptr [r15+0x8] | 1 | | | | | | | | | | | mov r12, qword ptr [r14+0x10] | 2 | | | | | | | | | | | lea rbx, ptr [r14+0x18] | 3 | | | 1 | | | | | | | | mov r13, qword ptr [r14+0x8] | 4 | | | | | | | | | 1 | | mov rbp, r15 | 5 | | | | | | | | | | | mov r14, rbx | 6 | | | | | | | | | | | mov rdi, qword ptr [rax-0x10] | 7 | | | | | | | | | | | mov rsi, qword ptr [rax-0x8] | 8 | | | | | | | | | 2 | | lea rdx, ptr [rax-0x10] | 9 | | | | | | | | | 3 | | mov qword ptr [r15], rbx | 10 | | | | | | | | 1 | 2 | | mov qword ptr [r15+0x8], rdx | 11 | | | | | | | | | | | mov rdx, rdi | 12 | | | | | | | | | 3 | | mov qword ptr [rsp+0x10], rsi | 13 | | | | | | | | | | | and rdx, rsi | 14 | | | | | | | | | | | and edx, 0x1 | 15 | | | | | | | | | | | jz 0x12f8 | 16 | | | | 1 | | | | | 3 | CP | mov rdx, qword ptr [rip+0x2fe3a2] | 17 | | | | | | | | | | CP | test byte ptr [rdx+0x668], 0x1 | 18 | | | | | | | | | | CP | jnz 0x12e4 | 19 | | | | | | | | | 5 | | lea rax, ptr [rdi+rsi*1-0x1] | 20 | | | | | | | | | | | mov rdx, rdi | 21 | | | | | | | | | | | xor rsi, rdi | 22 | | | | | | | | | | | not rsi | 23 | | | | | | | | | | | xor rdx, rax | 24 | | | | | | | | | | | test rsi, rdx | 25 | | | | | | | | | | | js 0x438d Resource Conflict on Critical Paths: ----------------------------------------------------------------- | Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | ----------------------------------------------------------------- | Cycles | 0 0 | 0 | 0 0 | 1 0 | 0 | 0 | 0 | 0 | ----------------------------------------------------------------- List Of Delays On Critical Paths ------------------------------- 12 --> 16 1 Cycles Delay On PORT3_AGU It happens delay because there’s too many instructions. • Increment PC • Set PC to call frame • Get value from stack To avoid that • JIT • compilation overhead; it’s trade-off with its hotness. (Just I said there’s no hotspot…) • Register machine • [Feature #12589] • tough work https://software.intel.com/en-us/articles/intel-architecture-code-analyzer 76

Debugging / Proﬁling / Tracing • Program works but something
wrong: • unexpected behavior: normal debugging • slow or too much memory • the case you can kill it with SIGTERM • Get stuck • user level: stuck but can kill with SIGKILL • kernel level: cannot kill even with SIGKILL • Crash • Ruby level crash • SEGV (Segmentation Fault) 77

Slow or too much memory • perf-top • can check
what function is running. • ps(1) • procfs (Linux) • ObjectSpace / objspace • count_objects • GC.stat • frsyuki/sigdump: show thread backtrace and memory usage. This needs to be installed in Gemﬁle before an incident. 78

Get stuck • tool • perf top • pid2line •
Reason • inﬁnite loop (?) in kernel • regular expression match’s backtrack explosion 79

strace with CRuby futex(0x555e283e4464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4460, {FUTEX_OP_SET, 0,
FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4498, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e449c, FUTEX_WAIT_PRIVATE, 91, NULL) = -1 EAGAIN (Resource temporarily unavailable) write(6, "!", 1) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x555e283e4464, FUTEX_WAIT_PRIVATE, 95, NULL) = 0 futex(0x555e283e449c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4498, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4498, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e449c, FUTEX_WAIT_PRIVATE, 95, NULL) = -1 EAGAIN (Resource temporarily unavailable) write(6, "!", 1) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x555e283e4464, FUTEX_WAIT_PRIVATE, 99, NULL) = 0 futex(0x555e283e449c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4498, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4498, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e449c, FUTEX_WAIT_PRIVATE, 99, NULL) = -1 EAGAIN (Resource temporarily unavailable) It sometimes shows like below: 80

strace with CRuby futex(0x555e283e4464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4460, {FUTEX_OP_SET, 0,
FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4498, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e449c, FUTEX_WAIT_PRIVATE, 91, NULL) = -1 EAGAIN (Resource temporarily unavailable) write(6, "!", 1) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x555e283e4464, FUTEX_WAIT_PRIVATE, 95, NULL) = 0 futex(0x555e283e449c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4498, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4498, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e449c, FUTEX_WAIT_PRIVATE, 95, NULL) = -1 EAGAIN (Resource temporarily unavailable) write(6, "!", 1) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x555e283e4464, FUTEX_WAIT_PRIVATE, 99, NULL) = 0 futex(0x555e283e449c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4498, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x555e283e4460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT , 1}) = 1 futex(0x555e283e4430, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e4498, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x555e283e449c, FUTEX_WAIT_PRIVATE, 99, NULL) = -1 EAGAIN (Resource temporarily unavailable) It sometimes shows like below: This is by timer thread: There’s another reason. It needs to a way to get information  from a living process. 81

procfs(5) • You can get many information (30) kstkeip %lu
The current EIP (instruction pointer). PROC - PROCESS INFORMATION PSEUDO-FILESYSTEM It is running at 140308168160605 = 0x7f9c0a85b55d 82

procfs • Read /proc/<pid>/maps and find which library EIP is
in. • Convert the address into the library local address considering offsets. • 0x7f9c0a85b55d - 0x7f9c0a84b000 = 0x1055d • See symbol table of the library and find it!!! • (You can also use addr2line) 83

pid2line.rb • It does such investigation. • https://gist.github.com/nurse/0619b6af90df140508c2 • Note
that recent Ubuntu requires privilege even if the target process is mine. • It is a good tool because it can get information even if you cannot attach the process into gdb. (= get stuck in kernel) • It doesn’t require install instruction. easy to use. • Can run even on RHEL5/CentOS5 84

An example # /opt/td-agent/embedded/bin/ruby pid2line.rb 11010 LWPID: 11010 EIP: 7faa4e58a462
-> 142462 Elf vaddr: 142462 -> 142462 match_at /var/cache/omnibus/src/ruby/ruby-2.1.10/regexec.c:2627 LWPID: 11011 EIP: 7faa4d81969d -> ec69d Elf vaddr: ec69d -> ec69d __poll_nocancel :? LWPID: 11014 EIP: 7faa4e237a82 -> ba82 Elf vaddr: ba82 -> ba82 __pthread_cond_timedwait :? LWPID: 11015 EIP: 7faa4e237a82 -> ba82 Elf vaddr: ba82 -> ba82 __pthread_cond_timedwait :? LWPID: 11021 EIP: 7faa4d8242c3 -> f72c3 Elf vaddr: f72c3 -> f72c3 __epoll_wait_nocancel :? LWPID: 11022 EIP: 7faa4d8242c3 -> f72c3 Elf vaddr: f72c3 -> f72c3 __epoll_wait_nocancel :? LWPID: 11023 EIP: 7faa4d81975f -> ec75f Elf vaddr: ec75f -> ec75f __GI_ppoll :? the core of regular expression (Oniguruma/Onigmo) It seems regexp’s backtrack explosion. 85

SEGV Segmentation fault • Useful information to identify the issue.
• Ruby level or C level backtrace • loaded libraries • core ﬁle 86

Ruby level backtrace • You can read it. • But
in SEGV case it often far from the cause. 87

C level backtrace • It shows both public/static function names.
• It’s actually difﬁcult to get static function’s name. Ruby fetches it from DWARF by itself. • (it needs to compile `gcc -g`) • Tracing back beyond signal trampoline on OS X was also difﬁcult. • Just before _sigtramp maybe buggy. 88

core ﬁle • the ﬁle named ruby.core (or managed by
coredumpctl) • Generated on SEGV or when manually dumped • The memory dump of Ruby at that time • You can easily debug C layer with gdb • You can debug Ruby layer with gdb… 89

Inside Ruby Process (Linux) RubyVM Thread Thread OS pthread pthread
lwp lwp 1:1 thread 90

as C structures ruby_current_vm pthread lwp living_threads Ruby  Thread Ruby 
Thread Ruby  Thread Ruby call frame glibc Linux Kernel call frame call frame … Ruby VM and threads are represented as shown. If you can track this by gdb, you can get Ruby level information… call frame: a structure which saves that Ruby executes in the function, local variables, previous call frame, and so on 91

.gdbinit a GDB initialization script, which have many utility functions
to debug CRuby 92

.gdbinit for CRuby • It is hard because it needs
to re- implement Ruby’s C functions with gdb’s tiny scripts… • Slow; rb_ps is enough fast but rb_count_objects, which counts Ruby objects by categories, takes some seconds even if nearly empty process. (it should be written in Python…) • It needs to follow Ruby’s changes. • It can read core ﬁle. It means you can get Ruby level backtrace of each Ruby threads even if the process is crashed. (note that you can get core ﬁle manually from running process with stopping it short time) 94

·ͱΊ • Ϣʔεέʔεʢݱ࣮ͷྫʣ Λऩू͠Α͏ • ݱ࣮ͷΞϓϦέʔγϣϯ ͷϘτϧωοΫΛ஌Ζ͏ Conclusion • Gather
use cases (real snippets) • Proﬁle/Benchmark real applications 95

Dive into CRuby

Dive into CRuby

More Decks by NARUSE, Yui

Other Decks in Programming

Featured

Transcript