case • This “use case” must be a real example. • It must not be an artificial one. • The current method is too complicated; it must be simpler for my use case 8
in the given string. • The above snippet didn’t work as expected if `str` is already UTF-8 (it simply returned the original string). • Anyway, what kind of strings are involved?
str = "\xe3\x81\x82\x80\x80"
str.encode("utf-8", invalid: :replace)
11
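A minimal sketch of the problem, using the broken string from the slide. It assumes the iconv-style workaround is a round trip through another encoding (UTF-16BE is chosen here only for illustration), so that the conversion engine really runs, and compares it with String#scrub. Whether a same-encoding `encode` call replaces invalid bytes depends on the Ruby version, so the sketch does not rely on it.

```ruby
# Broken UTF-8: a valid U+3042 (E3 81 82) followed by two stray continuation bytes.
str = "\xe3\x81\x82\x80\x80"

# Workaround (assumption for illustration): convert to another encoding and
# back, so that invalid bytes are replaced during the conversion.
via_utf16 = str.encode("UTF-16BE", invalid: :replace).encode("UTF-8")

# The dedicated method discussed in these slides.
scrubbed = str.scrub

puts str.valid_encoding?        # the input is not valid UTF-8
puts via_utf16.valid_encoding?  # the round trip yields valid UTF-8
puts scrubbed.valid_encoding?   # so does scrub
```

The round trip works, but it hides the intent (nobody wants UTF-16 here), which is the naming issue the slides raise.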
to handle broken strings easily and securely. • There are real pages, for example twitter... • Mechanism: • a substring is wrongly taken by bytes (it must be taken by chars) • such invalid bytes are inserted into a web page template 14
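The mechanism above can be reproduced in a few lines. This is only a sketch: the sample text and the cut position are made up for illustration.

```ruby
text = "\u3042\u3044\u3046"        # three characters, nine bytes in UTF-8

by_bytes = text.byteslice(0, 4)    # cuts the second character in the middle
by_chars = text[0, 2]              # character-based slicing keeps whole characters

puts by_bytes.valid_encoding?      # false: one dangling lead byte remains
puts by_chars.valid_encoding?      # true

# Scrub before inserting into a template so the page stays valid UTF-8.
safe = by_bytes.scrub("?")
puts safe                          # the first character followed by "?"
```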
Use Ruby’s encoding conversion engine. • Some people are used to doing it this way with iconv. • regexp-based • Use the regular expression engine (Ruby usually uses the regexp engine when handling characters) • Better performance and meaning. • naming issue str.encode("utf-8", invalid: :replace) 15
CSI (Code Set Independent; it doesn’t convert into Unicode unless it needs to) • but in the 21st century, people rarely convert encodings… • String#scrub • regexp-based 16
new_str
 *    str.scrub{|bytes|} -> new_str
 *
 *  If the string is invalid byte sequence then replace invalid bytes
 *  with given replacement character, else returns self.
 *  If block is given, replace invalid bytes with returned value of the block.
 *
 *     "abc\u3042\x81".scrub #=> "abc\u3042\uFFFD"
 *     "abc\u3042\x81".scrub("*") #=> "abc\u3042*"
 *     "abc\u3042\xE3\x80".scrub{|bytes| '<'+bytes.unpack('H*')[0]+'>' } #=> "abc\u3042<e380>"
 */
static VALUE
str_scrub(int argc, VALUE *argv, VALUE str)
{
    VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil;
    VALUE new = rb_str_scrub(str, repl);
    return NIL_P(new) ? rb_str_dup(str): new;
}
29
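The three call forms from the documentation comment, written out as a runnable Ruby snippet. The inputs and results are the ones given in the comment; only the variable names are added here.

```ruby
broken = "abc\u3042\x81"

default_repl = broken.scrub          # invalid byte becomes U+FFFD
custom_repl  = broken.scrub("*")     # invalid byte becomes the given string
block_repl   = "abc\u3042\xE3\x80".scrub { |bytes| "<" + bytes.unpack("H*")[0] + ">" }

p default_repl   # "abc\u3042\uFFFD"
p custom_repl    # "abc\u3042*"
p block_repl     # "abc\u3042<e380>"
```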
repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil;
    VALUE s = rb_str_scrub(str, repl);
    return NIL_P(s) ? rb_str_dup(str): s;
}
• VALUE: a pointer type for a Ruby object. • Remember that Ruby’s data is always an object (there is no primitive type). • argv: C array of VALUEs • argc: the number of VALUEs in argv • arity: the number of arguments • rb_check_arity(argc,min,max): raises an exception if the arity doesn’t match. • Qnil: `nil` in CRuby source code • NIL_P(x): `x.nil?`; like Lisp, CRuby source uses `_p` instead of `?` 30
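Seen from the Ruby side, the C function above has two visible consequences: the arity check and the `rb_str_dup` fallback. A small sketch, with variable names chosen only for illustration:

```ruby
valid = "abc"

# rb_str_scrub returns Qnil for a valid string, so str_scrub returns a copy.
copy = valid.scrub
puts copy == valid        # same content
puts copy.equal?(valid)   # but a different object

# rb_check_arity(argc, 0, 1) rejects more than one argument.
arity_error = begin
  valid.scrub("?", "!")
  nil
rescue ArgumentError => e
  e
end
puts arity_error.class
```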
@param repl the replacement character
 * @return If given string is invalid, returns a new string. Otherwise, returns Qnil.
 */
VALUE
rb_str_scrub(VALUE str, VALUE repl)
{
    return rb_enc_str_scrub(STR_ENC_GET(str), str, repl);
}
• STR_ENC_GET(): returns `rb_encoding *`, the C structure that represents Ruby’s Encoding. • There are similar relations between other C structures and Ruby classes, such as Hash and st_table, or Thread and rb_thread_t. 31
touch [-acm] [-r ref_file|-t time|-d date_time] file…
OPTIONS
-a: Change the access time of file.
-c: Do not create a specified file if it does not exist.
-m: Change the modification time of file.
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/touch.html
34
This can’t be a correct answer. • This has two different features: • “change file access and modification times” • “create empty file” • It describes the spec, but doesn’t describe what users truly want to do. 35
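One way to read the point about touch: the two features can be named after what the user wants to do. The following is only a sketch. The method names `create_empty_file` and `update_timestamps` are hypothetical and not an existing Ruby API; they are built on File.open and File.utime.

```ruby
require "tmpdir"

# Hypothetical name: create an empty file, failing if it already exists.
def create_empty_file(path)
  File.open(path, File::WRONLY | File::CREAT | File::EXCL) {}
  path
end

# Hypothetical name: change access and modification times of an existing file.
def update_timestamps(path, time = Time.now)
  File.utime(time, time, path)
  path
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "memo.txt")
  create_empty_file(path)
  update_timestamps(path, Time.at(1_000_000_000))
  puts File.size(path)
  puts File.mtime(path).to_i
end
```

Each method now answers one question ("what does the user want?") instead of mirroring one command with flags.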
• take a memo if you find a case which should be written shorter. (only this can be a good feature request) • Time class and Integer bit manipulation seems to be improved… • Design API for the group of use cases • and find a good name ;-) • Read ʮAPIσβΠϯέʔεελσΟ ――Rubyͷ࣮ྫ͔ ΒֶͿɻʹଈͨ͠σβΠϯͱීวͷߟ͑ํʯ • http://gihyo.jp/book/2016/978-4-7741-7802-8 36
doesn’t work. For example clang (a C language family frontend for LLVM), and Microsoft Visual C++ 2015 (VC14). • clang (including llvm-gcc): • It usually behaves like GCC, but differs in some points. • Visual C++ 2015 • It introduced the Universal C Runtime, which breaks compatibility in many places. 37
gcc at some point. It breaks Ruby’s GC (because Ruby uses conservative mark & sweep GC). • The toughest one was r34278, in which llvm optimized out the memory allocation for continuation. • You may think no one uses continuations, but they are actually used through Enumerator. 38
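To illustrate how ordinary code reaches that machinery: external iteration with Enumerator#next has to suspend and resume the iterating block. This sketch assumes, as in current CRuby, that this is implemented with Fiber, which shares its implementation file with continuations; the Fibonacci generator is just an example.

```ruby
# An infinite generator; each `y << a` hands one value to the consumer.
fib = Enumerator.new do |y|
  a, b = 0, 1
  loop do
    y << a
    a, b = b, a + b
  end
end

# External iteration: the block above is suspended between calls to next.
first_three = 3.times.map { fib.next }
p first_three          # [0, 1, 1]

# Internal iteration on the same enumerator.
p fib.take(5)          # [0, 1, 1, 2, 3]
```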
which breaks compatibility in many places. • The toughest part is around _pioinfo: • _pioinfo is an internal array of IO structures. It stores the underlying OS file HANDLE, file position (cursor), file attributes, text mode, and so on. Ruby handles the structure • to associate a socket with an fd: CRuby creates an fd with a dummy file handle and sets the socket • to implement overlapped I/O for Windows 2000/XP • to emulate fcntl(2) • But VC2015 made it private. • NOTE: overlapped I/O is Microsoft’s term for something similar to nonblocking I/O. 39
*/
/* get __pioinfo addr with _isatty */
char *p = (char*)get_proc_address("ucrtbase.dll", "_isatty", NULL);
char *pend = p;
/* _osfile(fh) & FDEV */
int32_t rel;
char *rip;
/* add rsp, _ */
# define FUNCTION_BEFORE_RET_MARK "\x48\x83\xc4"
# define FUNCTION_SKIP_BYTES 1
/* lea rdx,[__pioinfo's addr in RIP-relative 32bit addr] */
# define PIOINFO_MARK "\x48\x8d\x15"
if (p) {
    for (pend += 10; pend < p + 300; pend++) {
        // find end of function
        if (memcmp(pend, FUNCTION_BEFORE_RET_MARK, sizeof(FUNCTION_BEFORE_RET_MARK) - 1) == 0 &&
            (*(pend + (sizeof(FUNCTION_BEFORE_RET_MARK) - 1) + FUNCTION_SKIP_BYTES) & FUNCTION_RET) == FUNCTION_RET) {
            // search backwards from end of function
            for (pend -= (sizeof(PIOINFO_MARK) - 1); pend > p; pend--) {
                if (memcmp(pend, PIOINFO_MARK, sizeof(PIOINFO_MARK) - 1) == 0) {
                    p = pend;
                    goto found;
                }
            }
            break;
        }
    }
}
fprintf(stderr, "unexpected " UCRTBASE "\n");
_exit(1);

found:
p += sizeof(PIOINFO_MARK) - 1;
rel = *(int32_t*)(p);
rip = p + sizeof(int32_t);

Even if _pioinfo is private, it still exists in memory, and we can get it. The problem is how to reach it. I use the fact that _isatty() refers to _pioinfo. I can read the function body of _isatty through its function pointer, find the instruction whose argument is _pioinfo, and get the address of _pioinfo from it.
40
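The last three lines of the C code resolve a RIP-relative operand: the target address is the address of the next instruction plus the signed 32-bit displacement. Below is a simplified sketch of that arithmetic on a fake byte string. The machine code bytes, the base address, and the method name `resolve_rip_relative` are made up for illustration, and the search runs forward with String#index rather than backwards from the end of the function as the real code does.

```ruby
# lea rdx, [rip + rel32] starts with these three bytes (same marker as above).
PIOINFO_MARK = [0x48, 0x8d, 0x15].pack("C*")

# code: binary string holding a function body; base_addr: where it is loaded.
def resolve_rip_relative(code, base_addr)
  idx = code.index(PIOINFO_MARK)
  return nil unless idx

  operand = idx + PIOINFO_MARK.bytesize           # p += sizeof(PIOINFO_MARK) - 1
  rel = code.byteslice(operand, 4).unpack1("l<")  # rel = *(int32_t*)p
  rip = base_addr + operand + 4                   # rip = p + sizeof(int32_t)
  rip + rel                                       # address the instruction refers to
end

# Fake function body: two NOPs, the lea with displacement 0x1000, then ret.
fake_code = [0x90, 0x90].pack("C*") + PIOINFO_MARK + [0x1000].pack("l<") + [0xc3].pack("C")
target = resolve_rip_relative(fake_code, 0x7ff0_0000_0000)
puts format("0x%x", target)   # 0x7ff000001009
```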
to support Visual C++ 2015’s DEBUG build. • I was deeply moved by the fact that there is a person who uses such dirty code, and who moreover read the code and contributed a fix. What a great thing OSS is! 41
profile of the target application (better) • for example stackprof, derailed_benchmarks. 2. Find a bottleneck (hotspot). 3. Optimize the bottleneck. 4. Win! 43
if it's empty or contains whitespaces only:
  #
  #   ''.blank?       # => true
  #   '   '.blank?    # => true
  #   "\t\n\r".blank? # => true
  #   ' blah '.blank? # => false
  #
  # Unicode whitespace is supported:
  #
  #   "\u00a0".blank? # => true
  #
  # @return [true, false]
  def blank?
    # The regexp that matches blank strings is expensive. For the case of empty
    # strings we can speed up this method (~3.5x) with an empty? call. The
    # penalty for the rest of strings is marginal.
    empty? || BLANK_RE.match?(self)
  end
end
String#blank? Check Unicode aware string emptiness 52
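A standalone sketch of the same check that runs without Active Support. The method name `blank_string?` is hypothetical, chosen to avoid reopening String; the regexp is assumed to be the POSIX-bracket form `[[:space:]]`, which also matches Unicode whitespace in a UTF-8 string.

```ruby
BLANK_RE = /\A[[:space:]]*\z/

# Hypothetical helper mirroring the blank? method shown above.
def blank_string?(str)
  # Cheap check first; the regexp only runs for non-empty strings.
  str.empty? || BLANK_RE.match?(str)
end

p blank_string?("")           # true
p blank_string?("\t\n\r")     # true
p blank_string?(" blah ")     # false
p blank_string?("\u00a0")     # true
```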
match stores $~ and $& (Regexp.last_match); it is slow because it creates a MatchData object. • → Regexp#match? doesn’t update $~ or $& (Ruby 2.4 feature) • Note: Perl has a similar optimization 53
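The difference is observable from Ruby. A small sketch; the method name `match_side_effects` is made up for this example.

```ruby
# Returns [result of match?, last match after match?, last match after match].
def match_side_effects(re, str)
  "x" =~ /y/                       # a failed match resets the last match to nil
  matched = re.match?(str)         # boolean only, no MatchData is created
  after_predicate = Regexp.last_match
  re.match(str)                    # creates a MatchData and stores it
  after_match = Regexp.last_match
  [matched, after_predicate, after_match]
end

matched, after_predicate, after_match = match_side_effects(/\A[[:space:]]*\z/, "  ")
p matched                  # true
p after_predicate          # nil
p after_match.class        # MatchData
```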
Handle 16 bytes at once. • But all 16 bytes must be available. If not, it will cause SEGV. • ARM SVE (Scalable Vector Extension), an instruction set extension for HPC, which will be used by the Post-K computer.
VALUE
rb_str_blank(VALUE str)
{
    const unsigned char *p = (const unsigned char *)RSTRING_PTR(str);
    const unsigned char *e = (const unsigned char *)RSTRING_END(str);
    intptr_t pe = (intptr_t)e;
    const __m128i mask = _mm_set_epi8(0,0,0,0,0,0,0,0,0,0,9,10,11,12,13,32);
    const int masksize = 6;
    const int mode = _SIDD_CMP_EQUAL_ANY|_SIDD_UBYTE_OPS|_SIDD_MASKED_NEGATIVE_POLARITY;

    if (RSTRING_LEN(str) == 0) return Qtrue;
    /* set the edge of a page before the end of string */
    if ((pe & 0xfff) > 0xff1) {
        pe &= ~0xfff;
        pe |= 0xff1;
    }
    for (; (intptr_t)p < pe; p += sizeof(__m128i)) {
        int idx, len;
        ptrdiff_t sz;
        __m128i m;
      retry:
        sz = e - p;
        len = (int)((sz&INT_MAX) | (sz >> 27));
        m = _mm_loadu_si128((__m128i const *)p);
        /* CF: 1 if there's non spaces
         * ZF: 1 if reached the end */
        if (_mm_cmpestra(mask, masksize, m, len, mode)) {
            /* CF=0 ZF=0 */
            continue;
        }
#if 0
        /* GCC 6 wrongly generates cmpestri and cmpestri... */
        if (_mm_cmpestrc(mask, masksize, m, len, mode)) {
            /* CF=0 ZF=1 */
            return Qtrue;
        }
        idx = _mm_cmpestri(mask, masksize, m, len, mode);
#else
        idx = _mm_cmpestri(mask, masksize, m, len, mode);
        if (sz < idx) return Qtrue;
#endif
        p += idx;
        if (!str_blank0(&p, e, FALSE)) return Qfalse;
        goto retry;
    }
    if (!str_blank0(&p, e, TRUE)) return Qfalse;
    return Qtrue;
}
54
interpreter’s method dispatch routine • very large (30KB~40KB) switch-case • Strictly speaking, the “switch-case” is optimized into jump instructions (Direct Threaded Code) 58
wrong: • unexpected behavior: normal debugging • slow or too much memory • the case where you can kill it with SIGTERM • Get stuck • user level: stuck but can be killed with SIGKILL • kernel level: cannot be killed even with SIGKILL • Crash • Ruby level crash • SEGV (Segmentation Fault) 77
what function is running. • ps(1) • procfs (Linux) • ObjectSpace / objspace • count_objects • GC.stat • frsyuki/sigdump: shows thread backtraces and memory usage. It must be added to the Gemfile before an incident. 78
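A short sketch of what the built-in tools report from inside a process. It uses only ObjectSpace.count_objects and GC.stat; the layout of the `summary` hash is made up for this example.

```ruby
counts = ObjectSpace.count_objects   # slots per internal type, plus :TOTAL and :FREE
gc     = GC.stat                     # counters kept by the garbage collector

# The three most common object types, ignoring the bookkeeping entries.
top_types = counts.reject { |type, _| type == :TOTAL || type == :FREE }
                  .max_by(3) { |_, n| n }

summary = {
  total_slots: counts[:TOTAL],
  free_slots:  counts[:FREE],
  gc_runs:     gc[:count],
  top_types:   top_types,
}
p summary
```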
in. • Convert the address into the library-local address by subtracting the load offset. • 0x7f9c0a85b55d - 0x7f9c0a84b000 = 0x1055d • Look it up in the symbol table of the library and find it!!! • (You can also use addr2line) 83
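The same arithmetic as a sketch. The two addresses are the ones from the slide; the symbol table and its names are entirely hypothetical, standing in for what a tool such as nm would list for the library.

```ruby
absolute_addr = 0x7f9c0a85b55d   # address seen in the backtrace
load_base     = 0x7f9c0a84b000   # where the library is mapped

offset = absolute_addr - load_base   # library-local address

# Hypothetical symbol table: start offset => symbol name (made-up values).
SYMBOLS = {
  0x10000 => "hypothetical_init",
  0x10500 => "hypothetical_read_loop",
  0x10800 => "hypothetical_cleanup",
}.freeze

# Pick the symbol with the greatest start offset not exceeding the address.
def symbol_for(offset, table)
  start, name = table.select { |s, _| s <= offset }.max_by { |s, _| s }
  name && format("%s+0x%x", name, offset - start)
end

location = symbol_for(offset, SYMBOLS)
puts format("0x%x", offset)   # 0x1055d
puts location                 # hypothetical_read_loop+0x5d
```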
that recent Ubuntu requires privilege even if the target process is mine. • It is a good tool because it can get information even if you cannot attach gdb to the process (= it is stuck in the kernel). • It doesn’t require installation; it is easy to use. • It can run even on RHEL5/CentOS5 84
• It’s actually difficult to get a static function’s name. Ruby fetches it from DWARF by itself. • (it needs to be compiled with `gcc -g`) • Tracing back beyond the signal trampoline on OS X was also difficult. • The frame just before _sigtramp may be buggy. 88
coredumpctl) • Generated on SEGV or when manually dumped • The memory dump of Ruby at that time • You can easily debug C layer with gdb • You can debug Ruby layer with gdb… 89
[Figure: Ruby threads, each with a chain of call frames, on top of glibc and the Linux kernel] Ruby VM and threads are represented as shown. If you can trace this with gdb, you can get Ruby-level information… call frame: a structure that records which function Ruby is executing, its local variables, the previous call frame, and so on 91
to re-implement Ruby’s C functions with gdb’s tiny scripts… • Slow; rb_ps is fast enough, but rb_count_objects, which counts Ruby objects by category, takes some seconds even for a nearly empty process. (it should be written in Python…) • It needs to follow Ruby’s changes. • It can read a core file. This means you can get the Ruby-level backtrace of each Ruby thread even if the process has crashed. (note that you can get a core file manually from a running process by stopping it for a short time) 94