GC in Ruby 2.2

Zete
December 17, 2014

Slides of my local tech share in Beijing, 2014-12-06

Transcript

  1. Memory Management: Application View
    • stack
    • alloca
    • heap
    • with regard to stack: RAII, auto_ptr, __attribute__(destructor), …
    • manual with help: arena, buddy memory, memory pool, …
    • reference counted (shared_ptr, regexp, IO)
    • GC
  2. Memory Management: Operating System View
    • Virtual memory
    • Segment: a segmentation fault is a serious error… however, segments are rarely used now
    • Page: a page fault is not an error, but it may trigger very slow disk access
    • The page table's base address is held in CR3 (movl cr3, eax); the TLB caches recent translations
    • IPC memory
    • Pipes
    • Process shared memory
    • With regard to IPC memory management, raptor uses mbuf and many other ways to avoid copying
  3. Memory Management: Hardware View
    • The CPU talks to memory through the address bus and the data bus; the bus clock cycle is several times slower than the CPU clock
    • SDRAM and RDRAM have high bandwidth (throughput) but high latency (100+ CPU cycles)
    • L1, L2, L3 caches: 90% of memory accesses go through the cache
    • Multi-way cache lines: the more “ways”, the more precise the cache and the more complicated the circuit
    • DMA (direct memory access) mode: devices read from or write to memory directly
    • Memory fences: LoadLoad, LoadStore, StoreLoad, StoreStore; volatile (rb_gc_guarded_ptr_val)
  4. Implementation Considerations…
    • CPU interrupts (Boehm GC page-fault)
    • Locality (heap allocations)
    • Predicting performance (G1GC -XX:MaxGCPauseMillis)
    • Debugging (how to debug a segfault in GC?)
    • Pointer compression (Jikes VM, LLVM compression on linked lists)
    • Language features (Erlang and Haskell take advantage of immutability)
    • Internals of C APIs (tcmalloc, jemalloc, … which to use?)
    • OS APIs (mmap)
    • (Disabling) compiler optimisations (volatile)
    • CPU architecture (memory fences to ensure execution order)
  5. Many GCs

                     conservative   mark sweep   generational
    CMS (Java)       N              Y            Y
    G1GC (Java 7)    N              Y            Infinite
    CPython          N              N            Y
    Rubinius         N              Y            Y
    Lua              N              Y            N
    Go               Y              Y            N
    Boehm GC         Y              Y            N
  6. CRuby GC
    • Conservative
    • Bit marking
    • Lazy sweep
    • Generational
    • Incremental marking
  7. Implementation Choices
    • Ruby is not fast, but it is easy to optimize with C extensions; a conservative GC makes C extensions easier to write
    • Thanks to the GIL, the GC does not need to add locks or spinlocks yet
    • Cross-architecture requirements and code simplicity
    • The GC provides tools for C extensions to use
  8. Parallel GC
    • Many threads mark and sweep
    • Java 6: CMS (concurrent mark and sweep) is in fact a parallel GC… (-XX:+CMSIncrementalMode)
  9. Concurrent GC
    • Low to zero stop time
    • Usually achieved by incremental mark/sweep or a separate GC thread
    • No STW (stop-the-world)
  10. How CRuby Achieves “Concurrent”
    • Trades throughput (~10%) and code complexity for shorter pause times
    • one mark -> one sweep
    • one mark -> many sweeps (lazy sweep)
    • many marks -> many sweeps (tri-color marking)
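The “many marks” idea can be sketched with a toy model. This is not CRuby's implementation: `ToyHeap`, `mark_step`, and the symbol-keyed object graph are hypothetical names made up for illustration, and the write barrier that a real incremental collector needs is left out. It only shows how tri-color marking lets the marking work be cut into small steps.

```ruby
# Toy model of incremental tri-color marking (illustration only).
# white = not yet seen, grey = seen but children not scanned, black = done.
class ToyHeap
  attr_reader :color

  def initialize(edges, roots)
    @edges = edges               # object id => array of referenced ids
    @color = Hash.new(:white)    # every object starts white
    @grey  = []                  # worklist of grey objects
    roots.each { |r| shade(r) }
  end

  # Scan at most `budget` grey objects, then hand control back.
  # Returns true once no grey objects remain.
  def mark_step(budget)
    budget.times do
      obj = @grey.shift
      break if obj.nil?
      @edges.fetch(obj, []).each { |child| shade(child) }
      @color[obj] = :black
    end
    @grey.empty?
  end

  # Objects that are still white after marking are unreachable.
  def garbage
    @edges.keys.select { |o| @color[o] == :white }
  end

  private

  def shade(obj)
    return unless @color[obj] == :white
    @color[obj] = :grey
    @grey << obj
  end
end

edges = { a: [:b, :c], b: [:d], c: [], d: [], e: [:f], f: [] }
heap  = ToyHeap.new(edges, [:a])

steps = 0
loop do
  steps += 1
  break if heap.mark_step(2)   # the program would run between steps
end
```

With a budget of two objects per step, marking the four reachable objects takes two short steps instead of one long pause; `heap.garbage` then lists the objects that were never reached.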
  11. Generational
    • Based on a heuristic: young objects die young
    • Cannot use semispace or mark-compact GC: the GC is conservative, efficient pointer rewriting is hard to do across platforms, and it is hard on C extensions
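One way to watch the generational behaviour from Ruby code is to compare `GC.stat` counters around a block. The sketch below assumes MRI 2.1 or later, where `GC.stat` exposes `:minor_gc_count` and `:major_gc_count` and `GC.start` accepts `full_mark:`; the helper name `gc_counts_during` is made up for this example.

```ruby
# Count minor and major collections that happen while a block runs.
def gc_counts_during
  before = GC.stat
  yield
  after = GC.stat
  {
    minor: after[:minor_gc_count] - before[:minor_gc_count],
    major: after[:major_gc_count] - before[:major_gc_count],
    total: after[:count] - before[:count]
  }
end

counts = gc_counts_during do
  100_000.times { "short-lived string" * 2 }  # plenty of young garbage
  GC.start(full_mark: false)                   # ask for a minor GC
  GC.start                                     # ask for a full (major) GC
end

p counts
```

The exact numbers depend on the Ruby version and heap state, so none are shown here. Note that a requested minor GC may still be upgraded to a major one by the collector.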
  12. Other Optimizations: Bit Marking
    • Bit marking was already in place, to be copy-on-write friendly
    • To represent colors, 2 bits per object are used (the result is 4 bits)
  13. Ways to Control GC
    • OOB GC (out-of-band GC) in unicorn, passenger
    • GC.stress
    • GC.disable … GC.start
    • rb_gc_mark() rb_gc_register()
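The out-of-band idea can be sketched in plain Ruby with `GC.disable`, `GC.enable`, and `GC.start`. This is a minimal sketch, not how unicorn or passenger implement it; `without_gc` and `handle_then_collect` are hypothetical helper names.

```ruby
# Run a block with GC disabled, restoring the previous state afterwards.
def without_gc
  was_disabled = GC.disable    # returns true if GC was already disabled
  yield
ensure
  GC.enable unless was_disabled
end

# Out-of-band style: do the work first, then collect when nobody is waiting.
def handle_then_collect
  result = without_gc { yield }
  GC.start                     # the "out of band" collection
  result
end

before = GC.count
answer = handle_then_collect { Array.new(10_000) { |i| i.to_s }.size }
```

Disabling GC for the duration of a request trades memory growth for predictable latency, so a real server would also cap how long or how often it does this.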
  14. Performance Tools
    • gdb/lldb
    • rbtrace
    • tmm1/stackprof
    • ko1/gc_tracer
      • require 'gc_tracer'
      • GC::Tracer.start_logging("log.txt")