Slide 1

Slide 1 text

GC in Ruby 2.2 [email protected]

Slide 2

Slide 2 text

Memory Management: Application View
• stack
• alloca
• heap
• tied to stack scope: RAII, auto_ptr, __attribute__((cleanup)), …
• manual with help: arena, buddy memory, memory pool, …
• reference counted (shared_ptr, regexp, IO)
• GC
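To contrast reference counting with a tracing GC, here is a minimal Ruby sketch of manual reference counting. The class and method names (RcObj, retain, release, point_to) are hypothetical. An object is freed when its count reaches zero, and the sketch shows the classic weakness: a cycle keeps both counts above zero, so neither object is ever freed.

```ruby
# Toy reference counting; RcObj and its methods are illustrative only.
class RcObj
  attr_reader :name, :count

  def initialize(name, freed_log)
    @name = name
    @count = 0
    @refs = []              # objects this one holds a counted reference to
    @freed_log = freed_log
  end

  def retain
    @count += 1
    self
  end

  # Drop one reference; at zero, free the object and release its children.
  def release
    @count -= 1
    return unless @count.zero?

    @freed_log << @name
    @refs.each(&:release)
    @refs.clear
  end

  # Store a reference to `other`, bumping its count.
  def point_to(other)
    @refs << other.retain
  end
end

freed = []

# Acyclic: a -> b. Releasing the root reference frees both.
a = RcObj.new("a", freed).retain
a.point_to(RcObj.new("b", freed))
a.release

# Cyclic: c <-> d. Releasing the root reference frees nothing (a leak).
c = RcObj.new("c", freed).retain
d = RcObj.new("d", freed)
c.point_to(d)
d.point_to(c)
c.release

p freed   # ["a", "b"]
```

A tracing GC does not have this problem, because it starts from the roots and never reaches c or d.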

Slide 3

Slide 3 text

Memory Management: Operating System View
• Virtual memory
  • Segmentation: a segmentation fault is a serious error… however, segmentation is rarely used now
  • Paging: a page fault is not an error, but may trigger a very slow disk access
  • Page tables live in memory; CR3 holds the page-table base (movl %cr3, %eax), and the TLB caches recent translations
• IPC memory
  • Pipes
  • Process shared memory
  • For IPC memory management, raptor uses mbuf and many other ways to avoid copying
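The paging bullets can be made concrete with a toy address translator. This is a simplified model, not how any real MMU is built: it assumes 4 KiB pages, a single-level page table stored as a hash, an unbounded TLB, and an "OS" that handles a page fault by mapping the next free frame (no disk involved). ToyMMU is a hypothetical name.

```ruby
PAGE_SIZE = 4096 # assumption: 4 KiB pages

class ToyMMU
  attr_reader :tlb_hits, :tlb_misses, :page_faults

  def initialize
    @page_table = {} # virtual page number => physical frame number
    @tlb = {}        # cache of recent translations
    @next_frame = 0
    @tlb_hits = @tlb_misses = @page_faults = 0
  end

  def translate(vaddr)
    vpn, offset = vaddr.divmod(PAGE_SIZE)
    if @tlb.key?(vpn)
      @tlb_hits += 1
    else
      @tlb_misses += 1
      unless @page_table.key?(vpn)
        # Page fault: not an error; the "OS" maps a frame (in reality, possibly after slow disk I/O).
        @page_faults += 1
        @page_table[vpn] = @next_frame
        @next_frame += 1
      end
      @tlb[vpn] = @page_table[vpn]
    end
    @tlb[vpn] * PAGE_SIZE + offset
  end
end

mmu = ToyMMU.new
p mmu.translate(0x1010)  # 16   (page fault: page 1 mapped to frame 0)
p mmu.translate(0x1020)  # 32   (TLB hit)
p mmu.translate(0x5004)  # 4100 (page fault: page 5 mapped to frame 1)
```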

Slide 4

Slide 4 text

Memory Management: Hardware View
• The CPU talks to memory through the address bus and the data bus; the bus clock is several times slower than the CPU clock
• SDRAM and RDRAM have high bandwidth (throughput) but high latency (100+ CPU cycles)
• L1, L2, L3 caches: about 90% of memory accesses are served by the cache
• Set-associative (multi-way) caches: more ways mean fewer conflict misses, but a more complicated circuit
• DMA (direct memory access): devices read from or write to memory directly, without the CPU copying the data
• Memory fences: LoadLoad, LoadStore, StoreLoad, StoreStore; volatile (rb_gc_guarded_ptr_val)
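To see why more ways help, here is a small set-associative cache simulator with LRU replacement inside each set. It assumes 64-byte lines; the class name and parameters are hypothetical. Two addresses that map to the same set thrash a direct-mapped (1-way) cache, while a 2-way cache of the same total capacity keeps both.

```ruby
class SetAssocCache
  attr_reader :hits, :misses

  def initialize(num_sets:, ways:, line_size: 64)
    @num_sets = num_sets
    @ways = ways
    @line_size = line_size
    @sets = Array.new(num_sets) { [] } # each set holds up to `ways` tags, most recently used last
    @hits = 0
    @misses = 0
  end

  def access(addr)
    line = addr / @line_size
    index = line % @num_sets   # which set the line maps to
    tag = line / @num_sets     # identifies the line within that set
    set = @sets[index]
    if set.delete(tag)
      @hits += 1
    else
      @misses += 1
      set.shift if set.size >= @ways # evict the least recently used tag
    end
    set.push(tag)
  end
end

addrs = [0, 512] * 5                                # two addresses that map to set 0
direct = SetAssocCache.new(num_sets: 8, ways: 1)    # 8 lines total
two_way = SetAssocCache.new(num_sets: 4, ways: 2)   # also 8 lines total
addrs.each { |a| direct.access(a); two_way.access(a) }
p [direct.hits, direct.misses]    # [0, 10]
p [two_way.hits, two_way.misses]  # [8, 2]
```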

Slide 5

Slide 5 text

Simple to use, hard to implement

Slide 6

Slide 6 text

Implementation Considerations
• CPU interruptions (Boehm GC page faults)
• Locality (heap allocations)
• Predicting performance (G1GC -XX:MaxGCPauseMillis)
• Debugging (how do you debug a segfault in the GC?)
• Pointer compression (Jikes RVM, LLVM compression of linked lists)
• Language features (Erlang and Haskell take advantage of immutability)
• C allocator internals (tcmalloc, jemalloc, … which to use?)
• OS APIs (mmap)
• (Disabling) compiler optimisations (volatile)
• CPU architecture (memory fences to enforce execution order)

Slide 7

Slide 7 text

Many GCs

GC              Conservative  Mark-sweep  Generational
CMS (Java)      N             Y           Y
G1GC (Java 7)   N             Y           Infinite
CPython         N             N           Y
Rubinius        N             Y           Y
Lua             N             Y           N
Go              Y             Y           N
Boehm GC        Y             Y           N

Slide 8

Slide 8 text

CRuby GC
• Conservative
• Bit marking
• Lazy sweep
• Generational
• Incremental marking

Slide 9

Slide 9 text

Implementation Choices
• Ruby is not fast, but is easy to optimize with C extensions; a conservative GC makes C extensions easier to write
• Thanks to the GIL, the GC doesn't need locks or spinlocks (yet)
• Cross-architecture requirements and code simplicity
• The GC provides tools for C extensions
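What "conservative" means in practice can be shown with a toy root scan: any machine word that looks like a pointer to an allocated heap slot is treated as a root, even if it is really just an integer. This is why a C extension's local VALUE variables are found without registration. The heap start address, the 40-byte slot size and all names below are assumptions for illustration, not CRuby's actual code.

```ruby
HEAP_START = 0x10000 # hypothetical start address of the object heap
SLOT_SIZE  = 40      # assumption: one object slot is 40 bytes
HEAP_SLOTS = 100

def slot_addr(i)
  HEAP_START + i * SLOT_SIZE
end

# A word is a candidate pointer if it lies inside the heap and is slot-aligned.
def looks_like_heap_pointer?(word)
  offset = word - HEAP_START
  offset >= 0 && offset < HEAP_SLOTS * SLOT_SIZE && (offset % SLOT_SIZE).zero?
end

# Conservative root scan: keep every allocated slot some stack word points to.
def conservative_roots(stack_words, allocated)
  stack_words.select { |w| looks_like_heap_pointer?(w) && allocated.include?(w) }.uniq
end

allocated = [slot_addr(3), slot_addr(7), slot_addr(9)]
stack = [
  slot_addr(3),     # a real VALUE held in a C local variable
  42,               # a plain integer: ignored
  slot_addr(7),     # an integer that merely equals a slot address: kept anyway
  slot_addr(9) + 5  # misaligned: ignored, so slot 9 can be collected
]
roots = conservative_roots(stack, allocated)
p roots == [slot_addr(3), slot_addr(7)] # true
```

The false positive for slot 7 is the price of conservatism: the object is retained, but nothing is ever freed while still in use.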

Slide 10

Slide 10 text

Experiments for GC
• Rubinius GC
• MRuby GC (root_scan, incremental_mark, incremental_sweep)

Slide 11

Slide 11 text

Parallel GC
• Many threads mark and sweep in parallel
• Java 6: CMS (concurrent mark-sweep) is in fact a parallel GC… (-XX:+CMSIncrementalMode)

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Concurrent GC
• Low to zero stop time
• Usually achieved by incremental mark/sweep or a separate GC thread
• No STW (stop-the-world) pauses

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

How CRuby Achieves "Concurrent"
• Trade throughput (~10%) and code complexity to reduce pause time
• one mark -> one sweep
• one mark -> many sweeps (lazy sweep)
• many marks -> many sweeps (tri-color marking)
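The "one mark -> many sweeps" step (lazy sweep) can be sketched as follows. This is a simplified model, not gc.c: marking happens all at once, but pages are swept one at a time, only when an allocation cannot find a free slot. LazySweepHeap, Obj and the page size of 4 slots are hypothetical.

```ruby
Obj = Struct.new(:name, :refs)

class LazySweepHeap
  SLOTS_PER_PAGE = 4
  attr_reader :pages_swept

  def initialize(num_pages)
    @pages = Array.new(num_pages) { Array.new(SLOTS_PER_PAGE) } # nil = free slot
    @free = []                                                   # [page, slot] pairs
    @pages.each_index { |p| SLOTS_PER_PAGE.times { |s| @free << [p, s] } }
    @marked = {}.compare_by_identity
    @sweep_cursor = num_pages # no sweep pending
    @pages_swept = 0
  end

  # Allocate, sweeping only as many pages as needed to find a free slot.
  def allocate(obj, roots)
    gc_started = false
    while @free.empty?
      if @sweep_cursor < @pages.size
        sweep_page(@sweep_cursor) # lazy: one page per step
        @sweep_cursor += 1
      elsif gc_started
        raise "heap exhausted"
      else
        mark(roots)               # start a GC cycle; sweeping is deferred
        gc_started = true
      end
    end
    page, slot = @free.pop
    @pages[page][slot] = obj
  end

  private

  # Mark phase (stop-the-world in this sketch).
  def mark(roots)
    @marked.clear
    stack = roots.dup
    until stack.empty?
      obj = stack.pop
      next if @marked[obj]
      @marked[obj] = true
      stack.concat(obj.refs)
    end
    @sweep_cursor = 0
  end

  # Free every unmarked object on one page.
  def sweep_page(index)
    @pages_swept += 1
    page = @pages[index]
    page.each_index do |s|
      next if page[s].nil? || @marked[page[s]]
      page[s] = nil
      @free << [index, s]
    end
  end
end

heap = LazySweepHeap.new(2)
roots = []
8.times do |i|
  obj = Obj.new("o#{i}", [])
  roots << obj if i < 2 # only the first two objects stay reachable
  heap.allocate(obj, roots)
end
heap.allocate(Obj.new("o8", []), roots) # heap full: mark, then sweep just one page
p heap.pages_swept                      # 1
```

The second page still holds garbage; it is swept later, by whichever allocation next runs out of free slots.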

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

Generational
• Based on the heuristic that young objects die young
• A conservative GC cannot use semispace (copying) or mark-compact collection: moving objects requires pointer rewriting, which is hard to make efficient across platforms and hard for C extensions
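Because objects cannot move, a generational GC for a conservative heap keeps "old" as a per-object property instead of a separate space. The Ruby sketch below models that: a minor GC traces and sweeps young objects only, survivors age, and an object becomes old after a few survivals. The promotion age of 3 and all names are assumptions, and the sketch ignores old-to-young pointers, which is the problem write barriers address.

```ruby
class GenObj
  attr_accessor :refs, :age

  def initialize(refs = [])
    @refs = refs
    @age = 0
  end
end

class ToyGenGC
  OLD_AGE = 3 # assumption: promote after surviving 3 GCs
  attr_reader :heap

  def initialize
    @heap = []
  end

  def new_obj(refs = [])
    GenObj.new(refs).tap { |o| @heap << o }
  end

  def old?(obj)
    obj.age >= OLD_AGE
  end

  # Minor GC: trace and sweep young objects only; old objects are assumed live.
  def minor_gc(roots)
    live = mark(roots, skip_old: true)
    @heap.select! { |o| old?(o) || live[o] }
    age_survivors
  end

  # Major GC: trace and sweep everything.
  def major_gc(roots)
    live = mark(roots, skip_old: false)
    @heap.select! { |o| live[o] }
    age_survivors
  end

  private

  def age_survivors
    @heap.each { |o| o.age += 1 unless old?(o) }
  end

  def mark(roots, skip_old:)
    live = {}.compare_by_identity
    stack = roots.dup
    until stack.empty?
      o = stack.pop
      next if live[o] || (skip_old && old?(o))
      live[o] = true
      stack.concat(o.refs)
    end
    live
  end
end

gc = ToyGenGC.new
keeper = gc.new_obj
3.times do
  5.times { gc.new_obj } # short-lived garbage
  gc.minor_gc([keeper])
end
p gc.old?(keeper) # true: survived 3 GCs
p gc.heap.size    # 1

gc.minor_gc([])   # keeper is unreachable now, but minor GCs never sweep old objects
p gc.heap.size    # 1
gc.major_gc([])
p gc.heap.size    # 0
```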

Slide 18

Slide 18 text

Concurrent and Generational
• Problem 1: object changed during sweeping
• Solution: write barriers

Slide 19

Slide 19 text

Inserting Write Barriers
• WB_OBJ
• Most macros are covered already
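Why every pointer store has to pass through a barrier can be shown with a toy minor GC. In this Ruby model (all names hypothetical; in C extensions the analogous entry point is the RB_OBJ_WRITE() macro), storing a young object into an old one records the old object in a remembered set, and the minor GC treats the children of remembered objects as extra roots. A raw store that bypasses the barrier lets the young object be freed while it is still referenced.

```ruby
class WbObj
  attr_accessor :refs, :old

  def initialize
    @refs = []
    @old = false
  end
end

class ToyHeap
  attr_reader :objects

  def initialize
    @objects = []
    @remembered = {}.compare_by_identity # old objects that may point to young ones
  end

  def new_obj
    WbObj.new.tap { |o| @objects << o }
  end

  # All pointer stores should go through here; the `if` is the write barrier.
  def write(parent, child)
    parent.refs << child
    @remembered[parent] = true if parent.old && !child.old
  end

  # Minor GC: roots plus children of remembered objects; old objects are not traced.
  def minor_gc(roots)
    live = {}.compare_by_identity
    stack = roots + @remembered.keys.flat_map(&:refs)
    until stack.empty?
      o = stack.pop
      next if o.old || live[o]
      live[o] = true
      stack.concat(o.refs)
    end
    @objects.select! { |o| o.old || live[o] }
    @objects.each { |o| o.old = true } # simplification: survivors are promoted at once
    @remembered.clear                  # no young objects remain, so nothing to remember
  end
end

def run(use_barrier)
  heap = ToyHeap.new
  parent = heap.new_obj
  heap.minor_gc([parent])     # parent survives and becomes old
  child = heap.new_obj
  if use_barrier
    heap.write(parent, child) # old -> young store is remembered
  else
    parent.refs << child      # raw store: the GC never hears about it
  end
  heap.minor_gc([parent])
  heap.objects.include?(child)
end

p run(true)  # true:  child found through the remembered set
p run(false) # false: child wrongly freed
```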

Slide 20

Slide 20 text

Concurrent and Generational
• Problem 2: marked object changed
• Solution: tri-color invariant
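The tri-color invariant is "no black (fully scanned) object points to a white (unvisited) object". The Ruby sketch below does incremental marking in small steps and uses an insertion-style write barrier that greys the new target whenever a black object is modified. This is a simplified model with hypothetical names, not CRuby's gc.c. Without the barrier, an object reachable only from an already-scanned object stays white and would be swept.

```ruby
class Node
  attr_reader :refs

  def initialize(refs = [])
    @refs = refs
  end
end

class IncrementalMarker
  def initialize(roots)
    @color = Hash.new(:white).compare_by_identity
    @gray = []
    roots.each { |r| shade(r) }
  end

  def color(obj)
    @color[obj]
  end

  # Do at most `budget` units of marking work; returns true when marking is done.
  def step(budget)
    budget.times do
      obj = @gray.pop
      break if obj.nil?
      obj.refs.each { |child| shade(child) }
      @color[obj] = :black
    end
    @gray.empty?
  end

  # Mutator stores go through here. Greying the new target keeps the
  # invariant "no black object points to a white object".
  def write(parent, child)
    parent.refs << child
    shade(child) if @color[parent] == :black
  end

  private

  def shade(obj)
    return unless @color[obj] == :white
    @color[obj] = :gray
    @gray << obj
  end
end

def demo(use_barrier)
  c = Node.new
  b = Node.new([c])
  a = Node.new([b])
  marker = IncrementalMarker.new([a])
  marker.step(1)       # a is black, b is gray, c is still white
  if use_barrier
    marker.write(a, c) # a -> c through the barrier: c becomes gray
  else
    a.refs << c        # raw store: black a now points to white c
  end
  b.refs.delete(c)     # the only other path to c disappears
  marker.step(100)     # finish marking
  marker.color(c)
end

p demo(true)  # :black (c survives)
p demo(false) # :white (c would be swept although a still references it)
```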

Slide 21

Slide 21 text

Other Optimizations: Bit Marking
• Bit marking already existed, to be copy-on-write friendly
• To represent colors, 2 bits per object are used (4 bits in total)
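With side bitmaps, the GC writes to a per-page bitmap instead of the object headers, so pages shared with a forked child are not dirtied by marking. The Ruby sketch below (hypothetical names, loosely modelled on a mark bitmap plus a "marking" bitmap) shows how 2 bits per object encode three colors: not marked is white, marked and still being scanned is gray, marked only is black.

```ruby
class HeapPage
  def initialize
    @mark_bits = 0    # bit i set: slot i is marked (gray or black)
    @marking_bits = 0 # bit i set: slot i is marked but not yet scanned (gray)
  end

  def color(slot)
    return :white if @mark_bits[slot].zero?
    @marking_bits[slot] == 1 ? :gray : :black
  end

  def shade_gray(slot)
    @mark_bits |= 1 << slot
    @marking_bits |= 1 << slot
  end

  def blacken(slot)
    @marking_bits &= ~(1 << slot)
  end

  # A new GC cycle only clears the bitmaps; the objects themselves are never written.
  def clear
    @mark_bits = 0
    @marking_bits = 0
  end
end

page = HeapPage.new
page.shade_gray(5)
p page.color(5) # :gray
page.blacken(5)
p page.color(5) # :black
p page.color(6) # :white
```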

Slide 22

Slide 22 text

When Does GC Run?
• new object allocation
• rb_gc_malloc()
• method call (GC.start)
• rb_gc_start() (C API)
• GC.stress
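GC.latest_gc_info reports what triggered the most recent collection; in the CRuby versions I know of, its :gc_by value is one of :newobj, :malloc, :method, :capi or :stress. A short check from Ruby:

```ruby
before = GC.count
GC.start                     # explicit collection
p GC.count - before          # at least 1
p GC.latest_gc_info(:gc_by)  # :method for an explicit GC.start

GC.stress = true             # collect at (nearly) every allocation: very slow, debugging only
3.times { Object.new }
GC.stress = false
```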

Slide 23

Slide 23 text

Ways to Control GC
• OOB GC (out-of-band GC) in Unicorn, Passenger
• GC.stress
• GC.disable … GC.enable, GC.start
• rb_gc_mark(), rb_gc_register_address()
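The idea behind out-of-band GC can be sketched with the core GC API: disable automatic GC while a request is handled, then collect when the request is finished. with_oob_gc is a hypothetical helper, not Unicorn's or Passenger's API; real servers run the collection after the response has been written to the client, while this sketch simply collects when the block returns.

```ruby
def with_oob_gc
  was_disabled = GC.disable # returns true if GC was already disabled
  yield
ensure
  unless was_disabled
    GC.enable
    GC.start # pay the GC cost between requests instead of during one
  end
end

before = GC.count
response = with_oob_gc { "x" * 1_000 } # stand-in for real request handling
p response.size     # 1000
p GC.count > before # true: a collection ran after the request
```

Restoring the previous state (instead of calling GC.enable unconditionally) keeps the helper safe when nested inside code that has already disabled GC.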

Slide 24

Slide 24 text

Performance Tools
• gdb/lldb
• rbtrace
• tmm1/stackprof
• ko1/gc_tracer:
    require 'gc_tracer'
    GC::Tracer.start_logging("log.txt")

Slide 25

Slide 25 text

Useful Methods
• GC.latest_gc_info
• GC.stat
• GC::INTERNAL_CONSTANTS
• "string".freeze
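A quick tour of these methods from a Ruby script; the exact keys in each hash vary between CRuby versions, so only a few stable ones are used here. Since Ruby 2.1, a "literal".freeze call returns one shared, deduplicated frozen string, which saves an allocation per call.

```ruby
GC.start
stat = GC.stat
p stat[:count]           # number of GCs so far
p GC.stat(:count)        # a single key, without building the whole hash
p GC.latest_gc_info      # why and how the last GC ran (:gc_by, :major_by, ...)
p GC::INTERNAL_CONSTANTS # heap layout constants of this build

a = "string".freeze
b = "string".freeze
p a.equal?(b)            # true: the same frozen object
```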

Slide 26

Slide 26 text

References
• http://www.atdot.net/~ko1/activities/#idx4
• https://speakerdeck.com/samsaffron/why-ruby-2-dot-1-excites-me
• https://speakerdeck.com/pat_shaughnessy/visualizing-garbage-collection-in-rubinius-jruby-and-ruby-2-dot-0
• 徹底解剖「G1GC」
• GitHub's ruby