With Java, I worked at a big company. The work was similar to assembly-line work: I made one part of a product, and I didn't understand the whole product.
Current work: I currently work at NaCl. matz, shyouhei, and takaokouji are my co-workers, and shugo is my boss. They are CRuby committers.
Garbage Collection for me: GC technology is very interesting to me. GC is a garbage-collecting machine. I've been creating it since then. It's very fun!!
It treats long-lived objects as a special case, similar to Generational GC. LonglifeGC was rejected in CRuby 1.9.2 for some reason. :'(
Traditional M&S GC executes mark and sweep atomically. The Ruby application stops during GC (stop-the-world). In Lazy sweeping, sweeping is done lazily, a little at a time, rather than all at once.
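To make the lazy idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the `LazySweepHeap` class, its fixed-size slot list, and the rule "sweep until one free slot is found at allocation time" are hypothetical and are not CRuby's actual heap layout or code.

```python
class LazySweepHeap:
    """Toy heap with fixed slots (hypothetical model, not CRuby's heap)."""

    def __init__(self, size):
        self.slots = [None] * size    # None means the slot is free
        self.marked = [False] * size  # mark bits set by the mark phase
        self.sweep_pos = 0            # next slot the lazy sweeper will visit

    def mark(self, live_indexes):
        # Mark phase: flag live slots, then only *schedule* the sweep.
        for i in live_indexes:
            self.marked[i] = True
        self.sweep_pos = 0

    def lazy_sweep(self):
        # Sweep just far enough to find one reusable slot.
        while self.sweep_pos < len(self.slots):
            i = self.sweep_pos
            self.sweep_pos += 1
            if self.marked[i]:
                self.marked[i] = False  # survivor: clear bit for the next cycle
            else:
                self.slots[i] = None    # dead (or already free): reclaim it
                return i
        return None

    def allocate(self, value):
        i = self.lazy_sweep()
        if i is None:
            raise MemoryError("no free slot; a real collector would mark again")
        self.slots[i] = value
        return i
```

In this sketch the sweep cost is spread over later allocations, so the pause caused by one GC cycle covers only the mark phase.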
What is a dead object? A dead object is an object that is never referenced by the program. In GC terms, we say that a dead object is unreachable from Roots.
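Reachability from Roots can be sketched as a small graph traversal. This is a simplified illustration, assuming a hypothetical object graph given as a dictionary of names; the function name `mark_reachable` is made up for this example.

```python
def mark_reachable(roots, references):
    """Return the set of objects reachable from the roots.

    references maps an object name to the names it refers to
    (a hypothetical object graph used only for illustration).
    """
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue  # already visited; this also handles cycles
        marked.add(obj)
        stack.extend(references.get(obj, []))
    return marked


# "d" and "e" refer to each other, but nothing in Roots refers to them.
graph = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": ["d"]}
live = mark_reachable(["a"], graph)
dead = set(graph) - live
```

Here `d` and `e` are dead even though they reference each other, because neither can be reached from Roots.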
What is Parallel Marking? The collector runs several marking processes in parallel by using native threads. We will be happy on a multi-core machine.
Why not perform sweeping in parallel? Sweeping is much faster than marking. You can see ko1's research: <URL:http://www.atdot.net/~ko1/diary/201011.html#d4>
means.. Tasks are distributed to multiple threads. The task of marking the entire heap is divided into several tasks, each marking a single branch.
Amdahl's law is used to find the maximum expected improvement to an overall system when only part of the system is improved. [cited from `Amdahl's law - Wikipedia']
Amdahl's law is used in parallel computing: if the parallel portion of the system is X% and the number of processors is Y, how much speedup can we expect?
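The question above can be answered with the standard form of Amdahl's law, speedup = 1 / ((1 - P) + P / Y), where P = X / 100 is the parallel fraction. The small function below is a sketch of that formula; the name `amdahl_speedup` is chosen here for illustration only.

```python
def amdahl_speedup(parallel_percent, processors):
    """Maximum expected speedup when parallel_percent% of the work
    is spread over the given number of processors."""
    p = parallel_percent / 100.0
    return 1.0 / ((1.0 - p) + p / processors)


# Example: if half of the work is parallel, extra cores help only a little.
two_cores = amdahl_speedup(50, 2)
many_cores = amdahl_speedup(50, 1000)
```

With a 50% parallel portion the speedup can never reach 2, however many processors are added, which is why the non-parallel part matters so much.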
Conclusion so far: We should consider how to balance workloads efficiently, so we use Task Stealing. We should eliminate non-parallel parts by using a wait-free algorithm.
Deque: Deque stands for Double-Ended Queue. In Arora's Deque, the deque contains tasks as elements. It's a wait-free data structure.
In what ways could shift() cause contention problems? For example, shift() and pop() could be called at the same time when the deque has only one element.
for Arora's Deque A simple data structure for Task Stealing. Each worker has a single deque. Stealing (the shift operation) is wait-free!
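The shape of the interface (the owner pushes and pops at one end, a thief shifts from the other end) can be sketched as follows. Note the big assumption: this Python sketch uses a lock as a stand-in for the atomic operations of the real algorithm, so it is NOT wait-free. The class name `StealDeque` is hypothetical; only the operation names pop() and shift() come from the slides.

```python
import threading


class StealDeque:
    """Lock-based stand-in for a work-stealing deque (not wait-free)."""

    def __init__(self):
        self._items = []
        # Assumption: a lock replaces the atomic compare-and-swap that a
        # real non-blocking implementation would use.
        self._lock = threading.Lock()

    def push(self, task):
        # Owner adds a task at the bottom.
        with self._lock:
            self._items.append(task)

    def pop(self):
        # Owner takes the most recently pushed task from the bottom.
        with self._lock:
            return self._items.pop() if self._items else None

    def shift(self):
        # Thief steals the oldest task from the top.
        with self._lock:
            return self._items.pop(0) if self._items else None
```

When the deque holds a single element and pop() and shift() race, exactly one caller gets the task and the other gets `None`; the real data structure has to guarantee the same outcome without a lock.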
Marker uses Arora's Deque as a marking stack. A "task" means an object, so the granularity of the task is very fine. This is a naive implementation.
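The naive scheme, where every object is one task, can be sketched with a single-threaded simulation. Everything here is an assumption made for illustration: the function name, the round-robin scheduling of workers, and the use of Python's `collections.deque` in place of Arora's Deque. Real markers run on native threads.

```python
from collections import deque


def parallel_mark_simulation(roots, references, num_workers=2):
    """Simulate markers that each own a deque and steal when idle.

    Returns (marked objects, number of steals). Workers take turns in
    round-robin order instead of running on real threads.
    """
    deques = [deque() for _ in range(num_workers)]
    for i, root in enumerate(roots):
        deques[i % num_workers].append(root)

    marked = set()
    steals = 0
    while any(deques):
        for w in range(num_workers):
            own = deques[w]
            if own:
                obj = own.pop()  # owner pops from its own end
            else:
                victim = next((d for d in deques if d), None)
                if victim is None:
                    continue     # nothing left to steal
                obj = victim.popleft()  # thief shifts from the other end
                steals += 1
            if obj in marked:
                continue
            marked.add(obj)
            own.extend(references.get(obj, []))  # children become new tasks
    return marked, steals
```

Because each object costs at least one deque operation, the deque overhead in this sketch grows with the number of objects, which is the weakness of fine-grained tasks.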
Good point & Bad point: The number of calls to the Deque's operations was reduced, so the marking speed of each worker improved. However, coarse-grained tasks decrease parallelism.
for large Array objects and Hash objects Each marker has a special deque to manage them. A marker divides them into fixed-size tasks, e.g. elements 0-9 of an Array, elements 10-19 of an Array...
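The fixed-size division can be sketched as a function that turns an array length into index ranges. The chunk size of 10 follows the example on the slide; the function name and the (first, last) tuple format are assumptions for illustration.

```python
def split_into_tasks(length, chunk_size=10):
    """Split indexes 0..length-1 into inclusive (first, last) ranges."""
    return [
        (start, min(start + chunk_size, length) - 1)
        for start in range(0, length, chunk_size)
    ]


# A 25-element Array becomes three tasks: 0-9, 10-19, and 20-24.
tasks = split_into_tasks(25)
```

Each range can then be pushed as a separate task, so other markers can steal part of one large object instead of waiting for its owner to finish it.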
The naive implementation was slow: the grain of the task was too fine. When a "task" means a branch in Roots, the grain of the task is coarse. It's faster!!
The benchmark program is make rdoc. make rdoc generates the Ruby documentation. This benchmark measures the total execution time and the GC execution time of make rdoc.
In a many-core environment, I expect we would get a large improvement, e.g. 8 cores, 16 cores... But my machine has just 2 cores, so I can't see it :(
Case for Parallel GC: when there are many objects, because then there are also many mark targets; and when the objects are long-lived. Server-side applications?
Characteristics of SUPER NARIO GC: GC runs at fixed intervals. A lot of objects are generated to increase the GC's burden. Burden = Game Level.
How to compare Original GC and Parallel GC: The Original GC's pause time is long, so the game will be difficult. The Parallel GC's pause time is short, so the game will be easy.
Windows OS is not supported: The Mark Worker uses pthread for native threads, and it also uses some gcc built-in functions. But I'll support Windows eventually.