Slide 1

Slide 1 text

To Know A Garbage Collector
Michael R. Bernstein
Gotham Ruby Conference
New York, New York
June 8th, 2013

Slide 2

Slide 2 text

Talk Outline
• Who Am I?
• Introduce Our Goals
• Share Some Influences
• Pursue Our Goals
• Conclusion

Slide 3

Slide 3 text

Who Am I?
• This is my 6th GoRuCo
• Professional “Programmer”
• Former Computer Science Teacher
• MFA From Parsons in Design & Technology
• Salad Thought Leader @ Paperless Post

Slide 4

Slide 4 text

I’m obsessed.

Slide 5

Slide 5 text

Introduction: Goals
• Get excited about GC, hopefully learn a few things
• Think about the connection between programming languages and GC

Slide 6

Slide 6 text

Influences
• The Garbage Collection Handbook by Jones et al.

Slide 7

Slide 7 text

Influences
"The undecidability of liveness is a corollary of the halting problem" - Jones et al.

Slide 8

Slide 8 text

Influences
• Teaching Garbage Collection without Implementing Compilers or Interpreters by Cooper et al. [Findler et al]

Slide 9

Slide 9 text

Influences
• A Unified Theory of Garbage Collection by Bacon et al.

Slide 10

Slide 10 text

Get Excited About Garbage Collection

Slide 11

Slide 11 text

Garbage Collection is a form of automatic memory management which gives a program the appearance of infinite memory by reclaiming allocated objects which are no longer in use.

Slide 12

Slide 12 text

Terminology
• Garbage Collection
• Heap
• Mutator
• Collector
• Roots
• Barriers

Slide 13

Slide 13 text

Heap
• A data structure in which objects may be allocated or deallocated in any order
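The definition above can be illustrated with a toy model. This is only a sketch: `ToyHeap`, its fixed slot count, and its method names are invented here and do not correspond to any Ruby implementation's real heap. It assumes a heap is a fixed array of slots plus a free list, and shows that objects can be freed in any order and their slots reused.

```ruby
# A toy heap: a fixed number of slots plus a free list.
# ToyHeap and its method names are hypothetical, made up for this sketch.
class ToyHeap
  def initialize(size)
    @slots = Array.new(size)   # nil means the slot is empty
    @free  = (0...size).to_a   # addresses available for allocation
  end

  # Returns an address, or nil when the heap is exhausted.
  def allocate(value)
    address = @free.shift
    return nil if address.nil?
    @slots[address] = value
    address
  end

  # Frees any allocated address, in any order.
  def free(address)
    raise ArgumentError, "address #{address} is not allocated" if @slots[address].nil?
    @slots[address] = nil
    @free.push(address)
  end

  def read(address)
    @slots[address]
  end

  def free_count
    @free.size
  end
end

heap = ToyHeap.new(3)
a = heap.allocate("a")
b = heap.allocate("b")
c = heap.allocate("c")
full = heap.allocate("d")   # nil: the heap is full
heap.free(b)                # free the middle object first
d = heap.allocate("d")      # reuses the slot that b occupied
```

Returning nil on exhaustion, rather than raising, matches the `allocate` convention used in the pseudocode later in the talk.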

Slide 14

Slide 14 text

Heap

Slide 15

Slide 15 text

Mutator
• The part of a running program which executes application code

Slide 16

Slide 16 text

Mutator

class Book < ActiveRecord::Base
  has_many :authors
  has_many :citations
  has_many :references
  has_many :subscriptions
  has_many :subscribers
end

Slide 17

Slide 17 text

Collector
• The part of a running program responsible for Garbage Collection

Slide 18

Slide 18 text

Collector

static void
gc_mark_locations(rb_objspace_t *objspace, VALUE *start, VALUE *end)
{
    long n;

    if (end <= start) return;
    n = end - start;
    mark_locations_array(objspace, start, n);
}

void
rb_gc_mark_locations(VALUE *start, VALUE *end)
{
    gc_mark_locations(&rb_objspace, start, end);
}

#define rb_gc_mark_locations(start, end) gc_mark_locations(objspace, (start), (end))

struct mark_tbl_arg {
    rb_objspace_t *objspace;
};

Slide 19

Slide 19 text

Garbage collection is automatic memory management. While the mutator runs, it routinely allocates memory from the heap. If more memory than available is needed, the collector reclaims unused memory and returns it to the heap.

Slide 20

Slide 20 text

1960: A Good Year For Garbage Collectors
• John McCarthy, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I 1960
• George Collins, A Method for Overlapping and Erasure of Lists 1960

Slide 21

Slide 21 text

1960: A Good Year For Garbage Collectors
• McCarthy - Mark and Sweep (Tracing)
• Collins - Reference Counting

Slide 22

Slide 22 text

Roots
• References that are directly accessible to the Mutator without going through other objects

Slide 23

Slide 23 text

Mark and Sweep (Tracing)

def new
  ref = allocate
  if ref.nil?
    mark
    sweep
    ref = allocate
    if ref.nil?
      raise "Out of memory"
    end
  end
  ref
end

Slide 24

Slide 24 text

Mark and Sweep (Tracing)

def mark
  worklist = Worklist.new
  heap_roots.each do |root|
    ref = root.address
    if ref && !ref.is_marked?
      ref.set_marked
      worklist << ref
      recursive_mark(worklist)
    end
  end
end

Slide 25

Slide 25 text

Mark and Sweep (Tracing)

def sweep
  object_cursor = heap_start
  while object_cursor < heap_end
    if object_cursor.is_marked?
      object_cursor.unset_marked
    else
      object_cursor.free
    end
    # Advance to the next object; without the assignment the loop never moves.
    object_cursor = object_cursor.next
  end
end
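The pseudocode on the last three slides leaves helpers such as `allocate`, `heap_roots`, and `recursive_mark` undefined, so it cannot be run as shown. Below is a minimal runnable sketch of the same idea over a simulated heap. `Cell`, `TinyHeap`, and the capacity limit are all assumptions invented for illustration; this is not how MRI or any real collector is implemented.

```ruby
# Toy mark-and-sweep over a simulated heap. Cell, TinyHeap, and the
# method names are hypothetical; they are not MRI's internals.
Cell = Struct.new(:name, :pointers, :marked)

class TinyHeap
  attr_reader :cells, :roots, :freed

  def initialize(capacity)
    @capacity = capacity
    @cells = []   # every allocated cell, reachable or not
    @roots = []   # cells the mutator can reach directly
    @freed = []   # names of reclaimed cells, in order
  end

  # Mirrors `new` on the slide: collect only when allocation would fail.
  def allocate(name)
    collect if full?
    raise "Out of memory" if full?
    cell = Cell.new(name, [], false)
    @cells << cell
    cell
  end

  def full?
    @cells.size >= @capacity
  end

  def collect
    mark
    sweep
  end

  # Mark phase: flag everything reachable from the roots.
  def mark
    worklist = @roots.dup
    until worklist.empty?
      cell = worklist.pop
      next if cell.marked
      cell.marked = true
      worklist.concat(cell.pointers)
    end
  end

  # Sweep phase: drop unmarked cells and clear the marks on survivors.
  def sweep
    dead, @cells = @cells.partition { |cell| !cell.marked }
    @cells.each { |cell| cell.marked = false }
    @freed.concat(dead.map(&:name))
  end
end

heap = TinyHeap.new(4)
a = heap.allocate(:a)
b = heap.allocate(:b)
c = heap.allocate(:c)
d = heap.allocate(:d)
heap.roots << a
a.pointers << b        # a -> b, reachable from a root
c.pointers << d        # c and d point at each other
d.pointers << c        # but nothing reachable points at them
e = heap.allocate(:e)  # heap is full, so this triggers mark and sweep
```

After the last allocation, `heap.freed` holds `:c` and `:d`: the unreachable cycle is reclaimed because tracing only follows references from the roots.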

Slide 26

Slide 26 text

Barrier
• Code that runs as a result of accessing or mutating an object on the heap
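As a rough illustration of the definition above, here is a sketch of a write barrier that records which objects have been modified. `BarrieredObject` and the shared `dirty` set are hypothetical names; real collectors place barriers inside the runtime rather than in application classes, and what they record varies by algorithm.

```ruby
require "set"

# BarrieredObject and the dirty set are names invented for this sketch.
class BarrieredObject
  def initialize(dirty)
    @dirty = dirty   # shared set a collector could consult later
    @fields = {}
  end

  # Plain read: this sketch installs no read barrier.
  def read(field)
    @fields[field]
  end

  # Write barrier: bookkeeping runs before the store itself.
  def write(field, value)
    @dirty << self
    @fields[field] = value
  end
end

dirty = Set.new
book = BarrieredObject.new(dirty)
book.read(:title)                  # does not touch the dirty set
book.write(:title, "GC Handbook")  # records book as modified
```

The point is only that the extra code runs on every store, whether or not the mutator asked for it.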

Slide 27

Slide 27 text

Reference Counting

def new
  ref = allocate
  if ref.nil?
    raise "Out of memory"
  end
  ref.ref_count = 0
  ref
end

def write(src, i, ref) # A Write Barrier
  add_reference(ref)
  delete_reference(src[i])
  src[i] = ref
end

Slide 28

Slide 28 text

Reference Counting

def add_reference(ref)
  ref.ref_count = ref.ref_count + 1
end

def delete_reference(ref)
  ref.ref_count = ref.ref_count - 1
  if ref.ref_count == 0
    # The object is dead: release everything it points to, then free it.
    ref.pointers.each do |field|
      delete_reference(field.address)
    end
    free(ref)
  end
end
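The reference counting pseudocode above can be turned into a small runnable model. This is a sketch under assumptions: `RcObject`, `RcHeap`, and the `freed` log are invented names, nil stands in for an empty field, and "freeing" just records the object's name.

```ruby
# Toy reference counting. RcObject, RcHeap, and their methods are
# hypothetical names made up for this sketch.
class RcObject
  attr_reader :name, :fields
  attr_accessor :ref_count

  def initialize(name, field_count)
    @name = name
    @fields = Array.new(field_count)  # nil means "no reference"
    @ref_count = 0
  end
end

class RcHeap
  attr_reader :freed

  def initialize
    @freed = []   # names of reclaimed objects, in order
  end

  def allocate(name, field_count = 1)
    RcObject.new(name, field_count)
  end

  # The write barrier: every pointer store adjusts two counts.
  def write(src, i, ref)
    add_reference(ref)
    delete_reference(src.fields[i])
    src.fields[i] = ref
  end

  def add_reference(ref)
    ref.ref_count += 1 unless ref.nil?
  end

  def delete_reference(ref)
    return if ref.nil?
    ref.ref_count -= 1
    return unless ref.ref_count.zero?
    ref.fields.each { |child| delete_reference(child) }
    @freed << ref.name
  end
end

heap = RcHeap.new
root = heap.allocate(:root)
a = heap.allocate(:a)
b = heap.allocate(:b)
heap.write(root, 0, a)    # root -> a
heap.write(a, 0, b)       # a -> b
heap.write(root, 0, nil)  # drop root -> a; a and b are reclaimed immediately
```

Note that the reclamation happens inside the final `write` call itself, with no separate collection phase.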

Slide 29

Slide 29 text

Pros And Cons
• Pro: Reference Counting is incremental. As it works, it frees memory
• Con: Reference Counting cannot easily collect cycles: objects on the heap that reference themselves, directly or through other objects
• Pro: Mark & Sweep can collect cycles
• Con: Mark & Sweep can exhibit long pauses and poor locality
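The cycle problem from the list above can be seen in a few lines. In this hypothetical mini-model, objects are just symbols and references are `[from, to]` pairs; `ref_counts` and `reachable` are helper names invented for the sketch.

```ruby
# Hypothetical mini-model: objects are names, references are [from, to] pairs.
edges = [[:root, :a], [:a, :b], [:b, :a]]  # a and b reference each other

# What a reference counter sees: incoming references per object.
def ref_counts(edges)
  edges.each_with_object(Hash.new(0)) { |(_, to), counts| counts[to] += 1 }
end

# What a tracing collector sees: everything reachable from the root.
def reachable(edges, root)
  seen = []
  worklist = [root]
  until worklist.empty?
    node = worklist.pop
    next if seen.include?(node)
    seen << node
    edges.each { |from, to| worklist << to if from == node }
  end
  seen
end

edges.delete([:root, :a])       # the mutator drops its only reference to a
counts = ref_counts(edges)      # a and b still have a count of 1 each
live = reachable(edges, :root)  # tracing finds only :root
```

Both `:a` and `:b` keep a nonzero count, so a plain reference counter never frees them, while the trace from the root shows they are garbage.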

Slide 30

Slide 30 text

The Unified Theory
• Tracing and Reference Counting are “duals” of the same operation
• In optimized form, they are very similar
• Most successful GCs are hybrid Tracer-Counters
• Formalized Garbage Collectors with a uniform cost-model

Slide 31

Slide 31 text

The Unified Theory
• Subtly tweak Reference Counting by buffering calls to free()
• Subtly tweak Mark & Sweep by maintaining a true reference count instead of a “live” bit in the mark phase
• Hybrid Example: Generational GC
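One possible reading of the first bullet, sketched below, is that objects whose count reaches zero are placed in a buffer and reclaimed later in a batch, which starts to resemble a collection phase. This is an interpretation, not the formulation in Bacon et al.; `Node`, `BufferedRc`, and `collect` are hypothetical names.

```ruby
# Node and BufferedRc are hypothetical names for this sketch.
class Node
  attr_reader :name, :children
  attr_accessor :ref_count

  def initialize(name)
    @name = name
    @children = []
    @ref_count = 0
  end
end

class BufferedRc
  attr_reader :freed

  def initialize
    @zero_buffer = []  # objects whose count hit zero, not yet freed
    @freed = []        # names of reclaimed objects, in order
  end

  def add_reference(node)
    node.ref_count += 1
  end

  # Cheap for the mutator: no recursive freeing happens here.
  def delete_reference(node)
    node.ref_count -= 1
    return unless node.ref_count.zero?
    @zero_buffer << node unless @zero_buffer.include?(node)
  end

  # The batch step, run at a time of the collector's choosing.
  def collect
    until @zero_buffer.empty?
      node = @zero_buffer.shift
      next unless node.ref_count.zero?  # referenced again before collection
      node.children.each { |child| delete_reference(child) }
      @freed << node.name
    end
    @freed
  end
end

rc = BufferedRc.new
a = Node.new(:a)
b = Node.new(:b)
a.children << b
rc.add_reference(a)     # a is held by some root
rc.add_reference(b)     # a -> b
rc.delete_reference(a)  # count reaches zero, but nothing is freed yet
pending = rc.freed.dup  # still empty at this point
rc.collect              # now a, then b, are reclaimed
```

Deferring the work trades immediacy for shorter mutator operations, which is one of the trade-offs the unified view makes explicit.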

Slide 32

Slide 32 text

The Unified Theory
• Design of Garbage Collectors can be made more methodical
• Three main decisions:
• Partition
• Traversal
• Trade-offs

Slide 33

Slide 33 text

Exhale. I know that was exciting.

Slide 34

Slide 34 text

Programming Languages and GC

Slide 35

Slide 35 text

Programming Languages
• How are they developed?
• How are they designed?
• How do they interact with system memory?
• What aspects of their design are pertinent to the discussion of GC?

Slide 36

Slide 36 text

Ruby
• Dynamic, Multiple Implementations, we all <3 it
• MRI - Simple Beginnings, Advanced Future
• Rubinius - Thoroughly Modern
• JRuby - The Power of the JVM
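For a first look at the collector from inside a Ruby program, the core `GC` and `ObjectSpace` modules expose a few counters. The snippet below is a small sketch that assumes MRI; the keys returned by `GC.stat` differ between Ruby versions, and `ObjectSpace.count_objects` may be unavailable or behave differently on other implementations.

```ruby
before = GC.count   # collections run so far in this process
GC.start            # ask the runtime for a full collection
after = GC.count

stats = GC.stat                      # Hash of counters; keys vary by Ruby version
objects = ObjectSpace.count_objects  # MRI-specific breakdown by internal type

puts "collections: #{before} -> #{after}"
puts "some GC.stat keys: #{stats.keys.first(5).inspect}"
puts "total heap slots: #{objects[:TOTAL]}"
```

Watching how these numbers move while an application runs is a simple way to connect the terminology in this talk to a live system.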

Slide 37

Slide 37 text

Java
• Massive amounts of research into the JVM’s GC
• Adaptive
• Tunable
• Hybrid approach

Slide 38

Slide 38 text

Haskell
• Strictly, statically typed
• Compiler informs GC
• Design of language makes certain aspects of GC simpler
• Design of GHC has allowed incremental improvements to the GC

Slide 39

Slide 39 text

Programming Languages
• Most great programming languages have worked on their GCs over time
• The design of the language heavily influences what is possible in GC

Slide 40

Slide 40 text

“GC is not a generic solution for memory leaks, but a (correct) GC is a generic solution for 'dangling pointers'. Just as there is no general solution for 'loops' (due to undecidability), there is no general solution for 'leaks'.” - Henry Baker
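Baker's distinction can be made concrete with a small sketch. The `CACHE` constant and `handle_request` method are hypothetical; the point is that every entry stays reachable from a root, so no collector is allowed to reclaim it, even though the program never reads the entries again.

```ruby
# A leak in a garbage-collected program: data that is reachable but unused.
# CACHE and handle_request are hypothetical names for this sketch.
CACHE = {}

def handle_request(id)
  CACHE[id] = "response #{id}"  # stored "just in case", never read again
  :ok
end

1000.times { |i| handle_request(i) }
leaked = CACHE.size  # 1000 entries, all still reachable from the constant
```

A correct GC guarantees that none of these strings is freed while `CACHE` refers to it, which is exactly why it cannot fix the leak.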

Slide 41

Slide 41 text

In Conclusion
• Garbage Collection is a fascinating discipline
• Deep knowledge of your tools is very helpful
• If we understand GC better, we can make Ruby better

Slide 42

Slide 42 text

Works Cited
• David F. Bacon, Perry Cheng, and V. T. Rajan. A unified theory of garbage collection. In OOPSLA 2004, 2004, pages 50-68.
• Cooper et al. Teaching Garbage Collection without Implementing Compilers or Interpreters. In SIGCSE 2013, 2013, pages 385-390.
• Robby Findler “The Many Faces of Dr. Scheme” http://www.eecs.northwestern.edu/~robby/logos/
• Richard Jones, Antony Hosking, and Eliot Moss. The Garbage Collection Handbook: The Art of Automatic Memory Management. CRC Applied Algorithms and Data Structures. Chapman & Hall, August 2012, pages 375-416.
• Richard Jones: Garbage Collection Bibliography http://www.cs.kent.ac.uk/people/staff/rej/gcbib/gcbib.html
• Henry Lieberman and Carl E. Hewitt. A real-time garbage collector based on the lifetimes of objects. AI Memo 569a, MIT, April 1981.

Slide 43

Slide 43 text

Thanks!
w: michaelrbernste.in
@: mrb_bk
gh: mrb