GoRuCo 2013

My talk at GoRuCo 2013


Michael Bernstein

June 08, 2013


  1. To Know A Garbage Collector
    Michael R. Bernstein
    Gotham Ruby Conference, New York, New York
    June 8th, 2013
  2. Talk Outline
    • Who Am I?
    • Introduce Our Goals
    • Share Some Influences
    • Pursue Our Goals
    • Conclusion
  3. Who Am I?
    • This is my 6th GoRuCo
    • Professional “Programmer”
    • Former Computer Science Teacher
    • MFA From Parsons in Design & Technology
    • Salad Thought Leader @ Paperless Post
  4. I’m obsessed.

  5. Introduction: Goals
    • Get excited about GC, hopefully learn a few things
    • Think about the connection between programming languages and GC
  6. Influences
    • The Garbage Collection Handbook by Jones et al.

  7. Influences
    "The undecidability of liveness is a corollary of the halting problem" - Jones et al.
  8. Influences
    • Teaching Garbage Collection without Implementing Compilers or Interpreters by Cooper et al. [Findler et al]
  9. Influences
    • A Unified Theory of Garbage Collection by Bacon et al.
  10. Get Excited About Garbage Collection

  11. Garbage Collection is a form of automatic memory management which gives a program the appearance of infinite memory by reclaiming allocated objects which are no longer in use.
  12. Terminology
    • Garbage Collection
    • Heap
    • Mutator
    • Collector
    • Roots
    • Barriers
  13. Heap
    • A data structure in which objects may be allocated or deallocated in any order
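A minimal sketch of this definition in Ruby, assuming a fixed number of equal-sized slots tracked by a free list. `ToyHeap`, `allocate`, `free`, and `free_count` are hypothetical names for illustration, not part of any Ruby implementation. `allocate` returns `nil` when no slot is free, which matches the convention the collector pseudocode in this talk relies on.

```ruby
# Toy heap: a fixed array of slots plus a free list of unused slot indexes.
# All names here are illustrative assumptions.
class ToyHeap
  def initialize(size)
    @slots = Array.new(size)          # nil means "empty slot"
    @free_list = (0...size).to_a      # indexes available for allocation
  end

  # Returns the index (address) of a fresh slot, or nil if the heap is full.
  def allocate(value)
    address = @free_list.shift
    return nil if address.nil?
    @slots[address] = value
    address
  end

  # Releases a slot so a later allocation can reuse it.
  def free(address)
    @slots[address] = nil
    @free_list.push(address)
  end

  def free_count
    @free_list.size
  end
end

heap = ToyHeap.new(3)
a = heap.allocate(:a)
b = heap.allocate(:b)
c = heap.allocate(:c)
heap.free(b)                 # deallocate out of allocation order
d = heap.allocate(:d)        # reuses the slot that b occupied
puts d == b                  # prints "true"
```

The free list is what allows deallocation in any order: a freed slot simply becomes available again, wherever it sits in the heap.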
  14. Heap

  15. Mutator
    • The part of a running program which executes application code
  16. Mutator
        class Book < ActiveRecord::Base
          has_many :authors
          has_many :citations
          has_many :references
          has_many :subscriptions
          has_many :subscribers
        end
  17. Collector
    • The part of a running program responsible for Garbage Collection
  18. Collector
        static void
        gc_mark_locations(rb_objspace_t *objspace, VALUE *start, VALUE *end)
        {
            long n;

            if (end <= start) return;
            n = end - start;
            mark_locations_array(objspace, start, n);
        }

        void
        rb_gc_mark_locations(VALUE *start, VALUE *end)
        {
            gc_mark_locations(&rb_objspace, start, end);
        }

        #define rb_gc_mark_locations(start, end) gc_mark_locations(objspace, (start), (end))

        struct mark_tbl_arg {
            rb_objspace_t *objspace;
        };
  19. Garbage collection is automatic memory management. While the mutator runs, it routinely allocates memory from the heap. If more memory than available is needed, the collector reclaims unused memory and returns it to the heap.
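One way to watch this interplay from the mutator's side is Ruby's built-in `GC` module. The sketch below assumes MRI (CRuby), where `GC.count` reports how many collections have run since the process started and `GC.start` requests a collection. Other implementations may treat `GC.start` as a hint only, so the number printed will vary.

```ruby
# Assumes MRI (CRuby); other implementations may treat GC.start as a hint only.
before = GC.count            # number of collections run so far

# The mutator allocates many short-lived objects from the heap.
100_000.times { "garbage" + "string" }

GC.start                     # explicitly ask the collector to run
after = GC.count

puts "collections during this snippet: #{after - before}"
```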
  20. 1960: A Good Year For Garbage Collectors
    • John McCarthy, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I, 1960
    • George Collins, A Method for Overlapping and Erasure of Lists, 1960
  21. 1960: A Good Year For Garbage Collectors
    • McCarthy - Mark and Sweep (Tracing)
    • Collins - Reference Counting
  22. Roots
    • References that are directly accessible to the Mutator without going through other objects
  23. Mark and Sweep (Tracing)
        def new
          ref = allocate
          if ref.nil?
            mark
            sweep
            ref = allocate
            if ref.nil?
              raise "Out of memory"
            end
          end
          ref
        end
  24. Mark and Sweep (Tracing)
        def mark
          worklist = Worklist.new
          heap_roots.each do |root|
            ref = root.address
            if ref && !ref.is_marked?
              ref.set_marked
              worklist << ref
              recursive_mark(worklist)
            end
          end
        end
  25. Mark and Sweep (Tracing)
        def sweep
          object_cursor = heap_start
          while object_cursor < heap_end
            if object_cursor.is_marked?
              object_cursor.unset_marked
            else
              object_cursor.free
            end
            # Advance to the next object so the loop terminates.
            object_cursor = object_cursor.next
          end
        end
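The slide pseudocode leaves `allocate`, `Worklist`, and `recursive_mark` undefined. The following is a minimal runnable sketch of the same two phases, assuming the heap is a plain Ruby array of objects and that draining the worklist is a simple loop. `Obj`, `mark`, and `sweep` are hypothetical names for illustration, not the code of any real collector.

```ruby
# Toy object with a mark bit and outgoing references (illustrative names).
class Obj
  attr_reader :name, :pointers
  attr_accessor :marked

  def initialize(name)
    @name = name
    @pointers = []
    @marked = false
  end
end

# Mark phase: trace everything reachable from the roots using a worklist.
def mark(roots)
  worklist = []
  roots.each do |root|
    next if root.marked
    root.marked = true
    worklist << root
  end
  until worklist.empty?
    obj = worklist.pop
    obj.pointers.each do |child|
      next if child.marked
      child.marked = true
      worklist << child
    end
  end
end

# Sweep phase: walk the whole heap, free unmarked objects, reset survivors.
def sweep(heap)
  freed, live = heap.partition { |obj| !obj.marked }
  live.each { |obj| obj.marked = false }
  heap.replace(live)
  freed
end

a, b, c, d = %w[a b c d].map { |n| Obj.new(n) }
a.pointers << b          # a -> b is reachable from the root
c.pointers << d          # c <-> d is an unreachable cycle
d.pointers << c
heap = [a, b, c, d]

mark([a])
freed = sweep(heap)
puts freed.map(&:name).inspect   # prints ["c", "d"]
puts heap.map(&:name).inspect    # prints ["a", "b"]
```

Note that the sweep visits every object in the heap, live or dead, while the mark visits only live objects.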
  26. Barrier
    • Code that runs as a result of accessing or mutating an object on the heap
  27. Reference Counting
        def new
          ref = allocate
          if ref.nil?
            raise "Out of memory"
          end
          ref.ref_count = 0
          ref
        end

        def write(src, i, ref) # A Write Barrier
          add_reference(ref)
          delete_reference(src[i])
          src[i] = ref
        end
  28. Reference Counting
        def add_reference(ref)
          ref.ref_count = ref.ref_count + 1
        end

        def delete_reference(ref)
          ref.ref_count = ref.ref_count - 1
          if ref.ref_count == 0
            # The object is dead: release everything it points to, then free it.
            ref.pointers.each do |field|
              delete_reference(field.address)
            end
            free(ref)
          end
        end
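The reference counting pseudocode can be exercised end to end with a small runnable sketch. It assumes each object stores its outgoing references in a `fields` array and records reclaimed objects in a `FREED` list instead of calling a real `free`. `RcObj`, `FREED`, and the `nil` handling are assumptions added for illustration.

```ruby
# Toy reference-counted object (illustrative names, not a real Ruby API).
class RcObj
  attr_reader :name, :fields
  attr_accessor :ref_count

  def initialize(name, field_count)
    @name = name
    @fields = Array.new(field_count)   # outgoing references, initially nil
    @ref_count = 0
  end
end

FREED = []   # records names of reclaimed objects, standing in for free()

def add_reference(ref)
  ref.ref_count += 1 unless ref.nil?
end

def delete_reference(ref)
  return if ref.nil?
  ref.ref_count -= 1
  return unless ref.ref_count.zero?
  # The object is dead: release everything it points to, then free it.
  ref.fields.each { |child| delete_reference(child) }
  FREED << ref.name
end

# Write barrier: every pointer store adjusts the counts of old and new targets.
def write(src, i, ref)
  add_reference(ref)
  delete_reference(src.fields[i])
  src.fields[i] = ref
end

root = RcObj.new("root", 1)
root.ref_count = 1                 # pretend a stack variable holds the root
a = RcObj.new("a", 1)
b = RcObj.new("b", 0)

write(root, 0, a)                  # root -> a
write(a, 0, b)                     # a -> b
write(root, 0, nil)                # dropping a reclaims both a and b
puts FREED.inspect                 # prints ["b", "a"]
```

The write barrier increments the new target before decrementing the old one, so storing a reference an object already holds never drops its count to zero in between.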
  29. Pros And Cons
    • Pro: Reference Counting is incremental. As it works, it frees memory
    • Con: Reference Counting cannot easily collect cycles, or objects on the heap which reference themselves
    • Pro: Mark & Sweep can collect cycles
    • Con: Mark & Sweep can exhibit long pauses and poor locality
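The cycle problem can be shown with a small sketch that models the object graph as a hash of adjacency lists. It compares what plain reference counts say with what tracing from the roots finds. `GRAPH`, `ref_counts`, and `reachable` are hypothetical helpers for illustration.

```ruby
require "set"

# Object graph as adjacency lists: a -> b -> a is a cycle, and nothing else
# points into it once the root set is empty. Names are illustrative.
GRAPH = { a: [:b], b: [:a] }

# Reference counts: how many incoming pointers each object has
# (from roots or from other objects).
def ref_counts(graph, roots)
  counts = Hash.new(0)
  graph.each_key { |obj| counts[obj] = 0 }
  roots.each { |obj| counts[obj] += 1 }
  graph.each_value { |targets| targets.each { |t| counts[t] += 1 } }
  counts
end

# Tracing: the set of objects reachable from the roots.
def reachable(graph, roots)
  seen = Set.new
  worklist = roots.dup
  until worklist.empty?
    obj = worklist.pop
    next unless seen.add?(obj)
    worklist.concat(graph.fetch(obj, []))
  end
  seen
end

roots = []                                  # the mutator has dropped its last pointer
puts ref_counts(GRAPH, roots).inspect       # both counts stay at 1
puts reachable(GRAPH, roots).to_a.inspect   # prints []
```

Both objects are garbage, yet neither count reaches zero, so a pure reference counter never frees them. A tracing collector finds nothing reachable and reclaims both.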
  30. The Unified Theory
    • Tracing and Reference Counting are “duals” of the same operation
    • In optimized form, they are very similar
    • Most successful GCs are hybrid Tracer-Counters
    • Formalized Garbage Collectors with a uniform cost-model
  31. The Unified Theory
    • Subtly tweak Reference Counting by buffering calls to free()
    • Subtly tweak Mark & Sweep by maintaining a true reference count instead of a “live” bit in the mark phase
    • Hybrid Example: Generational GC
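A loose illustration of the first tweak, not Bacon et al.'s formulation: instead of freeing an object the moment its count reaches zero, queue it and reclaim the whole queue later in one batch. `Node`, `DeferredCounter`, and `collect` are hypothetical names, and `freed` stands in for real calls to free().

```ruby
# Reference counting with deferred reclamation: objects whose count reaches
# zero are queued, and the queue is drained later in one batch.
# All names are illustrative assumptions.
class Node
  attr_reader :name, :children
  attr_accessor :ref_count

  def initialize(name, children = [])
    @name = name
    @children = children
    @ref_count = 0
    children.each { |child| child.ref_count += 1 }
  end
end

class DeferredCounter
  attr_reader :freed

  def initialize
    @pending = []   # objects with a zero count, not yet freed
    @freed = []     # names of reclaimed objects, standing in for free()
  end

  # Decrement only; reclamation is postponed.
  def delete_reference(node)
    node.ref_count -= 1
    @pending << node if node.ref_count.zero?
  end

  # Drain the buffer. Freeing a node may push its children onto the buffer,
  # so this loop behaves like a small tracing pass over dead objects.
  def collect
    until @pending.empty?
      node = @pending.shift
      node.children.each { |child| delete_reference(child) }
      @freed << node.name
    end
  end
end

leaf = Node.new("leaf")
parent = Node.new("parent", [leaf])
parent.ref_count = 1                 # one pointer from the mutator

rc = DeferredCounter.new
rc.delete_reference(parent)          # count hits zero, but nothing is freed yet
puts rc.freed.inspect                # prints []
rc.collect
puts rc.freed.inspect                # prints ["parent", "leaf"]
```

Once the frees are buffered, the drain loop walks a graph of dead objects with a worklist, which is structurally close to the mark loop of a tracing collector walking live ones.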
  32. The Unified Theory
    • Design of Garbage Collectors can be made more methodical
    • Three main decisions:
      • Partition
      • Traversal
      • Trade-offs
  33. Exhale. I know that was exciting.

  34. Programming Languages and GC

  35. Programming Languages
    • How are they developed?
    • How are they designed?
    • How do they interact with system memory?
    • What aspects of their design are pertinent to the discussion of GC?
  36. Ruby
    • Dynamic, Multiple Implementations, we all <3 it
    • MRI - Simple Beginnings, Advanced Future
    • Rubinius - Thoroughly Modern
    • JRuby - The Power of the JVM
  37. Java
    • Massive amounts of research into the JVM’s GC
    • Adaptive
    • Tunable
    • Hybrid approach
  38. Haskell
    • Strictly, statically typed
    • Compiler informs GC
    • Design of language makes certain aspects of GC simpler
    • Design of GHC has allowed incremental improvements to the GC
  39. Programming Languages
    • Most great programming languages have worked on their GCs over time
    • The design of the language heavily influences what is possible in GC
  40. “GC is not a generic solution for memory leaks, but a (correct) GC is a generic solution for 'dangling pointers'. Just as there is no general solution for 'loops' (due to undecidability), there is no general solution for 'leaks'.” - Henry Baker
  41. In Conclusion
    • Garbage Collection is a fascinating discipline
    • Deep knowledge of your tools is very helpful
    • If we understand GC better, we can make Ruby better
  42. Works Cited
    • David F. Bacon, Perry Cheng, and V. T. Rajan. A unified theory of garbage collection. In OOPSLA 2004, 2004, pages 50-68.
    • Cooper et al. Teaching Garbage Collection without Implementing Compilers or Interpreters. In SIGCSE 2013, 2013, pages 385-390.
    • Robby Findler. “The Many Faces of Dr. Scheme”. http://www.eecs.northwestern.edu/~robby/logos/
    • Richard Jones, Antony Hosking, and Eliot Moss. The Garbage Collection Handbook: The Art of Automatic Memory Management. CRC Applied Algorithms and Data Structures. Chapman & Hall, August 2012, pages 375-416.
    • Richard Jones. Garbage Collection Bibliography. http://www.cs.kent.ac.uk/people/staff/rej/gcbib/gcbib.html
    • Henry Lieberman and Carl E. Hewitt. A real-time garbage collector based on the lifetimes of objects. AI Memo 569a, MIT, April 1981.
  43. Thanks!
    • Web: michaelrbernste.in
    • Twitter: @mrb_bk
    • GitHub: mrb