Andrew Turley on Incremental Mature Garbage Collection Using the Train Algorithm

Automatic garbage collection has spared programmers from an entire class of programming errors, such as memory leaks and attempts to access objects that were incorrectly freed. As programs have grown in size and complexity, so have the systems that manage garbage collection. Each algorithm makes a different set of tradeoffs between factors such as the space used by objects, the space used by bookkeeping, the number of unused objects (garbage) that remain uncollected, the time spent in allocation, and the time spent in collection.

The Train Algorithm is an incremental generational garbage collector that was designed to deal with the long and unpredictable pause times caused by other algorithms. It does this by grouping objects together on "cars" in "trains". The algorithm provides a strategy for moving objects from the younger generation into different cars, moving objects from one car to another, and collecting cars and trains. It was first described by Hudson and Moss in the paper "Incremental Collection of Mature Objects".

In their paper "Incremental Mature Garbage Collection Using the Train Algorithm," Seligmann and Grarup describe what they believe is the first implementation of the Train Algorithm, which they used to replace the traditional mark-and-sweep collector that was used in the BETA programming language. They discuss their implementation and present information about the performance characteristics of the new algorithm as compared to the original.

Papers_We_Love

February 23, 2015

Transcript

  1. About Me
     Andrew Turley
     [email protected]
     @casio_juarez
     Junior Software Engineer at TheLadders (we’re hiring!)
  2. About The Paper
     Incremental Mature Garbage Collection Using the Train Algorithm
     Jacob Seligmann and Steffen Grarup
     Presented at the Ninth European Conference on Object-Oriented Programming in 1995
  3. Overview
     Incremental Mature Garbage Collection Using the Train Algorithm
     • Garbage Collection
     • Mature
     • Incremental
     • The Train Algorithm
  4. Garbage Collection - What Is Garbage?
     // a has a member p
     a = new ExcellentClass();

     // create an object and assign it to a.p
     a.p = new VeryExcellentClass();

     // create another object and assign it to a.p;
     // nothing references the first VeryExcellentClass object any more, so it is garbage
     a.p = new VeryExcellentClass();
  5. Garbage Collection - What Is Collection?
     • Make memory used by dead objects available for existing or new objects
  6. Garbage Collection - Manual vs Automatic
     • Manual – the programmer explicitly tells the system when an object is no longer being used
     • Automatic – the system automatically determines when an object is no longer being used
       – “no longer being used” == unreachable from roots
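
The "unreachable from roots" definition can be sketched as a graph walk. This is only an illustrative model, not code from the talk or the paper: the heap is a hypothetical dictionary mapping object names to the names they reference, and `reachable` is an assumed helper name.

```python
def reachable(roots, references):
    """Return the set of object names reachable from the roots."""
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        # Follow every reference held by this object.
        stack.extend(references.get(obj, ()))
    return seen

# Hypothetical heap mirroring slide 4: a.p was reassigned, so the first
# VeryExcellentClass instance is no longer referenced by anything.
heap = {
    "a": ["second_p"],
    "first_p": [],
    "second_p": [],
}
live = reachable(["a"], heap)
garbage = set(heap) - live
```

Everything left in `garbage` is what an automatic collector would be allowed to reclaim.
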
  7. Garbage Collection - Why Bother?
     • Might eventually run out of memory
     • Put related objects closer together in memory
  8. Garbage Collection - Two Main Types
     • Counting – keep track of how many other objects reference an object, if the number of references reaches 0 then collect the object
     • Tracing – when you run out of memory, find all the reachable objects, collect all the other (unreachable) objects
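
A minimal sketch of the counting approach, assuming a toy heap where objects are just names; `CountingHeap` and its methods are hypothetical and exist only for this illustration.

```python
class CountingHeap:
    def __init__(self):
        self.count = {}     # object -> number of references to it
        self.children = {}  # object -> objects it references
        self.freed = []     # order in which objects were collected

    def new(self, name):
        self.count[name] = 0
        self.children[name] = []

    def retain(self, name):
        self.count[name] += 1

    def link(self, src, dst):
        """Store a reference to dst inside src."""
        self.children[src].append(dst)
        self.retain(dst)

    def release(self, name):
        self.count[name] -= 1
        if self.count[name] == 0:
            # Collect immediately and release everything this object referenced.
            for child in self.children.pop(name):
                self.release(child)
            del self.count[name]
            self.freed.append(name)

heap = CountingHeap()
heap.new("a")
heap.retain("a")        # a root (for example a variable) holds "a"
heap.new("p")
heap.link("a", "p")     # a.p = p
heap.release("a")       # the root goes away, so "a" and then "p" hit zero
```

The tracing approach differs in that nothing happens on `release`; dead objects are only discovered later by walking from the roots.
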
  9. Garbage Collection - When Do You Collect?
     Generally (naively) …
     • Counting -- Collect as soon as an object becomes unreachable (count = 0)
     • Tracing -- Collect when there's no more memory left
  10. Garbage Collection - Compacting
     • Once you get rid of dead objects, move live objects closer together
     • Can improve locality
     • There’s a cost
  11. Garbage Collection - Considerations
     • Throughput -- time outside of gc / total running time
     • Pause Times -- time the application is paused for collection
     • Memory Overhead -- memory used to track objects for garbage collection
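
As a small worked example of the first two considerations, the sketch below computes throughput and the longest pause for a made-up run. The numbers are hypothetical, and it assumes all GC time shows up as pauses (a stop-the-world collector).

```python
def throughput(total_seconds, pauses):
    """Fraction of the run spent outside garbage collection."""
    return (total_seconds - sum(pauses)) / total_seconds

# Hypothetical run: 10 seconds of wall-clock time with three collection pauses.
pauses = [0.05, 0.20, 0.05]
run_throughput = throughput(10.0, pauses)   # roughly 0.97
longest_pause = max(pauses)                 # the figure an interactive user notices
```

Two collectors can have the same throughput and very different longest pauses, which is why the slide lists them separately.
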
  12. Mature - Separate Generations of Objects
     • Allocate objects in one part of the heap, promote them to another part if they live long enough
  13. Mature - Don't Need To Collect All Generations At Once
     • By keeping the younger generation smaller, there are fewer objects to trace, so collecting this generation is fast
  14. Mature - Different Algorithms For Different Generations
     • Younger generations are small and unstable
     • Older generations are larger and stable
     • Make different time and space tradeoffs
  15. Mature - Remembered Sets
     • Record references between generations, use old-to-young as roots when tracing
     • Collector doesn’t need to scan Old when collecting Young
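
The remembered-set idea can be sketched with a write barrier that notes old-to-young references as they are created. This is an assumed toy model, not the BETA implementation: `GenerationalHeap` and its method names are hypothetical.

```python
class GenerationalHeap:
    def __init__(self):
        self.young = set()
        self.old = set()
        self.fields = {}         # object -> list of referenced objects
        self.remembered = set()  # old objects that point into the young generation

    def allocate(self, name):
        self.young.add(name)
        self.fields[name] = []

    def promote(self, name):
        self.young.discard(name)
        self.old.add(name)
        # References it already holds to young objects are now old-to-young.
        if any(t in self.young for t in self.fields[name]):
            self.remembered.add(name)

    def write(self, src, dst):
        """Write barrier: record old-to-young references as they are created."""
        self.fields[src].append(dst)
        if src in self.old and dst in self.young:
            self.remembered.add(src)

    def collect_young(self, roots):
        """Trace only young objects, using roots plus the remembered set."""
        stack = [r for r in roots if r in self.young]
        for src in self.remembered:
            stack.extend(t for t in self.fields[src] if t in self.young)
        live = set()
        while stack:
            obj = stack.pop()
            if obj in live:
                continue
            live.add(obj)
            stack.extend(t for t in self.fields[obj] if t in self.young)
        dead = self.young - live
        for obj in dead:
            del self.fields[obj]
        self.young = live
        return dead

heap = GenerationalHeap()
for name in ("cache", "entry", "temp"):
    heap.allocate(name)
heap.promote("cache")
heap.write("cache", "entry")          # old-to-young store, caught by the barrier
dead = heap.collect_young(roots=["cache"])
```

Note that `collect_young` never walks the old generation; the old object `cache` contributes to the trace only through its remembered-set entry.
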
  16. Mature - You Still Pay A Price
     • If the old generation is large, collecting it can take ... a very long time
  17. Incremental - Don’t Do Everything At Once
     • Only collect a subset of the heap at a time
     • Reduces pause times
  18. Incremental - Picking Subsets
     • Collect related objects at the same time (Baker, 1977)
     • How do we figure out which objects are "related"?
  19. Train Algorithm - A Real Implementation
     Jacob Seligmann and Steffen Grarup
     Goal: non-disruptive GC for interactive systems
  20. Train Algorithm - Target Platform
     • Mjølner (me-'ol-near) BETA System
     • Basically an IDE written in and supporting the BETA programming language
     • Sun SPARC IPX 4/50, 40 MHz, 32 MB RAM
  21. Train Algorithm - Structure Of The Heap
     • Young Generation (Infant Object Area) -- semi-space copying collector
     • Old Generation (Mature Object Area) -- the Train Algorithm (previously used mark-sweep)
     • Large Value Area -- free lists, periodic compaction
  22. Train Algorithm - Structure Of The Mature Space
     • Collections of fixed-size areas into which objects are copied
     • fixed-size area = car
     • collection of cars = train
     • Cars are ordered in trains, trains are ordered by creation time
  23. Train Algorithm - Tenuring Strategy
     "Objects promoted from younger generations may be stored in any train except the one currently being collected, or one or more new trains may be created to hold them."
  24. Train Algorithm - Evacuation Strategy
     • How are live objects moved out of the car being collected?
     • Evacuate to the last car of the first train seen
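
The car and train structure from slide 22 and the evacuation rule above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the Seligmann and Grarup implementation: `MatureSpace`, `CAR_SIZE`, and the choice to send root-referenced objects to the youngest other train are all hypothetical, and the check that reclaims a whole train at once is left out.

```python
CAR_SIZE = 3  # hypothetical fixed car capacity, counted in objects

class MatureSpace:
    def __init__(self):
        self.trains = []  # oldest train first; each train is a list of cars
        self.refs = {}    # object -> list of objects it references

    def add_to_train(self, train, obj):
        """Place obj in the last car of train, adding a car if it is full."""
        if not train or len(train[-1]) >= CAR_SIZE:
            train.append([])
        train[-1].append(obj)

    def collect_first_car(self, roots):
        """Collect the first car of the oldest train; return reclaimed objects."""
        train = self.trains[0]
        car = train.pop(0)
        if not train:              # that was the train's only car
            self.trains.pop(0)
        remaining = set(car)       # objects not yet evacuated from the car

        def evacuate(obj, dest):
            # Move obj, and whatever it reaches inside this car, to dest.
            stack = [obj]
            while stack:
                o = stack.pop()
                if o in remaining:
                    remaining.discard(o)
                    self.add_to_train(dest, o)
                    stack.extend(self.refs.get(o, ()))

        # Snapshot the referrers before moving anything: other trains
        # first, then later cars of the train being collected.
        others = [t for t in self.trains if t is not train]
        order = others + ([train] if train else [])
        sources = [(t, o) for t in order for c in t for o in c]

        # Assumption: root-referenced objects go to the youngest other train.
        for root in roots:
            if root in remaining:
                if not others:
                    others.append([])
                    self.trains.append(others[-1])
                evacuate(root, others[-1])

        # Each object follows the first referring train seen.
        for dest, src in sources:
            for target in self.refs.get(src, ()):
                evacuate(target, dest)

        for obj in remaining:      # whatever is left in the car is garbage
            self.refs.pop(obj, None)
        return sorted(remaining)

space = MatureSpace()
space.trains = [[["a", "b", "g"], ["c"]], [["d"]]]
space.refs = {"a": [], "b": [], "g": [], "c": ["b"], "d": ["a"]}
reclaimed = space.collect_first_car(roots=["c", "d"])
```

In this run `a` joins the train of `d`, which refers to it, `b` moves to the end of its own train because `c` refers to it, and `g` is reclaimed along with the car.
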
  25. Train Algorithm - Futile Collections
     • A “futile collection” is a collection that does not result in an evacuation, nor does it reclaim objects
  26. Train Algorithm - Why It Works
     • Groups of connected objects end up in the same train
     • Live objects are moved to later trains, dead objects stay put and eventually their train is collected
     • “It’s like the train is headed for a cliff, and live objects get pulled into other cars or other trains by their friends.” – Franco Barbeite
  27. Train Algorithm - Remembered Sets
     • Cars have remembered sets that contain information about all references residing outside the car pointing into it
       – Only need to remember references from higher-numbered to lower-numbered cars
     • Cars have remembered sets that contain references to the young generation
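
The "higher-numbered to lower-numbered" rule can be sketched as a filter inside the write barrier. The sketch assumes cars are identified by a (train number, car number) pair and collected in increasing order; `CarRememberedSets` and its methods are hypothetical names.

```python
from collections import defaultdict

class CarRememberedSets:
    def __init__(self):
        self.position = {}                  # object -> (train number, car number)
        self.remembered = defaultdict(set)  # car -> outside objects pointing into it

    def place(self, obj, train, car):
        self.position[obj] = (train, car)

    def record_write(self, src, dst):
        """Write barrier for storing a reference to dst in a field of src."""
        src_car = self.position[src]
        dst_car = self.position[dst]
        # Cars are collected in increasing order, so a lower-numbered source
        # car is collected before dst's car and needs no entry.
        if src_car > dst_car:
            self.remembered[dst_car].add(src)

sets = CarRememberedSets()
sets.place("x", train=1, car=1)
sets.place("y", train=1, car=2)
sets.place("z", train=2, car=1)
sets.record_write("y", "x")  # later car to earlier car: remembered
sets.record_write("x", "y")  # earlier car to later car: skipped
sets.record_write("z", "x")  # later train to earlier train: remembered
```

Tuple comparison orders cars by train first and car second, which matches the collection order assumed here.
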
  28. Train Algorithm - Invocation Frequency
     • Trade off between keeping the mature space relatively garbage-free and not wasting too much time moving live objects
     • "Acceptable Garbage Percentage" and upper limit on number of young generation collections between old generation collections
     • Garbage Ratio is estimated after each train collection by keeping track of the size of data directly tenured in each train (see thesis)
  29. Train Algorithm - Popular Objects
     • All Mjølner BETA programs have an environment object which is referenced by lots of other objects
     • Store this object in a special car with no remembered set, when this car comes up for collection move it to a new train
     • Cheating! (not really general purpose)
  30. Train Algorithm - Results
     • We won!
     • Reduced max and average pause times (3.21s vs 0.12s and 1.71s vs 0.04s in the editor)
     • Minimal impact on GC overhead (pointer bookkeeping, etc)
     • 20% overall increase in old generation collection time
     • Small increase in storage requirements (larger remembered sets, train accounting)
     • Slight increase in garbage overhead (uncollected garbage)
     • Popular object treatment was a really good idea (30% worse without)
     • Improved locality
  31. Train Algorithm - And Then What?
     • Was added to version 1.3.1 of the HotSpot JVM in 1998
       – Lots of work by Alex Garthwaite
     • Was later deprecated because of issues with cycle detection and high overhead with respect to throughput
     • The ideas live on in the G1 (garbage first) collector
  32. Why I Love This Paper
     • Trains!
     • Engineering over theory
     • The Train Algorithm is a good illustration of how fiendishly difficult garbage collection can be