Teaching Garbage Collection

csaunders
February 04, 2015

Papers We Love presentation on "Teaching Garbage Collection without Implementing Compilers or Interpreters"

You can find the paper here: http://cs.brown.edu/~sk/Publications/Papers/Published/cgkmf-teach-gc/

Transcript

  1. Teaching Garbage Collection

    Hello, I’d like to start off by thanking everyone for coming out. I’m going to be covering the paper “Teaching Garbage Collection without Implementing Compilers or Interpreters” by Cooper et al.
  2. Fundamentals / What is Garbage Collection? / Obstacles to Learning / Tools and Demonstration / Reflection and Closing Remarks

    - Before we can dig into the paper, we’ll need to cover a couple of fundamentals.
    - I’ll then cover the broad idea of garbage collection.
    - We’ll then step into the subjects that Cooper and his team found to be obstacles to building collectors.
    - Afterwards I’ll cover some of the ways the tools they built aid the learning process, including a demonstration of a very simple mark-and-sweep garbage collector that I built.
    - We’ll finish with some remarks about the collector, how it might fall short in some situations, and what we could do to improve it.
  3. Memory?

    First off, before we can talk about collectors, let’s clarify what memory is. In a computer, memory is where the data we are working with lives. I am going to use a simplified model of memory to make explaining, and hopefully understanding, a bit easier. There are two ways we can access memory: the stack and the heap.
  4. Garbage?

    When we talk about garbage in the context of computers, what do we mean? When our programs run we create things: objects, lists, arrays, you name it. Typically we create them and hold onto them for the duration of a computation. When we are done with our computation, we return the result and stop looking at the things we created. Some languages make you responsible for destroying that data when you are done with it, though others, such as Ruby and Lisp, take care of these details for us. Until the language does something about this lost data, it stays somewhere in our program’s memory; we just can’t find it anymore.
  5. The Stack

    We can think of the stack as a Pez dispenser. Whenever we want to add data, we take our little data Pez candies and push them onto the stack. The stack is memory that is allocated directly: if we are working with amounts of data whose size is known ahead of time, we can easily place that data on the stack. When we are done with our data, we remove it from the stack by popping it off. The stack is super important for garbage collection because it holds the references to the data our program is still using. Without the stack we wouldn’t be able to find garbage.
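To make the Pez-dispenser picture concrete, here is a minimal Python sketch of a fixed-size stack with an explicit stack pointer. It is a toy model: the `Stack` class and its `sp` field are hypothetical names, and a real call stack is managed by the compiler or runtime rather than by a class like this.

```python
class Stack:
    """Toy LIFO stack with an explicit stack pointer (sp)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # fixed block of stack memory
        self.sp = 0                     # index of the next free slot

    def push(self, value):
        if self.sp == len(self.slots):
            raise OverflowError("stack overflow")
        self.slots[self.sp] = value
        self.sp += 1

    def pop(self):
        if self.sp == 0:
            raise IndexError("pop from empty stack")
        self.sp -= 1
        value = self.slots[self.sp]
        self.slots[self.sp] = None  # slot is now outside the live region
        return value

    def live(self):
        """Everything below the stack pointer is live."""
        return self.slots[:self.sp]


stack = Stack(capacity=4)
stack.push("a")
stack.push("b")
print(stack.pop())   # b  (last in, first out)
print(stack.live())  # ['a']
```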
  6. The Heap

    Though what do we use if we don’t know how big something will be at compile time? This is where the heap comes in. The heap is kind of like a whiteboard: when we need to store something whose size we don’t know at compile time, we try to grab some space from the heap and take note of where it is. Initially the whiteboard is completely blank, but over time it becomes more and more cluttered as we allocate things onto it. It’s the garbage collector’s job to go through the heap and remove unused data.
  7. Allocation?

    In order to use the heap we need to allocate some space on it. We keep track of where our data lives on the heap by storing little notes (references) on the stack.
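The allocate-then-take-a-note pattern can be sketched in Python as follows. This assumes a deliberately simplified heap of single-value slots with first-fit allocation; `Heap` and `allocate` are hypothetical names, and a plain Python list plays the role of the stack holding heap addresses.

```python
class Heap:
    """Toy heap: a fixed number of single-value slots."""

    def __init__(self, size):
        self.slots = [None] * size
        self.used = [False] * size

    def allocate(self, value):
        """Place value in the first free slot and return its address, or None if full."""
        for address, in_use in enumerate(self.used):
            if not in_use:
                self.used[address] = True
                self.slots[address] = value
                return address
        return None  # out of free slots


heap = Heap(size=4)
stack = []  # the stack holds our notes: heap addresses

stack.append(heap.allocate("hello"))
stack.append(heap.allocate([1, 2, 3]))
print(stack)                  # [0, 1]
print(heap.slots[stack[-1]])  # [1, 2, 3]
```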
  8. [Diagram: heap cells holding 8, a, 9 and c above a stack]

    So here we have an example of what could be going on. The top is our heap and the bottom area is our stack. Our stack grows from left to right, and the little arrow points to the end of our stack; this is called the stack pointer. Everything up to the stack pointer is considered live, and these live entries are our roots. In this heap we have some extra data that nobody is using. In some languages it is our responsibility to ensure that never happens; if it does, we have a problem known as a memory leak!
  9. Garbage Collector

    We have come up with an approach to this problem, called garbage collection. That is, we build a program that keeps note of which items are still important and which ones nobody cares about anymore. Data that nobody cares about is garbage, and the program goes in and removes it. This clears up the heap so that we can use that space again for something else.
  10. [Diagram: the same heap and stack, with marked cells shown by checkmarks]

    There are many approaches to garbage collection. We’ll use an approach called mark and sweep in this example. First we go through all of the objects on our heap and see which bits of data have a reference on the stack. If they do, we mark them (signified by the checkmark).
  11. [Diagram: the same heap and stack]

    Then we go through the heap again, and if a chunk of memory hasn’t been marked, we know that it is garbage and free it.
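The two passes from the last two slides fit in a few lines of Python. This sketch assumes the simplest possible model, where heap objects never point at each other, so every live object is referenced directly from the stack. The heap is a dict from address to value, the contents are a hypothetical layout reusing the values from the diagram, and `mark_and_sweep` is a hypothetical helper, not code from the paper.

```python
def mark_and_sweep(heap, roots):
    """Simplest mark and sweep: heap objects are only referenced from the stack.

    heap  -- dict mapping address -> value (modified in place)
    roots -- iterable of addresses held on the stack
    Returns the list of freed addresses.
    """
    # Mark: check off every heap object that the stack refers to.
    marked = {address for address in roots if address in heap}

    # Sweep: walk the heap; anything left unmarked is garbage.
    freed = [address for address in list(heap) if address not in marked]
    for address in freed:
        del heap[address]
    return freed


heap = {0: 8, 1: "a", 2: 9, 3: "c"}
print(mark_and_sweep(heap, roots=[0, 2]))  # [1, 3]
print(heap)                                # {0: 8, 2: 9}
```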
  12. Theory

    So we can easily cover the theory behind what garbage collection is and what its parts are. Though, knowing something in theory and understanding it well enough to implement it are entirely different things.
  13. Parsers

    Often, in order to build a language you need a strong understanding of parsers. A parser is required to build what’s called a syntax tree, and parsing also validates that the program’s text is syntactically correct. If it isn’t, the program cannot run.
  14. Compilers or Interpreters

    Just parsing a program is one part of building a language. You also need to know how to turn that syntax tree into something the computer can actually run. Whether your language is interpreted or compiled, what’s important here is that this is the part responsible for managing the data that goes onto the stack and the heap.
  15. Cooper et al.

    “Implementing a GC is taxing because it requires low-level concepts … and most real-world runtime systems … are not designed to enable easy modification”

    We could take a look at a garbage collector like the one in the Java Virtual Machine, or perhaps Ruby’s, but most systems aren’t built to make their collectors easy to change. So students are stuck with very few friendly options.
  16. Teaching with Toy Languages

    Educators don’t want these prerequisites to get in the way of their teaching. They’ve come up with solutions that do help their students implement a garbage collector, though these aren’t always perfect. Often the existing tools for this subject are quite limited in scope and lack the completeness that would allow students to uncover flaws in their collector implementations.
  17. Breaking Down the Barriers

    The professors in the paper present a tool they’ve built that removes many of these obstacles and even provides better ways to inspect the heap. They’ve done this by building two systems: plai/collector and plai/mutator.
  18. Mutator?

    Mutators are your programs. In order to perform computations they need to mutate the heap, which is done through a collector that takes care of giving these programs the memory they require.
  19. Here is a basic program that takes in a list and builds a sum of all the items in the list. This program is our mutator because operations such as declaring the values 1, 2 and 3, and the list that contains those numbers, all result in mutations to our heap.
  20. Collector?

    Alright, so we’ve been talking about some data structures and what a mutator is, but what about the actual garbage collection side of the system they’ve built?
  21. There are a number of requirements that need to be met before our program qualifies as a collector. The functions shown on this slide are required to meet the plai/collector interface. Our collector program can do whatever it wants, but unless it implements these functions it doesn’t qualify as a plai/collector. The collector also needs to work on two different types of data: cons cells and flat values.
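As a rough analogue, the sketch below expresses the collector contract as a Python abstract base class. The method names are my own Python translations of the plai/collector operations as I understand them (`init-allocator`, `gc:alloc-flat`, `gc:cons`, `gc:first`, `gc:rest`, `gc:set-first!`, `gc:set-rest!`, `gc:deref`, `gc:flat?`, `gc:cons?`); check the Racket documentation for the authoritative list and signatures. The point it illustrates is the one above: a class that leaves any operation unimplemented cannot be instantiated, so it does not qualify as a collector.

```python
from abc import ABC, abstractmethod


class Collector(ABC):
    """Python analogue of the plai/collector interface (names are translations)."""

    @abstractmethod
    def init_allocator(self): ...                    # init-allocator

    @abstractmethod
    def alloc_flat(self, value): ...                 # gc:alloc-flat -> address

    @abstractmethod
    def cons(self, first_addr, rest_addr): ...       # gc:cons -> address

    @abstractmethod
    def first(self, cons_addr): ...                  # gc:first -> address

    @abstractmethod
    def rest(self, cons_addr): ...                   # gc:rest -> address

    @abstractmethod
    def set_first(self, cons_addr, first_addr): ...  # gc:set-first!

    @abstractmethod
    def set_rest(self, cons_addr, rest_addr): ...    # gc:set-rest!

    @abstractmethod
    def deref(self, flat_addr): ...                  # gc:deref -> the flat value

    @abstractmethod
    def is_flat(self, addr): ...                     # gc:flat?

    @abstractmethod
    def is_cons(self, addr): ...                     # gc:cons?


class HalfACollector(Collector):
    """Implements only some operations, so it does not qualify."""

    def init_allocator(self):
        pass

    def alloc_flat(self, value):
        return 0


try:
    HalfACollector()
except TypeError as err:
    print("not a collector:", err)
```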
  22. Data Types

    Our collectors are going to be working with two basic data types. These are the kinds of data that our programs (the mutators) will be able to work with.
  23. Cons Cells (a refresher): first (aka head, car), rest (aka tail, cdr)

    A cons cell is the basic data structure of Lisps. It consists of two pointers: one to a value and another that points to another cons cell or to nil (the empty list). Using this data structure we can represent lists of arbitrary size.
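A cons cell maps naturally onto a two-field record. This Python sketch (the `Cons`, `make_list` and `sum_list` names are hypothetical, and `None` stands in for nil) builds the list (1 2 3) and sums it, roughly in the spirit of the mutator from slide 19; it is not the Racket program shown in the talk.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Cons:
    first: Any              # head / car: the value
    rest: Optional["Cons"]  # tail / cdr: next cell, or None for the empty list


def make_list(*values):
    """Build a cons list back to front, like (list 1 2 3)."""
    result = None
    for value in reversed(values):
        result = Cons(value, result)
    return result


def sum_list(cell):
    """Walk the chain of cons cells, adding up each head."""
    total = 0
    while cell is not None:
        total += cell.first
        cell = cell.rest
    return total


numbers = make_list(1, 2, 3)
print(numbers)            # Cons(first=1, rest=Cons(first=2, rest=Cons(first=3, rest=None)))
print(sum_list(numbers))  # 6
```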
  24. Flat Values

    With a basic Lisp, the other type of data we can have is flat values. These encompass things such as symbols, numbers, procedures and so on. Our cons cells hold references to data, and flat values are the other kind of data those references can point to.
  25. Cooper et al. / Heap Visualization

    “What exactly is at location 14? … While such activity builds character, it may not fit the constraints of some teaching schedules”

    A more traditional approach to garbage collection requires working with raw heap memory. That memory is often pretty hard to read and requires dumping the heap’s contents.
  26. Here we can see what the execution of a program looks like. We have a large amount of heap space for this program, so in this case the garbage collector never actually had to run.
  27. Simple Programs Can Be Done on Paper

    Because we are working with really simple examples in Lisp, it’s easy to figure out how the heap mutates over time. We can write down a heap on paper and step through the evaluation of our functions to see how data gets stored on the heap. This helps explain the visualization and makes sense of how to implement our own collectors. Here is what I did to step through a really simple sum-list program using an example collector.
  28. '(3)

    Here is an example of how our cons cell behaves in our garbage collector’s heap. There is a cons cell at memory address 10, which contains two values. Because we are working with a cons cell, these values aren’t data themselves but references to other data, such as flat values or other cons cells. Here the head of the cell refers to address 6 and the tail to address 8. Resolving these references gives us a list of one element.
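Stepping through a heap on paper can also be scripted. The sketch below assumes a hypothetical cell layout, not necessarily the one the example collector uses: a flat value occupies two cells (a tag and the value) and a cons cell occupies three (a tag and two addresses). With that layout, a flat 3 at address 6, the empty list at address 8 and a cons cell at address 10 reproduce the picture on this slide. `to_python` is a hypothetical helper.

```python
# Hypothetical cell layout: a flat value takes two cells ("flat", value)
# and a cons cell takes three ("cons", first-address, rest-address).
heap = [None] * 16
heap[6:8] = ["flat", 3]        # the number 3
heap[8:10] = ["flat", ()]      # the empty list, stored as a flat value
heap[10:13] = ["cons", 6, 8]   # cons cell: head -> 6, tail -> 8


def to_python(heap, address):
    """Follow references from address and rebuild the value it denotes."""
    tag = heap[address]
    if tag == "flat":
        return heap[address + 1]
    if tag == "cons":
        head = to_python(heap, heap[address + 1])
        tail = to_python(heap, heap[address + 2])
        return [head] + list(tail)
    raise ValueError(f"no object starts at address {address}")


print(to_python(heap, 10))  # [3]
```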
  29. Managing our Memory

    The best way to get started with the collection tools in Racket is to simply grab the example collector and work from there. This gives us a baseline: the interface is already implemented, and we are free to tweak it to make it better. The example collector has a lot of issues, the main one being that it never frees up memory. That’s great, because it gives us an immediate goal: make our programs live at least a little bit longer by cleaning up all the trash they create.
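I don’t reproduce the Racket example collector here; instead, this hypothetical Python bump allocator shares its key flaw. It hands out consecutive cells and never reclaims any, so even a program that drops every value right away eventually runs out of memory. `NeverFreeAllocator` is a name invented for this sketch.

```python
class NeverFreeAllocator:
    """Bump allocator in the spirit of a starter collector: it never reclaims memory."""

    def __init__(self, heap_size):
        self.heap = [None] * heap_size
        self.next_free = 0  # everything below this index has been handed out

    def alloc(self, n_cells):
        """Reserve n_cells consecutive cells; raise once the heap is used up."""
        if self.next_free + n_cells > len(self.heap):
            raise MemoryError("out of memory")
        address = self.next_free
        self.next_free += n_cells
        return address


allocator = NeverFreeAllocator(heap_size=10)
allocations = 0
try:
    while True:
        allocator.alloc(2)  # e.g. a two-cell flat value that is immediately dropped
        allocations += 1
except MemoryError:
    print(f"ran out after {allocations} allocations")  # ran out after 5 allocations
```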
  30. Mark and sweep is triggered when we try to obtain some memory and our allocator responds with a nil reference. This means no memory could be obtained: we are out of free slots. We use this to trigger our mark_and_sweep. After we’ve finished cleaning up memory, we try to allocate the same amount of memory again. If that fails too, we are really out of memory and throw an exception. Otherwise all is well and we return the reference.
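The allocate, collect, retry flow can be sketched as a small Python function. The callbacks `find_free_block` and `sweep_unreachable` are hypothetical stand-ins, and the set of reachable cells is hard-coded rather than computed by a real mark phase; `None` plays the role of the nil reference, and `OutOfMemory` is the exception raised when even a collection does not help.

```python
class OutOfMemory(Exception):
    pass


def allocate(find_free_block, mark_and_sweep, n_cells):
    """Try to allocate; on failure collect once and retry before giving up.

    find_free_block(n) -> address or None  (None plays the role of nil)
    mark_and_sweep()   -> reclaims unreachable cells
    """
    address = find_free_block(n_cells)
    if address is None:
        mark_and_sweep()  # free up garbage, then try again
        address = find_free_block(n_cells)
        if address is None:
            raise OutOfMemory(f"no room for {n_cells} cells even after collecting")
    return address


# Tiny demonstration: a full 4-cell heap in which cells 2 and 3 are garbage.
in_use = [True, True, True, True]
reachable = {0, 1}  # hard-coded stand-in for the result of a mark phase


def find_free_block(n):
    """First-fit search for n consecutive free cells."""
    for start in range(len(in_use) - n + 1):
        if not any(in_use[start:start + n]):
            for i in range(start, start + n):
                in_use[i] = True
            return start
    return None


def sweep_unreachable():
    for i in range(len(in_use)):
        if i not in reachable:
            in_use[i] = False


print(allocate(find_free_block, sweep_unreachable, 2))  # 2
```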
  31. Mark and Sweep (legend: Unreferenced Cells (Garbage), Live Cells with Unchecked References, Checked Live Cells)

    Mark and sweep is one of the oldest (if not the oldest) techniques for crawling through our heap to figure out which pieces of data are garbage and which ones aren’t. The first step is finding out what isn’t garbage. We can approach this problem using what is known as the “tricolour abstraction”; that is, we can think of each heap object as being in one of three states:
    - Possibly dead objects that have yet to be traversed. Cells still in this state after the mark phase completes aren’t referenced and can be freed. (Red rectangles)
    - Live objects whose descendants have yet to be traversed. (Purple diamonds)
    - Live objects whose descendants have been traversed. (Green circles)

    On the left is our stack. This is important because it is where our traversal through the heap starts.
  32. Mark and Sweep

    When we have determined that a mark phase needs to happen, we start off by assuming all of our objects are unreferenced.
  33. Mark and Sweep

    The stack holds our roots: pointers into the heap that the program is still using. So we go through our stack, look at what those stack values point to, and mark them as items we need to look at. We then go through the objects that we need to check, adding anything they reference to the list of items we need to check.
  34. Mark and Sweep

    Once we’ve added everything a heap object references, we mark that object as live. We know this data is still being referenced, so it won’t be swept away.
  35. Mark and Sweep

    Now that we are done marking, we look at all the items that haven’t been marked and free up that data. This is the “sweeping” of mark and sweep.
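The whole mark-and-sweep walk-through can be condensed into a Python sketch of the tricolour abstraction. The slides’ red, purple and green states correspond to the `white`, `grey` and `black` sets (the conventional names in the literature), and the `grey` list is the worklist of objects still to check. The `Flat` and `ConsCell` records and the heap contents are hypothetical, chosen so that one small structure is unreachable from the stack.

```python
from dataclasses import dataclass


@dataclass
class Flat:
    value: object


@dataclass
class ConsCell:
    first: int  # heap address of the head
    rest: int   # heap address of the tail


def references(obj):
    """Heap addresses an object points to (flat values point to nothing)."""
    return [obj.first, obj.rest] if isinstance(obj, ConsCell) else []


def mark_and_sweep(heap, roots):
    """Tricolour mark phase followed by a sweep; returns freed addresses."""
    white = set(heap)  # possibly dead: not yet reached
    grey = []          # live, but children not yet scanned (the worklist)
    black = set()      # live, children scanned

    for address in roots:  # roots come from the stack
        if address in white:
            white.remove(address)
            grey.append(address)

    while grey:
        address = grey.pop()
        for child in references(heap[address]):
            if child in white:  # first time we have reached this object
                white.remove(child)
                grey.append(child)
        black.add(address)

    # Sweep: whatever is still white was never reached, so it is garbage.
    for address in white:
        del heap[address]
    return sorted(white)


heap = {
    0: Flat(1), 1: Flat(2), 2: Flat(()),
    3: ConsCell(1, 2),   # '(2)
    4: ConsCell(0, 3),   # '(1 2)
    5: Flat("orphan"),   # nothing reachable points here
    6: ConsCell(5, 2),   # a list holding "orphan", unreachable from the stack
}
print(mark_and_sweep(heap, roots=[4]))  # [5, 6]
print(sorted(heap))                     # [0, 1, 2, 3, 4]
```

Because marking follows references from the roots instead of counting them, an unreachable cycle of cons cells is swept just like any other garbage.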
  36. Demonstration

        /Applications/Racket\ v6.0.1/DrRacket.app/Contents/MacOS/DrRacket my_simple_collector.rkt newtons_method_iter.rkt &

    Here I’m going to demonstrate a program that induces garbage collection, and we’ll step through the evolution of our heap during the program’s execution.
  37. Implementation Concerns

    While we might have built a garbage collector, there are still plenty of edge cases that can come up if we run certain kinds of programs. What happens if a program makes some odd allocations that leave lots of fragmented memory? We could have plenty of free space in total, but because no single free slot is large enough, we have effectively run out of memory. We could add memory compaction to our garbage collector, which would squish all the currently used memory together and, as a result, give us those chunks of lost memory back.
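A sliding compactor can be sketched in Python as follows. This is a simplified model under stated assumptions: blocks are recorded as `start -> (size, payload)`, the sweep has already removed garbage so every block is live, and objects do not point at each other, so only the roots need updating through the forwarding table. `compact` and `largest_free_run` are hypothetical helpers.

```python
def largest_free_run(blocks, heap_size):
    """Length of the biggest gap between allocated blocks."""
    best, cursor = 0, 0
    for start in sorted(blocks):
        best = max(best, start - cursor)
        cursor = start + blocks[start][0]
    return max(best, heap_size - cursor)


def compact(blocks, roots):
    """Slide every live block down to the lowest free address.

    blocks -- dict: start address -> (size, payload)
    roots  -- list of addresses held on the stack
    Returns (new_blocks, new_roots).
    """
    forwarding = {}  # old address -> new address
    new_blocks = {}
    cursor = 0
    for start in sorted(blocks):
        size, payload = blocks[start]
        forwarding[start] = cursor
        new_blocks[cursor] = (size, payload)
        cursor += size
    new_roots = [forwarding[address] for address in roots]
    return new_blocks, new_roots


# A 12-cell heap with 6 free cells, but split into three gaps of 2 cells.
blocks = {0: (2, "a"), 4: (2, "b"), 8: (2, "c")}
roots = [8, 4, 0]
print(largest_free_run(blocks, heap_size=12))  # 2

blocks, roots = compact(blocks, roots)
print(blocks)                                  # {0: (2, 'a'), 2: (2, 'b'), 4: (2, 'c')}
print(roots)                                   # [4, 2, 0]
print(largest_free_run(blocks, heap_size=12))  # 6
```

A real compactor must also rewrite the addresses stored inside cons cells, using the same forwarding table.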
  38. The GC World is Your Oyster

    This covers just one of the kinds of garbage collectors you can build. There are plenty of other garbage collection strategies out there that you can implement in plai/collector. If you’re interested in digging further into garbage collectors, the Comp Sci cabal will be covering the topic of garbage collection throughout February. The first session is this upcoming Friday.
  39. Resources
    • Uniprocessor Garbage Collection Techniques, Paul R. Wilson
    • The Garbage Collection Handbook, R. Jones, A. Hosking, E. Moss (ISBN: 978-1420082791)
    • To Know A Garbage Collector, GoRuCo 2013, Michael Bernstein
  40. Credits
    • Berlin Wall // Getty Images
    • Pez Dispenser // Stéfan https://www.flickr.com/photos/st3f4n/
    • Scrap Heap // Sean Ganann https://www.flickr.com/photos/essgee/
    • Mag Core Memory // Dennis van Zuijlekom https://www.flickr.com/photos/dvanzuijlekom/
    • Shell Fragments in the Sand // reader walker https://www.flickr.com/photos/readerwalker/
    • Tree // sub flux https://www.flickr.com/photos/subflux/
    • Assembly Line // Wired https://www.flickr.com/photos/wiredphotostream/
    • Oysters // Min Lee https://www.flickr.com/photos/mlee/