thanking everyone for coming out. I’m going to be covering the paper “Teaching Garbage Collection without Implementing Compilers or Interpreters” by Cooper et al.
Demonstration Reflection and Closing Remarks
- Before we can dig right into the paper, we’ll need to cover a couple of fundamentals.
- I’ll then cover what the broad idea of Garbage Collection is.
- We’ll then step into the subjects that Cooper and his team found to be obstacles to building collectors.
- Afterwards I’ll cover some of the ways that the tools they built aid the learning process. This will include a demonstration of a very simple mark and sweep garbage collector that I built.
- We’ll finish off with some remarks about the collector, the situations where it might fall short, and what we could do to improve it.
clarify what memory is. In a computer the memory is where the data we are working with lives. I am going to be working with a simplified model of memory to make explaining, and hopefully understanding, a bit easier. There are two ways we can access memory
what do we mean? When our programs run we create things. These could be objects, lists, arrays, you name it. Typically we create something and hold onto it for the duration of a computation. When we are done with our computation, we return the result and stop looking at the things we created. Some languages make you responsible for ensuring that the data is destroyed when you are done with it. Other languages, such as Ruby and Lisp, take care of these details for us. Until the language has done something about this lost data, it will be somewhere in our program; we just can’t find it.
Pez dispenser. Whenever we want to add data to it we take our little data Pez and push them onto the stack. The stack is memory that is directly allocated. If we are working with known amounts of data we can easily place this data on the stack. When we are done with our data we remove it from the stack by popping it off. The stack is super important for garbage collection because it holds the references to the data our program is still using. Without the stack we wouldn’t be able to find garbage.
know how big something will be at compile time? This is where the heap comes in. The heap is kind of like a whiteboard. When we need to store something whose size we don’t know at compile time, we try to grab some space from the heap and take note of where it is. Initially the whiteboard starts off completely blank, but over time it becomes more and more cluttered as we allocate things onto it. It’s the job of the garbage collector to go through the heap and remove unused data.
of what could be going on. The top is our heap and the bottom area is our stack. Our stack grows from left to right, and the little arrow points to the end of our stack, which is called our Stack Pointer. Everything up to that stack pointer is considered alive, and these living things are our roots. In this heap we have some extra data in there that nobody is using. In some languages it is our responsibility to ensure that never happens. If it does, it means we have a problem, also known as a memory leak!
this, which is called Garbage Collection. That is, we build a program that will take care of keeping note of what items are still important and which ones nobody cares about anymore. When there is data that nobody cares about then it is garbage and the program goes in and removes it. This clears up the heap so that we can use it again for something else.
to garbage collection. We’ll use an approach called Mark and Sweep in this example. First we go through all of the objects on our heap and see which bits of data have a reference in the stack. If they do, we mark them (signified by the checkmark).
Garbage Collection is and what its part is. Though, knowing something in theory and understanding it enough to implement it are entirely different things.
to have a strong enough understanding of parsers. This is required to build out a thing called a syntax tree. These are used to validate that the words written down are correct. If they aren’t the program cannot run.
of building a language. You also need to know how to turn that syntax tree into code that the computer can actually run. Your programming language could be interpreted or compiled; what’s important is that this is the part that would be responsible for managing the data that goes onto the stack and the heap.
requires low-level concepts … and most real-world runtime systems … are not designed to enable easy modification” We could take a look into a garbage collector like the one in the Java Virtual Machine or perhaps Ruby, but most systems aren’t built to make it easy to change these. So students are stuck with very few friendly options.
get in the way of their teaching. They’ve come up with solutions that do help their students implement a garbage collector, though they aren’t always perfect. Often the tools that exist to cover this subject are quite limited in scope and lack the completeness that would allow students to uncover flaws in their garbage collector implementations.
a tool that they’ve built that removes many of the obstacles and even provides better tools for inspecting the heap. They’ve done this by building two systems: plai/collector and plai/mutator
and builds a sum of all the items in the list. This program is our mutator because operations such as declaring the values 1, 2, 3 and the list that contains those numbers all result in mutations to our heap.
before our program qualifies as a collector. The following functions are required to meet the plai/collector interface. Our collector program can do whatever it wants, but unless it implements these functions it doesn’t qualify as a plai/collector. The collector also needs to work on two different types of data, cons cells and flat values.
tail, cdr) (a refresher) A cons cell is the basic data structure of Lisps. It consists of two pointers, one for the value and another which points to another cons cell or nil. Using this data structure we can represent lists of arbitrary size.
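If it helps to see that outside of Lisp, here is a minimal Python sketch of the same idea. The names `Cons`, `from_list` and `to_list` are made up for this illustration, and `None` stands in for nil. It is only a sketch of the data structure, not how Racket represents it.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Cons:
    """A cons cell: `head` holds the value, `tail` points at the next cell or None (nil)."""
    head: Any
    tail: Optional["Cons"] = None


def from_list(items):
    """Build a chain of cons cells from a Python list, working back to front."""
    cell = None
    for item in reversed(items):
        cell = Cons(item, cell)
    return cell


def to_list(cell):
    """Walk the chain by following tail pointers until we reach nil."""
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out


numbers = from_list([1, 2, 3])
```

Each cell only ever knows about its own value and the next cell, which is why the list can grow to any size.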
data we can have are the flat values. These encompass things such as symbols, numbers, procedures and so on. Our cons cells hold references to data, and flat values are the other kind of data that those references can point to.
While such activity builds character, it may not fit the constraints of some teaching schedules” Heap Visualization The more traditional approach to building a garbage collector requires working with raw heap memory. Often this memory is pretty hard to read and requires dumping the heap’s contents.
working with really simple examples and Lisp, it’s really easy to figure out how the heap mutates over time. We can write down a heap on some paper and step through the evaluation of our functions to see how data is getting stored onto the heap. This can help explain the visualization and make sense of how to implement our own collectors. Here is what I did to step through a really simple sum list program using an example collector.
behaves in our garbage collector’s heap. There is a cons cell at memory address 10, which contains two entries. Because we are working with a cons cell, those entries aren’t values themselves but references to some other data, such as flat values or other cons cells. Our references are to the head of this cell at address 6 and the tail at address 8. By resolving these we end up with a list of one element.
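To make that resolving step concrete, here is a rough Python sketch of such a heap. The layout is an assumption for illustration: a flat value takes two slots (a tag and the value), a cons cell takes three (a tag, the head address and the tail address), the element stored at address 6 is just a number I picked, and `resolve` is a hypothetical helper rather than anything provided by plai.

```python
# Hypothetical layout: a flat value is ("flat", value) in two slots,
# a cons cell is ("cons", head_address, tail_address) in three slots.
EMPTY = "empty"  # stand-in for nil / the empty list

heap = ["free"] * 13
heap[6:8] = ["flat", 1]        # address 6: the element (value chosen for illustration)
heap[8:10] = ["flat", EMPTY]   # address 8: the empty list that ends the chain
heap[10:13] = ["cons", 6, 8]   # address 10: head -> 6, tail -> 8


def resolve(addr):
    """Follow references from `addr` and rebuild the Python value stored there."""
    tag = heap[addr]
    if tag == "flat":
        value = heap[addr + 1]
        return [] if value == EMPTY else value
    if tag == "cons":
        head = resolve(heap[addr + 1])
        tail = resolve(heap[addr + 2])
        return [head] + tail
    raise ValueError(f"address {addr} does not hold a value")
```

Calling `resolve(10)` follows the head reference to address 6 and the tail reference to address 8, which is how the cell turns into a one-element list.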
the collection tools in Racket is to simply grab the example collector and work from there. This gives us a baseline to start from: the interface is already implemented and we are free to tweak it to make it better. The example collector has a lot of issues with it, the main one being that it never frees up memory. That’s great because it gives us an immediate goal: to make our programs live at least a little bit longer by cleaning up all the trash they create.
an amount of memory and the response from our allocator is a nil reference. This means that no memory could be obtained and we are out of free slots. We use this information to trigger our mark_and_sweep. After we’ve finished our cleanup of memory we try to allocate the amount of memory again. If it fails this time, we are really out of memory and throw an exception. Otherwise all is well and we return the reference.
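That retry logic can be sketched in a few lines of Python. This is not the code from my collector; `alloc_with_gc`, `OutOfMemory`, `find_free` and the toy `state` dictionary are all hypothetical names, and `None` plays the part of the nil reference.

```python
class OutOfMemory(Exception):
    """Raised when a collection still does not free enough space."""


def alloc_with_gc(size, find_free, mark_and_sweep):
    """Try to allocate `size` slots, collecting once before giving up.

    `find_free(size)` returns an address or None (our stand-in for nil);
    `mark_and_sweep()` reclaims garbage. Both are supplied by the caller.
    """
    addr = find_free(size)
    if addr is None:
        mark_and_sweep()
        addr = find_free(size)
        if addr is None:
            raise OutOfMemory(f"could not allocate {size} slots")
    return addr


# Toy stand-ins so the sketch runs: the "heap" is just a count of free slots.
state = {"free": 0, "garbage": 2, "next": 0}


def find_free(size):
    if state["free"] < size:
        return None
    state["free"] -= size
    addr = state["next"]
    state["next"] += size
    return addr


def mark_and_sweep():
    state["free"] += state["garbage"]
    state["garbage"] = 0
```

The important design choice is that we only collect when an allocation fails, and we only retry once. If the second attempt fails there is nothing left to reclaim.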
References Checked Live Cells Mark and Sweep is one of the oldest (if not the oldest) techniques for crawling through our heap to figure out which pieces of data are garbage and which ones aren’t. The first step is finding out what isn’t garbage. We can approach this problem by using what is known as the “Tricolour Abstraction”. That is, we can think of each heap object as being in one of three states:
- Possibly dead objects that have yet to be traversed. Cells still in this state after the mark phase is completed aren’t referenced and can be freed. (Red Rectangles)
- Live objects whose descendants have yet to be traversed. (Purple Diamonds)
- Live objects whose descendants have been traversed. (Green Circles)
On the left is our stack. This is important because it is where we start our traversal through the heap.
So we go through our stack, look at what those stack values point to and mark them as items we need to look at. We then go through the objects that we need to check, adding anything they reference to the list of items we need to check.
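Here is a small Python sketch of that traversal followed by a sweep. It assumes a very simplified heap, a dictionary from an address to the addresses that object references, and the names `mark`, `sweep`, `grey` and `black` are my own for this illustration.

```python
def mark(heap, roots):
    """Return the set of live addresses using a tricolour-style worklist.

    heap:  dict mapping an address to the addresses that object references
    roots: addresses referenced from the stack
    """
    grey = list(roots)   # live, but we still need to look at what they reference
    black = set()        # live, and everything they reference has been queued
    while grey:
        addr = grey.pop()
        if addr in black:
            continue
        black.add(addr)
        grey.extend(child for child in heap[addr] if child not in black)
    return black         # anything not in here is still "possibly dead"


def sweep(heap, live):
    """Free every object the mark phase never reached; return what was freed."""
    dead = sorted(addr for addr in heap if addr not in live)
    for addr in dead:
        del heap[addr]
    return dead


# Objects 1, 2 and 3 form a cycle reachable from the stack; 4 and 5 are garbage.
heap = {1: [2], 2: [3], 3: [1], 4: [5], 5: [], 6: []}
roots = [1, 6]
live = mark(heap, roots)
freed = sweep(heap, live)
```

Notice that object 4 still references object 5, but neither can be reached from the stack, so both get freed. The check against `black` is also what stops the cycle between 1, 2 and 3 from looping forever.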
there are still plenty of edge cases that can come up if we run certain kinds of programs. What happens if we have a program that does some weird allocations that result in lots of fragmented memory? We could have plenty of space available, but because there isn’t a slot large enough we have effectively run out of memory. We could add memory compacting to our Garbage Collector, which would squish all the currently used memory together, giving us those chunks of lost memory back.
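A tiny Python sketch shows both the problem and the fix. It assumes a heap modelled as a list where `None` marks a free slot, and `largest_free_run` and `compact` are hypothetical helpers. It also skips the hard part: a real compacting collector would have to use the old-to-new address map to rewrite every reference on the stack and in the heap.

```python
def largest_free_run(heap):
    """Length of the longest run of consecutive free (None) slots."""
    best = run = 0
    for slot in heap:
        run = run + 1 if slot is None else 0
        best = max(best, run)
    return best


def compact(heap):
    """Slide used slots to the front; return the new heap and an old->new address map.

    A real collector would also use that map to rewrite every reference.
    """
    new_heap = [None] * len(heap)
    forwarding = {}
    next_free = 0
    for old_addr, slot in enumerate(heap):
        if slot is not None:
            new_heap[next_free] = slot
            forwarding[old_addr] = next_free
            next_free += 1
    return new_heap, forwarding


# Three free slots in total, but no two of them sit next to each other.
heap = ["a", None, "b", None, "c", None]
compacted, forwarding = compact(heap)
```

Before compacting, a request for two adjacent slots would fail even though half the heap is free. Afterwards the free slots form one run at the end.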
of the kinds of garbage collectors you can build. There are plenty of other garbage collection strategies out there that you can implement in plai/collector. If you’re interested in digging further into Garbage Collectors, they’ll be covering the topic of Garbage Collection at the Comp Sci cabal throughout February. The first one is this upcoming Friday.
// Stéfan https://www.flickr.com/photos/st3f4n/
• Scrap Heap // Sean Ganann https://www.flickr.com/photos/essgee/
• Mag Core Memory // Dennis van Zuijlekom https://www.flickr.com/photos/dvanzuijlekom/
• Shell Fragments in the Sand // reader walker https://www.flickr.com/photos/readerwalker/
• Tree // sub flux https://www.flickr.com/photos/subflux/
• Assembly Line // Wired https://www.flickr.com/photos/wiredphotostream/
• Oysters // Min Lee https://www.flickr.com/photos/mlee/