Slide 1

Big Code Algorithms: A more practical approach
Richard Wong

Slide 2

GC: moving your sight to the real world.

Slide 3

simple compiler tricks

Slide 4

Code can be reordered.

int doMath(int x, int y, int z) {
    int a = x + y;
    int b = x - y;
    int c = z + x;
    return a + b;
}

int doMath(int x, int y, int z) {
    int c = z + x;
    int b = x - y;
    int a = x + y;
    return a + b;
}

Slide 5

Dead code can be removed.

int doMath(int x, int y, int z) {
    int a = x + y;
    int b = x - y;
    int c = z + x;
    return a + b;
}

int doMath(int x, int y, int z) {
    int a = x + y;
    int b = x - y;
    return a + b;
}

Slide 6

Values can be propagated.

int doMath(int x, int y, int z) {
    int a = x + y;
    int b = x - y;
    int c = z + x;
    return a + b;
}

int doMath(int x, int y, int z) {
    return x + y + x - y;
}

Slide 7

Math can be simplified.

int doMath(int x, int y, int z) {
    int a = x + y;
    int b = x - y;
    int c = z + x;
    return a + b;
}

int doMath(int x, int y, int z) {
    return x + x;
}

Slide 8

largestValueLog = Math.log(largestValueWithSingleUnitResolution);
magnitude = (int) Math.ceil(largestValueLog / Math.log(2.0));
subBucketMagnitude = (magnitude > 1) ? magnitude : 1;
subBucketCount = (int) Math.pow(2, subBucketMagnitude);
subBucketMask = subBucketCount - 1;

Hard enough to follow as it is.
No value in "optimizing" human-readable meaning away.
Compiled code will end up the same anyway.
So why does this matter?

Slide 9

Reads can be cached.

int distanceRatio(Object a) {
    int distanceTo = a.getX() - start;
    int distanceAfter = end - a.getX();
    return distanceTo / distanceAfter;
}

int distanceRatio(Object a) {
    int x = a.getX();
    int distanceTo = x - start;
    int distanceAfter = end - x;
    return distanceTo / distanceAfter;
}

Slide 10

Reads can be cached.

void loopUntilFlagSet(Object a) {
    while (!a.flagIsSet()) {
        loopcount++;
    }
}

void loopUntilFlagSet(Object a) {
    boolean flagIsSet = a.flagIsSet();
    while (!flagIsSet) {
        loopcount++;
    }
}

Slide 11

Writes can be eliminated.
Intermediate values might never be visible.

void updateDistance(Object a) {
    int distance = 100;
    a.setX(distance);
    a.setX(distance * 2);
    a.setX(distance * 3);
}

void updateDistance(Object a) {
    a.setX(300);
}

Slide 12

Writes can be eliminated++
Intermediate values might never be visible.

void updateDistance(Object a) {
    a.setVisibleValue(0);
    for (int i = 0; i < 1000000; i++) {
        a.setInternalValue(i);
    }
    a.setVisibleValue(a.getInternalValue());
}

void updateDistance(Object a) {
    a.setInternalValue(999999);
    a.setVisibleValue(999999);
}

Slide 13

Inlining

public class Thing {
    private int x;
    public final int getX() { return x; }
}
...
myX = thing.getX();

class Thing {
    int x;
}
...
myX = thing.x;

Slide 14

Adaptive compilation makes cleaner code practical.
Reduces the need to trade off clean design against speed.
E.g. "final" should be used on methods only when you want to prohibit extension or overriding; it has no effect on speed.
E.g. branching can be written "naturally".

Slide 15

Why should you care about GC?

Slide 16

Why should you care about GC?
A good architect must, first and foremost, be able to impose their architectural choices on the project.
Find the root cause.

Slide 17

Trying to solve GC problems in application architecture is like throwing knives.
It takes practice and understanding to get it right.
You can get very good at it, but do you really want to?
Will all the code you leverage be as good as yours?

Slide 18

Most of what people seem to "know" about garbage collection is wrong.

In many cases, it's much better than you may think:
GC is extremely efficient, much more so than malloc().
Dead objects cost nothing to collect.
GC will find all the dead objects (including cyclic graphs).
...

In many cases, it's much worse than you may think:
Yes, it really does stop for ~1 sec per live GB (in most JVMs).
No, GC does not mean you can't have memory leaks.
No, those pauses you eliminated from your 20 minute test are not gone.
...

Slide 19

A basic terminology example: what is a concurrent collector?
A Concurrent Collector performs garbage collection work concurrently with the application's own execution.
A Parallel Collector uses multiple CPUs to perform garbage collection.

Slide 20

Concurrency over parallelism: language trends

Slide 21

Terminology
Garbage collection / garbage collector / collector
Finalization / finalizer
Resurrection
Heap
...

Slide 22

Garbage
Objects that are not live, but are not free either, are called garbage.
With explicit deallocation, garbage cannot be reused: its space has leaked away.

Slide 23

Garbage Collection
Garbage collection (GC) is a form of automatic memory management.
The garbage collector is a program that attempts to reclaim garbage: memory occupied by objects that are no longer in use by the program.

Slide 24

automatic memory management

Slide 25

Finalizer / Finalization
A finalizer is a special method that is executed when an object is garbage collected. It is similar in function to a destructor.
Finalizers are usually not deterministic: a finalizer is executed when the internal garbage collection system frees the object.
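
A minimal Java sketch (FinalizerDemo is an invented class; finalize() is deprecated in recent JDKs, and whether or when the message prints is entirely up to the collector):

public class FinalizerDemo {
    @Override
    protected void finalize() {
        // Called by the GC at some unspecified time, on a GC-controlled thread.
        System.out.println("finalize() ran for " + this);
    }

    public static void main(String[] args) throws InterruptedException {
        new FinalizerDemo();   // immediately unreachable
        System.gc();           // request (not force) a collection
        Thread.sleep(100);     // give the finalizer thread a chance to run
    }
}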

Slide 26

Resurrection
Resurrection occurs when an object's finalizer causes the object to become reachable again (that is, no longer garbage).
The garbage collector must determine whether the object has been resurrected by the finalizer, or risk creating a dangling reference.
The JVM will not invoke the finalize() method again after resurrection.
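
A minimal Java sketch of resurrection (Phoenix and savedInstance are invented names):

public class Phoenix {
    static Phoenix savedInstance;   // strong reference created during finalization

    @Override
    protected void finalize() {
        savedInstance = this;       // the dying object is reachable again: resurrected
    }
}

Because the JVM runs finalize() at most once per object, if savedInstance is later cleared and the object becomes unreachable again, it is collected without a second finalize() call.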


Slide 28

Heap (data structure)
In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: if A is a parent node of B, then key(A) is ordered with respect to key(B), with the same ordering applying across the heap.

Slide 29

Heap (memory block)
In dynamic memory allocation, the heap is a large pool of memory from which allocation requests are satisfied.

Slide 30

Reference & Value Type

Slide 31

Reference Type
A reference type is a data type that can only be accessed through references.
Objects of reference types cannot be directly embedded into composite objects and are always dynamically allocated.
They are usually destroyed automatically after they become unreachable.

Slide 32

Value Type
Value types are types of values, or types of objects with deep copy semantics.
The term "value type" refers to types of objects for which assignment has deep copy semantics (as opposed to reference types, which have shallow copy semantics).

Slide 33

Immutable Objects
A reference-type variable that refers to an immutable object behaves with the same semantics as a value-type variable.
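
For example, in Java, String is immutable, so a reference to it can be shared as freely as a primitive value (ImmutableDemo is an invented class):

public class ImmutableDemo {
    public static void main(String[] args) {
        String a = "hello";
        String b = a;                 // a and b refer to the same immutable object
        String c = a.toUpperCase();   // "modification" just produces a new object
        System.out.println(a);        // still "hello" -- nothing can mutate it through an alias
        System.out.println(b == a);   // true: sharing is safe
        System.out.println(c);        // "HELLO"
    }
}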

Slide 34

Strong reference
A strong reference is a normal reference that protects the referred object from collection by a garbage collector.
The term is used to distinguish the reference from weak references.

Slide 35

Weak reference
A weak reference is a reference that does not protect the referenced object from collection by a garbage collector, unlike a strong reference.
An object referenced only by weak references is considered unreachable (or weakly reachable) and so may be collected at any time.
Some garbage-collected languages feature or support various levels of weak references.
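
A minimal Java sketch using java.lang.ref.WeakReference (whether the last line prints "collected" depends on whether the collector actually ran):

import java.lang.ref.WeakReference;

public class WeakRefDemo {
    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);

        strong = null;   // drop the only strong reference
        System.gc();     // request a collection (not guaranteed to run)

        // get() returns null once the referent has been collected.
        System.out.println(weak.get() == null ? "collected" : "still reachable");
    }
}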

Slide 36

Weak reference distilled
Weak references (references which are not counted in reference counting) may be used to solve the problem of circular references, provided the reference cycles are avoided by using weak references for some of the references within the group.
For example, Apple's Cocoa framework recommends this approach: strong references for parent-to-child references, and weak references for child-to-parent references, thus avoiding cycles.
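
A hypothetical Java sketch of that parent/child shape (Parent and Child are invented names; Java's tracing collector reclaims cycles anyway, so this only illustrates the pattern):

import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

class Parent {
    final List<Child> children = new ArrayList<>();   // strong: the parent keeps its children alive

    void add(Child c) { children.add(c); }
}

class Child {
    final WeakReference<Parent> parent;               // weak: the child does not keep the parent alive

    Child(Parent p) {
        this.parent = new WeakReference<>(p);
        p.add(this);
    }
}

In a reference-counted system, dropping the last outside reference to the Parent lets the whole group be reclaimed, because the child-to-parent edge adds nothing to the count.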

Slide 37

Weak reference distilled
In the case of C++, normal pointers are weak and smart pointers are strong, although raw pointers are not true weak references: a weak reference is supposed to know when the object becomes unreachable.

Slide 38

Weak refs in Python
./cpython/Modules/gc_weakref.txt

Slide 39

In languages such as Python and Ruby, all types are reference types, including those that appear as primitive types.
On the Java platform, all composite and user-defined types are reference types; only primitive types are value types.
The .NET Framework makes a clear distinction between value and reference types, and allows creation of user-defined types of both kinds.

Slide 40

Resource Acquisition Is Initialization (RAII)

Slide 41

Dispose pattern
The dispose pattern is a design pattern used to handle resource cleanup in runtime environments that use automatic garbage collection.
The fundamental problem the dispose pattern aims to solve is that, because objects in a garbage-collected environment have finalizers rather than destructors, there is no guarantee that an object will be destroyed at any deterministic point in time.
The dispose pattern works around this by giving an object a method (usually called Dispose or similar) that frees any resources the object is holding.
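
A minimal Java sketch of the pattern (FileBackedCache is an invented class; in Java the conventional cleanup method is close(), via AutoCloseable, rather than Dispose):

public class FileBackedCache implements AutoCloseable {
    private java.io.RandomAccessFile file;   // a non-memory resource the GC won't reclaim promptly

    public FileBackedCache(String path) throws java.io.IOException {
        this.file = new java.io.RandomAccessFile(path, "rw");
    }

    @Override
    public void close() throws java.io.IOException {
        if (file != null) {   // make cleanup idempotent
            file.close();
            file = null;
        }
    }
}

Callers still have to remember to call close(), which is exactly the disadvantage the following slides describe.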

Slide 42

Code Code Code

Slide 43

Dispose pattern: effects on programming languages
One disadvantage of this approach is that it requires the programmer to explicitly add cleanup code in a finally block.
This leads to code-size bloat, and failure to do so leads to resource leaks in the program.
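
A minimal Java illustration of that manual cleanup (ManualCleanup is an invented helper; the resource is a plain FileInputStream):

import java.io.FileInputStream;
import java.io.IOException;

class ManualCleanup {
    static int readFirstByte(String path) throws IOException {
        FileInputStream in = new FileInputStream(path);
        try {
            return in.read();
        } finally {
            in.close();   // easy to forget; omitting it leaks the file handle
        }
    }
}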

Slide 44

On C#

Slide 45

On Python and PEP 343

Slide 46

On Java
The Java language introduced a new syntax called try-with-resources in Java 7.
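
With try-with-resources, the earlier try/finally sketch shrinks to (same invented helper names):

import java.io.FileInputStream;
import java.io.IOException;

class AutomaticCleanup {
    static int readFirstByte(String path) throws IOException {
        // close() is called automatically, even if read() throws.
        try (FileInputStream in = new FileInputStream(path)) {
            return in.read();
        }
    }
}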

Slide 47

Classifying a collector's operation
A Concurrent Collector performs garbage collection work concurrently with the application's own execution.
A Parallel Collector uses multiple CPUs to perform garbage collection.
A Stop-the-World collector performs garbage collection while the application is completely stopped.
An Incremental collector performs a garbage collection operation or phase as a series of smaller discrete operations with (potentially long) gaps in between.
"Mostly" means sometimes it isn't (usually means a different fallback mechanism exists).

Slide 48

Precise vs. Conservative Collection
A collector is Conservative if it is unaware of some object references at collection time, or is unsure about whether a field is a reference or not.
A collector is Precise if it can fully identify and process all object references at the time of collection.
A collector MUST be precise in order to move objects.
The COMPILERS need to produce a lot of information (oopmaps).
All commercial server JVMs use precise collectors.
All commercial server JVMs use some form of a moving collector.

Slide 49

Memory Use
How many of you use heap sizes of:
more than 1/2 GB?
more than 1 GB?
more than 2 GB?
more than 4 GB?
more than 10 GB?
more than 20 GB?
more than 50 GB?

Slide 50

GC pros and cons

Slide 51

Pros
GC avoids dangling pointer bugs.
GC avoids double free bugs.
GC avoids certain kinds of memory leaks.
GC enables efficient implementations of persistent data structures.

Slide 52

Cons
GC consumes computing resources.
The moment when garbage is actually collected can be unpredictable, resulting in stalls scattered throughout a session; that is, stop-the-world (STW) pauses.
Non-deterministic GC is incompatible with RAII-based management of non-GCed resources; as a result, the need for explicit manual resource management (release/close) for non-GCed resources becomes transitive to composition.
Garbage collection is rarely used on embedded or real-time systems because of the perceived need for very tight control over the use of limited resources; however, garbage collectors compatible with such limited environments have been developed.

Slide 53

Main types of GC
Reference counting
Tracing

Slide 54

Reference counting
It is also the method used by many operating systems to determine whether a file may be deleted from the file store.

Slide 55

Reference counting
Management of active and garbage cells is interleaved with the execution of the user program; reference counting may therefore be a suitable method if a smoother response time is important.
A cell whose reference count becomes zero can be reclaimed without access to cells in other pages of the heap.
The disadvantage is the high processing cost paid to update counters in order to maintain the reference count invariant.
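
Java itself does not use reference counting, but a toy manual sketch of the idea (RefCounted and its methods are invented) shows why reclamation is immediate and local:

class RefCounted {
    private int count = 1;                    // the creator owns the first reference

    synchronized void retain() { count++; }   // a new owner takes a reference

    synchronized void release() {             // an owner gives its reference up
        if (--count == 0) {
            dispose();                        // reclaimed right away, no global trace needed
        }
    }

    protected void dispose() {
        System.out.println("resource freed");
    }
}

Every retain()/release() pair is the counter-update cost the slide mentions.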

Slide 56

Q: What about cyclic data structures?
Many implementations of lazy functional languages based on graph reduction use cycles to handle recursion.

Slide 57

Garbage Collector in CPython

Slide 58

Tradeoffs
Since Python makes heavy use of malloc() and free(), it needs a strategy to avoid memory leaks as well as the use of freed memory. The chosen method is called reference counting.
The principle is simple: every object contains a counter, which is incremented when a reference to the object is stored somewhere, and which is decremented when a reference to it is deleted. When the counter reaches zero, the last reference to the object has been deleted and the object is freed.

Slide 59

Warm-up for the code
There are two macros, Py_INCREF(x) and Py_DECREF(x). Py_DECREF() also frees the object when the count reaches zero.
For flexibility, it doesn't call free() directly; rather, it makes a call through a function pointer in the object's type object. For this purpose (and others), every object also contains a pointer to its type object.
http://docs.python.org/release/2.5.2/ext/refcounts.html

Slide 60

When to use Py_INCREF(x)/Py_DECREF(x)?

Slide 61

When to use Py_INCREF(x)/Py_DECREF(x)?
Nobody "owns" an object; however, you can own a reference to an object. An object's reference count is now defined as the number of owned references to it.
The owner of a reference is responsible for calling Py_DECREF() when the reference is no longer needed. Ownership of a reference can be transferred.
There are three ways to dispose of an owned reference: pass it on, store it, or call Py_DECREF(). Forgetting to dispose of an owned reference creates a memory leak.

Slide 62

Source
./cpython/Include/object.h:759

Slide 63

Tracing

Slide 64

Mark-Sweep (Scan) Algorithm
The mark-sweep algorithm relies on a global traversal of all live objects to determine which cells are available for reclamation.

Slide 65

Mark-Sweep Algorithm
The mark phase identifies all active cells.
The sweep phase returns garbage cells to the free pool.

Slide 66

Mark (aka "Trace")
Start from "roots" (thread stacks, statics, etc.).
"Paint" anything you can reach as "live".
At the end of a mark pass:
all reachable objects will be marked "live";
all non-reachable objects will be marked "dead" (aka "non-live").
Note: work is generally linear to the "live set".
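
A compact Java sketch of a mark pass over a made-up object model (Cell, marked, and references are all invented for illustration):

import java.util.ArrayList;
import java.util.List;

// Hypothetical heap cell: a mark bit plus outgoing references.
class Cell {
    boolean marked;
    final List<Cell> references = new ArrayList<>();
}

class Marker {
    // Depth-first trace from the roots; everything reached is painted "live".
    static void mark(List<Cell> roots) {
        for (Cell root : roots) {
            markFrom(root);
        }
    }

    private static void markFrom(Cell cell) {
        if (cell == null || cell.marked) {
            return;                        // already painted (or no object): stop
        }
        cell.marked = true;
        for (Cell child : cell.references) {
            markFrom(child);
        }
    }
}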

Slide 67

demo mark

Slide 68

Sweep
Scan through the heap, identify "dead" objects, and track them somehow (usually in some form of free list).
Note: work is generally linear to heap size.
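
Continuing the same toy model (reusing the hypothetical Cell class from the mark sketch), a sweep pass over the whole heap might look like:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class Sweeper {
    // Walk every cell in the heap: unmarked cells go to the free list,
    // marked cells have their mark cleared for the next collection cycle.
    static Deque<Cell> sweep(List<Cell> heap) {
        Deque<Cell> freeList = new ArrayDeque<>();
        for (Cell cell : heap) {
            if (cell.marked) {
                cell.marked = false;   // survivor: reset for the next mark pass
            } else {
                freeList.push(cell);   // dead: available for reuse
            }
        }
        return freeList;
    }
}

Note that the loop visits every cell, live or dead, which is why the work is linear to heap size rather than to the live set.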

Slide 69

demo sweep

Slide 70

Garbage Collector in Go

Slide 71

Why do garbage collection?
One of the biggest sources of bookkeeping in systems programs is memory management. We feel it's critical to eliminate that programmer overhead, and advances in garbage collection technology in the last few years give us confidence that we can implement it with low enough overhead and no significant latency.
http://golang.org/doc/faq#garbage_collection

Slide 72

Source
./go/src/pkg/runtime/mgc0.c

Slide 73

The copying algorithm

Slide 74

Copying GC
A copying collector moves all live objects from a "from" space to a "to" space and reclaims the "from" space.
At the start of the copy, all objects are in "from" space and all references point to "from" space.
Start from "root" references, copy any reachable object to "to" space, correcting references as we go.
At the end of the copy, all objects are in "to" space, and all references point to "to" space.
Note: work is generally linear to the "live set".

Slide 75

Mechanism

Slide 76

Pros and cons
Allocation costs are extremely low: the out-of-space check is a simple pointer comparison, and new memory is acquired simply by incrementing the free-space pointer.
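
A rough Java sketch of that bump-pointer allocation (Semispace is an invented class; offsets into a byte array stand in for real addresses, and a real collector would trigger a copy to "to" space instead of returning -1):

// Bump-pointer allocation over a semispace: allocate by advancing "free",
// fail when the bump would pass "end".
class Semispace {
    private final byte[] space;
    private int free = 0;      // next free offset
    private final int end;

    Semispace(int sizeBytes) {
        this.space = new byte[sizeBytes];
        this.end = sizeBytes;
    }

    // Returns the offset of the new object, or -1 when the space is exhausted.
    int allocate(int sizeBytes) {
        if (free + sizeBytes > end) {
            return -1;         // out of space: time to copy the live set and flip spaces
        }
        int result = free;
        free += sizeBytes;     // the "bump"
        return result;
    }
}

The out-of-space check is the single comparison, and the bump is the single addition the slide refers to.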

Slide 77

Mark-compact algorithm
The two-finger algorithm

Slide 78

Generational / Ephemeral GC
Generational hypothesis: most objects die young.
Focus collection efforts on the young generation:
use a moving collector, so work is linear to the live set;
the live set in the young generation is a small % of the space;
promote objects that live long enough to older generations;
only collect older generations as they fill up.
The "generational filter" reduces the rate of allocation into older generations.
Tends to be (an order of magnitude) more efficient.
A great way to keep up with a high allocation rate, and a practical necessity for keeping up with processor throughput.
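
A toy generational sketch in Java (all names and the promotion threshold are invented; real collectors move objects between memory spaces and derive reachability from a trace rather than a flag):

import java.util.ArrayList;
import java.util.List;

// Toy model: objects are born in the young generation; survivors of enough
// minor collections are promoted to the old generation.
class GenHeap {
    static final int PROMOTION_AGE = 3;

    static class Obj {
        int age = 0;
        boolean reachable;   // stands in for the result of a real mark/trace
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    // Minor collection: only the (small) young generation is examined.
    void minorCollect() {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!o.reachable) continue;     // dead young objects cost nothing here
            if (++o.age >= PROMOTION_AGE) {
                old.add(o);                 // lived long enough: promote
            } else {
                survivors.add(o);           // stays in the young generation
            }
        }
        young.clear();
        young.addAll(survivors);
    }
}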

Slide 79

Incremental Compaction
Track cross-region remembered sets (which region points to which).
To compact a single region, only the regions that point into it need to be scanned to remap all potential references.
Identify region sets that fit in limited time.
Each such set of regions is a stop-the-world increment.
Safe to run the application between (but not within) increments.
Note: work can grow with the square of the heap size; the number of regions pointing into a single region is generally linear to the heap size (the number of regions in the heap).

Slide 80

Garbage Collector in V8
V8 compiles JavaScript to native machine code (IA-32, x86-64, ARM, or MIPS CPUs) before executing it, instead of more traditional techniques such as executing bytecode or interpreting it.
Optimization techniques used include inlining, elision of expensive runtime properties, and inline caching, among many others.
The garbage collector of V8 is a generational incremental collector.

Slide 81

Source
./v8/src/mark-compact.cc

Slide 82

Comparison
Copy requires 2x the max live set to be reliable.
Mark/Compact [typically] requires 2x the max live set in order to fully recover garbage in each cycle.
Mark/Sweep/Compact only requires 1x (plus some).
Copy and Mark/Compact are linear only to the live set.
Mark/Sweep/Compact is linear (in sweep) to heap size.
Mark/Sweep/(Compact) may be able to avoid some moving work.
Copying is [typically] "monolithic".


Slide 84

Intuitive
If we had exactly 1 byte of empty memory at all times, the collector would have to work "very hard", and GC would take 100% of the CPU time.
If we had infinite empty memory, we would never have to collect, and GC would take 0% of the CPU time.
GC CPU % will follow a rough 1/x curve between these two limit points, dropping as the amount of memory increases.

Slide 85

Empty memory needs (empty memory == CPU power)
The amount of empty memory in the heap is the dominant factor controlling the amount of GC work.
For both Copy and Mark/Compact collectors, the amount of work per cycle is linear to the live set.
The amount of memory recovered per cycle is equal to the amount of unused memory: (heap size) - (live set).
The collector has to perform a GC cycle when the empty memory runs out.
A Copy or Mark/Compact collector's efficiency doubles with every doubling of the empty memory.

Slide 86

What empty memory controls
Empty memory controls efficiency (the amount of collector work needed per amount of application work performed).
Empty memory controls the frequency of pauses (if the collector performs any stop-the-world operations).
Empty memory DOES NOT control pause times (only their frequency).
In Mark/Sweep/Compact collectors that pause for sweeping, more empty memory means less frequent but LARGER pauses.

Slide 87

Garbage Collector in JVM

Slide 88

G1GC (aka "Garbage First")
Monolithic stop-the-world copying NewGen.
Mostly-concurrent OldGen marker:
mostly-concurrent marking;
stop-the-world to catch up on mutations, reference processing, etc.;
tracks inter-region relationships in remembered sets.
Stop-the-world, mostly-incremental compacting OldGen:
objective: "Avoid, as much as possible, having a Full GC...";
compacts sets of regions that can be scanned in limited time;
delays compaction of popular objects, popular regions.
Fallback to Full Collection (monolithic stop-the-world); used for compacting popular objects, popular regions, etc.

Slide 89

Why does Clojure hang for 3 seconds when starting nREPL?

Slide 90

Why does Clojure hang for 3 seconds when starting nREPL?
My guess: stop-the-world, and then flush...

Slide 91

The real world is complex

Slide 92

Thank you

Slide 93

Special thanks
Using Princeton/Coursera slides as skeleton materials.
Design Patterns
Algorithms (4th ed.)
Gil Tene's presentation on InfoQ
Garbage Collection
Code Complete (2nd ed.)

Slide 94

Directed graph
Aux GC