Slide 1

Slide 1 text

Garbage Collection Benjamin Peterson

Slide 2

Slide 2 text

Outline ➔ How GC works in various Python implementations ◆ optimizations ➔ GC semantics subtleties

Slide 3

Slide 3 text

Part 1 Implementation Basics

Slide 4

Slide 4 text

Bias & Disclaimer

Slide 5

Slide 5 text

What is GC in Python? ➔ Unused objects are finalized and deallocated. ➔ When? ◆ “eventually”... or never! ◆ not running out of memory is good

Slide 6

Slide 6 text

CPython

Slide 7

Slide 7 text

CPython: reference counting ➔ Every object has a count of how many other objects want to keep it alive. ➔ New objects have ref count 1. ➔ When the ref count drops to 0, the object can be deleted.
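A minimal sketch of watching reference counts change on CPython, using the standard `sys.getrefcount`. The class `Node` and the variable names are illustrative only. The absolute numbers vary between CPython versions (the call itself may hold a temporary reference), so the sketch only compares differences.

```python
import sys


class Node:
    """A plain object whose reference count we can observe."""


obj = Node()
base = sys.getrefcount(obj)         # baseline count for this object

alias = obj                         # a second name keeps the object alive
after_alias = sys.getrefcount(obj)  # one higher than the baseline

container = [obj]                   # a list slot holds another reference
after_list = sys.getrefcount(obj)   # two higher than the baseline

del alias                           # dropping references lowers the count
container.clear()
after_cleanup = sys.getrefcount(obj)  # back at the baseline
```

Once the last reference goes away, CPython deallocates the object immediately rather than waiting for a later collection.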

Slide 8

Slide 8 text

New object Preexisting objects

Slide 9

Slide 9 text

Preexisting objects New object

Slide 10

Slide 10 text

Preexisting objects Dead object

Slide 11

Slide 11 text

Preexisting objects Dead object Dead object

Slide 12

Slide 12 text

refcounting example

Slide 13

Slide 13 text

Reference Counting’s Major Flaw Reference Cycles

Slide 14

Slide 14 text

Preexisting objects

Slide 15

Slide 15 text

Preexisting objects Cycle keeps itself alive!

Slide 16

Slide 16 text

CPython’s cyclic GC ➔ Detects cycles unreachable from the program and deletes them ➔ Runs every once in a while on allocation ➔ on CPython
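A small CPython-specific sketch of a cycle that reference counting alone cannot free, and of the cyclic GC reclaiming it. It uses only the standard `gc` and `weakref` modules; the `Node` class and the `probe` name are illustrative assumptions.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()                 # keep automatic collection out of the example

a, b = Node(), Node()
a.other, b.other = b, a      # a <-> b form a reference cycle
probe = weakref.ref(a)       # observes a without keeping it alive

del a, b                     # no references from the program remain
alive_after_del = probe() is not None      # True: the counts never reach 0

gc.collect()                 # run the cyclic GC explicitly
alive_after_collect = probe() is not None  # False: the cycle was deleted

gc.enable()
```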

Slide 17

Slide 17 text

Preexisting objects Cyclic GC subtracts internal references

Slide 18

Slide 18 text

Preexisting objects Cycles are now deleted
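The "subtract internal references" idea can be sketched on a toy object graph. This is an illustration under simplifying assumptions, not CPython's actual implementation: objects are plain string names, `refcounts` and `edges` are hypothetical inputs, and `find_unreachable` is a made-up helper.

```python
def find_unreachable(refcounts, edges):
    """Return the tracked objects that are only kept alive by each other.

    refcounts: name -> total reference count (from anywhere)
    edges:     name -> list of tracked names it refers to
    """
    # Step 1: subtract every reference that comes from a tracked object.
    # What is left counts references from outside the tracked set.
    external = dict(refcounts)
    for targets in edges.values():
        for dst in targets:
            external[dst] -= 1

    # Step 2: anything with an external reference is alive, and so is
    # everything reachable from it.
    alive = set()
    stack = [name for name, count in external.items() if count > 0]
    while stack:
        name = stack.pop()
        if name in alive:
            continue
        alive.add(name)
        stack.extend(edges.get(name, ()))

    # Step 3: the rest is garbage, including whole cycles.
    return set(refcounts) - alive


# A is referenced from the program and points to B; C and D only
# reference each other.
refcounts = {"A": 1, "B": 1, "C": 1, "D": 1}
edges = {"A": ["B"], "B": [], "C": ["D"], "D": ["C"]}
garbage = find_unreachable(refcounts, edges)   # {"C", "D"}
```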

Slide 20

Slide 20 text

PyPy review ➔ Interpreter written in RPython ➔ RPython translated to low level language (C) ➔ Interpreter is abstracted from low-level details like GC

Slide 21

Slide 21 text

PyPy has pluggable GCs

Slide 22

Slide 22 text

PyPy GC ➔ GC is simply another low-level transform during translation. ➔ GC algorithm itself is written in RPython. ➔ GC implementation can be selected at translation time. ➔ Current default GC: “minimark”

Slide 23

Slide 23 text

Mark and Sweep ➔ Step 1: Starting from known live objects, recursively traverse objects, marking them as reachable. ➔ Step 2: Walk all allocated objects, deleting the ones that aren’t marked as alive. ➔ No need to worry about reference cycles.
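The two steps can be sketched on a toy heap. This is a simplified illustration, not PyPy's collector: objects are string names, and `mark`, `sweep`, `heap`, `edges` and `roots` are hypothetical names chosen for the example.

```python
def mark(roots, edges):
    """Step 1: traverse from the roots and collect everything reachable."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(edges.get(obj, ()))
    return marked


def sweep(heap, marked):
    """Step 2: walk all allocated objects and keep only the marked ones."""
    return {obj for obj in heap if obj in marked}


heap = {"A", "B", "C", "D", "E", "F"}
edges = {
    "A": ["B", "C"],   # reachable from the root
    "D": ["C"],        # D points at a live object but nothing points at D
    "E": ["F"],        # E <-> F is an unreachable cycle
    "F": ["E"],
}
roots = ["A"]

survivors = sweep(heap, mark(roots, edges))   # {"A", "B", "C"}
```

The unreachable cycle E and F needs no special handling: it is never marked, so the sweep deletes it like any other dead object.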

Slide 24

Slide 24 text

Preexisting objects Unreachable object Marking

Slide 25

Slide 25 text

Preexisting objects Unreachable object Marking GC traverses references to C.

Slide 26

Slide 26 text

Preexisting objects Unreachable object Sweeping GC notices that D is unreachable and deletes it.

Slide 27

Slide 27 text

Fancy PyPy GC Optimizations

Slide 28

Slide 28 text

Python allocates constantly

Slide 29

Slide 29 text

“High Infant Mortality”

Slide 30

Slide 30 text

The nursery ➔ Store newly allocated objects in a “nursery” ➔ Collect the “nursery” often and move surviving objects elsewhere (minor collection) ➔ Garbage collect old objects less frequently (major collection)
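A toy sketch of the nursery idea, under a loud simplification: an object counts as live only if it appears in an explicit `roots` list. A real generational collector traces from the roots and must also track old objects that point into the nursery. `ToyHeap` and all of its attributes are hypothetical names.

```python
class ToyHeap:
    def __init__(self, nursery_size=4):
        self.nursery_size = nursery_size
        self.nursery = []            # newly allocated (young) objects
        self.old = []                # survivors of minor collections
        self.minor_collections = 0

    def allocate(self, obj, roots):
        # A full nursery triggers a minor collection before allocating.
        if len(self.nursery) >= self.nursery_size:
            self.minor_collect(roots)
        self.nursery.append(obj)
        return obj

    def minor_collect(self, roots):
        # Simplification: live means "directly listed in roots".
        live_ids = {id(r) for r in roots}
        survivors = [o for o in self.nursery if id(o) in live_ids]
        self.old.extend(survivors)   # move survivors out of the nursery
        self.nursery = []            # everything else dies young
        self.minor_collections += 1


heap = ToyHeap(nursery_size=4)
roots = []
for i in range(10):
    obj = heap.allocate([i], roots)
    if i % 5 == 0:                   # only a few objects stay referenced
        roots.append(obj)
```

Most objects never leave the nursery, so the frequent minor collections stay cheap, and the old space only grows by the few survivors.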

Slide 31

Slide 31 text

GC Pauses ➔ When the GC is running, the program is not. ➔ Inconvenient for many long-running programs like servers. ➔ A deal-breaker for real-time applications like video processing.

Slide 32

Slide 32 text

PyPy incremental GC ➔ In PyPy 2.2 ➔ Major collection split into multiple passes, each lasting only a few milliseconds. ➔ “All in all it was relatively painless work.”
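Splitting the marking work into short passes can be sketched with a generator that pauses after a fixed amount of work. This is not PyPy's implementation: it assumes the object graph does not change between passes, whereas a real incremental collector needs a write barrier to cope with the program mutating objects mid-collection. `incremental_mark` and `budget` are hypothetical names.

```python
def incremental_mark(roots, edges, budget):
    """Mark reachable objects, yielding after at most `budget` per pass."""
    marked = set()
    pending = list(roots)          # found but not yet scanned
    while pending:
        work = 0
        while pending and work < budget:
            obj = pending.pop()
            if obj in marked:
                continue
            marked.add(obj)
            pending.extend(edges.get(obj, ()))
            work += 1
        yield set(marked)          # pause: the program would run here


edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
passes = list(incremental_mark(["A"], edges, budget=2))
final_marked = passes[-1]          # every object reachable from A
```

Each pass does a bounded amount of work, so the longest pause depends on `budget` rather than on the size of the heap.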

Slide 33

Slide 33 text

GC in PyPy summary ➔ Pluggable ➔ Generational ➔ Incremental ➔ Integrated with the JIT

Slide 34

Slide 34 text

Part 2 GC Semantics

Slide 35

Slide 35 text

kills kittens

Slide 36

Slide 36 text

Cycles with finalizers: a conundrum Which finalizer to run first? Reference to the alive world.

Slide 37

Slide 37 text

What to do about cycles with finalizers? ➔ CPython < 3.4: give up ➔ CPython >= 3.4: PEP 442 ➔ PyPy: sort finalizers into a “reasonable” order and run them

Slide 38

Slide 38 text

PEP 442 - “Safe Object Finalization” ➔ Step 1: Run finalizers on unreachable cycles (arbitrary order). Resurrect any cycles that become reachable again. ➔ Step 2: Break references in remaining cycles.
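A short sketch of the effect on CPython 3.4 or later: a cycle whose objects define `__del__` is finalized and collected instead of being left in `gc.garbage`. The `Node` class and the `log` list are illustrative assumptions, and the order in which the two finalizers run is not guaranteed, so the sketch sorts the log before looking at it.

```python
import gc

log = []


class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        # Finalizer: records that it ran.
        log.append(self.name)


gc.disable()                       # keep automatic collection out of the way
a, b = Node("a"), Node("b")
a.partner, b.partner = b, a        # cycle in which both objects have finalizers
del a, b

gc.collect()                       # step 1 runs finalizers, step 2 breaks the cycle
gc.enable()

finalized = sorted(log)            # both finalizers ran
uncollectable = [o for o in gc.garbage if isinstance(o, Node)]  # nothing left over
```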

Slide 39

Slide 39 text

Themes ➔ Garbage Collection is hard. ➔ (PyPy’s) GCs are awesome!

Slide 40

Slide 40 text

Queries? (benjamin@python.org) ➔ PyPy Blog (http://morepypy.blogspot.com) ➔ GC module documentation ➔ Wikipedia article on garbage collection ➔ The source