Garbage Collection in Python by Benjamin Peterson

PyCon 2014
April 12, 2014

Transcript

  1. Outline
     ➔ How GC works in various Python implementations
       ◆ optimizations
     ➔ GC semantics subtleties
  2. What is GC in Python?
     ➔ Unused objects are finalized and deallocated.
     ➔ When?
       ◆ “eventually”... or never!
       ◆ not running out of memory is good
  3. CPython: reference counting
     ➔ Every object has a count of how many other objects want to keep it alive.
     ➔ New objects have ref count 1.
     ➔ When ref count is 0, the object can be deleted.
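The counting described on this slide can be observed from Python itself. The following is a minimal sketch that assumes CPython; `Node`, `obj`, `alias` and `probe` are illustrative names that do not appear in the talk. `sys.getrefcount` always reports one extra reference for its own argument, so only the differences between readings are meaningful.

```python
import sys
import weakref


class Node:
    """Hypothetical class, used only to have an object to count."""


obj = Node()
base = sys.getrefcount(obj)          # includes the temporary argument reference

alias = obj                          # a second name now keeps the object alive
after_alias = sys.getrefcount(obj)

del alias                            # dropping the name releases that reference
after_del = sys.getrefcount(obj)

probe = weakref.ref(obj)             # a weak reference does not keep obj alive
del obj                              # last strong reference gone
collected_immediately = probe() is None   # expected on CPython, not guaranteed elsewhere

print(after_alias - base, after_del - base, collected_immediately)
```

On implementations without reference counting, such as PyPy, the last check may well be false until a collection happens to run.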
  4. CPython’s cyclic GC
     ➔ Detects cycles unreachable from the program and deletes them
     ➔ Runs every once in a while on allocation
     ➔ on CPython
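A small sketch of why the cycle detector is needed, assuming CPython and its standard `gc` module. `Node` and `make_cycle` are hypothetical names. Automatic collection is switched off for the duration so that the explicit `gc.collect()` call is the only thing that can reclaim the cycle.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at one other object."""

    def __init__(self):
        self.other = None


def make_cycle():
    # a and b reference each other, so neither ref count can drop to 0
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)            # weak probe; does not keep a alive


gc.disable()                         # keep automatic collections out of the way
probe = make_cycle()
alive_before = probe() is not None   # reference counting alone cannot free it
found = gc.collect()                 # number of unreachable objects found
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after, found)
```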
  5. PyPy review
     ➔ Interpreter written in RPython
     ➔ RPython translated to a low-level language (C)
     ➔ Interpreter is abstracted from low-level details like GC
  6. PyPy GC
     ➔ GC is simply another low-level transform during translation.
     ➔ GC algorithm itself is written in RPython.
     ➔ GC implementation can be selected at translation time.
     ➔ Current default GC: “minimark”
  7. Mark and Sweep
     ➔ Step 1: Starting from known live objects, recursively traverse objects, marking them as reachable.
     ➔ Step 2: Walk all allocated objects, deleting the ones that aren’t marked as alive.
     ➔ No need to worry about reference cycles.
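The two steps can be sketched as a toy collector over an explicit object graph. This is an illustration only, not PyPy's implementation: `Obj`, `mark`, `sweep` and `collect` are made-up names, the "heap" is a plain list, and an explicit stack stands in for the recursive traversal.

```python
class Obj:
    """Toy heap object: a name, outgoing references, and a mark bit."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    # Step 1: traverse from the roots, marking everything reachable
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def sweep(heap):
    # Step 2: walk all allocated objects, keeping only the marked ones
    survivors = [obj for obj in heap if obj.marked]
    for obj in survivors:
        obj.marked = False           # reset for the next collection
    return survivors


def collect(roots, heap):
    mark(roots)
    return sweep(heap)


a, b, c, d = (Obj(n) for n in "abcd")
a.refs.append(b)                     # a -> b, reachable from the root
c.refs.append(d)
d.refs.append(c)                     # c <-> d: a cycle nothing else points to
heap = collect(roots=[a], heap=[a, b, c, d])
survivor_names = [obj.name for obj in heap]
print(survivor_names)
```

The unreachable cycle `c <-> d` is dropped without any special handling, which is the point of the slide's last bullet.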
  8. The nursery
     ➔ Store newly allocated objects in a “nursery”
     ➔ Collect the “nursery” often and move surviving objects elsewhere (minor collection)
     ➔ Garbage collect old objects less frequently (major collection)
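A toy model of the nursery idea, under heavy simplifying assumptions. `Obj`, `Heap` and `nursery_size` are hypothetical names; a real generational collector tracks old-to-young pointers with a write barrier, whereas this sketch simply rescans every old object during a minor collection and leaves major collection out entirely.

```python
class Obj:
    """Toy heap object with a name and outgoing references."""

    def __init__(self, name):
        self.name = name
        self.refs = []


class Heap:
    def __init__(self, nursery_size):
        self.nursery_size = nursery_size
        self.nursery = []            # young objects
        self.old = []                # objects that survived a minor collection
        self.minor_collections = 0

    def allocate(self, name, roots):
        # A full nursery triggers a minor collection before allocating
        if len(self.nursery) >= self.nursery_size:
            self.minor_collect(roots)
        obj = Obj(name)
        self.nursery.append(obj)
        return obj

    def minor_collect(self, roots):
        young = set(self.nursery)
        reachable = set()
        # Simplification: treat every reference held by an old object as a root
        stack = list(roots) + [ref for obj in self.old for ref in obj.refs]
        while stack:
            obj = stack.pop()
            if obj in young and obj not in reachable:
                reachable.add(obj)
                stack.extend(obj.refs)
        # Survivors move out of the nursery; everything else is dropped
        self.old.extend(obj for obj in self.nursery if obj in reachable)
        self.nursery = []
        self.minor_collections += 1


heap = Heap(nursery_size=3)
roots = []
roots.append(heap.allocate("keep", roots))
for i in range(5):
    heap.allocate(f"tmp{i}", roots)  # never stored anywhere: garbage

old_names = [obj.name for obj in heap.old]
nursery_names = [obj.name for obj in heap.nursery]
print(old_names, nursery_names, heap.minor_collections)
```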
  9. GC Pauses
     ➔ When the GC is running, the program is not.
     ➔ Inconvenient for many long-running programs like servers.
     ➔ A deal-breaker for real-time applications like video processing.
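One way to get a feel for pause lengths is to time collections from inside the program. The sketch below assumes CPython 3.3 or later, where the `gc` module exposes the `gc.callbacks` list; `record_pause`, `pauses` and `junk` are illustrative names, and the measured numbers depend entirely on the machine and workload.

```python
import gc
import time

pauses = []                          # duration of each observed collection, in seconds
_started = [0.0]


def record_pause(phase, info):
    # The collector calls this with phase "start" before and "stop" after a run
    if phase == "start":
        _started[0] = time.perf_counter()
    elif phase == "stop":
        pauses.append(time.perf_counter() - _started[0])


gc.callbacks.append(record_pause)
try:
    junk = [[i] for i in range(100_000)]   # plenty of container objects to scan
    gc.collect()                           # force at least one full collection
finally:
    gc.callbacks.remove(record_pause)

print(f"{len(pauses)} collections, longest {max(pauses) * 1000:.2f} ms")
```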
  10. PyPy incremental GC
     ➔ In PyPy 2.2
     ➔ Major collection split into multiple passes, each lasting only a few milliseconds.
     ➔ “All in all it was relatively painless work.”
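The idea of splitting the marking work into short passes can be sketched with a generator that yields control back after a fixed amount of work. This is a toy illustration, not PyPy's algorithm: `Obj`, `mark_in_steps` and `budget` are hypothetical, and the sketch assumes the object graph does not change between passes, which a real incremental collector has to handle with write barriers.

```python
class Obj:
    """Toy heap object with a name and outgoing references."""

    def __init__(self, name):
        self.name = name
        self.refs = []


def mark_in_steps(roots, marked, budget):
    """Mark reachable objects, visiting at most `budget` objects per pass."""
    pending = list(roots)
    while pending:
        for _ in range(budget):
            if not pending:
                break
            obj = pending.pop()
            if obj not in marked:
                marked.add(obj)
                pending.extend(obj.refs)
        yield                        # pause: the program would resume here


# A chain o0 -> o1 -> ... -> o5, plus one unreachable object
chain = [Obj(f"o{i}") for i in range(6)]
for parent, child in zip(chain, chain[1:]):
    parent.refs.append(child)
garbage = Obj("garbage")

marked = set()
steps = 0
for _ in mark_in_steps([chain[0]], marked, budget=2):
    steps += 1                       # application code would run between passes

print(steps, len(marked), garbage in marked)
```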
  11. What to do about cycles with ?
     ➔ CPython < 3.4: give up
     ➔ CPython >= 3.4: PEP 442
     ➔ PyPy: sort finalizers into a “reasonable” order and run them
  12. PEP 442 - “Safe Object Finalization”
     ➔ Step 1: Run finalizers on unreachable cycles (arbitrary order). Resurrect any cycles that become reachable again.
     ➔ Step 2: Break references in remaining cycles.
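The effect of PEP 442 can be checked with a cycle whose members both define a finalizer. This sketch assumes CPython 3.4 or later; `Node`, `make_cycle` and `finalized` are illustrative names. On older CPython versions the same objects would instead be left in `gc.garbage`.

```python
import gc

finalized = []                       # names of objects whose finalizer has run


class Node:
    """Hypothetical object with a finalizer and one outgoing reference."""

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


def make_cycle():
    a, b = Node("a"), Node("b")
    a.other, b.other = b, a          # unreachable cycle once the function returns


gc.disable()                         # make the explicit collect the only trigger
make_cycle()
gc.collect()
gc.enable()

leftover = [obj for obj in gc.garbage if isinstance(obj, Node)]
print(sorted(finalized), leftover)   # finalizer order within the cycle is arbitrary
```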
  13. Queries? ([email protected])
     ➔ PyPy Blog (http://morepypy.blogspot.com)
     ➔ GC module documentation
     ➔ Wikipedia article on garbage collection
     ➔ The source