
Garbage Collection in Python by Benjamin Peterson

PyCon 2014
April 12, 2014

Transcript

  1. Outline
     ➔ How GC works in various Python implementations
       ◆ optimizations
     ➔ GC semantics subtleties
  2. What is GC in Python?
     ➔ Unused objects are finalized and deallocated.
     ➔ When?
       ◆ “eventually”... or never!
       ◆ not running out of memory is good
  3. CPython: reference counting
     ➔ Every object has a count of how many other objects want to keep it alive.
     ➔ New objects have ref count 1.
     ➔ When ref count is 0, the object can be deleted.
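The counting described on this slide can be observed from Python itself. The following is a minimal sketch that assumes CPython; `Payload` is a hypothetical class introduced only for illustration. Note that `sys.getrefcount` reports one extra reference for its own argument, so the sketch compares counts rather than relying on absolute values.

```python
import sys
import weakref


class Payload:
    """Hypothetical class; a plain class so instances can be weakly referenced."""


payload = Payload()
baseline = sys.getrefcount(payload)  # includes the temporary argument reference

alias = payload                      # a second name now keeps the object alive
with_alias = sys.getrefcount(payload)

del alias                            # dropping the name decrements the count
without_alias = sys.getrefcount(payload)

probe = weakref.ref(payload)         # a weak reference does not keep it alive
del payload                          # last strong reference goes away
freed_immediately = probe() is None  # on CPython the object is freed right here
```

On implementations without reference counting, such as PyPy, the last line is not guaranteed to hold, because deallocation happens whenever the collector next runs.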
  4. CPython’s cyclic GC
     ➔ Detects cycles unreachable from the program and deletes them
     ➔ Runs every once in a while on allocation
     ➔ on CPython
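A short sketch of why the cycle detector is needed, assuming CPython and its standard `gc` module; `Node` is a hypothetical class. Automatic collection is switched off for the duration so that the explicit `gc.collect()` call is the only thing that can reclaim the cycle.

```python
import gc
import weakref


class Node:
    """Hypothetical class whose instances point at each other."""

    def __init__(self):
        self.other = None


thresholds = gc.get_threshold()   # allocation counters that trigger automatic runs

gc.disable()                      # keep the automatic collector out of the way
a, b = Node(), Node()
a.other, b.other = b, a           # reference cycle: a -> b -> a
probe = weakref.ref(a)

del a, b                          # ref counts never reach 0 because of the cycle
alive_before = probe() is not None

unreachable_found = gc.collect()  # run the cycle detector explicitly
alive_after = probe() is not None
gc.enable()
```

Reference counting alone leaves the two objects alive after `del a, b`; only the cycle detector reclaims them.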
  5. PyPy review
     ➔ Interpreter written in RPython
     ➔ RPython translated to a low-level language (C)
     ➔ Interpreter is abstracted from low-level details like GC
  6. PyPy GC
     ➔ GC is simply another low-level transform during translation.
     ➔ GC algorithm itself is written in RPython.
     ➔ GC implementation can be selected at translation time.
     ➔ Current default GC: “minimark”
  7. Mark and Sweep
     ➔ Step 1: Starting from known live objects, recursively traverse objects, marking them as reachable.
     ➔ Step 2: Walk all allocated objects, deleting the ones that aren’t marked as alive.
     ➔ No need to worry about reference cycles.
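The two steps can be sketched over a toy object graph. This is an illustration only, not PyPy's implementation: objects are represented by names, `references` is a hypothetical adjacency mapping, and the traversal uses an explicit stack instead of recursion.

```python
def mark(roots, references):
    """Step 1: collect every object reachable from the roots."""
    marked = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in marked:
            continue  # already visited; this is what makes cycles harmless
        marked.add(obj)
        pending.extend(references.get(obj, ()))
    return marked


def sweep(heap, marked):
    """Step 2: split the heap into survivors and garbage."""
    live = {obj for obj in heap if obj in marked}
    garbage = set(heap) - live
    return live, garbage


# Toy heap: a <-> b is a reachable cycle, c <-> d is an unreachable cycle,
# and e is unreachable with no references at all.
heap = {"a", "b", "c", "d", "e"}
references = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}

live, garbage = sweep(heap, mark(["a"], references))
```

The unreachable cycle `c <-> d` ends up in `garbage` without any special handling, which is the point of the last bullet on the slide.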
  8. The nursery
     ➔ Store newly allocated objects in a “nursery”
     ➔ Collect the “nursery” often and move surviving objects elsewhere (minor collection)
     ➔ Garbage collect old objects less frequently (major collection)
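A toy model of the nursery idea, heavily simplified: `ToyGenerationalHeap` and the `is_live` callback are hypothetical, and liveness is supplied by the caller, whereas a real collector would find it by tracing from roots. The sketch only shows how objects move between generations.

```python
class ToyGenerationalHeap:
    """Hypothetical two-generation heap that tracks objects by name."""

    def __init__(self, nursery_size):
        self.nursery_size = nursery_size
        self.nursery = []          # young objects, collected often
        self.old = []              # survivors, collected rarely
        self.minor_collections = 0

    def allocate(self, name, is_live):
        # A full nursery triggers a minor collection before allocating.
        if len(self.nursery) >= self.nursery_size:
            self.minor_collect(is_live)
        self.nursery.append(name)

    def minor_collect(self, is_live):
        # Survivors are promoted to the old generation; the rest are dropped.
        self.old.extend(n for n in self.nursery if is_live(n))
        self.nursery.clear()
        self.minor_collections += 1

    def major_collect(self, is_live):
        # A major collection also examines the old generation.
        self.minor_collect(is_live)
        self.old = [n for n in self.old if is_live(n)]


live_names = {"obj0", "obj5"}
heap = ToyGenerationalHeap(nursery_size=4)
for i in range(10):
    heap.allocate(f"obj{i}", live_names.__contains__)

old_after_minors = list(heap.old)          # promoted survivors so far
nursery_after_minors = list(heap.nursery)  # objects not yet examined

live_names.discard("obj0")                 # an old object becomes garbage
heap.major_collect(live_names.__contains__)
```

In this run the dead `obj0` lingers in the old generation until the major collection, which shows the trade-off: old objects are examined less often, so old garbage lives longer.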
  9. GC Pauses
     ➔ When the GC is running, the program is not.
     ➔ Inconvenient for many long-running programs like servers.
     ➔ A deal-breaker for real-time applications like video processing.
  10. PyPy incremental GC
     ➔ In PyPy 2.2
     ➔ Major collection split into multiple passes, each lasting only a few milliseconds.
     ➔ “All in all it was relatively painless work.”
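The idea of splitting a collection into short passes can be sketched with a generator that performs a bounded amount of marking per step. This is not PyPy's algorithm: `incremental_mark`, `budget`, and the toy graph are hypothetical, and the sketch assumes the object graph does not change between steps. A real incremental collector has to cope with the program mutating objects between passes.

```python
def incremental_mark(roots, references, budget):
    """Mark reachable objects, visiting at most `budget` objects per step."""
    marked = set()
    pending = list(roots)
    while pending:
        for _ in range(budget):
            if not pending:
                break
            obj = pending.pop()
            if obj in marked:
                continue
            marked.add(obj)
            pending.extend(references.get(obj, ()))
        # Hand control back to the program between passes.
        yield set(marked)


# Toy graph: a chain a -> b -> c -> d -> e.
references = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}

snapshots = list(incremental_mark(["a"], references, budget=2))
```

Each element of `snapshots` is the marked set after one bounded pass; the program would run between passes instead of waiting for the whole traversal to finish.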
  11. What to do about cycles with finalizers?
     ➔ CPython < 3.4: give up
     ➔ CPython >= 3.4: PEP 442
     ➔ PyPy: sort finalizers into a “reasonable” order and run them
  12. PEP 442 - “Safe Object Finalization”
     ➔ Step 1: Run finalizers on unreachable cycles (arbitrary order). Resurrect any cycles that become reachable again.
     ➔ Step 2: Break references in remaining cycles.
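Both steps can be observed on CPython 3.4 or later. The sketch below assumes that interpreter; `Node` and `Phoenix` are hypothetical classes. The first part shows finalizers running on an unreachable cycle, the second shows an object that makes itself reachable again from its finalizer and therefore survives the collection.

```python
import gc

finalized = []
survivors = []


class Node:
    """Hypothetical class with a finalizer that records that it ran."""

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


class Phoenix:
    """Hypothetical class whose finalizer resurrects the object."""

    def __del__(self):
        survivors.append(self)  # the object becomes reachable again


# An unreachable cycle whose members both define __del__.
a, b = Node("a"), Node("b")
a.other, b.other = b, a
del a, b

# A self-referencing object that resurrects itself when finalized.
p = Phoenix()
p.me = p
del p

gc.collect()
```

The order in which the two `Node` finalizers run is arbitrary, as the slide says, so code should not depend on it.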
  13. Queries? ([email protected])
     ➔ PyPy Blog (http://morepypy.blogspot.com)
     ➔ GC module documentation
     ➔ Wikipedia article on garbage collection
     ➔ The source