Garbage Collection in Python by Benjamin Peterson

PyCon 2014
April 12, 2014

Transcript

  1. Garbage Collection Benjamin Peterson

  2. Outline
    ➔ How GC works in various Python implementations
      ◆ optimizations
    ➔ GC semantics subtleties
  3. Part 1 Implementation Basics

  4. Bias & Disclaimer

  5. What is GC in Python?
    ➔ Unused objects are finalized and deallocated.
    ➔ When?
      ◆ “eventually”... or never!
      ◆ not running out of memory is good
  6. CPython

  7. CPython: reference counting
    ➔ Every object has a count of how many other objects want to keep it alive.
    ➔ New objects have ref count 1.
    ➔ When ref count is 0, the object can be deleted.
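The counting rule on this slide can be observed directly on CPython. The following is a minimal sketch, assuming CPython (other implementations may not expose meaningful counts); the `Node` class and the variable names are hypothetical and exist only for illustration.

```python
import sys
import weakref


class Node:
    """Hypothetical placeholder class (a bare object() cannot be weakly referenced)."""


obj = Node()
base = sys.getrefcount(obj)        # includes one temporary reference held by the call itself

alias = obj                        # a second name now keeps the object alive
after_alias = sys.getrefcount(obj)

probe = weakref.ref(obj)           # weak references do not raise the count
after_weakref = sys.getrefcount(obj)

del alias                          # count drops by one
del obj                            # count reaches zero; CPython deallocates right away
freed = probe() is None            # a dead weak reference returns None

print(after_alias - base, after_weakref - after_alias, freed)
```

Only the differences between counts are compared, because the absolute value reported by `sys.getrefcount` depends on interpreter internals.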
  8. New object Preexisting objects

  9. Preexisting objects New object

  10. Preexisting objects Dead object

  11. Preexisting objects Dead object Dead object

  12. refcounting example

  13. Reference Counting’s Major Flaw Reference Cycles

  14. Preexisting objects

  15. Preexisting objects Cycle keeps itself alive!

  16. CPython’s cyclic GC
    ➔ Detects cycles unreachable from the program and deletes them
    ➔ Runs every once in a while on allocation
    ➔ on CPython
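A short sketch of both points, a cycle that reference counting alone cannot free and the cyclic collector reclaiming it, assuming CPython and its standard `gc` module. The `Node` class is hypothetical; automatic collection is switched off only so that the explicit `gc.collect()` call is the one doing the work.

```python
import gc
import weakref


class Node:
    """Hypothetical container used only for this demonstration."""

    def __init__(self):
        self.other = None


thresholds = gc.get_threshold()    # allocation-count thresholds that trigger automatic runs

gc.disable()                       # keep the automatic collector out of the way for the demo
try:
    a, b = Node(), Node()
    a.other, b.other = b, a        # a -> b -> a: a reference cycle
    probe = weakref.ref(a)

    del a, b                       # no names left, but each object still holds the other
    alive_after_del = probe() is not None

    gc.collect()                   # run the cycle detector explicitly
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(thresholds, alive_after_del, alive_after_collect)
```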
  17. Preexisting objects Cyclic GC subtracts internal references

  18. Preexisting objects Cycles are now deleted
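The "subtract internal references" idea can be sketched on a toy object graph. This is an illustrative model, not CPython's actual implementation: objects are plain strings, `edges` and `refcounts` are hypothetical inputs standing in for what the interpreter already knows, and `find_unreachable` is a name invented here.

```python
def find_unreachable(edges, refcounts):
    """Return the names that are only kept alive by references among themselves.

    edges: dict mapping each tracked object to the objects it references.
    refcounts: dict mapping each tracked object to its full reference count.
    """
    # Work on a copy so the "real" reference counts are left untouched.
    remaining = dict(refcounts)

    # Subtract every reference that originates inside the tracked set.
    for targets in edges.values():
        for target in targets:
            remaining[target] -= 1

    # A positive remainder means something outside the set still points here.
    stack = [name for name, count in remaining.items() if count > 0]

    # Everything reachable from those externally referenced objects survives.
    reachable = set()
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(edges[name])

    return set(edges) - reachable


# A and B form an isolated cycle; C is referenced from outside and points to D.
edges = {"A": ["B"], "B": ["A"], "C": ["D"], "D": []}
refcounts = {"A": 1, "B": 1, "C": 1, "D": 1}

garbage = find_unreachable(edges, refcounts)
print(sorted(garbage))  # ['A', 'B']
```

Note that D also drops to zero after the subtraction, yet it survives because it is reachable from C. That is why the traversal step is needed in addition to the subtraction.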

  19. None
  20. PyPy review
    ➔ Interpreter written in RPython
    ➔ RPython translated to low level language (C)
    ➔ Interpreter is abstracted from low-level details like GC
  21. PyPy has pluggable GCs

  22. PyPy GC
    ➔ GC is simply another low-level transform during translation.
    ➔ GC algorithm itself is written in RPython.
    ➔ GC implementation can be selected at translation time.
    ➔ Current default GC: “minimark”
  23. Mark and Sweep
    ➔ Step 1: Starting from known live objects, recursively traverse objects, marking them as reachable.
    ➔ Step 2: Walk all allocated objects, deleting the ones that aren’t marked as alive.
    ➔ No need to worry about reference cycles.
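The two steps can be sketched on a toy heap. This is a simplified model rather than PyPy's collector: the heap is a hypothetical dict from object name to the names it references, and `mark` and `sweep` are names chosen for this example.

```python
def mark(roots, heap):
    """Step 1: return the set of objects reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name not in marked:     # skipping marked objects makes cycles harmless
            marked.add(name)
            stack.extend(heap[name])
    return marked


def sweep(heap, marked):
    """Step 2: walk every allocated object and delete the unmarked ones."""
    for name in list(heap):        # copy the keys because the dict is mutated
        if name not in marked:
            del heap[name]


# A is a root. B and C form a reachable cycle; D and E form an unreachable one.
heap = {"A": ["B"], "B": ["C"], "C": ["B"], "D": ["E"], "E": ["D"]}
marked = mark(["A"], heap)
sweep(heap, marked)

print(sorted(heap))  # ['A', 'B', 'C']
```

The unreachable D/E cycle is removed without any special handling, while the reachable B/C cycle is kept, which is the point of the last bullet.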
  24. Preexisting objects Unreachable object Marking

  25. Preexisting objects Unreachable object Marking GC traverses references to C.

  26. Preexisting objects Unreachable object Sweeping GC notices that D is unreachable and deletes it.
  27. Fancy PyPy GC Optimizations

  28. Python allocates constantly

  29. “High Infant Mortality”

  30. The nursery
    ➔ Store newly allocated objects in a “nursery”
    ➔ Collect the “nursery” often and move surviving objects elsewhere (minor collection)
    ➔ Garbage collect old objects less frequently (major collection)
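A toy model of the nursery idea is sketched below. It is deliberately simplified and is not how PyPy implements it: liveness is decided by a caller-supplied predicate instead of tracing from roots, objects are plain integers, and the `ToyHeap` class with all of its attributes is hypothetical.

```python
class ToyHeap:
    """Two-generation heap: a small nursery plus an old space."""

    def __init__(self, nursery_size=4):
        self.nursery_size = nursery_size
        self.nursery = []
        self.old = []
        self.minor_collections = 0

    def allocate(self, obj, is_live):
        # A full nursery triggers a cheap minor collection before allocating.
        if len(self.nursery) >= self.nursery_size:
            self.minor_collect(is_live)
        self.nursery.append(obj)

    def minor_collect(self, is_live):
        # Survivors move to the old space; everything else in the nursery dies.
        self.old.extend(obj for obj in self.nursery if is_live(obj))
        self.nursery.clear()
        self.minor_collections += 1

    def major_collect(self, is_live):
        # The less frequent, more expensive pass over the old objects.
        self.old = [obj for obj in self.old if is_live(obj)]


def long_lived(obj):
    """Assumption for the demo: only multiples of 5 stay alive."""
    return obj % 5 == 0


heap = ToyHeap(nursery_size=4)
for number in range(10):
    heap.allocate(number, long_lived)

print(heap.old, heap.nursery, heap.minor_collections)  # [0, 5] [8, 9] 2
```

Most objects die in the nursery and are never examined again, so the old space stays small and the frequent collections stay cheap.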
  31. GC Pauses
    ➔ When the GC is running, the program is not.
    ➔ Inconvenient for many long-running programs like servers.
    ➔ A deal-breaker for real-time applications like video processing.
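Pauses can be made visible with a small measurement. The sketch below uses CPython's `gc.callbacks` hook, which is an assumption about the runtime (the slide itself sits in the PyPy part of the talk); `record_pause`, `pauses`, and `_started` are hypothetical names.

```python
import gc
import time

pauses = []      # seconds spent inside each collection
_started = []    # start timestamps, kept as a stack in case collections nest


def record_pause(phase, info):
    """Callback invoked by the collector with phase "start" or "stop"."""
    if phase == "start":
        _started.append(time.perf_counter())
    elif phase == "stop" and _started:
        pauses.append(time.perf_counter() - _started.pop())


gc.callbacks.append(record_pause)
try:
    # Build some cyclic garbage so the collector has work to do.
    for _ in range(1000):
        cell = []
        cell.append(cell)
    gc.collect()
finally:
    gc.callbacks.remove(record_pause)

print(f"{len(pauses)} collection(s), longest {max(pauses) * 1e3:.3f} ms")
```

While each recorded interval is running, no Python code in the program makes progress, which is the cost this slide describes.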
  32. PyPy incremental GC
    ➔ In PyPy 2.2
    ➔ Major collection split into multiple passes, each lasting only a few milliseconds.
    ➔ “All in all it was relatively painless work.”
  33. GC in PyPy summary
    ➔ Pluggable
    ➔ Generational
    ➔ Incremental
    ➔ Integrated with the JIT
  34. Part 2 GC Semantics

  35. kills kittens

  36. Cycles with finalizers: a conundrum
    Which finalizer to run first?
    Reference to the alive world.
  37. What to do about cycles with finalizers?
    ➔ CPython < 3.4: give up
    ➔ CPython >= 3.4: PEP 442
    ➔ PyPy: sort finalizers into a “reasonable” order and run them
  38. PEP 442 - “Safe Object Finalization”
    ➔ Step 1: Run finalizers on unreachable cycles (arbitrary order). Resurrect any cycles that become reachable again.
    ➔ Step 2: Break references in remaining cycles.
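Both steps can be observed from Python code. This is a minimal sketch, assuming CPython 3.4 or later; the `Node` and `Phoenix` classes and the `finalized` and `survivors` lists are hypothetical names used only here.

```python
import gc

finalized = []   # names of nodes whose finalizer has run
survivors = []   # objects that resurrect themselves from their finalizer


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


class Phoenix:
    def __init__(self):
        self.me = self               # a one-object reference cycle

    def __del__(self):
        survivors.append(self)       # makes the object reachable again


# A two-object cycle in which both objects have finalizers.
a, b = Node("a"), Node("b")
a.other, b.other = b, a
del a, b

p = Phoenix()
del p

gc.collect()

print(sorted(finalized))                 # both finalizers ran, in an unspecified order
print(len(survivors))                    # the resurrected object is still around
print(survivors[0].me is survivors[0])   # its references were not broken
```

The order in which the two `Node` finalizers run is arbitrary, so the sketch sorts the names before printing them.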
  39. Themes
    ➔ Garbage Collection is hard.
    ➔ (PyPy’s) GCs are awesome!
  40. Queries? (benjamin@python.org)
    ➔ PyPy Blog (http://morepypy.blogspot.com)
    ➔ GC module documentation
    ➔ Wikipedia article on garbage collection
    ➔ The source