Zekun Li - There and Back Again: Disable and re-enable garbage collector at Instagram

Python's cyclic garbage collector wonderfully hides the complexity of memory management from the programmer. But we pay the price in performance. Ever wondered how that works? In this talk, you'll learn how garbage collection is designed in Python, what the tradeoffs are and how Instagram battled copy-on-write memory issues by disabling the garbage collector entirely.
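As a minimal sketch of why reference counting alone is not enough, the snippet below builds the two-list cycle shown in the slides and lets the cyclic collector reclaim it. The helper name `make_cycle` is illustrative and not from the talk; automatic collection is switched off only to keep the demonstration deterministic.

```python
import gc


def make_cycle():
    """Create two lists that refer to each other, then drop both names."""
    foo = []
    bar = []
    foo.append(bar)
    bar.append(foo)
    # On return, the names foo and bar disappear, but each list is still
    # referenced by the other, so neither reference count reaches zero.


gc.disable()   # no automatic collections while the demo runs
gc.collect()   # clear out any garbage that already exists
make_cycle()
# gc.collect() returns the number of unreachable objects it found,
# which here includes the two lists from make_cycle().
collected = gc.collect()
gc.enable()
print("unreachable objects found:", collected)
```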

You'll also learn why that isn't such a great idea after all and how we ended up extending the garbage collector API which allowed us to (mostly) re-enable garbage collection. We'll discuss our upstream contributions to the garbage collector that landed in Python 3.6 and 3.7.
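For orientation, here is a short sketch of the freeze API as it is exposed by the standard `gc` module from Python 3.7 onward (`gc.freeze()`, `gc.unfreeze()`, `gc.get_freeze_count()`). It only shows that objects move into and out of the permanent generation; it does not attempt to reproduce Instagram's setup.

```python
import gc

gc.collect()                      # start from a clean state
before = gc.get_freeze_count()    # objects already in the permanent generation

gc.freeze()                       # move all currently tracked objects there
frozen = gc.get_freeze_count()    # these objects are ignored by later collections

gc.unfreeze()                     # move them back into the oldest generation
after = gc.get_freeze_count()

print(before, frozen, after)
```

Frozen objects are skipped by subsequent collections, so the collector no longer touches their headers.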

This is an in-depth talk about memory management but no prior experience with CPython internals is necessary to follow it.

https://us.pycon.org/2018/schedule/presentation/72/

PyCon 2018

May 11, 2018

Transcript

  1. AGENDA

    1 Background
    2 Why and How Instagram disable GC
    3 Why and How Instagram re-enable GC, mostly
  2. MEMORY MANAGEMENT IN PYTHON

    foo = []
    bar = []
    del foo
    del bar

    [Diagram labels: __main__, bar, foo]
  3. MEMORY MANAGEMENT IN PYTHON

    foo = []
    bar = []
    foo.append(bar)
    bar.append(foo)
    del foo
    del bar

    [Diagram labels: __main__, bar, foo]
  4. OBSERVATION

    [Chart labels: 500 MB, 350 MB, 150 MB, 150 MB, 150 MB, 150 MB, 150 MB, 150 MB]
  5. [Figure only]

  6. WHY GARBAGE COLLECTION?

    /* GC information is stored BEFORE the object structure. */
    typedef union _gc_head {
        struct {
            union _gc_head *gc_next;
            union _gc_head *gc_prev;
            Py_ssize_t gc_refs;
        } gc;
        double dummy; /* force worst-case alignment */
    } PyGC_Head;
  7. REDESIGN GC HEAD

    /* GC information is stored BEFORE the object structure. */
    typedef union _gc_head {
        struct {
            union _gc_head *gc_next;
            union _gc_head *gc_prev;
            Py_ssize_t gc_refs;
        } gc;
        double dummy; /* force worst-case alignment */
    } PyGC_Head;
  8. REDESIGN GC HEAD

    /* GC information is stored BEFORE the object structure. */
    typedef union _gc_head {
        struct {
            union _gc_head *head;
        } gc_ptr;
        double dummy; /* force worst-case alignment */
    } PyGC_Head_Ptr;
  9. REDESIGN GC HEAD

    [Diagram labels: Memory Page (x5); PyGC_HEAD, PyObject, PyGC_HEAD_Ptr]
  10. WORKING PROOF

    lists = []
    strs = []
    for i in range(16000):
        lists.append([])
        for j in range(40):
            strs.append(' ' * 8)

    [Slide labels: ~60MB, ~0.9MB]
  11. POINTERS TAKE SPACE

    16 BYTES, 1,000,000 OBJECTS, 80 PROCESSES: ~1 GB

    [Diagram labels: Memory Page (x2)]
  12. FREEZE OBJECTS

    static PyObject *
    gc_freeze(PyObject *module)
    {
        for (int i = 0; i < NUM_GENERATIONS; ++i) {
            gc_list_merge(GEN_HEAD(i), &permanent_generation.head);
            generations[i].count = 0;
        }
        Py_RETURN_NONE;
    }

    gc.freeze()  # upstream to python 3.7!
  13. WRAP UP

    1 Disable garbage collection to avoid copy-on-write
    2 Re-enable garbage collection by freezing objects
    3 No gc before fork to avoid copy-on-write by re-allocation
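
The wrap-up points can be sketched as a pre-fork pattern. It follows the order suggested in the standard `gc` module documentation: disable collection in the parent, freeze right before `fork()`, and re-enable collection in the child. The function name `run_in_forked_child` and its `work` parameter are hypothetical, the code is POSIX-only, and it is an illustration under those assumptions rather than Instagram's actual server code.

```python
import gc
import os


def run_in_forked_child(work):
    """Fork once, run work() in the child, and return the child's exit code."""
    gc.disable()   # parent: no collections while shared state is built
    # ... application warm-up (imports, caches) would happen here ...
    gc.freeze()    # park every tracked object in the permanent generation

    pid = os.fork()
    if pid == 0:
        # Child: collect only objects created after the fork.
        gc.enable()
        try:
            work()
            code = 0
        except BaseException:
            code = 1
        os._exit(code)   # leave without running the parent's cleanup handlers

    # Parent: wait for the child, then restore normal collector state.
    # A long-lived pre-fork server would likely stay frozen instead.
    _, status = os.waitpid(pid, 0)
    gc.unfreeze()
    gc.enable()
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_in_forked_child(lambda: sum(range(10))))
```

Skipping a collection in the parent right before the fork is deliberate: memory freed there could be reused by new allocations in the children, which would again dirty pages that are otherwise shared.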