Vishal Kanaujia - Dissecting memory mysteries of Python

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Vishal Kanaujia, Chetan Giridhar:
Dissecting memory mysteries of Python
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
@ Kiwi PyCon 2013 - Saturday, 07 Sep 2013 - Track 2

Memory leaks have been a perennial problem for Python applications. They cause applications to fail with MemoryError exceptions and to run very slowly. What is wrong with the application? How do we find the cause and fix it?

This is the motivation for this talk.


Python is a dynamically typed language. Applications leave the task of object memory management to the Python VM, which manages memory automatically using reference counting and garbage collection. However, the Python memory manager may bloat the VM size, and sometimes it may consume all of main memory. As a result, applications deliver poor performance and encounter unexpected memory errors.

This talk dissects the internals of the CPython memory manager, its limitations, and its negative impact on application behavior. We demonstrate the problem of memory leaks by studying Python heap patterns, object graphs, and memory profiling. Next, we suggest ways to reduce the memory footprint of applications, tools to diagnose and fix memory leaks, and lessons learned as development best practices.



New Zealand Python User Group

September 07, 2013


  1. Dissecting memory mysteries of Python
     Vishal Kanaujia, Chetan Giridhar
     PyCon Kiwi, 2013
  2. Typical day for a Python Programmer
     - MemoryError exception
     - Python memory keeps bloating
     - No RAM extension possible
     - How to scale?
  3. Python Memory Management

  4. Python Memory Allocator
     - Objects/obmalloc.c
     - Layers shown in the diagram: Python Object Allocator; Python Raw Memory Allocator; General Purpose Allocator (malloc); Private Heap (Integer, String); OS memory manager
  5. CPython Memory Manager: Object Model
     - Everything is an object
     - Object = {Identity, Value, Type}
     - Immutable vs. mutable
     - Primitives vs. containers
       - Primitives: numbers, strings
       - Containers: lists, dictionaries, class instances
  6. Python Object Model

  7. Let’s dissect problems

  8. Problem 1: Interning
     - Optimization for speed
     - Small objects (<= 256 bytes)
     - intern(string)
     - Performance bottleneck?
  9. Example: Interning

         @profile
         def freelist():
             l = range(999999)
             del l
             return

     Profiler run:

         $ python -m memory_profiler memProf.py
         Filename: memProf.py
         Line #    Mem usage    Increment   Line Contents
         ================================================
              3                             @profile
              4     6.859 MB     0.000 MB   def freelist():
              5    22.215 MB    15.355 MB       l = range(999999)
              6    18.398 MB    -3.816 MB       del l
              7    18.398 MB     0.000 MB       return
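
The interning idea from the slide above can be sketched as follows. This is a minimal illustration, assuming Python 3, where the builtin `intern()` used in the slides lives at `sys.intern()`; the variable names are made up for the example. The strings are built at run time so that the compiler cannot merge them as constants.

```python
import sys

# Build equal strings at run time so the compiler cannot merge them as constants.
parts = ["Kiwi", " ", "PyCon", " ", "2013"]
first = "".join(parts)
second = "".join(parts)

# Equal in value; whether they are also the same object is a CPython detail,
# so it is printed here rather than relied on.
print(first == second, first is second)

# sys.intern returns the canonical copy held in the interpreter's intern table,
# so both calls hand back the same object.
interned_first = sys.intern(first)
interned_second = sys.intern(second)
print(interned_first is interned_second)
```

Interning trades a table lookup for a single shared copy, which is why it pays off mainly for strings that recur many times, such as dictionary keys.
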
  10. Problem 2: Object Model Behavior

          a = 42
          b = 42
          L1 = [a, b]
          L2 = [a, b]

      Diagram: a and b both refer to the single object 42; L1 and L2 are two separate lists, each holding the references &a, &b.
  11. Ref count: immutable objects

          from sys import getrefcount

          def count():
              a = 1    # Object value
              b = 1
              # a and b share a reference to the same object ID
              print "a=", id(a)    # Object identity
              print "b=", id(b)
              # Constant 1 has ref count of +2 (a and b)
              print "getrefcount(1)=", getrefcount(1)
              # Constant 1 has ref count of +3 now (a, b and c)
              c = 1
              print "getrefcount(1)=", getrefcount(1)
              # Decrement the object ref count
              del c
              print "getrefcount(1)=", getrefcount(1)

          count()

      Output:

          $ python refcount.py
          a= 146243760
          b= 146243760
          getrefcount(1)= 369
          getrefcount(1)= 370
          getrefcount(1)= 369
  12. Ref count: mutable objects

          from sys import getrefcount

          a = 1
          b = 1
          list1 = [a, b]
          list2 = [a, b]
          t1 = (a, b)
          t2 = (a, b)
          # Mutable objects like lists do not share object IDs
          # in spite of similar references
          print "list1=", id(list1)
          print "list2=", id(list2)
          print "t1=", id(t1)
          print "t2=", id(t2)
          # Changing the contained objects' values does not modify
          # the immutable container
          a = 3
          b = 10
          print "t1=", id(t1)
          print "t2=", id(t2)

      Output:

          $ python ex2.py
          list1= 3077464876
          list2= 3077464460
          t1= 3077561260
          t2= 3077560652
          t1= 3077561260
          t2= 3077560652
  13. Mutable vs. Immutable
      - Similar mutable objects do not share a reference
      - Immutable objects are interned
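
The identity and reference-count behavior described in the last few slides can be checked with a short sketch. It assumes Python 3 on CPython; the sharing of small integers is an implementation detail, and the reference count is taken on a fresh `object()` (a name chosen for this example) because counts for constants such as `1` are not meaningful on recent CPython versions.

```python
import sys

# Two names bound to the same small integer. CPython caches small ints,
# so both names usually refer to one object (an implementation detail).
a = 42
b = 42
same_int_object = id(a) == id(b)

# Two lists with equal contents are still two separate objects.
list1 = [a, b]
list2 = [a, b]
same_list_object = id(list1) == id(list2)

# Binding another name to an object raises its reference count by one.
payload = object()
before = sys.getrefcount(payload)
alias = payload
after = sys.getrefcount(payload)

print(same_int_object, same_list_object, after - before)
```
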
  14. Problem 3: Python Data Size
      - The size of an object differs from the size of the corresponding C data
      - sys.getsizeof() reports the accurate flat size of an object
      - What if the application creates too many objects?

          typedef struct {
              PyObject_VAR_HEAD
              long ob_shash;
              int ob_sstate;
              char ob_sval[1];
          } PyStringObject;    // ./Include/stringobject.h
  15. Python Data Size

          from sys import getsizeof

          print getsizeof("Pycon 2013")
          print getsizeof(2**64)
          print getsizeof(123)
          print getsizeof(3.143456)
          print getsizeof(None)
          print getsizeof("")
          # Container sizes
          print getsizeof([])
          print getsizeof([1, 2, 3])
          print getsizeof({})
          print getsizeof({1: "value", 2: "values"})

      Output:

          31
          22
          12
          16
          8
          21
          32
          44
          136
          136
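
Because `sys.getsizeof()` reports only the flat size, a container's figure excludes the objects it holds, which is why the two dictionaries above report the same size. A rough recursive estimate can be sketched as below. This assumes Python 3; `deep_getsizeof` is a hypothetical helper written for this illustration, not a standard-library function, and it only descends into the built-in container types.

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Rough recursive size in bytes: the object plus everything it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:           # count shared or cyclic objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)     # flat size of this object alone
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

record = {1: "value", 2: ["a", "list", "of", "strings"]}
flat = sys.getsizeof(record)
deep = deep_getsizeof(record)
print(flat, deep)
```

The `seen` set keeps a self-referencing container from recursing forever and avoids counting a shared object twice.
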
  16. Python Objects: Observations
      - Mutable objects
        - Similar mutable objects do not share a reference
        - Expensive object size (list, dictionary)
        - Be wise with mutable objects
      - Immutable objects
        - Interned
  17. Problem 4: Garbage Collector

  18. Garbage Collector
      - Ref counting and ref cycles
      - Does not track simple objects like numbers or strings
      - Collects objects in three generations
      - Enabled by default from version 2.0 onwards
      - gc module
  19. Garbage Collector: Problems
      - Reference cycles
        - Only container objects can form them
      - Finalizer method
        - __del__()
        - GC has no idea of the deletion order
        - Causes cycles to remain uncollected!
  20. Example: ref cycle

          @profile
          def cycle():
              l = range(10 ** 6)
              l.append(l)
              del l

          if __name__ == '__main__':
              cycle()

      Profiler run:

          $ python -m memory_profiler refCycle.py
          Filename: refCycle.py
          Line #    Mem usage    Increment   Line Contents
          ================================================
               1                             @profile
               2     6.691 MB     0.000 MB   def cycle():
               3    22.031 MB    15.340 MB       l = range(10 ** 6)
               4    22.031 MB     0.000 MB       l.append(l)
               5    22.031 MB     0.000 MB       del l
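
The reason `del` alone does not free a cycle can be observed directly. The sketch below assumes Python 3 on CPython; `Node` is a hypothetical class used only for illustration, and a weak reference serves as a probe that does not keep the object alive.

```python
import gc
import weakref

class Node(object):
    """Hypothetical container used only for this illustration."""
    pass

gc.disable()                  # keep the automatic collector out of the way
node = Node()
node.me = node                # the object now refers to itself: a reference cycle
probe = weakref.ref(node)     # observes the object without keeping it alive

del node                      # the name is gone, but the cycle keeps the count above zero
alive_after_del = probe() is not None

gc.collect()                  # the cycle detector finds and frees the unreachable cycle
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_del, alive_after_collect)
```

Reference counting cannot reclaim the object because the self-reference never drops to zero; only the cycle collector in the `gc` module can.
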
  21. Problem 5: Memory leaks in extensions
      - The Python VM is an independent memory manager
      - It has no knowledge of memory used by C/C++ code
      - Use valgrind (massif) on extension modules
  22. Problems: Summary
      - Interning
      - Python object model
      - Object mutability
      - Object size
      - Garbage collection
      - Memory leaks in extension modules (C/C++)
  23. Developer Tools
      - memory_profiler
      - Objgraph
      - Heapy

  24. memory_profiler
      - Line-by-line memory consumption
      - Uses "ps"
      - Easy to use
      - Useful for learning the VM allocation pattern
  25. memory_profiler

          @profile
          def cycle():
              l = range(999999)
              l.append(l)
              del l
              print "end"

          if __name__ == '__main__':
              cycle()

      Profiler run:

          $ python -m memory_profiler refCycle.py
          end
          Filename: refCycle.py
          Line #    Mem usage    Increment   Line Contents
          ================================================
               3                             @profile
               4     6.859 MB     0.000 MB   def cycle():
               5    22.211 MB    15.352 MB       l = range(999999)
               6    22.211 MB     0.000 MB       l.append(l)
               7    22.211 MB     0.000 MB       del l
               8    22.215 MB     0.004 MB       print "end"

      Memory leak!
  26. Objgraph
      - Object references
        - Forward and backward
      - Objects tracked by the garbage collector
      - Useful for:
        - Memory leaks
        - Reference cycles
        - Reference counting bugs
  27. Objgraph: Example

          import objgraph

          def cycle():
              l = range(2)
              l.append(l)
              d = dict(key=l)
              objgraph.show_refs([d], filename='sample-graph.png')

          if __name__ == '__main__':
              cycle()

      Resulting graph: reference cycle
  28. Pythontutor

          import objgraph

          def cycle():
              l = range(2)
              l.append(l)
              d = dict(key=l)

          if __name__ == '__main__':
              cycle()
  29. Lessons and Best Practices
      - Use 64-bit Python
      - How many objects, and in what order
      - Avoid loading everything at once; load as required
      - Use xrange over range for iteration
      - Force the garbage collector
        - del data
      - Use weak references
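
The advice to prefer `xrange` over `range` applies to Python 2, which the slides use. As a hedged sketch of the same idea, the snippet below assumes Python 3, where `range` is already lazy like Python 2's `xrange`, and compares it with a fully materialized list.

```python
import sys

n = 10 ** 6
lazy = range(n)             # Python 3: a small object that computes values on demand
eager = list(range(n))      # materializes one million element references at once

# Flat sizes only, but the gap is already large.
print(sys.getsizeof(lazy), sys.getsizeof(eager))

# Iterating the lazy form never builds the full list in memory.
total = sum(i for i in lazy)
```
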
  30. Lessons and Best Practices
      - Avoid creating reference cycles
      - Break cycles explicitly!
      - Delete the garbage objects list
        - del gc.garbage[:]
      - Try using __slots__
  31. References
      - Python gc module: http://docs.python.org/2/library/gc.html#gc.garbage
      - Python design FAQ: http://docs.python.org/2/faq/design.html
      - Design of CPython compiler: http://docs.python.org/devguide/compiler.html
      - Python object size computation: http://code.activestate.com/recipes/546530/
      - Garbage Collector: http://arctrix.com/nas/python/gc/
      - GC Code: http://svn.python.org/view/python/trunk/Modules/gcmodule.c
      - Memory profiler: http://fa.bianp.net/blog/2012/line-by-line-report-of-memory-usage/
      - Objgraph: http://mg.pov.lt/objgraph/
      - Using Objgraph: http://www.darkcoding.net/software/finding-memory-leaks-in-python-with-objgraph/
      - Weak references: http://docs.python.org/2/library/weakref.html
  32. Contact me
      - http://freethreads.wordpress.com

  33. Q&A

  34. Backup

  35. Agenda
      - What is the problem?
      - What causes it?
        - CPython memory manager
        - Garbage collection
        - Memory leaks
      - Surgical tools
        - Objgraph, memory_profiler
      - Lessons and advice
  36. __slots__
      - By default, instances of a class keep their attributes in a per-instance dictionary
      - You may see references to a dictionary from an object
      - This wastes space when there are few instance variables and a high instance count
      - __slots__ allots exactly the requested space for a fixed sequence of instance variables
      - http://docs.python.org/2/reference/datamodel.html#slots
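
A small sketch of the `__slots__` behavior described above, assuming Python 3; `PointDict` and `PointSlots` are hypothetical classes made up for the comparison.

```python
class PointDict(object):
    """Ordinary class: each instance carries its own attribute dictionary."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots(object):
    """Slotted class: fixed attribute set, no per-instance __dict__."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = PointDict(1, 2)
q = PointSlots(1, 2)

print(hasattr(p, "__dict__"), hasattr(q, "__dict__"))

# The slotted class rejects attributes that were not declared.
try:
    q.z = 3
    rejected = False
except AttributeError:
    rejected = True
print(rejected)
```

The saving comes from dropping the per-instance dictionary, so it matters most when a program holds very many small instances.
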
  37. Mutability
      - By definition, immutable objects such as numbers, strings, tuples, and None are safe from change. Changes to mutable objects such as dictionaries, lists, and class instances can lead to confusion.
      - Because of this, it is good programming practice not to use mutable objects as default values. Instead, use None as the default value and, inside the function, check whether the parameter is None and create a new list, dictionary, or other object if it is.
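
The default-value advice above can be illustrated with two small functions. This is a sketch assuming Python 3; `append_shared` and `append_fresh` are hypothetical names chosen for the example.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, at definition time, and reused on every call.
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # None is the default; a new list is created on each call that needs one.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

shared_first = list(append_shared(1))    # copy to capture the state at this point
shared_second = list(append_shared(2))   # the earlier item is still there
fresh_first = append_fresh(1)
fresh_second = append_fresh(2)

print(shared_first, shared_second, fresh_first, fresh_second)
```

Besides the surprising behavior, the shared default also stays alive for as long as the function does, so it can grow without bound.
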
  38. Weak References
      - Cannot override garbage collection: a weak reference does not keep its referent alive
      - Useful for creating caches of, or references to, large objects
      - It is good to have a reference, but it is okay not to have one
      - Not all objects can be weakly referenced
        - Those that can include file objects, generators, sets, sockets, and class instances
      - If the referent is dead, you get None
  39. Weak References

          import weakref

          class obj:
              pass

          o1 = obj()
          r = weakref.ref(o1)
          o2 = r()
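
The cache use case mentioned for weak references can be sketched with `weakref.WeakValueDictionary` from the standard library. The example assumes Python 3 on CPython; `Image` is a hypothetical stand-in for a large object.

```python
import gc
import weakref

class Image(object):
    """Hypothetical large object; plain class instances can be weakly referenced."""
    def __init__(self, name):
        self.name = name

cache = weakref.WeakValueDictionary()

img = Image("logo")
cache["logo"] = img               # the cache entry does not keep img alive
hit = cache.get("logo") is img

del img                           # drop the only strong reference
gc.collect()                      # not required under CPython refcounting; added for safety
miss = cache.get("logo") is None  # the entry vanished together with the object

print(hit, miss)
```

A caller that gets `None` back simply rebuilds the object, which is the "okay not to have one" trade-off from the previous slide.
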