Dissecting memory mysteries of Python

Memory leaks have been a perennial problem for Python applications. They cause an application to misbehave, raising MemoryError and running very slowly. What is wrong with the application? How do you find the cause and fix it? This is the motivation for this talk.

vishalkanaujia

September 13, 2013

Transcript

  1. Typical day for a Python Programmer

    - MemoryError exception
    - Python memory keeps bloating
    - No RAM extension possible
    - How to scale?
  2. Python Memory Allocator

    - Objects/obmalloc.c
    - Diagram labels: Python Object Allocator; Python Raw Memory Allocator; General Purpose Allocator (malloc); Private Heap; Integer; String; OS memory manager
  3. CPython Memory Manager: Object Model

    - Everything is an object
    - Object = {Identity, Value, Type}
    - Immutable v/s mutable
    - Primitive v/s Containers
    - Primitives: numbers, strings
    - Containers: lists, dictionaries, class instances
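
The identity/value/type triple from the slide above can be observed directly. This is a minimal sketch written for Python 3 (an assumption; the deck itself uses Python 2), and the variable names are illustrative only.

```python
# Every Python object has an identity, a type, and a value.
numbers = [1, 2, 3]
alias = numbers            # a second name bound to the same object
copied = list(numbers)     # a new object holding an equal value

same_identity = id(alias) == id(numbers)   # one object, two names
equal_value = copied == numbers            # values compare equal
distinct_object = copied is not numbers    # but the identities differ
object_type = type(numbers)

print(same_identity, equal_value, distinct_object, object_type.__name__)
# prints: True True True list
```

Comparing with `is` checks identity, while `==` checks value, which is why the copy is equal to the original without being the same object.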
  4. Problem 1: Interning

    - Optimization for speed
    - Small objects <= 256 bytes
    - intern(string)
    - Performance bottleneck?
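
To make the `intern(string)` bullet concrete, here is a small sketch. It assumes Python 3, where the built-in `intern` from Python 2 lives at `sys.intern`; the strings and variable names are made up for illustration.

```python
import sys

# Build two equal strings at run time so the compiler cannot merge them.
parts = ["memory", "leak"]
first = " ".join(parts)
second = " ".join(parts)

equal_before = first == second   # same characters

# sys.intern returns the one canonical object for a given string value,
# so interning two equal strings yields the same object both times.
first_interned = sys.intern(first)
second_interned = sys.intern(second)
shared_after = first_interned is second_interned

print(equal_before, shared_after)
# prints: True True
```

Interned strings can be compared by identity and stored once, which is the speed optimization the slide refers to.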
  5. Example: Interning

    @profile
    def freelist():
        l = range(999999)
        del l
        return

    $ python -m memory_profiler memProf.py
    Filename: memProf.py

    Line #    Mem usage    Increment   Line Contents
    ================================================
         3                             @profile
         4     6.859 MB     0.000 MB   def freelist():
         5    22.215 MB    15.355 MB       l = range (999999)
         6    18.398 MB    -3.816 MB       del l
         7    18.398 MB     0.000 MB       return
  6. Problem 2: Object Model Behavior

    a = 42
    b = 42
    L1 = [a, b]
    L2 = [a, b]

    [Diagram: a and b both refer to the single object 42; L1 and L2 are separate lists, each holding &a, &b]
  7. Ref count: immutable objects

    from sys import getrefcount

    def count():
        a = 1  # Object value
        b = 1
        # a and b share a reference to the same object ID
        print "a=", id(a)  # Object identity
        print "b=", id(b)
        # Constant 1 has ref count of +2 (a and b)
        print "getrefcount(1)=", getrefcount(1)
        # Constant 1 has ref count of +3 now (a, b and c)
        c = 1
        print "getrefcount(1)=", getrefcount(1)
        # Decrement the object ref count
        del c
        print "getrefcount(1)=", getrefcount(1)

    count()

    $ python refcount.py
    a= 146243760
    b= 146243760
    getrefcount(1)= 369
    getrefcount(1)= 370
    getrefcount(1)= 369
  8. Ref count: mutable objects

    from sys import getrefcount

    a = 1
    b = 1
    list1 = [a, b]
    list2 = [a, b]
    t1 = (a, b)
    t2 = (a, b)
    # Mutable objects like lists do not share object IDs in spite of similar references
    print "list1=", id(list1)
    print "list2=", id(list2)
    print "t1=", id(t1)
    print "t2=", id(t2)
    # Rebinding the contained names does not modify the immutable containers
    a = 3
    b = 10
    print "t1=", id(t1)
    print "t2=", id(t2)

    $ python ex2.py
    list1= 3077464876
    list2= 3077464460
    t1= 3077561260
    t2= 3077560652
    t1= 3077561260
    t2= 3077560652
  9. Mutable v/s Immutable

    - Similar mutable objects do not share a reference
    - Some immutable objects are interned and shared (CPython caches small integers and some strings)
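
The contrast on this slide can be checked with `is`. The sketch below assumes Python 3 on CPython; the sharing of small integers is a CPython implementation detail, not a language guarantee, and the variable names are illustrative.

```python
# Two equal lists are always two separate objects.
list_a = [1, 2]
list_b = [1, 2]
lists_equal = list_a == list_b
lists_shared = list_a is list_b

# CPython keeps one cached object for each small integer, so two
# independently computed 100s are the same object (implementation detail).
small_a = int("100")
small_b = int("100")
small_shared = small_a is small_b

print(lists_equal, lists_shared, small_shared)
# prints on CPython: True False True
```

Because identity sharing is an optimization, code should compare immutable values with `==` rather than rely on `is`.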
  10. Problem 3: Python Data Size

    - Size of objects differs from C data size
    - sys.getsizeof() reports the accurate flat size of an object
    - What if the application creates too many objects?

    typedef struct {
        PyObject_VAR_HEAD
        long ob_shash;
        int ob_sstate;
        char ob_sval[1];
    } PyStringObject;  // ./Include/stringobject.h
  11. Python Data Size

    from sys import getsizeof

    print getsizeof("Pycon 2013")
    print getsizeof(2**64)
    print getsizeof(123)
    print getsizeof(3.143456)
    print getsizeof(None)
    print getsizeof("")
    # Container sizes
    print getsizeof([])
    print getsizeof([1, 2, 3])
    print getsizeof({})
    print getsizeof({1 : "value", 2: "values"})

    Output:
    31
    22
    12
    16
    8
    21
    32
    44
    136
    136
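
Since `sys.getsizeof()` reports only the flat size, a container's figure excludes the objects it holds. The following is a rough sketch of a recursive total, assuming Python 3; `total_size` is a hypothetical helper written for this illustration, it handles only the built-in containers listed, and it is not the recipe cited in the references.

```python
import sys

def total_size(obj, seen=None):
    """Approximate size in bytes of obj plus the objects it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # count shared or cyclic objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)    # flat size of this object alone
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += total_size(key, seen) + total_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += total_size(item, seen)
    return size

words = ["alpha", "beta", "gamma"]
flat = sys.getsizeof(words)      # the list object only
deep = total_size(words)         # the list plus its three strings
print(flat, deep)
```

The exact numbers depend on the interpreter version and platform, but the deep figure is always larger than the flat one for a non-empty container of distinct objects.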
  12. Python Objects: Observations

    - Mutable objects
      - Similar mutable objects do not share a reference
      - Expensive object size (list, dictionary)
      - Be wise with mutable objects
    - Immutable objects
      - Can be interned
  13. Garbage Collector

    - Ref counting and ref cycles
    - Does not track simple objects like numbers or strings
    - Collects objects in three generations
    - Enabled by default for version 2.0 onwards
    - gc module
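
The points above can be inspected through the `gc` module. This is a small sketch assuming Python 3 on CPython; it only calls `gc.is_tracked`, `gc.get_threshold`, and `gc.isenabled`, and the variable names are illustrative.

```python
import gc

# Atomic objects cannot take part in a cycle, so the collector skips them.
tracked_int = gc.is_tracked(42)
tracked_str = gc.is_tracked("text")
# Containers can form cycles, so they are tracked.
tracked_list = gc.is_tracked([])

# One collection threshold per generation.
thresholds = gc.get_threshold()
enabled = gc.isenabled()

print(tracked_int, tracked_str, tracked_list, len(thresholds), enabled)
```

The threshold values themselves vary between interpreter versions, so the sketch prints only how many there are.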
  14. Garbage Collector: Problems

    - Reference cycles
      - Only container objects can form them
    - Finalizer method
      - __del__()
      - GC has no idea of deletion order
      - Causes cycles to remain uncollected!
  15. Example: ref cycle

    @profile
    def cycle():
        l = range(10 ** 6)
        l.append(l)
        del l

    if __name__ == '__main__':
        cycle()

    $ python -m memory_profiler refCycle.py
    Filename: refCycle.py

    Line #    Mem usage    Increment   Line Contents
    ================================================
         1                             @profile
         2     6.691 MB     0.000 MB   def cycle():
         3    22.031 MB    15.340 MB       l = range (10 ** 6)
         4    22.031 MB     0.000 MB       l.append(l)
         5    22.031 MB     0.000 MB       del l
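
The profile above shows that `del` alone does not release a self-referencing object. A minimal sketch of the same effect, assuming Python 3 on CPython, uses a weak reference as a probe; the `Node` class is hypothetical and exists only because lists cannot be weakly referenced.

```python
import gc
import weakref

class Node:
    """Hypothetical container used only for this illustration."""
    def __init__(self):
        self.me = self           # the object refers to itself: a cycle

gc.disable()                     # keep automatic collection out of the way
try:
    node = Node()
    probe = weakref.ref(node)    # lets us observe whether the object is alive
    del node
    # The cycle keeps the reference count above zero, so the object survives.
    alive_after_del = probe() is not None
    gc.collect()                 # the cycle detector finds and frees it
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)
# prints: True False
```

Reference counting cannot free the object on its own; only a pass of the cyclic collector does.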
  16. Problem 5: Memory leaks in extensions

    - The Python VM is an independent memory manager
    - It has no knowledge of memory used by C/C++ code
    - Use valgrind: massif on extension modules
  17. Problems: Summary

    - Interning
    - Python object model
    - Object mutability
    - Object size
    - Garbage collection
    - Memory leaks in extension modules (C/C++)
  18. memory_profiler

    - Line by line memory consumption
    - Uses "ps"
    - Easy to use
    - Useful to learn VM allocation pattern
  19. memory_profiler

    @profile
    def cycle():
        l = range(999999)
        l.append(l)
        del l
        print "end"

    if __name__ == '__main__':
        cycle()

    $ python -m memory_profiler refCycle.py
    end
    Filename: refCycle.py

    Line #    Mem usage    Increment   Line Contents
    ================================================
         3                             @profile
         4     6.859 MB     0.000 MB   def cycle():
         5    22.211 MB    15.352 MB       l = range (999999)
         6    22.211 MB     0.000 MB       l.append(l)
         7    22.211 MB     0.000 MB       del l
         8    22.215 MB     0.004 MB       print "end"
  20. Objgraph

    - Object references
      - Forward and backward
    - Objects tracked by the garbage collector
    - Useful for:
      - Memory leaks
      - Reference cycles
      - Reference counting bugs
  21. Objgraph: Example

    import objgraph

    def cycle():
        l = range(2)
        l.append(l)
        d = dict(key=l)
        objgraph.show_refs([d], filename='sample-graph.png')

    if __name__ == '__main__':
        cycle()

    [Figure: reference cycle]
  22. Pythontutor

    import objgraph

    def cycle():
        l = range(2)
        l.append(l)
        d = dict(key=l)

    if __name__ == '__main__':
        cycle()
  23. Lessons and Best Practices

    - Use 64-bit Python
    - Know how many objects are created and in what order
    - Avoid loading everything at once; load as required
    - Use xrange over range for iteration
    - Force the garbage collector
    - del data
    - Use weak references
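
The "load as required" and "xrange over range" advice comes down to lazy iteration. The sketch below assumes Python 3, where `range` is already lazy in the way Python 2's `xrange` is, so the eager case is built explicitly with `list`; the names are illustrative.

```python
import sys

n = 10 ** 6

lazy = range(n)            # produces values on demand
eager = list(range(n))     # materializes every element up front

lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)   # flat size of the list object alone

# Iterating the lazy sequence gives the same result as iterating the list.
total = sum(lazy)

print(lazy_size < eager_size, total == sum(eager))
# prints: True True
```

The lazy object stays small no matter how large `n` is, while the list grows with the number of elements.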
  24. Lessons and Best Practices

    - Avoid creating reference cycles
    - Break cycles explicitly!
    - Delete garbage objects list
      - del gc.garbage[:]
    - Try using __slots__
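
One way to read "break cycles explicitly" is to clear the back references before dropping an object, so that plain reference counting can free it. This is a sketch under that assumption, written for Python 3 on CPython; `TreeNode` and its `unlink` method are hypothetical names introduced for the example.

```python
import gc
import weakref

class TreeNode:
    """Hypothetical parent/child structure; each child points back at its parent."""
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self              # back reference creates a cycle
        self.children.append(child)

    def unlink(self):
        """Break the parent/child cycles below this node."""
        for child in self.children:
            child.unlink()
            child.parent = None
        self.children = []

gc.disable()                             # show that no collector pass is needed
try:
    root = TreeNode("root")
    root.add(TreeNode("leaf"))
    probe = weakref.ref(root)
    root.unlink()                        # cycles are gone after this call
    del root                             # refcount drops to zero immediately
    freed_without_gc = probe() is None
finally:
    gc.enable()

print(freed_without_gc)
# prints: True
```

Storing the parent link as a weak reference would be an alternative design that avoids creating the cycle in the first place.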
  25. References

    - Python gc module: http://docs.python.org/2/library/gc.html#gc.garbage
    - Python design FAQ: http://docs.python.org/2/faq/design.html
    - Design of CPython compiler: http://docs.python.org/devguide/compiler.html
    - Python object size computation: http://code.activestate.com/recipes/546530/
    - Garbage Collector: http://arctrix.com/nas/python/gc/
    - GC Code: http://svn.python.org/view/python/trunk/Modules/gcmodule.c
    - Memory profiler: http://fa.bianp.net/blog/2012/line-by-line-report-of-memory-usage/
    - Objgraph: http://mg.pov.lt/objgraph/
    - Using Objgraph: http://www.darkcoding.net/software/finding-memory-leaks-in-python-with-objgraph/
    - Weak references: http://docs.python.org/2/library/weakref.html
  26. Agenda

    - What is the problem?
    - What causes it?
    - CPython memory manager
    - Garbage collection
    - Memory leaks
    - Surgical tools
      - Objgraph, memory_profiler
    - Lessons and advice
  27. slots

    - __slots__: by default, instances of classes keep their attributes in a per-instance dictionary
    - You may see references to a dictionary from an object
    - The dictionary wastes space when a class has few instance variables and many instances
    - __slots__ reserves exactly the space needed for the declared instance variables
    - http://docs.python.org/2/reference/datamodel.html#slots
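
The effect of `__slots__` can be seen by comparing two otherwise identical classes. This is a minimal sketch assuming Python 3; `PointDict` and `PointSlots` are hypothetical classes made up for the comparison.

```python
class PointDict:
    """Regular class: attributes live in a per-instance __dict__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    """Slotted class: space is reserved for x and y only."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = PointDict(1, 2)
q = PointSlots(1, 2)

has_dict_p = hasattr(p, "__dict__")
has_dict_q = hasattr(q, "__dict__")

# Without a __dict__, attributes outside __slots__ cannot be added.
try:
    q.z = 3
    rejected = False
except AttributeError:
    rejected = True

print(has_dict_p, has_dict_q, rejected)
# prints: True False True
```

The saving is per instance, so it matters most for classes with few attributes and very many instances, as the slide notes.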
  28. Mutability

    - By definition, immutable objects such as numbers, strings, tuples, and None are safe from change. Changes to mutable objects such as dictionaries, lists, and class instances can lead to confusion.
    - Because of this, it is good programming practice not to use mutable objects as default values. Instead, use None as the default value and, inside the function, check whether the parameter is None and create a new list/dictionary/whatever if it is.
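
The default-value advice above is easiest to see side by side. The sketch assumes Python 3; both function names are hypothetical.

```python
def append_shared(item, bucket=[]):
    """Problematic: the default list is created once and reused by every call."""
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    """Recommended: use None as the default and create a new list per call."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

shared_first = list(append_shared("a"))    # snapshot of the shared default
shared_second = list(append_shared("b"))   # the earlier "a" is still there
fresh_first = append_fresh("a")
fresh_second = append_fresh("b")

print(shared_first, shared_second)
# prints: ['a'] ['a', 'b']
print(fresh_first, fresh_second)
# prints: ['a'] ['b']
```

The shared default also stays alive for as long as the function does, so anything appended to it is never released.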
  29. Weak References

    - A weak reference cannot keep its referent alive or override garbage collection
    - Useful for caches and for references to large objects
    - It is good to have a reference, but it is okay not to have one
    - Not all objects can be weakly referenced
      - Objects that can include class instances, file objects, generators, sets, and sockets
    - If the referent is dead, you get None
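
The behaviour listed above can be sketched with the standard `weakref` module. This assumes Python 3 on CPython, where dropping the last strong reference frees the object immediately; the `Image` class and the cache key are hypothetical.

```python
import weakref

class Image:
    """Hypothetical stand-in for a large object worth caching."""
    def __init__(self, name):
        self.name = name

cache = weakref.WeakValueDictionary()   # entries vanish with their values
picture = Image("logo")
cache["logo"] = picture
ref = weakref.ref(picture)

alive = ref() is picture                # the referent is still reachable
cached = cache.get("logo") is picture

del picture                             # drop the only strong reference
dead = ref() is None                    # a dead referent yields None
evicted = cache.get("logo") is None     # the cache entry is gone too

# Not every type supports weak references; plain lists do not.
try:
    weakref.ref([1, 2, 3])
    list_supported = True
except TypeError:
    list_supported = False

print(alive, cached, dead, evicted, list_supported)
# prints: True True True True False
```

A cache built this way never extends the lifetime of what it stores, which is the "okay not to have one" property from the slide.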