Slide 1

Slide 1 text

BASICS OF MEMORY MANAGEMENT IN PYTHON Nina Zakharenko

Slide 2

Slide 2 text

WHY SHOULD YOU CARE? Knowing about memory management helps you write more efficient code.

Slide 3

Slide 3 text

WHAT WILL YOU GET? ∎Vocabulary ∎Basic Concepts ∎Foundation

Slide 4

Slide 4 text

WHAT WON’T YOU GET? You won’t be an expert at the end of this talk.

Slide 5

Slide 5 text

WHAT’S A VARIABLE?

Slide 6

Slide 6 text

What’s a C-style variable?
variable | memory location | value
a        | 0x3E8           | 101
b        | 0x3E9           | 101
These values live in a fixed-size bucket. It can only hold same-sized data, or an overflow occurs.

Slide 7

Slide 7 text

What’s a C-style variable?
memory location | value
0x3E8           | 101
0x3E9           | 101
Later… 110. The data in this memory location is overwritten.

Slide 8

Slide 8 text

PYTHON HAS NAMES, NOT VARIABLES

Slide 9

Slide 9 text

How are Python objects stored in memory? names -> references -> objects

Slide 10

Slide 10 text

A name is just a label for an object. In Python, each object can have lots of names.
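A minimal sketch of one object with several names. The names `groceries`, `shopping`, and `copy_of_list` are made up for illustration.

```python
# Two names bound to one list object.
groceries = ["eggs"]
shopping = groceries              # a second label for the same object, not a copy

shopping.append("milk")           # mutate the object through one name...
print(groceries)                  # ...and the change is visible through the other
print(groceries is shopping)      # `is` compares identity (same object?)

copy_of_list = list(groceries)    # a genuinely new object with equal contents
print(copy_of_list == groceries)  # equal values
print(copy_of_list is groceries)  # but not the same object
```

Assignment binds a name to an object. It never copies the object.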

Slide 11

Slide 11 text

Different Types of Objects
Simple
• numbers
• strings
Containers
• dict
• list
• user-defined classes

Slide 12

Slide 12 text

What is a reference? A name or a container object pointing at another object.

Slide 13

Slide 13 text

What is a reference count?

Slide 14

Slide 14 text

How can we increase the ref count?
    x = 300
Object 300, references: 1 (+1 from x)

Slide 15

Slide 15 text

How can we increase the ref count?
    x = 300
    y = 300
Object 300, references: 2 (+1 from y)

Slide 16

Slide 16 text

How can we increase the ref count?
    z = [300, 300]
Object 300, references: 4 (x, y, and the two slots in the list z)

Slide 17

Slide 17 text

Decrease Ref Count - del
    x = 300
    y = 300
    del x
Object 300, references: 1 (only y still points at it)

Slide 18

Slide 18 text

What does del do? The del statement doesn’t delete objects. It: • removes that name as a reference to that object • reduces the ref count by 1
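The counting described so far can be observed in CPython with `sys.getrefcount`. This is only a rough sketch: the reported number includes a temporary reference held by the call itself and is an implementation detail, so the example compares differences rather than absolute values. It uses a plain `object()` instead of the number 300 because integers may be cached or shared by the interpreter. All variable names are illustrative.

```python
import sys

obj = object()                    # one name bound to a fresh object
base = sys.getrefcount(obj)

alias = obj                       # a second name: +1
after_alias = sys.getrefcount(obj)

container = [obj, obj]            # two slots in a list: +2
after_list = sys.getrefcount(obj)

del alias                         # removes the name, not the object: -1
after_del = sys.getrefcount(obj)

print(after_alias - base)         # change caused by the extra name
print(after_list - after_alias)   # change caused by the list
print(after_del - after_list)     # change caused by del
```

The object is still alive after `del alias`, because `obj` and `container` keep referring to it.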

Slide 19

Slide 19 text

Decrease Ref Count - Change Reference x = 300 y = 300 300 references:0 y y = None

Slide 20

Slide 20 text

Decrease Ref Count - Going out of Scope
    def print_word():
        word = 'Seven'               # ref count +1
        print('Word is ' + word)

    print_word()                     # 'Seven' is out of scope: ref count -1
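A sketch of the same effect that can be checked from the outside, assuming CPython's reference counting. Strings cannot be weakly referenced, so the example wraps the word in a small hypothetical `Word` class and watches it through a `weakref.ref`, which does not add to the reference count.

```python
import weakref


class Word:
    """Tiny wrapper so the object can be watched with a weak reference."""

    def __init__(self, text):
        self.text = text


def make_word():
    word = Word("Seven")          # local name: ref count +1
    probe = weakref.ref(word)     # a weak reference does not keep the object alive
    return probe                  # `word` goes out of scope here: ref count -1


probe = make_word()
print(probe() is None)            # the Word object is gone once the function returns
```

On interpreters without reference counting, the object may linger until a later collection.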

Slide 21

Slide 21 text

local vs. global namespace ■If refcounts decrease when an object goes out of scope, what happens to objects in the global namespace? ■Never go out of scope! Refcount never reaches 0. ■Avoid putting large or complex objects in the global namespace.

Slide 22

Slide 22 text

Every Python object holds 3 things ∎Its type ∎Its value ∎A reference count

Slide 23

Slide 23 text

PyObject: type integer, refcount 2, value 300
Names x and y are both references to it.

Slide 24

Slide 24 text

    x = 300
    y = 300
    print(id(x))
    > 28501818
    print(id(y))
    > 28501818
    print(x is y)
    > True
* Don’t try this in an interactive environment (REPL).

Slide 25

Slide 25 text

GARBAGE COLLECTION

Slide 26

Slide 26 text

What is Garbage Collection? A way for a program to automatically release memory when the object taking up that space is no longer in use.

Slide 27

Slide 27 text

Two Main Types of Garbage Collection Reference Counting Tracing

Slide 28

Slide 28 text

How does reference counting garbage collection work? Add and Remove References Refcount Reaches 0 Cascading Effect

Slide 29

Slide 29 text

Reference Counting Garbage Collection
The Good
• Easy to implement
• When refcount is 0, objects are immediately deleted.
The Bad
• space overhead: a reference count is stored for every object
• execution overhead: the reference count changes on every assignment

Slide 30

Slide 30 text

Reference Counting Garbage Collection
The Ugly
• Not generally thread safe
• Reference counting doesn’t detect cyclical references

Slide 31

Slide 31 text

Cyclical References By Example
    class Node:
        def __init__(self, value):
            self.value = value

        def next(self, next):
            self.next = next

Slide 32

Slide 32 text

What’s a cyclical reference?
    root = Node('root')
    left = Node('left')
    right = Node('right')
    root.next(left)
    left.next(right)
    right.next(left)
root: rc = 1, left: rc = 3, right: rc = 2

Slide 33

Slide 33 text

What’s a cyclical reference?
    del root
    del left
    del right
root: rc = 0, left: rc = 1, right: rc = 1

Slide 34

Slide 34 text

Reference counting alone will not garbage collect objects with cyclical references.
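A minimal sketch of this in CPython. It builds a two-node cycle, drops both names, and uses a weak reference to check whether the object is still alive. The cycle collector is switched off temporarily so that only reference counting is at work. The `Node` class here stores `next` as a plain attribute, and the names `probe`, `alive_before`, and `alive_after` are illustrative.

```python
import gc
import weakref


class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


was_enabled = gc.isenabled()
gc.disable()                      # leave only reference counting active

left = Node("left")
right = Node("right")
left.next = right                 # left -> right
right.next = left                 # right -> left: a cycle

probe = weakref.ref(left)         # watch `left` without adding a reference
del left, right                   # both names are gone, but each rc is still 1

alive_before = probe() is not None
gc.collect()                      # run the cycle detector explicitly
alive_after = probe() is not None

if was_enabled:
    gc.enable()                   # restore the previous collector state

print(alive_before)               # the cycle kept the objects alive
print(alive_after)                # the collector reclaimed them
```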

Slide 35

Slide 35 text

Two Main Types of Garbage Collection Reference Counting Tracing

Slide 36

Slide 36 text

Tracing Garbage Collection ■source: http://webappguru.blogspot.com/2015/11/mark-and-sweep-garbage-collection.html

Slide 37

Slide 37 text

Tracing Garbage Collection ■source: http://webappguru.blogspot.com/2015/11/mark-and-sweep-garbage-collection.html

Slide 38

Slide 38 text

What does Python use? Reference Counting + Generational

Slide 39

Slide 39 text

Generational Garbage Collection is based on the theory that most objects die young. ■ source: http://cs.ucsb.edu/~ckrintz/racelab/gc/papers/hoelzle-jvm98.pdf

Slide 40

Slide 40 text

Python maintains a list of every object created as a program is run. Actually, it makes 3. generation 0 generation 1 generation 2 Newly created objects are stored in generation 0.

Slide 41

Slide 41 text

Only container objects with a refcount greater than 0 will be stored in a generation list.

Slide 42

Slide 42 text

When the number of objects in a generation reaches a threshold, Python runs a garbage collection algorithm on that generation and on any generations younger than it.
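The thresholds and current counters can be inspected with the standard `gc` module. A small sketch follows. The actual numbers depend on the interpreter version and on what the program has allocated so far, so none are shown here.

```python
import gc

thresholds = gc.get_threshold()   # one threshold per generation
counts = gc.get_count()           # current counters for the three generations

for generation, (count, threshold) in enumerate(zip(counts, thresholds)):
    print(f"generation {generation}: count={count}, threshold={threshold}")
```

`gc.set_threshold()` changes these values. That is rarely needed and is mentioned only as a pointer for further reading.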

Slide 43

Slide 43 text

What happens during a generational garbage collection cycle? Python makes a list for objects to discard. It runs an algorithm to detect reference cycles. If an object has no outside references, it’s put on the discard list. When the cycle is done, it frees up the objects on the discard list.

Slide 44

Slide 44 text

After a garbage collection cycle, objects that survived will be promoted to the next generation. Objects in the last generation (2) stay there as the program executes.

Slide 45

Slide 45 text

When the ref count reaches 0, you get immediate clean up. If you have a cycle, you need to wait for garbage collection.

Slide 46

Slide 46 text

REFERENCE COUNTING GOTCHAS

Slide 47

Slide 47 text

Reference counting is not generally thread-safe. We’ll see why this is a big deal™ later.

Slide 48

Slide 48 text

Remember our cycle from before? left right rc = 1 rc = 1 Cyclical references get cleaned up by generational garbage collection.

Slide 49

Slide 49 text

Gotcha! Cyclical Reference Cleanup
Except in Python 2, if the objects in the cycle have a __del__ method.
** Fixed in Python 3.4: https://www.python.org/dev/peps/pep-0442/

Slide 50

Slide 50 text

The __del__ magic method ■ Sometimes called a “destructor” ■Not the del statement. ■ Runs before an object is removed from memory
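A short sketch of the difference between the `del` statement and the `__del__` method, assuming CPython's immediate cleanup when the reference count reaches 0. The `Resource` class and the `events` list are hypothetical and exist only to record when the method runs.

```python
events = []


class Resource:
    def __del__(self):
        # Called just before the object is removed from memory.
        events.append("finalized")


first = Resource()
second = first                    # two names, one object

del first                         # removes one name only
after_first_del = list(events)    # nothing recorded yet: `second` still refers to it

del second                        # last reference gone
after_second_del = list(events)   # __del__ has now run

print(after_first_del)
print(after_second_del)
```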

Slide 51

Slide 51 text

__slots__

Slide 52

Slide 52 text

What are __slots__?
    class Dog(object):
        pass

    buddy = Dog()
    buddy.name = 'Buddy'
    print(buddy.__dict__)
    {'name': 'Buddy'}

Slide 53

Slide 53 text

What are __slots__?
    'Pug'.name = 'Fred'

    AttributeError      Traceback (most recent call last)
    ----> 1 'Pug'.name = 'Fred'
    AttributeError: 'str' object has no attribute 'name'

Slide 54

Slide 54 text

What are __slots__?
    class Point(object):
        __slots__ = ('x', 'y')

    point = Point()
    point.x = 5
    point.y = 7
    point.name = "Fred"

    Traceback (most recent call last):
      File "point.py", line 8, in <module>
        point.name = "Fred"
    AttributeError: 'Point' object has no attribute 'name'
What is the type of __slots__?
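A runnable sketch of the slide's example, with a hypothetical `LoosePoint` class added for contrast. The slotted class has no per-instance `__dict__`, so unknown attributes are rejected. The regular class accepts them.

```python
class Point:
    __slots__ = ("x", "y")        # the only attributes instances may have


class LoosePoint:
    pass                          # regular class: attributes live in __dict__


point = Point()
point.x = 5
point.y = 7

try:
    point.name = "Fred"           # not listed in __slots__
    slot_error = None
except AttributeError as exc:
    slot_error = exc

loose = LoosePoint()
loose.name = "Fred"               # fine: stored in the instance __dict__

print(slot_error)
print(hasattr(point, "__dict__"))
print(loose.__dict__)
print(type(Point.__slots__))      # answers the question on the slide
```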

Slide 55

Slide 55 text

size of dict vs. size of tuple
    import sys
    sys.getsizeof(dict())
    sys.getsizeof(tuple())
sizeof dict: 288 bytes
sizeof tuple: 48 bytes
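The exact byte counts vary with the Python version and platform, so the figures on the slide may not match a given machine. A sketch that prints the local values and compares them instead of hard-coding numbers:

```python
import sys

dict_size = sys.getsizeof(dict())     # size of an empty dict, in bytes
tuple_size = sys.getsizeof(tuple())   # size of an empty tuple, in bytes

print(f"sizeof dict:  {dict_size} bytes")
print(f"sizeof tuple: {tuple_size} bytes")
print(dict_size > tuple_size)         # the dict is the heavier container
```

`sys.getsizeof` is shallow. It reports the container itself and not the objects it refers to.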

Slide 56

Slide 56 text

When would we want to use __slots__? ■ If we’re going to be creating many instances of a class ■If we know in advance what properties the class should have

Slide 57

Slide 57 text

WHAT’S A GIL?

Slide 58

Slide 58 text

GLOBAL INTERPRETER LOCK

Slide 59

Slide 59 text

Only one thread can run in the interpreter at a time.

Slide 60

Slide 60 text

Advantages / Disadvantages of a GIL
Upside: Fast & Simple Garbage Collection
Downside: In a Python program, no matter how many threads exist, only one thread will be executed at a time.

Slide 61

Slide 61 text

Want to take advantage of multiple CPUs?
■ Use multi-processing instead of multi-threading.
■ Each process will have its own GIL; it’s on the developer to figure out a way to share information between processes.

Slide 62

Slide 62 text

If the GIL limits us, can’t we just remove it? additional reading: https://docs.python.org/3/faq/library.html#can-t-we-get-rid-of-the-global-interpreter-lock

Slide 63

Slide 63 text

For better or for worse, the GIL is here to stay!

Slide 64

Slide 64 text

WHAT DID WE LEARN?

Slide 65

Slide 65 text

Garbage collection is pretty good.

Slide 66

Slide 66 text

Now you know how memory is managed.

Slide 67

Slide 67 text

Consider Python 3

Slide 68

Slide 68 text

Or, for scientific applications numpy & pandas.

Slide 69

Slide 69 text

Thanks! @nnja bit.ly/memory_management

Slide 70

Slide 70 text

Bonus Material

Slide 71

Slide 71 text

Additional Reading
• Great explanation of generational garbage collection and Python’s reference detection algorithm.
  • https://www.quora.com/How-does-garbage-collection-in-Python-work
• Weak Reference Documentation
  • https://docs.python.org/3/library/weakref.html
• Python Module of the Week - gc
  • https://pymotw.com/2/gc/
• PyPy STM - GIL-less Python Interpreter
  • http://morepypy.blogspot.com/2015/03/pypy-stm-251-released.html
• Saving 9GB of RAM with Python’s __slots__
  • http://tech.oyster.com/save-ram-with-python-slots/

Slide 72

Slide 72 text

Getting in-depth with the GIL
• Dave Beazley - Guide on how the GIL Operates
  • http://www.dabeaz.com/python/GIL.pdf
• Dave Beazley - New GIL in Python 3.2
  • http://www.dabeaz.com/python/NewGIL.pdf
• Dave Beazley - Inside Look at Infamous GIL Patch
  • http://dabeaz.blogspot.com/2011/08/inside-look-at-gil-removal-patch-of.html

Slide 73

Slide 73 text

Why can’t we use the REPL to follow along at home?
• Because it doesn’t behave like a typical Python program that’s being executed.
• Further reading: http://stackoverflow.com/questions/25281892/weird-id-result-on-cpython-intobject
PYTHON PRE-LOADS OBJECTS
• Many objects are loaded by Python as the interpreter starts.
• Called peephole optimization.
• Numbers: -5 -> 256
• Single Letter Strings
• Common Exceptions
• Further reading: http://akaptur.com/blog/2014/08/02/the-cpython-peephole-optimizer-and-you/
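A small sketch of the pre-loaded numbers, assuming CPython. The integers are built with `int("...")` at run time rather than written as literals, because equal literals in one script may be shared by the compiler and would hide the effect. The value 100 lies inside the range listed above, and 1000000 is chosen to be well outside it.

```python
small_a = int("100")              # inside the pre-loaded range
small_b = int("100")
big_a = int("1000000")            # well outside the pre-loaded range
big_b = int("1000000")

small_shared = small_a is small_b  # both names point at the cached object
big_shared = big_a is big_b        # two separate objects with equal values

print(small_shared)
print(big_shared)
print(big_a == big_b)              # equality is unaffected either way
```

This is an implementation detail, so code should compare numbers with `==`, not `is`.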

Slide 74

Slide 74 text

Common Question - Why doesn’t a Python program shrink in memory after garbage collection? • The freed memory is fragmented. • i.e. it’s not freed in one continuous block. • When we say memory is freed during garbage collection, it’s released back to Python to use for other objects, and not necessarily to the system. • After garbage collection, the size of the Python program likely won’t go down.

Slide 75

Slide 75 text

How does Python store container objects?
nums -> PyListObject: type list, refcount 1, value: size 3, capacity 10
  -> PyObject: type integer, refcount 1, value -10
  -> PyObject: type integer, refcount 2, value -9

Slide 76

Slide 76 text

Credits Big thanks to: • Faris Chebib & The Salt Lake City Python Meetup • The many friends & co-workers who lent me their eyes & ears, particularly Steve Holden Special thanks to all the people who made and released these awesome resources for free: ■ Presentation template by SlidesCarnival ■ Photographs by Unsplash ■ Icons by iconsdb
