Slide 1

Slide 1 text

Memory Management in Python
Nina Zakharenko - @nnja
slides: bit.ly/nbpy-memory

Slide 2

Slide 2 text

Livetweet! use #nbpy @nnja

Slide 3

Slide 3 text

Why should you care? Knowing about memory management helps you write more efficient code.

Slide 4

Slide 4 text

What will you learn?
- Vocabulary
- Basic Concepts
- Foundation

Slide 5

Slide 5 text

What won't you learn? You won’t be an expert at the end of this talk.

Slide 6

Slide 6 text

What's a variable?

Slide 7

Slide 7 text

What's a C-style variable?

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Changing the value of C-style variables

Slide 10

Slide 10 text

Python has names, not variables

Slide 11

Slide 11 text

How are Python objects stored in memory? names ➡ references ➡ objects

Slide 12

Slide 12 text

A name is just a label for an object. In Python, each object can have many names, such as 'x' and 'y'.
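A quick way to see this: binding a second name to an existing object does not copy the object. Both names label the same object, so a change made through one name is visible through the other. A minimal sketch:

```python
# Two names bound to the same list object.
x = [1, 2, 3]
y = x          # no copy: 'y' is just another label for the same object

print(x is y)          # True: both names refer to one object
print(id(x) == id(y))  # True: same identity

y.append(4)    # change the object through one name...
print(x)       # ...and the change shows up through the other: [1, 2, 3, 4]
```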

Slide 13

Slide 13 text

Different Types of Objects
Simple: numbers, strings
Container: dict, list, user defined classes
Container objects can contain simple objects, or other container objects.

Slide 14

Slide 14 text

What's a Reference? A name or a container object that points at another object.

Slide 15

Slide 15 text

Reference Count

Slide 16

Slide 16 text

Increasing the ref count
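On CPython you can watch the count go up with `sys.getrefcount()`. The absolute numbers are an implementation detail: the call itself briefly holds a reference to its argument. For that reason, this sketch only compares the counts before and after each new reference:

```python
import sys


def refcount_demo():
    """Return refcounts of one object as more references to it are created."""
    obj = object()                  # one reference: the local name 'obj'
    base = sys.getrefcount(obj)     # also counts the call's own temporary reference

    alias = obj                     # binding another name adds a reference
    with_alias = sys.getrefcount(obj)

    holder = [obj]                  # storing it in a container adds another
    with_container = sys.getrefcount(obj)

    return base, with_alias, with_container


base, with_alias, with_container = refcount_demo()
print(with_alias - base)            # 1
print(with_container - with_alias)  # 1
```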

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Decreasing the ref count

Slide 21

Slide 21 text

Decrease Ref Count: Change the Reference

Slide 22

Slide 22 text

Decrease Ref Count: del keyword

Slide 23

Slide 23 text

What does del do?
The del statement doesn't delete objects. It:
- removes that name as a reference to that object
- reduces the ref count by 1
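A small sketch of that behavior on CPython: after `del a`, the name is gone and the refcount drops by one. The list itself survives because `b` still refers to it:

```python
import sys


def del_demo():
    a = [1, 2, 3]
    b = a                            # two names, one list object
    before = sys.getrefcount(b)

    del a                            # removes the name 'a' only
    after = sys.getrefcount(b)

    name_gone = "a" not in locals()  # the name no longer exists...
    return before, after, name_gone, b  # ...but the object is still reachable via 'b'


before, after, name_gone, survivor = del_demo()
print(before - after)  # 1: one fewer reference
print(name_gone)       # True
print(survivor)        # [1, 2, 3]: del did not destroy the object
```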

Slide 24

Slide 24 text

Decrease Ref Count: Go out of Scope

Slide 25

Slide 25 text

When there are no more references, the object can be safely removed from memory.
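You can observe this with the standard `weakref` module. A weak reference does not count toward the refcount, so calling it returns `None` once the object is gone. `Resource` is a hypothetical placeholder class (plain `object()` instances can't be weakly referenced). This sketch assumes CPython's immediate, refcount-driven cleanup:

```python
import weakref


class Resource:
    """Hypothetical class used only so we can take a weak reference to an instance."""


def make_and_drop():
    res = Resource()
    # A weak reference observes the object without adding to its refcount.
    # When the function returns, 'res' goes out of scope and the refcount drops to 0.
    return weakref.ref(res)


probe = make_and_drop()
print(probe())  # None on CPython: the object was freed as soon as the function returned
```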

Slide 26

Slide 26 text

local vs. global namespace
If refcounts decrease when an object goes out of scope, what happens to objects in the global namespace?
- They never go out of scope!
- Their refcount never reaches 0.
- Avoid putting large or complex objects in the global namespace.

Slide 27

Slide 27 text

Every Python object holds 3 things:
- Its type
- A reference count
- Its value

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

    >>> def mem_test():
    ...     x = 300
    ...     y = 300
    ...     print(id(x))
    ...     print(id(y))
    ...     print(x is y)
    ...
    >>> mem_test()
    4504654160
    4504654160
    True

ℹ note: run this from a function in the REPL, or from a file

Slide 30

Slide 30 text

Garbage Collection

Slide 31

Slide 31 text

What is Garbage Collection? A way for a program to automatically release memory when the object taking up that space is no longer in use.

Slide 32

Slide 32 text

Two Main Types of Garbage Collection
1. Reference Counting
2. Tracing

Slide 33

Slide 33 text

How does reference counting garbage collection work?
1. Add and remove references.
2. When the refcount reaches 0, remove the object.
3. Cascading effect:
   - decrease the ref count of any object the deleted object was pointing to
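The cascading effect in step 3 can be seen on CPython. Here a list is the only thing keeping an object alive, and freeing the list frees the object too. `Node` is a hypothetical class, used so we can take a weak reference:

```python
import weakref


class Node:
    """Hypothetical class so we can take a weak reference to an instance."""


inner = Node()
outer = [inner]              # the list holds a reference to the Node
probe = weakref.ref(inner)

del inner                    # the name is gone, but the list still keeps the Node alive
alive_while_held = probe() is not None

del outer                    # the list's refcount hits 0: it is freed, and freeing it
                             # decrements the Node's refcount, which also hits 0
print(alive_while_held)      # True
print(probe())               # None on CPython
```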

Slide 34

Slide 34 text

Reference Counting Garbage Collection: The Good
- Easy to implement
- When the refcount reaches 0, objects are immediately deleted.

Slide 35

Slide 35 text

Reference Counting: The Bad
- Space overhead
  - a reference count is stored for every object
- Execution overhead
  - the reference count changes on every assignment

Slide 36

Slide 36 text

Reference Counting: The Ugly
- Not generally thread safe!
- Reference counting doesn't detect cyclical references.

Slide 37

Slide 37 text

Cyclical References By Example

Slide 38

Slide 38 text

What's a cyclical reference?

Slide 39

Slide 39 text

Cyclical Reference

Slide 40

Slide 40 text

Reference counting alone will not garbage collect objects with cyclical references.
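To see this on CPython, you can temporarily switch off the cyclic collector with `gc.disable()`, which leaves only reference counting. Two objects that point at each other then survive even after every name is gone, until `gc.collect()` runs. `Node` is a hypothetical example class:

```python
import gc
import weakref


class Node:
    """Hypothetical class used to build a two-object cycle."""
    def __init__(self):
        self.other = None


gc.disable()                    # only reference counting runs from here on
try:
    a, b = Node(), Node()
    a.other, b.other = b, a     # a -> b -> a: each keeps the other's refcount above 0
    probe = weakref.ref(a)

    del a, b                    # no names left, but the cycle keeps both objects alive
    survived_refcounting = probe() is not None

    gc.collect()                # the tracing collector finds and frees the unreachable cycle
    freed_by_gc = probe() is None
finally:
    gc.enable()                 # restore normal behavior

print(survived_refcounting, freed_by_gc)  # True True
```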

Slide 41

Slide 41 text

Two Main Types of Garbage Collection
1. Reference Counting
2. Tracing

Slide 42

Slide 42 text

Tracing Garbage Collection - Marking

Slide 43

Slide 43 text

Tracing Garbage Collection - Sweeping
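The two phases can be modeled in a few lines. This is a toy model over a hypothetical heap (a dict mapping each object's name to the names it references), not how CPython is implemented. It shows why tracing handles cycles: an unreachable cycle is never marked, so it gets swept.

```python
def mark_and_sweep(heap, roots):
    """Toy mark-and-sweep over a hypothetical heap.

    heap:  dict mapping object name -> list of names it references
    roots: names the program can reach directly (variables, globals, the stack)
    Returns the set of surviving names; everything else is removed from heap.
    """
    # Mark phase: walk the graph from the roots, marking everything reachable.
    marked = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in marked:
            continue
        marked.add(name)
        stack.extend(heap.get(name, []))

    # Sweep phase: anything that was never marked is garbage.
    for name in list(heap):
        if name not in marked:
            del heap[name]
    return marked


heap = {
    "root_list": ["a"],
    "a": ["b"],
    "b": [],
    "c": ["d"],  # c <-> d form a cycle that nothing else points to
    "d": ["c"],
}
survivors = mark_and_sweep(heap, roots=["root_list"])
print(sorted(survivors))  # ['a', 'b', 'root_list']
print(sorted(heap))       # ['a', 'b', 'root_list']: the c/d cycle was swept
```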

Slide 44

Slide 44 text

What does Python use? Reference Counting & Generational (A type of Tracing GC)

Slide 45

Slide 45 text

Generational Garbage Collection is based on the theory that most objects die young.

Slide 46

Slide 46 text

Python maintains a list of every object created as a program is run. Actually, it makes 3:
- generation 0
- generation 1
- generation 2
Newly created objects are stored in generation 0.

Slide 47

Slide 47 text

Only container objects with a refcount greater than 0 will be stored in a generation list.
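The standard `gc` module's `gc.is_tracked()` reports whether the cyclic collector is tracking an object. Atomic objects such as numbers and strings can't refer to other objects, so they can't form cycles and aren't tracked. `Dog` below is a throwaway example class:

```python
import gc


class Dog:
    pass


print(gc.is_tracked(42))       # False: ints can't refer to other objects
print(gc.is_tracked("hello"))  # False: neither can strings
print(gc.is_tracked([]))       # True: lists are containers
print(gc.is_tracked(Dog()))    # True: instances of user-defined classes are tracked
```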

Slide 48

Slide 48 text

When the number of objects in a generation reaches a threshold, Python runs a garbage collection algorithm on that generation and any generations younger than it.
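The standard `gc` module exposes these thresholds and counters, and lets you trigger a collection of one generation by hand. Default values have differed between CPython versions, so this sketch prints them rather than assuming particular numbers:

```python
import gc

# Per-generation collection thresholds; the defaults depend on the CPython version.
thresholds = gc.get_threshold()
print(thresholds)

# Current per-generation counters that are compared against those thresholds.
print(gc.get_count())

# Manually collect only the youngest generation (0).
# The return value is the number of unreachable objects found.
found = gc.collect(0)
print(found)
```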

Slide 49

Slide 49 text

What happens during a generational garbage collection cycle?
1. Python makes a list for objects to discard.
2. It runs an algorithm to detect reference cycles.
3. If an object has no outside references, add it to the discard list.
4. When the cycle is done, free up the objects on the discard list.
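Step 3 raises the question of how Python can tell that an object has no outside references. Roughly, CPython's collector starts from each object's refcount and subtracts the references that come from other objects in the same generation. Whatever count is left over must come from outside. Below is a toy model of that idea over hypothetical data (names stand in for objects); it is not CPython's actual code:

```python
def find_unreachable(refcounts, references):
    """Toy model of cycle detection during a generational collection.

    refcounts:  dict name -> total reference count (from anywhere)
    references: dict name -> names in this generation that the object points to
    Returns the "discard list": objects referenced only from inside the generation.
    """
    # 1. Start from each object's full reference count.
    gc_refs = dict(refcounts)
    # 2. Subtract references that come from other objects in this generation.
    for targets in references.values():
        for target in targets:
            if target in gc_refs:
                gc_refs[target] -= 1
    # 3. A leftover count means an outside reference: the object is reachable,
    #    and so is everything it points to.
    reachable = set()
    stack = [name for name, count in gc_refs.items() if count > 0]
    while stack:
        name = stack.pop()
        if name not in reachable:
            reachable.add(name)
            stack.extend(references.get(name, []))
    # 4. Everything else has no outside references: it goes on the discard list.
    return [name for name in refcounts if name not in reachable]


# Hypothetical generation: a <-> b is an orphaned cycle; c <-> d is a cycle that
# a variable outside the generation still points at (c has one extra reference).
refcounts = {"a": 1, "b": 1, "c": 2, "d": 1}
references = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
print(find_unreachable(refcounts, references))  # ['a', 'b']
```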

Slide 50

Slide 50 text

After a garbage collection cycle, objects that survived will be promoted to the next generation. Objects in the last generation (2) stay there as the program executes.

Slide 51

Slide 51 text

When the ref count reaches 0, you get immediate clean up. If you have a cycle, you need to wait for garbage collection to run.

Slide 52

Slide 52 text

Objects with cyclical references get cleaned up by generational garbage collection.

Slide 53

Slide 53 text

❓ Why doesn’t a Python program shrink in memory after garbage collection?

Slide 54

Slide 54 text

After garbage collection, the size of the Python program likely won’t shrink.
- The freed memory is fragmented,
  - i.e. it's not freed in one contiguous block.
- When we say memory is freed during garbage collection, it’s released back to Python to use for other objects, not necessarily to the system.

Slide 55

Slide 55 text

Quick Optimizations

Slide 56

Slide 56 text

__slots__

Slide 57

Slide 57 text

Python instances have a dict of values

    class Dog(object):
        pass

    buddy = Dog()
    buddy.name = 'Buddy'
    print(buddy.__dict__)
    # {'name': 'Buddy'}

Slide 58

Slide 58 text

AttributeError

    'Hello'.name = 'Fred'

    AttributeError                            Traceback (most recent call last)
    ----> 1 'Hello'.name = 'Fred'
    AttributeError: 'str' object has no attribute 'name'

Slide 59

Slide 59 text

__slots__

    class Point(object):
        __slots__ = ('x', 'y')

    point = Point()
    point.x = 5
    point.y = 7
    point.name = "Fred"

    Traceback (most recent call last):
      File "point.py", line 8, in <module>
        point.name = "Fred"
    AttributeError: 'Point' object has no attribute 'name'

Slide 60

Slide 60 text

size of dict vs. size of tuple

    import sys
    print(sys.getsizeof(dict()))
    print(sys.getsizeof(tuple()))

sizeof dict: 232 bytes
sizeof tuple: 40 bytes
(exact sizes vary by Python version and platform)

Slide 61

Slide 61 text

When to use slots?
- Creating many instances of a class
- Knowing in advance what attributes the class should have
Saving 9 GB of RAM with __slots__
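To check the savings on your own machine, the standard `tracemalloc` module can measure allocations. `PlainPoint` and `SlottedPoint` are illustrative classes. Treat the printed numbers as rough, since they vary by Python version and platform:

```python
import tracemalloc


class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ("x", "y")  # fixed attribute names: no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y


def bytes_for(cls, n=10_000):
    """Approximate bytes allocated while n instances of cls are alive."""
    tracemalloc.start()
    points = [cls(i, i) for i in range(n)]  # kept alive until after the measurement
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del points
    return current


plain = bytes_for(PlainPoint)
slotted = bytes_for(SlottedPoint)
print(f"plain:   {plain:,} bytes")
print(f"slotted: {slotted:,} bytes")
```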

Slide 62

Slide 62 text

weakref
- A weakref to an object is not enough to keep it alive.
- When the only remaining references are weak references, the object can be garbage collected.
- Useful for:
  - implementing caches or mappings holding large objects
python3 weakref docs
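A minimal cache sketch using `weakref.WeakValueDictionary` from the standard library. An entry stays only as long as something else holds a strong reference to the value. `BigObject` is a hypothetical stand-in for an expensive object:

```python
import weakref


class BigObject:
    """Hypothetical stand-in for something expensive to keep in memory."""
    def __init__(self, name):
        self.name = name


cache = weakref.WeakValueDictionary()  # values are held only by weak references

obj = BigObject("report")
cache["report"] = obj
was_cached = "report" in cache
print(was_cached)          # True while a strong reference ('obj') exists

del obj                    # drop the last strong reference
print("report" in cache)   # False on CPython: the entry disappears with the object
```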

Slide 63

Slide 63 text

What's a GIL?

Slide 64

Slide 64 text

Global Interpreter Lock

Slide 65

Slide 65 text

Only one thread can execute Python bytecode in the interpreter at a time.

Slide 66

Slide 66 text

Advantages / Disadvantages of a GIL
Upside: Reference counting stays fast and easy to implement, because a single lock protects every object's refcount.
Downside: In a Python program, no matter how many threads exist, only one thread executes Python code at a time.

Slide 67

Slide 67 text

Want to take advantage of multiple cores?
- Use multiprocessing instead of multithreading.
- Each process has its own GIL; it’s on the developer to figure out a way to share information between processes.

Slide 68

Slide 68 text

❓ If the GIL limits Python, can’t we just remove it? additional reading

Slide 69

Slide 69 text

For better or for worse, the GIL is here to stay!

Slide 70

Slide 70 text

What Did We Learn?

Slide 71

Slide 71 text

Garbage collection is pretty good.

Slide 72

Slide 72 text

Now you know how memory is managed.

Slide 73

Slide 73 text

Python3!

Slide 74

Slide 74 text

For scientific applications, use numpy & pandas.

Slide 75

Slide 75 text

Thank You! Python @ Microsoft: bit.ly/nbpy-microsoft @nnja *Bonus material on the next slide

Slide 76

Slide 76 text

Bonus Material Section ➡

Slide 77

Slide 77 text

Additional Reading
- Great explanation of generational garbage collection and python’s reference detection algorithm
- Weak Reference Documentation
- Python Module of the Week - gc
- PyPy STM - GIL-less Python Interpreter
- Saving 9GB of RAM with python’s __slots__

Slide 78

Slide 78 text

Getting in-depth with the GIL
- Dave Beazley - Guide on how the GIL Operates
- Dave Beazley - New GIL in Python 3.2
- Dave Beazley - Inside Look at Infamous GIL Patch

Slide 79

Slide 79 text

Why can’t we use the REPL to follow along at home?
- Because it doesn’t behave like a typical Python program that’s being executed.
- Further reading

Slide 80

Slide 80 text

Python pre-loads objects
- Many objects are loaded by Python as the interpreter starts, then cached and reused (a CPython implementation detail, distinct from the compiler's peephole optimization).
- Numbers: -5 -> 256
- Single-letter strings
- Common exceptions
- Further reading
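A quick check of the cached small integers and single-character strings. These are CPython implementation details, so don't rely on `is` for value comparisons in real code. The values are built at run time with `int()` and `chr()` so the compiler can't merge identical constants:

```python
# CPython implementation details: never use 'is' to compare values in real code.
small_cached = int("256") is int("256")  # small ints (-5..256) are pre-allocated
big_shared = int("257") is int("257")    # each call creates a new int object
char_cached = chr(97) is chr(97)         # single-character (Latin-1) strings are cached

print(small_cached)  # True
print(big_shared)    # False
print(char_cached)   # True
```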

Slide 81

Slide 81 text

Attempting to remove the GIL - A Gilectomy
- Larry Hastings - Removing Python's GIL - The Gilectomy
- Larry Hastings - The Gilectomy, How it's going
- Gilectomy on GitHub
- A Gilectomy Update

Slide 82

Slide 82 text

weakref
- weakref Python Module of the week
- weakref documentation

Slide 83

Slide 83 text

@nnja