Slide 1

Slide 1 text

The Memory Chronicles: A Tale of Two Pythons. kavya (@kavya719)

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

CPython: the standard and default implementation.
Oft-heard memory problems: uses more memory than desirable, ever-increasing memory use, high-water-mark usage, heap fragmentation.

Slide 4

Slide 4 text

MicroPython: for microcontrollers — pyboard, ESP8266, micro:bit, others.
256KiB ROM, 16KiB RAM.
Implements the Python 3 spec: complete syntax up to Python 3.4, but not complete functionality; leaves out what's unsuitable for microcontrollers. Supports a subset of the stdlib.

Slide 5

Slide 5 text

CPy and µPy are both bytecode interpreters (.py source is compiled to bytecode that the interpreters evaluate) and stack-based virtual machines (they use a value stack to manipulate objects during evaluation), and both are written in C… yet they sit on opposite ends of the memory-use spectrum.

Slide 6

Slide 6 text

What I'm running: Python 3.6 and MicroPython 1.8, on a 64-bit Ubuntu 16.10 (Linux 4.8.0) box with 4GB of RAM.

Slide 7

Slide 7 text

Methodology: create 200000 new objects of the desired type (ints, strings, lists) and prevent them from being deallocated.
Measure from within the Python process — CPy: sys, memory_profiler, pympler; µPy: micropython, gc modules.
Heap use is what's measured. (A rough sketch follows.)
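
A minimal CPython-side sketch of this methodology, assuming the third-party memory_profiler package; the talk's actual measurement scripts aren't shown on the slides:

from memory_profiler import memory_usage  # pip install memory_profiler

def make_ints(n=200000):
    # Keep every object alive in a list so none of them can be deallocated.
    return [(10 ** 10) + i for i in range(n)]

baseline = memory_usage(-1, interval=0.1, timeout=1)[0]    # current process, in MiB
peak = max(memory_usage((make_ints, (200000,), {})))       # samples taken while make_ints runs
print("approx. memory growth: %.1f MiB" % (peak - baseline))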

Slide 8

Slide 8 text

A prelude first: process memory (stack at high addresses, heap at lower addresses). CPy and µPy both use custom allocators to manage the heap. CPy's allocator grows the heap on demand; µPy has a fixed heap.

Slide 9

Slide 9 text

Integers, CPy vs µPy: 200000 integers with values from (10 ** 10) to (10 ** 10) + 200000.
x-axis: number of integers; y-axis: memory use in bytes.

Slide 10

Slide 10 text

Strings, CPy vs µPy: 200000 small strings of length <= 10, containing special characters.
x-axis: number of strings; y-axis: memory use in bytes.

Slide 11

Slide 11 text

Lists, CPy vs µPy.
x-axis: number of elements in the list; y-axis: memory use in bytes.

Slide 12

Slide 12 text

how objects are implemented
memory management
evaluation

Slide 13

Slide 13 text

how objects are implemented

Slide 14

Slide 14 text

Does CPy allocate larger objects?
 Does it allocate more objects?

Slide 15

Slide 15 text

CPy objects: all objects are allocated on the heap. For x = 1, the integer object 1 lives on the heap, in the CPy interpreter's memory; the name x lives in the global or local namespace and refers to it.

Slide 16

Slide 16 text

All objects have an initial segment, PyObject_HEAD: a reference count (refcnt) and a pointer to the type (*type), followed by object-specific fields. A list object's *type points to Py_ListType; an integer object's *type points to Py_LongType.

Slide 17

Slide 17 text

overhead: PyObject_HEAD is { size_t refcnt; typeobject *type; }, i.e. 8 bytes + 8 bytes = 16 bytes: the lower bound on the sizeof CPy objects.
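
That 16-byte floor is easy to see from the REPL; a bare object() carries nothing but PyObject_HEAD (figure below is from 64-bit CPython 3.6):

import sys
print(sys.getsizeof(object()))  # 16: just refcnt + the type pointer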

Slide 18

Slide 18 text

CPy integers: for x = 1, a PyLongObject holds PyObject_HEAD (refcnt, *type pointing to Py_LongType), a size field (the length of the digit array), and the array holding the value.

Slide 19

Slide 19 text

x = 1: PyLongObject is { PyObject_HEAD; size_t size; uint32_t digit[1]; } = 16 bytes + 8 bytes + 4 bytes = 28 bytes!
> import sys
> x = 1
> sys.getsizeof(x)
28

Slide 20

Slide 20 text

x = 10 ** 10:
> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)
> x = 10 ** 10
> sys.getsizeof(x)
32
PyLongObject now has { … uint32_t digit[2]; }: 24 bytes + 8 bytes = 32 bytes.
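
The same pattern continues as values grow; each additional 30-bit digit adds 4 bytes (the sizes in the comment are what I'd expect on 64-bit CPython 3.6):

import sys
for value in (0, 1, 10 ** 10, 2 ** 60, 10 ** 100):
    # 0 is stored with no digits; after that, each 30-bit digit costs 4 more bytes.
    print(value.bit_length(), sys.getsizeof(value))  # 24, 28, 32, 36, 72 bytes respectively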

Slide 21

Slide 21 text

CPy: 200000 integers * 32 bytes = ~6,400,000 bytes, or ~6 MiB.

Slide 22

Slide 22 text

CPy: 200000 integers * 32 bytes = ~6,400,000 bytes, or ~6 MiB. µPy: ?

Slide 23

Slide 23 text

µPy Objects A µPy “object” is a machine word. 8 bytes ?!

Slide 24

Slide 24 text

µPy Objects A µPy “object” is a machine word. pointer tagging store a “tag” in the unused bits of a pointer 8 bytes

Slide 25

Slide 25 text

pointer tagging
• Addresses are aligned to the word size, i.e. pointers are multiples of 8.
• So they can be:
  8 : 0b1000
  16 : 0b10000
  24 : 0b11000
The lower 3 bits are always 000, so store a "tag" in them! The remaining 61 bits are used for the value; the tag (0 <= tag <= 7) adds extra info to the pointer.
xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxTTT (each group is 1 byte)

Slide 26

Slide 26 text

pointer tagging in µPy
bit 0 == 1: not a pointer at all but a small integer (xxxxxxxx | … | … | xxxxxxx1); bits 1+ are the integer.
last two bits == 10: an index into the interned-strings pool (xxxxxxxx | … | … | xxxxxx10); bits 2+ are the index.
last two bits == 00: a pointer to a concrete object (xxxxxxxx | … | … | xxxxxx00); bits 2+ are the pointer. Concrete objects are everything except small integers and interned strings.
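
A toy model of this tagging scheme in ordinary Python, illustrative only: the real thing is C macros inside MicroPython, and the names below are made up.

TAG_SMALL_INT = 0b1

def tag_small_int(value):
    # bit 0 == 1: the machine word holds a small integer in bits 1+.
    return (value << 1) | TAG_SMALL_INT

def is_small_int(word):
    return word & 0b1 == 1

def untag_small_int(word):
    return word >> 1

def tag_qstr_index(index):
    # last two bits == 10: the word holds an index into the interned-string pool.
    return (index << 2) | 0b10

word = tag_small_int(42)
assert is_small_int(word) and untag_small_int(word) == 42
assert tag_qstr_index(7) & 0b11 == 0b10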

Slide 27

Slide 27 text

This is neat. It means small integers take 8 bytes, and are stored on the stack. x = 1: is it a small int, i.e. does its value fit in (64 - 1) bits, i.e. is it less than 2^63 - 1? Then it is stored as xxxxxxxx | … | … | xxxxxxx1: 8 bytes, versus 28 bytes in CPy.

Slide 28

Slide 28 text

µPy integers: x = 10 ** 10 is still a small int: 8 bytes, stored on the stack, not the heap, versus 32 bytes in CPy!

Slide 29

Slide 29 text

Strings, CPy vs µPy: 200000 small strings of length <= 10, containing special characters.
x-axis: number of strings; y-axis: memory use in bytes.

Slide 30

Slide 30 text

CPy strings:
> import sys
> x = "a"
> sys.getsizeof(x)
50
A PyASCIIObject: PyObject_HEAD (refcnt, *type pointing to Py_UnicodeType), other fields, *wstr, and a null-terminated representation of the value.

Slide 31

Slide 31 text

CPy strings:
> import sys
> x = "a"
> sys.getsizeof(x)
50
sizeof(PyASCIIObject) = 48 bytes, + sizeof(value) = len("a" + "\0") * sizeof(ASCII char) = 2 bytes.

Slide 32

Slide 32 text

CPy strings:
> import sys
> x = "a"
> sys.getsizeof(x)
50
For "gotmemory?": sizeof(PyASCIIObject) = 48 bytes, + 10 + 1 ("\0") bytes = 59 bytes.
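
The per-string numbers are easy to reproduce for ASCII strings (64-bit CPython 3.6; non-ASCII strings use a larger header):

import sys
print(sys.getsizeof(""))            # 49: the 48-byte PyASCIIObject plus the trailing "\0"
print(sys.getsizeof("a"))           # 50
print(sys.getsizeof("gotmemory?"))  # 59: 48 + 10 characters + "\0"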

Slide 33

Slide 33 text

µPy strings: µPy has a special scheme for small strings (length <= 10; special chars okay!). It stores them as arrays: hash (2 bytes), length (1 byte), data ("length" number of bytes, null-terminated). So x = "a" is 2 + 1 + 1 ("a") + 1 ("\0") = 5 bytes, versus 50 bytes in CPy!

Slide 34

Slide 34 text

CPy mutable objects (lists, dictionaries, classes, instances, etc.) have an additional overhead for memory management: a PyGC_HEAD (24 bytes) in front of PyObject_HEAD (16 bytes) and the object-specific fields, for 40 bytes of total overhead.

Slide 35

Slide 35 text

CPy lists:
> sys.getsizeof([1])
72
A PyListObject: GC_HEAD plus PyObject_HEAD (16 bytes: refcnt, *type pointing to Py_ListType) make up 40 bytes of overhead, followed by other fields including *item_ptrs, a pointer (8 bytes) to the array of pointers to the items.

Slide 36

Slide 36 text

CPy lists:
> sys.getsizeof([1])
72
That is sizeof(PyListObject) = 64 bytes, + sizeof(value) = 1 * sizeof(pointer) = 8 bytes. But it does not account for the item itself! Adding sizeof(items), i.e. sys.getsizeof(1), the list is really 100 bytes.
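
A quick way to see the shallow vs. deep distinction, using pympler (one of the tools from the methodology slide) for the deep size:

import sys
from pympler import asizeof  # third-party: pip install pympler

x = [1]
print(sys.getsizeof(x))      # 72: PyListObject + one pointer slot; the item is not counted
print(sys.getsizeof(1))      # 28: the int itself
print(asizeof.asizeof(x))    # deep size: the list plus everything it references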

Slide 37

Slide 37 text

(graph) x-axis: number of appends; y-axis: sizeof list (bytes).

Slide 38

Slide 38 text

CPy lists:
> x = [1]          72 bytes (+ sizeof(1))
> y = []
> y.append(1)
> sys.getsizeof(y)
96 !!
On append(), the *item_ptrs array of the PyListObject is dynamically resized as needed, and resizing over-allocates.
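
The over-allocation shows up as getsizeof jumping in steps rather than growing by 8 bytes per element (the step sizes in the comment are what I see on CPython 3.6 and may differ across versions):

import sys
y = []
for i in range(10):
    y.append(i)
    # Size stays flat until the pre-allocated slots run out, then jumps (e.g. 96, 128, 192, …).
    print(len(y), sys.getsizeof(y))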

Slide 39

Slide 39 text

µPy concrete objects (everything except small integers and special strings) have an initial segment, mp_obj_base_t: a typeobject pointer (*type) but no reference count, followed by object-specific fields. So the overhead is 8 bytes, versus CPy's 16-byte PyObject_HEAD (refcnt, *type).

Slide 40

Slide 40 text

µPy mutable objects: no additional overhead for garbage collection, versus CPy's 24 bytes of PyGC_HEAD.

Slide 41

Slide 41 text

µPy lists: same structure as CPy lists, minus the memory-management overhead:
> reference count: 8 bytes
> additional "PyGC_HEAD" overhead: 24 bytes
For y = [1]: CPy (not including sizeof(1)) is 72 bytes; µPy (not including sizeof(1)) is 40 bytes, a difference of 32 bytes.

Slide 42

Slide 42 text

Does CPy allocate larger objects? ✓ Generally speaking, yes.
Does it allocate more objects? ✓ Generally speaking, yes.

Slide 43

Slide 43 text

CPy's PyObject_HEAD (refcnt, *type) versus MicroPy's mp_obj_base_t (*type only, 8 bytes). For mutable objects: CPy's additional GC_HEAD (24 bytes) versus no additional overhead in MicroPy.

Slide 44

Slide 44 text

What’s with all of CPy’s 
 “memory management” overhead?

Slide 45

Slide 45 text

memory management overhead

Slide 46

Slide 46 text

CPy memory management: reference counting. refcnt is the number of references to this object. x = 300: the local namespace contains 'x', and the object's refcnt is 1. y = x: the namespace contains 'x' and 'y', and refcnt is 2.

Slide 47

Slide 47 text

x = 300; y = x; del x: the local namespace now contains only 'y', and refcnt is back to 1.

Slide 48

Slide 48 text

x = 300; y = x; del x; del y: refcnt is 0. No references, so deallocate it!
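
Refcounts can be watched from Python itself; sys.getrefcount always reports one extra reference because its own argument temporarily holds one:

import sys
x = 300
base = sys.getrefcount(x)          # includes getrefcount's own temporary reference
y = x
print(sys.getrefcount(x) - base)   # 1: 'y' added one more reference
del y
print(sys.getrefcount(x) - base)   # 0: back where we started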

Slide 49

Slide 49 text

Automatic reference counting: the CPython source and C extensions are littered with Py_XINCREF and Py_XDECREF calls; Py_XDECREF deallocates the object once refcnt == 0. So what is the PyGC_HEAD for mutable objects for, then?

Slide 50

Slide 50 text

x = [1, 2, 3]; x.append(x): a reference cycle! The reference count never goes to 0, so even after del x the list would never be deallocated. (Diagram generated using objgraph.)

Slide 51

Slide 51 text

So, CPy has a cyclic garbage collector too. It detects and breaks reference cycles. It is generational and stop-the-world. It runs automatically, but can be run manually as well: gc.collect(). Only mutable objects can participate in reference cycles, so only they are tracked, hence PyGC_HEAD.
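
A small demonstration of the cycle above actually being collected (gc.collect() returns the number of unreachable objects it found, at least 1 here):

import gc
x = [1, 2, 3]
x.append(x)          # the list now references itself: a cycle
del x                # refcnt never reaches 0, so refcounting alone cannot free it
print(gc.collect())  # the cyclic collector finds the unreachable cycle and frees it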

Slide 52

Slide 52 text

µPy memory management: does not use reference counting, so objects do not need the reference-count field. The heap is divided into "blocks":
> the unit of allocation,
> 32 bytes each.
The state of each block is tracked in a bitmap, which also lives on the heap:
> 2 bits per block,
> tracks "free" / "in-use".
(A bitmap allocator: the allocation bitmap, followed by the blocks for the application's use.)
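
On a board or the Unix port, the heap and its blocks can be inspected with MicroPython's own helpers (µPy-specific APIs, not available in CPython):

import gc
import micropython

micropython.mem_info(1)  # prints heap/stack usage plus a block-by-block map of the heap
print(gc.mem_alloc())    # bytes currently allocated on the µPy heap
print(gc.mem_free())     # bytes still free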

Slide 53

Slide 53 text

x = myFoo(); y = x: the local namespace contains 'x' and 'y', both referring to the same object on the heap.

Slide 54

Slide 54 text

x = myFoo(); y = x; del x: the local namespace now contains only 'y'.

Slide 55

Slide 55 text

x = myFoo(); y = x; del x; del y: …so when is the block deallocated, i.e. when are its bitmap bits set back to 0?

Slide 56

Slide 56 text

Garbage collection: a mark-sweep collector. It is not generational. It is stop-the-world. It runs automatically (on the Unix port), but can be disabled or run manually as well. It manages all heap-allocated objects.
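
The control knobs look much like CPython's gc module; a sketch of typical use on MicroPython (behaviour differs across ports):

import gc

gc.disable()   # no automatic collections, e.g. during a timing-critical section
# ... allocate objects ...
gc.collect()   # run one mark-sweep pass by hand
gc.enable()    # let the collector run automatically again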

Slide 57

Slide 57 text

What’s with all of CPy’s 
 “memory management” overhead? ✓

Slide 58

Slide 58 text

a note (or two)… µPy's "operational" overhead, the overhead to run the program, is lower too:
CPy interpreter: Python stack frames live on the heap.
µPy interpreter: Python stack frames live on the C stack!
µPy also has optimizations for compiler / compile-stage memory use. …Go to the source for more goodness!

Slide 59

Slide 59 text

evaluation

Slide 60

Slide 60 text

Trade-offs? Why do they make the design decisions they do?

Slide 61

Slide 61 text

CPy versus µPy:
> object implementations, and
> memory management;
the impact of these differences.
@kavya719
speakerdeck.com/kavya719/the-memory-chronicles

Slide 62

Slide 62 text

Interpreters. PSS (Proportional Set Size): memory allocated to the process and resident in RAM, with special accounting for shared memory.

Slide 63

Slide 63 text

CPy optimizations: integers from -5 to 256 are shared, i.e. there is a single object per integer, in a preallocated array created at interpreter start-up.
> x = 250
> y = 250
> x is y
True
…but that's 262 * 28 bytes = ~7KB of RAM!
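
Outside that shared range, each assignment creates a fresh PyLongObject; in the interactive interpreter this is visible with `is` (behaviour can differ inside a single compiled module because of constant folding):

> x = 257
> y = 257
> x is y
False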

Slide 64

Slide 64 text

CPy optimizations: strings that look like Python identifiers (characters from { A-Z, a-z, 0-9, _ }) are interned at compile time, and shared via the interned dict.
> a = "python"
> b = "python"
> a is b
True
…µPy interns identifiers and small strings too.
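
Strings that don't look like identifiers aren't interned automatically, though interning can be requested explicitly with sys.intern (again, as typed at the interactive prompt):

> a = "got memory?"
> b = "got memory?"
> a is b
False
> import sys
> a = sys.intern("got memory?")
> b = sys.intern("got memory?")
> a is b
True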

Slide 65

Slide 65 text

source code
MicroPython: https://github.com/micropython/ and https://micropython.org
CPython (GitHub mirror): https://github.com/python/cpython

Slide 66

Slide 66 text

µPy FAQ
Architectures? x86, x86-64, ARM, ARM Thumb, Xtensa.
Versus PyPy, etc.? Versus Go, Rust, Lua, JavaScript, etc.?
Multithreading? Via the "_thread" module, with an optional global interpreter lock (still work in progress, only available on selected ports).
Async? Unicode?

Slide 67

Slide 67 text

µPy FAQ
Versus Arduino, Raspberry Pi, or Tessel?
▪ PyBoard sits in between Arduino and Raspberry Pi
◦ More approachable than Arduino
◦ Not a full OS like Raspberry Pi
▪ Tessel is similar to MicroPython but runs JavaScript
sulphide-glacier:py $ ls -l | grep "obj" | grep "\.h"
obj.h objarray.h objexcept.h objfun.h objgenerator.h objint.h objlist.h objmodule.h objstr.h objstringio.h objtuple.h objtype.h

Slide 68

Slide 68 text

CPy memory allocation: CPython uses an object allocator, and several object-specific allocators, layered on top of a general-purpose allocator:
+3: object-specific allocators for int, dict, list, …, string (object-specific memory)
+2: Python's object allocator (object memory)
+1: Python's raw memory allocator
 0: the underlying general-purpose allocator, e.g. the C library's malloc, managing the virtual memory allocated for the python process
and below all of that, the operating system.

Slide 69

Slide 69 text

CPython has an "object allocator" on top of a general-purpose allocator, like malloc. It requests memory in arenas (256 KB); each arena is divided into pools (4KB), and each pool is carved into fixed-size blocks.

Slide 70

Slide 70 text

x = 300 needs 28 bytes. Are there existing pools with free blocks of the desired size? usedpools is indexed by size class (8 bytes, 16 bytes, 24 bytes, 32 bytes, …) and points to pools that still have a free block.

Slide 71

Slide 71 text

Else, find an available pool in a partially allocated arena and carve out a block. What if the arenas are full? mmap() a new arena, create a pool of blocks of that size class, and return a block.
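
CPython can dump this allocator's internal state, which is a handy way to watch arenas, pools, and size classes fill up (CPython-specific; writes its report to stderr):

import sys
# Prints pymalloc statistics: arenas allocated, pools and blocks in use per size class, etc.
sys._debugmallocstats()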