
The Memory Chronicles

kavya
May 20, 2017

MicroPython is the leanest, meanest full Python implementation. Designed for microcontrollers, this variant of Python runs in less than 300KB of memory, and retains support for all your favorite Python features.

So what does it take to make the smallest Python? Put differently, why does CPython have a large memory footprint?

This talk will explore the internals of MicroPython and contrast it with CPython, focusing on the aspects that relate to memory use. We will delve into the Python object models in each and the machinery for managing them. We will touch upon how the designs of the bytecode compiler and interpreter of each differ and why that matters.

Transcript

  1. CPython: the standard and default implementation. Oft-heard memory problems: uses more memory than desirable; ever-increasing memory use; high-water-mark usage; heap fragmentation.
  2. MicroPython: for microcontrollers (pyboard, ESP8266, micro:bit, and others) with as little as 256KiB ROM and 16KiB RAM. It implements the Python 3 spec: complete syntax up to Python 3.4, but not complete functionality; it leaves out what's unsuitable for microcontrollers. Supports a subset of the stdlib.
  3. CPy and µPy are both bytecode interpreters: .py source is compiled to bytecode that the interpreters evaluate. Both are stack-based virtual machines: they use a value stack to manipulate objects during evaluation. Both are written in C. …yet they sit on opposite ends of the memory-use spectrum.
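The stack-machine model is easy to see on the CPython side with the standard dis module (a CPython-only illustration; the function here is a made-up example, not from the deck):

```python
import dis

def add(a, b):
    return a + b

# CPython compiles the body to stack-machine bytecode: operands are
# pushed onto the value stack, a binary-add opcode pops two operands
# and pushes the result.
dis.dis(add)

ops = [ins.opname for ins in dis.get_instructions(add)]
print(ops)
```

The exact opcode names vary by CPython version (BINARY_ADD in 3.6, BINARY_OP in newer releases), but the push/pop structure is the same.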
  4. What I'm running: Python 3.6 and MicroPython 1.8, on a 64-bit Ubuntu 16.10 (Linux 4.8.0) box with 4GB of RAM.
  5. Methodology: create 200000 new objects of the desired type (ints, strings, lists) and prevent them from being deallocated. Measure heap use from within the Python process (CPy: the sys, memory_profiler, and pympler modules; µPy: the micropython and gc modules).
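A much smaller version of this methodology can be sketched on the CPython side with just sys.getsizeof (a per-object approximation; the deck uses memory_profiler and pympler for the whole-heap numbers, and the function name here is my own):

```python
import sys

def measure(make, n=200000):
    """Create n objects, keep them alive in a list, and report the
    average shallow per-object size (container overhead excluded)."""
    keep = [make(i) for i in range(n)]   # prevent deallocation
    return sum(sys.getsizeof(o) for o in keep) / n

# Integers with values from 10**10 up, as in the deck's experiment:
avg_int = measure(lambda i: 10 ** 10 + i)
print("avg bytes per int:", avg_int)
```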
  6. A prelude first: process memory has the stack at high addresses and the heap at low addresses. CPy and µPy both use custom allocators to manage the heap. CPy's allocator grows the heap on demand; µPy has a fixed heap.
  7. Integers, CPy vs. µPy: 200000 integers with values from (10 ** 10) to (10 ** 10) + 200000. [Chart: x-axis: number of integers; y-axis: memory use in bytes.]
  8. Strings, CPy vs. µPy: 200000 small strings of length <= 10, containing special characters. [Chart: x-axis: number of strings; y-axis: memory use in bytes.]
  9. CPy objects: all objects are allocated on the heap. For x = 1, the integer object lives on the heap; the name x lives in the global or local namespace. [Diagram: CPy interpreter memory, stack at high addresses, heap at low addresses.]
  10. All objects have an initial segment, PyObject_HEAD: a reference count (refcnt) and a pointer to the type (*type), followed by object-specific fields. A list object's type pointer points to PyList_Type; an integer object's points to PyLong_Type.
  11. Overhead: PyObject_HEAD is effectively { size_t refcnt; typeobject *type; }: 8 bytes + 8 bytes = 16 bytes, the lower bound on the sizeof CPy objects.
  12. CPy integers: for x = 1, a PyLongObject holds PyObject_HEAD (refcnt, *type pointing to PyLong_Type), a size field (the length of the array), and an array for the value.
  13. x = 1 is a PyLongObject { PyObject_HEAD; size_t size; uint32_t digit[1]; }: 16 bytes (PyObject_HEAD) + 8 bytes (size) + 4 bytes (one digit) = 28 bytes!
> import sys
> x = 1
> sys.getsizeof(x)
28
  14. > sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)
> x = 10 ** 10
> sys.getsizeof(x)
32
A PyLongObject with { ... uint32_t digit[2]; }: 24 bytes (PyObject_HEAD + size) + 8 bytes (two digits) = 32 bytes.
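The digit-array growth is easy to check directly; the precise byte counts assume a 64-bit CPython with 30-bit digits, as in the deck:

```python
import sys

# CPython stores an int's magnitude as an array of fixed-size digits;
# each extra digit adds sys.int_info.sizeof_digit bytes to the object.
print(sys.int_info)
small = sys.getsizeof(1)           # one digit
medium = sys.getsizeof(10 ** 10)   # 10**10 needs ~34 bits: two 30-bit digits
big = sys.getsizeof(10 ** 100)     # many digits
print(small, medium, big)
```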
  15. µPy objects: a µPy "object" is a machine word (8 bytes). Pointer tagging: store a "tag" in the unused bits of a pointer.
  16. Pointer tagging: addresses are aligned to the word size, i.e. pointers are multiples of 8. So they look like 8 (0b1000), 16 (0b10000), 24 (0b11000), and so on: the lower 3 bits are always 000, so store a "tag" in them (0 <= tag <= 7) to add extra information to the pointer, leaving the remaining 61 bits for the value: xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxTTT
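The trick itself is plain integer arithmetic; here is a sketch of the scheme (an illustration, not µPy's actual C code, and the address and tag values are made up):

```python
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1   # 0b111

def tag_pointer(addr, tag):
    """Pack a tag into the low 3 bits of an 8-byte-aligned address."""
    assert addr % 8 == 0, "addresses are word-aligned"
    assert 0 <= tag <= TAG_MASK
    return addr | tag

def untag(word):
    """Split a tagged word back into (address, tag)."""
    return word & ~TAG_MASK, word & TAG_MASK

word = tag_pointer(0x7F0000001000, 0b010)
addr, tag = untag(word)
print(hex(addr), bin(tag))
```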
  17. Pointer tagging in µPy:
bit 0 == 1: not a pointer at all but a small integer; bits 1+ are the integer (xxxxxxxx | … | xxxxxxx1).
last two bits == 10: an index into the interned strings pool; bits 2+ are the index (xxxxxxxx | … | xxxxxx10).
last two bits == 00: a pointer to a concrete object, i.e. everything except small integers and interned strings; bits 2+ are the pointer (xxxxxxxx | … | xxxxxx00).
  18. This is neat. It means small integers take 8 bytes and are stored on the stack. For x = 1: is it a small int? Does its value fit in (64 - 1) bits, i.e. is it less than 2^63 - 1? If so, it is encoded as xxxxxxxx | … | xxxxxxx1: 8 bytes, versus CPy's 28 bytes.
  19. µPy integers: x = 10 ** 10 is still a small int: 8 bytes, stored on the stack, not the heap, versus CPy's 32 bytes!
  20. Strings, CPy vs. µPy (revisited): 200000 small strings of length <= 10, containing special characters. [Chart: x-axis: number of strings; y-axis: memory use in bytes.]
  21. CPy strings:
> import sys
> x = "a"
> sys.getsizeof(x)
50
A PyASCIIObject: PyObject_HEAD (refcnt, *type pointing to PyUnicode_Type), other fields, and *wstr; the value is a null-terminated representation.
  22. sys.getsizeof("a") == 50: sizeof PyASCIIObject (48 bytes) + sizeof value (len("a" + "\0") * sizeof ASCII char = 2 bytes).
  23. sys.getsizeof("gotmemory?") == 59: sizeof PyASCIIObject (48 bytes) + 10 + 1 ("\0") bytes.
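The "fixed header plus one byte per character plus the trailing NUL" shape is easy to confirm; the byte counts assume a 64-bit CPython and pure-ASCII strings:

```python
import sys

# For pure-ASCII strings: fixed PyASCIIObject header, then 1 byte per
# character, then 1 byte for the trailing "\0".
base = sys.getsizeof("")
print(base)                                # header + NUL
print(sys.getsizeof("a") - base)           # 1 extra byte
print(sys.getsizeof("gotmemory?") - base)  # 10 extra bytes
```

Non-ASCII characters switch the representation to 2 or 4 bytes per character, which is why the deck's "special characters" experiment penalizes CPython further.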
  24. µPy strings: µPy has a special scheme for small strings (length <= 10; special chars okay!). It stores them as arrays: hash (2 bytes), length (1 byte), and data ("length" bytes, null-terminated). So x = "a" takes 2 + 1 + 1 ("a") + 1 ("\0") = 5 bytes, versus CPy's 50 bytes!
  25. CPy mutable objects (lists, dictionaries, classes, instances, etc.) have an additional overhead for memory management: a PyGC_HEAD (24 bytes) in front of the PyObject_HEAD (16 bytes), for 40 bytes of total overhead before the object-specific fields.
  26. CPy lists:
> sys.getsizeof([1])
72
A PyListObject: GC_HEAD plus PyObject_HEAD (refcnt, *type pointing to PyList_Type) is 40 bytes of overhead, plus 16 bytes of list fields and the 8-byte *item_ptrs pointer to the array of pointers to the items.
  27. sys.getsizeof([1]) == 72: sizeof PyListObject (64 bytes) + 1 * sizeof pointer (8 bytes). But this does not account for the item itself! Adding sizeof the items (sys.getsizeof(1) == 28), the list is really 100 bytes.
  28. CPy lists:
> x = [1]
> y = []
> y.append(1)
> sys.getsizeof(y)
96 !!   (versus 72, + sizeof(1))
On append(), the item-pointer array is dynamically resized as needed, and the resizing over-allocates.
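The over-allocation shows up as plateaus when watching sys.getsizeof across appends; the exact growth pattern is an implementation detail that varies by CPython version:

```python
import sys

y = []
sizes = []
for i in range(20):
    y.append(i)
    sizes.append(sys.getsizeof(y))
print(sizes)
# Repeated values are appends that fit in the previously over-allocated
# array; jumps are resizes that over-allocate again.
```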
  29. µPy concrete objects (everything except small integers and special strings) have an initial segment, mp_obj_base_t: a typeobject pointer, like CPy's PyObject_HEAD, but with no reference count. So the overhead, then? 8 bytes, versus CPy's 16 bytes.
  30. µPy lists: the same structure as CPy lists minus the memory-management overhead: no reference count (8 bytes) and no additional "PyGC_HEAD" (24 bytes). For y = [1]: CPy 72 bytes, µPy 40 bytes (both excluding sizeof(1)), a saving of 32 bytes.
  31. Does CPy allocate larger objects? Generally speaking, yes. Does it allocate more objects? Generally speaking, yes.
  32. CPy memory management: reference counting. refcnt is the number of references to this object. After x = 300 and y = x, the object's refcnt is 2, and the local namespace contains 'x' and 'y'.
  33. Reference counting, continued: x = 300; y = x; del x. The refcnt drops to 1, and the local namespace contains only 'y'.
  34. Reference counting, continued: x = 300; y = x; del x; del y. The refcnt drops to 0: no references, so deallocate it!
  35. Reference counting is automatic: the CPython source and C extensions are littered with Py_XINCREF and Py_XDECREF calls. Py_XDECREF deallocates the object if refcnt == 0. What's with the PyGC_HEAD for mutable objects, then?
  36. x = [1, 2, 3]; x.append(x): a reference cycle! After del x, the reference count never goes to 0, so the list would never be deallocated. [Diagram generated using objgraph.]
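The cycle and its eventual collection can be observed with the gc module and a weakref as a liveness probe (a small sketch of my own, using a class instance because plain lists don't support weak references):

```python
import gc
import weakref

class Node:
    pass

gc.disable()             # make the demo deterministic
x = Node()
x.self_ref = x           # reference cycle: refcnt can never reach 0
probe = weakref.ref(x)

del x                    # the name is gone, but the cycle keeps it alive
print(probe() is None)   # False: refcounting alone can't free it
n = gc.collect()         # the cycle detector finds and frees it
print(probe() is None)   # True
gc.enable()
```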
  37. So CPy has a cyclic garbage collector too. It detects and breaks reference cycles. It is generational and stop-the-world. It runs automatically, but can be run manually as well: gc.collect(). Only mutable objects can participate in reference cycles, so it tracks only them: hence PyGC_HEAD.
  38. µPy memory management: it does not use reference counting, so objects do not need the reference count field. The heap is divided into "blocks": the unit of allocation, 32 bytes each. The state of each block ("free" / "in-use") is tracked in a bitmap, also on the heap, at 2 bits per block. [Diagram: bitmap allocator; allocation bitmap plus blocks for the application's use.]
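The block/bitmap scheme can be sketched in a few lines; the heap size and state encoding here are illustrative only (µPy's real bitmap distinguishes free, head, tail, and mark states, and the allocator is written in C):

```python
BLOCK_SIZE = 32
NUM_BLOCKS = 64

FREE, HEAD, TAIL = 0, 1, 2        # per-block states in the bitmap
bitmap = [FREE] * NUM_BLOCKS

def alloc(nbytes):
    """Find a run of free blocks big enough and mark it in-use."""
    need = -(-nbytes // BLOCK_SIZE)   # ceiling division
    run = 0
    for i in range(NUM_BLOCKS):
        run = run + 1 if bitmap[i] == FREE else 0
        if run == need:
            start = i - need + 1
            bitmap[start] = HEAD
            for j in range(start + 1, start + need):
                bitmap[j] = TAIL
            return start * BLOCK_SIZE   # "address" of the allocation
    raise MemoryError

def free(addr):
    """Reset this allocation's bitmap entries to FREE."""
    i = addr // BLOCK_SIZE
    assert bitmap[i] == HEAD
    bitmap[i] = FREE
    i += 1
    while i < NUM_BLOCKS and bitmap[i] == TAIL:
        bitmap[i] = FREE
        i += 1

a = alloc(100)   # 100 bytes -> 4 blocks at offset 0
b = alloc(10)    # 1 block
free(a)
c = alloc(40)    # 2 blocks, reusing the freed run
print(a, b, c)
```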
  39. …so when is a block deallocated, i.e. its bitmap bits set to 0 again? x = myFoo(); y = x; del x; del y.
  40. Garbage collection: a mark-sweep collector. It is not generational; it is stop-the-world. It runs automatically (on the Unix port), but can be disabled or run manually as well. It manages all heap-allocated objects.
  41. A note (or two)… µPy's "operational" overhead, the overhead to run the program, is lower too: in the CPy interpreter, Python stack frames live on the heap; in the µPy interpreter, Python stack frames live on the C stack! µPy also has optimizations for compiler / compile-stage memory use. …Go to the source for more goodness!
  42. Interpreters: PSS (Proportional Set Size) is the memory allocated to the process and resident in RAM, with special accounting for shared memory.
  43. CPy optimizations: integers with -5 <= n <= 256 are shared, i.e. a single object per integer, held in a preallocated array allocated at interpreter start-up.
> x = 250
> y = 250
> x is y
True
…but that's 262 * 28 bytes = ~7KB of RAM!
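The shared small-int cache can be checked with `is`; constructing the values at runtime sidesteps the compiler's own constant folding, which may share equal literals within one compilation unit regardless (a CPython implementation detail, not a language guarantee):

```python
# Integers in [-5, 256] come from a preallocated array, so equal
# values are the very same object; larger ints are allocated fresh.
a = int("250")
b = int("250")
print(a is b)    # True: both names refer to the cached object

c = int("1000")
d = int("1000")
print(c is d)    # False: two distinct heap objects
```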
  44. CPy optimizations: strings that look like Python identifiers ({ A-Z, a-z, 0-9, _ }) are interned at compile time, i.e. shared via the interned dict.
> a = "python"
> b = "python"
> a is b
True
…µPy interns identifiers and small strings too.
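Strings built at runtime are not interned automatically, but sys.intern gives the same sharing by hand (a CPython illustration; in µPy the interning of small strings happens automatically):

```python
import sys

# Built at runtime, so compile-time interning doesn't apply:
a = "".join(["py", "thon"])
b = "".join(["py", "thon"])
print(a is b)    # False: two separate heap objects

# sys.intern deduplicates via the interned dict:
a = sys.intern(a)
b = sys.intern(b)
print(a is b)    # True: one shared object
```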
  45. µPy FAQ. Architectures? x86, x86-64, ARM, ARM Thumb, Xtensa. Versus PyPy, etc.? Versus Go, Rust, Lua, JavaScript, etc.? Multithreading? Via the "_thread" module, with an optional global interpreter lock (still work in progress, only available on selected ports). Async? Unicode?
  46. µPy FAQ. Versus Arduino, Raspberry Pi, or Tessel? The PyBoard sits in between Arduino and Raspberry Pi: more approachable than Arduino, but not a full OS like Raspberry Pi. Tessel is similar to MicroPython but runs JavaScript.
sulphide-glacier:py $ ls -l | grep "obj" | grep "\.h"
obj.h objarray.h objexcept.h objfun.h objgenerator.h objint.h objlist.h objmodule.h objstr.h objstringio.h objtuple.h objtype.h
  47. CPy memory allocation: CPython uses an object allocator, plus several object-specific allocators.
       _____   ______   ______       ________
      [ int ] [ dict ] [ list ] ... [ string ]
  +3 |         Object-specific memory          |
       _______________________________
      [  Python's object allocator   ]
  +2 |         Object memory         |
       ____________________________________________
      [ Python's raw memory allocator (PyMem_* API) ]
  +1 |               Python memory                  |
       ______________________________________________________________
      [ Underlying general-purpose allocator, e.g. C library malloc  ]
   0 | <---- Virtual memory allocated for the python process ----> |
      ==============================================================
                           Operating System
  48. CPython has an "object allocator" on top of a general-purpose allocator like malloc. An arena (256KB) is divided into pools (4KB each), and each pool is divided into fixed-size blocks.
  49. x = 300 (28 bytes): are there existing pools with free blocks of the desired size? usedpools is indexed by size class (8 bytes, 16 bytes, 24 bytes, 32 bytes, …), each entry pointing at a pool with a free block.
  50. Else, find an available pool in the partially allocated arenas and carve out a block. What if the arenas are full? mmap() a new arena, create a pool of the size class's blocks, and return a block.
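The size-class lookup above amounts to rounding the request up to the next multiple of 8; this sketch mirrors the index computation in CPython's obmalloc.c (the constants match the 8-byte classes in the deck; newer 64-bit CPython builds use 16-byte alignment):

```python
ALIGNMENT = 8                  # requests round up to 8-byte steps
SMALL_REQUEST_THRESHOLD = 512  # larger requests go straight to malloc

def size_class(nbytes):
    """Map a request to its pymalloc size class: (index, block size)."""
    if nbytes == 0 or nbytes > SMALL_REQUEST_THRESHOLD:
        return None                       # handled by the raw allocator
    index = (nbytes - 1) // ALIGNMENT     # the usedpools index
    return index, (index + 1) * ALIGNMENT

print(size_class(28))    # a 28-byte int is served from 32-byte blocks
print(size_class(1))
print(size_class(4096))  # too big for pymalloc
```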