
The Memory Chronicles

kavya
May 20, 2017

MicroPython is the leanest, meanest full Python implementation. Designed for microcontrollers, this variant of Python runs in less than 300KB of memory, and retains support for all your favorite Python features.

So what does it take to make the smallest Python? Put differently, why does CPython have a large memory footprint?

This talk will explore the internals of MicroPython and contrast it with CPython, focusing on the aspects that relate to memory use. We will delve into the Python object models in each and the machinery for managing them. We will touch upon how the designs of the bytecode compiler and interpreter of each differ and why that matters.

Transcript

  1. CPython: the standard and default implementation. Oft-heard memory problems: uses more memory than desirable; ever-increasing memory use; high-water-mark usage; heap fragmentation.
  2. MicroPython: for microcontrollers (pyboard, ESP8266, micro:bit, and others) with as little as 256KiB ROM and 16KiB RAM. It implements the Python 3 spec: complete syntax up to Python 3.4, but not complete functionality; it leaves out what's unsuitable for microcontrollers. Supports a subset of the stdlib.
  3. CPy and µPy are both bytecode interpreters: .py source is compiled to bytecode that the interpreters evaluate. Both are stack-based virtual machines: they use a value stack to manipulate objects during evaluation. Both are written in C. …yet they sit on opposite ends of the memory-use spectrum.
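The stack-machine model is easy to see on the CPython side with the standard dis module (a CPython-only illustration; the function here is a made-up example, not from the deck):

```python
import dis

def add(a, b):
    return a + b

# CPython compiles the body to stack-machine bytecode: operands are
# pushed onto the value stack, a binary-add opcode pops two operands
# and pushes the result.
dis.dis(add)

ops = [ins.opname for ins in dis.get_instructions(add)]
print(ops)
```

The exact opcode names vary by CPython version (BINARY_ADD in 3.6, BINARY_OP in newer releases), but the push/pop structure is the same.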
  4. What I'm running: Python 3.6 and MicroPython 1.8, on a 64-bit Ubuntu 16.10 (Linux 4.8.0) box with 4GB of RAM.
  5. Methodology: create 200000 new objects of the desired type (ints, strings, lists) and prevent them from being deallocated. Measure heap use from within the Python process (CPy: the sys, memory_profiler, and pympler modules; µPy: the micropython and gc modules).
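A much smaller version of this methodology can be sketched on the CPython side with just sys.getsizeof (a per-object approximation; the deck uses memory_profiler and pympler for the whole-heap numbers, and the function name here is my own):

```python
import sys

def measure(make, n=200000):
    """Create n objects, keep them alive in a list, and report the
    average shallow per-object size (container overhead excluded)."""
    keep = [make(i) for i in range(n)]   # prevent deallocation
    return sum(sys.getsizeof(o) for o in keep) / n

# Integers with values from 10**10 up, as in the deck's experiment:
avg_int = measure(lambda i: 10 ** 10 + i)
print("avg bytes per int:", avg_int)
```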
  6. A prelude first: process memory has the stack at high addresses and the heap at low addresses. CPy and µPy both use custom allocators to manage the heap. CPy's allocator grows the heap on demand; µPy has a fixed heap.
  7. Integers, CPy vs. µPy: 200000 integers with values from (10 ** 10) to (10 ** 10) + 200000. [Chart: x-axis: number of integers; y-axis: memory use in bytes.]
  8. Strings, CPy vs. µPy: 200000 small strings of length <= 10, containing special characters. [Chart: x-axis: number of strings; y-axis: memory use in bytes.]
  9. CPy objects: all objects are allocated on the heap. For x = 1, the integer object lives on the heap; the name x lives in the global or local namespace. [Diagram: CPy interpreter memory, stack at high addresses, heap at low addresses.]
  10. All objects have an initial segment, PyObject_HEAD: a reference count (refcnt) and a pointer to the type (*type), followed by object-specific fields. A list object's type pointer points to PyList_Type; an integer object's points to PyLong_Type.
  11. Overhead: PyObject_HEAD is effectively { size_t refcnt; typeobject *type; }: 8 bytes + 8 bytes = 16 bytes, the lower bound on the sizeof CPy objects.
  12. CPy integers: for x = 1, a PyLongObject holds PyObject_HEAD (refcnt, *type pointing to PyLong_Type), a size field (the length of the array), and an array for the value.
  13. x = 1 is a PyLongObject { PyObject_HEAD; size_t size; uint32_t digit[1]; }: 16 bytes (PyObject_HEAD) + 8 bytes (size) + 4 bytes (one digit) = 28 bytes!
> import sys
> x = 1
> sys.getsizeof(x)
28
  14. > sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)
> x = 10 ** 10
> sys.getsizeof(x)
32
A PyLongObject with { ... uint32_t digit[2]; }: 24 bytes (PyObject_HEAD + size) + 8 bytes (two digits) = 32 bytes.
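The digit-array growth is easy to check directly; the precise byte counts assume a 64-bit CPython with 30-bit digits, as in the deck:

```python
import sys

# CPython stores an int's magnitude as an array of fixed-size digits;
# each extra digit adds sys.int_info.sizeof_digit bytes to the object.
print(sys.int_info)
small = sys.getsizeof(1)           # one digit
medium = sys.getsizeof(10 ** 10)   # 10**10 needs ~34 bits: two 30-bit digits
big = sys.getsizeof(10 ** 100)     # many digits
print(small, medium, big)
```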
  15. µPy objects: a µPy "object" is a machine word (8 bytes). Pointer tagging: store a "tag" in the unused bits of a pointer.
  16. Pointer tagging: addresses are aligned to the word size, i.e. pointers are multiples of 8. So they look like 8 (0b1000), 16 (0b10000), 24 (0b11000), and so on: the lower 3 bits are always 000, so store a "tag" in them (0 <= tag <= 7) to add extra information to the pointer, leaving the remaining 61 bits for the value: xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxTTT
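The trick itself is plain integer arithmetic; here is a sketch of the scheme (an illustration, not µPy's actual C code, and the address and tag values are made up):

```python
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1   # 0b111

def tag_pointer(addr, tag):
    """Pack a tag into the low 3 bits of an 8-byte-aligned address."""
    assert addr % 8 == 0, "addresses are word-aligned"
    assert 0 <= tag <= TAG_MASK
    return addr | tag

def untag(word):
    """Split a tagged word back into (address, tag)."""
    return word & ~TAG_MASK, word & TAG_MASK

word = tag_pointer(0x7F0000001000, 0b010)
addr, tag = untag(word)
print(hex(addr), bin(tag))
```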
  17. Pointer tagging in µPy:
bit 0 == 1: not a pointer at all but a small integer; bits 1+ are the integer (xxxxxxxx | … | xxxxxxx1).
last two bits == 10: an index into the interned strings pool; bits 2+ are the index (xxxxxxxx | … | xxxxxx10).
last two bits == 00: a pointer to a concrete object, i.e. everything except small integers and interned strings; bits 2+ are the pointer (xxxxxxxx | … | xxxxxx00).
  18. This is neat. It means small integers take 8 bytes and are stored on the stack. For x = 1: is it a small int? Does its value fit in (64 - 1) bits, i.e. is it less than 2^63 - 1? If so, it is encoded as xxxxxxxx | … | xxxxxxx1: 8 bytes, versus CPy's 28 bytes.
  19. µPy integers: x = 10 ** 10 is still a small int: 8 bytes, stored on the stack, not the heap, versus CPy's 32 bytes!
  20. Strings, CPy vs. µPy (revisited): 200000 small strings of length <= 10, containing special characters. [Chart: x-axis: number of strings; y-axis: memory use in bytes.]
  21. CPy strings:
> import sys
> x = "a"
> sys.getsizeof(x)
50
A PyASCIIObject: PyObject_HEAD (refcnt, *type pointing to PyUnicode_Type), other fields, and *wstr; the value is a null-terminated representation.
  22. sys.getsizeof("a") == 50: sizeof PyASCIIObject (48 bytes) + sizeof value (len("a" + "\0") * sizeof ASCII char = 2 bytes).
  23. sys.getsizeof("gotmemory?") == 59: sizeof PyASCIIObject (48 bytes) + 10 + 1 ("\0") bytes.
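The "fixed header plus one byte per character plus the trailing NUL" shape is easy to confirm; the byte counts assume a 64-bit CPython and pure-ASCII strings:

```python
import sys

# For pure-ASCII strings: fixed PyASCIIObject header, then 1 byte per
# character, then 1 byte for the trailing "\0".
base = sys.getsizeof("")
print(base)                                # header + NUL
print(sys.getsizeof("a") - base)           # 1 extra byte
print(sys.getsizeof("gotmemory?") - base)  # 10 extra bytes
```

Non-ASCII characters switch the representation to 2 or 4 bytes per character, which is why the deck's "special characters" experiment penalizes CPython further.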
  24. µPy strings: µPy has a special scheme for small strings (length <= 10; special chars okay!). It stores them as arrays: hash (2 bytes), length (1 byte), and data ("length" bytes, null-terminated). So x = "a" takes 2 + 1 + 1 ("a") + 1 ("\0") = 5 bytes, versus CPy's 50 bytes!
  25. CPy mutable objects (lists, dictionaries, classes, instances, etc.) have an additional overhead for memory management: a PyGC_HEAD (24 bytes) in front of the PyObject_HEAD (16 bytes), for 40 bytes of total overhead before the object-specific fields.
  26. CPy lists:
> sys.getsizeof([1])
72
A PyListObject: GC_HEAD plus PyObject_HEAD (refcnt, *type pointing to PyList_Type) is 40 bytes of overhead, plus 16 bytes of list fields and the 8-byte *item_ptrs pointer to the array of pointers to the items.
  27. sys.getsizeof([1]) == 72: sizeof PyListObject (64 bytes) + 1 * sizeof pointer (8 bytes). But this does not account for the item itself! Adding sizeof the items (sys.getsizeof(1) == 28), the list is really 100 bytes.
  28. CPy lists:
> x = [1]
> y = []
> y.append(1)
> sys.getsizeof(y)
96 !!   (versus 72, + sizeof(1))
On append(), the item-pointer array is dynamically resized as needed, and the resizing over-allocates.
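The over-allocation shows up as plateaus when watching sys.getsizeof across appends; the exact growth pattern is an implementation detail that varies by CPython version:

```python
import sys

y = []
sizes = []
for i in range(20):
    y.append(i)
    sizes.append(sys.getsizeof(y))
print(sizes)
# Repeated values are appends that fit in the previously over-allocated
# array; jumps are resizes that over-allocate again.
```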
  29. µPy concrete objects (everything except small integers and special strings) have an initial segment, mp_obj_base_t: a typeobject pointer, like CPy's PyObject_HEAD, but with no reference count. So the overhead, then? 8 bytes, versus CPy's 16 bytes.
  30. µPy lists: the same structure as CPy lists minus the memory-management overhead: no reference count (8 bytes) and no additional "PyGC_HEAD" (24 bytes). For y = [1]: CPy 72 bytes, µPy 40 bytes (both excluding sizeof(1)), a saving of 32 bytes.
  31. Does CPy allocate larger objects? Generally speaking, yes. Does it allocate more objects? Generally speaking, yes.
  32. CPy memory management: reference counting. refcnt is the number of references to this object. After x = 300 and y = x, the object's refcnt is 2, and the local namespace contains 'x' and 'y'.
  33. Reference counting, continued: x = 300; y = x; del x. The refcnt drops to 1, and the local namespace contains only 'y'.
  34. Reference counting, continued: x = 300; y = x; del x; del y. The refcnt drops to 0: no references, so deallocate it!
  35. Reference counting is automatic: the CPython source and C extensions are littered with Py_XINCREF and Py_XDECREF calls. Py_XDECREF deallocates the object if refcnt == 0. What's with the PyGC_HEAD for mutable objects, then?
  36. x = [1, 2, 3]; x.append(x): a reference cycle! After del x, the reference count never goes to 0, so the list would never be deallocated. [Diagram generated using objgraph.]
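The cycle and its eventual collection can be observed with the gc module and a weakref as a liveness probe (a small sketch of my own, using a class instance because plain lists don't support weak references):

```python
import gc
import weakref

class Node:
    pass

gc.disable()             # make the demo deterministic
x = Node()
x.self_ref = x           # reference cycle: refcnt can never reach 0
probe = weakref.ref(x)

del x                    # the name is gone, but the cycle keeps it alive
print(probe() is None)   # False: refcounting alone can't free it
n = gc.collect()         # the cycle detector finds and frees it
print(probe() is None)   # True
gc.enable()
```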
  37. So CPy has a cyclic garbage collector too. It detects and breaks reference cycles. It is generational and stop-the-world. It runs automatically, but can be run manually as well: gc.collect(). Only mutable objects can participate in reference cycles, so it tracks only them: hence PyGC_HEAD.
  38. µPy memory management: it does not use reference counting, so objects do not need the reference count field. The heap is divided into "blocks": the unit of allocation, 32 bytes each. The state of each block ("free" / "in-use") is tracked in a bitmap, also on the heap, at 2 bits per block. [Diagram: bitmap allocator; allocation bitmap plus blocks for the application's use.]
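The block/bitmap scheme can be sketched in a few lines; the heap size and state encoding here are illustrative only (µPy's real bitmap distinguishes free, head, tail, and mark states, and the allocator is written in C):

```python
BLOCK_SIZE = 32
NUM_BLOCKS = 64

FREE, HEAD, TAIL = 0, 1, 2        # per-block states in the bitmap
bitmap = [FREE] * NUM_BLOCKS

def alloc(nbytes):
    """Find a run of free blocks big enough and mark it in-use."""
    need = -(-nbytes // BLOCK_SIZE)   # ceiling division
    run = 0
    for i in range(NUM_BLOCKS):
        run = run + 1 if bitmap[i] == FREE else 0
        if run == need:
            start = i - need + 1
            bitmap[start] = HEAD
            for j in range(start + 1, start + need):
                bitmap[j] = TAIL
            return start * BLOCK_SIZE   # "address" of the allocation
    raise MemoryError

def free(addr):
    """Reset this allocation's bitmap entries to FREE."""
    i = addr // BLOCK_SIZE
    assert bitmap[i] == HEAD
    bitmap[i] = FREE
    i += 1
    while i < NUM_BLOCKS and bitmap[i] == TAIL:
        bitmap[i] = FREE
        i += 1

a = alloc(100)   # 100 bytes -> 4 blocks at offset 0
b = alloc(10)    # 1 block
free(a)
c = alloc(40)    # 2 blocks, reusing the freed run
print(a, b, c)
```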
  39. …so when is a block deallocated, i.e. its bitmap bits set to 0 again? x = myFoo(); y = x; del x; del y.
  40. Garbage collection: a mark-sweep collector. It is not generational; it is stop-the-world. It runs automatically (on the Unix port), but can be disabled or run manually as well. It manages all heap-allocated objects.
  41. A note (or two)… µPy's "operational" overhead, the overhead to run the program, is lower too: in the CPy interpreter, Python stack frames live on the heap; in the µPy interpreter, Python stack frames live on the C stack! µPy also has optimizations for compiler / compile-stage memory use. …Go to the source for more goodness!
  42. Interpreters: PSS (Proportional Set Size) is the memory allocated to the process and resident in RAM, with special accounting for shared memory.
  43. CPy optimizations: integers with -5 <= n <= 256 are shared, i.e. a single object per integer, held in a preallocated array allocated at interpreter start-up.
> x = 250
> y = 250
> x is y
True
…but that's 262 * 28 bytes = ~7KB of RAM!
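The shared small-int cache can be checked with `is`; constructing the values at runtime sidesteps the compiler's own constant folding, which may share equal literals within one compilation unit regardless (a CPython implementation detail, not a language guarantee):

```python
# Integers in [-5, 256] come from a preallocated array, so equal
# values are the very same object; larger ints are allocated fresh.
a = int("250")
b = int("250")
print(a is b)    # True: both names refer to the cached object

c = int("1000")
d = int("1000")
print(c is d)    # False: two distinct heap objects
```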
  44. CPy optimizations: strings that look like Python identifiers ({ A-Z, a-z, 0-9, _ }) are interned at compile time, i.e. shared via the interned dict.
> a = "python"
> b = "python"
> a is b
True
…µPy interns identifiers and small strings too.
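Strings built at runtime are not interned automatically, but sys.intern gives the same sharing by hand (a CPython illustration; in µPy the interning of small strings happens automatically):

```python
import sys

# Built at runtime, so compile-time interning doesn't apply:
a = "".join(["py", "thon"])
b = "".join(["py", "thon"])
print(a is b)    # False: two separate heap objects

# sys.intern deduplicates via the interned dict:
a = sys.intern(a)
b = sys.intern(b)
print(a is b)    # True: one shared object
```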
  45. µPy FAQ. Architectures? x86, x86-64, ARM, ARM Thumb, Xtensa. Versus PyPy, etc.? Versus Go, Rust, Lua, JavaScript, etc.? Multithreading? Via the "_thread" module, with an optional global interpreter lock (still work in progress, only available on selected ports). Async? Unicode?
  46. µPy FAQ. Versus Arduino, Raspberry Pi, or Tessel? The PyBoard sits in between Arduino and Raspberry Pi: more approachable than Arduino, but not a full OS like Raspberry Pi. Tessel is similar to MicroPython but runs JavaScript.
sulphide-glacier:py $ ls -l | grep "obj" | grep "\.h"
obj.h objarray.h objexcept.h objfun.h objgenerator.h objint.h objlist.h objmodule.h objstr.h objstringio.h objtuple.h objtype.h
  47. CPy memory allocation: CPython uses an object allocator, plus several object-specific allocators.
       _____   ______   ______       ________
      [ int ] [ dict ] [ list ] ... [ string ]
  +3 |         Object-specific memory          |
       _______________________________
      [  Python's object allocator   ]
  +2 |         Object memory         |
       ____________________________________________
      [ Python's raw memory allocator (PyMem_* API) ]
  +1 |               Python memory                  |
       ______________________________________________________________
      [ Underlying general-purpose allocator, e.g. C library malloc  ]
   0 | <---- Virtual memory allocated for the python process ----> |
      ==============================================================
                           Operating System
  48. CPython has an "object allocator" on top of a general-purpose allocator like malloc. An arena (256KB) is divided into pools (4KB each), and each pool is divided into fixed-size blocks.
  49. x = 300 (28 bytes): are there existing pools with free blocks of the desired size? usedpools is indexed by size class (8 bytes, 16 bytes, 24 bytes, 32 bytes, …), each entry pointing at a pool with a free block.
  50. Else, find an available pool in the partially allocated arenas and carve out a block. What if the arenas are full? mmap() a new arena, create a pool of the size class's blocks, and return a block.
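The size-class lookup above amounts to rounding the request up to the next multiple of 8; this sketch mirrors the index computation in CPython's obmalloc.c (the constants match the 8-byte classes in the deck; newer 64-bit CPython builds use 16-byte alignment):

```python
ALIGNMENT = 8                  # requests round up to 8-byte steps
SMALL_REQUEST_THRESHOLD = 512  # larger requests go straight to malloc

def size_class(nbytes):
    """Map a request to its pymalloc size class: (index, block size)."""
    if nbytes == 0 or nbytes > SMALL_REQUEST_THRESHOLD:
        return None                       # handled by the raw allocator
    index = (nbytes - 1) // ALIGNMENT     # the usedpools index
    return index, (index + 1) * ALIGNMENT

print(size_class(28))    # a 28-byte int is served from 32-byte blocks
print(size_class(1))
print(size_class(4096))  # too big for pymalloc
```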