
The Memory Chronicles

kavya
May 20, 2017


MicroPython is the leanest, meanest full Python implementation. Designed for microcontrollers, this variant of Python runs in less than 300KB of memory, and retains support for all your favorite Python features.

So what does it take to make the smallest Python? Put differently, why does CPython have a large memory footprint?

This talk will explore the internals of MicroPython and contrast it with CPython, focusing on the aspects that relate to memory use. We will delve into the Python object models in each and the machinery for managing them. We will touch upon how the designs of the bytecode compiler and interpreter of each differ and why that matters.


Transcript

  1. The Memory Chronicles:
    A Tale of Two Pythons
    kavya
    @kavya719



  3. CPython
    The standard and default implementation.

    Oft-heard memory problems:
    > uses more memory than desirable
    > ever-increasing memory use
    > high-water-mark usage
    > heap fragmentation

  4. MicroPython
    For microcontrollers — pyboard, ESP8266, micro:bit, others.
    256KiB ROM, 16KiB RAM.

    Implements the Python 3 spec:
    complete syntax up to Python 3.4,
    but not complete functionality;
    leaves out what’s unsuitable for microcontrollers.
    Supports a subset of the stdlib.

  5. CPy and µPy
    Both are written in C.
    Both are bytecode interpreters:
    .py source is compiled to bytecode that the interpreters evaluate.
    Both are stack-based virtual machines:
    they use a value stack to manipulate objects during evaluation.

    …yet they sit on opposite ends of the memory-use spectrum.

  6. What I’m running
    Python 3.6
    MicroPython 1.8
    on a 64-bit Ubuntu 16.10 (Linux 4.8.0) box with 4GB of RAM.

  7. methodology
    Create 200000 new objects of the desired type — ints, strings,
    lists — and prevent them from being deallocated.

    Measure from within the Python process:
    CPy: sys, memory_profiler, pympler.
    µPy: micropython, gc modules.

    Heap use measured.
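The methodology above can be sketched roughly like this on the CPython side. This is my reconstruction, not the talk's actual scripts: it uses `tracemalloc` (the talk used `sys`, `memory_profiler`, `pympler`), and the object-producing lambdas are illustrative.

```python
import tracemalloc

COUNT = 200_000  # the talk allocates 200000 objects per run

def measure(make):
    """Allocate COUNT objects, keep them all alive, report heap growth."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    keep = [make(i) for i in range(COUNT)]   # references prevent deallocation
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

ints = measure(lambda i: 10**10 + i)   # large enough to dodge the small-int cache
strs = measure(lambda i: "§%d" % i)    # small unique strings, special char included
print(ints, strs)
```

Note the reported growth includes the `keep` list itself; the per-object cost dominates at this scale.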

  8. a prelude first
    [diagram: process memory — stack at high addresses, heap at low addresses]
    CPy and µPy both use custom allocators to manage the heap.
    CPy’s allocator grows the heap on demand.
    µPy has a fixed heap.

  9. integers
    CPy µPy
    200000 integers with value from (10 ** 10) to (10 ** 10) + 200000.
    x-axis: number of integers.

    y-axis: memory use in bytes.


  10. strings
    CPy µPy
    200000 small strings of length <= 10, containing special characters.

    x-axis: number of strings.

    y-axis: memory use in bytes.


  11. lists
    CPy µPy
    x-axis: number of elements in list.

    y-axis: memory use in bytes.


  12. how objects are implemented
    memory management
    evaluation


  13. how objects
    are implemented


  14. Does CPy allocate larger objects?

    Does it allocate more objects?


  15. CPy Objects
    All objects are allocated on the heap.
    x = 1
    The value 1 lives on the heap; the name x lives in the
    global or local namespace and refers to it.

  16. All objects have an initial segment, PyObject_HEAD:
    refcnt — reference count
    *type  — pointer to the type
    followed by object-specific fields.
    e.g. an integer object’s *type points to Py_LongType,
    a list object’s *type points to Py_ListType.

  17. overhead
    PyObject_HEAD:
    {
        size_t refcnt;         8 bytes
        typeobject *type;      8 bytes
    }                        = 16 bytes
    the lower bound on sizeof CPy objects.
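That 16-byte floor is easy to check from the REPL: a bare `object()` carries nothing but the refcount and the type pointer. The exact number assumes a 64-bit build.

```python
import sys

# On a 64-bit CPython build, the refcount and type pointer are 8 bytes each,
# so even a featureless object() cannot be smaller than 16 bytes.
print(sys.getsizeof(object()))  # 16 on 64-bit builds
```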

  18. CPy integers
    x = 1
    PyLongObject:
    PyObject_HEAD (refcnt, *type -> Py_LongType)
    size  — length of the digit array
    array — holds the value

  19. x = 1
    PyLongObject {
        PyObject_HEAD         16 bytes
        size_t size;           8 bytes
        uint32_t digit[1];     4 bytes
    }                        = 28 bytes!
    > import sys
    > x = 1
    > sys.getsizeof(x)
    28

  20. > sys.int_info
    sys.int_info(bits_per_digit=30, sizeof_digit=4)
    > x = 10 ** 10
    > sys.getsizeof(x)
    32
    PyLongObject {
        ...                   24 bytes
        uint32_t digit[2];     8 bytes
    }                        = 32 bytes
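You can watch the digit array grow from the REPL. The exact byte counts assume a 64-bit build with 30-bit digits, as `sys.int_info` reports.

```python
import sys

print(sys.int_info)                    # e.g. bits_per_digit=30, sizeof_digit=4

one_digit = sys.getsizeof(1)           # header + one 30-bit digit
two_digit = sys.getsizeof(10 ** 10)    # 10**10 > 2**30: needs a second digit
print(one_digit, two_digit)            # 28, 32 on a typical 64-bit build
```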

  21. CPy
    200000 integers * 32 bytes
    = ~6400000 bytes, or ~6 MiB

  22. CPy
    200000 integers * 32 bytes
    = ~6400000 bytes, or ~6 MiB
    µPy
    ?

  23. µPy Objects
    A µPy “object” is a machine word.
    8 bytes
    ?!


  24. µPy Objects
    A µPy “object” is a machine word.
    pointer tagging
    store a “tag” in the unused bits of a pointer
    8 bytes


  25. pointer tagging
    means adding extra info to the pointer itself.
    • addresses are aligned to word size, i.e. pointers are
      multiples of 8. So they can be:
      8  : 0b1000
      16 : 0b10000
      24 : 0b11000
    • the lower 3 bits are always 000 — store a “tag” (0 <= tag <= 7)
      in them; the remaining 61 bits are used for the value.
    xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxTTT

  26. pointer tagging in µPy
    bit 0 == 1: not a pointer at all, but a small integer —
      xxxxxxxx | … | xxxxxxx1    bits 1+ are the integer
    last two bits == 10: index into the interned-strings pool —
      xxxxxxxx | … | xxxxxx10    bits 2+ are the index
    last two bits == 00: pointer to a concrete object —
      xxxxxxxx | … | xxxxxx00    bits 2+ are the pointer
      (everything except small integers and interned strings)
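The bit layout above can be mimicked in plain Python arithmetic. This is purely an illustration of the scheme described — not µPy's actual C code, and the function names are mine.

```python
# Simulate µPy-style pointer tagging in a 64-bit machine word.
# bit 0 == 1      -> small integer, value in bits 1+
# bits 1..0 == 10 -> index into the interned-string pool
# bits 1..0 == 00 -> aligned pointer to a concrete object

def tag_small_int(n):
    return (n << 1) | 0b1

def untag(word):
    if word & 0b1:
        return ("small_int", word >> 1)
    if word & 0b11 == 0b10:
        return ("interned_str", word >> 2)
    return ("pointer", word)        # low bits already 00: a real address

word = tag_small_int(42)
print(untag(word))    # ('small_int', 42)
print(untag(0x1000))  # ('pointer', 4096) — an 8-byte-aligned address
```

One word carries both the type discriminator and the payload, which is exactly why a µPy “object” can be just 8 bytes.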

  27. This is neat.
    It means small integers take 8 bytes, and are stored on the stack.
    x = 1
    Is it a small int? Does its value fit in (64 - 1) bits,
    i.e. is it less than 2**63 - 1?
    xxxxxxxx | … | xxxxxxx1
    8 bytes — versus 28 bytes in CPy.

  28. µPy integers
    x = 10 ** 10
    Still a small int!
    8 bytes, stored on the stack, not the heap —
    versus 32 bytes in CPy.

  29. strings
    CPy µPy
    200000 small strings of length <= 10, containing special characters.

    x-axis: number of strings.

    y-axis: memory use in bytes.


  30. CPy strings
    > import sys
    > x = "a"
    > sys.getsizeof(x)
    50
    PyASCIIObject:
    PyObject_HEAD (refcnt, *type -> Py_UnicodeType)
    ...
    *wstr — null-terminated repr.

  31. CPy strings
    > import sys
    > x = "a"
    > sys.getsizeof(x)
    50
    sizeof PyASCIIObject + sizeof value:
    48 bytes + len("a" + "\0") * sizeof ASCII char (2 bytes) = 50.

  32. CPy strings
    > import sys
    > x = "gotmemory?"
    > sys.getsizeof(x)
    59
    sizeof PyASCIIObject + sizeof value:
    48 bytes + 10 + 1 ("\0") bytes = 59 bytes.

  33. µPy strings
    µPy has a special scheme for small strings:
    length <= 10, special chars okay!
    Stores them as arrays:
    hash (2 bytes) | length (1 byte) | data (“length” bytes, null-terminated)
    x = "a"
    So: 2 + 1 + 1 (“a”) + 1 (“\0”) … 5 bytes! — versus 50 bytes in CPy.
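A back-of-the-envelope check of that 5-byte figure, using `struct` to lay out the same fields. The field order and the 16-bit hash mask here are my sketch of the scheme described, not µPy's source verbatim.

```python
import struct

def small_str_size(s):
    # hash (2 bytes) + length (1 byte) + data + trailing "\0"
    data = s.encode() + b"\0"
    packed = struct.pack("<HB", hash(s) & 0xFFFF, len(data) - 1) + data
    return len(packed)

print(small_str_size("a"))           # 2 + 1 + 2 = 5 bytes
print(small_str_size("gotmemory?"))  # 2 + 1 + 11 = 14 bytes, vs 59 in CPy
```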

  34. CPy mutable objects
    lists, dictionaries, classes, instances etc.
    Have an additional overhead for memory management:
    PyGC_HEAD        24 bytes
    PyObject_HEAD    16 bytes
    object-specific fields
    total overhead: 40 bytes.

  35. CPy lists
    > sys.getsizeof([1])
    72
    PyListObject:
    GC_HEAD (24 bytes)
    PyObject_HEAD — refcnt, *type -> Py_ListType (16 bytes)
    ...
    *item_ptrs — array of pointers to items (8 bytes per item)
    40 bytes of overhead before the data.

  36. CPy lists
    > sys.getsizeof([1])
    72
    sizeof PyListObject + sizeof value:
    64 bytes + 1 * sizeof pointer (8 bytes) = 72.
    …does not account for the item itself!
    + sizeof items: sys.getsizeof(1) — the list is really 100 bytes.
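Since `sys.getsizeof` on a container reports only the container, a small helper makes the "really 100 bytes" point concrete. `list_total_size` is my name for this hypothetical helper; it only descends one level.

```python
import sys

def list_total_size(lst):
    """Container size plus the sizes of its (non-nested) items."""
    return sys.getsizeof(lst) + sum(sys.getsizeof(item) for item in lst)

x = [1]
print(sys.getsizeof(x))    # just the PyListObject + its pointer array
print(list_total_size(x))  # plus the int object the pointer refers to
```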

  37. [chart: sizeof list (bytes) versus number of appends]

  38. CPy lists
    > x = [1]
    > sys.getsizeof(x)
    72 (+ sizeof(1))
    > y = []
    > y.append(1)
    > sys.getsizeof(y)
    96 !!
    on append(), the *item_ptrs array of PyListObject is dynamically
    resized as needed — and resizing over-allocates.
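The over-allocation is visible by comparing a literal to the same list built by `append`. Exact byte values vary by CPython version, so only the relative pattern is shown.

```python
import sys

literal = [1]          # sized exactly for one element
grown = []
grown.append(1)        # resizing over-allocates room for future appends

print(sys.getsizeof(literal), sys.getsizeof(grown))

# watch the capacity jumps: size stays flat, then steps up at each resize
sizes = []
lst = []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))
print(sizes)
```

The flat stretches in `sizes` are appends served from previously over-allocated capacity; the steps are resizes.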

  39. µPy concrete objects
    => everything except small integers and special strings.
    Have an initial segment, mp_obj_base_t:
    *type — a typeobject pointer too, but no reference count —
    followed by object-specific fields.
    so, overhead then? 8 bytes — versus 16 bytes (PyObject_HEAD) in CPy.

  40. µPy mutable objects
    No additional overhead —
    versus 24 bytes (PyGC_HEAD) for garbage collection in CPy.

  41. µPy lists
    Same structure as CPy lists minus the memory-management overhead:
    > reference count: 8 bytes
    > additional “PyGC_HEAD” overhead: 24 bytes
    y = [1]
    CPy (not incl. sizeof(1)): 72 bytes
    µPy (not incl. sizeof(1)): 40 bytes — 32 bytes less.

  42. Does CPy allocate larger objects?

    Does it allocate more objects?

    Generally speaking, yes.

    Generally speaking, yes.


  43. Mutable objects:
    CPy’s PyObject_HEAD: refcnt (8 bytes), plus GC_HEAD (24 bytes).
    µPy’s mp_obj_base_t: no additional overhead.

  44. What’s with all of CPy’s “memory management” overhead?

  45. memory management
    overhead


  46. CPy Memory Management — reference counting
    refcnt: the number of references to this object.
    x = 300     refcnt == 1; local namespace contains ‘x’
    y = x       refcnt == 2; local namespace contains ‘x’, ‘y’

  47. CPy Memory Management — reference counting
    x = 300
    y = x
    del x       refcnt == 1; local namespace contains ‘y’

  48. CPy Memory Management — reference counting
    x = 300
    y = x
    del x
    del y       refcnt == 0 — no references, so deallocate it!

  49. Automatic reference counting
    The CPython source and C extensions are littered with
    Py_XINCREF, Py_XDECREF calls.
    Py_XDECREF deallocates if refcnt == 0.
    What’s with the PyGC_HEAD for mutable objects then?

  50. x = [1, 2, 3]
    x.append(x)
    del x
    reference cycle! the reference count never goes to 0,
    so the list would never be deallocated.
    (graph generated using objgraph)
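The cycle above can be reproduced and then handed to the cyclic collector explicitly:

```python
import gc

x = [1, 2, 3]
x.append(x)          # the list now contains a reference to itself
del x                # refcount stays at 1 (the self-reference) — leaked?

found = gc.collect() # the cyclic collector detects the cycle and frees it
print(found)         # number of unreachable objects found (at least 1)
```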

  51. So, CPy has a cyclic garbage collector too.
    Detects and breaks reference cycles.
    Is generational.
    Is stop-the-world.
    Runs automatically, but can be run manually as well: gc.collect().
    Only mutable objects can participate in reference cycles,
    so only they are tracked —> PyGC_HEAD.

  52. µPy Memory Management
    Does not use reference counting,
    so objects do not need the reference count field.
    Heap divided into “blocks”:
    > the unit of allocation,
    > 32 bytes.
    The state of each block is tracked in a bitmap, also on the heap:
    > 2 bits per block,
    > tracks “free” / “in-use”.
    [diagram: bitmap allocator — allocation bitmap + blocks for the
    application’s use]
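A toy version of that bitmap scheme, purely illustrative: a fixed heap of 32-byte blocks with a 2-bits-per-block state map. The state names and class are mine; µPy's real allocator (in C) distinguishes more states than this sketch does.

```python
BLOCK = 32                            # µPy's allocation unit, in bytes
FREE, HEAD, TAIL = 0b00, 0b01, 0b10   # 2-bit block states (illustrative)

class BitmapHeap:
    def __init__(self, heap_bytes):
        self.state = [FREE] * (heap_bytes // BLOCK)

    def alloc(self, nbytes):
        need = -(-nbytes // BLOCK)            # ceil-divide into blocks
        run = 0
        for i, s in enumerate(self.state):
            run = run + 1 if s == FREE else 0
            if run == need:
                start = i - need + 1
                self.state[start] = HEAD      # first block of the allocation
                for j in range(start + 1, i + 1):
                    self.state[j] = TAIL      # continuation blocks
                return start * BLOCK          # "address" within the heap
        raise MemoryError("fixed heap exhausted")

heap = BitmapHeap(16 * 1024)                  # 16KiB, like a small MCU
addr = heap.alloc(40)                         # 40 bytes -> 2 blocks
print(addr, heap.state[:4])
```

Deallocation in this scheme is just flipping the blocks' bits back to FREE — which is exactly the question the next slide asks.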

  53. x = myFoo()    local namespace contains ‘x’
    y = x            local namespace contains ‘x’, ‘y’

  54. x = myFoo()
    y = x
    del x            local namespace contains ‘y’

  55. …so when is the block deallocated, i.e.
    its bitmap bits set to 0 again?
    x = myFoo()
    y = x
    del x
    del y


  56. Garbage collection.
    Mark-sweep collector.
    Is not generational.
    Is stop-the-world.
    Runs automatically (on the Unix port), but can be disabled
    or run manually as well.
    Manages all heap-allocated objects.

  57. What’s with all of CPy’s “memory management” overhead?

  58. a note (or two)…
    µPy’s “operational” overhead — the overhead to run the
    program — is lower too:
    CPy interpreter: Python stack frames live on the heap.
    µPy interpreter: Python stack frames live on the C stack!
    µPy also has optimizations for compile-stage memory use.
    …Go to the source for more goodness!

  59. evaluation


  60. Trade-offs?
    Why do they make the design decisions they do?

  61. @kavya719
    speakerdeck.com/kavya719/the-memory-chronicles
    CPy versus µPy:
    > object implementations, and
    > memory management —
    the impact of these differences.

  62. interpreters
    PSS: Proportional Set Size
    memory allocated to process and resident in RAM,
    with special accounting for shared memory.


  63. CPy optimizations
    -5 <= integers <= 256 are shared, i.e. a single object per integer,
    in a preallocated array, allocated at interpreter start-up.
    > x = 250
    > y = 250
    > x is y
    True
    …but that’s 262 * 28 bytes = ~7KB of RAM!
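The shared-integer array is observable with `is`. Parsing the ints from strings at runtime (my choice here) rules out compile-time constant sharing within a single code object, so any sharing seen comes from the interpreter's cache:

```python
# parse at runtime so sharing can only come from the interpreter's cache
a = int("250")
b = int("250")
print(a is b)        # True: -5..256 come from the preallocated array

c = int("300")
d = int("300")
print(c is d)        # False: 300 lies outside the cached range
```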

  64. CPy optimizations
    Strings that look like Python identifiers — { A-Z, a-z, 0-9, _ } —
    are interned, i.e. shared via an interned dict, at compile time.
    > a = "python"
    > b = "python"
    > a is b
    True
    …µPy interns identifiers and small strings too.
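Strings built at runtime are not automatically interned, but `sys.intern` lets you opt in to the shared copy — a small demonstration of the mechanism the slide describes:

```python
import sys

a = "python"                    # identifier-like literal: interned at compile time
b = "python"
print(a is b)                   # True — one shared object

c = "".join(["py", "thon"])     # built at runtime: a fresh, un-interned object
print(c == a, c is a)           # equal, but not the same object

d = sys.intern(c)               # ask for the canonical shared copy
print(d is a)                   # True again
```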

  65. source code
    MicroPython:
    https://github.com/micropython/
    https://micropython.org
    CPython source code (Github mirror):
    https://github.com/python/cpython


  66. µPy FAQ
    Architectures?
    x86, x86-64, ARM, ARM Thumb, Xtensa.
    Versus PyPy, etc.?
    Versus Go, Rust, Lua, JavaScript etc.?
    Multithreading?
    Via the "_thread" module, with an optional global interpreter lock
    (still work in progress, only available on selected ports).
    Async?
    Unicode?

  67. µPy FAQ
    Versus Arduino, Raspberry Pi or Tessel?
    ▪ PyBoard sits in between Arduino and Raspberry Pi:
    ◦ more approachable than Arduino,
    ◦ not a full OS like Raspberry Pi.
    ▪ Tessel is similar to MicroPython but runs JavaScript.

    µPy’s object implementations, one header per type:
    sulphide-glacier:py $ ls -l | grep "obj" | grep "\.h"
    obj.h
    objarray.h
    objexcept.h
    objfun.h
    objgenerator.h
    objint.h
    objlist.h
    objmodule.h
    objstr.h
    objstringio.h
    objtuple.h
    objtype.h

  68. CPy memory allocation
    CPython uses an object allocator, and several object-specific
    allocators:
        _____   ______   ______        ________
       [ int ] [ dict ] [ list ] ...  [ string ]
    +3 |          Object-specific memory           |
        _______________________________
       [  Python's object allocator   ]
    +2 |             Object memory                 |
        ______________________________________________________________
       [ Underlying general-purpose allocator, e.g. C library malloc ]
    +1 |                                                              |
        ______________________________________________________________
     0 | <---- Virtual memory allocated for the python process -----> |
        ==============================================================
                              Operating System

  69. CPython has an “object allocator”,
    on top of a general-purpose allocator, like malloc.
    An arena (256KB) is carved into pools (4KB each);
    each pool is carved into fixed-size blocks.
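The arena/pool/block arithmetic can be sketched in a few lines. This is a hedged sketch of CPython 3.6's pymalloc constants (8-byte size-class steps, 512-byte small-object limit), not the allocator itself; real pools also carry a header, so they hold slightly fewer blocks than the raw division suggests.

```python
ALIGNMENT = 8           # pymalloc size classes step in 8-byte increments
SMALL_MAX = 512         # larger requests bypass pymalloc entirely
POOL = 4 * 1024         # pool size
ARENA = 256 * 1024      # arena size

def size_class(nbytes):
    """Round a small request up to its pymalloc size class."""
    if nbytes == 0 or nbytes > SMALL_MAX:
        return None                      # handled outside the object allocator
    return -(-nbytes // ALIGNMENT) * ALIGNMENT

print(size_class(28))      # a 28-byte int is served from 32-byte blocks
print(ARENA // POOL)       # 64 pools per arena
```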

  70. x = 300 needs 28 bytes.
    Are there existing pools with free blocks of the desired size?
    usedpools maps size classes (8, 16, 24, 32, … bytes) to pools
    with free blocks; 28 bytes falls in the 32-byte class.

  71. Else, find an available pool and carve out a block:
    a free pool in one of the partially allocated arenas.
    What if all arenas are full?
    mmap() a new arena,
    create a pool of size-class blocks,
    return a block.