Nina Zakharenko - The Basics of Memory Management in Python - North Bay Python 2018

https://2018.northbaypython.org/schedule/presentation/19/

As a new Python developer, do you find memory management in Python confusing? Come to this talk to learn the basics of how memory management works in Python. We'll cover reference counting, garbage collection, weak references, __slots__, and the Global Interpreter Lock.

The documentation immediately jumps into difficult-to-follow concepts, especially if you don't have a background in computer science.

I'll provide a simple, easy-to-follow overview of the concepts a developer needs in order to scratch the surface of how memory management and garbage collection work in Python.

Nina Zakharenko

November 04, 2018

Transcript

  1. Memory Management in
    Python
    Nina Zakharenko - @nnja
    slides: bit.ly/nbpy-memory

  2. Livetweet!
    use #nbpy
    @nnja

  3. Why should you care?
    Knowing about memory management
    helps you write more efficient code.

  4. What will you learn?
    - Vocabulary
    - Basic Concepts
    - Foundation

  5. What won't you learn?
    You won’t be an expert at the end of
    this talk.

  6. What's a variable?

  7. What's a c-style variable?

  8. Change value of c-style variables

  9. Python has
    names
    not
    variables

  10. How are Python objects
    stored in memory?
    names ➡ references ➡ objects

  11. A name is just a label for an
    object.
    In Python, each object can have lots of
    names.
    Like 'x', 'y'
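
The idea that one object can carry several names can be sketched in a few lines. This is a minimal illustration, not taken from the slides; the list values are arbitrary.

```python
# Two names bound to the same list object.
x = [1, 2]
y = x            # no copy is made; y is a second label for the same object

y.append(3)      # mutate the object through one name...
print(x)         # ...and the change is visible through the other
print(x is y)    # True: both names refer to one object

y = [1, 2, 3]    # rebinding y points it at a new, equal-valued object
print(x == y)    # True: same value
print(x is y)    # False: different objects
```

Rebinding a name never changes the object it used to point at; it only moves the label.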

  12. Different Types of Objects
    Simple: numbers, strings
    Container: dict, list, user defined classes
    Container objects can contain simple objects, or other
    container objects.

  13. What's a Reference?
    A name or a container object that
    points at another object.

  14. Reference Count

  15. Increasing the ref count

  16. Decreasing the ref count

  17. Decrease Ref Count: Change the Reference

  18. Decrease Ref Count: del keyword

  19. What does del do?
    The del statement doesn't delete objects.
    It:
    - removes that name as a reference to that object
    - reduces the ref count by 1
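
One way to watch the count go up and down is `sys.getrefcount`. The sketch below assumes CPython; the variable names are made up for illustration, and only the differences between readings matter because the call itself adds a temporary reference.

```python
import sys

data = ["a", "b"]                 # one reference: the name `data`
base = sys.getrefcount(data)      # includes a temporary reference from the call itself

alias = data                      # a second name increases the count
after_alias = sys.getrefcount(data)

del alias                         # removes the name, not the object
after_del = sys.getrefcount(data)

print(after_alias - base)         # 1
print(after_del - base)           # 0
print(data)                       # the list is still alive and usable
```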

  20. Decrease Ref Count: Go out of Scope

  21. !
    When there are no more references,
    the object can be safely removed from
    memory.
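
A small sketch of this on CPython, using a hypothetical `Report` class whose `__del__` method records the moment the object is removed. Other interpreters may free the object later, so treat the ordering as CPython behaviour rather than a language guarantee.

```python
events = []

class Report:
    """Hypothetical class used only for this illustration."""
    def __del__(self):
        events.append("freed")    # runs when the object is being removed

def build_report():
    report = Report()             # `report` is the only reference
    events.append("created")
    # `report` goes out of scope when the function returns

build_report()
events.append("returned")
print(events)   # on CPython: ['created', 'freed', 'returned']
```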

  22. local vs. global namespace
    If refcounts decrease when an object goes out of scope,
    what happens to objects in the global namespace?
    - Names in the global namespace never go out of scope!
    - Their refcount never reaches 0 while the name stays bound.
    - Avoid putting large or complex objects in the global
      namespace.

  23. Every Python object holds 3 things
    - Its type
    - A reference count
    - Its value

  24. >>> def mem_test():
    ...     x = 300
    ...     y = 300
    ...     print(id(x))
    ...     print(id(y))
    ...     print(x is y)
    >>> mem_test()
    4504654160
    4504654160
    True
    ℹ note: run this from a function in the repl, or from a file

  25. Garbage
    Collection

  26. What is Garbage
    Collection?
    A way for a program to automatically
    release memory when the object taking
    up that space is no longer in use.

  27. Two Main Types of Garbage
    Collection
    1. Reference Counting
    2. Tracing

  28. How does reference counting garbage
    collection work?
    1. Add and remove references
    2. When the refcount reaches 0, remove the object
    3. Cascading effect
       - decrease the ref count of any object the deleted
         object was pointing to
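
The cascading effect can be sketched with a hypothetical `Node` class that logs its own removal. The ordering shown is what CPython does; it is an illustration under that assumption, not a specification.

```python
freed = []

class Node:
    """Hypothetical node that records when it is removed."""
    def __init__(self, name, child=None):
        self.name = name
        self.child = child          # holds a reference to another Node

    def __del__(self):
        freed.append(self.name)

leaf = Node("leaf")
root = Node("root", child=leaf)

del leaf          # root.child still references the leaf, so nothing is freed
print(freed)      # []

del root          # root's count hits 0; removing it drops the leaf's count to 0 too
print(freed)      # on CPython: ['root', 'leaf']
```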

  29. Reference Counting Garbage Collection:
    The Good
    - Easy to implement
    - When refcount is 0, objects are immediately
      deleted.

  30. Reference Counting:
    The Bad
    - space overhead
      - reference count is stored for every object
    - execution overhead
      - reference count changed on every assignment

  31. Reference Counting:
    The Ugly
    Not generally thread safe!
    Reference counting doesn't detect
    cyclical references

  32. Cyclical References By Example

  33. What's a cyclical reference?

  34. Cyclical Reference

  35. Reference counting alone
    will not garbage collect
    objects with cyclical
    references.
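
A minimal sketch of a cycle that reference counting cannot clean up, using the standard `gc` module to collect it by hand. The `Node` class is hypothetical, and automatic collection is paused only to make the result repeatable.

```python
import gc

class Node:
    """Hypothetical class; instances can point at each other."""
    def __init__(self):
        self.other = None

gc.disable()              # pause automatic collection for a repeatable result
gc.collect()              # clear out any garbage that already exists

a, b = Node(), Node()
a.other, b.other = b, a   # a -> b and b -> a: a reference cycle

del a, b                  # no names remain, but each object still references the other
found = gc.collect()      # the cycle detector finds the unreachable objects
print(found > 0)          # True

gc.enable()
```

`gc.collect()` returns the number of unreachable objects it found; the exact figure depends on the Python version.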

  36. Two Main Types of Garbage
    Collection
    1. Reference Counting
    2. Tracing

  37. Tracing Garbage Collection - Marking

  38. Tracing Garbage Collection - Sweeping

  39. What does Python use?
    Reference Counting &
    Generational
    (A type of Tracing GC)

  40. Generational Garbage
    Collection is based on the
    theory that most objects
    die young.

  41. Python maintains a list of every object
    created as a program is run.
    Actually, it makes 3:
    - generation 0
    - generation 1
    - generation 2
    Newly created objects are stored in generation 0.

  42. Only container objects
    with a refcount greater
    than 0 will be stored in a
    generation list.
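
On CPython, `gc.is_tracked` reports whether the cycle collector is keeping track of an object. A short sketch, with a hypothetical `Dog` class standing in for any user-defined class:

```python
import gc

class Dog:
    """Hypothetical user-defined class."""

print(gc.is_tracked(42))        # False: an int cannot reference other objects
print(gc.is_tracked("hello"))   # False: neither can a str
print(gc.is_tracked([]))        # True: a list is a container
print(gc.is_tracked(Dog()))     # True: so is an instance of a user-defined class
```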

  43. When the number of objects in a
    generation reaches a threshold, Python
    runs a garbage collection algorithm on
    that generation, and any generations
    younger than it.
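
The thresholds and current counters can be inspected through the `gc` module. This is a sketch for CPython; the numbers printed, and the finer details of how each generation is collected, differ between Python versions.

```python
import gc

thresholds = gc.get_threshold()   # one threshold per generation
counts = gc.get_count()           # current counters for generations 0, 1 and 2

print(thresholds)                 # values differ between Python versions
print(counts)

# Collect a chosen generation by hand; the argument must be 0, 1 or 2.
unreachable = gc.collect(1)
print(unreachable >= 0)           # True: the call returns a count of objects found
```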

  44. What happens during a generational garbage
    collection cycle?
    1. Python makes a list for objects to discard.
    2. It runs an algorithm to detect reference cycles.
    3. If an object has no outside references, add it to the discard
    list.
    4. When the cycle is done, free up the objects on the discard
    list.

  45. After a garbage collection cycle, objects
    that survived will be promoted to the
    next generation.
    Objects in the last generation (2) stay
    there as the program executes.

  46. When the ref count reaches 0, you get
    immediate clean up.
    If you have a cycle, you need to wait
    for garbage collection to run.

  47. Objects with cyclical references get
    cleaned up by generational garbage
    collection.

  48. Why doesn’t a Python
    program shrink in memory
    after garbage collection?

  49. After garbage collection, the size of
    the Python program likely won’t
    shrink.
    - The freed memory is fragmented.
      - i.e. it's not freed in one contiguous block.
    - When we say memory is freed during garbage collection, it’s released
      back to Python to use for other objects, not necessarily to the system.

  50. Quick Optimizations

  51. __slots__

  52. Python instances have a dict of values
    class Dog(object):
        pass
    buddy = Dog()
    buddy.name = 'Buddy'
    print(buddy.__dict__)
    {'name': 'Buddy'}

  53. AttributeError
    'Hello'.name = 'Fred'
    AttributeError
    Traceback (most recent call last)
    ----> 1 'Hello'.name = 'Fred'
    AttributeError: 'str' object has no attribute 'name'

  54. __slots__
    class Point(object):
        __slots__ = ('x', 'y')
    point = Point()
    point.x = 5
    point.y = 7
    point.name = "Fred"
    Traceback (most recent call last):
      File "point.py", line 8, in <module>
        point.name = "Fred"
    AttributeError: 'Point' object has no attribute 'name'

  55. size of dict vs. size of tuple
    import sys
    print(sys.getsizeof(dict()))
    print(sys.getsizeof(tuple()))
    sizeof dict: 232 bytes
    sizeof tuple: 40 bytes
    (exact sizes vary by Python version and platform)

  56. When to use slots?
    - When creating many instances of a class
    - When you know in advance what attributes the class should
      have
    Saving 9 GB of RAM with __slots__
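
To make the trade-off concrete, here is a minimal sketch comparing a regular class with a slotted one. The class names `PointDict` and `PointSlots` are invented for this example.

```python
class PointDict:
    """Regular class: each instance carries a per-instance __dict__."""
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:
    """Same fields, but declared up front with __slots__."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y

p, q = PointDict(5, 7), PointSlots(5, 7)

print(hasattr(p, "__dict__"))   # True
print(hasattr(q, "__dict__"))   # False: no per-instance dict to pay for

try:
    q.name = "Fred"             # not listed in __slots__
except AttributeError as exc:
    print("AttributeError:", exc)
```

The memory saving comes from skipping the per-instance dict, which is why it only adds up when many instances exist.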

  57. weakref
    - A weakref to an object is not enough to keep it alive.
    - When the only remaining references are weak
      references, the object can be garbage collected.
    - Useful for:
      - implementing caches or mappings holding large
        objects
    python3 weakref docs
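
A cache along these lines can be sketched with `weakref.WeakValueDictionary` from the standard library. The `Image` class and the `"logo"` key are hypothetical, and the immediate disappearance of the entry is CPython behaviour.

```python
import weakref

class Image:
    """Hypothetical stand-in for a large object worth caching."""
    def __init__(self, name):
        self.name = name

cache = weakref.WeakValueDictionary()   # values are held only weakly

img = Image("logo.png")
cache["logo"] = img
print("logo" in cache)     # True while a strong reference (`img`) exists

del img                    # drop the last strong reference
print("logo" in cache)     # on CPython: False, the entry vanished with the object
```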

  58. What's
    a
    GIL?

  59. Global
    Interpreter
    Lock

  60. Only one thread can run in
    the interpreter at a time.

  61. Advantages / Disadvantages of a GIL
    Upside:
    Reference counting is fast and easy to implement.
    Downside:
    In a Python program, no matter how many threads
    exist, only one thread executes Python code at a time.

  62. Want to take advantage of multiple
    cores?
    - Use multiprocessing instead of multithreading.
    - Each process has its own GIL; it’s up to the
      developer to figure out a way to share information
      between processes.

  63. If the GIL limits Python,
    can’t we just remove it?
    additional reading

  64. For better or for worse, the GIL is here
    to stay!

  65. What Did We Learn?

  66. Garbage collection is pretty
    good.

  67. Now you know how
    memory is managed.

  68. Python3!

  69. For scientific applications,
    use numpy & pandas.

  70. Thank You!
    Python @ Microsoft:
    bit.ly/nbpy-microsoft
    @nnja
    *Bonus material on the next slide

  71. Bonus Material
    Section ➡

  72. Additional Reading
    - Great explanation of generational garbage collection
      and Python’s reference detection algorithm
    - Weak Reference Documentation
    - Python Module of the Week - gc
    - PyPy STM - GIL-less Python Interpreter
    - Saving 9 GB of RAM with Python’s __slots__

  73. Getting in-depth with the GIL
    - Dave Beazley - Guide on how the GIL Operates
    - Dave Beazley - New GIL in Python 3.2
    - Dave Beazley - Inside Look at Infamous GIL Patch

  74. Why can’t we use the REPL to follow
    along at home?
    - Because it doesn’t behave like a typical Python
      program that’s being executed.
    - Further reading

  75. Python pre-loads objects
    - Many objects are loaded by Python as the interpreter starts.
    - This is object caching (interning), which is distinct from the compiler's peephole optimizer.
    - Integers: -5 to 256
    - Single-letter strings
    - Common exceptions
    - Further reading
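
The pre-loaded integers can be observed with `is`. This sketch assumes CPython, where the small-integer cache is an implementation detail; the integers are built at run time with `int()` so that the compiler cannot simply share one constant.

```python
# Build the integers at run time so the compiler cannot share constants.
small_a = int("256")
small_b = int("256")
print(small_a is small_b)   # on CPython: True, both names get the cached 256

big_a = int("1000000")
big_b = int("1000000")
print(big_a == big_b)       # True: equal values
print(big_a is big_b)       # on CPython: False, two separate objects
```

Because this is an implementation detail, compare numbers with `==` and keep `is` for identity checks such as `x is None`.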

  76. Attempting to remove the GIL - A
    Gilectomy
    - Larry Hastings - Removing Python's GIL - The
      Gilectomy
    - Larry Hastings - The Gilectomy, How it's going
    - Gilectomy on GitHub
    - A Gilectomy Update

  77. weakref
    - weakref Python Module of the week
    - weakref documentation