Nina Zakharenko - The Basics of Memory Management in Python - North Bay Python 2018

https://2018.northbaypython.org/schedule/presentation/19/

As a new Python developer, do you find memory management in Python confusing? Come to this talk to learn the basics of how memory management works in Python. We'll cover reference counting, garbage collection, weak references, __slots__, and the Global Interpreter Lock.

The documentation immediately jumps into difficult-to-follow concepts, especially if you don't have a background in computer science.

I'll provide a simple, easy-to-follow overview of the concepts a developer needs in order to scratch the surface of how memory management and garbage collection work in Python.

Nina Zakharenko

November 04, 2018

Transcript

  1. Memory Management in
    Python
    Nina Zakharenko - @nnja
    slides: bit.ly/nbpy-memory


  2. Livetweet!
    use #nbpy
    @nnja


  3. Why should you care?
    Knowing about memory management
    helps you write more efficient code.

  4. What will you learn?
    - Vocabulary
    - Basic Concepts
    - Foundation

  5. What won't you learn?
    You won’t be an expert at the end of
    this talk.

  6. What's a variable?

  7. What's a c-style variable?


  8. (no text on this slide)

  9. Change value of c-style variables


  10. Python has
    names
    not
    variables

  11. How are Python objects
    stored in memory?
    names ➡ references ➡ objects

  12. A name is just a label for an
    object.
    In Python, each object can have lots of
    names.
    Like 'x', 'y'
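
A minimal sketch of the "names are labels" idea, using a list so that no interpreter caching is involved. The names `x` and `y` follow the slide; the list contents are made up for illustration.

```python
# Two names bound to the same list object.
x = [1, 2, 3]
y = x

# Both names are labels for one object, so they share an identity.
print(x is y)          # True
print(id(x) == id(y))  # True

# Mutating through one name is visible through the other.
y.append(4)
print(x)               # [1, 2, 3, 4]

# Rebinding a name moves the label; it does not copy or change the old object.
y = [1, 2, 3, 4]
print(x == y)          # True: equal values
print(x is y)          # False: two distinct objects
```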

  13. Different Types of Objects
    Simple: numbers, strings
    Container: dict, list, user-defined classes
    Container objects can contain simple objects, or other
    container objects.

  14. What's a Reference?
    A name or a container object that
    points at another object.

  15. Reference Count

  16. ‐ Increasing the ref count

  17. (no text on this slide)

  18. (no text on this slide)

  19. (no text on this slide)
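
The slides in this stretch carry no transcript text, so here is a hedged sketch of two common ways a reference count goes up: binding another name and storing the object in a container. `Widget` is a hypothetical class. `sys.getrefcount` is CPython-specific, and its absolute values include interpreter-internal references, so only the differences are compared.

```python
import sys

class Widget:
    """Hypothetical class used only for illustration."""

widget = Widget()
baseline = sys.getrefcount(widget)

# Binding a second name to the object adds one reference.
alias = widget
after_alias = sys.getrefcount(widget)

# Storing the object in a container adds another reference.
holder = [widget]
after_container = sys.getrefcount(widget)

print(after_alias - baseline)       # 1
print(after_container - baseline)   # 2
```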

  20. ‑ Decreasing the ref count

  21. Decrease Ref Count: Change the Reference


  22. Decrease Ref Count: del keyword


  23. What does del do?
    The del statement doesn't delete objects.
    It:
    - removes that name as a reference to that object
    - reduces the ref count by 1
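
A small sketch of what `del` does, assuming CPython (where `sys.getrefcount` is available). The names `data` and `alias` are made up for illustration.

```python
import sys

data = ["a", "b", "c"]
alias = data

before = sys.getrefcount(data)
del data                      # removes the name 'data', not the list itself
after = sys.getrefcount(alias)

print(alias)                  # ['a', 'b', 'c'] (the object is still alive)
print(before - after)         # 1

try:
    data
except NameError as error:
    print(error)              # the name 'data' no longer exists
```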

  24. Decrease Ref Count: Go out of Scope


  25. !
    When there are no more references,
    the object can be safely removed from
    memory.

  26. local vs. global namespace
    If refcounts decrease when an object goes out of scope,
    what happens to objects in the global namespace?
    - They never go out of scope!
    - Their refcount never reaches 0 while the name stays bound.
    - Avoid putting large or complex objects in the global
      namespace.

  27. Every Python object holds 3 things
    - Its type
    - A reference count
    - Its value

  28. (no text on this slide)

  29. >>> def mem_test():
    ...     x = 300
    ...     y = 300
    ...     print(id(x))
    ...     print(id(y))
    ...     print(x is y)
    ...
    >>> mem_test()
    4504654160
    4504654160
    True
    ℹ Note: run this from a function in the REPL, or from a file.

  30. Garbage
    Collection

  31. What is Garbage
    Collection?
    A way for a program to automatically
    release memory when the object taking
    up that space is no longer in use.

  32. Two Main Types of Garbage
    Collection
    1. Reference Counting
    2. Tracing

  33. How does reference counting garbage
    collection work?
    1. Add and remove references
    2. When the refcount reaches 0, remove the object
    3. Cascading effect
        - decrease the ref count of any object the deleted
          object was pointing to
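
The cascading effect can be observed with a short sketch. `Node` and the `freed` list are hypothetical helpers; `weakref.finalize` is a standard-library call that runs a callback when its target is destroyed. The immediate cleanup shown here is CPython behaviour.

```python
import weakref

class Node:
    """Hypothetical helper class; instances support weak references."""
    def __init__(self, name):
        self.name = name
        self.child = None

freed = []  # records which objects have been released

parent = Node("parent")
parent.child = Node("child")

# weakref.finalize runs a callback when its target object is destroyed.
weakref.finalize(parent, freed.append, "parent")
weakref.finalize(parent.child, freed.append, "child")

# Dropping the only reference to the parent takes its refcount to 0.
# Freeing the parent then drops the only reference to the child.
del parent

print(sorted(freed))  # ['child', 'parent']: both objects were released
```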

  34. Reference Counting Garbage Collection:
    The Good
    - Easy to implement
    - When refcount is 0, objects are immediately
      deleted.

  35. Reference Counting:
    The Bad
    - space overhead
        - reference count is stored for every object
    - execution overhead
        - reference count changed on every assignment

  36. Reference Counting:
    The Ugly
    Not generally thread safe!
    Reference counting doesn't detect
    cyclical references

  37. Cyclical References By Example

  38. What's a cyclical reference?

  39. Cyclical Reference

  40. Reference counting alone
    will not garbage collect
    objects with cyclical
    references.
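
A hedged sketch of a cyclical reference on CPython. `Node` is a hypothetical class, and the automatic cycle collector is switched off for the demo so that only reference counting is at work until `gc.collect()` is called explicitly.

```python
import gc
import weakref

class Node:
    """Hypothetical helper class used only for illustration."""
    def __init__(self):
        self.partner = None

gc.disable()  # turn off the automatic cycle collector for the demo

a = Node()
b = Node()
a.partner = b   # a references b
b.partner = a   # b references a, closing the cycle

probe = weakref.ref(a)  # lets us check whether the first Node is still alive

del a
del b

# No names are left, but each object still holds a reference to the other,
# so neither refcount has reached 0.
alive_after_del = probe() is not None
print(alive_after_del)       # True

gc.collect()                 # an explicit collection finds the cycle
print(probe() is None)       # True

gc.enable()
```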

  41. Two Main Types of Garbage
    Collection
    1. Reference Counting
    2. Tracing

  42. Tracing Garbage Collection - Marking


  43. Tracing Garbage Collection - Sweeping


  44. What does Python use?
    Reference Counting &
    Generational
    (A type of Tracing GC)

  45. Generational Garbage
    Collection is based on the
    theory that most objects
    die young.

  46. Python maintains a list of every object
    created as a program is run.
    Actually, it makes 3:
    - generation 0
    - generation 1
    - generation 2
    Newly created objects are stored in generation 0.

  47. Only container objects
    with a refcount greater
    than 0 will be stored in a
    generation list.

  48. When the number of objects in a
    generation reaches a threshold, Python
    runs a garbage collection algorithm on
    that generation, and any generations
    younger than it.
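
The thresholds and counters can be inspected with the standard `gc` module. This is a small sketch; the actual numbers, and how the counters are interpreted internally, vary between CPython versions.

```python
import gc

# Thresholds for generations 0, 1, and 2.
thresholds = gc.get_threshold()
print(thresholds)

# Current counters that are compared against the thresholds.
print(gc.get_count())

# Collect only the youngest generation, then run a full collection.
gc.collect(0)
gc.collect()
```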

  49. What happens during a generational garbage
    collection cycle?
    1. Python makes a list for objects to discard.
    2. It runs an algorithm to detect reference cycles.
    3. If an object has no outside references, add it to the discard
    list.
    4. When the cycle is done, free up the objects on the discard
    list.
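
The steps above can be sketched as a toy model. This is not CPython's implementation, which works on its internal object lists; the function name `find_unreachable` and the dictionary-based object graph are assumptions made for illustration.

```python
def find_unreachable(refcounts, edges):
    """Return the names of objects that only other tracked objects keep alive.

    refcounts: dict mapping object name -> total reference count
    edges:     dict mapping object name -> list of names it references
    (Both inputs are a simplified, hypothetical model of the object graph.)
    """
    # Step 1: subtract every reference that comes from inside the tracked set.
    # Whatever is left over must come from outside (names, the stack, ...).
    external = dict(refcounts)
    for targets in edges.values():
        for target in targets:
            if target in external:
                external[target] -= 1

    # Step 2: objects with outside references are roots; anything reachable
    # from a root is still in use.
    reachable = set()
    pending = [name for name, count in external.items() if count > 0]
    while pending:
        name = pending.pop()
        if name not in reachable:
            reachable.add(name)
            pending.extend(edges.get(name, []))

    # Step 3: everything else goes on the discard list.
    return sorted(set(refcounts) - reachable)


# 'a' and 'b' reference each other and nothing else references them.
# 'c' is referenced by a name and itself references 'd'.
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}

print(find_unreachable(refcounts, edges))  # ['a', 'b']
```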

  50. After a garbage collection cycle, objects
    that survived will be promoted to the
    next generation.
    Objects in the last generation (2) stay
    there as the program executes.

  51. When the ref count reaches 0, you get
    immediate clean up.
    If you have a cycle, you need to wait
    for garbage collection to run.

  52. Objects with cyclical references get
    cleaned up by generational garbage
    collection.

  53. Why doesn’t a Python
    program shrink in memory
    after garbage collection?

  54. After garbage collection, the size of
    the Python program likely won’t
    shrink.
    - The freed memory is fragmented.
        - i.e. it's not freed in one contiguous block.
    - When we say memory is freed during garbage collection, it’s released
      back to Python to use for other objects, not necessarily to the system.

  55. Quick Optimizations

  56. __slots__

  57. Python instances have a dict of values
    class Dog(object):
        pass

    buddy = Dog()
    buddy.name = 'Buddy'
    print(buddy.__dict__)
    {'name': 'Buddy'}

  58. AttributeError
    'Hello'.name = 'Fred'
    AttributeError
    Traceback (most recent call last)
    ----> 1 'Hello'.name = 'Fred'
    AttributeError: 'str' object has no attribute 'name'

  59. __slots__
    class Point(object):
        __slots__ = ('x', 'y')

    point = Point()
    point.x = 5
    point.y = 7
    point.name = "Fred"

    Traceback (most recent call last):
      File "point.py", line 8, in <module>
        point.name = "Fred"
    AttributeError: 'Point' object has no attribute 'name'

  60. size of dict vs. size of tuple
    import sys
    sys.getsizeof(dict())
    sys.getsizeof(tuple())
    sizeof dict: 232 bytes
    sizeof tuple: 40 bytes
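
A rough per-instance comparison, as a sketch. `RegularPoint` and `SlottedPoint` are hypothetical classes, `sys.getsizeof` is shallow, and the exact byte counts depend on the Python version and platform, so no specific numbers are claimed here.

```python
import sys

class RegularPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

regular = RegularPoint(5, 7)
slotted = SlottedPoint(5, 7)

print(hasattr(regular, "__dict__"))  # True
print(hasattr(slotted, "__dict__"))  # False

# getsizeof is shallow, so the per-instance dict is measured separately.
regular_size = sys.getsizeof(regular) + sys.getsizeof(regular.__dict__)
slotted_size = sys.getsizeof(slotted)
print(regular_size, slotted_size)    # exact numbers vary by Python version
```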

  61. When to use slots?
    - You create many instances of a class
    - You know in advance which attributes the class should
      have
    Saving 9 GB of RAM with __slots__


  62. weakref
    - A weakref to an object is not enough to keep it alive.
    - When the only remaining references are weak
      references, the object can be garbage collected.
    - Useful for:
        - implementing caches or mappings holding large
          objects
    python3 weakref docs

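
A minimal sketch of a weak-reference cache using the standard library's `weakref.WeakValueDictionary`. `Image` is a hypothetical stand-in for a large object, and the key and file name are made up.

```python
import gc
import weakref

class Image:
    """Hypothetical stand-in for a large object worth caching."""
    def __init__(self, name):
        self.name = name

# Values are held weakly, so the cache alone never keeps an Image alive.
cache = weakref.WeakValueDictionary()

logo = Image("logo.png")
cache["logo"] = logo
print("logo" in cache)   # True while a strong reference exists

del logo                 # drop the last strong reference
gc.collect()             # not needed on CPython; included for other interpreters
print("logo" in cache)   # False
```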

  63. What's
    a
    GIL?

  64. Global
    Interpreter
    Lock

  65. Only one thread can run in
    the interpreter at a time.

  66. Advantages / Disadvantages of a GIL
    Upside:
    Reference counting is fast and easy to implement.
    Downside:
    In a Python program, no matter how many threads
    exist, only one thread will be executed at a time.


  67. Want to take advantage of multiple
    cores?
    - Use multiprocessing instead of multithreading.
    - Each process will have its own GIL; it’s up to the
      developer to figure out a way to share information
      between processes.
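
One way to spread CPU-bound work across cores is the standard `multiprocessing.Pool`. This is a sketch only: `sum_of_squares` and the input sizes are made up, and the worker count of 4 is an arbitrary assumption.

```python
from multiprocessing import Pool

def sum_of_squares(limit):
    """CPU-bound work: sum the squares of 0..limit-1."""
    return sum(n * n for n in range(limit))

if __name__ == "__main__":
    limits = [10, 100, 1_000, 10_000]
    # Each worker is a separate Python process with its own interpreter and GIL.
    with Pool(processes=4) as pool:
        # Arguments and results are pickled and sent between processes.
        results = pool.map(sum_of_squares, limits)
    print(results)
```

Sharing information here happens through the arguments and return values, which are copied between processes rather than shared in memory.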


  68. If the GIL limits Python,
    can’t we just remove it?
    additional reading


  69. For better or for worse, the GIL is here
    to stay!

  70. What Did We Learn?

  71. Garbage collection is pretty
    good.

  72. Now you know how
    memory is managed.

  73. Python3!

  74. For scientific applications,
    use numpy & pandas.

  75. Thank You!
    Python @ Microsoft:
    bit.ly/nbpy-microsoft
    @nnja
    *Bonus material on the next slide


  76. Bonus Material
    Section ➡

  77. Additional Reading
    - Great explanation of generational garbage collection
      and Python’s reference detection algorithm
    - Weak Reference Documentation
    - Python Module of the Week - gc
    - PyPy STM - GIL-less Python Interpreter
    - Saving 9GB of RAM with Python’s __slots__

  78. Getting in-depth with the GIL
    - Dave Beazley - Guide on how the GIL Operates
    - Dave Beazley - New GIL in Python 3.2
    - Dave Beazley - Inside Look at Infamous GIL Patch

  79. Why can’t we use the REPL to follow
    along at home?
    - Because it doesn’t behave like a typical Python
      program that’s being executed.
    - Further reading

  80. Python pre-loads objects
    - Many objects are created by Python as the interpreter starts.
    - This is object caching (interning), not peephole optimization,
      which is a separate compile-time step.
    - Numbers: -5 through 256
    - Single-letter strings
    - Common exceptions
    - Further reading
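
A hedged sketch of the small-integer cache. This is a CPython implementation detail, not a language guarantee, and the cached range may differ between versions. The integers are built at runtime with `int(...)` so that the compiler cannot share constants between them.

```python
# Build the integers at runtime so the compiler cannot share constants.
small_a = int("100")
small_b = int("100")
print(small_a is small_b)   # True on CPython: 100 is in the small-integer cache

big_a = int("100000")
big_b = int("100000")
print(big_a == big_b)       # True: equal values
print(big_a is big_b)       # typically False on CPython: two separate objects
```

In everyday code, compare numbers with `==` rather than `is`, since identity depends on these caching details.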

  81. Attempting to remove the GIL - A
    Gilectomy
    - Larry Hastings - Removing Python's GIL - The
      Gilectomy
    - Larry Hastings - The Gilectomy, How it's going
    - Gilectomy on GitHub
    - A Gilectomy Update

  82. weakref
    - weakref Python Module of the week
    - weakref documentation

  83. @nnja
