Avoiding memory leaks with "weakref"

reuven
July 18, 2022

What are weak references, and how can we use them?

Transcript

  1. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    It does two things
    • Create a new object of type “int”, with the value 5
    • Create a reference from the variable “x” to the object we created
  2. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Notice
    • In Python, variables aren’t aliases to locations in memory.
    • Rather, variables are names that refer to objects.
  3. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    It’s a bit more complex
    • Create a new object of type “int”, with the value 5
    • Create a reference from the variable “x” to 5
    • Determine what object “x” refers to: our int, 5
    • Create a reference from the variable “y” to that int
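
The steps above can be checked directly. This is a minimal sketch using the slide's names `x` and `y`; the other variable names are only for illustration. It shows that `y = x` binds `y` to the same object rather than to the variable `x`.

```python
# Two names, one object: "y = x" evaluates x, then binds y to the result.
x = 5
y = x

same_object = x is y          # both names refer to the same int object
same_id = id(x) == id(y)      # identical identity

# Rebinding x does not affect y; y still refers to the original object.
x = 6
y_after_rebind = y

print(same_object, same_id, y_after_rebind)
```

If `y` were a pointer to the variable `x`, rebinding `x` would change `y` too; it does not.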
  4. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Notice
    • Assigning “y=x” does not create a pointer from the variable y to the variable x.
    • That doesn’t exist in Python!
    • Rather, Python first evaluates the right side, then refers to whatever value (object) it gets back.
  5. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Wait a second…
    • Objects consume memory. Where is Python getting the memory to create these int objects?
    • Answer: Python handles that on our behalf. We don’t need to think about it!
  6. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Reference counting
    • Each arrow pointing toward an object is a “reference.”
    • When the reference count of an object drops to zero, Python removes the object and frees its memory.
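
One way to watch the count change is `sys.getrefcount`. The sketch below assumes CPython; `Thing` is a hypothetical placeholder class. The absolute numbers are not meaningful (the call itself temporarily adds a reference), so only the differences are compared.

```python
import sys

class Thing:
    """Hypothetical placeholder class, used only for this illustration."""

t = Thing()
before = sys.getrefcount(t)   # includes a temporary reference held by the call itself

alias = t                     # a second name for the same object
after_alias = sys.getrefcount(t)

del alias                     # removing the name drops the count again
after_del = sys.getrefcount(t)

print(after_alias - before, after_del - before)
```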
  7. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    How many references?

        import gc

        class Person:
            def __init__(self, name):
                self.name = name

        p1 = Person('name1')
        print(gc.get_referrers(p1))
  8. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    The result?

        $ ./wr1.py
        [{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x1066ff130>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '/Users/reuven/Conferences/PyCon Israel/2021/weakrefs/./wr1.py', '__cached__': None, 'gc': <module 'gc' (built-in)>, 'Person': <class '__main__.Person'>, 'p1': <__main__.Person object at 0x1068e6f10>}]
  9. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    What?!?
    • “p1” is a global variable
    • Global variables are all stored in a dict, which we can access via the builtin “globals” function
    • Which means: Once you define a global variable, there is at least one reference to it until the program exits.
    • Global variables never go away on their own
    • Elements of global containers don’t, either
  10. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Visible release
    • We can find out when an object is released via the “__del__” magic method.
    • This method is invoked just before an object is released.
    • Note: It’s not a destructor! And it’s not something you should normally use.
  11. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

        import gc

        class Person:
            def __init__(self, name):
                self.name = name

            def __del__(self):
                print(f"""Ack! I'm dead! ({self.name})""")

        p1 = Person('name1')
        print(gc.get_referrers(p1))
  12. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    And when we run it…

        $ ./wr2.py
        [{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x100f79130>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '/Users/reuven/Conferences/PyCon Israel/2021/weakrefs/./wr2.py', '__cached__': None, 'gc': <module 'gc' (built-in)>, 'Person': <class '__main__.Person'>, 'p1': <__main__.Person object at 0x10115ff10>}]
        Ack! I'm dead! (name1)
  13. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    What does this mean?
    • So long as there is at least one reference to an object, it remains alive.
    • The reference can be from a variable, but it can also be from another data structure.
    • When the reference count drops to zero, Python deletes the object and releases its memory.
    • When a program exits, all variables are deleted, and thus all objects go away.
  14. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

        class Person:
            def __init__(self, name):
                self.name = name

            def __del__(self):
                print(f"""Ack! I'm dead! ({self.name})""")

        all_people = []

        print('Before loop')
        for i in range(3):
            all_people.append(Person(f'name{i}'))
        print('After loop, now ending the program')
  15. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    When we run it…

        $ ./wr3.py
        Before loop
        After loop, now ending the program
        Ack! I'm dead! (name0)
        Ack! I'm dead! (name1)
        Ack! I'm dead! (name2)

    (“After loop, now ending the program” is printed before the objects die.)
  16. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    What happened?
    • Each new Person object was appended to “all_people”
    • Since “all_people” is a global variable, it’s only released when the program exits.
    • As such, its elements aren’t released until the program exits, either
    • This is despite the fact that no Person object was ever assigned to a variable!
  17. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    What’s a memory leak?
    • Over time, the program will use more and more memory.
    • Sometimes, that’s necessary.
    • But it can also happen because of situations like this one, in which our objects are stored in a global collection, and thus never removed.
    • What if we put millions of objects in “all_people” and forget to remove them while our program runs?
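
To illustrate the point about forgetting to remove objects, here is a small sketch that reuses the `Person` class and `all_people` list from the earlier slides. The `released` list is an addition for illustration (it records names instead of printing them), and the immediate release on `clear()` assumes CPython's reference counting.

```python
released = []   # illustrative log: names of objects whose __del__ has run

class Person:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        released.append(self.name)

all_people = []
for i in range(3):
    all_people.append(Person(f'name{i}'))

# The global list still holds every object, so nothing has been released yet.
released_before = list(released)

# Emptying the list drops the only references, so the objects can go away.
all_people.clear()
released_after = sorted(released)

print(released_before, released_after)
```

Explicitly clearing (or pruning) a long-lived global container is the manual alternative to the two approaches described on the following slides.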
  18. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    But … garbage collection!
    • You might be thinking: I thought Python was garbage collected! It can’t have memory leaks!
    • It’s true, memory leaks in Python don’t happen because you failed to free up used memory.
    • But they occur, in situations like the one we saw, because you stored an object (often in a global) and forgot about it.
  19. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    One solution: Local variables
    • Local variables go away when a function exits.
    • If our code had run inside of a function, then the leak wouldn’t have happened
    • Or it would have stopped after a reasonable amount of time.
  20. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

        import gc

        class Person:
            def __init__(self, name):
                self.name = name

            def __del__(self):
                print(f"""Ack! I'm dead! ({self.name})""")

        def create_some_people():
            # The local variable "all_people" goes away when the function returns
            all_people = []
            for i in range(3):
                all_people.append(Person(f'name{i}'))

        # We'll run the function twice
        for i in range(2):
            print(f'--- {i} ---')
            create_some_people()
  21. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    When we run this code

        $ ./wr4.py
        --- 0 ---
        Ack! I'm dead! (name2)
        Ack! I'm dead! (name1)
        Ack! I'm dead! (name0)
        --- 1 ---
        Ack! I'm dead! (name2)
        Ack! I'm dead! (name1)
        Ack! I'm dead! (name0)
  22. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Another solution: Weak references
    • Weak references (“weakrefs”) allow us to refer to an object, but without adding to the reference count.
    • Meaning: Our reference won’t stand in the way of the object being removed.
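
The claim that a weakref does not add to the reference count can be checked with `sys.getrefcount` and `weakref.getweakrefcount`. This is a sketch assuming CPython; the variable names `r`, `count_before`, `count_after`, and `weak_count` are illustrative.

```python
import sys
import weakref

class Person:
    def __init__(self, name):
        self.name = name

p1 = Person('name1')
count_before = sys.getrefcount(p1)

r = weakref.ref(p1)                       # weak reference: not counted as a reference
count_after = sys.getrefcount(p1)
weak_count = weakref.getweakrefcount(p1)  # weak references are tracked separately

print(count_before == count_after, weak_count)
```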
  23. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

        import weakref

        class Person:
            def __init__(self, name):
                self.name = name

            def __del__(self):
                print(f"""Ack! I'm dead! ({self.name})""")

        all_people = []

        print('Before loop')
        for i in range(3):
            # Call "weakref.ref" to create a weak reference to an object
            all_people.append(weakref.ref(Person(f'name{i}')))
        print('After loop, now ending the program')
  24. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    When we run it…

        Before loop
        Ack! I'm dead! (name0)
        Ack! I'm dead! (name1)
        Ack! I'm dead! (name2)
        After loop, now ending the program

    (“After loop, now ending the program” is printed after the objects die.)
  25. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Working with weakrefs

        import weakref

        class Person:
            def __init__(self, name):
                self.name = name

            def __del__(self):
                print(f"""Ack! I'm dead! ({self.name})""")

        p1 = Person('name1')
        p2 = weakref.ref(p1)    # Create a ref to an existing object

        print(p1.name)
        print(p2().name)        # Call the ref to get the object
  26. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    When the target dies

        import weakref

        class Person:
            def __init__(self, name):
                self.name = name

            def __del__(self):
                print(f"""Ack! I'm dead! ({self.name})""")

        p1 = Person('name1')
        p2 = weakref.ref(p1)

        del p1                  # Remove p1, the "strong" reference
        print(p2() is None)     # Now p2() resolves to None
        print(p2().name)        # Raises AttributeError, because None has no "name"
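
Since a dead weakref resolves to None, code that uses one usually dereferences it once and checks the result. The sketch below shows one way to do that; `name_or_default` and its `default` parameter are hypothetical helpers, not part of the `weakref` module, and the immediate release after `del` assumes CPython.

```python
import weakref

class Person:
    def __init__(self, name):
        self.name = name

def name_or_default(ref, default='<gone>'):
    """Hypothetical helper: dereference once, then check for None."""
    person = ref()            # a strong reference that lasts only for this call
    if person is None:
        return default
    return person.name

p1 = Person('name1')
p2 = weakref.ref(p1)
while_alive = name_or_default(p2)

del p1                        # in CPython, the object is released right here
after_delete = name_or_default(p2)

print(while_alive, after_delete)
```

Dereferencing once avoids the situation where the object disappears between the None check and the attribute access.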
  27. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    weakref.proxy

        import weakref

        class Person:
            def __init__(self, name):
                self.name = name

            def __del__(self):
                print(f"""Ack! I'm dead! ({self.name})""")

        p1 = Person('name1')
        p2 = weakref.proxy(p1)

        print(p2.name)      # The proxy returns the object; no () are needed!
        del p1              # Remove p1, the "strong" reference
        print(type(p2))     # Now type(p2) is "weak proxy"
        print(p2)           # Trying to access the object raises a ReferenceError
  28. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Callbacks
    • You can specify that a function should be called when the weakref’s referent (i.e., the object it refers to) goes away
    • This function gets the weakref object as an argument
    • But the referent has already disappeared, so you can’t get any last-minute data from it
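
The last point, that the referent is already gone when the callback runs, can be seen with a short sketch. It uses `weakref.ref` with a callback; the `events` list is an illustrative addition, and the timing assumes CPython's reference counting.

```python
import weakref

class Person:
    def __init__(self, name):
        self.name = name

events = []   # illustrative log of what the callback observed

def value_gone(ref):
    # The callback receives the weakref itself; calling it now returns None.
    events.append(ref() is None)

p1 = Person('name1')
r = weakref.ref(p1, value_gone)   # keep "r" alive, or the callback never fires

del p1
print(events)
```

Anything the callback needs to know about the object has to be captured ahead of time, for example in a closure.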
  29. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Callbacks

        import weakref

        class Person:
            def __init__(self, name):
                self.name = name

            def __del__(self):
                print(f"""Ack! I'm dead! ({self.name})""")

        # Callback function defined
        def value_gone(ref):
            print(f'Boo hoo: weakref {id(ref)} is refless!')

        p1 = Person('name1')
        # Invoke the callback when the referent disappears
        p2 = weakref.proxy(p1, value_gone)

        print(f'Before deleting, p2.name = {p2.name}')
        print('Deleting')
        del p1
        print('Done deleting')
  30. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    When we run this

        $ ./wr9.py
        Before deleting, p2.name = name1
        Deleting
        Ack! I'm dead! (name1)
        Boo hoo: weakref 4310797744 is refless!
        Done deleting

    (The “Ack!” line comes from __del__; the “Boo hoo” line comes from the callback.)
  31. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Observers
    • In the “Observer” design pattern, one or more objects register their interest in a central object.
    • When the central object’s state changes, it informs the observers what has happened.
    • Weak references ensure that the list of observers isn’t needlessly keeping objects around.
  32. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

        import weakref

        class Person:
            def __init__(self, name):
                self.name = name
                self.observers = []

            def add_observers(self, *args):
                for new_observer in args:
                    # Add a weakref, not the object itself
                    self.observers.append(weakref.ref(new_observer))

            def inform_observers(self, message):
                for one_observer in self.observers:
                    observer = one_observer()   # dereference once; None if the referent is gone
                    if observer is None:
                        continue
                    print(f'Message for {observer.name}: {message}')
  33. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Using our Observer

        p1 = Person('main')
        p2 = Person('name2')
        p3 = Person('name3')
        p4 = Person('name4')

        p1.add_observers(p2, p3, p4)
        p1.inform_observers('Hello!')        # p2, p3, and p4 all receive "Hello!"

        del p2
        p1.inform_observers('Hello again!')  # p3 and p4 both receive "Hello again!"
  34. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Caching!
    • Weakrefs are perfect for caching
    • The cache doesn’t prevent an object from removal
    • But so long as the object exists, it’ll remain in the cache
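
A cache along these lines might look like the following sketch. It relies on `weakref.WeakValueDictionary` from the standard library; the `Image` class, the `load_image` function, and the file name are hypothetical stand-ins for any expensive-to-build object. The entry count after `del` assumes CPython.

```python
import weakref

class Image:
    """Hypothetical expensive-to-build object."""
    def __init__(self, path):
        self.path = path

_cache = weakref.WeakValueDictionary()   # values are held weakly

def load_image(path):
    image = _cache.get(path)
    if image is None:                    # not cached, or already released
        image = Image(path)
        _cache[path] = image
    return image

first = load_image('logo.png')
second = load_image('logo.png')
cache_hit = first is second              # same object while someone still uses it

del first, second                        # no users left: the entry disappears
entries_left = len(_cache)

print(cache_hit, entries_left)
```

The cache never extends an object's lifetime, so it cannot grow beyond what the rest of the program is already keeping alive.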
  35. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Weakref-based dicts
    • Many times, you’ll want to use one of the dict variations defined in the “weakref” module:
    • WeakKeyDictionary: the keys are weak refs, and removed automatically
    • WeakValueDictionary: the values are weak refs, and removed automatically
    • WeakSet: like a set, but the values are weak refs to other objects
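
As a rough illustration of the automatic removal, the sketch below stores two `Person` objects in a `WeakKeyDictionary` and a `WeakSet`, then deletes one of them. The names `notes` and `online` and the stored strings are made up for the example; the sizes after `del` assume CPython.

```python
import weakref

class Person:
    def __init__(self, name):
        self.name = name

p1 = Person('name1')
p2 = Person('name2')

notes = weakref.WeakKeyDictionary()   # keys are held weakly
notes[p1] = 'likes Python'
notes[p2] = 'likes coffee'

online = weakref.WeakSet([p1, p2])    # members are held weakly

sizes_before = (len(notes), len(online))
del p2                                # the last strong reference to name2
sizes_after = (len(notes), len(online))

print(sizes_before, sizes_after)
```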
  36. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Use case: Descriptors
    • Descriptors are a bit arcane, but the idea is that we have:
    • A defined class attribute
    • The attribute’s class defines __get__ and/or __set__
    • Retrieving the descriptor via an instance invokes __get__
    • Assigning to the descriptor via an instance invokes __set__
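
Before bringing weakrefs in, the descriptor protocol itself can be sketched on its own. `Loud` and `Widget` are hypothetical names; the descriptor only records which method was invoked.

```python
class Loud:
    """Hypothetical descriptor that logs each access."""
    def __init__(self):
        self.log = []

    def __get__(self, instance, owner):
        self.log.append('get')
        return 42

    def __set__(self, instance, value):
        self.log.append(f'set {value}')

class Widget:
    size = Loud()        # the descriptor is a class attribute

w = Widget()
value = w.size           # retrieving via an instance invokes __get__
w.size = 10              # assigning via an instance invokes __set__

# Reading from the class __dict__ bypasses __get__ and returns the descriptor.
log = Widget.__dict__['size'].log
print(value, log)
```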
  37. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    How is this relevant?
    • A descriptor is a class attribute; only one is shared across all of the instances.
    • But we want each instance to have its own values.
    • We can thus use a dict on the descriptor instance to keep track of per-instance values, using the instance as a key.
    • Ah, but what if the instance goes away? Should our dict continue to hold onto it?
  38. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Example: Age
    • We want instances of Person to have an “age” attribute
    • But we want to ensure that the age isn’t < 0 or > 120.
    • We thus define an Age class, a descriptor on Person
  39. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Example descriptor

        class Person:
            def __init__(self, name):
                self.name = name

            # Our descriptor, an instance of Age defined as a class attribute
            age = Age()

            def __del__(self):
                print(f'Sadly, {self.name} is gone.')
  40. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Age implementation

        from weakref import WeakKeyDictionary

        # The slide does not show these definitions; they are assumed
        # here to be plain Exception subclasses.
        class AgeTooLowError(Exception):
            pass

        class AgeTooHighError(Exception):
            pass

        class Age:
            def __init__(self):
                # Store all instance age values here
                self.ages = WeakKeyDictionary()

            def __get__(self, instance, owner):
                # Retrieve an age from our cache
                return self.ages[instance]

            def __set__(self, instance, new_age):
                if new_age < 0:
                    raise AgeTooLowError(f'Age of {new_age} < 0, the minimum')
                if new_age > 120:
                    raise AgeTooHighError(f'Age of {new_age} > 120, the maximum')
                # Store into our cache
                self.ages[instance] = new_age
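
Putting the two slides together, a self-contained usage sketch might look like this. It makes a few assumptions not shown in the slides: the two exception classes are plain `Exception` subclasses, `__get__` returns the descriptor itself when accessed on the class (so that `Person.age.ages` can be inspected), and `Person.__del__` is omitted to keep the output quiet. The entry count after `del` assumes CPython.

```python
from weakref import WeakKeyDictionary

class AgeTooLowError(Exception):
    """Assumed definition; the slides do not show it."""

class AgeTooHighError(Exception):
    """Assumed definition; the slides do not show it."""

class Age:
    def __init__(self):
        self.ages = WeakKeyDictionary()   # per-instance values, keyed weakly

    def __get__(self, instance, owner):
        if instance is None:              # assumption: class access returns the descriptor
            return self
        return self.ages[instance]

    def __set__(self, instance, new_age):
        if new_age < 0:
            raise AgeTooLowError(f'Age of {new_age} < 0, the minimum')
        if new_age > 120:
            raise AgeTooHighError(f'Age of {new_age} > 120, the maximum')
        self.ages[instance] = new_age

class Person:
    age = Age()

    def __init__(self, name):
        self.name = name

p = Person('name1')
p.age = 30
stored = p.age

try:
    p.age = 150                           # out of range, so __set__ raises
    rejected = False
except AgeTooHighError:
    rejected = True

tracked_before = len(Person.age.ages)
del p                                     # the descriptor's dict does not keep p alive
tracked_after = len(Person.age.ages)

print(stored, rejected, tracked_before, tracked_after)
```

With an ordinary dict in place of `WeakKeyDictionary`, the deleted `Person` would stay in `ages` for as long as the class exists.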
  41. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Summary
    • Weak refs provide us with insights into Python’s garbage-collection mechanism
    • It’s hard, but not impossible, to leak memory in Python
    • Moreover, you can end up keeping objects around longer than you might want
    • Using weak refs, you can ensure that your data structure isn’t the reason objects are sticking around.
  42. Reuven M. Lerner • PyCon Israel 2021 @reuvenmlerner • https://lerner.co.il

    Questions? Comments?
    • Contact me!
    • [email protected]
    • Twitter: @reuvenmlerner
    • https://lerner.co.il
    • Join 20k others on my weekly “Better developers” list:
    • https://BetterDevelopersWeekly.com/