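[Figure: open addressing hash table with slots 0 1 2 3 4 5 6 7; each entry stores key, hash, value. Entries shown: "bar" (hash 52, value "ham"), "baz" (hash 58, value "egg"), and one dummy (DMMY) entry.]
x = d["baz"]
hash("baz") = 58
58 % 8 = 2 (conflict with dummy, then linear probing)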
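The lookup on this slide can be sketched in a few lines of Python. This is only an illustration of the simplified linear probing shown here, not CPython's actual probing sequence. The hash values 58 and 52 come from the slide; the exact slot positions (dummy in slot 2, "baz" in slot 3, "bar" in slot 4) and the names `lookup`, `DUMMY`, and `EMPTY` are assumptions made for the example.

```python
DUMMY = object()   # marker left behind by a deleted entry (assumed representation)
EMPTY = None       # slot that has never been used

def lookup(table, key, key_hash):
    """Find `key` with linear probing, stepping over dummy entries."""
    size = len(table)
    slot = key_hash % size
    for _ in range(size):
        entry = table[slot]
        if entry is EMPTY:
            # A never-used slot ends the probe sequence: the key is absent.
            raise KeyError(key)
        if entry is not DUMMY:
            entry_key, entry_hash, value = entry
            if entry_hash == key_hash and entry_key == key:
                return value
        # Dummy or a different key: try the next slot.
        slot = (slot + 1) % size
    raise KeyError(key)

# Hypothetical table layout matching the slide's example.
table = [EMPTY] * 8
table[2] = DUMMY                   # 58 % 8 == 2 collides with this dummy
table[3] = ("baz", 58, "egg")      # found after one probing step
table[4] = ("bar", 52, "ham")      # 52 % 8 == 4

x = lookup(table, "baz", 58)
```

The dummy entry must not stop the search, otherwise keys inserted after a collision would become unreachable once an earlier key is deleted.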
Slide 15
Problems with the classical open addressing hash table
● Large memory usage
○ at least 1/3 of the entries are empty
■ otherwise, "probing" can be too slow
○ one entry uses 3 words (24 bytes on a 64-bit machine)
○ 8 * 8 * 3 = 192 bytes for the minimum dict
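The arithmetic above can be checked with a short sketch. The constant and function names are made up for illustration; the numbers (8-byte words, 3 words per entry, 8 slots, at most 2/3 of slots in use) are the ones stated on the slide.

```python
WORD_BYTES = 8        # pointer size on a 64-bit machine
WORDS_PER_ENTRY = 3   # hash, key pointer, value pointer
MIN_SLOTS = 8         # slot count of the minimum dict

def classic_table_bytes(slots=MIN_SLOTS):
    """Every slot holds a full entry, whether it is used or not."""
    return slots * WORD_BYTES * WORDS_PER_ENTRY

def max_used_slots(slots=MIN_SLOTS):
    """At least 1/3 of the slots stay empty, so at most 2/3 are usable."""
    return slots * 2 // 3

minimum_bytes = classic_table_bytes()
usable = max_used_slots()
```

With 8 slots, only 5 can hold items, yet all 8 slots pay for a full 24-byte entry.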
Slide 16
New dict implementation
Slide 17
Compact dict
Original idea is from Raymond Hettinger.
PyPy implements it with some customizations.
https://morepypy.blogspot.jp/2015/01/faster-more-memory-efficient-and-more.html
CPython 3.6 has almost the same implementation as PyPy.
Pros and cons
● Less memory usage
○ an index can be 1 byte when size < 255
○ 3 * 8 * 5 + 8 = 128 bytes (was 192 bytes)
● Faster iteration
● Keeps insertion order
● (con) One more lookup stage
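A toy model may help make the layout concrete: a sparse table of small indices points into a dense, insertion-ordered list of entries. This is a simplified sketch, not CPython's or PyPy's code. It uses linear probing and omits resizing and deletion, and the names `CompactDict`, `_find`, and `compact_table_bytes` are hypothetical.

```python
EMPTY = -1  # index value meaning "no entry in this slot"

class CompactDict:
    """Toy compact dict: sparse index table plus dense entries list."""

    def __init__(self, size=8):
        self.indices = [EMPTY] * size  # sparse; each item fits in 1 byte for small sizes
        self.entries = []              # dense; (hash, key, value) in insertion order

    def _find(self, key, key_hash):
        """Return (slot, position); position is EMPTY when the key is absent."""
        size = len(self.indices)
        slot = key_hash % size
        for _ in range(size):
            pos = self.indices[slot]
            if pos == EMPTY:
                return slot, EMPTY
            entry_hash, entry_key, _ = self.entries[pos]  # the extra lookup stage
            if entry_hash == key_hash and entry_key == key:
                return slot, pos
            slot = (slot + 1) % size
        raise RuntimeError("table full; resizing is omitted in this sketch")

    def __setitem__(self, key, value):
        key_hash = hash(key)
        slot, pos = self._find(key, key_hash)
        if pos == EMPTY:
            self.indices[slot] = len(self.entries)
            self.entries.append((key_hash, key, value))
        else:
            self.entries[pos] = (key_hash, key, value)

    def __getitem__(self, key):
        _, pos = self._find(key, hash(key))
        if pos == EMPTY:
            raise KeyError(key)
        return self.entries[pos][2]

    def __iter__(self):
        # Iteration walks the dense list only, in insertion order.
        return (key for _, key, _ in self.entries)

def compact_table_bytes(slots=8, usable=5, word_bytes=8, words_per_entry=3):
    """3 words * 8 bytes * 5 entries, plus one 1-byte index per slot."""
    return words_per_entry * word_bytes * usable + slots * 1

d = CompactDict()
d["foo"] = "spam"
d["bar"] = "ham"
d["baz"] = "egg"
```

Iteration never touches the sparse index table, which is why it is faster and why insertion order falls out of the design for free.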
Slide 21
PEP 412: Key sharing dict
Slide 22
PEP 412: Key sharing dict
Introduced in Python 3.3
Instances of the same class can share a keys object
Slide 23
class A:
    def __init__(self, a, b):
        self.foo = a
        self.bar = b

a = A("spam", "ham")
b = A("bacon", "egg")
Slide 24
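[Figure: the class holds the shared keys object: an index table with slots 0 1 2 3 4 5 6 7 and key entries 0 and 1, "bar" (hash 52) and "foo" (hash 42). Each instance holds only its own values array: "ham" and "spam" for one instance, "egg" and "bacon" for the other.]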
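The split between a shared keys object and per-instance values can be modelled roughly as follows. This is a toy sketch under simplifying assumptions, not the CPython data structure: `SharedKeys` and `ToyInstance` are hypothetical names, the key lookup is a plain list search rather than a hash table, and out-of-order inserts are simply rejected.

```python
class SharedKeys:
    """Stand-in for the keys object stored once on the class."""

    def __init__(self):
        self.names = []  # attribute names, shared by every instance


class ToyInstance:
    """Stand-in for an instance that stores only its values."""

    def __init__(self, keys):
        self.keys = keys
        self.values = []  # values[i] belongs to keys.names[i]

    def set(self, name, value):
        names = self.keys.names
        filled = len(self.values)
        if name in names[:filled]:
            self.values[names.index(name)] = value   # overwrite an existing attribute
        elif filled < len(names) and names[filled] == name:
            self.values.append(value)                # next key in the shared order
        elif filled == len(names):
            names.append(name)                       # first instance defines a new key
            self.values.append(value)
        else:
            raise NotImplementedError("out-of-order insert is not modelled here")

    def get(self, name):
        names = self.keys.names
        if name in names and names.index(name) < len(self.values):
            return self.values[names.index(name)]
        raise AttributeError(name)


keys = SharedKeys()
a = ToyInstance(keys)
a.set("foo", "spam")
a.set("bar", "ham")
b = ToyInstance(keys)
b.set("foo", "bacon")
b.set("bar", "egg")
```

Both instances reference the same `keys` object, so the per-instance cost is just the values list.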
Slide 25
Problem
● Two instances can have different insertion orders
○ Drop the key sharing dict?
■ The key sharing dict can save more memory.
● But __slots__ can be used for such cases!
■ Performance improvements in some microbenchmarks
● Does it matter in real cases? What about __slots__?
■ Needs consensus
● Reaching it is more difficult than the implementation
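A small example may make the problem and the `__slots__` alternative concrete. The classes `Config` and `Point` are hypothetical, and the ordering shown relies on dicts preserving insertion order (CPython 3.6 as an implementation detail, guaranteed by the language from 3.7).

```python
class Config:
    """Two code paths assign the same attributes in a different order."""

    def __init__(self, verbose_first):
        if verbose_first:
            self.verbose = True
            self.path = "/tmp"
        else:
            self.path = "/tmp"
            self.verbose = True


class Point:
    """With __slots__, instances have no per-instance dict at all."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


first = Config(verbose_first=True)
second = Config(verbose_first=False)
first_order = list(vars(first))    # attribute names in insertion order
second_order = list(vars(second))
point = Point(1, 2)
```

Once insertion order is observable, `first` and `second` can no longer be described by one shared, ordered keys object.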
Slide 26
Keep key sharing dict support
● Only exactly the same insertion order is permitted
○ "skipped" keys are prohibited
○ deletion is also prohibited
● Otherwise, the instance stops "key sharing"
○ `self.x = None` is faster than `del self.x`
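In application code, the last point suggests a pattern like the one below: assign every attribute in `__init__` in a fixed order, and clear an attribute by setting it to `None` rather than deleting it. The `Job` class is a hypothetical example. Whether key sharing is actually retained is an internal detail that this code cannot observe; it only shows that the attribute set and order stay stable.

```python
class Job:
    def __init__(self, name):
        # Always assign every attribute, always in the same order.
        self.name = name
        self.result = None

    def finish(self, result):
        self.result = result

    def reset(self):
        # Keep the attribute and clear its value,
        # instead of `del self.result`, which removes the key.
        self.result = None


job = Job("build")
job.finish(42)
job.reset()
attribute_names = list(vars(job))
```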
Slide 27
Future ideas
Slide 28
Ideas that will be tried later...
● specialized dict for namespace
○ all keys are interned strings
○ only pointer comparison
○ no "hash" in entry -> more compact
● OrderedDict based on new dict
○ no more doubly linked list
○ `od.move_to_end(k, last=False)` is difficult, but it's possible
● functools.lru_cache
○ no more doubly linked list
○ Using `od.move_to_end(key)`
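The `lru_cache` idea can be sketched with the existing `collections.OrderedDict` API. This is only an illustration of the `move_to_end` approach, not how `functools.lru_cache` is implemented; the `LRUCache` class and its `get`/`put` methods are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache kept in order by move_to_end()."""

    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self.data = OrderedDict()  # oldest entry first, most recent last

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)       # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)       # also refreshes an existing key
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)  # evict the least recently used


cache = LRUCache(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" becomes the most recent entry
cache.put("c", 3)     # evicts "b"
```

The ordering that a doubly linked list would otherwise track is carried by the dict itself.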
Slide 29
We're moving to GitHub!
New contributors are welcome!