New dict implementation in Python 3.6

INADA Naoki
January 31, 2017

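I had planned to introduce the new dict implementation in Python 3.6 and talk about the difficulties along the way.
The parts I intended to explain verbally are missing from the slides, so if you are interested, please also refer to this blog post.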
http://dsas.blog.klab.org/archives/python-compact-dict.html

Transcript

  1. d["foo"] = "spam"

    hash("foo") = 42, 42 % 8 = 2
    Hash table with 8 slots (0-7), each holding key, hash, value. All slots are empty.
  2. d["foo"] = "spam"

    hash("foo") = 42, 42 % 8 = 2
    slot 2: "foo", 42, "spam"
  3. d["bar"] = "ham"

    hash("bar") = 52, 52 % 8 = 4
    slot 2: "foo", 42, "spam"
    slot 4: "bar", 52, "ham"
  4. d["baz"] = "egg"

    hash("baz") = 58, 58 % 8 = 2 (conflict!)
    slot 2: "foo", 42, "spam"
    slot 4: "bar", 52, "ham"
  5. "probing"

    "Linear probing" uses the next entry. (CPython actually probes with a recurrence built from 5i + 1 and bits taken from hash >> 5, but this example uses the simpler scheme.)
    slot 2: "foo", 42, "spam"
    slot 3: "baz", 58, "egg"
    slot 4: "bar", 52, "ham"
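The parenthetical on slide 5 mentions CPython's real probing scheme only in passing. The following is a rough sketch of that recurrence as it is commonly described: the next slot is `5*i + perturb + 1` masked to the table size, with `perturb` starting at the hash and shifted right by 5 bits each step. The function name `probe_sequence` is hypothetical, the hash is assumed to be non-negative, and the exact order of the shift and the update differs between CPython versions.

```python
PERTURB_SHIFT = 5  # number of hash bits consumed per probe step


def probe_sequence(hash_value, table_size, limit):
    """Return the first `limit` slots visited for a non-negative hash.

    table_size is assumed to be a power of two.
    """
    mask = table_size - 1
    perturb = hash_value
    i = hash_value & mask          # same as hash % table_size for powers of two
    slots = [i]
    while len(slots) < limit:
        perturb >>= PERTURB_SHIFT  # mix in higher bits of the hash
        i = (5 * i + perturb + 1) & mask
        slots.append(i)
    return slots


first_three = probe_sequence(58, 8, 3)
```

Once `perturb` reaches zero the recurrence reduces to `5*i + 1`, which cycles through every slot of a power-of-two table, so a free slot is always found eventually.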
  6. del d["foo"]

    hash("foo") = 42, 42 % 8 = 2
    slot 2: "foo", 42, "spam"
    slot 3: "baz", 58, "egg"
    slot 4: "bar", 52, "ham"
  7. del d["foo"]

    hash("foo") = 42, 42 % 8 = 2
    slot 2: (empty)
    slot 3: "baz", 58, "egg"
    slot 4: "bar", 52, "ham"
  8. x = d["baz"]

    hash("baz") = 58, 58 % 8 = 2 (!!?)
    Slot 2 is now empty, so the lookup would stop there and never reach "baz" in slot 3.
    slot 2: (empty)
    slot 3: "baz", 58, "egg"
    slot 4: "bar", 52, "ham"
  9. del d["foo"] leaves a DUMMY key

    slot 2: DUMMY
    slot 3: "baz", 58, "egg"
    slot 4: "bar", 52, "ham"
  10. x = d["baz"]

    hash("baz") = 58, 58 % 8 = 2 (conflict with DUMMY, then linear probing)
    slot 2: DUMMY
    slot 3: "baz", 58, "egg"
    slot 4: "bar", 52, "ham"
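The walk-through on slides 1 to 10 can be replayed with a small toy table. This is a minimal sketch, not CPython's code: the class name `LinearProbingTable` is hypothetical, it uses plain linear probing as the slides do, hashes are passed in explicitly so the numbers match the slides, and the table is assumed never to fill up.

```python
EMPTY = None       # slot that has never been used
DUMMY = object()   # tombstone left behind by a deletion


class LinearProbingTable:
    def __init__(self, size=8):
        self.slots = [EMPTY] * size

    def _probe(self, h):
        # Start at h % size, then try the following slots, wrapping around.
        size = len(self.slots)
        for step in range(size):
            yield (h + step) % size

    def _find(self, key, h):
        for i in self._probe(h):
            slot = self.slots[i]
            if slot is EMPTY:
                return None            # a truly empty slot ends the search
            if slot is not DUMMY and slot[1] == h and slot[0] == key:
                return i               # DUMMY slots are skipped, not treated as the end
        return None

    def insert(self, key, h, value):
        i = self._find(key, h)
        if i is None:
            # Take the first free slot (EMPTY or DUMMY) on the probe path.
            i = next(j for j in self._probe(h)
                     if self.slots[j] is EMPTY or self.slots[j] is DUMMY)
        self.slots[i] = (key, h, value)

    def lookup(self, key, h):
        i = self._find(key, h)
        if i is None:
            raise KeyError(key)
        return self.slots[i][2]

    def delete(self, key, h):
        i = self._find(key, h)
        if i is None:
            raise KeyError(key)
        self.slots[i] = DUMMY          # keep the probe chain intact


t = LinearProbingTable()
t.insert("foo", 42, "spam")   # 42 % 8 = 2
t.insert("bar", 52, "ham")    # 52 % 8 = 4
t.insert("baz", 58, "egg")    # 58 % 8 = 2 is taken, so the next slot is used
t.delete("foo", 42)
x = t.lookup("baz", 58)       # passes over the DUMMY in slot 2
```

Replacing `self.slots[i] = DUMMY` with `self.slots[i] = EMPTY` reproduces the failure on slide 8: the lookup of "baz" stops at slot 2 and raises `KeyError`.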
  11. Problems in classical open addressing hash table

    • Large memory usage
      ◦ At least 1/3 of the entries are empty
        ▪ Otherwise, "probing" can be too slow
      ◦ One entry uses 3 words (24 bytes on a 64-bit machine)
      ◦ 8 * 8 * 3 = 192 bytes for the minimum dict
  12. Compact dict

    The original idea is from Raymond Hettinger. PyPy implements it with some customizations.
    https://morepypy.blogspot.jp/2015/01/faster-more-memory-efficient-and-more.html
    CPython 3.6's implementation is almost the same as PyPy's.
  13. d["foo"] = "spam"
      d["bar"] = "ham"

    index (8 slots): slot 2 -> 0, slot 4 -> 1
    entries (key, hash, value):
      0: "foo", 42, "spam"
      1: "bar", 52, "ham"
  14. d["foo"] = "spam"
      d["bar"] = "ham"
      d["baz"] = "egg"
      del d["foo"]

    index (8 slots): slot 2 -> DUMMY, slot 3 -> 2, slot 4 -> 1
    entries (key, hash, value):
      0: (deleted)
      1: "bar", 52, "ham"
      2: "baz", 58, "egg"
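The two-level layout on slides 13 and 14 can be sketched as a sparse index table that points into a dense, insertion-ordered entries list. This is an illustration under stated assumptions, not the CPython or PyPy code: the class name `CompactDictSketch` and the sentinel values `FREE` and `DUMMY` are hypothetical, probing is linear as in the earlier slides, and resizing is left out.

```python
FREE = -1    # index slot that has never been used
DUMMY = -2   # index slot whose entry was deleted


class CompactDictSketch:
    def __init__(self, size=8):
        self.indices = [FREE] * size   # sparse table of small integers
        self.entries = []              # dense list of (key, hash, value)

    def _probe(self, h):
        size = len(self.indices)
        for step in range(size):
            yield (h + step) % size

    def _find_slot(self, key, h):
        for slot in self._probe(h):
            ix = self.indices[slot]
            if ix == FREE:
                return None
            if ix != DUMMY:
                k, kh, _ = self.entries[ix]   # the extra lookup stage
                if kh == h and k == key:
                    return slot
        return None

    def insert(self, key, h, value):
        slot = self._find_slot(key, h)
        if slot is not None:
            self.entries[self.indices[slot]] = (key, h, value)
            return
        for slot in self._probe(h):
            if self.indices[slot] < 0:        # FREE or DUMMY
                self.indices[slot] = len(self.entries)
                self.entries.append((key, h, value))
                return
        raise RuntimeError("index table full (resizing is not sketched)")

    def delete(self, key, h):
        slot = self._find_slot(key, h)
        if slot is None:
            raise KeyError(key)
        # Blank the entry in place so later entries keep their positions.
        self.entries[self.indices[slot]] = None
        self.indices[slot] = DUMMY

    def keys(self):
        # Iteration walks the dense list, which is already in insertion order.
        return [e[0] for e in self.entries if e is not None]


d = CompactDictSketch()
d.insert("foo", 42, "spam")
d.insert("bar", 52, "ham")
d.insert("baz", 58, "egg")
d.delete("foo", 42)
```

After these four operations the index table and entries list match slide 14, and `d.keys()` yields the surviving keys in insertion order.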
  15. Pros and cons

    • Less memory usage
      ◦ index can be 1 byte for size < 255
      ◦ 3 * 8 * 5 + 8 = 128 bytes (was 192 bytes)
    • Faster iteration
    • Keeps insertion order
    • (cons) One more lookup stage
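The two byte counts from the slides can be checked with a few lines of arithmetic. The variable names below are illustrative, and the figures cover only the table storage described on the slides (8 slots, 3 words per entry, 8-byte words, at most 2/3 of the slots usable), not the rest of the dict object.

```python
WORD = 8          # bytes per pointer-sized word on a 64-bit machine
SLOTS = 8         # size of the minimum table
ENTRY_WORDS = 3   # key pointer, hash, value pointer

# Classic layout: every slot stores a full 3-word entry.
classic_bytes = SLOTS * ENTRY_WORDS * WORD

# Compact layout: entries only for the usable 2/3 of the slots,
# plus one 1-byte index per slot.
usable_entries = SLOTS * 2 // 3
compact_bytes = usable_entries * ENTRY_WORDS * WORD + SLOTS * 1

print(classic_bytes, compact_bytes)
```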
  16. class A:
          def __init__(self, a, b):
              self.foo = a
              self.bar = b

      a = A("spam", "ham")
      b = A("bacon", "egg")
  17. Class: shared keys; instances: values only

    Class: index (8 slots): slot 2 -> 0, slot 4 -> 1
      entries (key, hash):
        0: "foo", 42
        1: "bar", 52
    instance a: values "spam", "ham"
    instance b: values "bacon", "egg"
  18. Problem

    • Two instances can have different insertion orders
      ◦ Drop key-sharing dict?
        ▪ Key-sharing dict can save more memory.
          • But __slots__ can be used for such cases!
        ▪ Performance improves in some microbenchmarks
          • Does it matter in real cases? __slots__?
        ▪ Needs consensus
          • That is more difficult than the implementation
  19. Keep key sharing dict support

    • Only exactly the same insertion order is permitted
      ◦ "Skipped" keys are prohibited
      ◦ Deletion is also prohibited
    • Otherwise, stop "key sharing"
      ◦ `self.x = None` is faster than `del self.x`
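The key-sharing rules on slides 16 to 19 can be modelled roughly as follows. This is a toy model, not CPython's split-table code: `SharedKeys` and `InstanceNamespace` are hypothetical names, an ordinary dict stands in for the shared index and key entries, and "stop key sharing" is modelled by copying the instance's attributes into a private dict.

```python
class SharedKeys:
    """Per-class part: maps each key to its position in the values arrays."""

    def __init__(self):
        self.positions = {}   # keys are registered in first-insertion order


class InstanceNamespace:
    """Per-instance part: only a values list while keys are shared."""

    def __init__(self, shared):
        self.shared = shared  # becomes None once key sharing is abandoned
        self.values = []
        self.private = None   # ordinary dict used after sharing stops

    def _unshare(self):
        keys = list(self.shared.positions)[:len(self.values)]
        self.private = dict(zip(keys, self.values))
        self.shared = None
        self.values = None

    def set(self, key, value):
        if self.shared is not None:
            positions = self.shared.positions
            pos = positions.get(key, len(positions))
            if pos < len(self.values):
                self.values[pos] = value       # overwrite an existing attribute
                return
            if pos == len(self.values):
                positions.setdefault(key, pos)  # next key in the shared order
                self.values.append(value)
                return
            self._unshare()                    # the key would skip earlier ones
        self.private[key] = value

    def delete(self, key):
        if self.shared is not None:
            self._unshare()                    # deletion also ends sharing
        del self.private[key]


shared = SharedKeys()
a = InstanceNamespace(shared)
a.set("foo", "spam")
a.set("bar", "ham")
b = InstanceNamespace(shared)
b.set("foo", "bacon")
b.set("bar", "egg")
c = InstanceNamespace(shared)
c.set("bar", "x")   # skips "foo", so c stops sharing keys
```

In this model `a` and `b` store only two values each, while `c` pays for a full private dict. That is the reason assigning `None` can be preferable to `del` when the attribute layout should stay shared.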
  20. Ideas that will be tried later...

    • Specialized dict for namespaces
      ◦ All keys are interned strings
      ◦ Only pointer comparison
      ◦ No "hash" in entry -> more compact
    • OrderedDict based on new dict
      ◦ No more doubly linked list
      ◦ `od.move_to_end(k, last=False)` is difficult, but it's possible
    • functools.lru_cache
      ◦ No more doubly linked list
      ◦ Using `od.move_to_end(key)`
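The last idea, an LRU cache driven by `od.move_to_end(key)`, can be illustrated with the existing `collections.OrderedDict`. This is only a sketch of the access pattern, not how `functools.lru_cache` is implemented; the class name `LRUCacheSketch` is hypothetical.

```python
from collections import OrderedDict


class LRUCacheSketch:
    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()   # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)   # evict the least recently used


cache = LRUCacheSketch(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" becomes the most recently used key
cache.put("c", 3)    # evicts "b"
```

Only two ordering operations are needed, moving a key to the end and popping from the front, which is why an ordered dict without a doubly linked list would be sufficient for this use.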