New dict implementation in Python 3.6 (KLab Tech Meetup 2017-09-04)

Slide 1

Slide 1 text

New dict implementation in Python 3.6 Inada Naoki (@methane)

Slide 2

Slide 2 text

Self-introduction
@methane
K-Labo, KLab Inc.
Python core developer
C, Go, Network (server) programming, MySQL clients
ISUCON 6 winner (See http://isucon.net/ )

Slide 3

Slide 3 text

Table of contents
● dict in Python
● Python 3.5 implementation
● Python 3.6 implementation
● Toward Python 3.7

Slide 4

Slide 4 text

Dict in Python

Slide 5

Slide 5 text

Dict
Key-value storage, a.k.a. associative array, map, hash.

    x = {"foo": 42, "bar": 84}
    print(x["foo"])  # => 42

Key features:
● Constant-time lookup
● Amortized constant-time insertion
● Support for custom (user-defined) key types
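As a small illustration of the "custom key type" point, here is a minimal sketch. The `Point` class is a hypothetical example and does not appear in the slides; it only assumes the usual rule that objects which compare equal must return the same hash.

```python
class Point:
    """Hypothetical user-defined key type (not from the slides)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal objects must return equal hashes, so hash the same fields
        # that __eq__ compares.
        return hash((self.x, self.y))


places = {Point(0, 0): "origin"}
places[Point(1, 2)] = "somewhere"    # insertion
origin_name = places[Point(0, 0)]    # lookup with an equal but distinct object
print(origin_name)                   # => origin
```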

Slide 6

Slide 6 text

Dicts are everywhere in Python

    x = 5               # global namespace is a dict. Insert 'x' into it.
    def add(a):         # insert 'add' into the global dict
        return a + x    # look up 'x' in the global dict
    print(add(7))       # look up 'print' and 'add' in the global dict

There are many dicts in a Python program. Lookup speed is critical. Insertion speed and memory usage are very important too.
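One way to see that a namespace really is a dict is to run the same code inside a fresh module object and inspect its `__dict__`. This is only a sketch; the module name `demo` is an arbitrary choice for illustration.

```python
import types

source = """
x = 5
def add(a):
    return a + x
"""

demo = types.ModuleType("demo")   # "demo" is a hypothetical module name
exec(source, demo.__dict__)       # run the code with demo's dict as its globals

ns = vars(demo)                   # the module namespace itself
print(type(ns) is dict)           # => True
result = ns["add"](7)             # 'add' and 'x' are ordinary dict items
print(result)                     # => 12

ns["x"] = 10                      # writing to the dict rebinds the global
result_after = demo.add(7)
print(result_after)               # => 17
```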

Slide 7

Slide 7 text

Python 3.5 implementation

Slide 8

Slide 8 text

d["foo"] = "spam"    # insert new item
hash("foo") = 42     # hash value is 42
42 % 8 = 2           # hash value % hash table size = 2

Hash table (slots 0-7; columns: key, hash, value):
(all slots empty)

Slide 9

Slide 9 text

d["foo"] = "spam"
hash("foo") = 42
42 % 8 = 2

Hash table (slots 0-7; columns: key, hash, value):
slot 2: "foo", 42, "spam"

Slide 10

Slide 10 text

d["bar"] = "ham"
hash("bar") = 52
52 % 8 = 4

Hash table (slots 0-7; columns: key, hash, value):
slot 2: "foo", 42, "spam"
slot 4: "bar", 52, "ham"

Slide 11

Slide 11 text

d["baz"] = "egg"
hash("baz") = 58
58 % 8 = 2           # "baz" conflicts with "foo"

Hash table (slots 0-7; columns: key, hash, value):
slot 2: "foo", 42, "spam"
slot 4: "bar", 52, "ham"

Slide 12

Slide 12 text

"Open addressing" uses another slot in the table. (Another strategy is "chaining".)
For example, the "linear probing" algorithm uses the next entry.
※ Python uses more complex probing, but I use a simpler way in this example.

Hash table (slots 0-7; columns: key, hash, value):
slot 2: "foo", 42, "spam"
slot 3: "baz", 58, "egg"
slot 4: "bar", 52, "ham"
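The insertions on the slides above can be sketched in a few lines. This is a simplified model, not CPython's code: it uses plain linear probing, a fixed table of 8 slots, and the made-up hash values from the slides (`SLIDE_HASH`) instead of the real `hash()`.

```python
TABLE_SIZE = 8
# Hash values taken from the slides; these are not real hash() results.
SLIDE_HASH = {"foo": 42, "bar": 52, "baz": 58}


def insert(table, key, value):
    """Store (key, hash, value) in the first usable slot and return its number."""
    h = SLIDE_HASH[key]
    slot = h % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        entry = table[slot]
        if entry is None or entry[0] == key:
            table[slot] = (key, h, value)
            return slot
        slot = (slot + 1) % TABLE_SIZE   # linear probing: try the next slot
    raise RuntimeError("table is full")


table = [None] * TABLE_SIZE
slots = [insert(table, k, v)
         for k, v in [("foo", "spam"), ("bar", "ham"), ("baz", "egg")]]
print(slots)   # => [2, 4, 3]
```

"baz" hashes to slot 2, finds "foo" there, and lands in slot 3.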

Slide 13

Slide 13 text

del d["foo"]
hash("foo") = 42
42 % 8 = 2

Hash table (slots 0-7; columns: key, hash, value):
slot 2: "foo", 42, "spam"
slot 3: "baz", 58, "egg"
slot 4: "bar", 52, "ham"

Slide 14

Slide 14 text

del d["foo"]
hash("foo") = 42
42 % 8 = 2

Hash table (slots 0-7; columns: key, hash, value):
slot 2: (empty)
slot 3: "baz", 58, "egg"
slot 4: "bar", 52, "ham"

Slide 15

Slide 15 text

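x = d["baz"]
hash("baz") = 58
58 % 8 = 2 (!!?)     # slot 2 is now empty, so the lookup would stop there and miss "baz"

Hash table (slots 0-7; columns: key, hash, value):
slot 2: (empty)
slot 3: "baz", 58, "egg"
slot 4: "bar", 52, "ham"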

Slide 16

Slide 16 text

del d["foo"] leaves a DUMMY key behind

Hash table (slots 0-7; columns: key, hash, value):
slot 2: DUMMY
slot 3: "baz", 58, "egg"
slot 4: "bar", 52, "ham"

Slide 17

Slide 17 text

x = d["baz"]
hash("baz") = 58
58 % 8 = 2 (conflict with dummy, then linear probing)

Hash table (slots 0-7; columns: key, hash, value):
slot 2: DUMMY
slot 3: "baz", 58, "egg"
slot 4: "bar", 52, "ham"
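The deletion problem and the DUMMY fix can be sketched as follows. As before, this assumes plain linear probing and the slide's made-up hash values (`SLIDE_HASH`); the helper names are hypothetical and CPython's real probing is more involved.

```python
TABLE_SIZE = 8
SLIDE_HASH = {"foo": 42, "bar": 52, "baz": 58}   # values from the slides
EMPTY = None
DUMMY = object()   # marker left behind by a deletion


def probe(key):
    """Yield slot numbers in linear-probing order for this key."""
    start = SLIDE_HASH[key] % TABLE_SIZE
    for step in range(TABLE_SIZE):
        yield (start + step) % TABLE_SIZE


def find_slot(table, key):
    for slot in probe(key):
        entry = table[slot]
        if entry is EMPTY:
            return None                 # a truly empty slot ends the search
        if entry is not DUMMY and entry[0] == key:
            return slot                 # DUMMY slots are skipped, not a stop
    return None


def insert(table, key, value):
    first_free = None
    for slot in probe(key):
        entry = table[slot]
        if entry is EMPTY:
            if first_free is None:
                first_free = slot
            break
        if entry is DUMMY:
            if first_free is None:
                first_free = slot       # reusable, but keep looking for key
        elif entry[0] == key:
            first_free = slot           # key already present: overwrite it
            break
    if first_free is None:
        raise RuntimeError("table is full")
    table[first_free] = (key, SLIDE_HASH[key], value)


def lookup(table, key):
    slot = find_slot(table, key)
    if slot is None:
        raise KeyError(key)
    return table[slot][2]


def delete(table, key):
    slot = find_slot(table, key)
    if slot is None:
        raise KeyError(key)
    table[slot] = DUMMY                 # do not reset to EMPTY


table = [EMPTY] * TABLE_SIZE
for k, v in [("foo", "spam"), ("bar", "ham"), ("baz", "egg")]:
    insert(table, k, v)
delete(table, "foo")
value = lookup(table, "baz")            # probes slot 2 (DUMMY), then slot 3
print(value)                            # => egg
```

If `delete` wrote `EMPTY` instead of `DUMMY`, the final lookup would stop at slot 2 and raise `KeyError`.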

Slide 18

Slide 18 text

Problems in classical open addressing hash table
● Large memory usage
  ○ At least 1/3 of entries are empty
    ■ Otherwise, "probing" can be too slow
  ○ One entry uses 3 words
    ■ word = 8 bytes on recent machines
  ○ Minimum size = 192 bytes
    ■ 8 (bytes/word) * 3 (words/entry) * 8 (table width)
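The arithmetic on this slide, spelled out. The constant names are made up for illustration; the numbers come from the slide, and the usable-entry count is derived from the "at least 1/3 empty" rule.

```python
WORD_BYTES = 8        # one word on a 64-bit machine
WORDS_PER_ENTRY = 3   # key pointer, hash, value pointer
MIN_SLOTS = 8         # minimum table width

legacy_min_bytes = WORD_BYTES * WORDS_PER_ENTRY * MIN_SLOTS
print(legacy_min_bytes)   # => 192

# At least 1/3 of the slots stay empty, so at most 2/3 can hold items.
usable_entries = MIN_SLOTS * 2 // 3
print(usable_entries)     # => 5
```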

Slide 19

Slide 19 text

Python 3.6 implementation

Slide 20

Slide 20 text

Compact and ordered dict
PyPy implemented it in 2015:
https://morepypy.blogspot.jp/2015/01/faster-more-memory-efficient-and-more.html
Python 3.6 dict is almost the same as PyPy's.
Ruby 2.4 and PHP 7 have similar ones.

Slide 21

Slide 21 text

d["foo"] = "spam"    # hash("foo") = 42, 42 % 8 = 2

Index table (slots 0-7):
slot 2: 0

Entries (columns: key, hash, value):
0: "foo", 42, "spam"

Slide 22

Slide 22 text

d["foo"] = "spam"
d["bar"] = "ham"     # hash("bar") = 52, 52 % 8 = 4

Index table (slots 0-7):
slot 2: 0
slot 4: 1

Entries (columns: key, hash, value):
0: "foo", 42, "spam"
1: "bar", 52, "ham"

Slide 23

Slide 23 text

d["foo"] = "spam"
d["bar"] = "ham"
d["baz"] = "egg"
del d["foo"]

Index table (slots 0-7):
slot 2: DUMMY
slot 3: 2
slot 4: 1

Entries (columns: key, hash, value):
0: (deleted)
1: "bar", 52, "ham"
2: "baz", 58, "egg"
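The index-table-plus-entries layout from these slides can be modelled with a short class. This is a sketch under simplifying assumptions, not CPython's implementation: fixed size of 8, linear probing, no resizing, the slide's made-up hash values, and a hypothetical class name `CompactDict`.

```python
TABLE_SIZE = 8
SLIDE_HASH = {"foo": 42, "bar": 52, "baz": 58}   # values from the slides
FREE, DUMMY = -1, -2                             # special index values


class CompactDict:
    def __init__(self):
        self.index = [FREE] * TABLE_SIZE   # sparse: small ints only
        self.entries = []                  # dense: (key, hash, value), in insertion order

    def _probe(self, key):
        start = SLIDE_HASH[key] % TABLE_SIZE
        for step in range(TABLE_SIZE):
            yield (start + step) % TABLE_SIZE

    def _find(self, key):
        for slot in self._probe(key):
            pos = self.index[slot]
            if pos == FREE:
                return None
            if pos != DUMMY and self.entries[pos][0] == key:
                return slot
        return None

    def __setitem__(self, key, value):
        slot = self._find(key)
        if slot is not None:               # existing key: overwrite in place
            self.entries[self.index[slot]] = (key, SLIDE_HASH[key], value)
            return
        for slot in self._probe(key):
            if self.index[slot] in (FREE, DUMMY):
                self.index[slot] = len(self.entries)
                self.entries.append((key, SLIDE_HASH[key], value))
                return
        raise RuntimeError("index table is full")

    def __getitem__(self, key):
        slot = self._find(key)
        if slot is None:
            raise KeyError(key)
        return self.entries[self.index[slot]][2]   # one extra indirection

    def __delitem__(self, key):
        slot = self._find(key)
        if slot is None:
            raise KeyError(key)
        self.entries[self.index[slot]] = None      # leave a hole in entries
        self.index[slot] = DUMMY

    def __iter__(self):
        # Iteration walks the dense array, so order is insertion order.
        return (e[0] for e in self.entries if e is not None)


d = CompactDict()
d["foo"] = "spam"
d["bar"] = "ham"
d["baz"] = "egg"
del d["foo"]
print(d.index)    # => [-1, -1, -2, 2, 1, -1, -1, -1]
print(list(d))    # => ['bar', 'baz']
```

The final index table matches the last slide: DUMMY in slot 2, entry 2 in slot 3, entry 1 in slot 4.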

Slide 24

Slide 24 text

New dict vs Legacy dict
● Less memory usage
  ○ Index can be 1 byte per slot for small dicts
  ○ 3 * 8 * 5 (entries) + 8 (index table) = 128 bytes
    ■ It was 192 bytes in the legacy implementation
● Faster iteration (dense entries)
● Preserves insertion order
● (cons) One more indirect memory access
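The size comparison on this slide works out as follows. The constant names are illustrative; the figures (5 entries, 1-byte index slots, 8 slots) are the ones the slide uses for the smallest dict.

```python
WORD_BYTES = 8
WORDS_PER_ENTRY = 3
INDEX_SLOTS = 8
INDEX_BYTES_PER_SLOT = 1   # small dicts can use 1-byte indices
ENTRIES = 5                # entries array sized for 5 items, per the slide

new_bytes = (WORD_BYTES * WORDS_PER_ENTRY * ENTRIES
             + INDEX_SLOTS * INDEX_BYTES_PER_SLOT)
legacy_bytes = WORD_BYTES * WORDS_PER_ENTRY * INDEX_SLOTS

print(new_bytes, legacy_bytes)          # => 128 192
saved = legacy_bytes - new_bytes
print(saved)                            # => 64
```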

Slide 25

Slide 25 text

Toward Python 3.7

Slide 26

Slide 26 text

Working on ...
● Remove redundant code that optimized the legacy implementation.
● OrderedDict based on the new dict
  ○ Remove the doubly linked list used to keep order
  ○ About 1/2 memory usage!
  ○ Faster creation and iteration.
  ○ (cons) Slower .move_to_end() method

Slide 27

Slide 27 text

We're looking for new contributors
Contributing to Python is easier, thanks to GitHub.
● Read the devguide (https://devguide.python.org/ )
● Find an easy bug on https://bugs.python.org/ and fix it.
● Review others' code
● Translate documentation on Transifex
  ○ See https://docs.python.org/ja/

Slide 30

Slide 30 text

Future ideas
● Specialized dict for namespaces
  ○ all keys are interned strings
  ○ only pointer comparison
  ○ no "hash" in entry -> more compact
● Implement set like dict
  ○ current set is larger than dict...
● functools.lru_cache
  ○ Use `od.move_to_end(key)` instead of a linked list
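The last idea can be illustrated with a small pure-Python cache built on `collections.OrderedDict`. This is only a sketch of the technique; the `LRUCache` class is hypothetical and is not how `functools.lru_cache` is actually written.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical LRU cache that keeps recency order with move_to_end()."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)   # evict the least recently used


cache = LRUCache(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")            # "a" becomes the most recent
cache.put("c", 3)         # evicts "b"
print(list(cache.data))   # => ['a', 'c']
```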

Slide 31

Slide 31 text

PEP 412: Key sharing dict

Slide 32

Slide 32 text

PEP 412: Key sharing dict
Introduced in Python 3.3.
Instances of the same class can share a keys object.

Slide 33

Slide 33 text

class A:
    def __init__(self, a, b):
        self.foo = a
        self.bar = b

a = A("spam", "ham")
b = A("bacon", "egg")

Slide 34

Slide 34 text

Class (shared keys object):
  Index table (slots 0-7):
  slot 2: 0
  slot 4: 1

  Keys (columns: key, hash):
  0: "foo", 42
  1: "bar", 52

Instance a, values:
  0: "spam"
  1: "ham"

Instance b, values:
  0: "bacon"
  1: "egg"
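A rough way to observe the effect from Python is to compare the reported size of an instance `__dict__` with an equivalent plain dict. This is a sketch only: `sys.getsizeof` results depend on the CPython version and build, so no particular numbers are claimed here.

```python
import sys


class A:
    def __init__(self, a, b):
        self.foo = a
        self.bar = b


a = A("spam", "ham")
b = A("bacon", "egg")

# Both instances were filled in the same order, so they can share keys.
keys_a = list(vars(a))
keys_b = list(vars(b))
print(keys_a, keys_b)

# Sizes are version dependent; on builds with key sharing the instance
# dict is often reported as smaller than an equivalent standalone dict.
instance_size = sys.getsizeof(vars(a))
plain_size = sys.getsizeof({"foo": "spam", "bar": "ham"})
print(instance_size, plain_size)
```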

Slide 35

Slide 35 text

Problem
● Two instances can have different insertion order
  ○ Drop key sharing dict?
    ■ Key sharing dict can save more memory.
      ● But __slots__ can be used for such cases!
    ■ Performance improvements in some microbenchmarks
      ● Does it matter in real cases? __slots__?
    ■ Needs consensus
      ● It's more difficult than the implementation

Slide 36

Slide 36 text

Keep key sharing dict support
● Only exactly the same order is permitted
  ○ "skipped" keys are prohibited
  ○ deletion is also prohibited
● Otherwise, stop "key sharing"
  ○ `self.x = None` is faster than `del self.x`
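In practice the last bullet suggests a coding habit like the one sketched below. The `Connection` class and its attribute names are hypothetical; the claim that deletion stops key sharing is the slide's, and the example only shows the visible difference in the attribute set.

```python
class Connection:
    """Hypothetical class used only to contrast the two styles."""

    def __init__(self):
        # Every instance sets the same attributes in the same order.
        self.sock = "socket-placeholder"
        self.buffer = b""

    def close(self):
        # Keeps the set of keys and their order unchanged.
        self.sock = None

    def close_with_del(self):
        # Removes a key; per the slide, this instance stops sharing keys.
        del self.sock


kept = Connection()
kept.close()
print(list(vars(kept)))      # => ['sock', 'buffer']

dropped = Connection()
dropped.close_with_del()
print(list(vars(dropped)))   # => ['buffer']
```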