Slide 1

Slide 1 text

CPython Objects Implementation Pearls
Laurent Luce

Slide 2

Slide 2 text

Goal
Knowing more about a programming language's implementation helps you code more efficiently.

Slide 3

Slide 3 text

List Pearls

Slide 4

Slide 4 text

list.append
odds = []
odds.append(0)
odds.append(1)
odds
[0, 1]

Slide 5

Slide 5 text

list.append
odds = []
odds.append(1)
odds.append(3)
odds.append(5)
odds.append(7)
odds.append(9)
[Diagram: array slots holding 1, 3, 5, 7, then 9.]

Slide 6

Slide 6 text

Growth pattern
(size >> 3) + (size < 9 ? 3 : 6) + size
Ask for 1 -> Get 4.
Ask for 5 -> Get 8.
Ask for 9 -> Get 16.
25, 35, 46, 58, 72, 88, …
TODO: Graph
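The growth pattern can be checked with a short sketch. It only transcribes the formula on this slide into Python; the names `grown_capacity` and `growth_sequence` are made up for illustration, and other CPython releases may use a different formula.

```python
def grown_capacity(size):
    """Capacity granted when `size` slots are requested (formula from the slide)."""
    return (size >> 3) + (3 if size < 9 else 6) + size


def growth_sequence(n_appends):
    """Capacities reached while appending n_appends items one at a time."""
    allocated = 0
    capacities = []
    for size in range(1, n_appends + 1):
        if size > allocated:  # out of room: over-allocate
            allocated = grown_capacity(size)
            capacities.append(allocated)
    return capacities


print(growth_sequence(88))
```

Because each resize leaves spare room, most appends do not touch the allocator at all.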

Slide 7

Slide 7 text

list.pop
odds.pop()
odds.pop()
[Diagram: array slots holding 1, 3, 5, 7, 9.]

Slide 8

Slide 8 text

list.sort
ints = [4, 1, 3, 5, 8]
ints.sort()
ints
[1, 3, 4, 5, 8]

Slide 9

Slide 9 text

list.sort
if N < 64:
    Binary insertion sort
else:
    Find natural runs:
        a0 <= a1 <= a2 <= …
        a0 > a1 > a2 > …
    Merge runs.
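The two building blocks named above can be sketched in Python. This is an illustration, not the C code in CPython: `binary_insertion_sort` and `count_run` are hypothetical names, and the sketch assumes that a strictly descending run is reversed in place so that it becomes ascending.

```python
import bisect


def binary_insertion_sort(a):
    """Sort list a in place, finding each insertion point by binary search."""
    for i in range(1, len(a)):
        x = a[i]
        # bisect_right keeps equal elements in their original order (stable).
        pos = bisect.bisect_right(a, x, 0, i)
        a[pos + 1:i + 1] = a[pos:i]  # shift the tail one slot to the right
        a[pos] = x


def count_run(a, lo, hi):
    """Length of the natural run starting at a[lo] within a[lo:hi].

    A run is either a0 <= a1 <= a2 <= ... or a0 > a1 > a2 > ...;
    a descending run is reversed in place.
    """
    if hi - lo < 2:
        return hi - lo
    end = lo + 1  # index of the last element known to be in the run
    if a[end] < a[lo]:
        while end + 1 < hi and a[end + 1] < a[end]:
            end += 1
        a[lo:end + 1] = a[lo:end + 1][::-1]
    else:
        while end + 1 < hi and a[end + 1] >= a[end]:
            end += 1
    return end - lo + 1


ints = [4, 1, 3, 5, 8]
binary_insertion_sort(ints)
print(ints)
```

Requiring descending runs to be strict means equal elements are never swapped by the reversal, which keeps the sort stable.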

Slide 10

Slide 10 text

Runs
N = 2112.
If minrun = 32: two runs of length 2048 and 64 to merge. Costly.
If minrun = 33: all merges balanced.
q, r = divmod(N, minrun).
Bad: q power of 2 and r > 0. q a little larger than a power of 2.
Good: q power of 2 and r = 0. q slightly less than a power of 2.
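One way to pick a minrun with these properties is to take the six most significant bits of N and add 1 if any of the remaining bits are set. The sketch below follows that idea; the function name `compute_minrun` is an assumption, and the real C routine may differ in detail.

```python
def compute_minrun(n):
    """Pick a minrun so that n / minrun is a power of 2 or slightly less."""
    r = 0  # becomes 1 if any bit is shifted out
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r


minrun = compute_minrun(2112)
print(minrun, divmod(2112, minrun))
```

For N below 64 the function returns N itself, which matches the rule that short lists are handled entirely by binary insertion sort.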

Slide 11

Slide 11 text

1st run
[Diagram: an array of N = 2112 elements starting 1 3 6 4 8 7 9 ...; the first 33 elements (indices 0 to 33) are then shown sorted as 1 3 4 5 6 7 9 ... 41.]

Slide 12

Slide 12 text

Runs in stack
Two invariants on the lengths of the three most recent runs:
A > B + C
B > C
[Diagram: run stack ... A B C]

Slide 13

Slide 13 text

Merging runs
If A <= B + C: merge the smaller of A and C with B.
Example: A = 30, B = 20, C = 10 becomes A = 30, BC = 30.
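The merge rule can be sketched on a list of run lengths, with the most recent run last. `collapse_step` and `collapse` are hypothetical names. The sketch tracks lengths only and performs no actual merging; it also assumes a second rule that merges B with C whenever B <= C.

```python
def collapse_step(runs):
    """Apply one merge to the top of the run stack, or return None if none is needed."""
    if len(runs) < 2:
        return None
    n = len(runs) - 2  # index of B; A is runs[n - 1], C is runs[n + 1]
    if n > 0 and runs[n - 1] <= runs[n] + runs[n + 1]:
        if runs[n - 1] < runs[n + 1]:
            n -= 1  # A is smaller than C: merge A with B instead
    elif runs[n] > runs[n + 1]:
        return None  # both conditions hold, nothing to merge
    return runs[:n] + [runs[n] + runs[n + 1]] + runs[n + 2:]


def collapse(runs):
    """Repeat collapse_step until no further merge is required."""
    while True:
        merged = collapse_step(runs)
        if merged is None:
            return runs
        runs = merged


print(collapse_step([30, 20, 10]))
print(collapse([30, 20, 10]))
```

The first call reproduces the single step in the example above; repeating the step keeps merging until the remaining run lengths are strictly decreasing.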

Slide 14

Slide 14 text

Galloping
Happens when one run keeps winning.
Compare A[0] to B[0], B[1], B[3], B[7]...
B[2**(k-1) - 1] < A[0] <= B[2**k - 1]
Uncertainty: 2**(k-1) - 1
[Diagram: run A = 10 12 16 21 24 27 32 36 38 41 ...; run B = 2 4 5 7 9 13 14 18 23 ...; A[0] = 10 is compared against elements of B at growing offsets.]
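Galloping is an exponential search followed by a binary search over the bracketed range. The sketch below is a simplified illustration: `gallop_left` is a hypothetical helper that always starts probing at B[0], and it uses the standard `bisect` module for the final step.

```python
import bisect


def gallop_left(key, b):
    """Leftmost index at which key could be inserted into sorted list b.

    Probes b[0], b[1], b[3], b[7], ... until an element >= key is found,
    then finishes with a binary search between the last two probes.
    """
    n = len(b)
    if n == 0 or key <= b[0]:
        return 0
    last = 0  # b[last] < key is known to hold
    ofs = 1
    while ofs < n and b[ofs] < key:
        last = ofs
        ofs = (ofs << 1) + 1  # 1, 3, 7, 15, ...
    ofs = min(ofs, n)
    # Now b[last] < key, and either ofs == n or key <= b[ofs].
    return bisect.bisect_left(b, key, last + 1, ofs)


run_a = [10, 12, 16, 21, 24]
run_b = [2, 4, 5, 7, 9, 13, 14, 18, 23]
k = gallop_left(run_a[0], run_b)
print(run_b[:k])  # elements of B that can be copied before A[0]
```

When one run keeps winning, this finds a whole block to copy in roughly log2 of the block length comparisons instead of one comparison per element.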

Slide 15

Slide 15 text

Pros/Cons of “timsort”
● Does well on pre-existing order.
● Does not do as well as the old “samplesort” on lists with many duplicates.
● Requires a temp array of up to N/2 elements (random data).

Slide 16

Slide 16 text

Dictionary Pearls

Slide 17

Slide 17 text

dict
letters = {}
letters['a'] = 'first'
letters['b'] = 'second'
letters['z'] = 'last'
letters
{'a': 'first', 'b': 'second', 'z': 'last'}

Slide 18

Slide 18 text

Hash table and slot index
Hash table with initial size of 8 slots.
Slot index = hash(key) & (table_size - 1)
Python hash function is very regular for ints and strings:
map(hash, (0, 1, 2, 3)) = [0, 1, 2, 3]
key 0 -> slot 0
key 1 -> slot 1...
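The slot computation is small enough to try directly. `slot_index` is a hypothetical helper, and the sketch relies on small non-negative ints hashing to themselves, as they do in CPython.

```python
def slot_index(key, table_size=8):
    """Initial slot for key in a table whose size is a power of 2."""
    return hash(key) & (table_size - 1)


print([slot_index(k) for k in (0, 1, 2, 3)])
print(slot_index(8), slot_index(9))  # wraps around to the first slots
```

Masking with `table_size - 1` keeps only the low-order bits of the hash, which is why the table size has to be a power of 2.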

Slide 19

Slide 19 text

Slot index
letters = {}
letters['a'] = 'first'
letters['b'] = 'second'
letters['z'] = 'last'
[Diagram: the keys 'a', 'b' and 'z' placed in their slots of the hash table.]

Slide 20

Slide 20 text

Collision resolution
Bad scenario:
Hash table of size 2**15
Adding keys: [1 * 2**16, 2 * 2**16, 3 * 2**16, …]
Key 2**16 -> Slot 2**16 & (2**15 - 1) = 0
Key 2 * 2**16 -> Slot 0.
Linear probing
If slot used: check slot + 1
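A minimal sketch of the bad scenario, assuming keys that are multiples of 2**16 so that their low 15 bits are all zero. `insert_linear` is a hypothetical helper that implements plain linear probing on a list of slots; it assumes the table never fills up.

```python
def insert_linear(table, key):
    """Insert key using linear probing; return the number of slots examined."""
    mask = len(table) - 1
    slot = hash(key) & mask
    probes = 1
    while table[slot] is not None and table[slot] != key:
        slot = (slot + 1) & mask  # slot used: check slot + 1
        probes += 1
    table[slot] = key
    return probes


table = [None] * 2**15
keys = [i << 16 for i in range(1, 6)]
print([k & (len(table) - 1) for k in keys])    # every key starts at the same slot
print([insert_linear(table, k) for k in keys])  # each insert probes one more slot
```

With n such keys, the n-th insertion examines n slots, so filling the table this way costs a quadratic number of probes.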

Slide 21

Slide 21 text

Linear probing
data = {}
data[2**15] = 'first'
data[2**15+1] = 'second'
data[2**15+2] = 'third'
[Diagram: the keys 2**15, 2**15+1 and 2**15+2 stored in consecutive slots.]

Slide 22

Slide 22 text

Collision resolution
slot = ((5*slot) + 1) % 2**i
Unlikely hash codes follow a 5*slot+1 recurrence.
0 -> 1 -> 6 -> 7 -> 4 -> 5 -> 2 -> 3 -> 0
slot = (5*slot) + 1 + perturb
perturb >>= 5
use slot % 2**i next.
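The probe order can be generated with a few lines of Python. This is a sketch of the recurrence as written on the slide, assuming a non-negative hash value; `probe_sequence` is a made-up name, and reducing the slot modulo 2**i at every step gives the same slots as reducing it only when it is used.

```python
def probe_sequence(hash_value, i, count):
    """First `count` slots probed in a table of 2**i slots."""
    mask = (1 << i) - 1
    perturb = hash_value
    slot = hash_value & mask
    slots = [slot]
    for _ in range(count - 1):
        slot = (5 * slot + 1 + perturb) & mask
        perturb >>= 5  # higher hash bits feed in, then fade out
        slots.append(slot)
    return slots


print(probe_sequence(0, 3, 9))        # perturb is 0: pure 5*slot+1 cycle
print(probe_sequence(1 << 16, 15, 4))
print(probe_sequence(2 << 16, 15, 4))
```

Once `perturb` reaches 0, the recurrence visits every slot, so a free slot is always found; before that, the higher hash bits separate keys whose low bits are identical.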

Slide 23

Slide 23 text

Resizing
fill = used slots + dummy slots
if fill >= 2/3 of the table size:
    if used slots > 50000:
        Double the table size.
    else:
        Quadruple the table size.
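The resize rule reads naturally as two small predicates. This sketch mirrors the slide, not any particular CPython release; `needs_resize` and `growth_factor` are hypothetical names, and integer arithmetic is used to avoid comparing floats.

```python
def needs_resize(used, dummy, table_size):
    """True once used plus dummy slots reach two thirds of the table."""
    fill = used + dummy
    return fill * 3 >= table_size * 2


def growth_factor(used):
    """Large tables double, smaller ones quadruple."""
    return 2 if used > 50000 else 4


print(needs_resize(5, 0, 8), needs_resize(6, 0, 8))
print(growth_factor(6), growth_factor(60000))
```

Counting dummy slots in `fill` matters because deleted entries still lengthen probe sequences even though they hold no data.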

Slide 24

Slide 24 text

String Pearls

Slide 25

Slide 25 text

str
s1 = 'a'
s2 = 'a'
[Diagram: a table indexed 0 to UCHAR_MAX; s1 and s2 both reference the entry for 'a' at index 97.]

Slide 26

Slide 26 text

str.find
Mix between Boyer-Moore and Horspool.
[Diagram: the text "a b c b a b b d c b ... b a c" with the pattern aligned against it at successive shifts.]

Slide 27

Slide 27 text

Bloom filter
for (mask = i = 0; i < p_len; i++)
    mask |= (1 << (p[i] & 0x1F));
Is chr in string?
mask & (1 << (chr & 0x1F))
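The same mask can be built in Python to see how it behaves. `build_mask` and `maybe_contains` are hypothetical names; the sketch works on `bytes`, so each element is already an integer.

```python
def build_mask(pattern):
    """Set one of 32 bits for every byte of the pattern."""
    mask = 0
    for ch in pattern:
        mask |= 1 << (ch & 0x1F)
    return mask


def maybe_contains(mask, ch):
    """False means ch is definitely absent; True means it may be present."""
    return bool(mask & (1 << (ch & 0x1F)))


mask = build_mask(b"abc")
print(maybe_contains(mask, ord("a")))  # present
print(maybe_contains(mask, ord("z")))  # absent
print(maybe_contains(mask, ord("A")))  # absent from the pattern, but shares a bit with "a"
```

Only the low 5 bits of each character are used, so false positives are possible but false negatives are not, which is what makes the mask safe for skipping ahead.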

Slide 28

Slide 28 text

str.find
Good cases: O(n/m).
Worst case: O(nm).
Simple implementation.
Python 2.7: str and unicode.
Python 3.x: bytes.
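A search in this style can be sketched in Python. This is an illustrative reconstruction, not the C source: `find_sketch` is a made-up name, it works on `bytes`, and it combines a Horspool-style skip on the last pattern character with a 32-bit mask test on the character just past the current window.

```python
def find_sketch(s, p):
    """Return the lowest index of p in s, or -1 if p does not occur."""
    n, m = len(s), len(p)
    if m == 0:
        return 0
    if m > n:
        return -1
    mlast = m - 1
    skip = mlast - 1  # shift used when the last character matched but the rest did not
    mask = 0
    for j in range(mlast):
        mask |= 1 << (p[j] & 0x1F)
        if p[j] == p[mlast]:
            skip = mlast - j - 1
    mask |= 1 << (p[mlast] & 0x1F)

    def next_char_absent(i):
        # True when the byte just past the window cannot occur in p.
        return i + m < n and not mask & (1 << (s[i + m] & 0x1F))

    i = 0
    while i <= n - m:
        if s[i + mlast] == p[mlast]:
            if s[i:i + mlast] == p[:mlast]:
                return i
            i += m if next_char_absent(i) else skip
        elif next_char_absent(i):
            i += m  # no window containing that byte can match
        i += 1
    return -1


print(find_sketch(b"hello world", b"world"))
```

In the good case almost every step jumps m + 1 positions, which gives the O(n/m) behaviour; the O(nm) worst case arises when the last character keeps matching and the skip is small.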

Slide 29

Slide 29 text

Int/Long Pearls

Slide 30

Slide 30 text

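int
a = 1000
b = 1001
del a
del b
[Diagram: a block of integer objects indexed 0 to 40; a references the object holding 1000 and b the object holding 1001.]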

Slide 31

Slide 31 text

Int
a = 1000
b = 1000
a is b
False
a = 10
b = 10
a is b
True

Slide 32

Slide 32 text

Sharing small ints
a = 5
b = 5
[Diagram: an array of preallocated integer objects from -5 to 256; a and b both reference the object for 5.]
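The sharing can be observed from Python, with some care. The sketch below builds each value at run time with `int()`, because literals that appear in the same compiled block of code may be merged by the compiler regardless of their size. `same_object` is a hypothetical helper, and the results are CPython implementation details, not language guarantees.

```python
import platform


def same_object(text):
    """Build the same integer twice at run time and compare identities."""
    a = int(text)
    b = int(text)
    return a is b


print(platform.python_implementation())
for text in ("-5", "5", "256", "1000"):
    print(text, same_object(text))
```

Code should compare integers with `==`; `is` only reveals whether two names happen to share one object.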

Slide 33

Slide 33 text

Thanks! Questions?