Slide 1

Slide 1 text

Sharing memory efficiently
Claudio Daniel Freire
klaussfreire@gmail.com
PyCon US 2018

Slide 2

Slide 2 text

Python (sometimes)

Slide 3

Slide 3 text

Sharing memory... what for?
● Cache too big to fit in RAM...
  – N times with N processors
    ● Multiprocessing: input data
    ● Tornado / mod_wsgi: slow-evolving caches
  – When only a small fraction of it is frequently accessed
    ● The “working set” fits in memory, but not the whole dataset
    ● When disk access for infrequently needed data is acceptable

Slide 4

Slide 4 text

Sharing memory... what for?
● When the serialization cost becomes prohibitive
  – Big and complex data structures: you can’t afford the CPU cycles spent (de)serializing lots of objects
  – Objects that serialize inefficiently:
    ● E.g.: SQLAlchemy

Slide 5

Slide 5 text

Sharing memory... what for?
● The transition to a “shared buffer”
  – Split the cache in two layers:
    ● A read-only layer of slow-evolving data
    ● A regular read-write layer with continuous (but infrequent) updates
  – Most data must reside in the static layer

Slide 6

Slide 6 text

Why not multiprocessing?
● Lock contention
● Poor support for complex structures:
  “Note: Although it is possible to store a pointer in shared memory remember that this will refer to a location in the address space of a specific process. However, the pointer is quite likely to be invalid in the context of a second process and trying to dereference the pointer from the second process may cause a crash.” (multiprocessing docs)

Slide 7

Slide 7 text

The shared buffer
● Getting the best of both worlds
  – Compact and efficient shared memory representation of static or slow-changing data
  – Dynamic and fast-updateable structure for the rest
[Diagram labels: User, Old stuff, New stuff, Buffer]

Slide 8

Slide 8 text

How?
[Diagram: processes P1 to P5 and a file, shared through mmap]

Slide 9

Slide 9 text

How?
As simple as:

    import mmap

    fileobj = open("buf", "r+")
    buf = mmap.mmap(fileobj.fileno(), 0, access=mmap.ACCESS_READ)
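A minimal sketch of the same idea end to end, assuming a throwaway file in a temporary directory (the path and contents are made up for illustration). The file is written once, then mapped read-only; every process that maps the same file shares the pages through the OS page cache.

```python
import mmap
import os
import tempfile

# Hypothetical file location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "buf")

# The buffer is built once, by a single writer.
with open(path, "wb") as f:
    f.write(b"hello shared world")

# Any number of processes can then map it read-only. Only the pages
# that are actually touched need to be resident in RAM.
with open(path, "rb") as fileobj:
    buf = mmap.mmap(fileobj.fileno(), 0, access=mmap.ACCESS_READ)

first_word = buf[:5]   # slicing reads straight from the mapping
buf.close()
```

Passing 0 as the length maps the whole file, which is why the file must not be empty when it is mapped.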

Slide 10

Slide 10 text

How do I get objects into this thing?
Slightly more complex
● Define a schema that is:
  – Easily manipulated without serialization
  – Efficient in space and access time
● Build the machinery that allows accessing it...
  – …as if it were an object
  – …without copying it to process-private memory

Slide 11

Slide 11 text

How do I get objects into this thing?
Slightly more complex
● Define a schema (struct) that is:
  – Easily manipulated without serialization
  – Efficient in space and access time
● Build the machinery (proxies) that allows accessing it...
  – …as if it were an object
  – …without copying it to process-private memory

Slide 12

Slide 12 text

Structs
In C:

    struct {
        int a;
        float b;
        bool c;
    };

In Python:

    import struct
    struct.pack("if?", 1, 2.0, True)
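A small sketch of reading and writing that struct in place inside a larger buffer, which is what the rest of the talk builds on. The buffer size and the offset 16 are arbitrary choices for the example.

```python
import struct

# "if?" = C int, C float, C _Bool, using native size and alignment.
record = struct.Struct("if?")

buf = bytearray(64)
record.pack_into(buf, 16, 1, 2.0, True)   # write the struct at offset 16

a, b, c = record.unpack_from(buf, 16)     # read it back, no copy of buf
```

`pack_into` and `unpack_from` take an offset, so a struct can live anywhere in a shared buffer without slicing it first.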

Slide 13

Slide 13 text

Structs
● Why on earth get C into this?
  – Native machine code can access struct elements directly
  – Widely portable (almost every language can parse C structs in some way or another)
  – Cython

Slide 14

Slide 14 text

Proxies
● Classes that know where a struct lies within a buffer
● They convert attribute access to struct access:

    x = Proxy(buf, offset=10)
    x.a  # reads the int
    x.b  # reads the float
    x.c  # reads the bool

Slide 15

Slide 15 text

Proxies
● Don’t require serialization
  – It’s enough to know where the struct is (i.e., have a pointer)
● They can easily be “repointed”
  – Change the offset to switch the proxy to another object
  – Avoids Python object creation overhead
● Relatively transparent
  – They look quite like the original object
  – They can even quack like the original as well
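A minimal sketch of such a proxy, assuming the (int, float, bool) struct from the earlier slide. The `Proxy` class here is written for illustration only and is not the library's implementation; the offsets 0 and 16 are arbitrary.

```python
import struct

class Proxy:
    """Reads an (int a, float b, bool c) struct in place from a buffer."""
    _layout = struct.Struct("if?")

    def __init__(self, buf, offset=0):
        self.buf = buf
        self.offset = offset

    def _field(self, index):
        # Decode on every access; nothing is cached in the proxy.
        return self._layout.unpack_from(self.buf, self.offset)[index]

    a = property(lambda self: self._field(0))
    b = property(lambda self: self._field(1))
    c = property(lambda self: self._field(2))

buf = bytearray(32)
Proxy._layout.pack_into(buf, 0, 1, 2.0, True)
Proxy._layout.pack_into(buf, 16, 7, 0.5, False)

x = Proxy(buf, offset=0)
first = (x.a, x.b, x.c)
x.offset = 16            # repoint: same proxy, another object
second = (x.a, x.b, x.c)
```

Repointing changes one integer instead of allocating a new Python object, which matters when iterating over millions of records.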

Slide 16

Slide 16 text

Proxies – adding complexity

    struct ComplexProxy {
        int value;
        int child_left_offset;
        int child_right_offset;
    };

    class ComplexObj:
        def __init__(self, l=None, r=None):
            self.value = 3
            self.left = l
            self.right = r

Slide 17

Slide 17 text

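Proxies – adding complexity

    from struct import unpack_from

    class IntProperty:
        def __init__(self, offset):
            self.offset = offset  # byte offset of the field within the struct

        def __get__(self, obj, kls):
            # unpack_from returns a tuple; take its single element
            return unpack_from("i", obj.buf, obj.pos + self.offset)[0]

    class ProxyProperty:
        def __init__(self, offset):
            self.offset = offset

        def __get__(self, obj, kls):
            # the field holds the offset of the child struct in the buffer
            voffset = unpack_from("i", obj.buf, obj.pos + self.offset)[0]
            return type(obj)(obj.buf, voffset)

    class ComplexProxy:
        # field names follow struct ComplexProxy / ComplexObj
        value = IntProperty(offset=0)
        left = ProxyProperty(offset=4)
        right = ProxyProperty(offset=8)

        def __init__(self, buf, pos):
            self.buf = buf
            self.pos = pos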
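A self-contained sketch of the descriptor approach applied to a three-node tree. `IntProperty` and `ProxyProperty` follow the slide; `NodeProxy`, `pack_node` and the `NULL = -1` sentinel for a missing child are assumptions made for the example.

```python
import struct

NODE = struct.Struct("=iii")   # value, left offset, right offset
NULL = -1                      # assumed sentinel for "no child"

class IntProperty:
    def __init__(self, offset):
        self.offset = offset

    def __get__(self, obj, kls=None):
        if obj is None:
            return self
        return struct.unpack_from("=i", obj.buf, obj.pos + self.offset)[0]

class ProxyProperty:
    def __init__(self, offset):
        self.offset = offset

    def __get__(self, obj, kls=None):
        if obj is None:
            return self
        target = struct.unpack_from("=i", obj.buf, obj.pos + self.offset)[0]
        # Follow the stored offset instead of a pointer.
        return None if target == NULL else type(obj)(obj.buf, target)

class NodeProxy:
    value = IntProperty(offset=0)
    left = ProxyProperty(offset=4)
    right = ProxyProperty(offset=8)

    def __init__(self, buf, pos):
        self.buf = buf
        self.pos = pos

def pack_node(buf, value, left=NULL, right=NULL):
    """Append one node to buf and return the offset it was written at."""
    pos = len(buf)
    buf.extend(NODE.pack(value, left, right))
    return pos

buf = bytearray()
leaf_l = pack_node(buf, 1)
leaf_r = pack_node(buf, 2)
root = NodeProxy(buf, pack_node(buf, 3, leaf_l, leaf_r))
```

Children are packed before their parent here so that their offsets are already known; offsets stay valid in any process, unlike pointers.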

Slide 18

Slide 18 text

Proxies – cyclic references – OOPS!
● It gets tricky when you add cyclic references
  – They need to be recognized when building the buffer
  – They require care, as always
● A few options available:
  – Forbid them
  – Allow them

Slide 19

Slide 19 text

Proxies – cyclic references – OOPS!
Identity maps
● id(object) → offset
● When an object is packed, update the identity map
  – Check it also to detect already-packed objects
● Compresses the file
  – Unifies repeated references to the same object
● Breaks cycles

Slide 20

Slide 20 text

Proxies – cyclic references – OOPS!
Identity maps
● Tricky points
  – If the buffer is built by iterating a generator, you will probably get different objects with the same id()
    ● The identity map has to be synchronized with the lifetime of in-memory objects at all times. If an object is destroyed, its entry in the identity map must be removed as well.
  – The identity map can get quite big
    ● In particular when packing millions of objects into large buffers
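A sketch of packing with an identity map, under simplifying assumptions: nodes are (value, left, right) triples, `NULL = -1` marks a missing child, and a `keepalive` list holds a reference to every packed object so that no id() can be reused while the map is alive. All names are illustrative.

```python
import struct

NODE = struct.Struct("=iii")   # value, left offset, right offset
NULL = -1                      # assumed sentinel for "no child"

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def pack(node, buf, idmap, keepalive):
    """Pack node and everything it references; return its offset."""
    if node is None:
        return NULL
    key = id(node)
    if key in idmap:
        return idmap[key]          # already packed (or being packed)
    pos = len(buf)
    idmap[key] = pos               # register before recursing: breaks cycles
    keepalive.append(node)         # keeps id(node) unique while we pack
    buf.extend(bytes(NODE.size))   # reserve the slot, fill it in below
    left = pack(node.left, buf, idmap, keepalive)
    right = pack(node.right, buf, idmap, keepalive)
    NODE.pack_into(buf, pos, node.value, left, right)
    return pos

shared = Node(1)
root = Node(3, left=shared, right=shared)   # repeated reference
shared.left = root                          # cycle back to the root

buf, idmap, keepalive = bytearray(), {}, []
root_pos = pack(root, buf, idmap, keepalive)
```

The shared child is written once and referenced twice, and the cycle resolves to the root's offset instead of recursing forever.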

Slide 21

Slide 21 text

Wait a minute

Slide 22

Slide 22 text

You said “no serialization”

Slide 23

Slide 23 text

Manipulation without serialization
● Building a buffer is expensive
  – Kinda like serializing, sure

Slide 24

Slide 24 text

Manipulation without serialization
● Building a buffer is expensive
  – Kinda like serializing, sure
● But... using it isn’t
  – Open
  – Read
  – Search
  – Even write (up to a point)

Slide 25

Slide 25 text

Manipulation without serialization
Structure of an object

Slide 26

Slide 26 text

Manipulation without serialization Structure of an object Attribute bitmap present:11010000 nulls:11010000

Slide 27

Slide 27 text

Manipulation without serialization
Structure of an object
  Attribute bitmap   present:11010000 nulls:11010000
  a  : 4 bytes : int
  b  : 4 bytes : float
  *c : 8 bytes : uint

Slide 28

Slide 28 text

Manipulation without serialization
Structure of an object
  Attribute bitmap   present:11010000 nulls:11010000
  a  : 4 bytes  : int
  b  : 4 bytes  : float
  *c : 8 bytes  : uint
  c  : 12 bytes : str
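A simplified sketch of an object layout led by an attribute bitmap. The exact on-disk format is not given in the slides, so everything here is an assumption: a one-byte "present" bitmap, a one-byte "nulls" bitmap, then only the attributes that are present and not null, in schema order. `SLOTS`, `pack_obj` and `get_attr` are hypothetical names.

```python
import struct

# Hypothetical schema: attribute name and struct format code, in slot order.
SLOTS = [("a", "i"), ("b", "f"), ("d", "q")]

def pack_obj(values):
    present = nulls = 0
    body = b""
    for bit, (name, code) in enumerate(SLOTS):
        if name not in values:
            continue                      # absent: no bit set, no bytes stored
        present |= 1 << bit
        if values[name] is None:
            nulls |= 1 << bit             # null: bit set, no bytes stored
        else:
            body += struct.pack("=" + code, values[name])
    return struct.pack("=BB", present, nulls) + body

def get_attr(buf, pos, wanted):
    present, nulls = struct.unpack_from("=BB", buf, pos)
    offset = pos + 2
    for bit, (name, code) in enumerate(SLOTS):
        mask = 1 << bit
        stored = bool(present & mask) and not (nulls & mask)
        if name == wanted:
            if not (present & mask):
                raise AttributeError(wanted)
            if nulls & mask:
                return None
            return struct.unpack_from("=" + code, buf, offset)[0]
        if stored:
            offset += struct.calcsize("=" + code)   # skip earlier fields
    raise AttributeError(wanted)

obj1 = pack_obj({"a": 5, "d": 1 << 40})
obj2 = pack_obj({"a": None, "b": 1.5})
```

Absent and null attributes cost no space beyond their bits, and a reader can locate any field from the bitmap alone without decoding the others.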

Slide 29

Slide 29 text

Manipulation without serialization
Nesting objects
  Attribute bitmap   present:11010000 nulls:11010000
  a  : 4 bytes : int
  b  : 4 bytes : float
  *c : 8 bytes : uint
  c  : N bytes : object
    Attribute bitmap   present:11010000 nulls:11010000
    a  : 4 bytes : int
    b  : 4 bytes : float
    *c : 8 bytes : uint
    c  : N bytes : object

Slide 30

Slide 30 text

Manipulation without serialization
Dynamic typing
  Attribute bitmap   present:11010000 nulls:11010000
  a  : 4 bytes : int
  b  : 4 bytes : float
  *c : 8 bytes : uint
  c  : N bytes : any
    typecode a : 4 bytes : int
    value : 8 bytes : double

Slide 31

Slide 31 text

Manipulation without serialization
Linear sequences
  Index: *i1 *i2 *i3 *i4
  Data:  v1 : 4b, v2 : 4b, v3 : 10b, v4 : 40b

Slide 32

Slide 32 text

Manipulation without serialization
Writing
  Index: *i1 *i2 *i3 *i4
  Data:  v1 : 4b, v2 : 4b, v3 : 10b, v4 : 40b

Slide 33

Slide 33 text

Manipulation without serialization
Writing
  Index: *i1 *i2 *i3 *i4
  Data:  v1 : 4b, v2 : 4b, v3 : 10b, v4 : 40b
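A sketch of the index-plus-data layout for a sequence of variable-sized items. The concrete encoding is an assumption for illustration: a 4-byte count, one 4-byte offset per item (the index), then each item as a 4-byte length followed by its bytes. `pack_sequence` and `get_item` are hypothetical names.

```python
import struct

WORD = struct.calcsize("=I")   # 4 bytes

def pack_sequence(items):
    """Layout: count, one offset per item (the index), then the data."""
    header = WORD * (1 + len(items))
    index, data = [], b""
    for item in items:
        index.append(header + len(data))          # where this item starts
        data += struct.pack("=I", len(item)) + item
    return struct.pack("=I%dI" % len(items), len(items), *index) + data

def get_item(buf, i):
    (count,) = struct.unpack_from("=I", buf, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    (pos,) = struct.unpack_from("=I", buf, WORD * (1 + i))
    (size,) = struct.unpack_from("=I", buf, pos)
    return bytes(buf[pos + WORD : pos + WORD + size])

seq = pack_sequence([b"ab", b"", b"hello world"])
third = get_item(seq, 2)
```

Random access costs two small reads regardless of item sizes. Overwriting an item with one of the same size can be done in place; a larger one would have to be appended and its index entry repointed, which is one reading of "even write (up to a point)".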

Slide 34

Slide 34 text

Associative maps
● Compact hash table:
  – Sorted array of tuples
  – Binary search optimized for uniform distributions
    ● One prediction given the known key distribution (hash)
    ● One iteration of exponential search to adjust the prediction
    ● Finalize with a regular binary search
● Approximate hash table:
  – Throw away the key, accept hash collisions as a tolerable error
  – Particularly efficient with long string keys

Slide 35

Slide 35 text

Associative maps
  Index:  *k1 *k2 *k3 *k4 | h1 h2 h3 h4 | *v1 *v2 *v3 *v4
  Keys:   pedro
  Values: 2324 4141

Slide 36

Slide 36 text

Associative maps
  Index:  alice bob cloe pedro | 1 7 7 15 | *v1 *v2 *v3 *v4
  Keys:   pedro
  Values: 2324 4141
  m['bob'] == v2
  m['cloe'] == v3
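A sketch of the exact (key-preserving) map as a sorted array searched by hash, using plain Python lists instead of a mapped buffer to keep it short. `CompactMap` and `stable_hash` are illustrative names, and CRC32 is only a stand-in for whatever hash the real structure uses.

```python
import bisect
import zlib

def stable_hash(key):
    # Python's built-in hash() of str is randomized per process, so a
    # structure shared between processes needs a hash they all agree on.
    return zlib.crc32(key.encode("utf-8"))

class CompactMap:
    def __init__(self, mapping):
        entries = sorted((stable_hash(k), k, v) for k, v in mapping.items())
        self.hashes = [h for h, _, _ in entries]
        self.keys = [k for _, k, _ in entries]
        self.values = [v for _, _, v in entries]

    def __getitem__(self, key):
        h = stable_hash(key)
        i = bisect.bisect_left(self.hashes, h)
        # Several keys may share a hash: compare keys to pick the right one.
        while i < len(self.hashes) and self.hashes[i] == h:
            if self.keys[i] == key:
                return self.values[i]
            i += 1
        raise KeyError(key)

m = CompactMap({"alice": 1, "bob": 2, "cloe": 3, "pedro": 4})
```

Storing the keys is what lets colliding entries such as 'bob' and 'cloe' on the slide resolve to different values.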

Slide 37

Slide 37 text

Approximate associative maps
  Index:  h1 h2 h3 h4 | *v1 *v2 *v3 *v4
  Values: 2324 4141

Slide 38

Slide 38 text

Approximate associative maps
  Index:  1 7 7 15 | *v1 *v2 *v3 *v4
  Values: 2324 4141
  m['bob'] == m['cloe'] == [v2,v3]
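A sketch of the approximate variant: keys are discarded, so a lookup returns every value stored under the same hash. The toy hash below simply reproduces the slide's numbers (1, 7, 7, 15) to force a collision; `ApproximateMap` is an illustrative name.

```python
import bisect

class ApproximateMap:
    def __init__(self, mapping, hash_fn):
        self.hash_fn = hash_fn
        # Sort by hash only; the keys themselves are thrown away.
        entries = sorted(
            ((hash_fn(k), v) for k, v in mapping.items()),
            key=lambda entry: entry[0],
        )
        self.hashes = [h for h, _ in entries]
        self.values = [v for _, v in entries]

    def get(self, key):
        h = self.hash_fn(key)
        lo = bisect.bisect_left(self.hashes, h)
        hi = bisect.bisect_right(self.hashes, h)
        return self.values[lo:hi]      # all candidates sharing this hash

# Toy hash taken from the slide, so that 'bob' and 'cloe' collide.
toy_hash = {"alice": 1, "bob": 7, "cloe": 7, "pedro": 15}.__getitem__

m = ApproximateMap(
    {"alice": "v1", "bob": "v2", "cloe": "v3", "pedro": "v4"}, toy_hash
)
```

With a wide, well-distributed hash the collision lists are almost always of length one, and the index has a fixed entry size no matter how long the original keys were.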

Slide 39

Slide 39 text

Approximate associative maps Optimized binary search ... 3 27 87 56 23

Slide 40

Slide 40 text

Approximate associative maps Optimized binary search ... 3 27 87 56 Initial prediction

Slide 41

Slide 41 text

Approximate associative maps Optimized binary search ... 3 27 87 56 Initial prediction 29 45

Slide 42

Slide 42 text

Approximate associative maps Optimized binary search ... 3 27 87 56 Initial prediction 29 45

Slide 43

Slide 43 text

Approximate associative maps Optimized binary search ... 3 27 87 56 Initial prediction 29 45

Slide 44

Slide 44 text

Approximate associative maps Optimized binary search ... 3 27 87 56 Initial prediction 29 45 Upper bound

Slide 45

Slide 45 text

Approximate associative maps Optimized binary search ... 3 27 87 56 Initial prediction Upper bound Found

Slide 46

Slide 46 text

Speed
● Performance:
  – Only "hot" data set (most used) needs to fit in RAM
  – Optimized search in 2 log(ɛ)
    ● ɛ being the error between prediction and actual position
    ● ɛ < n
● Approximate hash table:
  – Fixed size even with big keys (long strings)
  – Even more efficient access (no need to verify and store keys)
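A sketch of the three-step search described in the talk: predict the position from the hash value, bracket the target with an exponential search around that guess, then finish with a regular binary search. It assumes hashes uniformly distributed in [0, hash_max); `predicted_search` is an illustrative name, not the library's function.

```python
import bisect

def predicted_search(hashes, h, hash_max=2**32):
    """Return the leftmost index i with hashes[i] >= h (like bisect_left)."""
    n = len(hashes)
    if n == 0:
        return 0
    # 1. Prediction: with uniform hashes, h should sit near h / hash_max * n.
    guess = min(n - 1, h * n // hash_max)
    step = 1
    if hashes[guess] < h:
        # 2a. Target is to the right: grow the bracket with doubling steps.
        lo = hi = guess + 1
        while hi < n and hashes[hi] < h:
            lo = hi + 1
            hi = min(n, hi + step)
            step *= 2
    else:
        # 2b. Target is at or to the left of the guess.
        lo = hi = guess
        while lo > 0 and hashes[lo - 1] >= h:
            hi = lo - 1
            lo = max(0, hi - step)
            step *= 2
    # 3. Regular binary search inside the bracket [lo, hi].
    return bisect.bisect_left(hashes, h, lo, hi)

position = predicted_search([1, 7, 7, 15], 7, hash_max=16)
```

If the guess is off by ɛ positions, the doubling phase takes about log(ɛ) probes and leaves a bracket of size proportional to ɛ, so the final binary search takes about log(ɛ) more, which matches the 2 log(ɛ) figure on the slide.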

Slide 47

Slide 47 text

Speed
● Performance:
  – Good disk access pattern even if it won’t fit in RAM:
    ● Exponential search is mostly sequential access
    ● Good locality with good predictions
      – O(1) seeks on average
    ● Possibility to preload the index to RAM
      – Much more likely to fit than values or keys

Slide 48

Slide 48 text

Speed
● Cython magic:
  – Instead of using struct everywhere
  – Avoids building Python objects for temporary operations
● Proxy reuse:
  – Instead of building new proxies, repoint a reusable one
  – Type transmutation to change the shape of a proxy
    ● proxy.__class__ = new_cls
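A pure-Python sketch of proxy reuse with type transmutation. The two proxy classes and the buffer layout (an int followed by a double) are made up for the example; the point is only that one proxy object can be repointed and reshaped instead of being recreated.

```python
import struct

class BaseProxy:
    def __init__(self, buf, pos=0):
        self.buf = buf
        self.pos = pos

class IntProxy(BaseProxy):
    @property
    def value(self):
        return struct.unpack_from("=i", self.buf, self.pos)[0]

class DoubleProxy(BaseProxy):
    @property
    def value(self):
        return struct.unpack_from("=d", self.buf, self.pos)[0]

# Hypothetical buffer: a 4-byte int at offset 0, an 8-byte double at offset 4.
buf = struct.pack("=id", 42, 1.5)

p = IntProxy(buf)
as_int = p.value

p.pos = 4                    # repoint the same proxy...
p.__class__ = DoubleProxy    # ...and change its shape in place
as_double = p.value
```

Assigning `__class__` only works between classes with compatible instance layouts, which is why both proxies here share a base class and add no attributes of their own.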

Slide 49

Slide 49 text

Related fascinating stuff http://poshmodule.sourceforge.net/posh/posh.pdf

Slide 50

Slide 50 text

Don’t write it, pip install it: sharedbuffers
Questions?