Slide 1

Slide 1 text

Kevin Ballard   SFpython.org   2014-03-12

Slide 2

Slide 2 text

Introductions
kevin@tellapart.com

Slide 3

Slide 3 text

Taba
•  Distributed event aggregation service

    import taba
    ...
    taba.RecordValue('winning_bid_price', wincpm)
    ...

    $ taba-cli aggregate winning_bid_price
    {"name": "winning_bid_price", "10m": {"count": 14709, "total": 5836.4}, "percentiles": [0.07 0.16 0.32 0.84 1.33 8.03]}

Slide 4

Slide 4 text

Taba
+10,000,000 events/sec
+50,000 metrics
+1,000 clients
+100 processors

Slide 5

Slide 5 text

Lesson #1: GET THE DATA MODEL RIGHT

Slide 6

Slide 6 text

Data Model

Slide 7

Slide 7 text

Data Model
Event:      ('bid_cpm', 'Counter', time(), 0.233)
State:
Aggregate:  {"10m": 43.9, "1h": 592.22}
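The Event, State, Aggregate split can be sketched roughly as follows. This is an illustrative toy, not Taba's code: the `Event` field names, the window lengths, and the choice to keep State as a plain list of (timestamp, value) pairs are all assumptions, since the slide does not show what a real State looks like.

```python
from collections import namedtuple

# Mirrors the tuple on the slide; the field names are assumed.
Event = namedtuple("Event", ["name", "type", "timestamp", "value"])

# Assumed window lengths, in seconds, for the "10m" and "1h" keys.
WINDOWS = {"10m": 600, "1h": 3600}


def fold(state, event):
    """Fold one Event into the State (here: a list of (timestamp, value))."""
    state.append((event.timestamp, event.value))
    return state


def aggregate(state, now):
    """Project the State into a client-facing Aggregate, one sum per window."""
    return {
        label: sum(value for ts, value in state if now - ts <= span)
        for label, span in WINDOWS.items()
    }


state = []
for ts, value in [(0, 4.0), (3000, 2.0), (3500, 1.5)]:
    fold(state, Event("bid_cpm", "Counter", ts, value))

result = aggregate(state, now=3600)
print(result)  # {'10m': 3.5, '1h': 7.5}
```

The point of the split is that events are only ever folded into State, and Aggregates are derived from State on demand.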

Slide 8

Slide 8 text

Data Model

Slide 9

Slide 9 text

Data Model

Slide 10

Slide 10 text

Data Model

Slide 11

Slide 11 text

Lesson #2: STATE IS HARD

Slide 12

Slide 12 text

Centralizing State

Slide 13

Slide 13 text

Lesson #3: GENERATORS + GREENLETS = AWESOME

Slide 14

Slide 14 text

Asynchronous Iterator
•  JIT processing
•  Automatically switches through I/O
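The idea of an iterator that yields items just in time while I/O proceeds in the background might look like the sketch below. Note the substitution: the talk pairs generators with greenlets, whereas this sketch uses the standard library's `asyncio` async generators as a stand-in so that it runs without extra dependencies. `fetch_batch` and its delays are hypothetical placeholders for real network reads.

```python
import asyncio


async def fetch_batch(batch_id, delay):
    """Hypothetical stand-in for a network read that returns a batch of events."""
    await asyncio.sleep(delay)
    return [(batch_id, i) for i in range(3)]


async def iter_events(delays):
    """Yield events one at a time, as soon as whichever batch finishes first."""
    tasks = [
        asyncio.ensure_future(fetch_batch(batch_id, delay))
        for batch_id, delay in enumerate(delays)
    ]
    # While one batch is being consumed, the others keep waiting on I/O.
    for finished in asyncio.as_completed(tasks):
        for event in await finished:
            yield event


async def main():
    seen = []
    async for event in iter_events([0.10, 0.0, 0.05]):
        seen.append(event)  # process each event as it becomes available
    return seen


result = asyncio.run(main())
print(len(result), "events processed")
```

The consumer only sees a plain loop; the switching between pending reads is hidden inside the iterator.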

Slide 15

Slide 15 text

Lesson #4: CPYTHON SUFFERS FROM MEMORY FRAGMENTATION

Slide 16

Slide 16 text

Fragmentation
•  Fragmentation is when a process’s heap is inefficiently used.
•  The GC may report a low memory footprint, but the OS reports a much larger RSS.
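One way to observe the gap between the interpreter's view and the OS's view is to compare the two numbers before and after freeing most of a large set of small objects. This is a sketch under assumptions: `tracemalloc` stands in for "what the interpreter thinks is live", `rss_bytes` reads `/proc/self/statm` and therefore only works on Linux (it returns `None` elsewhere), and the object count and sizes are arbitrary.

```python
import os
import tracemalloc


def rss_bytes():
    """Resident set size as the OS reports it (Linux only; None elsewhere)."""
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except OSError:
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def measure(label):
    """Record the interpreter-side and OS-side memory figures together."""
    traced, _peak = tracemalloc.get_traced_memory()
    return {"label": label, "traced": traced, "rss": rss_bytes()}


tracemalloc.start()
blobs = [bytes(64) for _ in range(200_000)]
before = measure("all objects live")

# Keep every 50th object so the survivors are scattered across the heap.
survivors = blobs[::50]
del blobs
after = measure("98% freed, survivors scattered")
tracemalloc.stop()

for m in (before, after):
    print(m)
```

The traced figure drops sharply after the `del`; how much of that the RSS gives back depends on the allocator and platform, which is the effect the slide describes.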

Slide 17

Slide 17 text

Fragmentation

Slide 18

Slide 18 text

Fragmentation

Slide 19

Slide 19 text

Fragmentation

Slide 20

Slide 20 text

Fragmentation

Slide 21

Slide 21 text

Fragmentation

Slide 22

Slide 22 text

Hybrid Memory Management
•  Use Cython to allocate page-sized blocks of pointers into the incoming chunk
•  Hand off the whole thing to the CPython memory manager
•  The whole thing gets deallocated at once
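A loose pure-Python analogy of the approach above, not the Cython implementation from the talk: keep the incoming chunk as one buffer, index it with compact arrays of offsets instead of creating one Python object per record, and let everything be released together when the index goes away. The class name `ChunkIndex` and the newline-delimited record format are assumptions made for the example.

```python
from array import array


class ChunkIndex:
    """Index records inside one chunk without copying them out."""

    def __init__(self, chunk, sep=b"\n"):
        self._view = memoryview(chunk)  # keeps the chunk alive; no copy
        self._starts = array("Q")       # compact offset tables, not Python
        self._ends = array("Q")         # objects per record
        start = 0
        while start < len(chunk):
            end = chunk.find(sep, start)
            if end == -1:
                end = len(chunk)
            self._starts.append(start)
            self._ends.append(end)
            start = end + len(sep)

    def __len__(self):
        return len(self._starts)

    def __getitem__(self, i):
        # Zero-copy slice into the original chunk.
        return self._view[self._starts[i]:self._ends[i]]


index = ChunkIndex(b"bid_cpm 0.233\nwin_cpm 0.84\nclicks 1\n")
records = [bytes(index[i]) for i in range(len(index))]
print(records)

# Dropping the index releases the chunk and both offset tables together.
del index
```

The long-lived allocations are the chunk and two arrays, rather than thousands of small objects with independent lifetimes.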

Slide 23

Slide 23 text

Hybrid Memory Management

Slide 24

Slide 24 text

Hybrid Memory Management

Slide 25

Slide 25 text

Hybrid Memory Management

Slide 26

Slide 26 text

Ratcheting
•  Ratcheting is a pathological case of Fragmentation, caused by the fact that the heap must be contiguous*.
•  It’s a limitation of CPython that it cannot compact memory (mostly due to extensions).
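A toy model can make the ratchet visible. The sketch below is not how CPython or the C allocator is implemented; `ToyHeap` is a hypothetical contiguous heap that can only shrink by trimming free slots from its top, which is the property the slide points at.

```python
class ToyHeap:
    """Contiguous heap of slots that can only be trimmed from the top."""

    def __init__(self):
        self.slots = []  # True = live object, False = free slot

    def alloc(self):
        # Reuse the lowest free slot, otherwise grow the heap at the top.
        for i, live in enumerate(self.slots):
            if not live:
                self.slots[i] = True
                return i
        self.slots.append(True)
        return len(self.slots) - 1

    def free(self, i):
        self.slots[i] = False
        # Only trailing free slots can be handed back.
        while self.slots and not self.slots[-1]:
            self.slots.pop()

    @property
    def size(self):
        return len(self.slots)


# Case 1: a persistent object is created after a burst of transient ones.
late = ToyHeap()
transient = [late.alloc() for _ in range(1000)]
persistent = late.alloc()  # lands at the top of the heap
for i in transient:
    late.free(i)
print(late.size)  # 1001: one live object pins the whole heap

# Case 2: the persistent object is created first, before the burst.
early = ToyHeap()
persistent = early.alloc()
transient = [early.alloc() for _ in range(1000)]
for i in transient:
    early.free(i)
print(early.size)  # 1: the heap shrinks back
```

Each burst that ends with a long-lived object near the top raises the floor the heap can shrink to, hence "ratcheting".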

Slide 27

Slide 27 text

Ratcheting

Slide 28

Slide 28 text

Ratcheting

Slide 29

Slide 29 text

Ratcheting

Slide 30

Slide 30 text

Ratcheting
•  Avoid persistent objects
•  Sockets are common offenders
•  Anything that has to be persistent should be created at application startup, before processing data
•  Avoid letting the heap grow in the first place
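One way to follow the "create it at startup" advice for connections is a pool that opens everything up front and never creates anything later. This is a minimal sketch: `EagerPool` and `fake_connect` are hypothetical names, and a real pool would also need to handle reconnects and errors.

```python
import queue


class EagerPool:
    """Create every long-lived object at startup; only reuse afterwards."""

    def __init__(self, factory, size):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(factory())  # all allocation happens here

    def acquire(self, timeout=None):
        # Blocks until a connection is free instead of creating a new one.
        return self._free.get(timeout=timeout)

    def release(self, conn):
        self._free.put(conn)


created = []


def fake_connect():
    """Hypothetical stand-in for opening a socket."""
    conn = object()
    created.append(conn)
    return conn


# At application startup, before any data is processed:
pool = EagerPool(fake_connect, size=4)

# Later, in the processing loop:
for _ in range(100):
    conn = pool.acquire()
    pool.release(conn)

print(len(created))  # 4
```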

Slide 31

Slide 31 text

fin.
github.com/tellapart/taba
[email protected]  |  @misterkgb
We’re Hiring!  tellapart.com/careers