Pushing Python: Lessons Learned Building a High Throughput Service in Python by Kevin Ballard

Taba is a distributed metrics aggregator, similar in concept to statsd. Built with Python using Redis, gevent, and Cython, it currently handles over 6M events/sec with strong consistency guarantees. This talk will present an overview of its design, and discuss the challenges and solutions encountered in the process of building a high throughput, low latency distributed service.

PyCon 2014

April 12, 2014

Transcript

  1. Taba
     • Distributed event aggregation service
     import taba ... taba.RecordValue('winning_bid_price', wincpm) ...
     $ taba-cli aggregate winning_bid_price
     {"name": "winning_bid_price", "10m": {"count": 14709, "total": 5836.4}, "percentiles": [0.07 0.16 0.32 0.84 1.33 8.03]}
  2. Data Model
     Event: ('bid_cpm', 'Counter', time(), 0.233)
     State:
     Aggregate: {"10m": 43.9, "1h": 592.22}
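One way to read the Event / State / Aggregate split is as a fold: events update a per-name state, and aggregates are projected from that state on demand. The sketch below only illustrates that reading. `CounterState`, `fold`, and `aggregate` are hypothetical names, not Taba's API, and the window lengths and in-order event arrival are assumptions.

```python
from collections import deque

# Window labels follow the slide; their lengths in seconds are assumed.
WINDOWS = {"10m": 600, "1h": 3600}


class CounterState:
    """Hypothetical state for a 'Counter' type: recent (time, value) pairs."""

    def __init__(self):
        self.samples = deque()

    def fold(self, event):
        # Event layout follows the slide: (name, type, timestamp, value).
        # Assumption: events arrive in timestamp order.
        _name, _type, ts, value = event
        self.samples.append((ts, value))

    def aggregate(self, now):
        # Drop samples older than the widest window, then sum per window.
        horizon = now - max(WINDOWS.values())
        while self.samples and self.samples[0][0] < horizon:
            self.samples.popleft()
        return {
            label: sum(v for ts, v in self.samples if ts >= now - span)
            for label, span in WINDOWS.items()
        }


state = CounterState()
for ts, value in [(0, 1.0), (3100, 2.0), (3500, 0.5)]:
    state.fold(("bid_cpm", "Counter", ts, value))

result = state.aggregate(now=3600)
print(result)  # {'10m': 2.5, '1h': 3.5}
```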
  3. Fragmentation
     • Fragmentation is when a process’s heap is inefficiently used.
     • The GC may report a low memory footprint, but the OS reports a much larger RSS.
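The gap between what the interpreter considers live and what the OS reports can be illustrated with a toy model. This is not CPython's allocator, only a sketch assuming fixed-size pages that can be returned to the OS only when every slot in them is free. `PAGE_SLOTS` and `resident_pages` are hypothetical names.

```python
PAGE_SLOTS = 64  # objects per page in this toy model (assumed)


def resident_pages(live_slots):
    """Pages that must stay mapped: any page holding at least one live slot."""
    return len({slot // PAGE_SLOTS for slot in live_slots})


# Allocate 64,000 small objects into consecutive slots: 1,000 full pages.
live = set(range(64_000))
pages_full = resident_pages(live)

# Free everything except one object per page.
live = {slot for slot in live if slot % PAGE_SLOTS == 0}
pages_after = resident_pages(live)

# Fraction of resident memory that actually holds live objects.
utilization = len(live) / (pages_after * PAGE_SLOTS)
print(pages_full, pages_after, len(live), utilization)
# 1000 1000 1000 0.015625
```

In this model 98% of the objects were freed, yet no page could be released: the live count drops sharply while the resident footprint stays the same.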
  4. Hybrid Memory Management
     • Use Cython to allocate page-sized blocks of pointers into the incoming chunk
     • Hand off the whole thing to the CPython memory manager
     • Whole thing gets deallocated at once
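The slide describes a Cython technique whose details are not in the transcript. The following is only a pure-Python analogue of the same idea (one large allocation per incoming chunk, released as a unit), using the standard `array` module rather than Cython. `load_chunk` is a hypothetical helper.

```python
from array import array


def load_chunk(values):
    """Copy one incoming chunk into a single contiguous buffer.

    Storing raw doubles in one array avoids creating a separate
    small Python float object for every value that is kept.
    """
    return array("d", values)


chunk = load_chunk(float(i) for i in range(10_000))
total = sum(chunk)
size_bytes = chunk.itemsize * len(chunk)

# Dropping the last reference releases the whole buffer in one
# deallocation instead of 10,000 separate small frees.
del chunk

print(total, size_bytes)
```

The design point is the lifetime, not the container: everything belonging to a chunk lives in one block, so nothing from that chunk can be left behind pinning part of the heap.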
  5. Ratcheting
     • Ratcheting is a pathological case of Fragmentation, caused by the fact that the heap must be contiguous*:
     • It’s a limitation of CPython that it cannot compact memory (mostly due to extensions).
  9. Ratcheting
     • Avoid persistent objects
     • Sockets are common offenders
     • Anything that has to be persistent should be created at application startup, before processing data
     • Avoid letting the heap grow in the first place
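The advice to create persistent objects at startup can be illustrated with a toy contiguous heap that returns memory only from its top. This is a simplified model, not CPython or any real allocator: `ToyHeap` and `run` are hypothetical, block sizes are arbitrary, and freed gaps are never reused.

```python
class ToyHeap:
    """Toy contiguous heap: memory is returned only from the top."""

    def __init__(self):
        self.blocks = []  # [size, live] entries in address order

    def alloc(self, size):
        # Simplification: always extend the top; free gaps are not reused.
        block = [size, True]
        self.blocks.append(block)
        return block

    def free(self, block):
        block[1] = False
        # The heap can shrink only by trimming free blocks at the top.
        while self.blocks and not self.blocks[-1][1]:
            self.blocks.pop()

    def footprint(self):
        return sum(size for size, _live in self.blocks)


def run(persistent_first):
    heap = ToyHeap()
    if persistent_first:
        heap.alloc(1)  # e.g. a socket created at startup
    burst = [heap.alloc(1000) for _ in range(100)]
    if not persistent_first:
        heap.alloc(1)  # persistent object created mid-processing
    for block in burst:
        heap.free(block)
    return heap.footprint()


late = run(persistent_first=False)
early = run(persistent_first=True)
print(late, early)  # 100001 1
```

With the persistent object allocated after the burst, it sits on top of the heap and pins everything beneath it. Allocated first, it sits at the bottom and the heap can shrink back once the burst is freed.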
  10. fin.
      github.com/tellapart/taba | [email protected] | @misterkgb
      We’re Hiring! tellapart.com/careers