PyConZA 2013: "Software Transactional Memory with PyPy" by Armin Rigo

Pycon ZA
October 04, 2013

PyPy is a fast alternative Python implementation. Software Transactional Memory is a current academic research topic. Put the two together --brew for a couple of years-- and we get a version of PyPy that runs on multiple cores, without the infamous Global Interpreter Lock (GIL). It has been freshly released in beta, including integration with the Just-in-Time compiler.

But its point is not only to be "GIL-less": it can also give the illusion of single-threaded programming. I will give examples of what exactly I mean by that. Starting from the usual explicitly multithreaded demos, I will move to other examples where the actual threads are hidden from the programmer. I will explain how the core of async libraries (Twisted, Tornado, gevent, ...) can be modified to use multiple threads without exposing any concurrency issues to the user of the library: existing Twisted/etc. programs still run correctly without change. (They may need a few small changes to enable parallelism.)
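
A minimal sketch of the async-library pattern described above, in plain Python. It follows the poll-and-queue shape shown later in the talk, under two assumptions: a fixed list of events replaces `epoll.poll()`, and because CPython has no `atomic`, a single global `threading.Lock` named `atomic` stands in for it. That stand-in gives handler code the same guarantee (each `handle(event)` runs as if no other thread existed) but none of the parallelism pypy-stm aims for. The names `run`, `handle` and `STOP` are hypothetical.

```python
import queue
import threading

# Assumption: stand-in for pypy-stm's "with atomic". A single global lock
# serializes the handlers, so they run correctly but not in parallel.
atomic = threading.Lock()

STOP = object()  # sentinel telling a worker thread to exit


def run(events, num_workers=4):
    work = queue.Queue()
    handled = []  # shared state, written by every handler

    def handle(event):
        # Ordinary single-threaded-style code: read shared state, then
        # write it. Without "atomic", two handlers could see the same count.
        count = len(handled)
        handled.append((count, event))

    def worker():
        while True:
            event = work.get()
            if event is STOP:
                return
            with atomic:
                handle(event)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    # Stand-in for the reactor's "events = epoll.poll()" loop.
    for event in events:
        work.put(event)
    for _ in threads:
        work.put(STOP)  # queued after all events, so every event is handled
    for t in threads:
        t.join()
    return handled


result = run(range(100))
```

The handler itself contains no locking; all of the threading lives in the library core, which is the property the talk is after.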

I will also give an overview of how things work under the cover: the 10000-feet view is to internally create copies of objects and write changes into these copies. This allows the originals to continue being used by other threads.
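
The copy-on-write idea above can be illustrated with a toy optimistic STM in plain Python. This is a teaching sketch under stated assumptions, not pypy-stm's design: pypy-stm works inside the interpreter and garbage collector on every object, while here only explicit cells are transactional and a short global lock is taken when reading and committing. The names `TVar`, `Transaction`, `atomically`, `Conflict` and `demo` are hypothetical. Each transaction writes into private copies, records the version of every cell it read, and at commit time checks that nobody committed a newer version meanwhile; if someone did, it discards its copies and retries.

```python
import threading


class Conflict(Exception):
    """Raised when a transaction sees data another one has just changed."""


class TVar:
    """A shared slot holding its committed value and a version number."""

    def __init__(self, value):
        self.value = value
        self.version = 0


_commit_lock = threading.Lock()  # held only briefly, to read or publish


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen on first read
        self.writes = {}  # TVar -> private copy of the new value

    def read(self, tvar):
        if tvar in self.writes:          # our own uncommitted change
            return self.writes[tvar]
        with _commit_lock:               # snapshot value and version together
            value, version = tvar.value, tvar.version
        if self.reads.setdefault(tvar, version) != version:
            raise Conflict()             # changed since we first read it
        return value

    def write(self, tvar, value):
        # Changes go into the private copy; other threads keep
        # seeing the original until commit.
        self.writes[tvar] = value

    def commit(self):
        with _commit_lock:
            if any(tvar.version != v for tvar, v in self.reads.items()):
                return False             # conflict: someone committed first
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(fn):
    """Run fn(tx) as a transaction, aborting and retrying on conflicts."""
    while True:
        tx = Transaction()
        try:
            result = fn(tx)
        except Conflict:
            continue
        if tx.commit():
            return result


def demo(num_threads=8, increments=1000):
    counter = TVar(0)

    def increment(tx):
        x = tx.read(counter)
        tx.write(counter, x + 1)

    def worker():
        for _ in range(increments):
            atomically(increment)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Calling `demo()` runs 8 threads that each increment a shared counter 1000 times; conflicting increments are retried rather than lost, so the final value is 8000.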

Transcript

  1. Software Transactional Memory with PyPy Armin Rigo PyCon ZA 2013

    4th October 2013 arigo Software Transactional Memory with PyPy
  2. Introduction
    me: Armin Rigo
    what is PyPy: an alternative implementation of Python
    very compatible
    main focus is on speed
  3. SQL by example
        BEGIN TRANSACTION;
        SELECT * FROM ...;
        UPDATE ...;
        COMMIT;
  4. Python by example
        ...
        x = obj.value
        obj.value = x + 1
        ...
  5. Python by example
        ...
        x = obj.value
        obj.value = x + 1
        ...
  6. Python by example
        begin_transaction()
        x = obj.value
        obj.value = x + 1
        commit_transaction()
  7. Python by example
        the_lock.acquire()
        x = obj.value
        obj.value = x + 1
        the_lock.release()
  8. Python by example
        with the_lock:
            x = obj.value
            obj.value = x + 1
  9. Python by example
        with atomic:
            x = obj.value
            obj.value = x + 1
  10. Locks != Transactions
        BEGIN TRANSACTION;      BEGIN TRANSACTION;
        SELECT * FROM ...;      SELECT * FROM ...;
        UPDATE ...;             UPDATE ...;
        COMMIT;                 COMMIT;
  11. Locks != Transactions
        with the_lock:          with the_lock:
            x = obj.val             x = obj.val
            obj.val = x + 1         obj.val = x + 1
  12. Locks != Transactions
        with atomic:            with atomic:
            x = obj.val             x = obj.val
            obj.val = x + 1         obj.val = x + 1
  13. STM
    Transactional Memory
    advanced, but not magic (same as databases)
  14. By the way
    STM replaces the GIL (Global Interpreter Lock)
    any existing multithreaded program runs on multiple cores
  15. By the way
    the GIL is necessary and very hard to avoid,
    but if you look at it like a lock around every single subexpression,
    then it can be replaced with "with atomic" too
  16. So...
    yes, any existing multithreaded program runs on multiple cores
    yes, we solved the GIL
    great
  17. So...
    no, it would be quite hard to implement it in standard CPython
    too bad
    for now, only in PyPy
    but it would not be completely impossible
  18. But...
    but only half of the story
    in my opinion :-)
  19. Example 1
        def apply_interest(self):
            self.balance *= 1.05

        for account in all_accounts:
            account.apply_interest()
  20. Example 1
        def apply_interest(self):
            self.balance *= 1.05

        for account in all_accounts:
            account.apply_interest()
        ^^^ run this loop multithreaded
  21. Example 1
        def apply_interest(self):
            #with atomic: --- automatic
            self.balance *= 1.05

        for account in all_accounts:
            add_task(account.apply_interest)
        run_all_tasks()
  22. Internally
    run_all_tasks() manages a pool of threads
    each thread runs tasks in a "with atomic"
    uses threads, but internally only
    very simple, pure Python
  23. Example 2
        def next_iteration(all_trains):
            for train in all_trains:
                start_time = ...
                for othertrain in train.deps:
                    if ...:
                        start_time = ...
                train.start_time = start_time
  24. Example 2
        def compute_time(train):
            ...
            train.start_time = ...

        def next_iteration(all_trains):
            for train in all_trains:
                add_task(compute_time, train)
            run_all_tasks()
  25. Conflicts
    like database transactions
    but with objects instead of records
    the transaction aborts and retries automatically
  26. Inevitable
    "inevitable" (means "unavoidable")
    handles I/O in a "with atomic"
    cannot abort the transaction any more
  27. Current status
    basics work
    JIT compiler integration almost done
    different executable (pypy-stm instead of pypy)
    slow-down: around 3x (in bad cases up to 10x)
    real time speed-ups measured with 4 or 8 cores
    Linux 64-bit only
  28. User feedback
    implemented:
        Detected conflict:
          File "foo.py", line 58, in wtree
            walk(root)
          File "foo.py", line 17, in walk
            if node.left not in seen:
        Transaction aborted, 0.047 sec lost
  29. User feedback
    not implemented yet:
        Forced inevitable:
          File "foo.py", line 19, in walk
            print >> log, logentry
        Transaction blocked others for XX s
  30. Asynchronous libraries
    future work
    tweak a Twisted reactor: run multithreaded, but use "with atomic"
    existing Twisted apps still work, but we need to look at conflicts/inevitables
    similar with Tornado, eventlib, and so on
  31. Asynchronous libraries
        while True:
            events = epoll.poll()
            for event in events:
                queue.put(event)

    And in several threads:
        while True:
            event = queue.get()
            with atomic:
                handle(event)
  32. More future work
    look at many more examples
    tweak data structures to avoid conflicts
    reduce slow-down, port to other OS’es
  33. STM versus HTM
    Software versus Hardware
    CPU hardware specially to avoid the high overhead (Intel Haswell processor)
    too limited for now
  34. Under the cover
    10’000-feet overview
    every object can have multiple versions
    the shared versions are immutable
    the most recent version can belong to one thread
    synchronization only at the point where one thread "steals" another thread’s most recent version, to make it shared
    integrated with a generational garbage collector, with one nursery per thread
  35. Summary
    transactions in Python
    a big change under the cover
    a small change for Python users
    (and the GIL is gone)
    this work is sponsored by crowdfunding (thanks!)
    Q & A