How PyPy can help High Performance Computing

Antonio Cuni

August 23, 2018

Transcript

  1. How PyPy can help High Performance Computing
  2. Short bio

    PyPy core dev since 2006
    pdb++, CFFI, vmprof, capnpy, ...
    @antocuni
    https://github.com/antocuni
    https://bitbucket.org/antocuni
  3. How many of you use Python?
  4. How many have ever had performance problems?
  5. Why do you use Python, then?
  6. Python strong points

    Simplicity
    Lots of libraries
    Ecosystem
    Ok, but why?
  7. Python REAL strong points

    Expressive & simple APIs
    Uniform type system (everything is an object)
    Powerful abstractions
  8. Example: JSON
  9. JSONObject jsonObj = new JSONObject(jsonString);
    JSONArray jArray = jsonObj.getJSONArray("data");
    int length = jArray.length();
    for (int i = 0; i < length; i++) {
        JSONObject jObj = jArray.getJSONObject(i);
        String id = jObj.optString("id");
        String name = jObj.optString("name");
        JSONArray ingredientArray = jObj.getJSONArray("Ingredients");
        int size = ingredientArray.length();
        ArrayList<String> ingredients = new ArrayList<>();
        for (int j = 0; j < size; j++) {
            // read each entry from the inner array
            JSONObject json = ingredientArray.getJSONObject(j);
            ingredients.add(json.optString("name"));
        }
    }
    // googled for "getJSONArray example", found this:
    // https://stackoverflow.com/questions/32624166/how-to-get-json-array-within-json-object
  10. import json
    obj = json.loads(string)
    for item in obj['data']:
        id = item['id']
        name = item['name']
        ingredients = []
        for ingr in item["ingredients"]:
            ingredients.append(ingr['name'])
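To make the Python version concrete, here is a minimal runnable sketch. The sample document and the `extract_ingredients` helper are made up for illustration; only the key names (`data`, `id`, `name`, `ingredients`) come from the slide.

```python
import json

# Hypothetical sample input; only the key names come from the slide.
SAMPLE = """
{"data": [
  {"id": "1", "name": "Pancakes",
   "ingredients": [{"name": "flour"}, {"name": "milk"}, {"name": "eggs"}]},
  {"id": "2", "name": "Omelette",
   "ingredients": [{"name": "eggs"}, {"name": "butter"}]}
]}
"""


def extract_ingredients(string):
    """Return {item id: [ingredient names]} for every item under "data"."""
    obj = json.loads(string)
    result = {}
    for item in obj["data"]:
        result[item["id"]] = [ingr["name"] for ingr in item["ingredients"]]
    return result


recipes = extract_ingredients(SAMPLE)
print(recipes)
```

The parsed result is made of ordinary dicts, lists and strings, which is the "uniform type system" point from the earlier slide: no dedicated JSON object or array classes are needed.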
  11. So far so good, BUT

    abstraction: iterators
    abstraction: temp objects
    abstraction: classes/methods/functions
    core of computation
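The layers named on this slide can be seen even in a tiny computation. The sketch below is an illustration only: the `Point` class and both functions are hypothetical names, not code from the talk.

```python
class Point:
    """Hypothetical 2D point: the classes/methods layer."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm_squared(self):
        # The core of the computation: two multiplications and one addition.
        return self.x * self.x + self.y * self.y


def total_abstract(coords):
    # Iterators (the generator expression) and temporary objects (one Point
    # per pair) are wrapped around the same arithmetic.
    return sum(Point(x, y).norm_squared() for x, y in coords)


def total_bare(xs, ys):
    # The same arithmetic with the abstractions peeled away by hand.
    total = 0
    for i in range(len(xs)):
        total += xs[i] * xs[i] + ys[i] * ys[i]
    return total


xs = [1, 2, 3]
ys = [4, 5, 6]
abstract_result = total_abstract(zip(xs, ys))
bare_result = total_bare(xs, ys)
print(abstract_result, bare_result)
```

Both functions compute the same number; the first reads better, while the second does less work per element on an interpreter that executes every layer literally.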
  12. Example of temporary objects

    Bound methods

    In [ ]: class A(object):
                def foo(self):
                    return 42

            a = A()
            bound_foo = a.foo
            %timeit a.foo()
            %timeit bound_foo()
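The same comparison can be run outside IPython with the standard `timeit` module. This is a sketch, not the measurement from the talk: the repetition count `N` is arbitrary, and no particular result is claimed, since the gap depends on the interpreter and its version (recent CPython releases optimise the `a.foo()` call pattern).

```python
import timeit


class A(object):
    def foo(self):
        return 42


a = A()
bound_foo = a.foo  # create the bound method once

# Each attribute access yields a bound method; two of them compare equal
# because they wrap the same function and the same instance.
m1 = a.foo
m2 = a.foo
same_target = (m1 == m2)

N = 200000  # arbitrary repetition count chosen for this sketch
t_lookup = timeit.timeit("a.foo()", globals={"a": a}, number=N)
t_bound = timeit.timeit("bound_foo()", globals={"bound_foo": bound_foo}, number=N)
print("a.foo():     %.1f ns per call" % (t_lookup / N * 1e9))
print("bound_foo(): %.1f ns per call" % (t_bound / N * 1e9))
```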
  13. Ideally: think of concepts, not implementation details

    Real world: details leak to the user
  14. Python problem

    Tension between abstractions and performance
  15. Classical Python approaches to performance
  16. 1. Work around in the user code

    e.g. create bound methods beforehand
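A common form of this workaround is hoisting a method lookup out of a hot loop. The two functions below are hypothetical examples written for this transcript; they return the same list and differ only in where the bound method is created.

```python
def squares_plain(n):
    result = []
    for i in range(n):
        result.append(i * i)  # looks up result.append on every iteration
    return result


def squares_hoisted(n):
    result = []
    append = result.append  # bound method created once, before the loop
    for i in range(n):
        append(i * i)
    return result


print(squares_plain(5))
print(squares_hoisted(5))
```

The hoisted version is the kind of detail that leaks to the user: it exists only to help the interpreter, not the reader.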
  17. 2. Work around in the language specs

    range vs xrange
    dict.keys vs .iterkeys
    int vs long
    array.array vs list

    Easier to implement
    Harder to use
    Clutter the language unnecessarily
    More complex to understand
    Not really Pythonic
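For context, the pairs listed here are Python 2 features. Python 3 merged several of them: `range` is lazy, `dict.keys()` returns a view, and `int` has arbitrary precision. The sketch below assumes Python 3 and only demonstrates that behaviour; the values used are arbitrary.

```python
from array import array

# Python 3: range is lazy, like Python 2's xrange, so a huge range is cheap
# to create and can still be indexed.
lazy = range(10 ** 9)
last = lazy[-1]

# Python 3: dict.keys() is a view over the dict, not a copied list, so it
# reflects insertions made after it was obtained.
d = {"a": 1}
keys = d.keys()
d["b"] = 2

# Python 3: a single int type with arbitrary precision (no separate long).
big = 2 ** 100

# array.array still exists as a separate, compact container of C doubles.
floats_array = array("d", [0.0] * 1000)

print(last, sorted(keys), type(big).__name__, len(floats_array))
```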
  18. 3. Stay in C as much as possible

    In [29]: numbers = range(1000)
             %timeit [x*2 for x in numbers]
    10000 loops, best of 3: 47.1 µs per loop

    In [31]: import numpy as np
             numbers = np.arange(1000)
             %timeit numbers*2
    The slowest run took 17.59 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 1.48 µs per loop
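The same idea can be shown without NumPy, using only the standard library. In this sketch (function names and repetition count are made up), the built-in `sum` runs its loop inside C on CPython, while the hand-written loop is executed step by step by the interpreter. No timing result is claimed here.

```python
import timeit

numbers = list(range(1000))


def total_loop(values):
    # Every addition is executed by the bytecode interpreter.
    total = 0
    for x in values:
        total += x
    return total


def total_builtin(values):
    # On CPython the whole loop runs inside the C implementation of sum().
    return sum(values)


t_loop = timeit.timeit(lambda: total_loop(numbers), number=2000)
t_builtin = timeit.timeit(lambda: total_builtin(numbers), number=2000)
print("loop: %.4fs  builtin: %.4fs" % (t_loop, t_builtin))
```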
  19. 4. Rewrite in C

    #include "Python.h"
    Cython
    CFFI
  20. "Rewrite in C" approach

    aka, 90/10 rule
  24. Abstractions cost

    Code quality => poor performance
    Python parts become relevant
  25. Python in the HPC world

    Python as a glue-only language
    Tradeoff between speed and code quality
  26. PyPy

    Alternative Python implementation
    Ideally: no visible difference to the user
    JIT compiler
    http://pypy.org
  27. How fast is PyPy?

    Wrong question

    Up to 80x faster in extreme cases
    10x faster in good cases
    2x faster on "random" code
    sometimes it's just slower
  28. PyPy flaws

    Far from being perfect
    It leaks different implementation details than CPython does
    e.g. JIT warmup, GC peculiarities
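JIT warmup can be observed by timing several identical rounds of the same work. The sketch below uses a made-up workload and arbitrary sizes. On a JIT-based implementation such as PyPy, the first round is expected to include interpretation and compilation time, while on CPython the rounds should look roughly alike. Actual numbers depend on the machine and are not claimed here.

```python
import platform
import time


def work(n):
    # Arbitrary pure-Python loop used only as a workload for this sketch.
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def time_rounds(rounds=5, n=200000):
    """Time several identical rounds; return the per-round durations."""
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        work(n)
        timings.append(time.perf_counter() - start)
    return timings


timings = time_rounds()
print(platform.python_implementation())
for i, t in enumerate(timings):
    print("round %d: %.4fs" % (i, t))
```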
  29. PyPy qualities
  30. Make pythonic, idiomatic code fast
  31. Abstractions are (almost) free
  32. The better the code, the bigger the speedup
  35. Python as a first class language

    No longer "just glue"
  36. Example: Sobel filter

    Extended version: "The Joy of PyPy: Abstractions for Free", EP 2017
    https://speakerdeck.com/antocuni/the-joy-of-pypy-jit-abstractions-for-free
    https://www.youtube.com/watch?v=NQfpHQII2cU
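The transcript does not include the Sobel code itself, so the following is a minimal pure-Python sketch of a Sobel filter written for illustration, not the code from the talk. The `Image` class and its `(x, y)` indexing are assumptions; border pixels are simply left at zero.

```python
import math


class Image:
    """Hypothetical grayscale image stored as a flat list, indexed by (x, y)."""

    def __init__(self, width, height, data=None):
        self.width = width
        self.height = height
        if data is None:
            data = [0.0] * (width * height)
        self.data = list(data)

    def __getitem__(self, pos):
        x, y = pos
        return self.data[y * self.width + x]

    def __setitem__(self, pos, value):
        x, y = pos
        self.data[y * self.width + x] = value


def sobel(img):
    """Return the gradient magnitude of img; border pixels stay at zero."""
    out = Image(img.width, img.height)
    for y in range(1, img.height - 1):
        for x in range(1, img.width - 1):
            # Horizontal kernel: [-1 0 1], [-2 0 2], [-1 0 1]
            gx = (-img[x - 1, y - 1] + img[x + 1, y - 1]
                  - 2 * img[x - 1, y] + 2 * img[x + 1, y]
                  - img[x - 1, y + 1] + img[x + 1, y + 1])
            # Vertical kernel: [-1 -2 -1], [0 0 0], [1 2 1]
            gy = (-img[x - 1, y - 1] - 2 * img[x, y - 1] - img[x + 1, y - 1]
                  + img[x - 1, y + 1] + 2 * img[x, y + 1] + img[x + 1, y + 1])
            out[x, y] = math.sqrt(gx * gx + gy * gy)
    return out


# A 4x3 image with a vertical step edge between columns 1 and 2.
step = Image(4, 3, [0, 0, 10, 10] * 3)
edges = sobel(step)
print(edges[1, 1], edges[2, 1])
```

Every pixel access goes through `__getitem__` with a temporary tuple, which is exactly the kind of abstraction the talk argues a JIT can make (almost) free.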
  37. The BIG problem: C extensions

    CPython
  38. PyPy (cpyext)
  40. cpyext

    PyPy version of Python.h
    Compatibility layer
    Most C extensions just work: numpy, scipy, pandas, etc.
    Slow :(
    Use CFFI whenever possible
  41. We are working on it

    Future status (hopefully)

    All C extensions will just work
    C code as fast as today, Python code super-fast
    The best of both worlds
    PyPy as the default choice for HPC
    My personal estimate: 6 months of work and we have a fast cpyext (let's talk about money :))
  42. That's all

    Questions?