Compiled, automatically parallel Python with Pyfora

Slides from the April 11, 2016 ODSC talk. http://www.meetup.com/New-York-ODSC/events/229694046/

Thomas Peters

April 12, 2016

Transcript

  1. Who I am
    • Engineer at Ufora for the last 4 years.
    • Before that, did a PhD in pure math at Columbia.
    • Big fan of compilers, machine learning, and eating.
  2. What is Ufora?
    • Team of computer scientists and engineers.
    • Build tools for data science.
    • Less in the business of making models, more in the business of speeding up existing ones.
    • We provide consulting, training, and support.
    • Developers of Pyfora (open source).
    www.ufora.com
  3. The problems Pyfora addresses: 1. Distributed computing in data science is hard
    • What if my dataset is larger than RAM?
    • What if I really want to run my model on all of the data?
    • What if I want to fit many models at once? (For example, cross-validation or grid search.)
    • Why do I need different tools for different scales?
  4. The problems Pyfora addresses: 2. Writing fast Python is hard
    • Cython and C extensions are tricky.
    • NumPy is great, but can't do everything.
    • Why can't loops be fast? Custom classes? Higher-order functions?
    • (There are many other Python implementations that address this problem: PyPy, Numba, …)
  5. Pyfora is JIT-compiled, automatically-parallel Python.
    • JIT-compiled: we produce fast machine code (C-speed) at runtime.
    • Automatically-parallel: parallelism happens without the user needing to be aware of threads, processes, synchronization, etc.
    • All of this while writing ordinary Python.
    (And we're open source: https://github.com/ufora/ufora)
  6. Pyfora consists of two main components
    1. A distributed backend that runs on one or more machines in your local network or in the cloud (on your own machines). This runs on Docker.
    2. A Python package ( ) that sends code from your local Python process to the backend for compilation and execution (aka the client, or frontend).
  7. Example. (AWS version)
    Installation (of client):
    Start a cluster (in the cloud)
    …
    This will output the IP of the manager node, and all other worker nodes.
    Deploying a cluster is that easy!
  8. Run some code
    Note: Only the lines in the -block body are executed in parallel in pyfora.
  9. Result?
    This program runs in 13.76 seconds on a 3.40GHz Intel(R) Core(TM) i7-2600 quad-core (8 hyperthreaded) CPU, and utilizes all 8 cores. The same program in the local Python interpreter takes 185.95 seconds and uses one core.
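As a quick check on the timings quoted above, the two figures imply a speedup of roughly 13.5x. That is more than the 8 hardware threads could deliver through parallelism alone, assuming linear scaling as the upper bound, which suggests compilation accounts for the rest. The variable names below are illustrative.

```python
# Timings quoted on the slide.
pyfora_seconds = 13.76
cpython_seconds = 185.95
hardware_threads = 8

# Overall speedup of the Pyfora run over the local interpreter.
speedup = cpython_seconds / pyfora_seconds

# Speedup left over after dividing by the thread count; a value above 1
# means parallelism alone cannot explain the difference.
per_thread = speedup / hardware_threads

print(round(speedup, 2), round(per_thread, 2))
```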
  10. How do we do this?
    • By splitting, and adaptive parallelism (to be explained shortly).
    • By taking advantage of pure functions and stateless generators.
  11. [Figure: "Splitting" and "Adaptive Parallelism". A range ending at 100 is split into 0 - 50 and 50 - 100, then into 0 - 25, 25 - 50, 50 - 75, and 75 - 100, with the work spread over CPU 1 through CPU 4.]
  12. For this to work, we needed:
    1. to be a pure, stateless function.
    2. xrange is a stateless generator. (essentially because it implements )
    3. on lists is associative.
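The three requirements above can be sketched in plain Python. This is only an illustration of static splitting, not Pyfora's implementation: `is_prime`, `primes_in`, `split`, and `parallel_primes` are hypothetical names, Python 3's `range` stands in for `xrange`, a thread pool stands in for the CPUs, and the adaptive part (re-splitting work while it runs) is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Pure, stateless function: the result depends only on n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def primes_in(lo, hi):
    """List the primes in [lo, hi). Each call is independent of the others."""
    return [n for n in range(lo, hi) if is_prime(n)]


def split(lo, hi, pieces):
    """Cut [lo, hi) into contiguous subranges of near-equal length."""
    step = max(1, (hi - lo + pieces - 1) // pieces)
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def parallel_primes(lo, hi, workers=4):
    """Evaluate each subrange separately, then join the partial lists."""
    chunks = split(lo, hi, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda chunk: primes_in(*chunk), chunks))
    # List concatenation is associative, so joining the parts in chunk
    # order gives the same list as one sequential pass over the range.
    result = []
    for part in parts:
        result = result + part
    return result
```

Because a range can be indexed and sliced without being consumed, each subrange can be handed to a worker without coordinating with the others. In CPython the thread pool will not speed up this CPU-bound work; the sketch only shows why the decomposition is valid.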
  13. OK … what's the catch?
    • No mutable data structures.
      ◦ is not allowed. is ok (and fast)
      ◦ , are not allowed.
      ◦ NumPy arrays are ok, but mutating operations are not allowed.
    • No operations can have side effects
      ◦ E.g., no writing to files.
      ◦ Side-effectful operations can still happen in the host Python process, while referencing objects in remote calls.
      ◦ is OK, but it is a no-op.
    • All operations are deterministic
      ◦ E.g., no access to within a remote call.
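The operation names on this slide did not survive extraction, so the following is only a guess at the general style these constraints call for, written in plain Python: values are rebuilt rather than modified in place, and functions neither write files nor depend on hidden state. `running_totals` and `scaled` are hypothetical helpers, and nothing here states which specific operations Pyfora accepts or rejects.

```python
def running_totals(values):
    """Return the prefix sums of values without mutating any list in place."""
    totals = []
    acc = 0
    for v in values:
        acc = acc + v
        # Bind a new list each step instead of modifying the old one.
        totals = totals + [acc]
    return totals


def scaled(values, factor):
    """Return a new list; the input list is left untouched."""
    return [v * factor for v in values]


data = [1, 2, 3]
totals = running_totals(data)
doubled = scaled(data, 2)
```

Both functions are deterministic and free of side effects, so calling them twice with the same arguments always gives the same result.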
  14. What happens if you violate these constraints?
    • You'll get an exception -- either at runtime or at "parse time" (when we ship code to the server and define it there).
    • In general, pyfora should always give the same answer as normal CPython, or it should throw an exception saying it can't handle that code.
  15. Other parallel constructs?
    We saw that and list comprehensions are parallel (when possible).
    The pyfora runtime searches for parallel operations as code executes, and these come in the form of independent function calls.
    For example, if we see
    We can execute these in parallel, due to our immutability assumptions.
  16. This isn't as silly as it may seem.
    Take our regression tree code. It looks roughly like:
    …
    …
    By design, these last two calls to fit can be (and are) automatically parallelized in pyfora.
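The regression tree code from the slide was not preserved, so the block below is a hypothetical stand-in rather than Ufora's code. It assumes a tiny one-dimensional tree over `(x, y)` points sorted by `x`, and uses a thread pool to make explicit the step that the slide says Pyfora performs automatically: the two recursive `fit` calls share no mutable state, so they can run at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def mean(ys):
    return sum(ys) / len(ys)


def sse(ys):
    """Sum of squared errors around the mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(points):
    """Index that splits the sorted points with the lowest total squared error."""
    best_i, best_cost = None, None
    for i in range(1, len(points)):
        left = [y for _, y in points[:i]]
        right = [y for _, y in points[i:]]
        cost = sse(left) + sse(right)
        if best_cost is None or cost < best_cost:
            best_i, best_cost = i, cost
    return best_i


def fit(points, depth, pool=None):
    """Fit a tree to a non-empty list of (x, y) points sorted by x."""
    if depth == 0 or len(points) < 2:
        return ("leaf", mean([y for _, y in points]))
    i = best_split(points)
    left_pts, right_pts = points[:i], points[i:]
    threshold = right_pts[0][0]
    if pool is not None:
        # The two sub-fits are independent, so submit both before waiting.
        # Sub-fits recurse sequentially so they never wait on the pool.
        left_future = pool.submit(fit, left_pts, depth - 1)
        right_future = pool.submit(fit, right_pts, depth - 1)
        left, right = left_future.result(), right_future.result()
    else:
        left = fit(left_pts, depth - 1)
        right = fit(right_pts, depth - 1)
    return ("node", threshold, left, right)


def predict(tree, x):
    if tree[0] == "leaf":
        return tree[1]
    _, threshold, left, right = tree
    return predict(left, x) if x < threshold else predict(right, x)


points = [(0, 1.0), (1, 1.0), (2, 5.0), (3, 5.0)]
sequential_tree = fit(points, 2)
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel_tree = fit(points, 2, pool)
```

Since `fit` is pure, the sequential and concurrent versions build the same tree; only the order in which the work is scheduled differs.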
  17. What are we working on?
    • Better library support.
    • More language support.
    • GPU computing.
    • Better computation/data scheduling.
    • Better JIT compilation.
    • CI tooling (TestLooper: statistical QA)
  18. A little history
    • We started with our own functional language, Fora.
    • Fora has implicit parallelism and JIT-compilation built in.
    • Getting people to adopt a new language is not easy.
    • So we built a source-to-source compiler: Python -> Fora.
    • Now, Fora is our "bitcode".
  19. The end!
    Remember: Pyfora is auto-parallel, compiled Python.
    • Find us on GitHub: https://github.com/ufora/ufora
    • Read our docs: http://docs.pyfora.com/
    • Follow us on Twitter: @uforainc
    • Check out our Google groups: ufora-dev, ufora-user
    • Email me: [email protected]
    Try it out! Let us know your experiences and things you'd like implemented!