
EuroSciPy 2012 Parallel Python Tutorial

ianozsvald
September 04, 2012

My EuroSciPy 2012 tutorial covering five ways of running Python in parallel (multiprocessing, ParallelPython, Gearman, PiCloud, IPython Cluster), written up here: http://ianozsvald.com/2012/09/04/euroscipy-parallel-python-tutorial-now-online/

Transcript

  1. [email protected] @IanOzsvald - EuroSciPy 2012
    Parallel Python (2 hour tutorial)
    EuroSciPy 2012

  2. Goal
    • Evaluate some parallel options for core-bound problems using Python
    • Your task is probably in pure Python, may be CPU bound and can be parallelised (right?)
    • We're not looking at network-bound problems
    • Focusing on serial->parallel in easy steps

  3. About me (Ian Ozsvald)
    • A.I. researcher in industry for 13 years
    • C, C++ before, Python for 9 years
    • pyCUDA and Headroid at EuroPythons
    • Lecturer on A.I. at Sussex Uni (a bit)
    • StrongSteam.com co-founder
    • ShowMeDo.com co-founder
    • IanOzsvald.com - MorConsulting.com
    • Somewhat unemployed right now...

  4. Something to consider
    • “Proebsting's Law”
    http://research.microsoft.com/en-us/um/people/t
    “improvements to compiler technology double the performance of typical programs every 18 years”
    • Compiler advances (generally) unhelpful (sort-of – consider auto vectorisation!)
    • Multi-core/cluster increasingly common

  5. Group photo
    • I'd like to take a photo - please smile :-)

  6. Overview (pre-requisites)
    • multiprocessing
    • ParallelPython
    • Gearman
    • PiCloud
    • IPython Cluster
    • Python Imaging Library

  7. We won't be looking at...
    • Algorithmic or cache choices
    • Gnumpy (numpy->GPU)
    • Theano (numpy(ish)->CPU/GPU)
    • BottleNeck (Cython'd numpy)
    • CopperHead (numpy(ish)->GPU)
    • Map/Reduce
    • pyOpenCL, EC2 etc

  8. What can we expect?
    • Close to C speeds (shootout):
    http://shootout.alioth.debian.org/u32/which-programm
    http://attractivechaos.github.com/plb/
    • Depends on how much work you put in
    • nbody JavaScript much faster than Python but we can catch it/beat it (and get close to C speed)

  9. Practical result - PANalytical

  10. Our building blocks
    • serial_python.py
    • multiproc.py
    • git clone [email protected]:ianozsvald/ParallelPython_EuroSciPy2012.git
    • Google “github ianozsvald” -> ParallelPython_EuroSciPy2012
    • $ python serial_python.py

  11. Mandelbrot problem
    • Embarrassingly parallel
    • Varying times to calculate each pixel
    • We choose to send array of setup data
    • CPU bound with large data payload
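
A minimal serial sketch of the kind of kernel this problem implies, written in Python 3 syntax. The name `calculate_z` appears later in the deck; its signature here, the `make_coordinates` helper, the region bounds and the grid size are all assumptions for illustration and are not taken from serial_python.py.

```python
def make_coordinates(width, height, x1=-2.13, x2=0.77, y1=-1.3, y2=1.3):
    """Build the 'setup data': one complex coordinate per pixel (hypothetical helper)."""
    xs = [x1 + (x2 - x1) * i / width for i in range(width)]
    ys = [y1 + (y2 - y1) * j / height for j in range(height)]
    return [complex(x, y) for y in ys for x in xs]


def calculate_z(q, maxiter):
    """Per coordinate, the iteration at which |z| first exceeds 2 (maxiter if never)."""
    output = []
    for c in q:
        z = 0j
        escaped_at = maxiter
        for iteration in range(maxiter):
            z = z * z + c
            if abs(z) > 2.0:
                escaped_at = iteration
                break
        output.append(escaped_at)
    return output


if __name__ == "__main__":
    q = make_coordinates(40, 20)
    counts = calculate_z(q, maxiter=100)
    # Points inside the set run all maxiter iterations; points far outside stop at once.
    print(len(q), min(counts), max(counts))
```

Each pixel is independent of every other, which is what makes the problem embarrassingly parallel, and the early `break` is why per-pixel times vary.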

  12. multiprocessing
    • Using all our CPUs is cool, 4 are common, 32 will be common
    • Global Interpreter Lock (isn't our enemy)
    • Silo'd processes are easiest to parallelise
    • http://docs.python.org/library/multiprocessing

  13. multiprocessing Pool
    • # multiproc.py
    • p = multiprocessing.Pool()
    • po = p.map_async(fn, args)
    • result = po.get() # for all po objects
    • join the result items to make full result
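
The bullets above can be sketched as a small runnable script (Python 3 syntax). This is not the tutorial's multiproc.py: the worker body, the `(q, maxiter)` chunk layout and the helper names `join_results` and `run_parallel` are assumptions made for illustration.

```python
import multiprocessing


def calculate_z(chunk):
    """Worker: escape iteration for each complex point in one chunk (maxiter if none)."""
    q, maxiter = chunk
    output = []
    for c in q:
        z = 0j
        escaped_at = maxiter
        for iteration in range(maxiter):
            z = z * z + c
            if abs(z) > 2.0:
                escaped_at = iteration
                break
        output.append(escaped_at)
    return output


def join_results(per_chunk_results):
    """Flatten the list of per-chunk lists back into one full result."""
    return [value for chunk_result in per_chunk_results for value in chunk_result]


def run_parallel(chunks):
    """Send every chunk to a process pool and wait for all results."""
    with multiprocessing.Pool() as p:  # defaults to one worker process per CPU
        po = p.map_async(calculate_z, chunks)
        return join_results(po.get())  # get() blocks until every chunk is done


if __name__ == "__main__":
    points = [complex(x / 10.0, 0.5) for x in range(-20, 10)]
    chunks = [(points[:15], 100), (points[15:], 100)]
    print(run_parallel(chunks))
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes re-import the main module rather than fork it.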

  14. Making chunks of work
    • Split the work into chunks (follow my code)
    • Splitting by number of CPUs is a good start
    • Submit the jobs with map_async
    • Get the results back, join the lists
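
One possible way to do the splitting and joining, as a sketch rather than the tutorial's own code. The helper names `make_chunks` and `join_chunks` are hypothetical.

```python
import multiprocessing


def make_chunks(q, nbr_chunks):
    """Split list q into nbr_chunks contiguous pieces of near-equal length."""
    chunk_size, remainder = divmod(len(q), nbr_chunks)
    chunks = []
    start = 0
    for i in range(nbr_chunks):
        # The first `remainder` chunks take one extra item so nothing is dropped.
        end = start + chunk_size + (1 if i < remainder else 0)
        chunks.append(q[start:end])
        start = end
    return chunks


def join_chunks(chunks):
    """Concatenate the per-chunk lists in their original order."""
    return [item for chunk in chunks for item in chunk]


if __name__ == "__main__":
    q = list(range(10))
    nbr_chunks = multiprocessing.cpu_count()  # one chunk per CPU as a starting point
    chunks = make_chunks(q, nbr_chunks)
    print([len(c) for c in chunks])
    assert join_chunks(chunks) == q
```

Keeping the chunks contiguous and in order means the joined result lines up with the original input without any bookkeeping.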

  15. Time various chunks
    • Let's try chunks: 1,2,4,8
    • Look at Process Monitor - why not 100% utilisation?
    • What about trying 16 or 32 chunks?
    • Can we predict the ideal number?
    – what factors are at play?
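
One factor that can be explored without running a pool is how unevenly the work falls across chunks. The sketch below uses loop-iteration counts as a rough stand-in for CPU time; the grid, the `maxiter` value and the helper names are assumptions, and real timings also include pickling and process start-up costs that this ignores.

```python
def escape_cost(c, maxiter):
    """Number of loop iterations spent on one point: a rough proxy for CPU time."""
    z = 0j
    for iteration in range(maxiter):
        z = z * z + c
        if abs(z) > 2.0:
            return iteration + 1
    return maxiter


def chunk_costs(q, nbr_chunks, maxiter):
    """Total iteration count of each contiguous chunk (at most nbr_chunks of them)."""
    size = -(-len(q) // nbr_chunks)  # ceiling division
    return [sum(escape_cost(c, maxiter) for c in q[i:i + size])
            for i in range(0, len(q), size)]


def imbalance(costs):
    """Slowest chunk relative to the average chunk; 1.0 means perfectly even."""
    return max(costs) / (sum(costs) / len(costs))


if __name__ == "__main__":
    # Hypothetical 60x60 grid over a region containing the set, row by row.
    q = [complex(-2.0 + 2.5 * x / 60, -1.25 + 2.5 * y / 60)
         for y in range(60) for x in range(60)]
    for nbr_chunks in (1, 2, 4, 8, 16, 32):
        costs = chunk_costs(q, nbr_chunks, maxiter=200)
        print(nbr_chunks, round(imbalance(costs), 2))
```

If each chunk runs on its own CPU, a ratio well above 1.0 means some CPUs finish early and sit idle while the slowest chunk completes.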

  16. How much memory moves?
    • sys.getsizeof(0+0j) # bytes
    • 250,000 complex numbers by default
    • How much RAM used in q?
    • With 8 chunks - how much memory per chunk?
    • multiprocessing uses pickle, max 32MB pickles
    • Process forked, data pickled
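
The questions above can be answered empirically with a short script. This is a sketch: the list `q` is built with placeholder values rather than real coordinates, and the figures depend on the interpreter and platform (a complex object commonly reports 32 bytes on 64-bit CPython builds).

```python
import pickle
import sys

NBR_POINTS = 250000  # the default problem size quoted above

per_number = sys.getsizeof(0 + 0j)  # bytes for one complex object on this interpreter
q = [complex(i, i) for i in range(NBR_POINTS)]  # placeholder values, same count

# Rough RAM estimate: the complex objects plus the list's own pointer array.
approx_ram = per_number * len(q) + sys.getsizeof(q)

# What crosses the process boundary is the pickled form, whole or per chunk.
nbr_chunks = 8
chunk_len = len(q) // nbr_chunks
whole_pickle = len(pickle.dumps(q, pickle.HIGHEST_PROTOCOL))
chunk_pickle = len(pickle.dumps(q[:chunk_len], pickle.HIGHEST_PROTOCOL))

print("bytes per complex:", per_number)
print("approx RAM for q :", approx_ram)
print("pickled q        :", whole_pickle)
print("pickled 1/8 chunk:", chunk_pickle)
```

Comparing the pickled size with the in-memory estimate shows how much data each job submission has to serialise and move.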

  17. ParallelPython
    • Same principle as multiprocessing but allows >1 machine with >1 CPU
    • http://www.parallelpython.com/
    • Seems to work poorly with lots of data (e.g. 8MB split into 4 lists...!)
    • We can run it locally, run it locally via ppserver.py and run it remotely too
    • Can we demo it to another machine?

  18. ParallelPython
    • ifconfig gives us IP address
    • NBR_LOCAL_CPUS=0
    • ppserver('your ip')
    • nbr_chunks=1 # try lots?
    • term2$ ppserver.py -d
    • parallel_python_and_ppserver.py
    • Arguments: 1000 50000
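
A hedged sketch of how those settings might fit together using the `pp` module's `Server` and `submit` calls. It is not the tutorial's parallel_python_and_ppserver.py: the function names, the chunking and the placeholder address `192.168.0.2` are assumptions, and it presumes a `pp` package compatible with your Python version is installed here and on every machine running ppserver.py.

```python
def calculate_z(q, maxiter):
    """Escape iteration for each complex point in q (maxiter if it never escapes)."""
    output = []
    for c in q:
        z = 0j
        escaped_at = maxiter
        for iteration in range(maxiter):
            z = z * z + c
            if abs(z) > 2.0:
                escaped_at = iteration
                break
        output.append(escaped_at)
    return output


def run_with_pp(q, maxiter, nbr_chunks=1, nbr_local_cpus=0,
                ppservers=("192.168.0.2",)):  # placeholder: use your own IP
    """Submit chunks of a non-empty q to ppserver.py instances and join the results."""
    import pp  # third-party, imported lazily so the rest runs without it

    # nbr_local_cpus=0 means no local workers, so all work goes to the ppservers.
    job_server = pp.Server(nbr_local_cpus, ppservers=ppservers)
    size = -(-len(q) // nbr_chunks)  # ceiling division
    jobs = [job_server.submit(calculate_z, (q[i:i + size], maxiter))
            for i in range(0, len(q), size)]
    output = []
    for job in jobs:
        output.extend(job())  # calling a job blocks until its result arrives
    return output
```

With `ppserver.py -d` running in a second terminal, calling `run_with_pp(q, 1000)` with the server's address would send the work there; nothing is executed until that call is made.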

  19. ParallelPython + binaries
    • We can ask it to use modules, other functions and our own compiled modules
    • Works for Cython and ShedSkin
    • Modules have to be in PYTHONPATH (or current directory for ppserver.py)

  20. “timeout: timed out”
    • Beware the timeout problem, the default timeout isn't helpful:
    – pptransport.py
    – TRANSPORT_SOCKET_TIMEOUT = 60*60*24 # from 30s
    • Remember to edit this on all copies of pptransport.py

  21. Gearman
    • C based (was Perl) job engine
    • Many machines, redundant
    • Optional persistent job listing (using e.g. MySQL, Redis)
    • Bindings for Python, Perl, C, Java, PHP, Ruby, RESTful interface, cmd line
    • String-based job payload (so we can pickle)

  22. Gearman worker
    • First we need a worker.py with calculate_z
    • Will need to unpickle the in-bound data and pickle the result
    • We register our task
    • Now we work forever
    • Run with Python for 1 core
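
The steps above could look roughly like this with the python-gearman package (`GearmanWorker`, `register_task`, `work`). This is a sketch, not the tutorial's worker.py: the `(q, maxiter)` payload layout, the task name, the `handle_payload` helper and the `localhost:4730` address are assumptions, and Python 3 support depends on which gearman package is installed.

```python
import pickle


def calculate_z(q, maxiter):
    """Escape iteration for each complex point in q (maxiter if it never escapes)."""
    output = []
    for c in q:
        z = 0j
        escaped_at = maxiter
        for iteration in range(maxiter):
            z = z * z + c
            if abs(z) > 2.0:
                escaped_at = iteration
                break
        output.append(escaped_at)
    return output


def handle_payload(payload):
    """Unpickle (q, maxiter), do the work, and pickle the result for the reply."""
    q, maxiter = pickle.loads(payload)
    return pickle.dumps(calculate_z(q, maxiter))


def run_worker(host="localhost:4730", task_name="calculate_z"):
    """Register the task and serve jobs until the process is killed."""
    import gearman  # third-party, imported lazily so the rest runs without it

    worker = gearman.GearmanWorker([host])
    # python-gearman calls the task function with (worker, job); job.data is the payload.
    worker.register_task(task_name, lambda gm_worker, job: handle_payload(job.data))
    worker.work()  # blocks forever, handling one job at a time
```

Each process that calls `run_worker()` occupies one core, so starting several such processes is how more cores are brought in.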

  23. Gearman blocking client
    • Register a GearmanClient
    • pickle each chunk of work
    • submit jobs to the client, add to our job list
    • #wait_until_complete=True
    • Run the client
    • Try with 2 workers

  24. Gearman nonblocking client
    • wait_until_complete=False
    • Submit all the jobs
    • wait_until_jobs_completed(jobs)
    • Try with 2 workers
    • Try with 4 or 8 (just like multiprocessing)
    • Annoying to instantiate workers by hand
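
A sketch of the non-blocking pattern with python-gearman's `GearmanClient`. Note that the library spells the keyword `wait_until_complete`. The payload layout, task name, helper names and `localhost:4730` address are assumptions chosen to match a worker that unpickles `(q, maxiter)` and returns a pickled list.

```python
import pickle


def make_payloads(q, maxiter, nbr_chunks):
    """Pickle each chunk of a non-empty q into the payload that Gearman carries."""
    size = -(-len(q) // nbr_chunks)  # ceiling division
    return [pickle.dumps((q[i:i + size], maxiter)) for i in range(0, len(q), size)]


def run_client(q, maxiter, nbr_chunks, host="localhost:4730", task_name="calculate_z"):
    """Submit every chunk without blocking, then wait for all of them together."""
    import gearman  # third-party, imported lazily so the rest runs without it

    client = gearman.GearmanClient([host])
    jobs = [client.submit_job(task_name, payload, wait_until_complete=False)
            for payload in make_payloads(q, maxiter, nbr_chunks)]
    finished = client.wait_until_jobs_completed(jobs)
    output = []
    for request in finished:
        output.extend(pickle.loads(request.result))  # each worker replies with a pickle
    return output
```

Submitting everything first lets as many workers as are registered pick up jobs at once, which is where the 2, 4 or 8 worker comparison comes from.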

  25. Gearman remote workers
    • We should try this (might not work)
    • Someone register a worker to my IP address
    • If I kill mine and I run the client...
    • Do we get cross-network workers?
    • I might need to change 'localhost'

  26. PiCloud
    • AWS EC2 based Python engines
    • Super easy to upload long running (>1hr) jobs, <1hr semi-parallel
    • Can buy lots of cores if you want
    • Has file management using AWS S3
    • More expensive than EC2
    • Billed by millisecond

  27. PiCloud
    • Realtime cores more expensive but as parallel as you need
    • Trivial conversion from multiprocessing
    • 20 free hours per month
    • Execution time must far exceed data transfer time!

  28. IPython Cluster
    • Parallel support inside IPython
    – MPI
    – Portable Batch System
    – Windows HPC Server
    – StarCluster on AWS
    • Can easily push/pull objects around the network
    • 'list comprehensions'/map around engines

  29. IPython Cluster
    $ ipcluster start --n=8
    >>> from IPython.parallel import Client
    >>> c = Client()
    >>> print(c.ids)
    >>> directview = c[:]

  30. IPython Cluster
    • Jobs stored in-memory, sqlite, Mongo
    • $ ipcluster start --n=8
    • $ python ipythoncluster.py
    • Load balanced view more efficient for us
    • Greedy assignment leaves some engines over-burdened due to uneven run times
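
Why load balancing helps with uneven run times can be shown with a toy model. This is a simulation under stated assumptions, not IPython's scheduler: tasks have known durations, the static scheme hands each engine one contiguous block up front, and the balanced scheme gives the next task to whichever engine frees up first.

```python
import heapq


def static_makespan(task_times, nbr_engines):
    """Pre-assign contiguous blocks of tasks; the busiest engine sets the finish time."""
    size = -(-len(task_times) // nbr_engines)  # ceiling division
    return max(sum(task_times[i:i + size]) for i in range(0, len(task_times), size))


def balanced_makespan(task_times, nbr_engines):
    """Hand each task, in order, to whichever engine becomes free first."""
    free_at = [0] * nbr_engines  # time at which each engine next becomes idle
    heapq.heapify(free_at)
    for t in task_times:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + t)
    return max(free_at)


if __name__ == "__main__":
    task_times = [8, 8, 1, 1, 1, 1, 1, 1]  # two slow tasks, six fast ones
    print("static  :", static_makespan(task_times, 2))
    print("balanced:", balanced_makespan(task_times, 2))
```

In IPython the balanced behaviour is what `Client.load_balanced_view()` provides, in contrast to a direct view such as `c[:]`.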

  31. Recommendations
    • Multiprocessing is easy
    • ParallelPython is a trivial step on
    • PiCloud just a step more
    • IPCluster good for interactive research
    • Gearman good for multi-language & redundancy
    • AWS good for big ad-hoc jobs

  32. Bits to consider
    • Cython being wired into Python (GSoC)
    • PyPy advancing nicely
    • GPUs being interwoven with CPUs (APU)
    • Learning how to massively parallelise is the key

  33. Future trends
    • Very-multi-core is obvious
    • Cloud based systems getting easier
    • CUDA-like APU systems are inevitable
    • disco looks interesting, also blaze
    • Celery, R3 are alternatives
    • numpush for local & remote numpy
    • Auto parallelise numpy code?

  34. Job/Contract hunting
    • Computer Vision cloud API start-up (strongsteam.com) didn't go so well
    • Returning to London, open to travel
    • Looking for HPC/Parallel work, also NLP and moving to Big Data

  35. Feedback
    • Write-up: http://ianozsvald.com
    • I want feedback (and a testimonial please)
    • Should I write a book on this?
    [email protected]
    • Thank you :-)
