Python and PyPy performance (not) for dummies

https://www.youtube.com/watch?v=BZJ6q4eihCs
This talk gives a short introduction to how Python programs are compiled and executed, with special attention to the just-in-time compilation done by PyPy. PyPy is the most advanced Python interpreter around, and while it should generally just speed up your programs, the performance you can get out of it varies widely, from slightly faster than CPython to C speeds, depending on how you write your programs.

We will split the talk into two parts. In the first part we will explain how things work, what can and cannot be optimized, and the basic heuristics of the JIT compiler and optimizer. In the second part we will survey existing tools for examining the performance of Python programs, with a specific focus on PyPy.

After this talk, audience members should be better equipped to write new software and improve existing software with performance in mind.

The talk will be given by Antonio Cuni and Maciej Fijałkowski, both long-time PyPy core developers and experts in the area of Python performance.

Antonio Cuni

July 21, 2015

Transcript

  1. Python and PyPy performance
    (not) for dummies
    Antonio Cuni and Maciej Fijałkowski
    EuroPython 2015
    July 21, 2015
    antocuni,fijal (EuroPython 2015) Python and PyPy performance July 21, 2015 1 / 31

  2. About us
    PyPy core devs
    vmprof, cffi, pdb++, fancycompleter, ...
    Consultants
    http://baroquesoftware.com/

  3. Optimization for dummies
    Obligatory citation
    premature optimization is the root of all evil (D. Knuth)
    Pareto principle, or 80-20 rule
    80% of the time will be spent in 20% of the program
    20% of 1 million lines is 200 000 lines
    Two golden rules:
    1. Identify the slow spots
    2. Optimize them

  4. This talk
    Two parts
    1. How to identify the slow spots
    2. How to address the problems

  5. Part 1
    identifying the slow spots

  6. What is performance?
    something quantifiable by numbers
    usually, time spent doing task X
    number of requests, latency, etc.
    statistical properties about that metric
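A minimal sketch of turning "time spent doing task X" into numbers with statistical properties, using only the standard library. The `task` function and the sample counts are hypothetical placeholders, not something from the talk.

```python
import statistics
import timeit


def task():
    # Hypothetical stand-in for "task X"; replace with the code you care about.
    return sum(i * i for i in range(10000))


CALLS_PER_SAMPLE = 20
# Five independent samples, each timing CALLS_PER_SAMPLE calls of task().
samples = timeit.repeat(task, number=CALLS_PER_SAMPLE, repeat=5)
per_call = [s / CALLS_PER_SAMPLE for s in samples]

# Report more than one number: a single timing says little on its own.
summary = {
    "min": min(per_call),
    "median": statistics.median(per_call),
    "stdev": statistics.stdev(per_call),
}
for name, seconds in summary.items():
    print("%-6s %.6f s" % (name, seconds))
```

On a JIT such as PyPy the first samples may include warm-up, so it can help to look at the whole distribution rather than only the minimum.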

  7. Do you have a performance problem?
    what you’re trying to measure
    means to measure it (production, benchmarks, etc.)
    is Python the cause here?
    environment to quickly measure and check the results
    same as for debugging

  8. When Python is the problem
    tools, timers etc.
    systems are too complicated to guess which will be faster
    find your bottlenecks
    20/80 (but 20% of a million lines is 200 000 lines, remember that)

  9. Profilers landscape
    cProfile, runSnakeRun (high overhead) - event-based profilers
    plop, vmprof - statistical profilers
    cProfile & vmprof work on pypy
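As an illustration of the event-based approach, here is a small sketch using the standard-library `cProfile` and `pstats` modules. The functions being profiled (`slow_function`, `fast_function`, `main`) are made-up examples.

```python
import cProfile
import io
import pstats


def slow_function():
    # Deliberately does most of the work.
    total = 0
    for i in range(200000):
        total += i * i
    return total


def fast_function():
    return 42


def main():
    fast_function()
    return slow_function()


# cProfile records an event for every function call and return,
# which is where its overhead comes from.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the report into a string, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

For a whole script, `python -m cProfile -s cumulative script.py` produces a similar report without code changes.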

  10. vmprof
    inspired by gperftools
    statistical profiler run by an interrupt (~300Hz on modern Linux)
    sampling the C stack
    CPython, PyPy, possibly more virtual machines
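The idea of a statistical profiler driven by a timer interrupt can be sketched in pure Python. This is not how vmprof is implemented (vmprof samples the C stack); it is a toy, Unix-only illustration that samples the Python frame from a `SIGPROF` handler. The names `sample_profile` and `busy` and the 2 ms interval are assumptions made for this example.

```python
import collections
import signal


def sample_profile(func, interval=0.002):
    """Run func() and count which Python function was executing at each tick."""
    counts = collections.Counter()

    def on_sample(signum, frame):
        # `frame` is the Python frame that was running when the timer fired.
        if frame is not None:
            counts[frame.f_code.co_name] += 1

    old_handler = signal.signal(signal.SIGPROF, on_sample)
    # ITIMER_PROF counts CPU time used by the process and delivers SIGPROF.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop the timer
        signal.signal(signal.SIGPROF, old_handler)
    return counts


def busy():
    # CPU-bound work so that the timer has something to sample.
    total = 0
    for i in range(3000000):
        total += i * i
    return total


counts = sample_profile(busy)
for name, hits in counts.most_common():
    print("%-20s %d samples" % (name, hits))
```

Because work is only done when the timer fires, the cost is proportional to the sampling rate rather than to the number of function calls, which is why this style of profiler can have low overhead.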

  11. why not gperftools?
    C stack does not contain python-level frames
    90% PyEval_EvalFrame and other internals
    we want python-level functions
    picture is even more confusing in the presence of the JIT

  12. using vmprof
    demo
    http://vmprof.readthedocs.org

  13. using vmprof in production
    low overhead (5-10%), possibly lower in the future
    possibility of realtime monitoring (coming)

  14. vmprof future
    profiler as a service
    realtime advanced visualization

  15. Part 2
    Make it fast

  16. Tools
    Endless list of tools/techniques to increase speed
    C extension
    Cython
    numba
    "performance tricks"
    PyPy
    We’ll concentrate on it
    WARNING: we wrote it, we are biased :)
    gives you most wins for free (*)

  17. What is PyPy
    Alternative, fast Python implementation
    Performance: JIT compiler, advanced GC
    PyPy 2.6.0 (Python version 2.7.9)
    Py3k as usual in progress (3.2.5 out, 3.3 in development)
    http://pypy.org
    EP Talks:
    The GIL is dead: PyPy-STM
    (July 23, 16:45 by Armin Rigo)
    PyPy ecosystem: CFFI, numpy, scipy, etc
    (July 24, 15:15 by Romain Guillebert)

  18. Speed: 7x faster than CPython

  19. The JIT
    def main():
        init()
        some_quick_code()
        for x in large_list:
            do_something(x)
        some_other_code()
        while condition():
            expensive_computation()

  20. The JIT
    def main():
        init()
        some_quick_code()
        for x in large_list:
            do_something(x)
        some_other_code()
        while condition():
            expensive_computation()
    [Slide annotation: NO JIT]

  21. The JIT
    def main():
        init()
        some_quick_code()
        for x in large_list:
            do_something(x)
        some_other_code()
        while condition():
            expensive_computation()
    [Slide annotations: "assembler" (twice); legend: NO JIT, JIT]

  22. JIT overview
    Tracing JIT
    detect and compile "hot" code
    Specialization
    Precompute as much as possible
    Constant propagation
    Aggressive inlining

  23. Specialization (1)
    obj.foo()
    which code is executed? (SIMPLIFIED)
    lookup foo in obj.__dict__
    lookup foo in obj.__class__
    lookup foo in obj.__bases__[0], etc.
    finally, execute foo
    without JIT, you need to do these steps again and again
    Precompute the lookup?
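The simplified lookup steps above can be sketched in Python roughly as follows. The helper `simplified_lookup` and the classes `Base` and `Child` are hypothetical; like the slide, the sketch ignores data descriptors, `__getattr__` and other details of the real attribute protocol.

```python
def simplified_lookup(obj, name):
    """Roughly what obj.<name> has to do on every call without a JIT."""
    # 1. Look in the instance dictionary.
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]
    # 2. Look in the class, then in each base class, in MRO order.
    for cls in type(obj).__mro__:
        if name in cls.__dict__:
            attr = cls.__dict__[name]
            # Plain functions are bound to the instance here.
            if hasattr(attr, "__get__"):
                return attr.__get__(obj, type(obj))
            return attr
    raise AttributeError(name)


class Base(object):
    def foo(self):
        return "Base.foo"


class Child(Base):
    pass


obj = Child()
# Found on the base class after missing in obj.__dict__ and Child.__dict__.
print(simplified_lookup(obj, "foo")())
```

Every step is a dictionary lookup whose result almost never changes between calls, which is what makes precomputing it attractive.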

  24. Specialization (2)
    pretend and assume that obj.__class__ IS constant
    "promotion"
    guard
    check our assumption: if it’s false, bail out
    now we can directly jump to foo code
    ...unless foo is in obj.__dict__: GUARD!
    ...unless foo.__class__.__dict__ changed: GUARD!
    Too many guard failures?
    Compile some more assembler!
    guards are cheap
    out-of-line guards even more
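A toy model of promotion and guards, written in plain Python. It is only meant to illustrate the idea and is not PyPy's implementation: the `CallSite` class, its attributes and the example classes `A` and `B` are invented for this sketch, and the guard on the class dictionary changing is not modeled.

```python
class CallSite(object):
    """Caches the method found for one receiver class at one call site."""

    def __init__(self, name):
        self.name = name
        self.expected_class = None   # the "promoted" class
        self.cached_func = None      # precomputed lookup result
        self.guard_failures = 0      # also counts the very first call

    def call(self, obj):
        # Guards: same class as assumed, and no shadowing instance attribute.
        if type(obj) is self.expected_class and self.name not in vars(obj):
            return self.cached_func(obj)          # fast path: direct jump
        # A guard failed: bail out to the generic path.
        self.guard_failures += 1
        if self.name in vars(obj):
            return vars(obj)[self.name]()         # instance attribute wins
        # Re-specialize for the class we are seeing now.
        self.expected_class = type(obj)
        self.cached_func = getattr(type(obj), self.name)
        return self.cached_func(obj)


class A(object):
    def foo(self):
        return "A.foo"


class B(object):
    def foo(self):
        return "B.foo"


site = CallSite("foo")
a = A()
for _ in range(3):
    site.call(a)                 # only the first call takes the slow path
print(site.guard_failures)       # 1
site.call(B())                   # different class: guard fails
print(site.guard_failures)       # 2
```

The two `if` checks on the fast path are much cheaper than repeating the full lookup, which is the sense in which guards are cheap.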

  25. Specialization (3)
    who decides what to promote/specialize for?
    we, the PyPy devs :)
    heuristics
    instance attributes are never promoted
    class attributes are promoted by default (with some exceptions)
    module attributes (i.e., globals) as well
    bytecode constants

  26. Specialization trade-offs
    Too much specialization
    guards fail often
    explosion of assembler
    Not enough specialization
    inefficient code

  27. Guido's points

  28. Don’t do it on PyPy (or at all)
    simple is better than complicated
    avoid string concatenation in the loop
    avoid replacing simple loop with itertools monsters
    "move stuff to C" is (almost) never a good idea
    use cffi when calling C
    avoid C extensions using CPython C API
    avoid creating classes at runtime
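To make the string concatenation advice concrete, here is a small sketch contrasting the two styles. The function names are hypothetical. Repeated `+=` may copy the growing string on each iteration, while `str.join` builds the result once; how large the difference is depends on the interpreter and the data, so it is worth measuring.

```python
def build_with_concat(parts):
    # Pattern the slide advises against: the string grows inside the loop.
    result = ""
    for part in parts:
        result += part
    return result


def build_with_join(parts):
    # Collect the pieces and join them once at the end.
    return "".join(parts)


parts = [str(i) for i in range(1000)]
print(build_with_concat(parts) == build_with_join(parts))
```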

  29. Example
    map(operator.attrgetter('x'), list)
    vs
    [x.x for x in list]
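A runnable version of this comparison, for measuring it on your own interpreter. The `Point` class, the list size and the repetition count are assumptions made for the sketch; no particular timing result is implied.

```python
import operator
import timeit


class Point(object):
    def __init__(self, x):
        self.x = x


points = [Point(i) for i in range(1000)]


def with_map():
    # list() is needed on Python 3, where map() returns a lazy iterator.
    return list(map(operator.attrgetter("x"), points))


def with_comprehension():
    return [p.x for p in points]


if __name__ == "__main__":
    for func in (with_map, with_comprehension):
        seconds = timeit.timeit(func, number=1000)
        print("%-20s %.4f s" % (func.__name__, seconds))
```

Both functions return the same list; the comprehension is the "simple loop" form that the previous slide recommends keeping.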

  30. More about PyPy
    we are going to run a PyPy open space (tomorrow 18:00 @ A4)
    come ask more questions

  31. Q&A?
    Thank you!
    http://baroquesoftware.com
    http://pypy.org
    http://vmprof.readthedocs.org