Slide 1

Slide 1 text

PyPy: becoming fast
Antonio Cuni, Carl Friedrich Bolz, Samuele Pedroni
EuroPython 2009, June 30 2009
antocuni, cfbolz, pedronis (EuroPython 2009) PyPy: becoming fast June 30 2009 1 / 18

Slide 2

Slide 2 text

Current status
5th generation of the JIT
  the right one (hopefully :-))
tracing JIT (like Mozilla TraceMonkey)
up to Nx faster on trivial benchmarks
  N = 10, 20, 30, 60 depending on the moon phase
PyPy evil plan: be consistently faster than CPython in the near future

Slide 3

Slide 3 text

Main ideas (1)
80/20 rule
80% of the time is spent in 20% of the code
Optimize only that 20%

Slide 4

Slide 4 text

Main ideas (2)
That 20% has to be composed of loops
Recognize hot loops
Optimize hot loops
Compile to native code
Execute :-)

Slide 5

Slide 5 text

Recognize hot loops

Example

    def fn(n):
        tot = 0
        while n:
            tot += n
            n -= 1
        return tot

Bytecode

    ...
    LOAD_FAST 1 (tot)
    LOAD_FAST 0 (n)
    INPLACE_ADD
    STORE_FAST 1 (tot)
    LOAD_FAST 0 (n)
    LOAD_CONST 2 (1)
    INPLACE_SUBTRACT
    STORE_FAST 0 (n)
    JUMP_ABSOLUTE 9
    ...
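
One common way for a tracing JIT to recognize a hot loop is to count how often each backward jump (the JUMP_ABSOLUTE in the bytecode above) is taken. The sketch below is a hypothetical toy interpreter for a simplified version of that bytecode, not PyPy's actual implementation; the instruction encoding, the `HOT_THRESHOLD` value and the `run` function are assumptions made for illustration.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value; real JITs use much larger thresholds

# Simplified, made-up encoding of the loop in fn(n); list indices are jump targets.
PROGRAM = [
    ("JUMP_IF_FALSE", "n", 4),    # 0: leave the loop when n is falsy
    ("INPLACE_ADD", "tot", "n"),  # 1: tot += n
    ("INPLACE_SUB", "n", 1),      # 2: n -= 1
    ("JUMP_ABSOLUTE", 0),         # 3: backward jump that closes the loop
    ("RETURN", "tot"),            # 4: return tot
]

def run(n):
    """Interpret PROGRAM; return (result, set of loop headers that became hot)."""
    env = {"n": n, "tot": 0}
    jump_counts = Counter()
    hot_loops = set()
    pc = 0
    while True:
        op = PROGRAM[pc]
        name = op[0]
        if name == "JUMP_IF_FALSE":
            pc = op[2] if not env[op[1]] else pc + 1
        elif name == "INPLACE_ADD":
            env[op[1]] += env[op[2]]
            pc += 1
        elif name == "INPLACE_SUB":
            env[op[1]] -= op[2]
            pc += 1
        elif name == "JUMP_ABSOLUTE":
            target = op[1]
            if target < pc:  # a backward jump means one more loop iteration
                jump_counts[target] += 1
                if jump_counts[target] >= HOT_THRESHOLD:
                    hot_loops.add(target)  # a real JIT would start tracing here
            pc = target
        elif name == "RETURN":
            return env[op[1]], hot_loops
```

Calling `run(10)` takes the backward jump ten times, so the loop starting at index 0 is reported as hot; `run(2)` stays below the threshold and would keep being interpreted.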

Slide 7

Slide 7 text

Tracing
Execute one iteration of the hot loop
Record the operations, as well as the concrete results
Linear
Validity ensured by guards
Recovering logic in case of guard failure
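
The points above can be sketched for the `while n: tot += n; n -= 1` loop. This is a minimal illustration under stated assumptions, not PyPy's tracer: the operation names (`guard_type`, `guard_true`, `add`, `sub_const`), the tuple layout and the functions `record_trace`, `run_trace` and `run_loop` are all hypothetical.

```python
class GuardFailed(Exception):
    """Raised when an assumption recorded in the trace no longer holds."""

def record_trace(n, tot):
    """Execute one loop iteration and record it as a linear list of operations.

    Each entry is (operation, arguments, concrete result seen while tracing).
    """
    assert n, "tracing starts inside the loop, so n must be truthy"
    return [
        ("guard_type", ("n", type(n)), None),
        ("guard_type", ("tot", type(tot)), None),
        ("guard_true", ("n",), None),              # the loop condition held
        ("add", ("tot", "tot", "n"), tot + n),     # tot += n
        ("sub_const", ("n", "n", 1), n - 1),       # n -= 1
    ]

def run_trace(trace, env):
    """Replay the trace once; guards check that it is still valid."""
    for op, args, _concrete in trace:
        if op == "guard_type":
            name, expected = args
            if type(env[name]) is not expected:
                raise GuardFailed(op, name)
        elif op == "guard_true":
            if not env[args[0]]:
                raise GuardFailed(op, args[0])
        elif op == "add":
            dest, left, right = args
            env[dest] = env[left] + env[right]
        elif op == "sub_const":
            dest, left, const = args
            env[dest] = env[left] - const

def run_loop(trace, n, tot):
    """Run the trace repeatedly until a guard fails, then hand the state back."""
    env = {"n": n, "tot": tot}
    while True:
        try:
            run_trace(trace, env)
        except GuardFailed:
            # Recovery: all guards sit at the start of this trace, so env is
            # consistent and a real system would resume in the interpreter.
            return env
```

A trace recorded with `record_trace(3, 0)` can then be reused for other integer inputs, for example `run_loop(trace, 5, 0)`, while a float input fails the first type guard immediately and leaves the state untouched.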

Slide 8

Slide 8 text

Tracing example (shown on slides 8 to 14, pages 6 / 18 to 12 / 18; only the title survives, the figure content is not recoverable)

Slide 15

Slide 15 text

Post-tracing phase
Generalize or specialize?
Generalized loops can be used more often
Specialized loops are more efficient
A trace is super-specialized

Slide 16

Slide 16 text

Perfect specialization
Generalize the trace...
...but not too much
Most general trace which is specialized enough to be efficient
e.g.: turn Python int into C-level words
  specialized: it works only with int (and not e.g. float)
  general: it works with all int :-)
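
The difference between a super-specialized trace and a "perfectly" specialized one can be illustrated by the entry check each would need. The sketch below is an assumption-laden illustration, not PyPy code: the function names and the sample inputs are hypothetical, and it ignores the overflow checks that real C-level integer arithmetic would require.

```python
def super_specialized_guard(n, tot):
    # Valid only for the exact values seen while tracing (here n=3, tot=0).
    return n == 3 and tot == 0

def perfectly_specialized_guard(n, tot):
    # Valid for every pair of ints, but not for e.g. floats.
    return type(n) is int and type(tot) is int

# Hypothetical inputs that later calls to the loop might bring.
inputs = [(3, 0), (10, 0), (1000, 5), (2.5, 0)]

accepted_by_exact = [args for args in inputs if super_specialized_guard(*args)]
accepted_by_type = [args for args in inputs if perfectly_specialized_guard(*args)]
```

The value-based check accepts only the input seen during tracing, while the type-based check accepts all three integer inputs and still rejects the float, which is the trade-off the slide describes.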

Slide 17

Slide 17 text

Optimization phase
Remove superfluous operations
Constant folding
Escape analysis: remove unneeded allocations
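
A minimal sketch of two of these optimizations on a toy trace, assuming a made-up trace format of `(result, operation, arguments)` tuples. The operation names (`int_add`, `new_box`, `get_value`, `call`) and the `optimize` function are hypothetical and do not reflect PyPy's real intermediate representation.

```python
def optimize(trace):
    """Return an equivalent trace with constants folded and unneeded boxes removed."""
    known = {}    # variable name -> constant or replacement variable
    virtual = {}  # box name -> value it would hold (allocation delayed)
    out = []

    def resolve(arg):
        # Names may have a known replacement; non-strings are already constants.
        return known.get(arg, arg) if isinstance(arg, str) else arg

    for res, op, args in trace:
        args = tuple(resolve(a) for a in args)
        if op == "int_add" and all(isinstance(a, int) for a in args):
            known[res] = args[0] + args[1]   # constant folding
        elif op == "new_box":
            virtual[res] = args[0]           # do not allocate yet
        elif op == "get_value" and args[0] in virtual:
            known[res] = virtual[args[0]]    # read through the virtual box
        else:
            for a in args:
                if a in virtual:             # the box escapes: allocate it now
                    out.append((a, "new_box", (virtual.pop(a),)))
            out.append((res, op, args))
    return out

TRACE = [
    ("i1", "int_add", (2, 3)),        # both operands constant
    ("p1", "new_box", ("i0",)),       # box the input value i0
    ("i2", "get_value", ("p1",)),     # immediately unbox it again
    ("i3", "int_add", ("i2", "i1")),
]

OPTIMIZED = optimize(TRACE)
```

In this toy case the four recorded operations collapse into a single addition of `i0` and the constant 5; if the box were passed to some other operation, the sketch would emit the allocation just before that use instead of dropping it.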

Slide 18

Slide 18 text

Code generation
In theory: the easy part
Theory != practice
The current x86 backend produces suboptimal code
  but not too bad :-)
x86-64: not yet, but relatively low effort
super-experimental CLI/.NET backend
Contributors welcome :-)

Slide 19

Slide 19 text

CLI JIT backend
JIT-over-JIT
  emit .NET bytecode
  which is then compiled by .NET’s own JIT
current status: as fast as IronPython on trivial benchmarks
will be faster than IronPython in the future
extremely good results in JIT v2
it makes a dynamic toy language:
  as fast as C# for numerical benchmarks
  faster than C# for some OO benchmarks

Slide 21

Slide 21 text

Contact / Q&A
PyPy: http://codespeak.net/pypy
Blog: http://morepypy.blogspot.com