Slide 1

Slide 1 text

PyPy JIT under the hood Antonio Cuni PyCon UK 2012 September 28, 2012 antocuni (PyCon UK 2012) PyPy JIT under the hood September 28, 2012 1 / 29

Slide 2

Slide 2 text

About me
- PyPy core dev
- PyPy py3k tech leader
- pdb++, fancycompleter, ...
- Consultant, trainer
- You can hire me :-)
- http://antocuni.eu

Slide 3

Slide 3 text

About this talk
- What is PyPy? (in 30 seconds)
- Overview of tracing JITs
- The PyPy JIT generator

Slide 4

Slide 4 text

Part 0: What is PyPy?
- RPython toolchain
  - subset of Python
  - ideal for writing VMs
  - JIT & GC for free
- Python interpreter
  - written in RPython
- Whatever (dynamic) language you want
  - Smalltalk, Prolog, JavaScript, ...

Slide 5

Slide 5 text

Part 1: Overview of tracing JITs

Slide 6

Slide 6 text

Compilers
- When?
  - Batch or Ahead Of Time
  - Just In Time
- How?
  - Static
  - Dynamic or Adaptive
- What?
  - Method-based compiler
  - Tracing compiler
- PyPy: JIT, Dynamic, Tracing

Slide 10

Slide 10 text

Assumptions
- Pareto Principle (80-20 rule)
  - 20% of the program accounts for 80% of the runtime
  - hot-spots
- Fast Path principle
  - optimize only what is necessary
  - fall back for uncommon cases
- Most of the runtime is spent in loops
- Always the same code paths (likely)

Slide 12

Slide 12 text

Tracing JIT
- Interpret the program as usual
- Detect hot loops
- Tracing phase
  - linear trace
- Compiling
- Execute
  - guards to ensure correctness
- Profit :-)
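The first steps above (interpret, detect a hot loop, trace once, then run compiled code) can be sketched with a per-loop counter. This is only a toy model: `HotLoopDetector`, `HOT_THRESHOLD`, the `"loop@0"` label and the `interpret` function are hypothetical names made up for illustration, and the threshold is far smaller than anything a real JIT would use.

```python
from collections import defaultdict

HOT_THRESHOLD = 3  # hypothetical value, chosen small so the example is easy to follow


class HotLoopDetector:
    """Counts how often each loop header (a backward-jump target) is reached."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counters = defaultdict(int)

    def reached(self, loop_header):
        """Return True exactly once, at the moment the loop becomes hot."""
        self.counters[loop_header] += 1
        return self.counters[loop_header] == self.threshold


def interpret(n, detector):
    """Toy 'interpreter' for `while i < n: i += 1`; logs the phase of each iteration."""
    phases = []
    i = 0
    mode = "interpret"
    while i < n:
        if mode == "interpret" and detector.reached("loop@0"):
            mode = "trace"       # hot loop detected: record one iteration
        phases.append(mode)
        i += 1
        if mode == "trace":
            mode = "compiled"    # trace closed: from now on run the compiled loop
    return phases


phases = interpret(6, HotLoopDetector())
```

Here the first two iterations are interpreted, the third is traced, and the rest run in "compiled" mode. Guards and fallbacks are left out of this sketch.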

Slide 20

Slide 20 text

Tracing JIT phases
- States: Interpretation, Tracing, Compilation, Running
- Transition labels: hot loop detected; cold guard failed; entering compiled loop; guard failure → hot; hot guard failed

Slide 21

Slide 21 text

Tracing Example (1)

Java

    interface Operation {
        int DoSomething(int x);
    }

    class IncrOrDecr implements Operation {
        public int DoSomething(int x) {
            if (x < 0) return x-1;
            else return x+1;
        }
    }

    class tracing {
        public static void main(String argv[]) {
            int N = 100;
            int i = 0;
            Operation op = new IncrOrDecr();
            while (i < N) {
                i = op.DoSomething(i);
            }
            System.out.println(i);
        }
    }

Slide 22

Slide 22 text

Tracing Example (2)

Java bytecode

    class IncrOrDecr {
      ...
      public DoSomething(I)I
        ILOAD 1
        IFGE LABEL_0
        ILOAD 1
        ICONST_1
        ISUB
        IRETURN
       LABEL_0
        ILOAD 1
        ICONST_1
        IADD
        IRETURN
    }

Java bytecode

    class tracing {
      ...
      public static main([Ljava/lang/String;)V
        ...
       LABEL_0
        ILOAD 2
        ILOAD 1
        IF_ICMPGE LABEL_1
        ALOAD 3
        ILOAD 2
        INVOKEINTERFACE Operation.DoSomething (I)I
        ISTORE 2
        GOTO LABEL_0
       LABEL_1
        ...
    }

Slide 29

Slide 29 text

Tracing example (3)

Legend: one text style marks instructions executed but not recorded; another marks instructions added to the trace but not executed.

Method       Java code                 Trace                      Value
Main         while (i < N) {           ILOAD 2                    3
                                       ILOAD 1                    100
                                       IF_ICMPGE LABEL_1          false
                                       GUARD_ICMPLT
             i = op.DoSomething(i);    ALOAD 3                    IncrOrDecr obj
                                       ILOAD 2                    3
                                       INVOKEINTERFACE ...
                                       GUARD_CLASS(IncrOrDecr)
DoSomething  if (x < 0)                ILOAD 1                    3
                                       IFGE LABEL_0               true
                                       GUARD_GE
             return x+1;               ILOAD 1                    3
                                       ICONST_1                   1
                                       IADD                       4
                                       IRETURN
Main         i = op.DoSomething(i);    ISTORE 2                   4
             }                         GOTO LABEL_0
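The recorded trace can be read as a straight-line function in which every branch of the original program has become a guard. The sketch below renders that idea in Python rather than Java bytecode; `GuardFailed`, `run_trace` and `main` are hypothetical names for illustration, and falling back to a plain loop stands in for resuming the interpreter.

```python
class GuardFailed(Exception):
    """Raised when a guard in the compiled trace does not hold."""


class IncrOrDecr:
    def do_something(self, x):
        return x - 1 if x < 0 else x + 1


def run_trace(i, n, op):
    """One iteration of the recorded trace; the method call has been inlined."""
    if not i < n:                     # GUARD_ICMPLT
        raise GuardFailed("loop condition no longer holds")
    if type(op) is not IncrOrDecr:    # GUARD_CLASS(IncrOrDecr)
        raise GuardFailed("unexpected receiver class")
    if not i >= 0:                    # GUARD_GE
        raise GuardFailed("took the x < 0 branch")
    return i + 1                      # IADD, then ISTORE 2


def main(n=100):
    i, op = 0, IncrOrDecr()
    try:
        while True:                   # GOTO LABEL_0
            i = run_trace(i, n, op)
    except GuardFailed:
        # Fall back to "interpretation" of the original loop.
        while i < n:
            i = op.do_something(i)
    return i
```

Only the path that was actually executed while tracing (`x >= 0`, receiver of class `IncrOrDecr`) is in the fast path; everything else leaves through a guard.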

Slide 30

Slide 30 text

Trace trees (1)

tracetree.java

    public static void trace_trees() {
        int a = 0;
        int i = 0;
        int N = 100;
        while (i < N) {
            if (i%2 == 0)
                a++;
            else
                a*=2;
            i++;
        }
    }

Slide 35

Slide 35 text

Trace trees (2)

    ILOAD 1
    ILOAD 2
    GUARD_ICMPLT
    ILOAD 1
    ICONST_2
    IREM
    GUARD_NE
    ILOAD 0
    ICONST_2
    IMUL
    ISTORE 0
    IINC 1 1

Diagram labels: BLACKHOLE, INTERPRETER

Slide 40

Slide 40 text

Trace trees (2)

    ILOAD 1
    ILOAD 2
    GUARD_ICMPLT
    ILOAD 1
    ICONST_2
    IREM
    GUARD_NE
    ILOAD 0
    ICONST_2
    IMUL
    ISTORE 0
    IINC 1 1

Additional trace:

    IINC 0 1
    IINC 1 1
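One way to picture how the second trace comes to exist: count the failures of each guard and, once a guard fails often enough, compile a side trace (a "bridge") for it. The sketch below is a toy model of that idea for the `trace_trees` loop; the `Guard` class, `BRIDGE_THRESHOLD` and the `log` argument are hypothetical and only there to make the phases visible.

```python
BRIDGE_THRESHOLD = 2  # hypothetical: failures needed before a guard counts as hot


class Guard:
    def __init__(self, name):
        self.name = name
        self.failures = 0
        self.bridge = None  # side trace attached once the guard is hot


def trace_trees(n, log):
    guard_ne = Guard("GUARD_NE")
    a = i = 0
    while i < n:
        if i % 2 != 0:
            a *= 2                      # main trace: the guard holds
            log.append("loop")
        elif guard_ne.bridge is not None:
            a = guard_ne.bridge(a)      # hot guard: jump to the attached bridge
            log.append("bridge")
        else:
            guard_ne.failures += 1      # cold guard failure: leave compiled code
            a += 1                      # ... and let the fallback path do a++
            log.append("fallback")
            if guard_ne.failures >= BRIDGE_THRESHOLD:
                guard_ne.bridge = lambda value: value + 1   # stands in for IINC 0 1
        i += 1
    return a


log = []
result = trace_trees(6, log)
```

With the threshold of 2, the first two even iterations go through the fallback and later ones use the bridge, while the computed value stays the same as in the original loop.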

Slide 41

Slide 41 text

Part 2: The PyPy JIT generator

Slide 49

Slide 49 text

General architecture

RPYTHON

    def LOAD_GLOBAL(self):
        ...
    def STORE_FAST(self):
        ...
    def BINARY_ADD(self):
        ...

CODEWRITER

JITCODE

    ...
    p0 = getfield_gc(p0, 'func_globals')
    p2 = getfield_gc(p1, 'strval')
    call(dict_lookup, p0, p2)
    ....

    ...
    p0 = getfield_gc(p0, 'locals_w')
    setarrayitem_gc(p0, i0, p1)
    ....

    ...
    promote_class(p0)
    i0 = getfield_gc(p0, 'intval')
    promote_class(p1)
    i1 = getfield_gc(p1, 'intval')
    i2 = int_add(i0, i1)
    if (overflowed) goto ...
    p2 = new_with_vtable('W_IntObject')
    setfield_gc(p2, i2, 'intval')
    ....

compile-time / runtime

META-TRACER
OPTIMIZER
BACKEND
ASSEMBLER

Slide 55

Slide 55 text

PyPy trace example

    def fn():
        c = a+b
        ...

    LOAD_GLOBAL A
    LOAD_GLOBAL B
    BINARY_ADD
    STORE_FAST C

    ...
    p0 = getfield_gc(p0, 'func_globals')
    p2 = getfield_gc(p1, 'strval')
    call(dict_lookup, p0, p2)
    ...

    ...
    p0 = getfield_gc(p0, 'func_globals')
    p2 = getfield_gc(p1, 'strval')
    call(dict_lookup, p0, p2)
    ...

    ...
    guard_class(p0, W_IntObject)
    i0 = getfield_gc(p0, 'intval')
    guard_class(p1, W_IntObject)
    i1 = getfield_gc(p1, 'intval')
    i2 = int_add(i0, i1)
    guard_not_overflow()
    p2 = new_with_vtable('W_IntObject')
    setfield_gc(p2, i2, 'intval')
    ...

    ...
    p0 = getfield_gc(p0, 'locals_w')
    setarrayitem_gc(p0, i0, p1)
    ....

Slide 56

Slide 56 text

PyPy optimizer
- intbounds
- constant folding / pure operations
- virtuals
- string optimizations
- heap (multiple get/setfield, etc)
- ffi
- unroll

Slide 57

Slide 57 text

Intbound optimization (1)

intbound.py

    def fn():
        i = 0
        while i < 5000:
            i += 2
        return i

Slide 58

Slide 58 text

Intbound optimization (2)

unoptimized

    ...
    i17 = int_lt(i15, 5000)
    guard_true(i17)
    i19 = int_add_ovf(i15, 2)
    guard_no_overflow()
    ...

optimized

    ...
    i17 = int_lt(i15, 5000)
    guard_true(i17)
    i19 = int_add(i15, 2)
    ...

- It works often
- array bound checking
- intbound info propagates all over the trace
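The rewrite above can be reproduced with a small pass that remembers upper bounds learned from guards. This is a minimal sketch, not PyPy's optimizer: the tuple encoding of operations, the `optimize_intbounds` name and the use of `sys.maxsize` as the word limit are assumptions made for illustration, and only `int_lt` against a constant and `int_add_ovf` with a non-negative constant are handled.

```python
import sys

MAXINT = sys.maxsize  # stand-in for the machine word limit


def optimize_intbounds(trace):
    upper = {}        # variable -> known inclusive upper bound
    comparisons = {}  # result of int_lt -> (variable, constant)
    out = []
    skip_guard = False
    for op in trace:
        # Only the operation right after a rewritten add may lose its guard.
        drop_guard, skip_guard = skip_guard, False
        name = op[0]
        if name == "int_lt":
            comparisons[op[1]] = (op[2], op[3])
        elif name == "guard_true" and op[1] in comparisons:
            var, const = comparisons[op[1]]
            upper[var] = const - 1            # past the guard, var < const
        elif name == "int_add_ovf":
            _, res, var, const = op
            if var in upper and 0 <= const <= MAXINT - upper[var]:
                out.append(("int_add", res, var, const))   # cannot overflow
                upper[res] = upper[var] + const
                skip_guard = True
                continue
        elif name == "guard_no_overflow" and drop_guard:
            continue                          # guard is provably redundant
        out.append(op)
    return out


unoptimized = [
    ("int_lt", "i17", "i15", 5000),
    ("guard_true", "i17"),
    ("int_add_ovf", "i19", "i15", 2),
    ("guard_no_overflow",),
]
optimized = optimize_intbounds(unoptimized)
```

Because the result `i19` also gets a bound, later operations on it can benefit too, which is the sense in which the information propagates through the trace.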

Slide 61

Slide 61 text

Virtuals (1)

virtuals.py

    def fn():
        i = 0
        while i < 5000:
            i += 2
        return i

Slide 62

Slide 62 text

Virtuals (2)

unoptimized

    ...
    guard_class(p0, W_IntObject)
    i1 = getfield_pure(p0, 'intval')
    i2 = int_add(i1, 2)
    p3 = new(W_IntObject)
    setfield_gc(p3, i2, 'intval')
    ...

optimized

    ...
    i2 = int_add(i1, 2)
    ...

- The most important optimization (TM)
- It works both inside the trace and across the loop
- It works for tons of cases
  - e.g. function frames
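The "inside the trace" case can be sketched as a pass that never emits an allocation it can fully track. This is a simplified illustration under stated assumptions: the tuple encoding and the `remove_virtuals` name are hypothetical, the tracked object never escapes (a real optimizer has to materialize it when it does), and a `guard_class` on a tracked object is assumed to match its allocation class.

```python
def remove_virtuals(trace):
    virtuals = {}   # pointer variable -> {"class": name, "fields": {field: value}}
    alias = {}      # variable -> the variable it is known to be equal to
    out = []
    for op in trace:
        name = op[0]
        if name == "new":
            _, res, cls = op
            virtuals[res] = {"class": cls, "fields": {}}     # allocate nothing yet
        elif name == "setfield_gc" and op[1] in virtuals:
            _, ptr, field, value = op
            virtuals[ptr]["fields"][field] = alias.get(value, value)
        elif name == "guard_class" and op[1] in virtuals:
            pass    # the class of a virtual is already known (assumed to match)
        elif name == "getfield_gc" and op[2] in virtuals:
            _, res, ptr, field = op
            alias[res] = virtuals[ptr]["fields"][field]      # read becomes an alias
        else:
            # Emit the operation, replacing arguments by what they alias.
            out.append(tuple(alias.get(arg, arg) for arg in op))
    return out


# Box i2 into a W_IntObject, then immediately unbox it and add 2.
trace = [
    ("new", "p3", "W_IntObject"),
    ("setfield_gc", "p3", "intval", "i2"),
    ("guard_class", "p3", "W_IntObject"),
    ("getfield_gc", "i4", "p3", "intval"),
    ("int_add", "i5", "i4", 2),
]
optimized = remove_virtuals(trace)
```

The allocation, the field write, the guard and the field read all disappear; only the addition on the unboxed integer is left.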

Slide 65

Slide 65 text

Constant folding (1)

constfold.py

    def fn():
        i = 0
        while i < 5000:
            i += 2
        return i

Slide 66

Slide 66 text

Constant folding (2)

unoptimized

    ...
    i1 = getfield_pure(p0, 'intval')
    i2 = getfield_pure(, 'intval')
    i3 = int_add(i1, i2)
    ...

optimized

    ...
    i1 = getfield_pure(p0, 'intval')
    i3 = int_add(i1, 2)
    ...

- It “finishes the job”
- Works well together with other optimizations (e.g. virtuals)
- It also does “normal, boring, static” constant-folding
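The "normal, boring, static" part can be sketched as a pass that evaluates pure operations whose arguments are all known and substitutes the result into later operations. The tuple encoding, `PURE_OPS` and `fold_constants` are hypothetical names for illustration; every operation in this toy encoding has a result variable.

```python
import operator

# Pure integer operations the sketch knows how to evaluate ahead of time.
PURE_OPS = {"int_add": operator.add, "int_sub": operator.sub, "int_mul": operator.mul}


def fold_constants(trace):
    consts = {}   # variable -> value known at optimization time
    out = []
    for name, res, *args in trace:
        # Replace variables whose value is already known by that value.
        args = [consts.get(a, a) if isinstance(a, str) else a for a in args]
        if name in PURE_OPS and all(isinstance(a, int) for a in args):
            consts[res] = PURE_OPS[name](*args)   # folded: emit nothing
        else:
            out.append((name, res, *args))
    return out


trace = [
    ("int_add", "i2", 1, 1),          # both arguments are constants
    ("int_add", "i3", "i1", "i2"),    # i2 is known once the line above is folded
]
optimized = fold_constants(trace)
```

The first operation vanishes and the second one receives the constant directly, which mirrors the `int_add(i1, 2)` in the optimized listing.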

Slide 69

Slide 69 text

Out of line guards (1)

outoflineguards.py

    N = 2
    def fn():
        i = 0
        while i < 5000:
            i += N
        return i

Slide 70

Slide 70 text

Out of line guards (2)

unoptimized

    ...
    quasiimmut_field(, 'val')
    guard_not_invalidated()
    p0 = getfield_gc(, 'val')
    ...
    i2 = getfield_pure(p0, 'intval')
    i3 = int_add(i1, i2)

optimized

    ...
    guard_not_invalidated()
    ...
    i3 = int_add(i1, 2)
    ...

- Python is too dynamic, but we don’t care :-)
- No overhead in assembler code
- Used a bit “everywhere”
- Credits to Mark Shannon for the name :-)
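The idea can be approximated in plain Python: the rarely written value keeps a list of the compiled loops that baked it in, and a write invalidates them instead of each loop iteration re-checking the value. Everything here (`QuasiImmutable`, `CompiledLoop`, the `loops` dictionary) is a hypothetical model; in particular, checking `valid` on loop entry only stands in for whatever mechanism the real backend uses.

```python
class QuasiImmutable:
    """A cell that is almost never written; compiled loops may assume its value."""

    def __init__(self, value):
        self._value = value
        self._watchers = []

    def get(self, watcher):
        self._watchers.append(watcher)   # remember who relies on the value
        return self._value

    def set(self, value):
        self._value = value
        for loop in self._watchers:      # the check happens here, on the write
            loop.valid = False
        self._watchers = []


class CompiledLoop:
    def __init__(self, cell):
        self.valid = True
        self.n = cell.get(self)          # value baked in as a constant

    def run(self, i, limit):
        while i < limit:                 # no per-iteration check of the cell
            i += self.n
        return i


def fn(cell, loops, limit=5000):
    loop = loops.get("fn")
    if loop is None or not loop.valid:
        loop = loops["fn"] = CompiledLoop(cell)   # (re)compile with the new value
    return loop.run(0, limit)


N = QuasiImmutable(2)
loops = {}
first_result = fn(N, loops)
first_loop = loops["fn"]
N.set(3)                                 # rare write: invalidates the old loop
second_result = fn(N, loops)
```

The cost of dynamism is paid when `N` is rebound, not on every iteration of the loop that reads it.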

Slide 73

Slide 73 text

Guards
- guard_true
- guard_false
- guard_class
- guard_no_overflow
- guard_value

Slide 74

Slide 74 text

Promotion
- guard_value
- specialize code
- make sure not to overspecialize
- example: type of objects
- example: function code objects, ...
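Why overspecialization matters can be shown with a toy dispatcher that keeps one specialized version per promoted value. The `specialize_on` helper and its `versions` dictionary are hypothetical; creating a lambda merely stands in for compiling a new trace behind a `guard_value`.

```python
def specialize_on(key_fn, generic):
    """Keep one 'compiled' version of `generic` per value of the promoted key."""
    versions = {}

    def dispatch(*args):
        key = key_fn(*args)                  # the value being promoted
        if key not in versions:              # no existing guard_value matches
            versions[key] = lambda *a: generic(*a)   # stand-in for a new trace
        return versions[key](*args)

    dispatch.versions = versions
    return dispatch


def add(a, b):
    return a + b


by_type = specialize_on(lambda a, b: (type(a), type(b)), add)   # promote the types
by_value = specialize_on(lambda a, b: a, add)                   # promote the value

for i in range(100):
    by_type(i, 1)
    by_value(i, 1)
```

Promoting the types of the arguments needs a single version for all 100 calls, while promoting the value of `a` produces a separate version for every call.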

Slide 75

Slide 75 text

Conclusion
- PyPy is cool :-)
- Any questions?