PyPy JIT Under the Hood

Antonio Cuni
September 28, 2012

Transcript

  1. PyPy JIT under the hood
    Antonio Cuni
    PyCon UK 2012
    September 28, 2012
    antocuni (PyCon UK 2012) PyPy JIT under the hood September 28, 2012 1 / 29

  2. About me
    PyPy core dev
    PyPy py3k tech leader
    pdb++, fancycompleter, ...
    Consultant, trainer
    You can hire me :-)
    http://antocuni.eu

  3. About this talk
    What is PyPy? (in 30 seconds)
    Overview of tracing JITs
    The PyPy JIT generator

  4. Part 0: What is PyPy?
    RPython toolchain
        subset of Python
        ideal for writing VMs
        JIT & GC for free
    Python interpreter
        written in RPython
    Whatever (dynamic) language you want
        Smalltalk, Prolog, JavaScript, ...

  5. Part 1
    Overview of tracing JITs

  6. Compilers
    When?
        Batch or Ahead Of Time
        Just In Time
    How?
        Static
        Dynamic or Adaptive
    What?
        Method-based compiler
        Tracing compiler
    PyPy: JIT, Dynamic, Tracing

  10. Assumptions
    Pareto Principle (80-20 rule)
        20% of the program accounts for 80% of the runtime
        hot-spots
    Fast Path principle
        optimize only what is necessary
        fall back for uncommon cases
    Most of the runtime is spent in loops
    Always the same code paths (likely)

  12. Tracing JIT
    Interpret the program as usual
    Detect hot loops
    Tracing phase
    linear trace
    Compiling
    Execute
    guards to ensure correctness
    Profit :-)
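
The "detect hot loops" step can be sketched with a per-loop counter. This is a minimal illustration, not PyPy's implementation: `HotLoopDetector`, `THRESHOLD`, and the string used as a loop key are all hypothetical names chosen for the example.

```python
from collections import Counter

THRESHOLD = 3  # hypothetical value; a real JIT uses a much larger, tunable one


class HotLoopDetector:
    """Counts how often each loop header is reached (illustrative only)."""

    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()

    def reached(self, loop_key):
        """Record one pass through a loop header; return True once it is hot."""
        self.counts[loop_key] += 1
        return self.counts[loop_key] >= self.threshold


detector = HotLoopDetector()
# The interpreter would call reached() every time it jumps back to the loop header.
events = [detector.reached("main:while") for _ in range(4)]
```

Once `reached()` returns True, the interpreter would switch from plain interpretation to the tracing phase for that loop.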

  20. Tracing JIT phases
    Interpretation
    Tracing
    hot loop detected
    Compilation
    Running
    cold guard failed
    entering compiled loop
    guard failure → hot
    hot guard failed
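
The phase diagram can be read as a small state machine. The sketch below assumes the arrows connect the phases as listed in `TRANSITIONS`; the event names "trace finished" and "code ready" do not appear on the slide and are hypothetical labels for the unlabelled arrows.

```python
# Assumed reading of the diagram: (current phase, event) -> next phase.
TRANSITIONS = {
    ("interpretation", "hot loop detected"): "tracing",
    ("interpretation", "entering compiled loop"): "running",
    ("tracing", "trace finished"): "compilation",   # hypothetical event name
    ("compilation", "code ready"): "running",       # hypothetical event name
    ("running", "cold guard failed"): "interpretation",
    ("running", "hot guard failed"): "tracing",
}


def step(phase, event):
    """Return the next phase, or raise KeyError for an unknown transition."""
    return TRANSITIONS[(phase, event)]


phase = "interpretation"
history = [phase]
for event in ["hot loop detected", "trace finished", "code ready",
              "cold guard failed", "entering compiled loop", "hot guard failed"]:
    phase = step(phase, event)
    history.append(phase)
```

A guard that fails rarely ("cold") just falls back to the interpreter, while a guard that fails often becomes "hot" and triggers a new tracing phase starting from that guard.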

  21. Tracing Example (1)
    java
    interface Operation {
        int DoSomething(int x);
    }
    class IncrOrDecr implements Operation {
        public int DoSomething(int x) {
            if (x < 0) return x-1;
            else return x+1;
        }
    }
    class tracing {
        public static void main(String argv[]) {
            int N = 100;
            int i = 0;
            Operation op = new IncrOrDecr();
            while (i < N) {
                i = op.DoSomething(i);
            }
            System.out.println(i);
        }
    }

  22. Tracing Example (2)
    Java bytecode
    class IncrOrDecr {
        ...
        public DoSomething(I)I
            ILOAD 1
            IFGE LABEL_0
            ILOAD 1
            ICONST_1
            ISUB
            IRETURN
          LABEL_0
            ILOAD 1
            ICONST_1
            IADD
            IRETURN
    }
    Java bytecode
    class tracing {
        ...
        public static main([Ljava/lang/String;)V
            ...
          LABEL_0
            ILOAD 2
            ILOAD 1
            IF_ICMPGE LABEL_1
            ALOAD 3
            ILOAD 2
            INVOKEINTERFACE Operation.DoSomething (I)I
            ISTORE 2
            GOTO LABEL_0
          LABEL_1
            ...
    }

  29. Tracing example (3)
    INSTR: Instruction executed but not recorded
    INSTR: Instruction added to the trace but not executed
    Method       Java code                 Trace                      Value
    Main         while (i < N) {           ILOAD 2                    3
                                           ILOAD 1                    100
                                           IF_ICMPGE LABEL_1          false
                                           GUARD_ICMPLT
                 i = op.DoSomething(i);    ALOAD 3                    IncrOrDecr obj
                                           ILOAD 2                    3
                                           INVOKEINTERFACE ...
                                           GUARD_CLASS(IncrOrDecr)
    DoSomething  if (x < 0)                ILOAD 1                    3
                                           IFGE LABEL_0               true
                                           GUARD_GE
                 return x+1;               ILOAD 1                    3
                                           ICONST_1                   1
                                           IADD                       4
                                           IRETURN
    Main         i = op.DoSomething(i);    ISTORE 2                   4
                 }                         GOTO LABEL_0
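
The recording step in this example can be sketched in Python. This is a hand-written tracer for this one loop, under the assumption that each branch actually taken is replaced by a guard and the call is inlined behind a class guard; the tuple-based trace format and the name `trace_one_iteration` are hypothetical.

```python
class IncrOrDecr:
    def do_something(self, x):
        return x - 1 if x < 0 else x + 1


def trace_one_iteration(i, n, op):
    """Run one iteration of `while i < n: i = op.do_something(i)` and
    record a linear trace in which each taken branch becomes a guard."""
    assert i < n, "tracing starts inside the loop"
    trace = [("guard_lt", "i", "n")]                      # loop condition held
    # Interface call: guard on the receiver's class, then inline the body.
    trace.append(("guard_class", "op", type(op).__name__))
    result = op.do_something(i)                           # actually execute it
    if i >= 0:
        trace += [("guard_ge", "i", 0), ("int_add", "i", 1)]
    else:
        trace += [("guard_lt", "i", 0), ("int_sub", "i", 1)]
    return result, trace


new_i, trace = trace_one_iteration(3, 100, IncrOrDecr())
```

The resulting trace is linear: it contains no branches or calls, only guards and the arithmetic of the path that was actually taken.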

  30. Trace trees (1)
    tracetree.java
    public static void trace_trees() {
        int a = 0;
        int i = 0;
        int N = 100;
        while (i < N) {
            if (i%2 == 0)
                a++;
            else
                a*=2;
            i++;
        }
    }

  31. Trace trees (2)
    ILOAD 1
    ILOAD 2
    GUARD_ICMPLT
    ILOAD 1
    ICONST_2
    IREM
    GUARD_NE
    ILOAD 0
    ICONST_2
    IMUL
    ISTORE 0
    IINC 1 1

  35. Trace trees (2)
    ILOAD 1
    ILOAD 2
    GUARD_ICMPLT
    ILOAD 1
    ICONST_2
    IREM
    GUARD_NE
    ILOAD 0
    ICONST_2
    IMUL
    ISTORE 0
    IINC 1 1
    BLACKHOLE
    INTERPRETER

  38. Trace trees (2)
    ILOAD 1
    ILOAD 2
    GUARD_ICMPLT
    ILOAD 1
    ICONST_2
    IREM
    GUARD_NE
    ILOAD 0
    ICONST_2
    IMUL
    ISTORE 0
    IINC 1 1
    IINC 0 1
    IINC 1 1
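
The effect of attaching a second trace (a bridge) to a failing guard can be emulated in plain Python. This is only a sketch of the control flow, assuming the main trace covers the `a*=2` path and the bridge covers the `a++` path; `run_loop` and `bridge_taken` are hypothetical names.

```python
def run_loop(a, i, n):
    """Emulates the compiled code for trace_trees(): a main trace for the
    odd-i path plus a bridge attached to the failing GUARD_NE."""
    bridge_taken = 0
    while True:
        if not i < n:          # GUARD_ICMPLT fails: leave the compiled code
            return a, i, bridge_taken
        if i % 2 != 0:         # GUARD_NE holds: stay on the main trace
            a *= 2             # ILOAD 0, ICONST_2, IMUL, ISTORE 0
            i += 1             # IINC 1 1
        else:                  # GUARD_NE fails: jump to the bridge
            bridge_taken += 1
            a += 1             # IINC 0 1
            i += 1             # IINC 1 1


final_a, final_i, bridges = run_loop(0, 0, 10)
```

With the bridge in place, a failure of GUARD_NE no longer has to leave compiled code and fall back to the interpreter.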

  41. Part 2
    The PyPy JIT generator

  49. General architecture
    RPYTHON
        def LOAD_GLOBAL(self):
            ...
        def STORE_FAST(self):
            ...
        def BINARY_ADD(self):
            ...
    CODEWRITER
    JITCODE
        ...
        p0 = getfield_gc(p0, 'func_globals')
        p2 = getfield_gc(p1, 'strval')
        call(dict_lookup, p0, p2)
        ...
        ...
        p0 = getfield_gc(p0, 'locals_w')
        setarrayitem_gc(p0, i0, p1)
        ...
        ...
        promote_class(p0)
        i0 = getfield_gc(p0, 'intval')
        promote_class(p1)
        i1 = getfield_gc(p1, 'intval')
        i2 = int_add(i0, i1)
        if (overflowed) goto ...
        p2 = new_with_vtable('W_IntObject')
        setfield_gc(p2, i2, 'intval')
        ...
    compile-time
    runtime
    META-TRACER
    OPTIMIZER
    BACKEND
    ASSEMBLER

  55. PyPy trace example
    def fn():
        c = a+b
        ...
    LOAD_GLOBAL A
    LOAD_GLOBAL B
    BINARY_ADD
    STORE_FAST C
    ...
    p0 = getfield_gc(p0, 'func_globals')
    p2 = getfield_gc(p1, 'strval')
    call(dict_lookup, p0, p2)
    ...
    ...
    p0 = getfield_gc(p0, 'func_globals')
    p2 = getfield_gc(p1, 'strval')
    call(dict_lookup, p0, p2)
    ...
    ...
    guard_class(p0, W_IntObject)
    i0 = getfield_gc(p0, 'intval')
    guard_class(p1, W_IntObject)
    i1 = getfield_gc(p1, 'intval')
    i2 = int_add(i0, i1)
    guard_no_overflow()
    p2 = new_with_vtable('W_IntObject')
    setfield_gc(p2, i2, 'intval')
    ...
    ...
    p0 = getfield_gc(p0, 'locals_w')
    setarrayitem_gc(p0, i0, p1)
    ...

  56. PyPy optimizer
    intbounds
    constant folding / pure operations
    virtuals
    string optimizations
    heap (multiple get/setfield, etc)
    ffi
    unroll

  57. Intbound optimization (1)
    intbound.py
    def fn():
        i = 0
        while i < 5000:
            i += 2
        return i

  58. Intbound optimization (2)
    unoptimized
    ...
    i17 = int_lt(i15, 5000)
    guard_true(i17)
    i19 = int_add_ovf(i15, 2)
    guard_no_overflow()
    ...
    optimized
    ...
    i17 = int_lt(i15, 5000)
    guard_true(i17)
    i19 = int_add(i15, 2)
    ...
    It works often
    array bound checking
    intbound info propagates all over the trace
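
The reasoning behind dropping the overflow check can be sketched as interval arithmetic. This is an illustrative model only: the `IntBound` class and its methods here are hypothetical stand-ins written for this example, not the optimizer's actual code.

```python
import sys

MAXINT = sys.maxsize        # largest machine-word integer
MININT = -sys.maxsize - 1   # smallest machine-word integer


class IntBound:
    """Closed interval [lower, upper] known for an integer variable."""

    def __init__(self, lower=MININT, upper=MAXINT):
        self.lower = lower
        self.upper = upper

    def make_lt(self, const):
        """Narrow the bound after a passed guard on `value < const`."""
        self.upper = min(self.upper, const - 1)

    def add_cannot_overflow(self, const):
        """True if `value + const` provably fits in a machine word."""
        return MININT <= self.lower + const and self.upper + const <= MAXINT


i15 = IntBound()
before = i15.add_cannot_overflow(2)   # nothing known yet
i15.make_lt(5000)                     # guard_true(int_lt(i15, 5000)) passed
after = i15.add_cannot_overflow(2)    # int_add_ovf can become int_add
```

After the guard, `i15` is known to be at most 4999, so `i15 + 2` cannot overflow and the `guard_no_overflow()` is redundant.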

  61. Virtuals (1)
    virtuals.py
    def fn():
        i = 0
        while i < 5000:
            i += 2
        return i

  62. Virtuals (2)
    unoptimized
    ...
    guard_class(p0, W_IntObject)
    i1 = getfield_pure(p0, 'intval')
    i2 = int_add(i1, 2)
    p3 = new(W_IntObject)
    setfield_gc(p3, i2, 'intval')
    ...
    optimized
    ...
    i2 = int_add(i1, 2)
    ...
    The most important optimization (TM)
    It works both inside the trace and across the loop
    It works for tons of cases
    e.g. function frames
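
A toy version of the idea: objects allocated inside the trace are kept "virtual" and their fields are tracked by the optimizer, so the allocation and field accesses disappear. The sketch assumes the object never escapes the trace (a real optimizer has to materialize it when it does); the tuple format `(opname, args...)` and `remove_virtuals` are hypothetical.

```python
def remove_virtuals(trace):
    """Drop allocations and field accesses on objects created in the trace."""
    virtuals = {}   # box name -> {field: value} for objects not yet allocated
    alias = {}      # result box -> box or constant that replaces it
    out = []
    for op in trace:
        name, args = op[0], [alias.get(a, a) for a in op[1:]]
        if name == "new":
            virtuals[args[0]] = {}                 # remember it, emit nothing
        elif name == "setfield" and args[0] in virtuals:
            obj, field, value = args
            virtuals[obj][field] = value           # track the field contents
        elif name == "getfield" and args[1] in virtuals:
            result, obj, field = args
            alias[result] = virtuals[obj][field]   # reuse the tracked value
        else:
            out.append((name, *args))
    return out


trace = [
    ("int_add", "i2", "i1", 2),
    ("new", "p3", "W_IntObject"),
    ("setfield", "p3", "intval", "i2"),
    ("getfield", "i4", "p3", "intval"),
    ("int_add", "i5", "i4", 2),
]
optimized = remove_virtuals(trace)
```

Only the two additions remain; the boxed integer between them is never allocated.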

  65. Constant folding (1)
    constfold.py
    def fn():
        i = 0
        while i < 5000:
            i += 2
        return i

  66. Constant folding (2)
    unoptimized
    ...
    i1 = getfield_pure(p0, 'intval')
    i2 = getfield_pure(, 'intval')
    i3 = int_add(i1, i2)
    ...
    optimized
    ...
    i1 = getfield_pure(p0, 'intval')
    i3 = int_add(i1, 2)
    ...
    It “finishes the job”
    Works well together with other optimizations (e.g. virtuals)
    It also does “normal, boring, static” constant-folding
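
Constant folding over a trace can be sketched as follows. The trace format `(result, opname, args...)`, the `PURE_OPS` table, and `fold_constants` are assumptions made for this example; the point is only that a pure operation on known constants is computed at optimization time and removed from the trace.

```python
import operator

PURE_OPS = {"int_add": operator.add, "int_mul": operator.mul}


def fold_constants(trace):
    """Fold pure ops whose arguments are all known integer constants."""
    known = {}   # box name -> constant value
    out = []
    for result, opname, *args in trace:
        # Replace boxes already known to be constant by their value.
        args = [known.get(a, a) if isinstance(a, str) else a for a in args]
        if opname in PURE_OPS and all(isinstance(a, int) for a in args):
            known[result] = PURE_OPS[opname](*args)   # computed right now
        else:
            out.append((result, opname, *args))
    return out


trace = [
    ("i2", "int_add", 1, 1),         # both arguments constant: folded away
    ("i3", "int_add", "i1", "i2"),   # i2 is now the constant 2
]
optimized = fold_constants(trace)
```

The surviving operation matches the shape of the optimized trace on the slide: the variable operand stays, the constant one is inlined.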

  69. Out of line guards (1)
    outoflineguards.py
    N = 2
    def fn():
        i = 0
        while i < 5000:
            i += N
        return i

  70. Out of line guards (2)
    unoptimized
    ...
    quasiimmut_field(, 'val')
    guard_not_invalidated()
    p0 = getfield_gc(, 'val')
    ...
    i2 = getfield_pure(p0, 'intval')
    i3 = int_add(i1, i2)
    optimized
    ...
    guard_not_invalidated()
    ...
    i3 = int_add(i1, 2)
    ...
    Python is too dynamic, but we don’t care :-)
    No overhead in assembler code
    Used a bit “everywhere”
    Credits to Mark Shannon for the name :-)
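
The idea of moving the check out of the loop and onto the writer can be modelled in plain Python. This is a conceptual sketch, not how the JIT is implemented: `QuasiImmutable`, `CompiledLoop`, and `read_for_jit` are hypothetical names.

```python
class QuasiImmutable:
    """A value that rarely changes; writers invalidate dependent code."""

    def __init__(self, value):
        self._value = value
        self._dependents = []

    def read_for_jit(self, compiled):
        """Return the current value and register `compiled` for invalidation."""
        self._dependents.append(compiled)
        return self._value

    def write(self, value):
        self._value = value
        for compiled in self._dependents:   # the cost is paid here, on write,
            compiled.valid = False          # not inside the compiled loop
        self._dependents.clear()


class CompiledLoop:
    def __init__(self, cell):
        self.valid = True
        self.n = cell.read_for_jit(self)    # constant baked into the loop

    def run(self, i, limit):
        while i < limit:
            i += self.n                     # no per-iteration check of N
        return i


N = QuasiImmutable(2)
loop = CompiledLoop(N)
result = loop.run(0, 5000)
N.write(3)                                  # rebinding N invalidates the loop
```

After the write, `loop.valid` is False, so the stale code would be discarded before it can run again with the old constant.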

  73. Guards
    guard_true
    guard_false
    guard_class
    guard_no_overflow
    guard_value

  74. Promotion
    guard_value
    specialize code
    make sure not to overspecialize
    example: type of objects
    example: function code objects, ...
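
Promotion can be pictured as compiling a version of the code for the one value seen while tracing, protected by a guard. The sketch below is a hypothetical model in plain Python; `specialize` and `GuardFailed` are names invented for the example.

```python
class GuardFailed(Exception):
    """Raised when a promoted value differs from the one seen while tracing."""


def specialize(n_seen):
    """Build a version of `x * n` that assumes n == n_seen."""
    def compiled(x, n):
        if n != n_seen:          # plays the role of guard_value(n, n_seen)
            raise GuardFailed(n)
        return x * n_seen        # n_seen is a constant in this version
    return compiled


times3 = specialize(3)
ok = times3(5, 3)
try:
    times3(5, 4)                 # a different value makes the guard fail
    failed = False
except GuardFailed:
    failed = True
```

Each distinct value needs its own specialized version, which is why promoting something that varies a lot leads to overspecialization.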

  75. Conclusion
    PyPy is cool :-)
    Any questions?