Writing a basic x86-64 JIT compiler from scratch in stock Python

Shows how to JIT compile simple Python functions to native x86-64 machine code at runtime. Everything is done from scratch, using nothing but the built-in Python modules.

Christian Stigen Larsen

January 23, 2018

Transcript

  1. Writing a basic x86-64 JIT compiler from scratch in stock Python
    Christian Stigen Larsen, 2018-01-23
    https://csl.name

  11. from jitcompiler import *

    @jit
    def foo(a, b):
        return a*a - b*b

    >>> foo(2, 3)
    --- JIT-compiling
    -5
    >>> foo(3, 4)
    -7
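
The session above suggests that the decorator compiles lazily, on the first call, and reuses the result afterwards. A minimal sketch of that pattern follows. It is not the talk's implementation: `compile_native` is a hypothetical stand-in for the real compiler and simply returns the original function, so only the caching behaviour is shown.

```python
import functools


def compile_native(function):
    # Hypothetical stand-in: a real implementation would return a callable
    # backed by freshly generated machine code.
    return function


def jit(function):
    compiled = []  # filled in on the first call

    @functools.wraps(function)
    def wrapper(*args):
        if not compiled:
            print("--- JIT-compiling")
            compiled.append(compile_native(function))
        return compiled[0](*args)

    return wrapper


@jit
def foo(a, b):
    return a*a - b*b


print(foo(2, 3))  # compiles on this first call
print(foo(3, 4))  # reuses the compiled callable
```

Deferring the work to the first call keeps import time cheap and means functions that are never called are never compiled.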

  13. >>> print(disassemble(foo))
    0x100b1d000  48 89 fb     mov rbx, rdi
    0x100b1d003  48 89 f8     mov rax, rdi
    0x100b1d006  48 0f af c3  imul rax, rbx
    0x100b1d00a  50           push rax
    0x100b1d00b  48 89 f3     mov rbx, rsi
    0x100b1d00e  48 89 f0     mov rax, rsi
    0x100b1d011  48 0f af c3  imul rax, rbx
    0x100b1d015  48 89 c3     mov rbx, rax
    0x100b1d018  58           pop rax
    0x100b1d019  48 29 d8     sub rax, rbx
    0x100b1d01c  c3           ret

  16-26. Strategy
    [Diagram, built up step by step: the Python bound name foo (from def foo) initially refers to a Python function containing bytecode. The bytecode is translated to IR, then to optimized IR, then to x86-64 machine code. The machine code bytes are written into a memory page mapped Read + Write, the page is then switched to Read + Execute, and the bound name is finally attached to the native code through an FFI.]

  28. Intermission: Why JIT?
    Because we can, and it’s awesome.

  32. Python Bytecode
    def foo(a, b):
        return a*a - b*b

    >>> import dis
    >>> dis.dis(foo)
      2           0 LOAD_FAST                0 (a)
                  3 LOAD_FAST                0 (a)
                  6 BINARY_MULTIPLY
                  7 LOAD_FAST                1 (b)
                 10 LOAD_FAST                1 (b)
                 13 BINARY_MULTIPLY
                 14 BINARY_SUBTRACT
                 15 RETURN_VALUE
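
The offsets in the listing above (0, 3, 6, 7, ...) match the Python 2 layout, where an instruction with an argument occupies three bytes. Newer CPython releases encode instructions differently and have renamed or merged some opcodes, so the exact output depends on the interpreter. As a hedged aside that is not part of the talk, `dis.get_instructions` (Python 3.4 and later) gives a version-independent way to walk the same information:

```python
import dis


def foo(a, b):
    return a*a - b*b


# Print offset, opcode name and a readable argument for each instruction.
# The names printed depend on the running CPython version.
for ins in dis.get_instructions(foo):
    print(ins.offset, ins.opname, ins.argrepr)

opnames = [ins.opname for ins in dis.get_instructions(foo)]
```

A compiler that targets one specific bytecode format, like the one in this deck, has to be pinned to the interpreter version it was written for.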

  40-49. Stack contents after each instruction (most recent push on the right)
     0 LOAD_FAST        0 (a)    a
     3 LOAD_FAST        0 (a)    a  a
     6 BINARY_MULTIPLY           a*a
     7 LOAD_FAST        1 (b)    a*a  b
    10 LOAD_FAST        1 (b)    a*a  b  b
    13 BINARY_MULTIPLY           a*a  b*b
    14 BINARY_SUBTRACT           a*a-b*b
    15 RETURN_VALUE              a*a-b*b
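
The stack walk-through above can be reproduced with a few lines of Python. This is an illustrative sketch, not code from the talk: `run` and `FOO` are names chosen here, and only the four opcodes used by foo are handled.

```python
def run(program, variables):
    """Evaluate (opname, arg) pairs on an explicit value stack."""
    stack = []
    for opname, arg in program:
        if opname == "LOAD_FAST":
            stack.append(variables[arg])
        elif opname == "BINARY_MULTIPLY":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opname == "BINARY_SUBTRACT":
            right, left = stack.pop(), stack.pop()
            stack.append(left - right)
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise NotImplementedError(opname)
        print("%-16s %s" % (opname, stack))  # stack after this instruction


FOO = [
    ("LOAD_FAST", 0), ("LOAD_FAST", 0), ("BINARY_MULTIPLY", None),
    ("LOAD_FAST", 1), ("LOAD_FAST", 1), ("BINARY_MULTIPLY", None),
    ("BINARY_SUBTRACT", None), ("RETURN_VALUE", None),
]

result = run(FOO, (2, 3))
print(result)
```

Note the operand order for subtraction: the value pushed last is the right-hand operand.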

  62. 0 LOAD_FAST 0 (a)
    >>> foo.__code__.co_code
    '|\x00\x00|\x00\x00\x14|\x01\x00|\x01\x00\x14\x18S'
    >>> bytecode = map(ord, foo.__code__.co_code)
    >>> opcode = bytecode[0]
    >>> opcode
    124
    >>> dis.opname[opcode]
    'LOAD_FAST'
    >>> arg = bytecode[1] | bytecode[2] << 8
    >>> arg
    0
    >>> foo.__code__.co_varnames[arg]
    'a'
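
The session above is Python 2, where `co_code` is a `str` and `map` returns a list. In Python 3, indexing a `bytes` object already yields integers, so `map(ord, ...)` is unnecessary. The sketch below decodes the Python 2 byte string for foo under Python 3. The table `PY27_OPNAMES` is a hand-written assumption covering only the four opcodes in this example (the `dis.opname` of a newer interpreter does not describe Python 2 bytecode), and 90 is the `dis.HAVE_ARGUMENT` threshold of Python 2.7.

```python
# Opcode numbers as defined by Python 2.7; only the ones used by foo.
PY27_OPNAMES = {
    20: "BINARY_MULTIPLY",
    24: "BINARY_SUBTRACT",
    83: "RETURN_VALUE",
    124: "LOAD_FAST",
}
HAVE_ARGUMENT = 90  # opcodes at or above this value carry a 16-bit argument

CODE = b"|\x00\x00|\x00\x00\x14|\x01\x00|\x01\x00\x14\x18S"


def decode(code):
    """Yield (opname, arg) pairs from Python 2.7 style bytecode."""
    index = 0
    while index < len(code):
        opcode = code[index]
        index += 1
        arg = None
        if opcode >= HAVE_ARGUMENT:
            # Little-endian 16-bit argument follows the opcode byte.
            arg = code[index] | code[index + 1] << 8
            index += 2
        yield PY27_OPNAMES[opcode], arg


decoded = list(decode(CODE))
for opname, arg in decoded:
    print(opname, arg)
```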

  67. >>> def bar(): return 123
    >>> dis.dis(bar)
      1           0 LOAD_CONST               1 (123)
                  3 RETURN_VALUE
    >>> bar.__code__.co_consts[1]
    123

  75. class Compiler(object):
        # …
        def fetch(self):
            byte = self.bytecode[self.index]
            self.index += 1
            return byte

        def decode(self):
            opcode = self.fetch()
            opname = dis.opname[opcode]
            if takes_arg(opname):
                arg = self.fetch() | self.fetch() << 8
            else:
                arg = None
            return opname, arg

  77. def compile(self):
        while self.index < len(self.bytecode):
            op, arg = self.decode()
            if op == "LOAD_FAST":
                yield "push", self.variable(arg), None
            elif …

  78-88. IR & Assembly
    yield "push", self.variable(arg), None

    [Diagram: the registers rax, rbx, rcx, rsi and rdi next to a stack that initially holds only the return address. The push instruction places the value of rdi on top of the stack.]

    def variable(self, index):
        passing_order = ("rdi", "rsi", "rdx", "rcx")
        return passing_order[index]

  99. def baz(a, b): a = b

    Bytecode                IR
    LOAD_FAST   1 (b)       push rsi
    STORE_FAST  0 (a)       pop rax
                            mov rdi, rax
    LOAD_CONST  0 (None)    imm rax, 0
                            push rax
    RETURN_VALUE            pop rax
                            ret

  100-119. def baz(a, b): a = b
    >>> baz(11, 22)

    Instruction     rax   rsi   rdi   Stack (most recent push last)
    (on entry)      -     22    11    return address
    push rsi        -     22    11    return address, 22
    pop rax         22    22    11    return address
    mov rdi, rax    22    22    22    return address
    imm rax, 0      0     22    22    return address
    push rax        0     22    22    return address, 0
    pop rax         0     22    22    return address
    ret             0     22    22    (empty)
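
The register and stack trace for baz can be checked with a toy emulator. This is a sketch for illustration only: `execute` and `BAZ` are names chosen here, the IR is written as (op, a, b) tuples in the style of the compile method shown earlier, and registers are plain Python integers without 64-bit wraparound.

```python
def execute(ir, rdi=0, rsi=0):
    """Run IR tuples (op, a, b) on a toy register machine."""
    regs = {"rax": 0, "rbx": 0, "rdi": rdi, "rsi": rsi}
    stack = ["return address"]
    for op, a, b in ir:
        if op == "push":
            stack.append(regs[a])
        elif op == "pop":
            regs[a] = stack.pop()
        elif op == "mov":
            regs[a] = regs[b]
        elif op == "imm":
            regs[a] = b
        elif op == "imul":
            regs[a] *= regs[b]
        elif op == "sub":
            regs[a] -= regs[b]
        elif op == "ret":
            # The stack must be balanced when returning.
            assert stack.pop() == "return address"
            return regs
        else:
            raise NotImplementedError(op)
        print(op, a, b, regs, stack)
    raise RuntimeError("fell off the end without ret")


BAZ = [
    ("push", "rsi", None), ("pop", "rax", None), ("mov", "rdi", "rax"),
    ("imm", "rax", 0), ("push", "rax", None), ("pop", "rax", None),
    ("ret", None, None),
]

regs = execute(BAZ, rdi=11, rsi=22)
print(regs["rax"], regs["rdi"], regs["rsi"])
```

The return value of the native function is whatever is left in rax when ret executes, which is why RETURN_VALUE is translated to pop rax followed by ret.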

  120-128. Bytecode to IR translation table
    LOAD_FAST                             push <reg>
    STORE_FAST                            pop rax
                                          mov <reg>, rax
    LOAD_CONST                            imm rax, <const>
                                          push rax
    BINARY_MULTIPLY                       pop rax
                                          pop rbx
                                          imul rax, rbx
                                          push rax
    BINARY_ADD / INPLACE_ADD              pop rax
                                          pop rbx
                                          add rax, rbx
                                          push rax
    BINARY_SUBTRACT / INPLACE_SUBTRACT    pop rbx
                                          pop rax
                                          sub rax, rbx
                                          push rax
    UNARY_NEGATIVE                        pop rax
                                          neg rax
                                          push rax
    RETURN_VALUE                          pop rax
                                          ret
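
The mapping from bytecode to IR is mechanical enough to be driven by a lookup table. The sketch below is one possible way to write it and is not the talk's code: `translate`, `FIXED` and `ARGUMENT_REGISTERS` are names chosen here, the register order is the one from the variable method shown earlier, and mapping the constant None to 0 is an assumption based on the baz example.

```python
ARGUMENT_REGISTERS = ("rdi", "rsi", "rdx", "rcx")

# Expansions that do not depend on the instruction argument.
FIXED = {
    "BINARY_MULTIPLY": [("pop", "rax", None), ("pop", "rbx", None),
                        ("imul", "rax", "rbx"), ("push", "rax", None)],
    "BINARY_ADD": [("pop", "rax", None), ("pop", "rbx", None),
                   ("add", "rax", "rbx"), ("push", "rax", None)],
    "BINARY_SUBTRACT": [("pop", "rbx", None), ("pop", "rax", None),
                        ("sub", "rax", "rbx"), ("push", "rax", None)],
    "UNARY_NEGATIVE": [("pop", "rax", None), ("neg", "rax", None),
                       ("push", "rax", None)],
    "RETURN_VALUE": [("pop", "rax", None), ("ret", None, None)],
}
FIXED["INPLACE_ADD"] = FIXED["BINARY_ADD"]
FIXED["INPLACE_SUBTRACT"] = FIXED["BINARY_SUBTRACT"]


def translate(program, constants=()):
    """Turn (opname, arg) pairs into a list of IR tuples (op, a, b)."""
    ir = []
    for opname, arg in program:
        if opname == "LOAD_FAST":
            ir.append(("push", ARGUMENT_REGISTERS[arg], None))
        elif opname == "STORE_FAST":
            ir.append(("pop", "rax", None))
            ir.append(("mov", ARGUMENT_REGISTERS[arg], "rax"))
        elif opname == "LOAD_CONST":
            value = constants[arg]
            # Assumption: None is represented as 0, as in the baz example.
            ir.append(("imm", "rax", 0 if value is None else value))
            ir.append(("push", "rax", None))
        elif opname in FIXED:
            ir.extend(FIXED[opname])
        else:
            raise NotImplementedError(opname)
    return ir


BAZ = [("LOAD_FAST", 1), ("STORE_FAST", 0),
       ("LOAD_CONST", 0), ("RETURN_VALUE", None)]
baz_ir = translate(BAZ, constants=(None,))
for instruction in baz_ir:
    print(instruction)
```

Subtraction pops rbx before rax because the right-hand operand was pushed last; for the commutative operations the order does not matter.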

  130-147. Peephole Optimization

    Original IR       Optimized           Optimized further
    push rsi          mov rax, rsi        mov rdi, rsi
    pop rax
    mov rdi, rax      mov rdi, rax
    imm rax, 0        imm rax, 0          imm rax, 0
    push rax
    pop rax
    ret               ret                 ret
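
The two rewriting steps above can be expressed as pattern matches over a small window of adjacent instructions. The following is a sketch under stated assumptions rather than the talk's optimizer: `peephole` and `optimize` are names chosen here, and the second rule only fires when the temporary register is overwritten by the very next instruction, which is what makes dropping the intermediate mov safe in the baz example.

```python
def peephole(ir):
    """One pass over IR tuples (op, a, b); returns a new list."""
    nothing = (None, None, None)
    out = []
    i = 0
    while i < len(ir):
        op, a, b = ir[i]
        nxt = ir[i + 1] if i + 1 < len(ir) else nothing
        after = ir[i + 2] if i + 2 < len(ir) else nothing
        if op == "push" and nxt[0] == "pop":
            # push x; pop y  ->  mov y, x  (nothing at all when x == y)
            if nxt[1] != a:
                out.append(("mov", nxt[1], a))
            i += 2
        elif (op == "mov" and nxt[0] == "mov" and nxt[2] == a
              and after[0] == "imm" and after[1] == a):
            # mov t, x; mov y, t; imm t, c  ->  mov y, x; imm t, c
            # Safe because t is overwritten immediately afterwards.
            out.append(("mov", nxt[1], b))
            i += 2
        else:
            out.append((op, a, b))
            i += 1
    return out


def optimize(ir):
    """Repeat the pass until the IR stops changing."""
    while True:
        improved = peephole(ir)
        if improved == ir:
            return improved
        ir = improved


BAZ_IR = [
    ("push", "rsi", None), ("pop", "rax", None), ("mov", "rdi", "rax"),
    ("imm", "rax", 0), ("push", "rax", None), ("pop", "rax", None),
    ("ret", None, None),
]
first_pass = peephole(BAZ_IR)
optimized = optimize(BAZ_IR)
for instruction in optimized:
    print(instruction)
```

Running to a fixed point matters here: the mov pair only becomes visible after the first pass has replaced push rsi and pop rax with a single mov.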

  152. Machine Code Generation
    class Assembler(object):
        # …
        def emit(self, *args):
            for code in args:
                self.block[self.index] = code
                self.index += 1

        def push(self, a, dummy):
            self.emit(0x50 | self.registers(a))

        def registers(self, a):
            order = ("rax", "rcx", …)
            return order.index(a)

  161. Machine Code Generation
    def ret(self, a, b):
        self.emit(0xc3)

    def push(self, a, _):
        self.emit(0x50 | self.registers(a))

    def pop(self, a, _):
        self.emit(0x58 | self.registers(a))

    def imul(self, a, b):
        self.emit(0x48, 0x0f, 0xaf, 0xc0 | self.registers(a, b))

    def add(self, a, b):
        self.emit(0x48, 0x01, 0xc0 | self.registers(b, a))

    def sub(self, a, b):
        self.emit(0x48, 0x29, 0xc0 | self.registers(b, a))

    def neg(self, a, _):
        self.emit(0x48, 0xf7, 0xd8 | self.registers(a))

    def mov(self, a, b):
        self.emit(0x48, 0x89, 0xc0 | self.registers(b, a))

    def immediate(self, a, number):
        self.emit(0x48, 0xb8 | self.registers(a), *self.little_endian(number))
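
Every method above takes two operands, even when it ignores one, which makes it possible to drive the assembler from a list of tuples. The following is a minimal sketch under assumptions: the block is a plain `bytearray` rather than mmap'ed memory, the two-argument form of `registers` and the `assemble` method are guesses made for this example, and only a few instructions are included. It reproduces the first bytes of the disassembly shown at the start of the talk.

```python
class Assembler:
    """Writes x86-64 machine code bytes into a bytearray (nothing is executed)."""

    # Standard x86-64 register numbering (assumed; the slides elide it).
    ORDER = ("rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi")

    def __init__(self, size):
        self.block = bytearray(size)
        self.index = 0

    def emit(self, *args):
        for code in args:
            self.block[self.index] = code
            self.index += 1

    def registers(self, a, b=None):
        # Assumed encoding: a lands in ModRM bits 5-3, b in bits 2-0.
        enc = self.ORDER.index(a)
        if b is not None:
            enc = enc << 3 | self.ORDER.index(b)
        return enc

    def mov(self, a, b):
        self.emit(0x48, 0x89, 0xc0 | self.registers(b, a))

    def imul(self, a, b):
        self.emit(0x48, 0x0f, 0xaf, 0xc0 | self.registers(a, b))

    def push(self, a, _):
        self.emit(0x50 | self.registers(a))

    def assemble(self, instructions):
        # Hypothetical driver: look each instruction up by name.
        for name, a, b in instructions:
            getattr(self, name)(a, b)
        return bytes(self.block[:self.index])


program = [
    ("mov", "rbx", "rdi"),
    ("mov", "rax", "rdi"),
    ("imul", "rax", "rbx"),
    ("push", "rax", None),
    ("mov", "rbx", "rsi"),
]
code = Assembler(64).assemble(program)
print(code.hex(" "))  # 48 89 fb 48 89 f8 48 0f af c3 50 48 89 f3
```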

  165. Memory Management
    • Allocate one page of memory using mmap
    • Use mprotect to change from R+W to R+X

  167. Memory Management
    def create_block(size):
        # mmap takes the protection bits and the mapping flags as separate arguments
        ptr = mmap(0, size, MMAP.PROT_WRITE | MMAP.PROT_READ,
                   MMAP.MAP_PRIVATE | MMAP.MAP_ANONYMOUS, 0, 0)
        if ptr == MAP_FAILED:
            raise RuntimeError(…)
        return ptr

  172. Memory Management
    def make_executable(block, size):
        if mprotect(block, size,
                    MMAP.PROT_READ | MMAP.PROT_EXEC) != 0:
            raise RuntimeError(…)
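
The allocate-then-protect sequence can be tried end to end with ctypes. This is a sketch, assuming a 64-bit POSIX system where `ctypes.CDLL(None)` exposes libc's `mmap` and `mprotect`. The argument type declarations, the file descriptor of `-1`, and the use of `OSError` are choices made for this example, not details from the talk. It copies a single `ret` byte (0xc3) into the page but never jumps to it, so it does not depend on the CPU being x86-64.

```python
import ctypes
import mmap

libc = ctypes.CDLL(None, use_errno=True)

# Declare the C signatures so that pointers are not truncated to 32 bits.
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_int64]
libc.mprotect.restype = ctypes.c_int
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]

MAP_FAILED = ctypes.c_void_p(-1).value  # (void *) -1


def create_block(size):
    """Map anonymous memory that is readable and writable, not executable."""
    ptr = libc.mmap(None, size, mmap.PROT_READ | mmap.PROT_WRITE,
                    mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if ptr == MAP_FAILED:
        raise OSError(ctypes.get_errno(), "mmap failed")
    return ptr


def make_executable(block, size):
    """Switch the mapping from read+write to read+execute."""
    if libc.mprotect(block, size, mmap.PROT_READ | mmap.PROT_EXEC) != 0:
        raise OSError(ctypes.get_errno(), "mprotect failed")


block = create_block(mmap.PAGESIZE)
ctypes.memmove(block, b"\xc3", 1)         # write while the page is still R+W
make_executable(block, mmap.PAGESIZE)     # from here on the page is R+X
readback = ctypes.string_at(block, 1)     # reading is still permitted
print(readback.hex())
```

Keeping the page writable only until the code has been copied in means it is never writable and executable at the same time.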

  179. FFI
    • Use ctypes as the FFI
    • Grab the machine code address, create a signature and bind it to a function
    • ctypes also seems to save and restore registers
    • More overhead: marshalling arguments

  181. FFI
    # CFUNCTYPE takes the return type first, then the argument types
    signature = ctypes.CFUNCTYPE(ctypes.c_int64,
                                 ctypes.c_int64, ctypes.c_int64)
    func = signature(assembler.address)
    print(func(1, 2))
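
To see the marshalling step without any machine code, the same kind of signature can wrap a plain Python callable. This is a stand-in chosen so the example runs on any architecture: `difference_of_squares` is a hypothetical function, whereas the talk binds the signature to the address of the generated code.

```python
import ctypes

# Return type first, then one entry per argument.
signature = ctypes.CFUNCTYPE(ctypes.c_int64, ctypes.c_int64, ctypes.c_int64)


def difference_of_squares(a, b):
    return a * a - b * b


# Stand-in for signature(assembler.address): ctypes builds a C-callable
# wrapper around the Python function instead.
func = signature(difference_of_squares)

result = func(2, 3)  # arguments are converted to 64-bit integers and back
print(result)        # -5

# The declared argument types are enforced on every call.
try:
    func("2", 3)
    rejected = False
except ctypes.ArgumentError:
    rejected = True
print(rejected)      # True
```

This per-call conversion and checking is the overhead the bullet list refers to.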

  187. The “@jit” Decorator
    def jit(func):
        def front(*args, **kw):
            if not hasattr(front, "func"):
                front.func = compile_native(func)
            return front.func(*args, **kw)
        return front

    @jit
    def foo(a, b):
        return a*a - b*b
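
The decorator defers compilation until the first call and caches the result on the wrapper. A runnable sketch of that behaviour follows, with a hypothetical `compile_native` that only logs and hands back the original Python function; a real one would generate machine code and return a ctypes callable. The `compilations` list exists only to make the caching visible.

```python
compilations = []  # names of functions that have been "compiled"


def compile_native(func):
    # Hypothetical stand-in: no machine code is generated here.
    print("--- JIT-compiling", func.__name__)
    compilations.append(func.__name__)
    return func


def jit(func):
    def front(*args, **kw):
        # Compile on the first call only, then reuse the cached result.
        if not hasattr(front, "func"):
            front.func = compile_native(func)
        return front.func(*args, **kw)
    return front


@jit
def foo(a, b):
    return a*a - b*b


print(foo(2, 3))  # compiles, then prints -5
print(foo(3, 4))  # reuses the cached function, prints -7
```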

  189. Demo

  194. Things we haven’t touched on
    • Code generation: SSA, branching
    • Optimization: TAC, register allocation
    • Compatibility: Python object system
    • Performance gotchas

  195. Pointers
    • Full details in the blog post at https://csl.name/post/python-compiler/
    • PeachPy
    • Truffle + Graal, LLVM, libjit, etc.
    • Numba

  196. Thanks y’all!
