Writing a basic x86-64 JIT compiler from scratch in stock Python

Shows how to JIT compile simple Python functions to native x86-64 machine code at runtime. Everything is done from scratch, using nothing but the built-in Python modules.

Christian Stigen Larsen

January 23, 2018

Transcript

  1. Writing a basic x86-64 JIT compiler from scratch in stock Python
    Christian Stigen Larsen, 2018-01-23, https://csl.name
  5. Using the decorator:
        from jitcompiler import *

        @jit
        def foo(a, b):
            return a*a - b*b

        >>> foo(2, 3)
        --- JIT-compiling <function foo at 0x100c28c08>
        -5
        >>> foo(3, 4)
        -7
  7. Disassembling the generated code:
        >>> print(disassemble(foo))
        0x100b1d000  48 89 fb     mov rbx, rdi
        0x100b1d003  48 89 f8     mov rax, rdi
        0x100b1d006  48 0f af c3  imul rax, rbx
        0x100b1d00a  50           push rax
        0x100b1d00b  48 89 f3     mov rbx, rsi
        0x100b1d00e  48 89 f0     mov rax, rsi
        0x100b1d011  48 0f af c3  imul rax, rbx
        0x100b1d015  48 89 c3     mov rbx, rax
        0x100b1d018  58           pop rax
        0x100b1d019  48 29 d8     sub rax, rbx
        0x100b1d01c  c3           ret
  8-11. Strategy (diagram): a Python function (def foo) is turned from bytecode into an IR, the IR is optimized, and the result is written as x86-64 machine code bytes into a memory page mapped Read + Write. The page is then switched to Read + Execute and bound to a Python name through an FFI.
  12. Python bytecode (Python 2 format, where an instruction that takes an argument occupies three bytes):
        def foo(a, b):
            return a*a - b*b

        >>> import dis
        >>> dis.dis(foo)
          2           0 LOAD_FAST                0 (a)
                      3 LOAD_FAST                0 (a)
                      6 BINARY_MULTIPLY
                      7 LOAD_FAST                1 (b)
                     10 LOAD_FAST                1 (b)
                     13 BINARY_MULTIPLY
                     14 BINARY_SUBTRACT
                     15 RETURN_VALUE
  20-29. Evaluating the bytecode on CPython's value stack (stack contents after each instruction, top on the right):
         0 LOAD_FAST 0 (a)      a
         3 LOAD_FAST 0 (a)      a a
         6 BINARY_MULTIPLY      a*a
         7 LOAD_FAST 1 (b)      a*a b
        10 LOAD_FAST 1 (b)      a*a b b
        13 BINARY_MULTIPLY      a*a b*b
        14 BINARY_SUBTRACT      a*a-b*b
        15 RETURN_VALUE         (returns a*a-b*b)
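To make the stack discipline concrete, here is a tiny evaluator that runs the instruction list above on an explicit Python list used as the value stack. It is an illustrative sketch, not part of the original compiler: it works on already-decoded (opname, arg) pairs, and the names evaluate, FOO and BINARY_OPS are hypothetical.

```python
import operator

# foo's bytecode from the slides, as already-decoded (opname, arg) pairs.
FOO = [
    ("LOAD_FAST", 0),          # a
    ("LOAD_FAST", 0),          # a
    ("BINARY_MULTIPLY", None),
    ("LOAD_FAST", 1),          # b
    ("LOAD_FAST", 1),          # b
    ("BINARY_MULTIPLY", None),
    ("BINARY_SUBTRACT", None),
    ("RETURN_VALUE", None),
]

# Binary operators pop the right operand first (it is on top), then the left.
BINARY_OPS = {
    "BINARY_ADD": operator.add,
    "BINARY_SUBTRACT": operator.sub,
    "BINARY_MULTIPLY": operator.mul,
}


def evaluate(instructions, args, constants=(), trace=None):
    """Run (opname, arg) pairs on an explicit value stack.

    If `trace` is a list, a copy of the stack is appended after every
    instruction except RETURN_VALUE.
    """
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_FAST":
            stack.append(args[arg])
        elif opname == "LOAD_CONST":
            stack.append(constants[arg])
        elif opname in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[opname](left, right))
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise NotImplementedError(opname)
        if trace is not None:
            trace.append(list(stack))
    raise ValueError("bytecode ended without RETURN_VALUE")
```

With a = 2 and b = 3 the recorded stacks are [2], [2, 2], [4], [4, 3], [4, 3, 3], [4, 9], [-5], which mirrors the symbolic table line by line.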
  34. Decoding the raw bytecode of 0 LOAD_FAST 0 (a) by hand (Python 2):
        >>> foo.__code__.co_code
        '|\x00\x00|\x00\x00\x14|\x01\x00|\x01\x00\x14\x18S'
        >>> bytecode = map(ord, foo.__code__.co_code)
        >>> opcode = bytecode[0]
        >>> opcode
        124
        >>> dis.opname[opcode]
        'LOAD_FAST'
        >>> arg = bytecode[1] | bytecode[2] << 8
        >>> arg
        0
        >>> foo.__code__.co_varnames[arg]
        'a'
  36. Constants live in co_consts:
        >>> def bar(): return 123
        >>> dis.dis(bar)
          1           0 LOAD_CONST               1 (123)
                      3 RETURN_VALUE
        >>> bar.__code__.co_consts[1]
        123
  39. Fetching and decoding instructions:
        class Compiler(object):
            # …
            def fetch(self):
                byte = self.bytecode[self.index]
                self.index += 1
                return byte

            def decode(self):
                opcode = self.fetch()
                opname = dis.opname[opcode]

                if takes_arg(opname):
                    arg = self.fetch() | self.fetch() << 8
                else:
                    arg = None

                return opname, arg
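The slide leaves takes_arg undefined. A minimal standalone sketch of the same fetch/decode loop follows, assuming the Python 2 rule that an opcode takes a two-byte little-endian argument exactly when it is at least HAVE_ARGUMENT (90). It decodes a bytes object, so it runs under Python 3 without needing a Python 2 interpreter; the OPNAMES table is a hand-written subset of CPython 2.7's opcode numbers, and the names decode and OPNAMES are illustrative.

```python
# Opcode numbers from CPython 2.7's opcode module (only the ones used here).
OPNAMES = {
    11: "UNARY_NEGATIVE",
    20: "BINARY_MULTIPLY",
    23: "BINARY_ADD",
    24: "BINARY_SUBTRACT",
    55: "INPLACE_ADD",
    56: "INPLACE_SUBTRACT",
    83: "RETURN_VALUE",
    100: "LOAD_CONST",
    124: "LOAD_FAST",
    125: "STORE_FAST",
}

# Opcodes >= HAVE_ARGUMENT are followed by a 16-bit little-endian argument;
# all others are a single byte.
HAVE_ARGUMENT = 90


def decode(code):
    """Yield (opname, arg) pairs from pre-3.6 style bytecode (a bytes object)."""
    index = 0
    while index < len(code):
        opcode = code[index]
        index += 1
        if opcode >= HAVE_ARGUMENT:
            # Low byte first, then high byte, as in Compiler.decode.
            arg = code[index] | code[index + 1] << 8
            index += 2
        else:
            arg = None
        yield OPNAMES[opcode], arg
```

Feeding it the 16 bytes of foo's co_code yields the same eight instructions that dis.dis prints for foo.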
  40. Translating each instruction into IR:
        def compile(self):
            while self.index < len(self.bytecode):
                op, arg = self.decode()

                if op == "LOAD_FAST":
                    yield "push", self.variable(arg), None
                elif …
  41-48. IR & assembly: LOAD_FAST becomes the IR instruction
        yield "push", self.variable(arg), None
    which maps directly onto an x86-64 push of the register holding that argument. (Diagram: the registers rax, rbx, rcx, …, rsi, rdi, and the stack, which holds the return address on entry; after push rdi, the value of rdi sits on the stack on top of the return address.) Integer arguments arrive in registers, in the order given by the System V AMD64 calling convention used on Linux and macOS:
        def variable(self, index):
            passing_order = ("rdi", "rsi", "rdx", "rcx")
            return passing_order[index]
  57. Translating baz:
        def baz(a, b):
            a = b

        LOAD_FAST 1 (b)        push rsi
        STORE_FAST 0 (a)       pop rax
                               mov rdi, rax
        LOAD_CONST 0 (None)    imm rax, 0
                               push rax
        RETURN_VALUE           pop rax
                               ret
  58-77. Running the compiled baz(11, 22) one instruction at a time (register and stack contents after each step; the stack is listed top first):
        Step            rax   rsi   rdi   Stack
        (on entry)      -     22    11    return address
        push rsi        -     22    11    22, return address
        pop rax         22    22    11    return address
        mov rdi, rax    22    22    22    return address
        imm rax, 0      0     22    22    return address
        push rax        0     22    22    0, return address
        pop rax         0     22    22    return address
        ret             0     22    22    (empty; back in the caller, result 0 in rax)
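The step-by-step trace can be reproduced with a small register-and-stack simulator. This is a sketch for checking IR by hand, not something the deck's compiler contains: it assumes IR is a list of (op, operand1, operand2) triples in the style of yield "push", self.variable(arg), None, uses unbounded Python integers instead of 64-bit registers, and the names run, BAZ and RETURN_ADDRESS are hypothetical.

```python
RETURN_ADDRESS = "<return address>"

# baz's IR from the slides, as (op, operand1, operand2) triples.
BAZ = [
    ("push", "rsi", None),
    ("pop", "rax", None),
    ("mov", "rdi", "rax"),
    ("imm", "rax", 0),
    ("push", "rax", None),
    ("pop", "rax", None),
    ("ret", None, None),
]


def run(ir, **registers):
    """Execute IR triples on a dict of registers and a list used as the stack.

    Integers are unbounded here; real 64-bit registers would wrap around.
    """
    regs = dict(registers)
    stack = [RETURN_ADDRESS]  # placed there by the caller's call instruction
    for op, a, b in ir:
        if op == "push":
            stack.append(regs[a])
        elif op == "pop":
            regs[a] = stack.pop()
        elif op == "mov":
            regs[a] = regs[b]
        elif op == "imm":
            regs[a] = b
        elif op == "imul":
            regs[a] *= regs[b]
        elif op == "add":
            regs[a] += regs[b]
        elif op == "sub":
            regs[a] -= regs[b]
        elif op == "neg":
            regs[a] = -regs[a]
        elif op == "ret":
            # Only the return address may remain, or ret would jump into data.
            if stack != [RETURN_ADDRESS]:
                raise RuntimeError("unbalanced stack at ret: %r" % stack)
            stack.pop()
            return regs
        else:
            raise NotImplementedError(op)
    raise RuntimeError("IR ended without ret")
```

run(BAZ, rdi=11, rsi=22) ends with rax = 0, rsi = 22 and rdi = 22, the same final state as the trace.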
  78. Translation table (bytecode to IR/assembly):
        LOAD_FAST                             push <reg>
        STORE_FAST                            pop rax; mov <reg>, rax
        LOAD_CONST                            imm rax; push rax
        BINARY_MULTIPLY                       pop rax; pop rbx; imul rax, rbx; push rax
        BINARY_ADD / INPLACE_ADD              pop rax; pop rbx; add rax, rbx; push rax
        BINARY_SUBTRACT / INPLACE_SUBTRACT    pop rbx; pop rax; sub rax, rbx; push rax
        UNARY_NEGATIVE                        pop rax; neg rax; push rax
        RETURN_VALUE                          pop rax; ret
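The table can be turned into a generator in the style of the compile method on slide 40. This is a minimal sketch under stated assumptions: it takes already-decoded (opname, arg) pairs instead of raw bytecode, emits (op, operand1, operand2) triples, and, following the baz example, loads a None constant as the immediate 0. The names to_ir and PASSING_ORDER are illustrative.

```python
# Integer arguments arrive in these registers (System V AMD64 ABI).
PASSING_ORDER = ("rdi", "rsi", "rdx", "rcx")


def to_ir(instructions, constants):
    """Translate (opname, arg) pairs into (op, operand1, operand2) IR triples."""
    for opname, arg in instructions:
        if opname == "LOAD_FAST":
            yield "push", PASSING_ORDER[arg], None
        elif opname == "STORE_FAST":
            yield "pop", "rax", None
            yield "mov", PASSING_ORDER[arg], "rax"
        elif opname == "LOAD_CONST":
            value = constants[arg]
            # Assumption (as in the baz example): None is loaded as 0.
            yield "imm", "rax", 0 if value is None else value
            yield "push", "rax", None
        elif opname == "BINARY_MULTIPLY":
            yield "pop", "rax", None
            yield "pop", "rbx", None
            yield "imul", "rax", "rbx"
            yield "push", "rax", None
        elif opname in ("BINARY_ADD", "INPLACE_ADD"):
            yield "pop", "rax", None
            yield "pop", "rbx", None
            yield "add", "rax", "rbx"
            yield "push", "rax", None
        elif opname in ("BINARY_SUBTRACT", "INPLACE_SUBTRACT"):
            # Pop the right operand into rbx first so that rax = left - right.
            yield "pop", "rbx", None
            yield "pop", "rax", None
            yield "sub", "rax", "rbx"
            yield "push", "rax", None
        elif opname == "UNARY_NEGATIVE":
            yield "pop", "rax", None
            yield "neg", "rax", None
            yield "push", "rax", None
        elif opname == "RETURN_VALUE":
            yield "pop", "rax", None
            yield "ret", None, None
        else:
            raise NotImplementedError(opname)
```

The subtraction row is the only one where pop order matters, which is why it pops into rbx before rax.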
  97. Peephole optimization:
        Before            After
        push rsi          mov rax, rsi
        pop rax
        mov rdi, rax      mov rdi, rax
        imm rax, 0        imm rax, 0
        push rax
        pop rax
        ret               ret
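A peephole pass that performs the rewrite shown above can be sketched in a few lines. It implements only the push/pop rule visible on the slide (an adjacent push X; pop Y becomes mov Y, X, or disappears when X and Y are the same register); the author's optimizer evidently does more, since the code on slide 7 is tighter than this rule alone would produce. The IR is assumed to be (op, operand1, operand2) triples, and the name peephole is illustrative.

```python
def peephole(ir):
    """Replace each adjacent `push X; pop Y` with `mov Y, X`.

    The pair is dropped entirely when X == Y, since it is then a no-op.
    Checking against the last *output* instruction lets the rule cascade
    after a pair has been removed.
    """
    out = []
    for op, a, b in ir:
        if op == "pop" and out and out[-1][0] == "push":
            _, source, _ = out.pop()
            if source != a:
                out.append(("mov", a, source))
            continue
        out.append((op, a, b))
    return out
```

Applied to baz's seven IR instructions, it returns mov rax, rsi; mov rdi, rax; imm rax, 0; ret, matching the "After" column.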
  102. Machine code generation:
        class Assembler(object):
            # …
            def emit(self, *args):
                for code in args:
                    self.block[self.index] = code
                    self.index += 1

            def push(self, a, dummy):
                self.emit(0x50 | self.registers(a))

            def registers(self, a):
                order = ("rax", "rcx", …)
                return order.index(a)
  108. Machine code generation, all instructions:
        def ret(self, a, b):
            self.emit(0xc3)

        def push(self, a, _):
            self.emit(0x50 | self.registers(a))

        def pop(self, a, _):
            self.emit(0x58 | self.registers(a))

        def imul(self, a, b):
            self.emit(0x48, 0x0f, 0xaf, 0xc0 | self.registers(a, b))

        def add(self, a, b):
            self.emit(0x48, 0x01, 0xc0 | self.registers(b, a))

        def sub(self, a, b):
            self.emit(0x48, 0x29, 0xc0 | self.registers(b, a))

        def neg(self, a, _):
            self.emit(0x48, 0xf7, 0xd8 | self.registers(a))

        def mov(self, a, b):
            self.emit(0x48, 0x89, 0xc0 | self.registers(b, a))

        def immediate(self, a, number):
            self.emit(0x48, 0xb8 | self.registers(a), *self.little_endian(number))
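Below is a self-contained version of the assembler that can be checked against the disassembly of foo. It is a sketch, not the deck's code: it assumes registers(a, b) packs its first argument into the ModRM reg field (bits 3-5) and its second into the r/m field (bits 0-2), appends to a bytearray instead of writing into a mapped block, replaces the unshown little_endian helper with int.to_bytes, and handles only rax through rdi (r8-r15 need extra REX bits). The assemble helper is hypothetical.

```python
# x86-64 register numbers 0-7, in encoding order.
ORDER = ("rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi")


class Assembler(object):
    def __init__(self):
        self.code = bytearray()

    def emit(self, *codes):
        self.code.extend(codes)

    def registers(self, a, b=None):
        # One register: its number. Two: `a` in the ModRM reg field and `b`
        # in the r/m field; callers OR in 0xc0 for register-to-register mode.
        if b is None:
            return ORDER.index(a)
        return ORDER.index(a) << 3 | ORDER.index(b)

    def ret(self, a=None, b=None):
        self.emit(0xc3)

    def push(self, a, _=None):
        self.emit(0x50 | self.registers(a))

    def pop(self, a, _=None):
        self.emit(0x58 | self.registers(a))

    def imul(self, a, b):  # a = a * b  (0F AF /r: destination in reg field)
        self.emit(0x48, 0x0f, 0xaf, 0xc0 | self.registers(a, b))

    def add(self, a, b):   # a = a + b  (01 /r: destination in r/m field)
        self.emit(0x48, 0x01, 0xc0 | self.registers(b, a))

    def sub(self, a, b):   # a = a - b  (29 /r)
        self.emit(0x48, 0x29, 0xc0 | self.registers(b, a))

    def neg(self, a, _=None):  # a = -a  (F7 /3)
        self.emit(0x48, 0xf7, 0xd8 | self.registers(a))

    def mov(self, a, b):   # a = b  (89 /r)
        self.emit(0x48, 0x89, 0xc0 | self.registers(b, a))

    def immediate(self, a, number):  # 64-bit immediate load (REX.W B8+r)
        self.emit(0x48, 0xb8 | self.registers(a),
                  *number.to_bytes(8, "little", signed=True))


def assemble(ir):
    """Assemble (op, operand1, operand2) triples; IR "imm" maps to immediate()."""
    asm = Assembler()
    for op, a, b in ir:
        getattr(asm, "immediate" if op == "imm" else op)(a, b)
    return bytes(asm.code)
```

Assembling the eleven instructions listed by disassemble(foo) reproduces its 29 bytes exactly, which is a quick way to catch swapped operands in the ModRM byte.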
  109. Memory management
        • Allocate one page of memory using mmap.
        • Use mprotect to change it from R+W to R+X.
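  110. Memory management:
        def create_block(size):
            # mmap(addr, length, prot, flags, fd, offset): the protection bits
            # and the mapping flags are separate arguments.
            ptr = mmap(0, size, MMAP.PROT_WRITE | MMAP.PROT_READ,
                       MMAP.MAP_PRIVATE | MMAP.MAP_ANONYMOUS, -1, 0)
            if ptr == MAP_FAILED:
                raise RuntimeError(…)
            return ptr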
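One way to call mmap and mprotect from stock Python is through ctypes. The sketch below assumes a Unix-like system (Linux or macOS) where ctypes.CDLL(None) exposes the C library Python is linked against, and it models off_t as a C long, which holds on 64-bit Linux and macOS. The helper names create_block, make_executable, free_block and load_code are illustrative; the protection and flag constants come from Python's own mmap module. Hardened systems may refuse the PROT_EXEC change.

```python
import ctypes
import mmap

libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = (ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long)  # off_t as long
libc.mprotect.restype = ctypes.c_int
libc.mprotect.argtypes = (ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int)
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = (ctypes.c_void_p, ctypes.c_size_t)

MAP_FAILED = ctypes.c_void_p(-1).value  # (void *) -1
PAGE = mmap.PAGESIZE


def create_block(size=PAGE):
    """Map `size` bytes of private, anonymous, read+write memory."""
    ptr = libc.mmap(None, size, mmap.PROT_READ | mmap.PROT_WRITE,
                    mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if ptr is None or ptr == MAP_FAILED:
        raise OSError(ctypes.get_errno(), "mmap failed")
    return ptr


def make_executable(ptr, size=PAGE):
    """Switch the page from read+write to read+execute (never W and X at once)."""
    if libc.mprotect(ptr, size, mmap.PROT_READ | mmap.PROT_EXEC) != 0:
        raise OSError(ctypes.get_errno(), "mprotect failed")


def free_block(ptr, size=PAGE):
    if libc.munmap(ptr, size) != 0:
        raise OSError(ctypes.get_errno(), "munmap failed")


def load_code(code):
    """Copy machine code into a fresh page and make it executable."""
    ptr = create_block()
    ctypes.memmove(ptr, code, len(code))
    make_executable(ptr)
    return ptr
```

Writing first and only then flipping to R+X keeps the page from ever being writable and executable at the same time.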
  117. FFI
        • Use ctypes as the FFI.
        • Grab the machine code address, create a signature and bind it to a function.
        • ctypes also seems to save and restore registers.
        • More overhead: marshalling arguments.
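A sketch of the binding step with ctypes follows: the code is copied into a page, the page is made executable, and the address is wrapped with a CFUNCTYPE signature. Assumptions to flag: it only runs on x86-64 under the System V ABI (Linux or macOS), the machine code is a hand-encoded a*a - b*b rather than the output of the deck's compiler, and bind_native, MACHINE_CODE and CAN_RUN are hypothetical names. The bytes deliberately use only caller-saved registers (rax, rcx) rather than rbx, which the System V ABI treats as callee-saved.

```python
import ctypes
import mmap
import platform
import sys

# Assumes a Unix-like OS: CDLL(None) exposes the already-loaded C library.
libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = (ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long)  # off_t as long
libc.mprotect.restype = ctypes.c_int
libc.mprotect.argtypes = (ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int)

# a*a - b*b with a in rdi, b in rsi and the result in rax (System V AMD64).
MACHINE_CODE = bytes([
    0x48, 0x89, 0xf8,        # mov  rax, rdi
    0x48, 0x0f, 0xaf, 0xc7,  # imul rax, rdi
    0x48, 0x89, 0xf1,        # mov  rcx, rsi
    0x48, 0x0f, 0xaf, 0xce,  # imul rcx, rsi
    0x48, 0x29, 0xc8,        # sub  rax, rcx
    0xc3,                    # ret
])

# Only x86-64 machines using the System V calling convention can run it.
CAN_RUN = (platform.machine().lower() in ("x86_64", "amd64")
           and sys.platform != "win32")


def bind_native(code, restype=ctypes.c_int64,
                argtypes=(ctypes.c_int64, ctypes.c_int64)):
    """Load `code` into an executable page and return a ctypes function.

    The page is never unmapped; a real JIT would keep track of it.
    """
    size = mmap.PAGESIZE
    ptr = libc.mmap(None, size, mmap.PROT_READ | mmap.PROT_WRITE,
                    mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if ptr is None or ptr == ctypes.c_void_p(-1).value:
        raise OSError(ctypes.get_errno(), "mmap failed")
    ctypes.memmove(ptr, code, len(code))
    if libc.mprotect(ptr, size, mmap.PROT_READ | mmap.PROT_EXEC) != 0:
        raise OSError(ctypes.get_errno(), "mprotect failed")
    signature = ctypes.CFUNCTYPE(restype, *argtypes)  # int64 f(int64, int64)
    return signature(ptr)
```

On a supported machine, bind_native(MACHINE_CODE)(2, 3) returns -5; each call still pays the ctypes cost of converting the Python ints to C int64 values and back.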
  119. The "@jit" decorator:
        def jit(func):
            def front(*args, **kw):
                if not hasattr(front, "func"):
                    front.func = compile_native(func)
                return front.func(*args, **kw)
            return front

        @jit
        def foo(a, b):
            return a*a - b*b
  123. Things we haven't touched on
        • Code generation: SSA, branching
        • Optimization: TAC, register allocation
        • Compatibility: Python object system
        • Performance gotchas
  124. Pointers
        • Full details in the blog post at https://csl.name/post/python-compiler/
        • PeachPy
        • Truffle + Graal, LLVM, libjit, etc.
        • Numba