Writing a basic x86-64 JIT compiler from scratch in stock Python

Shows how to JIT compile simple Python functions to native x86-64 machine code at runtime. Everything is done from scratch, using nothing but the built-in Python modules.

Christian Stigen Larsen

January 23, 2018

Transcript

  1. Writing a basic x86-64 JIT compiler from scratch in stock Python
     Christian Stigen Larsen, 2018-01-23
     https://csl.name
  2. None
  11. from jitcompiler import *

      @jit
      def foo(a, b):
          return a*a - b*b

      >>> foo(2, 3)
      --- JIT-compiling <function foo at 0x100c28c08>
      -5
      >>> foo(3, 4)
      -7
  12. None
  13. >>> print(disassemble(foo))
      0x100b1d000 48 89 fb       mov rbx, rdi
      0x100b1d003 48 89 f8       mov rax, rdi
      0x100b1d006 48 0f af c3    imul rax, rbx
      0x100b1d00a 50             push rax
      0x100b1d00b 48 89 f3       mov rbx, rsi
      0x100b1d00e 48 89 f0       mov rax, rsi
      0x100b1d011 48 0f af c3    imul rax, rbx
      0x100b1d015 48 89 c3       mov rbx, rax
      0x100b1d018 58             pop rax
      0x100b1d019 48 29 d8       sub rax, rbx
      0x100b1d01c c3             ret
  15. None
  26. Strategy (diagram, built up step by step):
      def foo -> Python function -> bytecode -> IR -> optimized IR -> x86-64 machine code -> bytes in a memory page (read + write, then read + execute) -> FFI.
      In the final step the Python bound name reaches the machine code through FFI.
  28. Intermission: Why JIT? Because we can, and it’s awesome.

  29. None
  39. Python Bytecode

      def foo(a, b):
          return a*a - b*b

      >>> import dis
      >>> dis.dis(foo)
        2           0 LOAD_FAST                0 (a)
                    3 LOAD_FAST                0 (a)
                    6 BINARY_MULTIPLY
                    7 LOAD_FAST                1 (b)
                   10 LOAD_FAST                1 (b)
                   13 BINARY_MULTIPLY
                   14 BINARY_SUBTRACT
                   15 RETURN_VALUE
  49. Offset  Instruction         Stack afterwards
           0  LOAD_FAST 0 (a)     a
           3  LOAD_FAST 0 (a)     a a
           6  BINARY_MULTIPLY     a*a
           7  LOAD_FAST 1 (b)     a*a b
          10  LOAD_FAST 1 (b)     a*a b b
          13  BINARY_MULTIPLY     a*a b*b
          14  BINARY_SUBTRACT     a*a-b*b
          15  RETURN_VALUE        a*a-b*b
  62. 0 LOAD_FAST 0 (a)

      >>> foo.__code__.co_code
      '|\x00\x00|\x00\x00\x14|\x01\x00|\x01\x00\x14\x18S'
      >>> bytecode = map(ord, foo.__code__.co_code)
      >>> opcode = bytecode[0]
      >>> opcode
      124
      >>> dis.opname[opcode]
      'LOAD_FAST'
      >>> arg = bytecode[1] | bytecode[2] << 8
      >>> arg
      0
      >>> foo.__code__.co_varnames[arg]
      'a'
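The REPL session on this slide is Python 2.7, where an instruction with an argument occupies three bytes (Python 3.6 and later use two-byte instructions and different opcode numbers). As a minimal sketch that runs on Python 3, the same decoding can be applied to a hard-coded copy of the byte string. The `OPNAMES` table and the `FOO_BYTECODE` constant are assumptions written out by hand for this illustration; they are not part of the talk's code.

```python
# Opcode numbers as used by CPython 2.7 (assumption: hand-copied subset,
# just enough for the examples in this talk).
OPNAMES = {
    11: "UNARY_NEGATIVE",
    20: "BINARY_MULTIPLY",
    23: "BINARY_ADD",
    24: "BINARY_SUBTRACT",
    83: "RETURN_VALUE",
    100: "LOAD_CONST",
    124: "LOAD_FAST",
    125: "STORE_FAST",
}

# In CPython 2.7, opcodes >= 90 are followed by a 16-bit little-endian argument.
HAVE_ARGUMENT = 90

# The co_code of foo(a, b) = a*a - b*b as shown on the slide.
FOO_BYTECODE = bytes([124, 0, 0, 124, 0, 0, 20, 124, 1, 0, 124, 1, 0, 20, 24, 83])


def decode(bytecode):
    """Yield (opname, arg) pairs from Python 2.7-style bytecode."""
    index = 0
    while index < len(bytecode):
        opcode = bytecode[index]
        index += 1
        arg = None
        if opcode >= HAVE_ARGUMENT:
            # Low byte first, then high byte.
            arg = bytecode[index] | (bytecode[index + 1] << 8)
            index += 2
        yield OPNAMES[opcode], arg


for opname, arg in decode(FOO_BYTECODE):
    print(opname, arg)
```

On a current interpreter, `dis.get_instructions(foo)` does this decoding for you, but the instruction names it reports vary between Python versions.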
  63. None
  68. >>> def bar(): return 123
      >>> dis.dis(bar)
        1           0 LOAD_CONST               1 (123)
                    3 RETURN_VALUE
      >>> bar.__code__.co_consts[1]
      123
  69. None
  75. class Compiler(object):
          # …
          def fetch(self):
              byte = self.bytecode[self.index]
              self.index += 1
              return byte

          def decode(self):
              opcode = self.fetch()
              opname = dis.opname[opcode]
              if takes_arg(opname):
                  arg = self.fetch() | self.fetch() << 8
              else:
                  arg = None
              return opname, arg
  76. None
  77. def compile(self):
          while self.index < len(self.bytecode):
              op, arg = self.decode()
              if op == "LOAD_FAST":
                  yield "push", self.variable(arg), None
              elif …
  88. IR & Assembly

      yield "push", self.variable(arg), None

      def variable(self, index):
          passing_order = ("rdi", "rsi", "rdx", "rcx")
          return passing_order[index]

      (Diagram: registers rax, rbx, rcx, …, rsi, rdi, …; the stack initially holds the return address, and push places rdi on top of it.)
  89. None
  99. def baz(a, b):
          a = b

      Bytecode                IR
      LOAD_FAST 1 (b)         push rsi
      STORE_FAST 0 (a)        pop rax
                              mov rdi, rax
      LOAD_CONST 0 (None)     imm rax, 0
                              push rax
      RETURN_VALUE            pop rax
                              ret
  119. >>> baz(11, 22)

       Instruction      rax   rsi   rdi   Stack
       (on entry)             22    11    return address
       push rsi               22    11    return address, 22
       pop rax          22    22    11    return address
       mov rdi, rax     22    22    22    return address
       imm rax, 0       0     22    22    return address
       push rax         0     22    22    return address, 0
       pop rax          0     22    22    return address
       ret              0     22    22    (empty)
  128. Bytecode                               IR
       LOAD_FAST                              push <reg>
       STORE_FAST                             pop rax
                                              mov <reg>, rax
       LOAD_CONST                             imm rax
                                              push rax
       BINARY_MULTIPLY                        pop rax
                                              pop rbx
                                              imul rax, rbx
                                              push rax
       BINARY_ADD / INPLACE_ADD               pop rax
                                              pop rbx
                                              add rax, rbx
                                              push rax
       BINARY_SUBTRACT / INPLACE_SUBTRACT     pop rbx
                                              pop rax
                                              sub rax, rbx
                                              push rax
       UNARY_NEGATIVE                         pop rax
                                              neg rax
                                              push rax
       RETURN_VALUE                           pop rax
                                              ret
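The bytecode-to-IR mapping on this slide can be sketched as a small generator. This is an illustration under a few assumptions rather than the talk's actual compiler: the function name `translate`, the `(op, a, b)` tuple layout, and the choice to encode the constant `None` as `0` (as the `baz` example does) are all made up for this sketch.

```python
# System V x86-64 passes the first integer arguments in these registers,
# matching the passing_order tuple shown earlier in the talk.
ARGUMENT_REGISTERS = ("rdi", "rsi", "rdx", "rcx")


def translate(instructions, constants=()):
    """Turn (opname, arg) bytecode pairs into (op, a, b) IR tuples."""
    for opname, arg in instructions:
        if opname == "LOAD_FAST":
            yield "push", ARGUMENT_REGISTERS[arg], None
        elif opname == "STORE_FAST":
            yield "pop", "rax", None
            yield "mov", ARGUMENT_REGISTERS[arg], "rax"
        elif opname == "LOAD_CONST":
            value = constants[arg]
            # Assumption: None is represented as 0, as in the baz example.
            yield "imm", "rax", 0 if value is None else value
            yield "push", "rax", None
        elif opname in ("BINARY_MULTIPLY", "BINARY_ADD", "INPLACE_ADD"):
            yield "pop", "rax", None
            yield "pop", "rbx", None
            yield ("imul" if opname == "BINARY_MULTIPLY" else "add"), "rax", "rbx"
            yield "push", "rax", None
        elif opname in ("BINARY_SUBTRACT", "INPLACE_SUBTRACT"):
            # Operand order matters here: the right operand is popped first.
            yield "pop", "rbx", None
            yield "pop", "rax", None
            yield "sub", "rax", "rbx"
            yield "push", "rax", None
        elif opname == "UNARY_NEGATIVE":
            yield "pop", "rax", None
            yield "neg", "rax", None
            yield "push", "rax", None
        elif opname == "RETURN_VALUE":
            yield "pop", "rax", None
            yield "ret", None, None
        else:
            raise NotImplementedError(opname)


# def baz(a, b): a = b
BAZ = [("LOAD_FAST", 1), ("STORE_FAST", 0), ("LOAD_CONST", 0), ("RETURN_VALUE", None)]
for instruction in translate(BAZ, constants=(None,)):
    print(instruction)
```

Because every bytecode instruction is translated in isolation, the output contains many redundant push/pop pairs, which is what the optimization step that follows cleans up.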
  147. Peephole Optimization

       Original         After first pass     After second pass
       push rsi         mov rax, rsi         mov rdi, rsi
       pop rax          mov rdi, rax         imm rax, 0
       mov rdi, rax     imm rax, 0           ret
       imm rax, 0       ret
       push rax
       pop rax
       ret
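The two rewrites shown on these slides can be sketched as a small pass over `(op, a, b)` tuples. The function names and the tuple layout are assumptions for this illustration. The second rule (folding `mov a, b; mov c, a` into `mov c, b`) is only safe when `a` is not read afterwards, so this sketch applies it only when the very next instruction overwrites `a`; the talk does not say how its optimizer checks this.

```python
def overwrites(ir, index, register):
    """True if the instruction at `index` writes `register` without reading it."""
    if index >= len(ir):
        return False
    op, a, b = ir[index]
    if op in ("imm", "pop"):
        return a == register
    return op == "mov" and a == register and b != register


def peephole(ir):
    """Repeat the two rewrites until nothing changes any more."""
    ir = list(ir)
    changed = True
    while changed:
        changed = False
        out = []
        i = 0
        while i < len(ir):
            cur = ir[i]
            nxt = ir[i + 1] if i + 1 < len(ir) else None
            if nxt and cur[0] == "push" and nxt[0] == "pop":
                # push x; pop y  ->  mov y, x  (nothing at all when x == y)
                if cur[1] != nxt[1]:
                    out.append(("mov", nxt[1], cur[1]))
                i += 2
                changed = True
            elif (nxt and cur[0] == "mov" and nxt[0] == "mov"
                  and nxt[2] == cur[1] and overwrites(ir, i + 2, cur[1])):
                # mov a, b; mov c, a  ->  mov c, b  (a is dead afterwards)
                out.append(("mov", nxt[1], cur[2]))
                i += 2
                changed = True
            else:
                out.append(cur)
                i += 1
        ir = out
    return ir


BAZ_IR = [
    ("push", "rsi", None),
    ("pop", "rax", None),
    ("mov", "rdi", "rax"),
    ("imm", "rax", 0),
    ("push", "rax", None),
    ("pop", "rax", None),
    ("ret", None, None),
]
for instruction in peephole(BAZ_IR):
    print(instruction)
```

Running it on the `baz` IR reproduces the three instructions from the last slide of this sequence.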

  148. None
  153. Machine Code Generation

       class Assembler(object):
           # …
           def emit(self, *args):
               for code in args:
                   self.block[self.index] = code
                   self.index += 1

           def push(self, a, dummy):
               self.emit(0x50 | self.registers(a))

           def registers(self, a):
               order = ("rax", "rcx", …)
               return order.index(a)
  154. Machine Code Generation

  159. Machine Code Generation

       def add(self, a, b):
           self.emit(0x48, 0x01, 0xc0 | self.registers(b, a))

       def immediate(self, a, constant):
           self.emit(0x48, 0xb8 | self.registers(a),
                     *self.little_endian(constant))
  160. Machine Code Generation

  161. Machine Code Generation

       def ret(self, a, b):
           self.emit(0xc3)

       def push(self, a, _):
           self.emit(0x50 | self.registers(a))

       def pop(self, a, _):
           self.emit(0x58 | self.registers(a))

       def imul(self, a, b):
           self.emit(0x48, 0x0f, 0xaf, 0xc0 | self.registers(a, b))

       def add(self, a, b):
           self.emit(0x48, 0x01, 0xc0 | self.registers(b, a))

       def sub(self, a, b):
           self.emit(0x48, 0x29, 0xc0 | self.registers(b, a))

       def neg(self, a, _):
           self.emit(0x48, 0xf7, 0xd8 | self.registers(a))

       def mov(self, a, b):
           self.emit(0x48, 0x89, 0xc0 | self.registers(b, a))

       def immediate(self, a, number):
           self.emit(0x48, 0xb8 | self.registers(a),
                     *self.little_endian(number))
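The encoders above can be checked against the disassembly of `foo` shown earlier in the talk. The sketch below is a self-contained variant that writes into a `bytearray` instead of a raw memory block; the two-argument form of `registers` (packing a ModRM `reg` and `r/m` field), the body of `little_endian`, and the `assemble` helper are assumptions filled in for this illustration. It only handles the eight legacy registers, because r8 to r15 would need extra REX prefix bits.

```python
# Register numbers as used in x86-64 instruction encoding.
REGISTERS = ("rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi")


class Assembler:
    def __init__(self):
        self.block = bytearray()

    def emit(self, *codes):
        self.block.extend(codes)

    def registers(self, a, b=None):
        """One register number, or (a << 3 | b) for the ModRM reg and r/m fields."""
        if b is None:
            return REGISTERS.index(a)
        return REGISTERS.index(a) << 3 | REGISTERS.index(b)

    def little_endian(self, number):
        """Encode a 64-bit integer (two's complement) as 8 little-endian bytes."""
        return (number & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")

    def ret(self, a=None, b=None):
        self.emit(0xc3)

    def push(self, a, _=None):
        self.emit(0x50 | self.registers(a))

    def pop(self, a, _=None):
        self.emit(0x58 | self.registers(a))

    def imul(self, a, b):
        self.emit(0x48, 0x0f, 0xaf, 0xc0 | self.registers(a, b))

    def sub(self, a, b):
        self.emit(0x48, 0x29, 0xc0 | self.registers(b, a))

    def mov(self, a, b):
        self.emit(0x48, 0x89, 0xc0 | self.registers(b, a))

    def immediate(self, a, number):
        self.emit(0x48, 0xb8 | self.registers(a), *self.little_endian(number))


def assemble(ir):
    """Dispatch each (op, a, b) tuple to the Assembler method of the same name."""
    asm = Assembler()
    for name, a, b in ir:
        getattr(asm, name)(a, b)
    return bytes(asm.block)


# The instruction sequence from the disassembly of foo(a, b) = a*a - b*b.
FOO_IR = [
    ("mov", "rbx", "rdi"), ("mov", "rax", "rdi"), ("imul", "rax", "rbx"),
    ("push", "rax", None),
    ("mov", "rbx", "rsi"), ("mov", "rax", "rsi"), ("imul", "rax", "rbx"),
    ("mov", "rbx", "rax"), ("pop", "rax", None), ("sub", "rax", "rbx"),
    ("ret", None, None),
]
print(assemble(FOO_IR).hex())
```

The hex string it prints should match the byte columns of the disassembly listing, which is a convenient way to test an encoder without executing anything.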
  162. None
  165. Memory Management
       • Allocate one page of memory using mmap
       • Use mprotect to change from R+W to R+X
  166. Memory Management

  170. Memory Management

       def create_block(size):
           ptr = mmap(0, size,
                      MMAP.PROT_READ | MMAP.PROT_WRITE,
                      MMAP.MAP_PRIVATE | MMAP.MAP_ANONYMOUS,
                      -1, 0)
           if ptr == MAP_FAILED:
               raise RuntimeError(…)
           return ptr
  171. Memory Management

  173. Memory Management

       def make_executable(block, size):
           if mprotect(block, size,
                       MMAP.PROT_READ | MMAP.PROT_EXEC) != 0:
               raise RuntimeError(…)
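A runnable sketch of these two steps, assuming a 64-bit POSIX system (Linux or macOS) where `ctypes.CDLL(None)` exposes the C library. The `argtypes`/`restype` declarations, the error messages, and the `load_code` helper are additions for this illustration and are not taken from the talk. The example only copies bytes into the page and flips its protection; it does not execute them.

```python
import ctypes
import mmap

# POSIX only: CDLL(None) gives access to the C library already loaded
# into the Python process.
libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.mprotect.restype = ctypes.c_int
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]

# mmap signals failure by returning (void *) -1.
MAP_FAILED = ctypes.c_void_p(-1).value


def create_block(size):
    """Map `size` bytes of private, anonymous, read + write memory."""
    ptr = libc.mmap(None, size,
                    mmap.PROT_READ | mmap.PROT_WRITE,
                    mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                    -1, 0)
    if ptr is None or ptr == MAP_FAILED:
        raise RuntimeError("mmap failed, errno %d" % ctypes.get_errno())
    return ptr


def make_executable(block, size):
    """Switch the block from read + write to read + execute."""
    if libc.mprotect(block, size, mmap.PROT_READ | mmap.PROT_EXEC) != 0:
        raise RuntimeError("mprotect failed, errno %d" % ctypes.get_errno())


def load_code(code):
    """Copy machine code into a fresh page and make that page executable."""
    size = mmap.PAGESIZE
    if len(code) > size:
        raise ValueError("code does not fit in one page")
    block = create_block(size)
    ctypes.memmove(block, code, len(code))
    make_executable(block, size)
    return block


address = load_code(b"\xc3")  # a single x86-64 `ret` instruction
print(hex(address), ctypes.string_at(address, 1))
```

Writing first and only then switching to read + execute means the page is never writable and executable at the same time, which some systems require. The sketch never unmaps the page; a longer-lived program would call `munmap` when the code is no longer needed.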
  174. None
  179. FFI
       • Use ctypes as FFI
       • Grab the machine code address, create a signature and bind it to a function
       • ctypes also seems to save and restore registers
       • More overhead: marshalling arguments
  180. FFI

  185. FFI

       # Return type first, then the argument types
       signature = ctypes.CFUNCTYPE(ctypes.c_int64,
                                    ctypes.c_int64, ctypes.c_int64)
       func = signature(assembler.address)
       print(func(1, 2))
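An end-to-end sketch of the FFI step, under several assumptions: it only runs native code on a 64-bit x86-64 POSIX system (elsewhere `supported()` returns `False` and nothing is executed), the machine code is a hand-assembled `a + b` function rather than output from the talk's compiler, and the helper names `supported` and `make_native_add` are made up for this illustration.

```python
import ctypes
import mmap
import os
import platform

# Hand-assembled: mov rax, rdi ; add rax, rsi ; ret
ADD_CODE = bytes.fromhex("4889f8 4801f0 c3")


def supported():
    """Only execute native code on 64-bit x86-64 POSIX systems."""
    return (os.name == "posix"
            and platform.machine() == "x86_64"
            and ctypes.sizeof(ctypes.c_void_p) == 8)


def make_native_add():
    """Load ADD_CODE into an executable page and wrap it as a Python callable."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mmap.restype = ctypes.c_void_p
    libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                          ctypes.c_int, ctypes.c_int, ctypes.c_long]
    libc.mprotect.restype = ctypes.c_int
    libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]

    size = mmap.PAGESIZE
    address = libc.mmap(None, size,
                        mmap.PROT_READ | mmap.PROT_WRITE,
                        mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if address is None or address == ctypes.c_void_p(-1).value:
        raise RuntimeError("mmap failed")
    ctypes.memmove(address, ADD_CODE, len(ADD_CODE))
    if libc.mprotect(address, size, mmap.PROT_READ | mmap.PROT_EXEC) != 0:
        raise RuntimeError("mprotect failed")

    # CFUNCTYPE takes the return type first, then the argument types.
    signature = ctypes.CFUNCTYPE(ctypes.c_int64, ctypes.c_int64, ctypes.c_int64)
    # Calling the prototype with an integer address binds it to that address.
    # The page is intentionally never unmapped in this sketch.
    return signature(address)


if supported():
    native_add = make_native_add()
    print(native_add(1, 2))
else:
    print("skipping: not a 64-bit x86-64 POSIX system")
```

Each call goes through ctypes, which converts the Python integers to 64-bit C integers and back; that conversion is the marshalling overhead mentioned on the previous slide.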
  186. None
  188. The “@jit” Decorator

       def jit(func):
           def front(*args, **kw):
               if not hasattr(front, "func"):
                   front.func = compile_native(func)
               return front.func(*args, **kw)
           return front

       @jit
       def foo(a, b):
           return a*a - b*b
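The decorator compiles lazily: nothing happens at definition time, the first call triggers compilation, and later calls reuse the cached result. The sketch below demonstrates only that caching behaviour. Its `compile_native` is a hypothetical stand-in that counts invocations and returns the original Python function; the real one would return a ctypes-wrapped native function.

```python
def compile_native(func):
    """Hypothetical stand-in for the real compiler: count calls, return func."""
    compile_native.calls += 1
    print("--- JIT-compiling %r" % func)
    return func


compile_native.calls = 0


def jit(func):
    def front(*args, **kw):
        # Compile on the first call only, then cache the result on `front`.
        if not hasattr(front, "func"):
            front.func = compile_native(func)
        return front.func(*args, **kw)
    return front


@jit
def foo(a, b):
    return a*a - b*b


print(foo(2, 3))  # triggers the (stand-in) compilation
print(foo(3, 4))  # reuses the cached function
```

Storing the compiled function as an attribute on `front` keeps the cache local to each decorated function without needing a separate dictionary.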
  189. Demo

  194. Things we haven’t touched on
       • Code generation: SSA, branching
       • Optimization: TAC, register allocation
       • Compatibility: Python object system
       • Performance gotchas
  195. Pointers
       • Full details in the blog post at https://csl.name/post/python-compiler/
       • PeachPy
       • Truffle + Graal, LLVM, libjit, etc.
       • Numba
  196. Thanks y’all!