Slide 1

Slide 1 text

Writing a JIT in Python (wtf?)
Xuanyi Chew

Slide 2

Slide 2 text

QUESTION ASKED ON FRIDAY

Slide 3

Slide 3 text

Why Not Python?
•  Spent that night thinking about it
•  Wrote a prototype JIT … thing in Python the next morning

Slide 4

Slide 4 text

Basic JIT Ideas
•  Transforms code at run time into machine code
•  Run machine code
•  Get results from machine code being run
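To make the first idea concrete, here is a minimal sketch of "transforming code at run time into machine code". It only builds the bytes; it does not run them. The helper name emit_return_constant is hypothetical, and the encoding assumes x86_64 (opcode 0xb8 is movl $imm32, %eax, and 0xc3 is ret).

```python
import struct


def emit_return_constant(value):
    """Build x86_64 machine code for a function that just returns `value`.

    Hypothetical helper for illustration; assumes `value` fits in a signed
    32-bit integer.
    """
    mov_eax_imm32 = b"\xb8" + struct.pack("<i", value)  # movl $value, %eax
    ret = b"\xc3"                                       # ret
    return mov_eax_imm32 + ret


code = emit_return_constant(42)
print(code.hex())  # b82a000000c3
```

The remaining two ideas (running the bytes and getting the result back) are where the operating system gets involved.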

Slide 5

Slide 5 text

Basic JIT Ideas
•  Transforms code at run time into machine code
•  Run machine code
•  Get results from machine code being run

Main Objections! Excuses

Slide 6

Slide 6 text

Then I Started Thinking…
•  Linux provides mmap(2)/mprotect(2)
•  No/restricted access to these system calls from Python
•  libc.so.6 is available
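A minimal sketch of reaching mprotect(2) through libc with ctypes. The helper name load_libc and the fallback to CDLL(None) are my assumptions, not part of the talk. Declaring argtypes matters on 64-bit systems: without it, ctypes passes Python integers as C ints, which is too narrow for a 64-bit address.

```python
import ctypes


def load_libc():
    """Load the C library and declare the mprotect prototype.

    Hypothetical helper. Falls back to the symbols already loaded in the
    process when libc.so.6 is not present under that name.
    """
    try:
        libc = ctypes.CDLL("libc.so.6", use_errno=True)
    except OSError:
        libc = ctypes.CDLL(None, use_errno=True)

    # int mprotect(void *addr, size_t len, int prot);
    libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
    libc.mprotect.restype = ctypes.c_int
    return libc


libc = load_libc()
print(libc.getpagesize())
```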

Slide 7

Slide 7 text

ctypes.pythonapi
•  3 iterations in, discovered ctypes.pythonapi
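A small sketch of what ctypes.pythonapi is: a ready-made library handle that exposes the Python C API. Py_GetVersion is a real C API function; the variable names are mine. On POSIX builds of CPython the handle is, as far as I know, opened on the whole process, which would explain why libc symbols such as getpagesize and mprotect also resolve through it.

```python
import ctypes
import sys

# const char *Py_GetVersion(void);
get_version = ctypes.pythonapi.Py_GetVersion
get_version.argtypes = []
get_version.restype = ctypes.c_char_p

version = get_version().decode()
print(version.split()[0])      # version number as seen from the C API
print(sys.version.split()[0])  # the same number as seen from Python
```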

Slide 8

Slide 8 text

New Plan
•  Transforms code at run time into machine code
•  Run machine code
•  Get results from machine code being run

Slide 9

Slide 9 text

DEMO TIME

Slide 10

Slide 10 text

Line By Line Explanation

from ctypes import *       # BAD! NEVER DO THIS
import os, sys             # Import OS stuff
argv = int(sys.argv[1])    # Convert sys.argv[1] to int (look ma, no try-except)

# PROT_xxx -> mprotect flags
PROT_NONE = 0x0            # set memory to inaccessible
PROT_READ = 0x1            # set memory to readable
PROT_WRITE = 0x2           # set memory to writable
PROT_EXEC = 0x4            # set memory to executable
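The PROT_xxx values are bit flags, so they are combined with |. A sketch using enum.IntFlag makes that explicit; the class name Prot is hypothetical, and the values are the Linux ones from the slide.

```python
import enum


class Prot(enum.IntFlag):
    """mprotect protection flags (Linux values, as on the slide)."""
    NONE = 0x0
    READ = 0x1
    WRITE = 0x2
    EXEC = 0x4


# Readable, writable and executable at once: what the demo asks for later.
rwx = Prot.READ | Prot.WRITE | Prot.EXEC
print(int(rwx))          # 7
print(Prot.EXEC in rwx)  # True
```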

Slide 11

Slide 11 text

Line By Line Explanation

buf = ''.join(map(chr, [
    0x55,                # pushq %rbp
    0x48, 0x89, 0xe5,    # movq %rsp, %rbp
    0x89, 0x7d, 0xfc,    # movl %edi*, -4(%rbp)
    0x89, 0x75, 0xf8,    # movl %esi*, -8(%rbp)
    0x8b, 0x45, 0xf8,    # movl -8(%rbp), %eax
    0x8b, 0x55, 0xfc,    # movl -4(%rbp), %edx
    0x01, 0xd0,          # addl %edx, %eax
    0x5d,                # popq %rbp
    0xc3                 # ret
]))

* x86_64 Linux (System V) function calling convention: %rdi, %rsi, %rdx, %rcx, %r8, %r9 are used to pass the first six integer function parameters. (The system call convention uses %r10 in place of %rcx.)
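The slide's ''.join(map(chr, ...)) builds a byte string only on Python 2. A sketch of the same buffer for Python 3, keeping each instruction next to its encoding; the table name INSTRUCTIONS is hypothetical.

```python
# (AT&T assembly, encoding) pairs for: int add(int a, int b) on x86_64.
INSTRUCTIONS = [
    ("pushq %rbp",          [0x55]),
    ("movq %rsp, %rbp",     [0x48, 0x89, 0xe5]),
    ("movl %edi, -4(%rbp)", [0x89, 0x7d, 0xfc]),
    ("movl %esi, -8(%rbp)", [0x89, 0x75, 0xf8]),
    ("movl -8(%rbp), %eax", [0x8b, 0x45, 0xf8]),
    ("movl -4(%rbp), %edx", [0x8b, 0x55, 0xfc]),
    ("addl %edx, %eax",     [0x01, 0xd0]),
    ("popq %rbp",           [0x5d]),
    ("ret",                 [0xc3]),
]

# Flatten the encodings into one immutable bytes object.
buf = bytes(byte for _, encoding in INSTRUCTIONS for byte in encoding)
print(len(buf))  # 20
```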

Slide 12

Slide 12 text

Line By Line Explanation

stringBuffer = create_string_buffer(buf)    # Creates a C char array (char[]) value
codeAddress = addressof(stringBuffer)       # Get the memory address
pageSize = pythonapi.getpagesize()          # Get the memory page size of the OS*
sizeOfCode = sizeof(stringBuffer)           # Get the size of the array

* Fun fact: There are at least 3 ways of getting this. pythonapi is the cleanest
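For comparison, a sketch of several ways to read the page size on a Unix-like system. These are the ones I know of; they are not necessarily the three the slide has in mind.

```python
import ctypes
import mmap
import os
import resource

page_sizes = {
    "ctypes.pythonapi.getpagesize()": ctypes.pythonapi.getpagesize(),
    "os.sysconf('SC_PAGE_SIZE')": os.sysconf("SC_PAGE_SIZE"),
    "resource.getpagesize()": resource.getpagesize(),
    "mmap.PAGESIZE": mmap.PAGESIZE,
}

for how, size in page_sizes.items():
    print(how, size)
```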

Slide 13

Slide 13 text

Line By Line Explanation

mask = pageSize - 1              # Create mask
addrPtr = ~mask & codeAddress    # Get pointer to the start of the page containing the code (mprotect needs a page-aligned address)
loc = mask & codeAddress         # Offset of the code within that page (prepares the calculation of the code length)
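A worked example of the mask arithmetic. The address and the 4096-byte page size are made-up values for illustration; the buffer size of 21 assumes the 20 code bytes plus the trailing NUL that create_string_buffer adds.

```python
page_size = 4096               # assumed page size
code_address = 0x7f3a1c2b5e10  # hypothetical address of the buffer
size_of_code = 21              # 20 code bytes + trailing NUL

mask = page_size - 1                # 0xfff: the low 12 bits
page_start = ~mask & code_address   # clear the low bits -> start of the page
offset = mask & code_address        # keep the low bits  -> offset in the page
length = offset + size_of_code      # bytes from page start to end of code

print(hex(page_start))  # 0x7f3a1c2b5000
print(hex(offset))      # 0xe10
print(length)           # 3621
```

Because the length is measured from the page start, the range passed to mprotect covers the whole buffer even though the address was rounded down.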

Slide 14

Slide 14 text

Line By Line Explanation

# Call mprotect(), and set the region of memory to be read/write/executable (VERY UNSAFE)
returnedValue = pythonapi.mprotect(addrPtr, loc + sizeOfCode, PROT_READ|PROT_WRITE|PROT_EXEC)

# Cast as function taking 2 longs as arguments (the first c_long is the return type)
function = cast(stringBuffer, CFUNCTYPE(c_long, c_long, c_long))

# Call function, and print result
print(repr(function(argv, argv2)))

* man mprotect: mprotect() changes protection for the calling process's memory page(s) containing any part of the address range in the interval [addr, addr+len-1]. addr must be aligned to a page boundary.
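Putting the steps together, a hedged Python 3 sketch of the whole demo. It deviates from the slides in a few labeled ways: the function name run_add is hypothetical, it returns None instead of running on anything other than 64-bit x86 Linux, it declares the mprotect prototype explicitly, and it uses c_int because the machine code works on 32-bit registers. It is just as unsafe as the original.

```python
import ctypes
import platform
import sys

PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4

# The same 20 bytes as in the talk: int add(int a, int b) for x86_64.
CODE = bytes([
    0x55, 0x48, 0x89, 0xe5, 0x89, 0x7d, 0xfc, 0x89, 0x75, 0xf8,
    0x8b, 0x45, 0xf8, 0x8b, 0x55, 0xfc, 0x01, 0xd0, 0x5d, 0xc3,
])


def run_add(a, b):
    """Return a + b computed by the machine code, or None if unsupported."""
    supported = (
        sys.platform.startswith("linux")
        and platform.machine() in ("x86_64", "AMD64")
        and ctypes.sizeof(ctypes.c_void_p) == 8
    )
    if not supported:
        return None

    libc = ctypes.CDLL(None, use_errno=True)
    libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
    libc.mprotect.restype = ctypes.c_int

    buf = ctypes.create_string_buffer(CODE)
    address = ctypes.addressof(buf)
    mask = libc.getpagesize() - 1
    page_start = address & ~mask   # mprotect needs a page-aligned address
    offset = address & mask        # where the code sits inside that page

    prot = PROT_READ | PROT_WRITE | PROT_EXEC
    if libc.mprotect(page_start, offset + ctypes.sizeof(buf), prot) != 0:
        return None                # the kernel refused (e.g. a security policy)

    # Return type first, then the two argument types.
    func_type = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)
    func = ctypes.cast(buf, func_type)
    return func(a, b)


print(run_add(2, 3))
```

On a supported machine where mprotect succeeds, this should print 5; anywhere else it prints None.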