Writing a JIT in Python

Xuanyi
August 07, 2014

I wanted to give this lightning talk at PyCon AU, but time ran out and I didn't get to (register early, folks!).

I ended up giving it at Sydney Python the following Thursday.

The code can be found here: https://gist.github.com/chewxy/17e6920b608208647a74


Transcript

  1. Writing a JIT in Python (wtf?) Xuanyi Chew

  2. QUESTION ASKED ON FRIDAY

  3. Why Not Python?

    •  Spent that night thinking about it
    •  Wrote a prototype JIT … thing in Python the next morning
  4. Basic JIT Ideas

    •  Transforms code at run time into machine code
    •  Run machine code
    •  Get results from machine code being run
  5. Basic JIT Ideas

    •  Transforms code at run time into machine code
    •  Run machine code
    •  Get results from machine code being run

    Main Objections! Excuses
  6. Then I Started Thinking…

    •  Linux provides mmap(2)/mprotect(2)
    •  No/restricted access to these system calls from Python
    •  libc.so.6 is available
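The last bullet is the way around the restriction: ctypes can load the C library and call into it directly. A minimal Python 3 sketch, assuming a Unix-like system; the variable names are illustrative and do not come from the talk:

```python
import ctypes
import ctypes.util

# find_library("c") typically returns "libc.so.6" on glibc-based Linux.
# If it returns None, CDLL(None) opens the running process itself, whose
# symbol table already includes libc on Unix-like systems.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# Declare the return type explicitly; getpagesize() takes no arguments.
libc.getpagesize.restype = ctypes.c_int
page_size = libc.getpagesize()

# mprotect(2) and mmap(2) are reachable through the same handle.
has_mprotect = hasattr(libc, "mprotect")
has_mmap = hasattr(libc, "mmap")

print(page_size, has_mprotect, has_mmap)
```

Looking a symbol up with `hasattr` only resolves it; nothing is called until the attribute is invoked.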
  7. ctypes.pythonapi

    •  3 iterations in, discovered ctypes.pythonapi
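For readers who have not met it, `ctypes.pythonapi` is a preloaded library handle for the Python C API. A small Python 3 sketch of what it offers; the remark about libc symbols describes typical Linux builds and is an assumption, not a documented guarantee:

```python
import ctypes
import sys

# ctypes.pythonapi is a ready-made library handle for the Python C API.
api = ctypes.pythonapi

# Py_GetVersion() returns a C string; declare that so ctypes converts it.
api.Py_GetVersion.restype = ctypes.c_char_p
version = api.Py_GetVersion().decode()

# On typical Linux builds the handle resolves symbols process-wide, so libc
# functions such as getpagesize() and mprotect() are usually visible too.
libc_visible = hasattr(api, "getpagesize") and hasattr(api, "mprotect")

print(version.split()[0], libc_visible)
```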

  8. New Plan

    •  Transforms code at run time into machine code
    •  Run machine code
    •  Get results from machine code being run
  9. DEMO TIME

  10. Line By Line Explanation

    from ctypes import *        # BAD! NEVER DO THIS
    import os, sys              # Import OS stuff
    argv = int(sys.argv[1])     # Convert sys.argv[1] to int (look ma, no try-except)
    argv2 = int(sys.argv[2])    # Second operand, used by the call on slide 14
    # PROT_xxx -> mprotect flags
    PROT_NONE = 0x0             # Set memory to inaccessible
    PROT_READ = 0x1             # Set memory to readable
    PROT_WRITE = 0x2            # Set memory to writable
    PROT_EXEC = 0x4             # Set memory to executable
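The PROT_xxx values are bit flags, so they combine with `|`. A short sketch of how they compose; `describe` is a hypothetical helper added for illustration, and the comparison with the standard `mmap` module assumes a Unix build of Python:

```python
import mmap

PROT_NONE = 0x0
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4

# The combination the talk later passes to mprotect().
rwx = PROT_READ | PROT_WRITE | PROT_EXEC


def describe(prot):
    """Render a protection value as an 'rwx'-style string (hypothetical helper)."""
    pairs = (("r", PROT_READ), ("w", PROT_WRITE), ("x", PROT_EXEC))
    return "".join(letter if prot & bit else "-" for letter, bit in pairs)


# Unix builds of the mmap module expose the same constants; None elsewhere.
stdlib = tuple(getattr(mmap, name, None)
               for name in ("PROT_READ", "PROT_WRITE", "PROT_EXEC"))

print(rwx, describe(rwx), describe(PROT_READ | PROT_WRITE), describe(PROT_NONE))
```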
  11. Line By Line Explanation

    buf = ''.join(map(chr, [
        0x55,                # pushq %rbp
        0x48, 0x89, 0xe5,    # movq %rsp, %rbp
        0x89, 0x7d, 0xfc,    # movl %edi*, -4(%rbp)
        0x89, 0x75, 0xf8,    # movl %esi*, -8(%rbp)
        0x8b, 0x45, 0xf8,    # movl -8(%rbp), %eax
        0x8b, 0x55, 0xfc,    # movl -4(%rbp), %edx
        0x01, 0xd0,          # addl %edx, %eax
        0x5d,                # popq %rbp
        0xc3,                # ret
    ]))

    * x86_64 Linux (System V AMD64) calling convention: %rdi, %rsi, %rdx, %rcx, %r8, %r9 are used to pass function parameters
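To make the byte list easier to check, here is the same routine as a Python 3 `bytes` object, plus a tiny ModRM decoder. This is a sketch for reading the encoding, not a disassembler; `decode_modrm`, `REG32`, and `model_add` are hypothetical names, and the register order follows the standard x86 encoding.

```python
# The same 20 bytes as a Python 3 bytes object.
CODE = bytes([
    0x55,              # pushq %rbp
    0x48, 0x89, 0xe5,  # movq %rsp, %rbp
    0x89, 0x7d, 0xfc,  # movl %edi, -4(%rbp)
    0x89, 0x75, 0xf8,  # movl %esi, -8(%rbp)
    0x8b, 0x45, 0xf8,  # movl -8(%rbp), %eax
    0x8b, 0x55, 0xfc,  # movl -4(%rbp), %edx
    0x01, 0xd0,        # addl %edx, %eax
    0x5d,              # popq %rbp
    0xc3,              # ret
])

# 32-bit register names indexed by their 3-bit encoding.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111


def model_add(a, b):
    """What the routine leaves in %eax: the sum truncated to 32 bits."""
    return (a + b) & 0xFFFFFFFF


# Opcode 0x01 is "ADD r/m32, r32": reg is the source, rm the destination.
mod, reg, rm = decode_modrm(0xd0)
print(REG32[reg], "->", REG32[rm])  # edx -> eax
print(len(CODE), model_add(2, 3))   # 20 5
```

The truncation in `model_add` matters for negative inputs: the hardware add wraps at 32 bits, so the value seen by the caller depends on the return type declared on the Python side.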
  12. Line By Line Explanation

    stringBuffer = create_string_buffer(buf)    # Creates a C char[] value
    codeAddress = addressof(stringBuffer)       # Get the memory address
    pageSize = pythonapi.getpagesize()          # Get the memory page size of the OS*
    sizeOfCode = sizeof(stringBuffer)           # Get the size of the array

    * Fun fact: There are at least 3 ways of getting this. pythonapi is the cleanest
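The fun fact is easy to check. The sketch below collects the page size through several standard-library routes on a Unix system; these are routes I know of, not necessarily the three the speaker had in mind:

```python
import ctypes
import mmap
import os
import resource

page_sizes = {
    "mmap.PAGESIZE": mmap.PAGESIZE,
    "os.sysconf": os.sysconf("SC_PAGE_SIZE"),
    "resource.getpagesize": resource.getpagesize(),
}

# The slide's route: ask libc through ctypes.pythonapi. The symbol is
# visible on typical Linux builds, so guard the lookup.
if hasattr(ctypes.pythonapi, "getpagesize"):
    page_sizes["pythonapi.getpagesize"] = ctypes.pythonapi.getpagesize()

for name, value in page_sizes.items():
    print(name, value)
```

All routes should report the same number; 4096 is common on x86-64 Linux, but the value is a property of the system, so it is better queried than hard-coded.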
  13. Line By Line Explanation

    # Create mask
    mask = pageSize - 1
    # Get pointer to the page containing the code (address rounded down to a page boundary)
    addrPtr = ~mask & codeAddress
    # Offset of the code within that page, used to calculate the length passed to mprotect
    loc = mask & codeAddress
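A worked example of the mask arithmetic, using a made-up address and a 4096-byte page. `page_span` is a hypothetical helper, and the size of 21 assumes the 20 code bytes plus the NUL terminator that `create_string_buffer` appends:

```python
def page_span(address, size, page_size=4096):
    """Return (page-aligned start, length) covering [address, address + size)."""
    mask = page_size - 1      # 0xfff for 4096-byte pages
    start = address & ~mask   # clear the low bits: round down to the page start
    offset = address & mask   # keep the low bits: position inside the page
    return start, offset + size


# Hypothetical buffer address, chosen only for illustration.
start, length = page_span(0x7f3a12345678, 21)
print(hex(start), length)  # 0x7f3a12345000 1677

# A buffer that straddles a page boundary gets a length above one page,
# so mprotect() ends up covering both pages.
start2, length2 = page_span(0x1ff8, 21)
print(hex(start2), length2)  # 0x1000 4109
```

The mask trick only works because page sizes are powers of two.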
  14. Line By Line Explanation

    # Call mprotect()* and set the region of memory to be read/write/executable (VERY UNSAFE)
    returnedValue = pythonapi.mprotect(addrPtr, loc + sizeOfCode,
                                       PROT_READ|PROT_WRITE|PROT_EXEC)
    # Cast as function returning a long and taking 2 longs as arguments
    function = cast(stringBuffer, CFUNCTYPE(c_long, c_long, c_long))
    # Call function, and print result
    print(repr(function(argv, argv2)))

    * man mprotect: mprotect() changes protection for the calling process's memory page(s) containing any part of the address range in the interval [addr, addr+len-1]. addr must be aligned to a page boundary.
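Putting the slides together, here is a hedged Python 3 sketch of the whole flow. It is not the code from the gist. It assumes x86-64 Linux, declares `argtypes` on `mprotect` so the 64-bit address is not passed as a C int, uses `c_int` to match the 32-bit `movl`/`addl` instructions, and removes the execute bit afterwards. `supported` and `jit_add` are hypothetical names.

```python
import ctypes
import mmap
import os
import platform
import sys

# x86-64 machine code for: int add(int a, int b) { return a + b; }
CODE = bytes([
    0x55, 0x48, 0x89, 0xe5, 0x89, 0x7d, 0xfc, 0x89, 0x75, 0xf8,
    0x8b, 0x45, 0xf8, 0x8b, 0x55, 0xfc, 0x01, 0xd0, 0x5d, 0xc3,
])

PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4


def supported():
    """The bytes above are x86-64 code, and the calls below assume Linux."""
    return (sys.platform.startswith("linux")
            and platform.machine() in ("x86_64", "AMD64"))


def jit_add(a, b):
    libc = ctypes.CDLL(None, use_errno=True)
    # Without argtypes, ctypes passes Python ints as C ints, which can
    # truncate a 64-bit address.
    libc.mprotect.argtypes = (ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int)
    libc.mprotect.restype = ctypes.c_int

    buf = ctypes.create_string_buffer(CODE)  # must stay alive during the call
    address = ctypes.addressof(buf)
    mask = mmap.PAGESIZE - 1
    start = address & ~mask                  # page-aligned start
    length = (address & mask) + ctypes.sizeof(buf)

    # VERY UNSAFE: this makes a page of the Python heap executable.
    if libc.mprotect(start, length, PROT_READ | PROT_WRITE | PROT_EXEC) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    try:
        # Prototype: returns int, takes two ints; built from a raw address.
        func = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)(address)
        return func(a, b)
    finally:
        # Drop the execute bit again; the page stays readable and writable.
        libc.mprotect(start, length, PROT_READ | PROT_WRITE)


if supported():
    print(jit_add(2, 3))  # 5
```

Hardened systems may refuse to make heap pages executable, in which case `mprotect` fails and the sketch raises `OSError` instead of running the code.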