Writing a JIT in Python

Xuanyi
August 07, 2014

I wanted to give this lightning talk at PyCon AU, but time ran out before I got the chance (register early, folks!).

I ended up giving it at Sydney Python the following Thursday.

The code can be found here: https://gist.github.com/chewxy/17e6920b608208647a74

Transcript

  1. Why Not Python?

    •  Spent that night thinking about it
    •  Wrote a prototype JIT … thing in Python the next morning
  2. Basic JIT Ideas

    •  Transforms code at run time into machine code
    •  Run machine code
    •  Get results from machine code being run
  3. Basic JIT Ideas

    •  Transforms code at run time into machine code
    •  Run machine code
    •  Get results from machine code being run

    Main Objections! Excuses
  4. Then I Started Thinking…

    •  Linux provides mmap(2)/mprotect(2)
    •  No/restricted access to these system calls from Python
    •  libc.so.6 is available
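The point about libc.so.6 can be sketched with ctypes: the C library's mprotect is reachable even though Python ships no wrapper for it. This is a minimal sketch, assuming a Linux-like system; the names `libc`, `page`, `view` and `status` are illustrative, and the call only re-applies the protections an anonymous mapping already has, so it changes nothing.

```python
import ctypes
import ctypes.util
import mmap

# find_library may return None on some systems; CDLL(None) then falls back
# to the symbols already loaded into the Python process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# int mprotect(void *addr, size_t len, int prot);
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int

# An anonymous mapping is page-aligned and readable/writable by default.
page = mmap.mmap(-1, mmap.PAGESIZE)
view = ctypes.c_char.from_buffer(page)
address = ctypes.addressof(view)

# Re-apply read/write: a harmless call that returns 0 on success.
status = libc.mprotect(address, mmap.PAGESIZE,
                       mmap.PROT_READ | mmap.PROT_WRITE)
print("mprotect returned", status)
```

Declaring `argtypes` matters on 64-bit systems: without it ctypes passes Python integers as C ints, which cannot hold a full pointer.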
  5. New Plan

    •  Transforms code at run time into machine code
    •  Run machine code
    •  Get results from machine code being run
  6. Line By Line Explanation

    from ctypes import *       # BAD! NEVER DO THIS
    import os, sys             # Import OS stuff
    argv = int(sys.argv[1])    # Convert sys.argv[1] to int (look ma, no try-except)
    argv2 = int(sys.argv[2])   # Assumed, not on the slide: second operand for the final call function(argv, argv2)

    # PROT_xxx -> mprotect flags
    PROT_NONE = 0x0            # set memory to inaccessible
    PROT_READ = 0x1            # set memory to readable
    PROT_WRITE = 0x2           # set memory to writable
    PROT_EXEC = 0x4            # set memory to executable
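The PROT_xxx values are single bits, which is why they can be combined with `|` later on. A small sketch of that idea, assuming a Unix system; the `Prot` class is a hypothetical helper, not part of the talk's code.

```python
import enum
import mmap


class Prot(enum.IntFlag):
    """mprotect(2) protection bits, using the values shown on the slide."""
    NONE = 0x0
    READ = 0x1
    WRITE = 0x2
    EXEC = 0x4


# Each flag is one bit, so OR-ing them builds a combined permission set.
rwx = Prot.READ | Prot.WRITE | Prot.EXEC
print(int(rwx))

# The standard mmap module exports the same three non-zero bits on Unix.
same_as_mmap = (Prot.READ, Prot.WRITE, Prot.EXEC) == (
    mmap.PROT_READ, mmap.PROT_WRITE, mmap.PROT_EXEC)
print(same_as_mmap)
```

Using the constants from `mmap` avoids both the hand-written values and the wildcard import.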
  7. Line By Line Explanation

    buf = bytes(bytearray([    # a byte string on both Python 2 and Python 3
        0x55,                  # pushq %rbp
        0x48, 0x89, 0xe5,      # movq %rsp, %rbp
        0x89, 0x7d, 0xfc,      # movl %edi*, -4(%rbp)
        0x89, 0x75, 0xf8,      # movl %esi*, -8(%rbp)
        0x8b, 0x45, 0xf8,      # movl -8(%rbp), %eax
        0x8b, 0x55, 0xfc,      # movl -4(%rbp), %edx
        0x01, 0xd0,            # addl %edx, %eax
        0x5d,                  # popq %rbp
        0xc3,                  # ret
    ]))

    * x86_64 Linux function calling convention (System V AMD64 ABI): %rdi, %rsi, %rdx, %rcx, %r8, %r9 are used to pass integer function parameters. System calls use %r10 in place of %rcx.
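To make the byte-to-instruction mapping easier to check, the buffer can be built from (encoding, mnemonic) pairs. This is a sketch only; `INSTRUCTIONS` and `listing` are hypothetical names. Note that the `01 d0` encoding adds %edx into %eax, so the sum ends up in the register used for the return value.

```python
# Each entry pairs the raw encoding with its AT&T-syntax mnemonic.
INSTRUCTIONS = [
    (b"\x55",         "pushq %rbp"),
    (b"\x48\x89\xe5", "movq %rsp, %rbp"),
    (b"\x89\x7d\xfc", "movl %edi, -4(%rbp)"),
    (b"\x89\x75\xf8", "movl %esi, -8(%rbp)"),
    (b"\x8b\x45\xf8", "movl -8(%rbp), %eax"),
    (b"\x8b\x55\xfc", "movl -4(%rbp), %edx"),
    (b"\x01\xd0",     "addl %edx, %eax"),
    (b"\x5d",         "popq %rbp"),
    (b"\xc3",         "ret"),
]

# Concatenating the encodings gives the 20-byte function body.
code = b"".join(encoding for encoding, _ in INSTRUCTIONS)


def listing(instructions):
    """Return one 'offset: bytes  mnemonic' line per instruction."""
    lines = []
    offset = 0
    for encoding, mnemonic in instructions:
        hex_bytes = " ".join("%02x" % byte for byte in bytearray(encoding))
        lines.append("%2d: %-8s  %s" % (offset, hex_bytes, mnemonic))
        offset += len(encoding)
    return lines


print("\n".join(listing(INSTRUCTIONS)))
```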
  8. Line By Line Explanation

    stringBuffer = create_string_buffer(buf)    # Creates a C char[] value
    codeAddress = addressof(stringBuffer)       # Get the memory address
    pageSize = pythonapi.getpagesize()          # Get the memory page size of the OS*
    sizeOfCode = sizeof(stringBuffer)           # Get the size of the array

    * Fun fact: There are at least 3 ways of getting this. pythonapi is the cleanest
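The fun fact about page sizes can be checked directly. The sketch below assumes a Unix system and compares the `pythonapi` call from the slide with three standard-library routes; the talk does not say which alternatives the author had in mind, so these are simply ones known to exist. It also shows that `create_string_buffer` appends a NUL byte, so `sizeof` reports one byte more than the code itself.

```python
import ctypes
import mmap
import os
import resource

# Four ways to ask for the page size; they should all agree.
page_sizes = {
    "ctypes.pythonapi.getpagesize()": ctypes.pythonapi.getpagesize(),
    "os.sysconf('SC_PAGE_SIZE')": os.sysconf("SC_PAGE_SIZE"),
    "mmap.PAGESIZE": mmap.PAGESIZE,
    "resource.getpagesize()": resource.getpagesize(),
}
for name, size in page_sizes.items():
    print("%-32s %d" % (name, size))

# create_string_buffer(bytes) allocates len(bytes) + 1 for the trailing NUL.
payload = b"\x5d\xc3"
buffer_size = ctypes.sizeof(ctypes.create_string_buffer(payload))
print(len(payload), buffer_size)
```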
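  9. Line By Line Explanation

    mask = pageSize - 1              # Create mask (the low bits that address bytes within a page)
    addrPtr = ~mask & codeAddress    # Get pointer to the start of the page containing the code (page-aligned)
    loc = mask & codeAddress         # Offset of the code within that page, used to calculate the length for mprotect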
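A worked example of the mask arithmetic, using made-up numbers: the address is invented for illustration, 4096 is assumed as the page size, and 21 is the 20 code bytes plus the buffer's trailing NUL.

```python
page_size = 4096                 # assumed; a common page size on x86_64 Linux
code_address = 0x7F3A1C2D5E38    # made-up address, for illustration only
size_of_code = 21                # 20 code bytes + trailing NUL of the buffer

mask = page_size - 1                      # 0xfff: the in-page offset bits
page_start = code_address & ~mask         # round down to the page boundary
offset_in_page = code_address & mask      # distance from boundary to the code
length = offset_in_page + size_of_code    # bytes mprotect must cover

print(hex(mask), hex(page_start), hex(offset_in_page), length)
```

Because mprotect requires a page-aligned start address, the length has to grow by the offset so that the range still reaches the end of the code. If `length` exceeded `page_size`, the range would spill into the next page and mprotect would change that page too.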
  10. Line By Line Explanation

    # Call mprotect() and set the region of memory to be read/write/executable (VERY UNSAFE).
    # c_void_p and c_size_t keep a 64-bit address from being truncated to a C int.
    returnedValue = pythonapi.mprotect(c_void_p(addrPtr), c_size_t(loc + sizeOfCode), PROT_READ | PROT_WRITE | PROT_EXEC)

    # Cast as a function taking 2 longs as arguments (the first c_long is the return type)
    function = cast(stringBuffer, CFUNCTYPE(c_long, c_long, c_long))

    # Call function, and print result
    print(repr(function(argv, argv2)))

    * man mprotect: mprotect() changes protection for the calling process's memory page(s) containing any part of the address range in the interval [addr, addr+len-1]. addr must be aligned to a page boundary.
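Putting the steps together, here is a hedged end-to-end sketch rather than the talk's exact code. It differs in three ways that are my own choices: the code is written to a private anonymous `mmap` page instead of a string buffer, the page is flipped to read/execute instead of read/write/execute, and the function is typed with `c_int` because the machine code works on 32-bit registers. `make_adder` and `supported` are hypothetical names, and the demo only runs on x86_64 Linux.

```python
import ctypes
import ctypes.util
import mmap
import os
import platform
import sys

# The 20 bytes from the slides: an x86_64 function returning a + b.
CODE = bytes(bytearray([
    0x55, 0x48, 0x89, 0xE5, 0x89, 0x7D, 0xFC, 0x89, 0x75, 0xF8,
    0x8B, 0x45, 0xF8, 0x8B, 0x55, 0xFC, 0x01, 0xD0, 0x5D, 0xC3,
]))

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int


def make_adder():
    """Return a Python callable backed by the native add routine."""
    # A private anonymous page is page-aligned, so no mask arithmetic needed.
    page = mmap.mmap(-1, mmap.PAGESIZE,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE)
    page.write(CODE)
    view = ctypes.c_char.from_buffer(page)
    address = ctypes.addressof(view)

    # Flip the page from read/write to read/execute before running it.
    prot = mmap.PROT_READ | mmap.PROT_EXEC
    if libc.mprotect(address, mmap.PAGESIZE, prot) != 0:
        errno = ctypes.get_errno()
        raise OSError(errno, os.strerror(errno))

    prototype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)
    native = prototype(address)

    def add(a, b, _keepalive=(page, view)):
        # _keepalive stops the mapping from being freed while add exists.
        return native(a, b)

    return add


def supported():
    """The machine code above only makes sense on x86_64 Linux."""
    return sys.platform.startswith("linux") and platform.machine() == "x86_64"


if supported():
    add = make_adder()
    print(add(2, 3))
```

On an x86_64 Linux machine this should print 5. Writing first and only then making the page executable means the memory is never writable and executable at the same time, which avoids the "VERY UNSAFE" state flagged on the slide.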