CPython bytecode evolution

A presentation from the SPbPython community / PiterPy meetup on the evolution of CPython bytecode: what has changed since the release of CPython 3.6, illustrated with an example that uses the EXTENDED_ARG opcode.

Delimitry

May 15, 2018
Transcript

  1. Bytecode
    According to https://docs.python.org/3/glossary.html#term-bytecode:
    - Internal representation of a Python program in the CPython interpreter
    - Cached in .pyc files so that executing the same file is faster the second time
    - Runs on a VM that executes the machine code corresponding to each bytecode
  2. CPython source code to bytecode
    Source code:
        def foo(v):
            if not v:
                return 0
            print(v)

        foo(123)
    Compilation produces the bytecode:
        64 00 64 01 84 00 5A 00 65 00 64 02 83 01 01 00 64 03 53 00
    Compilation steps:
    1) Parse source code into a parse tree (Parser/pgen.c)
    2) Transform parse tree (CST) to an AST (Python/ast.c)
    3) Transform AST into a Control Flow Graph (CFG) (Python/compile.c)
    4) Emit the bytecode based on the CFG (Python/compile.c)
    5) Optimize the bytecode with peephole optimizations (Python/peephole.c)
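Two of these stages can be observed from Python itself. The following is a minimal sketch, assuming a CPython 3 interpreter: `ast.parse` exposes the AST, and the built-in `compile` runs the remaining steps and returns a code object. The parse tree and the CFG are internal and are not shown, and the bytes printed will differ from the slide on other interpreter versions.

```python
import ast

SOURCE = """\
def foo(v):
    if not v:
        return 0
    print(v)

foo(123)
"""

# Step 2 on the slide: source text -> AST.
tree = ast.parse(SOURCE, filename="foo.py")

# Steps 3-5 happen inside compile(): AST -> code object holding the bytecode.
module_code = compile(tree, filename="foo.py", mode="exec")

if __name__ == "__main__":
    print(ast.dump(tree)[:80], "...")
    print(module_code.co_code.hex())
```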
  3. CPython bytecode
    .py file:
        def foo(v):
            if not v:
                return 0
            print(v)

        foo(123)
    CPython 2.7:
        >>> print(foo.__code__.co_code.encode('hex'))
        '7c0000730a00640100537c0000474864000053'  # bytecode
  4. Bytecode disassembling
    .py file:
        def foo(v):
            if not v:
                return 0
            print(v)

        foo(123)
    CPython 2.7:
        >>> import dis
        >>> dis.dis(foo)
          4           0 LOAD_FAST                0 (v)
                      3 POP_JUMP_IF_TRUE        10

          5           6 LOAD_CONST               1 (0)
                      9 RETURN_VALUE

          6     >>   10 LOAD_FAST                0 (v)
                     13 PRINT_ITEM
                     14 PRINT_NEWLINE
                     15 LOAD_CONST               0 (None)
                     18 RETURN_VALUE
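The `.encode('hex')` call on the earlier slide only works on CPython 2.7. A rough Python 3 equivalent is sketched below, assuming Python 3.5 or newer for `bytes.hex()`; `dis.get_instructions` returns the same information as `dis.dis` in structured form. The opcode names and offsets depend on the interpreter version, so no output is shown.

```python
import dis


def foo(v):
    if not v:
        return 0
    print(v)


# Raw bytecode of the function as a hex string.
raw_hex = foo.__code__.co_code.hex()

# (offset, opname, arg) triples, one per instruction.
listing = [(ins.offset, ins.opname, ins.arg) for ins in dis.get_instructions(foo)]

if __name__ == "__main__":
    print(raw_hex)
    for offset, opname, arg in listing:
        print(offset, opname, arg)
```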
  5. Bytecode disassembling
    .py file:
        def foo(v):
            if not v:
                return 0
            print(v)

        foo(123)
    .pyc file (first bytes):
        33 0D 0D 0A DA 80 F4 5A 48 00 00 00 E3 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00
    CPython 3.6 (f is the .pyc file opened in binary mode; marshal and dis are imported):
        >>> magic, ts, size = f.read(4), f.read(4), f.read(4)
        >>> code = marshal.load(f)
        >>> dis.disassemble(code)
          1           0 LOAD_CONST               0 (<code object foo at ..., file "foo.py", line 1>)
                      2 LOAD_CONST               1 ('foo')
                      4 MAKE_FUNCTION            0
                      6 STORE_NAME               0 (foo)

          6           8 LOAD_NAME                0 (foo)
                     10 LOAD_CONST               2 (123)
                     12 CALL_FUNCTION            1
                     14 POP_TOP
                     16 LOAD_CONST               3 (None)
                     18 RETURN_VALUE
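The three 4-byte reads above match the 12-byte header of CPython 3.6. The sketch below makes the same idea self-contained: it compiles a temporary `foo.py` and reads the resulting `.pyc` back. It assumes a 16-byte header on CPython 3.7+ (PEP 552 added a flags field) and 12 bytes on 3.3 to 3.6; `load_pyc` and `compile_and_load` are illustrative helper names, not part of the talk.

```python
import importlib.util
import marshal
import os
import py_compile
import sys
import tempfile

SOURCE = "def foo(v):\n    if not v:\n        return 0\n    print(v)\n\nfoo(123)\n"


def load_pyc(path):
    """Return (header_bytes, code_object) read from a .pyc file."""
    header_size = 16 if sys.version_info >= (3, 7) else 12
    with open(path, "rb") as f:
        header = f.read(header_size)
        code = marshal.load(f)  # the marshalled code object follows the header
    return header, code


def compile_and_load():
    """Compile SOURCE to a temporary .pyc and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "foo.py")
        with open(src, "w") as f:
            f.write(SOURCE)
        pyc = py_compile.compile(src, cfile=os.path.join(tmp, "foo.pyc"), doraise=True)
        return load_pyc(pyc)


if __name__ == "__main__":
    import dis

    header, code = compile_and_load()
    print(header[:4].hex(), "magic number")
    dis.disassemble(code)
```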
  6. CPython implementation details
    https://docs.python.org/3/library/dis.html
    No guarantees are made that bytecode will not be added, removed, or changed between versions of Python
    https://docs.python.org/3/glossary.html#term-bytecode
    Bytecodes are not expected to work between different Python VMs, nor to be stable between Python releases
  7. CPython < 3.6
    Opcodes without an argument take only 1 byte:
        0x53              RETURN_VALUE
    Opcodes with an argument carry a 2-byte argument (3 bytes in total):
        0x65 0x05 0x00    LOAD_NAME 5 ('val')
    /Include/opcode.h:
        #define HAVE_ARGUMENT 90 // 0x5a
        #define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)
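The variable-length format can be decoded in a few lines. This is a minimal sketch rather than a full disassembler: it uses the `HAVE_ARGUMENT` threshold from the slide, reads arguments as 16-bit little-endian values, and deliberately ignores `EXTENDED_ARG`. `decode_pre36` is an illustrative name. The input is the CPython 2.7 bytecode of `foo` shown earlier in the deck.

```python
HAVE_ARGUMENT = 90  # 0x5a, from Include/opcode.h on the slide


def decode_pre36(code):
    """Yield (offset, opcode, arg) for CPython < 3.6 bytecode; arg is None if absent."""
    i = 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            # Two argument bytes follow the opcode, low byte first.
            yield i, op, code[i + 1] | (code[i + 2] << 8)
            i += 3
        else:
            yield i, op, None
            i += 1


FOO_27 = bytes.fromhex("7c0000730a00640100537c0000474864000053")

if __name__ == "__main__":
    for offset, op, arg in decode_pre36(FOO_27):
        print(offset, hex(op), arg)
```

The offsets produced (0, 3, 6, 9, 10, 13, 14, 15, 18) line up with the CPython 2.7 `dis` listing of `foo`.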
  8. CPython 3.6: Wordcode
    WPython project by Cesare Di Mauro
    https://bugs.python.org/issue26647 “ceval: use Wordcode, 16-bit bytecode”
    Contributed by Demur Rumed, with input and reviews from Serhiy Storchaka and Victor Stinner
  9. CPython 3.6: Wordcode
    Every instruction has an argument (opcode + arg = 2 bytes), but opcodes < HAVE_ARGUMENT (0x5a) ignore it:
        0x53 0x00    RETURN_VALUE
        0x65 0x05    LOAD_NAME 5 ('val')
    From https://bugs.python.org/msg266417:
    - Faster (27): up to 11% faster
    - Slower (1): the worst slowdown is only 7%
    - Not significant (14)
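With fixed 2-byte units, decoding reduces to walking the byte string in steps of two. A minimal sketch, applied to the module-level CPython 3.6 bytecode shown earlier in the deck; `decode_wordcode` is an illustrative name, and `EXTENDED_ARG` is not handled here.

```python
def decode_wordcode(code):
    """Return (offset, opcode, oparg) for each 2-byte unit of CPython 3.6 bytecode."""
    return [(offset, code[offset], code[offset + 1]) for offset in range(0, len(code), 2)]


MODULE_36 = bytes.fromhex("64 00 64 01 84 00 5A 00 65 00 64 02 83 01 01 00 64 03 53 00")

if __name__ == "__main__":
    for offset, opcode, oparg in decode_wordcode(MODULE_36):
        print(offset, hex(opcode), oparg)
```

Note that `POP_TOP` (0x01) and `RETURN_VALUE` (0x53) still occupy two bytes; their argument byte is simply zero.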
  10. Ideas for research
    In CPython < 3.6:
    - When an opcode's argument > 65535
    In CPython >= 3.6:
    - When an opcode's argument > 255
    - When an opcode's argument > 65535
    - When an opcode's argument > 16777215
  11. Let’s test bytecode
    test_pybc.py:
        # comment for profiling
        import dis
        def a():
            a0 = 0
            a1 = 1
            a2 = 2
            a3 = 3
            ...
            a65535 = 65535
        a()
        # comment for profiling
        dis.dis(a)
    Compile with each interpreter and compare the sizes:
        $ ls -la
        -rw-r--r-- 1 user users 1223050 May 7 22:29 test_pybc.py
        $ python2.7 -m compileall test_pybc.py
        $ mv test_pybc.pyc test_pybc27.pyc
        $ ls -la
        -rw-r--r-- 1 user users 1562519 May 7 22:32 test_pybc27.pyc
        $ python3.5 -m compileall test_pybc.py
        $ mv __pycache__/test_pybc.cpython-35.pyc test_pybc35.pyc
        $ ls -la
        -rw-r--r-- 1 user users 1365898 May 7 22:32 test_pybc35.pyc
        $ python3.6 -m compileall test_pybc.py
        $ mv __pycache__/test_pybc.cpython-36.pyc test_pybc36.pyc
        $ ls -la
        -rw-r--r-- 1 user users 1496453 May 7 22:32 test_pybc36.pyc
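A test file of this shape is easiest to generate. The sketch below is one way to do it; the exact layout of the original file (indentation, blank lines, comment placement) is an assumption, so the generated file is not guaranteed to match the 1223050-byte size listed above. `make_source` and `write_test_file` are illustrative names.

```python
def make_source(n=65536):
    """Build the source of a module whose function a() assigns n local variables."""
    lines = ["# comment for profiling", "import dis", "def a():"]
    lines += ["    a{0} = {0}".format(i) for i in range(n)]
    lines += ["a()", "# comment for profiling", "dis.dis(a)"]
    return "\n".join(lines) + "\n"


def write_test_file(path="test_pybc.py", n=65536):
    """Write the generated module to disk."""
    with open(path, "w") as f:
        f.write(make_source(n))
```

Calling `write_test_file()` produces the file; a smaller `n` is convenient for quick experiments.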
  12. EXTENDED_ARG opcode
    https://github.com/python/cpython/blob/2.7/Include/opcode.h
        #define EXTENDED_ARG 145 // 0x91
    https://github.com/python/cpython/blob/3.0/Include/opcode.h
        #define EXTENDED_ARG 143 // 0x8f (new value!)
    https://github.com/python/cpython/blob/3.2/Include/opcode.h
        #define EXTENDED_ARG 144 // 0x90 (new value!)
    https://github.com/python/cpython/blob/3.7/Include/opcode.h
        #define EXTENDED_ARG 144 // 0x90
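Because the numeric value has moved between releases, it is safer to look it up than to hard-code it. A small sketch using the `dis` module; the value printed depends on the interpreter it runs on and may not be any of the values listed above.

```python
import dis
import sys

# Look the opcode number up by name instead of hard-coding 0x90 or 0x91.
extended_arg = dis.opmap["EXTENDED_ARG"]

if __name__ == "__main__":
    print(sys.version_info[:2], extended_arg, hex(extended_arg))
```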
  13. Eval in CPython 2.7–3.5
    https://github.com/python/cpython/blob/3.5/Python/ceval.c#L797
        PyObject *
        PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
        {
            ...
            unsigned char *next_instr;
            int opcode;        /* Current opcode */
            int oparg;         /* Current opcode argument, if any */
            ...
            for (;;) {
                /* Extract opcode and argument */
                opcode = NEXTOP();
                oparg = 0;
                if (HAS_ARG(opcode))
                    oparg = NEXTARG();
            dispatch_opcode:
                ...
                switch (opcode) {
                ...
                TARGET(EXTENDED_ARG) {
                    opcode = NEXTOP();
                    oparg = oparg<<16 | NEXTARG();
                    goto dispatch_opcode;
                }
                ...
  14. Eval in CPython 3.6
    https://github.com/python/cpython/blob/3.6/Python/ceval.c#L758
        PyObject *
        _PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
        {
            ...
            const _Py_CODEUNIT *next_instr;
            int opcode;        /* Current opcode */
            int oparg;         /* Current opcode argument, if any */
            ...
            for (;;) {
                /* Extract opcode and argument */
                NEXTOPARG();
            dispatch_opcode:
                ...
                switch (opcode) {
                ...
                TARGET(EXTENDED_ARG) {
                    int oldoparg = oparg;
                    NEXTOPARG();
                    oparg |= oldoparg << 8;
                    goto dispatch_opcode;
                }
                ...
  15. CPython 2.7–3.5
              5           0 LOAD_CONST               1 (0)
                          3 STORE_FAST               0 (a0)
            ...
            260        1530 LOAD_CONST             256 (255)
                       1533 STORE_FAST             255 (a255)
            ...
            517        3072 LOAD_CONST             513 (512)
                       3075 STORE_FAST             512 (a512)
            ...
          65539      393204 LOAD_CONST           65535 (65534)
                     393207 STORE_FAST           65534 (a65534)
          65540      393210 EXTENDED_ARG             1
                     393213 LOAD_CONST           65536 (65535)
                     393216 STORE_FAST           65535 (a65535)
    Full bytecode size = 393219
    Number of opcodes = 131073
  16. CPython 2.7–3.5
    Instructions and their bytes (EXTENDED_ARG is 0x91 in CPython 2.7):
        393204 LOAD_CONST    65535 (65534)      64 ffff
        393207 STORE_FAST    65534 (a65534)     7d feff
        393210 EXTENDED_ARG      1              91 0100
        393213 LOAD_CONST    65536 (65535)      64 0000
        393216 STORE_FAST    65535 (a65535)     7d ffff
    For the LOAD_CONST that follows EXTENDED_ARG:
        arg = (1 << 16) | 0 = 65536 | 0 = 65536
    Handler in ceval.c:
        TARGET(EXTENDED_ARG) {
            opcode = NEXTOP();
            oparg = oparg << 16 | NEXTARG();
            goto dispatch_opcode;
        }
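The reverse direction, producing these bytes from an opcode and an argument, can be sketched as follows. This is an illustration under stated assumptions, not compiler code: it uses the CPython 2.7 value 0x91 for `EXTENDED_ARG` by default, handles only argument-taking opcodes, and `encode_pre36` is a made-up name.

```python
EXTENDED_ARG_27 = 0x91  # value from the 2.7 opcode.h listed earlier in the deck


def encode_pre36(op, arg, extended_arg=EXTENDED_ARG_27):
    """Encode one argument-taking instruction in the CPython < 3.6 format."""
    if not 0 <= arg <= 0xFFFFFFFF:
        raise ValueError("argument does not fit in 32 bits")
    high, low = arg >> 16, arg & 0xFFFF
    out = bytearray()
    if high:
        # The upper 16 bits travel in a preceding EXTENDED_ARG instruction.
        out += bytes([extended_arg]) + high.to_bytes(2, "little")
    out += bytes([op]) + low.to_bytes(2, "little")
    return bytes(out)


if __name__ == "__main__":
    print(encode_pre36(0x64, 65536).hex())  # LOAD_CONST 65536
    print(encode_pre36(0x7D, 65535).hex())  # STORE_FAST 65535
```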
  17. CPython 3.6
              5           0 LOAD_CONST               1 (0)
                          2 STORE_FAST               0 (a0)
            ...
            260        1020 EXTENDED_ARG             1
                       1022 LOAD_CONST             256 (255)
                       1024 STORE_FAST             255 (a255)
            ...
            517        3074 EXTENDED_ARG             2
                       3076 LOAD_CONST             513 (512)
                       3078 EXTENDED_ARG             2
                       3080 STORE_FAST             512 (a512)
            ...
          65540      523258 EXTENDED_ARG             1
                     523260 EXTENDED_ARG           256
                     523262 LOAD_CONST           65536 (65535)
                     523264 EXTENDED_ARG           255
                     523266 STORE_FAST           65535 (a65535)
    Full bytecode size = 523268
    Number of opcodes = 261634
    Starting with line 517, the bytecode offset in CPython 3.6 is larger than in CPython < 3.6 (3074 > 3072)
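The totals and the crossover point can be reproduced arithmetically, without compiling anything. The sketch below assumes what the listings show: assignment `a<i> = <i>` compiles to `LOAD_CONST i + 1` (constant 0 is `None`) followed by `STORE_FAST i`, and the slide totals cover only these assignments, not the implicit `return None` at the end. All function names are illustrative.

```python
def ext_count_pre36(arg):
    """EXTENDED_ARG prefixes needed before 3.6 (16-bit arguments)."""
    return 1 if arg > 0xFFFF else 0


def ext_count_36(arg):
    """EXTENDED_ARG prefixes needed in 3.6 (8-bit arguments)."""
    count = 0
    while arg > 0xFF:
        arg >>= 8
        count += 1
    return count


def pair_args(i):
    """Arguments of LOAD_CONST and STORE_FAST for the assignment a<i> = <i>."""
    return (i + 1, i)


def totals(n=65536):
    """Return (opcodes_old, bytes_old, opcodes_new, bytes_new) for n assignments."""
    ops_old = ops_new = 0
    for i in range(n):
        for arg in pair_args(i):
            ops_old += 1 + ext_count_pre36(arg)
            ops_new += 1 + ext_count_36(arg)
    # Every pre-3.6 instruction here takes 3 bytes, every 3.6 instruction 2 bytes.
    return ops_old, 3 * ops_old, ops_new, 2 * ops_new


def first_larger_offset(n=65536):
    """First assignment index whose 3.6 offset exceeds the pre-3.6 offset."""
    off_old = off_new = 0
    for i in range(n):
        if off_new > off_old:
            return i, off_old, off_new
        for arg in pair_args(i):
            off_old += 3 * (1 + ext_count_pre36(arg))
            off_new += 2 * (1 + ext_count_36(arg))
    return None


if __name__ == "__main__":
    print(totals())               # (131073, 393219, 261634, 523268)
    print(first_larger_offset())  # (512, 3072, 3074)
```

Assignment index 512 is `a512`, which sits on source line 517 in the listings.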
  18. CPython 3.6
    Instructions and their bytes:
        1020 EXTENDED_ARG      1                90 01
        1022 LOAD_CONST      256 (255)          64 00
        1024 STORE_FAST      255 (a255)         7d ff
        arg = (1 << 8) | 0 = 256 | 0 = 256
        ...
        523258 EXTENDED_ARG      1              90 01
        523260 EXTENDED_ARG    256              90 00
        523262 LOAD_CONST    65536 (65535)      64 00
        523264 EXTENDED_ARG    255              90 ff
        523266 STORE_FAST    65535 (a65535)     7d ff
        arg = (1 << 8) | 0 = 256 | 0 = 256
        arg = (256 << 8) | 0 = 65536 | 0 = 65536
        arg = (255 << 8) | 255 = 65280 | 255 = 65535
    Handler in ceval.c:
        TARGET(EXTENDED_ARG) {
            int oldoparg = oparg;
            NEXTOPARG();  // get new oparg
            oparg = oparg | (oldoparg << 8);
            goto dispatch_opcode;
        }
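The accumulation performed by the handler can be mirrored in Python. This is a minimal sketch of the same arithmetic, not the interpreter loop itself; it hard-codes 0x90 for `EXTENDED_ARG` as in CPython 3.6, and `decode_wordcode_ext` is an illustrative name.

```python
EXTENDED_ARG = 0x90  # 144, the CPython 3.6 value


def decode_wordcode_ext(code):
    """Yield (opcode, full_arg), folding EXTENDED_ARG prefixes into the argument."""
    pending = 0  # argument bits collected from preceding EXTENDED_ARG units
    for offset in range(0, len(code), 2):
        opcode, oparg = code[offset], code[offset + 1]
        arg = oparg | (pending << 8)
        if opcode == EXTENDED_ARG:
            pending = arg  # keep accumulating, one byte per prefix
        else:
            yield opcode, arg
            pending = 0


if __name__ == "__main__":
    for raw in ("90 01 64 00 7d ff", "90 01 90 00 64 00 90 ff 7d ff"):
        print(list(decode_wordcode_ext(bytes.fromhex(raw))))
```

The first byte string decodes to arguments 256 and 255, the second to 65536 and 65535, matching the arithmetic on the slide.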
  19. Results
        $ time python3.6 test_pybc.py
        real 0m0.698s (10-20% faster than 2.7)
        user 0m0.640s
        sys 0m0.060s
        $ time python3.6 test_pybc36.pyc
        real 0m0.094s (15-30% slower than 2.7)
        user 0m0.082s
        sys 0m0.012s
        $ time python2.7 test_pybc.py
        real 0m0.773s
        user 0m0.672s
        sys 0m0.096s
        $ time python2.7 test_pybc27.pyc
        real 0m0.077s
        user 0m0.068s
        sys 0m0.012s
    CPython 2.7:
    - 131073 opcodes
    - bytecode 393219 bytes
    - test_pybc27.pyc size = 1562519
    CPython 3.6:
    - 261634 opcodes (130561 more)
    - bytecode 523268 bytes (130049 more)
    - test_pybc36.pyc size = 1496453
  20. Results
    test_pybc_time.py:
        def a():
            a0 = 0
            ...
            a65535 = 65535
        import timeit
        print(timeit.timeit(a, number=1000))
    Timings:
        $ python2.7 test_pybc_time.py
        1.11467504501
        $ python3.5 test_pybc_time.py
        0.8957379780185875 (20-30% faster!!!)
        $ python3.6 test_pybc_time.py
        1.3842481109895743 (20-30% slower)
    But:
        $ time python3.5 test_pybc.py
        real 0m0.955s (20-35% slower than 2.7)
        user 0m0.892s
        sys 0m0.064s
        $ time python3.5 test_pybc.pyc
        real 0m0.100s (20-35% slower than 2.7)
        user 0m0.084s
        sys 0m0.016s
  21. Bug in CPython docs >= 3.6
    https://docs.python.org/3.6/library/dis.html#opcode-EXTENDED_ARG
    EXTENDED_ARG(ext)
    Prefixes any opcode which has an argument too big to fit into the default two bytes. ext holds two additional bytes which, taken together with the subsequent opcode’s argument, comprise a four-byte argument, ext being the two most-significant bytes.
    This text still describes the pre-3.6 format: in 3.6 the default argument is one byte, and each EXTENDED_ARG contributes one more byte.
    See https://bugs.python.org/issue32625
  22. Marshal PyCodeObject in CPython 2.7
    https://github.com/python/cpython/blob/2.7/Python/marshal.c
        PyCodeObject *co = (PyCodeObject *)v;
        w_byte(TYPE_CODE, p);
        w_long(co->co_argcount, p);
        w_long(co->co_nlocals, p);
        w_long(co->co_stacksize, p);
        w_long(co->co_flags, p);
        w_object(co->co_code, p);
        w_object(co->co_consts, p);
        w_object(co->co_names, p);
        w_object(co->co_varnames, p);
        w_object(co->co_freevars, p);
        w_object(co->co_cellvars, p);
        w_object(co->co_filename, p);
        w_object(co->co_name, p);
        w_long(co->co_firstlineno, p);
        w_object(co->co_lnotab, p);
  23. Marshal PyCodeObject in CPython >= 3.6
    https://github.com/python/cpython/blob/3.6/Python/marshal.c
        PyCodeObject *co = (PyCodeObject *)v;
        W_TYPE(TYPE_CODE, p);
        w_long(co->co_argcount, p);
        w_long(co->co_kwonlyargcount, p);
        w_long(co->co_nlocals, p);
        w_long(co->co_stacksize, p);
        w_long(co->co_flags, p);
        w_object(co->co_code, p);
        w_object(co->co_consts, p);
        w_object(co->co_names, p);
        w_object(co->co_varnames, p);
        w_object(co->co_freevars, p);
        w_object(co->co_cellvars, p);
        w_object(co->co_filename, p);
        w_object(co->co_name, p);
        w_long(co->co_firstlineno, p);
        w_object(co->co_lnotab, p);
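The same serialization is reachable from Python through the `marshal` module. A small sketch, assuming CPython 3: the first byte of the dump is the type code `'c'`, possibly combined with the 0x80 reference flag (which is why the `.pyc` dump earlier in the deck shows `E3` rather than `63`), and the fields that follow are written in the order listed above for the running interpreter's version.

```python
import marshal

code = compile("def foo(v):\n    return v\n", "foo.py", "exec")

data = marshal.dumps(code)

# Strip the reference flag (0x80) to recover the type code.
type_byte = data[0] & 0x7F

# Round trip: the bytes load back into an equivalent code object.
restored = marshal.loads(data)

if __name__ == "__main__":
    print(hex(data[0]), chr(type_byte), len(data))
```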
  24. Opcode and oparg read macros
    CPython < 3.6
    https://github.com/python/cpython/blob/3.5/Python/ceval.c#L998
        #define NEXTOP() (*next_instr++)
        #define NEXTARG() (next_instr += 2, (next_instr[-1]<<8) + next_instr[-2])
    CPython >= 3.6
    https://github.com/python/cpython/blob/3.6/Include/code.h#L10
        typedef uint16_t _Py_CODEUNIT;
        /* big-endian variant; on little-endian builds the two macros are swapped */
        #define _Py_OPCODE(word) ((word) >> 8)
        #define _Py_OPARG(word) ((word) & 255)
    https://github.com/python/cpython/blob/3.6/Python/ceval.c#L905
        #define NEXTOPARG() do { \
            _Py_CODEUNIT word = *next_instr; \
            opcode = _Py_OPCODE(word); \
            oparg = _Py_OPARG(word); \
            next_instr++; \
        } while (0)
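The bit manipulation in these macros can be checked in Python. The sketch below renders `NEXTARG()` and the word-splitting macros as plain functions with illustrative names. It assumes that the little-endian build uses the mirrored definitions (`word & 255` for the opcode, `word >> 8` for the argument), which is recalled from code.h rather than taken from the slide; either way, the opcode is always the first byte in memory.

```python
def next_arg_pre36(code, pos):
    """NEXTARG() for CPython < 3.6: 16-bit little-endian argument at pos."""
    return (code[pos + 1] << 8) + code[pos], pos + 2


def split_word(two_bytes, byteorder):
    """Split one 16-bit code unit into (opcode, oparg) for the given byte order."""
    word = int.from_bytes(two_bytes, byteorder)
    if byteorder == "big":
        return word >> 8, word & 255   # the macros as shown above
    return word & 255, word >> 8       # assumed little-endian counterpart


if __name__ == "__main__":
    print(next_arg_pre36(bytes.fromhex("650500"), 1))  # LOAD_NAME 5, pre-3.6
    print(split_word(b"\x65\x05", "little"))           # LOAD_NAME 5, 3.6
    print(split_word(b"\x65\x05", "big"))
```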