Architecture of CPython - Part 1

Stéphane Wirtel

November 13, 2016

Transcript

  1. Architecture of CPython: the Bricks! Part 1, by Stéphane Wirtel

    PyCon Canada 2016, Toronto, 11/13/2016
  8. Example of bytecode: in a restaurant... I am the "virtual" machine...

        SAY "Hello"
        SET CMD "Fish and Chips"
        ORDER CMD
        PREPARE "Money"
        WAIT_MINUTES 5
        RECEIVE CMD
        EAT CMD
        PAY "Restaurant"
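The restaurant analogy maps directly onto an interpreter loop: read one instruction, dispatch on its name, update some state, move on. A minimal sketch, assuming a made-up instruction set taken from the slide (these are not CPython opcodes) and a hypothetical `run` helper that returns a log of actions instead of printing them:

```python
# Hypothetical "restaurant" instruction set from the slide; these are NOT real
# CPython opcodes, just (name, *arguments) tuples.
PROGRAM = [
    ("SAY", "Hello"),
    ("SET", "CMD", "Fish and Chips"),
    ("ORDER", "CMD"),
    ("PREPARE", "Money"),
    ("WAIT_MINUTES", 5),
    ("RECEIVE", "CMD"),
    ("EAT", "CMD"),
    ("PAY", "Restaurant"),
]


def run(program):
    """Execute the instructions one by one and return a log of the actions."""
    variables = {}  # named values, e.g. CMD -> "Fish and Chips"
    log = []
    for op, *args in program:
        if op == "SET":
            name, value = args
            variables[name] = value          # changes state, no visible action
        elif op in ("ORDER", "RECEIVE", "EAT"):
            # These instructions take a variable name and use its value.
            log.append(f"{op.lower()} {variables[args[0]]}")
        elif op in ("SAY", "PREPARE", "PAY"):
            log.append(f"{op.lower()} {args[0]}")
        elif op == "WAIT_MINUTES":
            log.append(f"wait {args[0]} minutes")
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return log


for action in run(PROGRAM):
    print(action)
```

The dispatch-on-opcode loop is the same shape as CPython's evaluator; only the instruction set and the state (a dict instead of a value stack) are simplified.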
  10. Code > Lexer > Parser > AST > Compiler: the lexer

        >>> import io
        >>> import tokenize
        >>> from pprint import pprint as pp
        >>> expr = 'x = 2 + 2'
        >>> strIO = io.StringIO(expr).readline
        >>> tokens = tokenize.generate_tokens(strIO)
        >>> pp(list(tokens))
        [TokenInfo(type=NAME, string='x'),
         TokenInfo(type=OP, string='='),
         TokenInfo(type=NUMBER, string='2'),
         TokenInfo(type=OP, string='+'),
         TokenInfo(type=NUMBER, string='2'),
         TokenInfo(type=ENDMARKER, string='')]

    The output is abbreviated: each real TokenInfo also carries start, end and line fields, and recent Python versions emit a NEWLINE token before ENDMARKER.
  11. Code > Lexer > Parser > AST > Compiler: the parser and the AST

        >>> import ast
        >>> tree = ast.parse(expr)
        >>> ast.dump(tree)
        Module(
            body=[
                Assign(
                    targets=[Name(id='x')],
                    value=BinOp(left=Num(n=2), op=Add(), right=Num(n=2))
                )
            ]
        )

    The dump is simplified (the real one also shows ctx=Store() on the Name node), and since Python 3.8 the literals are Constant nodes rather than Num.
  12. Code > Lexer > Parser > AST > Compiler: the compiler

        >>> import dis
        >>> dis.dis(expr)
          1           0 LOAD_CONST               2 (4)
                      3 STORE_NAME               0 (x)
                      6 LOAD_CONST               1 (None)
                      9 RETURN_VALUE

    The compiler has already folded 2 + 2 into the constant 4. See PEP 339, "Design of the CPython Compiler".
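The four stages above can be chained in one script. A minimal sketch using only the standard `tokenize`, `token`, `ast`, `compile` and `dis` APIs; the exact opcode names it collects vary between CPython versions, so only the overall shape of the result is stable:

```python
import ast
import dis
import io
import token
import tokenize

expr = 'x = 2 + 2'

# 1. Lexer: split the source text into tokens.
tokens = list(tokenize.generate_tokens(io.StringIO(expr).readline))
token_names = [token.tok_name[tok.type] for tok in tokens]

# 2. Parser: build the abstract syntax tree (not yet constant-folded).
tree = ast.parse(expr)
assign = tree.body[0]

# 3. Compiler: turn the AST into a code object (bytecode, constants, names).
code = compile(tree, '<demo>', 'exec')
opnames = [instr.opname for instr in dis.get_instructions(code)]

# 4. Virtual machine: execute the code object in a fresh namespace.
namespace = {}
exec(code, namespace)

print(token_names)
print(type(assign.value).__name__)
print(opnames)
print(namespace['x'])
```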
  13. Structure of a .pyc file: here is the source code

        print("Hello")

    which generates this bytecode:

          1           0 LOAD_NAME                0 (print)
                      3 LOAD_CONST               0 ('Hello')
                      6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                      9 POP_TOP
                     10 LOAD_CONST               1 (None)
                     13 RETURN_VALUE
  14. Structure of a .pyc file: and here is the dump of the binary file

        00  160d0d0a 1b911558 0f000000 e3000000  |.......X........|
        10  00000001 00000000 00020000 00400000  |.............@..|
        20  00730e00 00006500 00640000 83010001  |.s....e..d......|
        30  64010053 29025a05 48656c6c 6f4e2901  |d..S).Z.HelloN).|
        40  da057072 696e74a9 00720200 00007202  |..print..r....r.|
        50  000000fa 0764656d 6f2e7079 da083c6d  |.....demo.py..<m|
        60  6f64756c 653e0100 00007300 000000    |odule>....s....|
        6f
  15. Structure of a .pyc file: demo.pyc (Python 3.5)

        [Code]
            File Name: demo.py
            Object Name: <module>
            Arg Count: 0
            KW Only Arg Count: 0
            Locals: 0
            Stack Size: 2
            Flags: 0x00000040 (CO_NOFREE)
            [Names]
                'print'
            [Var Names]
            [Free Vars]
            [Cell Vars]
            [Constants]
                'Hello'
                None
            [Disassembly]
                0   LOAD_NAME       0: print
                3   LOAD_CONST      0: 'Hello'
                6   CALL_FUNCTION   1
                9   POP_TOP
                10  LOAD_CONST      1: None
                13  RETURN_VALUE
  16. Structure of a .pyc file (diagram): the magic number at offset 0, the timestamp at offset 4, the size of the source file at offset 8, then the marshalled code object from offset 12 onwards.
  18. Structure of a .pyc file: in the dump above, the first 12 bytes are the header

        0x160D0D0A -> magic number
        0x1B911558 -> timestamp
        0x0F000000 -> size of the source file (15 bytes)

    Everything after the header (in blue on the slide) is the marshalled code object, with its bytecode and variables.
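Because the three header fields are fixed-size little-endian values, `struct` can split them in one call. A sketch that decodes the 12 header bytes copied from the hex dump; it assumes the Python 3.3 to 3.6 layout (Python 3.7 inserted a 4-byte flags field after the magic number, PEP 552):

```python
import struct
from datetime import datetime, timezone

# First 12 bytes of demo.pyc, copied from the hex dump on the slide.
# Assumes the 3.3-3.6 header layout: magic, mtime, source size.
header = bytes.fromhex('160d0d0a' '1b911558' '0f000000')

# '<' = little-endian, '4s' = raw magic bytes, 'I' = unsigned 32-bit int.
magic, mtime, source_size = struct.unpack('<4sII', header)

# The magic number is a 2-byte little-endian version tag followed by b'\r\n'.
magic_number = int.from_bytes(magic[:2], 'little')

print(magic_number)                                    # version tag
print(datetime.fromtimestamp(mtime, tz=timezone.utc))  # source mtime, in UTC
print(source_size)                                     # size of demo.py in bytes
```

Printing the timestamp in UTC avoids the dependence on the local time zone that `time.localtime` introduces.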
  20. Structure of a .pyc file: the magic number 0x160D0D0A depends on the Python version

        # Python 2.7a0  62211 (introduce MAP_ADD and SET_ADD)
        # Python 3.0a5  3131 (lexical exception stacking, including POP_EXCEPT)
        # Python 3.1a0  3151 (optimize conditional branches: ...)
        # Python 3.2a2  3180 (add DELETE_DEREF)
        # Python 3.3a4  3230 (revert changes to implicit __class__ closure)
        # Python 3.4rc2 3310 (alter __qualname__ computation)
        # Python 3.5b2  3350 (add GET_YIELD_FROM_ITER opcode #24400)
        # Python 3.5.2  3351 (fix BUILD_MAP_UNPACK_WITH_CALL opcode #27286)
        # Python 3.6b2  3378 (add BUILD_TUPLE_UNPACK_WITH_CALL #28257)

    The first two bytes are the version number in little-endian order, followed by b'\r\n':

        >>> import binascii
        >>> version = (3350).to_bytes(2, 'little') + b'\r\n'
        >>> binascii.hexlify(version)
        b'160d0d0a'

    See Lib/importlib/_bootstrap_external.py.
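The `to_bytes` trick also works in reverse, and `importlib.util.MAGIC_NUMBER` exposes the value the running interpreter writes into its own `.pyc` files. A sketch with a lookup table built only from entries listed on the slide; the helper names `magic_to_number` and `number_to_magic` are illustrative, not part of any library:

```python
import importlib.util

# A few entries from the table on the slide (magic number -> release).
KNOWN_MAGIC = {
    62211: 'Python 2.7a0',
    3131: 'Python 3.0a5',
    3350: 'Python 3.5b2',
    3351: 'Python 3.5.2',
    3378: 'Python 3.6b2',
}


def magic_to_number(magic):
    """Hypothetical helper: decode 4 magic bytes (2-byte LE number + CRLF)."""
    if len(magic) != 4 or magic[2:] != b'\r\n':
        raise ValueError(f'not a pyc magic number: {magic!r}')
    return int.from_bytes(magic[:2], 'little')


def number_to_magic(number):
    """Hypothetical helper: the inverse, exactly as computed on the slide."""
    return number.to_bytes(2, 'little') + b'\r\n'


number = magic_to_number(bytes.fromhex('160d0d0a'))
print(number, KNOWN_MAGIC.get(number, 'unknown'))
print(magic_to_number(importlib.util.MAGIC_NUMBER))  # this interpreter's number
```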
  21. Structure of a .pyc file: the timestamp 0x1B911558 is the Unix modification time of the source file

        >>> import struct, time
        >>> b = bytearray.fromhex('1B911558')
        >>> s = struct.unpack('<L', b)
        >>> time.asctime(time.localtime(s[0]))
        'Sun Oct 30 07:20:11 2016'

    The displayed value depends on the local time zone; in UTC it is 06:20:11.
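To check the timestamp against a real file, one can compile a source file with `py_compile` and read the header back. A sketch, assuming a throwaway `demo.py` in a temporary directory; it handles both the 12-byte header of Python 3.3 to 3.6 and the 16-byte header with a flags word used since Python 3.7, and it forces timestamp-based invalidation because `py_compile` switches to hash-based pycs when `SOURCE_DATE_EPOCH` is set:

```python
import importlib.util
import os
import py_compile
import struct
import sys
import tempfile


def read_pyc_header(path):
    """Return (magic, flags, mtime, source_size) from a .pyc file header."""
    with open(path, 'rb') as f:
        data = f.read(16)
    magic = data[:4]
    if sys.version_info >= (3, 7):
        # PEP 552 layout: magic, flags, mtime, source size.
        flags, mtime, size = struct.unpack('<III', data[4:16])
    else:
        # Python 3.3-3.6 layout: magic, mtime, source size.
        flags = 0
        mtime, size = struct.unpack('<II', data[4:12])
    return magic, flags, mtime, size


with tempfile.TemporaryDirectory() as tmp:
    # Throwaway source file with the same 15 bytes as in the slides.
    source = os.path.join(tmp, 'demo.py')
    with open(source, 'wb') as f:
        f.write(b'print("Hello")\n')

    options = {}
    if sys.version_info >= (3, 7):
        options['invalidation_mode'] = py_compile.PycInvalidationMode.TIMESTAMP
    pyc = py_compile.compile(source, cfile=os.path.join(tmp, 'demo.pyc'),
                             doraise=True, **options)

    magic, flags, mtime, size = read_pyc_header(pyc)
    # The header stores the mtime truncated to an unsigned 32-bit integer.
    source_mtime = int(os.stat(source).st_mtime) & 0xFFFFFFFF

print(magic == importlib.util.MAGIC_NUMBER, flags, mtime == source_mtime, size)
```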
  23. Structure of a .pyc file: marshalling

    A serialisation format specific to Python, which contains Python objects (tuple, list, dict, set, number, string, ...). Each kind of object is introduced by a 1-byte type code (see Python/marshal.c):

        #define TYPE_NULL          '0'
        #define TYPE_NONE          'N'
        #define TYPE_FALSE         'F'
        #define TYPE_TRUE          'T'
        ...
        #define TYPE_INT           'i'
        #define TYPE_FLOAT         'f'
        #define TYPE_STRING        's'
        #define TYPE_TUPLE         '('
        #define TYPE_LIST          '['
        #define TYPE_CODE          'c'
        ...
        #define TYPE_ASCII         'a'
        #define TYPE_SHORT_ASCII   'z'
        ...
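The type bytes can be observed directly with the standard `marshal` module. A sketch that serialises a few objects with marshal format version 2; that older version is chosen deliberately because versions 3 and later may set a 0x80 reference flag on the type byte and use compact variants such as small tuples, which makes the exact bytes depend on interpreter internals:

```python
import marshal

VERSION = 2  # older marshal format: no reference flags, no "small" variants

samples = [None, False, True, 1, b'Hello', (1, 2)]
for obj in samples:
    data = marshal.dumps(obj, VERSION)
    # The first byte is the type code: 'N', 'F', 'T', 'i', 's', '(' ...
    print(repr(obj).ljust(10), chr(data[0]), data.hex())
    assert marshal.loads(data) == obj  # round-trips back to an equal object
```

For example, the tuple `(1, 2)` becomes `'('`, a 32-bit element count, then two `'i'` objects each followed by a 32-bit little-endian value.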
  24. Structure of a .pyc file: marshalling, TYPE_CODE ('c')

    Layout of a marshalled code object (Python 3.5), field by field:

        Type ('c', 1 byte)
        ArgCount, KwOnlyArgCount, NumLocals, StackSize, Flags (32-bit integers)
        Code (bytes)
        Consts (tuple)
        Names (tuple)
        VarNames (tuple)
        FreeVars (tuple)
        CellVars (tuple)
        FileName, Name (strings)
        FirstLineNo (32-bit integer)
        LnoTab (line-number table, bytes)
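The same fields are exposed as `co_*` attributes of any live code object, so the layout can be cross-checked without reading a `.pyc` at all. A sketch using those documented attributes; values such as `co_stacksize` and `co_flags` differ between CPython versions, so only the stable ones are checked, and the line-number table (whose attribute changed over time) is left out:

```python
# Compile the same one-line module as demo.py and inspect its code object.
code = compile('print("Hello")\n', 'demo.py', 'exec')

fields = {
    'ArgCount': code.co_argcount,
    'KwOnlyArgCount': code.co_kwonlyargcount,
    'NumLocals': code.co_nlocals,
    'StackSize': code.co_stacksize,      # version dependent
    'Flags': hex(code.co_flags),         # version dependent
    'Consts': code.co_consts,
    'Names': code.co_names,
    'VarNames': code.co_varnames,
    'FreeVars': code.co_freevars,
    'CellVars': code.co_cellvars,
    'FileName': code.co_filename,
    'Name': code.co_name,
    'FirstLineNo': code.co_firstlineno,
}
for key, value in fields.items():
    print(f'{key:15} {value!r}')
```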
  25. Structure of a .pyc file: marshalling, TYPE_CODE ('c') in the dump

    In the dump above, the code object starts at offset 0x0C: the byte 0xe3 is TYPE_CODE ('c', 0x63) with the FLAG_REF bit (0x80) set. It is followed by the five 32-bit counters and flags, then by the bytecode, stored as a TYPE_STRING ('s', 0x73) of 0x0e = 14 bytes at offset 0x21.
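  26. Structure of a .pyc file: marshalling, TYPE_TUPLE ('(')

        Type ('(', 1 byte) | Size | Objects (tuple, string, ...)

    For TYPE_TUPLE the size is a 32-bit integer; the compact TYPE_SMALL_TUPLE (')'), which demo.pyc actually uses, stores it in a single byte.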
  27. Structure of a .pyc file: marshalling, TYPE_STRING ('s')

        Type ('s', 1 byte) | Size (int32) | Content (bytes) ...
  28. Structure of a .pyc file: header

        <header>
          <magic_number offset="0" size="4" bytes="b'\x16\r\r\n'">3.5.0</magic_number>
          <time_stamp offset="4" size="4" bytes="b'\xec\xfa&quot;X'">2016-11-09T11:31:08</time_stamp>
          <size offset="8" size="4" bytes="b'\x0f\x00\x00\x00'">15</size>
        </header>
  29. Structure of a .pyc file: body

        <code>
          <arg_count offset="13" size="4" bytes="b'\x00\x00\x00\x00'">0</arg_count>
          <kw_only_arg_count offset="17" size="4" bytes="b'\x00\x00\x00\x00'">0</kw_only_arg_count>
          <num_locals offset="21" size="4" bytes="b'\x00\x00\x00\x00'">0</num_locals>
          <stack_size offset="25" size="4" bytes="b'\x02\x00\x00\x00'">2</stack_size>
          <flags offset="29" size="4" bytes="b'@\x00\x00\x00'">64</flags>
          <code>
            <original>
              <string size="14">b'e\x00\x00d\x00\x00\x83\x01\x00\x01d\x01\x00S'</string>
            </original>
            <array>[101, 0, 0, 100, 0, 0, 131, 1, 0, 1, 100, 1, 0, 83]</array>
            <bytecodes>
              <bytecode byte="101" code="LOAD_NAME" operand="0" arguments="0"/>
              <bytecode byte="100" code="LOAD_CONST" operand="0" arguments="0"/>
              <bytecode byte="131" code="CALL_FUNCTION" operand="1" arguments="0"/>
              <bytecode byte="001" code="POP_TOP"/>
              <bytecode byte="100" code="LOAD_CONST" operand="1" arguments="0"/>
              <bytecode byte="083" code="RETURN_VALUE"/>
            </bytecodes>
          </code>
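The offsets in this dump can be checked by hand with `struct`. A sketch that reads a few fields of the Python 3.5 code object straight out of the slide's hex dump, at the offsets listed above (arg_count at 13, stack_size at 25, flags at 29, then the bytecode string); later CPython versions changed the code object layout, so this only applies to that 3.5 file:

```python
import struct

# Bytes 0x00-0x33 of demo.pyc (Python 3.5), copied from the slide's hex dump.
PYC = bytes.fromhex(
    '160d0d0a 1b911558 0f000000 e3000000 '
    '00000001 00000000 00020000 00400000 '
    '00730e00 00006500 00640000 83010001 '
    '64010053'
)

TYPE_CODE, TYPE_STRING, FLAG_REF = ord('c'), ord('s'), 0x80

# Offset 12: type byte of the marshalled code object ('c', possibly | FLAG_REF).
assert (PYC[12] & ~FLAG_REF) == TYPE_CODE

# Little-endian 32-bit fields at the offsets given in the XML dump.
arg_count, = struct.unpack_from('<I', PYC, 13)
stack_size, = struct.unpack_from('<I', PYC, 25)
flags, = struct.unpack_from('<I', PYC, 29)

# Offset 33: a TYPE_STRING ('s') object holding the raw bytecode.
assert PYC[33] == TYPE_STRING
code_size, = struct.unpack_from('<I', PYC, 34)
bytecode = PYC[38:38 + code_size]

print(arg_count, stack_size, hex(flags))
print(list(bytecode))
```

The decoded bytecode is the same list of 14 integers shown in the `<array>` element.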
  30. Dumpyc

    I started a new project yesterday: https://github.com/matrixise/dumpyc. It is only compatible with the bytecode of print('Hello') in Python 3.5, so contributions are needed if you want to help! TODO list: compatibility with Python 2.7, 3.3, 3.4, 3.6 and future versions; XML, JSON and dict output; a library with a clean API; a GUI (Tk?).
  31. Virtual Machine: the frame

    A frame is the collection of information and context for a chunk of code. Frames are created and destroyed on the fly, one for each function call, and each frame has a code object. See Include/frameobject.h.
  32. Virtual Machine: the call stack

        >>> def bar(y):
        ...     z = y + 3     # <--- (3) ... and the interpreter is here.
        ...     return z
        ...
        >>> def foo():
        ...     a = 1
        ...     b = 2
        ...     return a + bar(b)  # <--- (2) ... which is returning a call to bar ...
        ...
        >>> foo()             # <--- (1) We're in the middle of a call to foo ...
        6
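Each frame keeps a reference to its caller's frame (`f_back`) and to its code object (`f_code`), so the call stack from the slide can be walked from Python. A sketch using `inspect.currentframe()`, which the documentation describes as relying on CPython's stack-frame support (it may return `None` on other implementations); the helper `stack_names` is illustrative:

```python
import inspect


def stack_names():
    """Hypothetical helper: code-object names of the calling frames, innermost first."""
    frame = inspect.currentframe().f_back   # skip stack_names' own frame
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)  # each frame has a code object
        frame = frame.f_back                # follow the link to the caller
    return names


snapshot = []


def bar(y):
    z = y + 3
    snapshot.extend(stack_names())          # (3) the interpreter is here
    return z


def foo():
    a = 1
    b = 2
    return a + bar(b)                       # (2) foo is calling bar


result = foo()                              # (1) we are calling foo
print(result, snapshot[:2])
```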
  35. Bytecode

    Instructions to follow: an intermediate representation of your code, used by the "virtual" machine. It is compact: each opcode is coded on one byte, followed by a 2-byte argument when it takes one; since Python 3.6, every instruction takes exactly 2 bytes.

        #define POP_TOP          1
        #define BINARY_ADD      23
        #define RETURN_VALUE    83
        #define HAVE_ARGUMENT   90
        #define STORE_NAME      90
        #define LOAD_CONST     100
        #define LOAD_NAME      101
        #define CALL_FUNCTION  131

        #define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)

    See Include/opcode.h.
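With `HAVE_ARGUMENT` and the opcode numbers above, raw bytecode can be decoded by hand. A sketch, assuming the pre-3.6 encoding (1-byte opcode, followed by a 2-byte little-endian argument when `opcode >= HAVE_ARGUMENT`) and the 3.6 "wordcode" encoding (always 2 bytes); the opcode table is limited to the values on the slide, and `EXTENDED_ARG` is ignored:

```python
# Opcode values copied from the slide (Include/opcode.h, CPython 3.5/3.6).
OPNAMES = {1: 'POP_TOP', 23: 'BINARY_ADD', 83: 'RETURN_VALUE',
           90: 'STORE_NAME', 100: 'LOAD_CONST', 101: 'LOAD_NAME',
           131: 'CALL_FUNCTION'}
HAVE_ARGUMENT = 90


def decode_legacy(code):
    """Decode pre-3.6 bytecode into (offset, opname, arg) tuples."""
    result, i = [], 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:                     # HAS_ARG(op)
            arg = code[i + 1] | (code[i + 2] << 8)  # 2-byte little-endian
            result.append((i, OPNAMES[op], arg))
            i += 3
        else:
            result.append((i, OPNAMES[op], None))
            i += 1
    return result


def decode_wordcode(code):
    """Decode 3.6-style wordcode: every instruction is (opcode, arg byte)."""
    return [(i, OPNAMES[code[i]],
             code[i + 1] if code[i] >= HAVE_ARGUMENT else None)
            for i in range(0, len(code), 2)]


# print("Hello") as stored in the Python 3.5 demo.pyc.
legacy = bytes([101, 0, 0, 100, 0, 0, 131, 1, 0, 1, 100, 1, 0, 83])
for instr in decode_legacy(legacy):
    print(instr)

# The same instructions, hand-encoded in the 2-byte wordcode format.
wordcode = bytes([101, 0, 100, 0, 131, 1, 1, 0, 100, 1, 83, 0])
for instr in decode_wordcode(wordcode):
    print(instr)
```

The legacy offsets (0, 3, 6, 9, 10, 13) match the disassembly of `print("Hello")` shown earlier; with wordcode they become 0, 2, 4, ... 10.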
  36. Bytecode

        >>> x = 'Hello'
        >>> y = 'World'
        >>> print(x + ' ' + y)

    compiles to:

          1           0 LOAD_CONST               0 ('Hello')
                      3 STORE_NAME               0 (x)
          2           6 LOAD_CONST               1 ('World')
                      9 STORE_NAME               1 (y)
          3          12 LOAD_NAME                2 (print)
                     15 LOAD_NAME                0 (x)
                     18 LOAD_CONST               2 (' ')
                     21 BINARY_ADD
                     22 LOAD_NAME                1 (y)
                     25 BINARY_ADD
                     26 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                     29 POP_TOP
                     30 LOAD_CONST               3 (None)
                     33 RETURN_VALUE
  37. Inside the evaluator

    We have the following instructions: LOAD_CONST, STORE_NAME, LOAD_NAME, BINARY_ADD, CALL_FUNCTION, POP_TOP and RETURN_VALUE. Reference: "Python Bytecode Instructions" in the dis module documentation. TOS = top of stack.
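Before looking at the C implementations one by one, here is the whole program from slide 36 run on a toy stack machine. It is a sketch in Python, not CPython's evaluator: instructions are hypothetical `(name, argument)` tuples, `consts` and `names` stand in for `co_consts` and `co_names`, and the `builtins_map` parameter is an assumed stand-in for the builtins lookup, used here to capture what `print` receives:

```python
def run(instructions, consts, names, builtins_map):
    """Toy evaluator: run (opname, arg) tuples on a value stack; return (retval, namespace)."""
    stack = []        # the value stack; stack[-1] is TOS
    namespace = {}    # plays the role of f->f_locals
    for opname, arg in instructions:
        if opname == 'LOAD_CONST':
            stack.append(consts[arg])                  # push co_consts[consti]
        elif opname == 'STORE_NAME':
            namespace[names[arg]] = stack.pop()        # name = TOS
        elif opname == 'LOAD_NAME':
            name = names[arg]                          # locals first, then builtins
            stack.append(namespace[name] if name in namespace
                         else builtins_map[name])
        elif opname == 'BINARY_ADD':
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)                 # TOS = TOS1 + TOS
        elif opname == 'CALL_FUNCTION':
            args = stack[len(stack) - arg:]            # the argc positional args
            del stack[len(stack) - arg:]
            func = stack.pop()                         # the callable below them
            stack.append(func(*args))
        elif opname == 'POP_TOP':
            stack.pop()                                # discard TOS
        elif opname == 'RETURN_VALUE':
            return stack.pop(), namespace              # hand TOS to the caller
        else:
            raise NotImplementedError(opname)
    raise RuntimeError('reached the end without RETURN_VALUE')


# x = 'Hello'; y = 'World'; print(x + ' ' + y), as in the listing on slide 36.
CONSTS = ('Hello', 'World', ' ', None)
NAMES = ('x', 'y', 'print')
PROGRAM = [
    ('LOAD_CONST', 0), ('STORE_NAME', 0),
    ('LOAD_CONST', 1), ('STORE_NAME', 1),
    ('LOAD_NAME', 2), ('LOAD_NAME', 0), ('LOAD_CONST', 2), ('BINARY_ADD', None),
    ('LOAD_NAME', 1), ('BINARY_ADD', None),
    ('CALL_FUNCTION', 1), ('POP_TOP', None),
    ('LOAD_CONST', 3), ('RETURN_VALUE', None),
]

printed = []
retval, namespace = run(PROGRAM, CONSTS, NAMES, {'print': printed.append})
print(retval, namespace, printed)
```

Each branch mirrors one `TARGET(...)` block from the following slides, without reference counting or error handling.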
  39. LOAD_CONST

          1           0 LOAD_CONST               0 ('Hello')
                      3 STORE_NAME               0 (x)

    LOAD_CONST(consti): pushes co_consts[consti] onto the stack.

        TARGET(LOAD_CONST) {
            PyObject *value = GETITEM(consts, oparg);
            Py_INCREF(value);
            PUSH(value);
            FAST_DISPATCH();
        }
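The argument of `LOAD_CONST` is just an index into the code object's `co_consts` tuple, which `dis` resolves for display. A quick check with the standard `dis.get_instructions` API; the surrounding opcodes differ between versions, but this indexing holds:

```python
import dis

code = compile("x = 'Hello'\ny = 'World'", '<demo>', 'exec')

for instr in dis.get_instructions(code):
    if instr.opname == 'LOAD_CONST':
        # instr.arg is consti; instr.argval is the value dis shows in parentheses
        print(instr.offset, instr.arg, repr(instr.argval),
              code.co_consts[instr.arg])
```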
  41. STORE_NAME

          2           6 LOAD_CONST               1 ('World')
                      9 STORE_NAME               1 (y)

    STORE_NAME(namei): implements name = TOS. namei is the index of name in the co_names attribute of the code object.

        TARGET(STORE_NAME) {
            PyObject *name = GETITEM(names, oparg);
            PyObject *v = POP();
            PyObject *ns = f->f_locals;
            int err;
            if (ns == NULL) {
                PyErr_Format(PyExc_SystemError,
                             "no locals found when storing %R", name);
                Py_DECREF(v);
                goto error;
            }
            if (PyDict_CheckExact(ns))
                err = PyDict_SetItem(ns, name, v);
            else
                err = PyObject_SetItem(ns, name, v);
            Py_DECREF(v);
            if (err != 0)
                goto error;
            DISPATCH();
        }
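The `PyDict_CheckExact(ns)` test has a visible consequence: when code runs with a locals mapping that is not exactly a `dict`, `STORE_NAME` goes through `PyObject_SetItem`, so a subclass's `__setitem__` is called. A sketch with a hypothetical `RecordingNamespace` class passed as the locals argument of `exec`:

```python
class RecordingNamespace(dict):
    """Hypothetical dict subclass that records every name stored into it."""

    def __init__(self):
        super().__init__()
        self.stored = []

    def __setitem__(self, key, value):
        self.stored.append(key)          # reached via PyObject_SetItem
        super().__setitem__(key, value)


code = compile("x = 'Hello'\ny = 'World'", '<demo>', 'exec')
ns = RecordingNamespace()
exec(code, {}, ns)                       # module-level code uses STORE_NAME
print(ns.stored, dict(ns))
```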
  43. LOAD_NAME

          2           6 LOAD_CONST               1 ('World')
                      9 STORE_NAME               1 (y)
          3          12 LOAD_NAME                2 (print)
                     15 LOAD_NAME                0 (x)
                     18 LOAD_CONST               2 (' ')
                     21 BINARY_ADD

    LOAD_NAME(namei): pushes the value associated with co_names[namei] onto the stack.

        TARGET(LOAD_NAME) {
            PyObject *name = GETITEM(names, oparg);
            PyObject *locals = f->f_locals;
            PyObject *v;
            if (locals == NULL) {
                PyErr_Format(PyExc_SystemError,
                             "no locals when loading %R", name);
                goto error;
            }
            if (PyDict_CheckExact(locals)) {
                v = PyDict_GetItem(locals, name);
                Py_XINCREF(v);
            }
            else {
                v = PyObject_GetItem(locals, name);
                ... // if error -> goto error label
            }
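The part of `LOAD_NAME` elided on the slide falls back to the globals and then to the builtins when a name is missing from the locals. A sketch that makes this lookup order visible with a hypothetical `LoggingLocals` dict subclass, which, like the `PyObject_GetItem` branch above, is consulted through `__getitem__`:

```python
class LoggingLocals(dict):
    """Hypothetical dict subclass that records every name looked up in it."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = []

    def __getitem__(self, key):
        self.lookups.append(key)          # reached via PyObject_GetItem
        return super().__getitem__(key)   # raises KeyError if missing


globals_ns = {'y': 'from globals'}
locals_ns = LoggingLocals(x='from locals')
code = compile("a = x\nb = y\nc = len", '<demo>', 'exec')
exec(code, globals_ns, locals_ns)

lookups = list(locals_ns.lookups)         # snapshot before any further access
print(lookups)                            # every name was tried in locals first
print(dict(locals_ns))
```

`x` is found in the locals, `y` only in the globals and `len` only in the builtins, yet all three were looked up in the locals first.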
  45. BINARY_ADD

          3          12 LOAD_NAME                2 (print)
                     15 LOAD_NAME                0 (x)
                     18 LOAD_CONST               2 (' ')
                     21 BINARY_ADD
                     22 LOAD_NAME                1 (y)
                     25 BINARY_ADD

    BINARY_ADD: implements TOS = TOS1 + TOS.

        TARGET(BINARY_ADD) {
            PyObject *right = POP();
            PyObject *left = TOP();
            PyObject *sum;
            if (PyUnicode_CheckExact(left) &&
                     PyUnicode_CheckExact(right)) {
                sum = unicode_concatenate(left, right, f, next_instr);
                /* unicode_concatenate consumed the ref to left */
            }
            else {
                sum = PyNumber_Add(left, right);
                Py_DECREF(left);
            }
            Py_DECREF(right);
            SET_TOP(sum);
            ...
        }
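The two additions of the listing can be found with `dis`. Note that CPython 3.11 merged the binary-operator opcodes into a single `BINARY_OP` whose argument selects the operator, so a version-tolerant check has to accept both spellings:

```python
import dis

code = compile("print(x + ' ' + y)", '<demo>', 'exec')

# Up to CPython 3.10 the opcode is BINARY_ADD; from 3.11 it is BINARY_OP,
# with the operator ('+') shown in argrepr.
adds = [instr for instr in dis.get_instructions(code)
        if instr.opname == 'BINARY_ADD'
        or (instr.opname == 'BINARY_OP' and instr.argrepr == '+')]

for instr in adds:
    print(instr.offset, instr.opname, instr.argrepr)
```

Whatever the version, there are two additions, evaluated left to right: first `x + ' '`, then the result plus `y`.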
  47. CALL_FUNCTION

                     22 LOAD_NAME                1 (y)
                     25 BINARY_ADD
                     26 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                     29 POP_TOP

    CALL_FUNCTION(argc): calls the callable that sits on the stack below its argc arguments, and pushes the result.

        TARGET(CALL_FUNCTION) {
            PyObject **sp, *res;
            PCALL(PCALL_ALL);
            sp = stack_pointer;
            res = call_function(&sp, oparg, NULL);
            stack_pointer = sp;
            PUSH(res);
            if (res == NULL) {
                goto error;
            }
            DISPATCH();
        }
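For positional calls, `oparg` is the number of arguments sitting on the stack above the callable (in 3.5 the high byte also counted keyword pairs). The opcode was renamed `CALL` in CPython 3.11, so this sketch accepts both names while checking the argument count with `dis`:

```python
import dis

code = compile("print(x + ' ' + y)\nmax(3, 9, 4)", '<demo>', 'exec')

# CALL_FUNCTION up to 3.10, CALL from 3.11 (3.11 also emits a PRECALL first).
calls = [instr for instr in dis.get_instructions(code)
         if instr.opname in ('CALL_FUNCTION', 'CALL')]

for instr in calls:
    print(instr.offset, instr.opname, instr.arg)   # arg = positional count
```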
  49. POP_TOP

                     26 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                     29 POP_TOP

    POP_TOP: removes the top-of-stack item.

        TARGET(POP_TOP) {
            PyObject *value = POP();
            Py_DECREF(value);
            FAST_DISPATCH();
        }
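`POP_TOP` is what discards the result of an expression statement such as `print(...)`. Compiling the same call as a statement (`'exec'` mode) and as an expression (`'eval'` mode) shows the difference:

```python
import dis

as_statement = compile("len('Hello')", '<demo>', 'exec')   # value is thrown away
as_expression = compile("len('Hello')", '<demo>', 'eval')  # value is returned

statement_ops = [instr.opname for instr in dis.get_instructions(as_statement)]
expression_ops = [instr.opname for instr in dis.get_instructions(as_expression)]

print(statement_ops)    # contains a POP_TOP after the call
print(expression_ops)   # no POP_TOP: the call's result is returned instead
```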
  52. RETURN_VALUE

                     29 POP_TOP
                     30 LOAD_CONST               3 (None)
                     33 RETURN_VALUE

    RETURN_VALUE: returns TOS to the caller of the function.

        TARGET(RETURN_VALUE) {
            retval = POP();
            why = WHY_RETURN;
            goto fast_block_end;
        }

    why can be:

        enum why_code {
            WHY_NOT =       0x0001, /* No error */
            WHY_EXCEPTION = 0x0002, /* Exception occurred */
            WHY_RETURN =    0x0008, /* 'return' statement */
            WHY_BREAK =     0x0010, /* 'break' statement */
            WHY_CONTINUE =  0x0020, /* 'continue' statement */
            WHY_YIELD =     0x0040, /* 'yield' operator */
            WHY_SILENCED =  0x0080  /* Exception silenced by 'with' */
        };
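Every code object ends by returning something: module-level code returns `None` (the `LOAD_CONST None` in the listings), while code compiled in `'eval'` mode returns the value of its expression. A sketch checking the last instruction with `dis`; some recent versions (3.12, for example) fuse `LOAD_CONST` + `RETURN_VALUE` into `RETURN_CONST`, and the `why_code` mechanism shown above was later removed from `ceval.c` (in the Python 3.8 rework of block unwinding):

```python
import dis

module_code = compile("x = 2 + 2", '<demo>', 'exec')
expr_code = compile("2 + 2", '<demo>', 'eval')

module_last = list(dis.get_instructions(module_code))[-1].opname
expr_last = list(dis.get_instructions(expr_code))[-1].opname

namespace = {}
print(module_last, exec(module_code, namespace))   # module code returns None
print(expr_last, eval(expr_code))                  # eval code returns TOS
```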
  54. Python/ceval.c

         792  PyObject *
         793  _PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
         794  {
              ...
        1824  #ifdef CASE_TOO_BIG
        1825          default: switch (opcode) {
        1826  #endif
              ...
        3693  }

    The evaluation loop spans about 2,900 lines of code.
  55. The __ltrace__ feature

        Python 3.7.0a0 (default:897fe8fa14b5+, Oct 15 2016, 09:56:54)
        >>> __ltrace__ = 'an object'
        >>> def foo(bar): return bar + 1
        ...
        0: LOAD_CONST, 0
        2: LOAD_CONST, 1
        4: MAKE_FUNCTION, 0
        6: STORE_NAME, 0
        8: LOAD_CONST, 2
        10: RETURN_VALUE
        >>> print(foo(2))
        0: LOAD_NAME, 0
        2: LOAD_NAME, 1
        4: LOAD_CONST, 0
        6: CALL_FUNCTION, 1
        0: LOAD_FAST, 0
        2: LOAD_CONST, 1
        4: BINARY_ADD
        6: RETURN_VALUE
        8: CALL_FUNCTION, 1
        3
        10: PRINT_EXPR
        12: LOAD_CONST, 1
        14: RETURN_VALUE
        >>>

    Defining __ltrace__ in the globals makes the evaluation loop print every instruction it executes. It requires an interpreter built with LLTRACE, which debug builds enable.
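  56. frenchify Python

        depuis os importe path
        depuis pprint importe pprint comme pp

        pour i dans range(5):
            print(i)
            si i < 3:
                print("i < 3")
            ssi i < 7:
                print("i < 7")

        classe Utilisateur:
            déf __init__(moimeme, nom, courriel):
                moimeme.nom = nom
                moimeme.courriel = courriel

        utilisateur = Utilisateur('stephane', '[email protected]')
        print("nom : ", utilisateur.nom)
        print("courriel : ", utilisateur.courriel)

    Python's keywords are replaced by French words: depuis = from, importe = import, comme = as, pour = for, dans = in, si = if, ssi = elif, classe = class, déf = def. The identifiers are French too: moimeme = self, nom = name, courriel = email, Utilisateur = User.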