From Source to Code: How CPython's Compiler Works

How the heck does CPython take a blob of bytes you call source code and create another blob of bytes called bytecode which it is able to execute to make the magic of Python programs work? This talk's aim is to provide a conceptual answer to that question. The overall process of tokenizing, parsing, creating an AST, and then finally emitting bytecode will be covered.

If you have no clue what any of those previous words meant, don't worry! This talk will be accessible to people who are not compiler experts. We'll also cover how various parts of the compiler are exposed through Python's standard library so you can play with what you learn afterwards.

This was presented at PyCon CA 2013: http://www.youtube.com/watch?v=R31NRWgoIWM

Brett Cannon

August 10, 2013

Transcript

  1. From Python to Code: How CPython's compiler works
     Dr. Brett Cannon, PyCon CA 2013
     http://bit.ly/BrettCannon-PyConCA2013
     http://bit.ly/brettcannon
  2. echo "x=3+2" | ./python -m tokenize -e

         1,0-1,1:    NAME        'x'
         1,1-1,2:    EQUAL       '='
         1,2-1,3:    NUMBER      '3'
         1,3-1,4:    PLUS        '+'
         1,4-1,5:    NUMBER      '2'
         1,5-1,6:    NEWLINE     '\n'
         2,0-2,0:    ENDMARKER   ''
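The same token stream can be produced from inside Python with the standard library's `tokenize` module. This is a minimal sketch, assuming a reasonably recent Python 3; the variable names are illustrative, and `exact_type` is what yields specific names such as `EQUAL` and `PLUS` instead of the generic `OP`.

```python
import io
import token
import tokenize

source = "x=3+2\n"

# generate_tokens() wants a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # exact_type distinguishes operators (EQUAL, PLUS) from the generic OP type.
    name = token.tok_name[tok.exact_type]
    print(tok.start, tok.end, name, repr(tok.string))
```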
  3. Use a grammar to define structure: pass

         stmt: simple_stmt | compound_stmt
         simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
         small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
                      import_stmt | global_stmt | nonlocal_stmt | assert_stmt)
         del_stmt: 'del' exprlist
         pass_stmt: 'pass'
         flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
         break_stmt: 'break'
         ...
  4. Concrete syntax tree for pass

         (268,                  stmt
           (269,                simple_stmt
             (270,              small_stmt
               (275,            pass_stmt
                 (1, 'pass')    NAME
               )
             ),
             (4, '')            NEWLINE
           )
         )
  5. 3+2

         (258, (327, (302, (306, (307, (308, (309, (312, (313, (314, (315,
           (arith_expr,
             (term, (factor, (power, (atom, (NUMBER, '3'))))),
             (PLUS, '+'),
             (term, (factor, (power, (atom, (NUMBER, '2')))))
           )
         )))))))))), (4, ''), (0, ''))
  6. Zephyr ASDL

         expr = BoolOp(boolop op, expr* values)
              | BinOp(expr left, operator op, expr right)
              | UnaryOp(unaryop op, expr operand)
              | Lambda(arguments args, expr body)
              | IfExp(expr test, expr body, expr orelse)
              | Dict(expr* keys, expr* values)
              | Set(expr* elts)
              | ListComp(expr elt, comprehension* generators)
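The field names in the ASDL definitions show up on the node classes in the standard library's `ast` module, which makes the mapping easy to check. A small sketch that inspects the `_fields` attribute of three of the node types listed above:

```python
import ast

# Each ASDL constructor becomes a class; its arguments become _fields.
for node_class in (ast.BoolOp, ast.BinOp, ast.IfExp):
    print(node_class.__name__, node_class._fields)
```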
  7. x = 3 + 2, visually
     [Tree diagram: an Assign node (=) whose targets is x and whose value is a BinOp node (+) with left 3 and right 2]
  8. x = 3 + 2, actually

         Module(body=[
             Assign(targets=[Name(id='x', ctx=Store())],
                    value=BinOp(left=Num(n=3), op=Add(), right=Num(n=2)))
         ])
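A dump like this can be reproduced with `ast.parse` and `ast.dump`. The sketch below assumes nothing beyond the standard library; the exact text printed depends on the Python version (for example, Python 3.8 and later report numeric literals as `Constant` rather than `Num`), so it walks the tree by attribute instead of relying on the printed form.

```python
import ast

tree = ast.parse("x = 3 + 2")

# Textual form of the whole tree; details vary between Python versions.
print(ast.dump(tree))

# The same structure, reached through node attributes.
assign = tree.body[0]
print(type(assign).__name__)            # the statement node
print(assign.targets[0].id)             # name being assigned to
print(type(assign.value).__name__)      # the expression on the right
print(type(assign.value.op).__name__)   # the operator node
```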
  9. Working with the AST

         import ast

         class PEP8(ast.NodeVisitor):
             def visit_Assign(self, node):
                 for target in node.targets:
                     self.visit(target)
                     print(' = ', end='')
                 self.visit(node.value)
                 print()

             def visit_BinOp(self, node):
                 self.visit(node.left)
                 print(' ', end='')
                 self.visit(node.op)
                 print(' ', end='')
                 self.visit(node.right)

             def visit_Name(self, node):
                 print(node.id, end='')

             def visit_Num(self, node):
                 print(node.n, end='')

             def visit_Add(self, node):
                 print('+', end='')

         >>> nodes = ast.parse('x=3+2;y=4')
         >>> PEP8().visit(nodes)
         x = 3 + 2
         y = 4
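On Python 3.8 and later, numeric literals parse to `ast.Constant` instead of `ast.Num`, so a visitor written today would normally define `visit_Constant`. The following is a sketch of the same idea under that assumption; it collects text into a list instead of printing so the result can be returned, and the `reformat` helper is a hypothetical name added for illustration.

```python
import ast


class PEP8(ast.NodeVisitor):
    """Variant of the slide's visitor that targets ast.Constant (Python 3.8+)."""

    def __init__(self):
        self.parts = []

    def visit_Assign(self, node):
        for target in node.targets:
            self.visit(target)
            self.parts.append(" = ")
        self.visit(node.value)
        self.parts.append("\n")

    def visit_BinOp(self, node):
        self.visit(node.left)
        self.parts.append(" ")
        self.visit(node.op)
        self.parts.append(" ")
        self.visit(node.right)

    def visit_Name(self, node):
        self.parts.append(node.id)

    def visit_Constant(self, node):
        self.parts.append(repr(node.value))

    def visit_Add(self, node):
        self.parts.append("+")


def reformat(source):
    """Hypothetical helper: parse source and return the re-spaced text."""
    visitor = PEP8()
    visitor.visit(ast.parse(source))
    return "".join(visitor.parts)


print(reformat("x=3+2;y=4"), end="")
```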
  10. def func(): x = y + 2

          1   0 LOAD_GLOBAL    0 (y)
              3 LOAD_CONST     1 (2)
              6 BINARY_ADD
              7 STORE_FAST     0 (x)
             10 LOAD_CONST     0 (None)
             13 RETURN_VALUE
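A listing like this comes from the standard library's `dis` module. The sketch below disassembles the same function; opcode names and offsets differ between CPython versions (newer releases use `BINARY_OP` where the slide shows `BINARY_ADD`, for instance), so the output will not match the slide exactly.

```python
import dis


def func():
    x = y + 2  # y is never defined; the function is only disassembled, not called


# Human-readable listing, one instruction per line.
dis.dis(func)

# The same instructions as objects, convenient for inspection.
opnames = [ins.opname for ins in dis.get_instructions(func)]
print(opnames)
```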
  11. Control Flow Graph
      • Essentially a graph of bytecode blocks where edges are jumps to other blocks
      • Three step process
        a. Figure out scoping of each variable
        b. Compile blocks to bytecode without jumps calculated
        c. Flatten CFG into a list and calculate jumps
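The result of step (a), working out the scope of each variable, can be inspected with the standard library's `symtable` module. This is a small sketch using the function from the bytecode slide: `x` is assigned inside the function and so is local, while `y` is only read and so is treated as a global, which is consistent with the `STORE_FAST` and `LOAD_GLOBAL` instructions in that listing.

```python
import symtable

source = "def func():\n    x = y + 2\n"

# Build the symbol tables for the module, then take the table for func.
module_table = symtable.symtable(source, "<example>", "exec")
func_table = module_table.get_children()[0]

for name in ("x", "y"):
    symbol = func_table.lookup(name)
    print(name, "local:", symbol.is_local(), "global:", symbol.is_global())
```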
  12. Peepholer
      • Operates at the bytecode level
      • Simple transformations within a single block only
      • Bytecode is required to either stay the same length or shrink
      • Always used
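One simple transformation that is easy to observe from Python is constant folding: compiling `x = 3 + 2` should typically yield bytecode that loads `5` directly, with no addition left to run. Which stage of the compiler performs the folding has varied between CPython releases, so this sketch only shows the observable result and makes no claim about where it happens.

```python
import dis

code = compile("x = 3 + 2", "<example>", "exec")
instructions = list(dis.get_instructions(code))

# Full listing for reading.
dis.dis(code)

# Does any instruction carry the folded value, and is any binary operation left?
loads_five = any(ins.argval == 5 for ins in instructions)
has_binary_op = any(ins.opname.startswith("BINARY_") for ins in instructions)
print("loads 5:", loads_five, "binary op remaining:", has_binary_op)
```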