Slide 1

Slide 1 text

From Python to Code
How CPython's compiler works
Dr. Brett Cannon
PyCon CA 2013
http://bit.ly/BrettCannon-PyConCA2013
http://bit.ly/brettcannon

Slide 2

Slide 2 text

Based on "Design of CPython’s Compiler" http://docs.python.org/devguide/compiler.html

Slide 3

Slide 3 text

The five basic steps of compilation
● Decoding
● Tokenizing
● Parsing
● AST
● Compiling
(The diagram groups these steps into a front-end and a back-end.)

Slide 4

Slide 4 text

Decoding
Bytes to text

Slide 5

Slide 5 text

PEP 263
re.compile("coding[:=]\s*([-\w.]+)")
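As a rough illustration of how the pattern on this slide can be applied, the sketch below searches the first two lines of a source file for an encoding declaration and falls back to UTF-8 (the PEP 3120 default). The helper name `declared_encoding` is hypothetical, and CPython's real detection handles more cases (such as a byte order mark) than this does.

```python
import re

# Pattern quoted on the slide (from PEP 263).
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source_bytes, default="utf-8"):
    """Return the encoding named in the first two lines, else `default`.

    Hypothetical helper for illustration; not CPython's implementation.
    """
    for line in source_bytes.splitlines()[:2]:
        # Only comment lines may carry the declaration.
        if not line.lstrip().startswith(b"#"):
            continue
        match = CODING_RE.search(line.decode("ascii", errors="replace"))
        if match:
            return match.group(1)
    return default


print(declared_encoding(b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nx = 1\n"))
print(declared_encoding(b"x = 1\n"))
```

The first call prints `latin-1` and the second prints `utf-8`.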

Slide 6

Slide 6 text

PEP 3120
Using UTF-8 as the Default Source Encoding

Slide 7

Slide 7 text

PEP 3131
Supporting Non-ASCII Identifiers

Slide 8

Slide 8 text

Where in the Stdlib
● importlib.util.decode_source(source_bytes)
● tokenize.detect_encoding(readline)
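A short sketch of the two stdlib entry points named on this slide. The sample source bytes are made up for the example; `tokenize.detect_encoding()` reports the normalized encoding name, and `importlib.util.decode_source()` returns the decoded text.

```python
import importlib.util
import io
import tokenize

# Made-up source with a PEP 263 declaration, stored as Latin-1 bytes.
source_bytes = "# -*- coding: latin-1 -*-\nname = 'café'\n".encode("latin-1")

# detect_encoding() takes a readline callable that yields bytes and
# returns the encoding plus the lines it had to read to find it.
encoding, lines_read = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
print(encoding)

# decode_source() detects the encoding and decodes in one step.
text = importlib.util.decode_source(source_bytes)
print(text)
```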

Slide 9

Slide 9 text

Tokenizing
Text to "words"

Slide 10

Slide 10 text

x=3+2
"x" "=" "3" "+" "2"

Slide 11

Slide 11 text

echo "x=3+2" | ./python -m tokenize -e
1,0-1,1:            NAME           'x'
1,1-1,2:            EQUAL          '='
1,2-1,3:            NUMBER         '3'
1,3-1,4:            PLUS           '+'
1,4-1,5:            NUMBER         '2'
1,5-1,6:            NEWLINE        '\n'
2,0-2,0:            ENDMARKER      ''
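The same token stream can be produced from Python code. This is a minimal sketch using `tokenize.generate_tokens()`; it reads `exact_type` so that operators show up as `EQUAL` and `PLUS`, mirroring the `-e` flag in the command above.

```python
import io
import tokenize

source = "x=3+2\n"

# generate_tokens() takes a readline callable that yields str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # exact_type distinguishes EQUAL and PLUS from the generic OP type.
    name = tokenize.tok_name[tok.exact_type]
    print(tok.start, tok.end, name, repr(tok.string))
```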

Slide 12

Slide 12 text

Where in the Stdlib
● keyword
● tokenize

Slide 13

Slide 13 text

Parsing
"Words" to "sentence structure"

Slide 14

Slide 14 text

Use a grammar to define structure

pass

stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
             import_stmt | global_stmt | nonlocal_stmt | assert_stmt)
del_stmt: 'del' exprlist
pass_stmt: 'pass'
flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
break_stmt: 'break'
...

Slide 15

Slide 15 text

Concrete syntax tree for pass

(268,                    stmt
  (269,                  simple_stmt
    (270,                small_stmt
      (275,              pass_stmt
        (1, 'pass')      NAME
      )
    ),
    (4, '')              NEWLINE
  )
)

Slide 16

Slide 16 text

3+2

(258, (327, (302, (306, (307, (308, (309, (312, (313, (314, (315,
  (arith_expr,
    (term, (factor, (power, (atom, (NUMBER, '3'))))),
    (PLUS, '+'),
    (term, (factor, (power, (atom, (NUMBER, '2')))))
  )
)))))))))), (4, ''), (0, ''))

Slide 17

Slide 17 text

LL(1) Parser

Slide 18

Slide 18 text

Where in the Stdlib
● parser
● symbol
● token

Slide 19

Slide 19 text

Abstract Syntax Tree
"Sentence structure" to semantics

Slide 20

Slide 20 text

Zephyr ASDL

expr = BoolOp(boolop op, expr* values)
     | BinOp(expr left, operator op, expr right)
     | UnaryOp(unaryop op, expr operand)
     | Lambda(arguments args, expr body)
     | IfExp(expr test, expr body, expr orelse)
     | Dict(expr* keys, expr* values)
     | Set(expr* elts)
     | ListComp(expr elt, comprehension* generators)

Slide 21

Slide 21 text

x = 3 + 2, visually
[Diagram: an Assign node (=) whose "targets" edge leads to x and whose "value" edge leads to a BinOp node (+) with "left" 3 and "right" 2]

Slide 22

Slide 22 text

x = 3 + 2, actually

Module(body=[
    Assign(
        targets=[Name(id='x', ctx=Store())],
        value=BinOp(left=Num(n=3),
                    op=Add(),
                    right=Num(n=2))
    )
])
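A dump like the one on this slide can be reproduced with `ast.parse()` and `ast.dump()`. This is only a sketch: the exact text printed depends on the Python version (newer releases print `Constant(value=3)` where the slide shows `Num(n=3)`), so the example inspects the node types rather than relying on the printed form.

```python
import ast

tree = ast.parse("x = 3 + 2")

# Text form of the whole tree; the details vary between Python versions.
print(ast.dump(tree))

# The structure itself matches the slide: Module -> Assign -> BinOp.
assign = tree.body[0]
print(type(assign).__name__)           # Assign
print(assign.targets[0].id)            # x
print(type(assign.value).__name__)     # BinOp
print(type(assign.value.op).__name__)  # Add
```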

Slide 23

Slide 23 text

Where in the stdlib
● ast

Slide 24

Slide 24 text

Working with the AST

import ast

class PEP8(ast.NodeVisitor):
    """Print assignments with PEP 8 spacing around operators."""

    def visit_Assign(self, node):
        for target in node.targets:
            self.visit(target)
            print(' = ', end='')
        self.visit(node.value)
        print()

    def visit_BinOp(self, node):
        self.visit(node.left)
        print(' ', end='')
        self.visit(node.op)
        print(' ', end='')
        self.visit(node.right)

    def visit_Name(self, node):
        print(node.id, end='')

    def visit_Num(self, node):
        print(node.n, end='')

    # Python 3.8+ parses numeric literals as ast.Constant instead of ast.Num.
    def visit_Constant(self, node):
        print(node.value, end='')

    def visit_Add(self, node):
        print('+', end='')

>>> nodes = ast.parse('x=3+2;y=4')
>>> PEP8().visit(nodes)
x = 3 + 2
y = 4

Slide 25

Slide 25 text

Compiling
Semantics to bytecode

Slide 26

Slide 26 text

Bytecode
● CPython implementation detail
● Stack-based
● 101 instructions

Slide 27

Slide 27 text

def func():
    x = y + 2

  1           0 LOAD_GLOBAL              0 (y)
              3 LOAD_CONST               1 (2)
              6 BINARY_ADD
              7 STORE_FAST               0 (x)
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE
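A listing like this one comes from the `dis` module. The sketch below disassembles the same function; since bytecode is a CPython implementation detail, the opcode names and offsets printed by a current interpreter will likely differ from the slide (for example, newer releases use a generic binary-operation instruction in place of `BINARY_ADD`).

```python
import dis


def func():
    x = y + 2  # y is never assigned here, so it is looked up as a global


# Human-readable listing, in the same layout as the slide.
dis.dis(func)

# The same information as objects, for programmatic inspection.
instructions = list(dis.get_instructions(func))
opnames = [instr.opname for instr in instructions]
print(opnames)
```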

Slide 28

Slide 28 text

Control Flow Graph
● Essentially a graph of bytecode blocks where edges are jumps to other blocks
● Three step process
  a. Figure out scoping of each variable
  b. Compile blocks to bytecode without jumps calculated
  c. Flatten CFG into a list and calculate jumps
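The CFG itself is internal to the compiler, but the result of step (c) is visible in the finished bytecode: each jump instruction carries a target that was calculated once the blocks were laid out in a list. A minimal sketch, using a made-up function `sign` and the `dis` module's jump opcode tables:

```python
import dis


def sign(n):
    # The if statement makes the compiler emit at least two blocks
    # joined by a conditional jump.
    if n < 0:
        return "negative"
    return "non-negative"


jumps = []
for instr in dis.get_instructions(sign):
    # hasjrel / hasjabs list the opcodes that take relative / absolute jump targets.
    if instr.opcode in dis.hasjrel or instr.opcode in dis.hasjabs:
        # For jump instructions, argval is the resolved target offset.
        jumps.append((instr.offset, instr.opname, instr.argval))

for offset, opname, target in jumps:
    print(f"{offset:>4} {opname} -> {target}")
```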

Slide 29

Slide 29 text

Peepholer
● Operates at the bytecode level
● Simple transformations within a single block only
● Bytecode must either stay the same length or shrink
● Always used

Slide 30

Slide 30 text

Where in the stdlib
● compile()
● dis
● symtable
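A brief sketch of two of the entry points listed here, reusing the function from the bytecode slide. `symtable` exposes the scoping decisions made for each name, and `compile()` runs the whole pipeline to produce a code object. The filename `"<example>"` and the value given to `y` are arbitrary choices for the example.

```python
import symtable

source = "def func():\n    x = y + 2\n"

# Scoping: x is assigned inside func, so it is local; y is only read,
# so it is treated as a global.
top = symtable.symtable(source, "<example>", "exec")
func_table = top.get_children()[0]
x = func_table.lookup("x")
y = func_table.lookup("y")
print(func_table.get_name(), x.is_local(), y.is_global())

# compile() turns source text into a code object in one call.
code = compile(source, "<example>", "exec")
namespace = {"y": 40}
exec(code, namespace)          # defines func in namespace
result = namespace["func"]()   # runs x = y + 2; the function returns None
print(result)
```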

Slide 31

Slide 31 text

Recap
● Decoding
● Tokenizing
● Parsing
● AST
● Compiling
(The diagram groups these steps into a front-end and a back-end.)