Refactoring Code
With the Standard Library
John Reese
Production Engineer, Facebook
@n7cmdr
github.com/jreese
Slide 3
Slide 3 text
• Modify source code
• Change names or interfaces
• Update all references
Refactoring
Slide 4
Slide 4 text
• Consistent style or formatting
• Remove code smells
• Enhance or replace an API
• Support new use cases
• Remove dead code
Why refactor?
Slide 7
Slide 7 text
• Usually automated refactoring
• Atomic changes to the entire codebase
• Update API and consumers simultaneously
• Ensure no builds or tests are broken
Code mods
Slide 8
Slide 8 text
• Modify code as nested objects
• Based on Python grammar
• Semantic context for elements
• “Guaranteed” valid syntax
Syntax tree refactoring
Slide 9
Slide 9 text
Python Grammar
Slide 10
Slide 10 text
• Set of rules
• Rules expand to literals or rules
Backus-Naur Form
Slide 11
Slide 11 text
Backus-Naur Form
Slide 12
Slide 12 text
• Slightly modified format
• Rules can use (), [], *, +
• Includes predefined “tokens”
Backus-Naur-ish
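The predefined tokens that the grammar refers to (NAME, STRING, and so on) come from the tokenizer, before any grammar rule is applied. As a rough illustration, the standard library's tokenize module can list the tokens in a one-line call. The sample source string and the helper name token_names are assumptions made for this sketch.

```python
import io
import token
import tokenize

SOURCE = "print('Hello World')\n"  # sample input chosen for illustration


def token_names(source):
    """Return (token type name, token text) pairs for a source string."""
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


for name, text in token_names(SOURCE):
    print(f"{name:<10} {text!r}")
```

The grammar rules then describe how sequences of these tokens group into larger units such as calls and argument lists.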
Slide 13
Slide 13 text
Backus-Naur-ish
Python 3.7 grammar (abridged)
Slide 23
Slide 23 text
power
  atom_expr
    atom
      NAME
    trailer
      arglist
        argument
          STRING
Slide 24
Slide 24 text
Syntax Trees
Slide 25
Slide 25 text
• Tree structure, nodes and leaves
• Decomposed units of grammar
• Semantic representation of code
Abstract Syntax Tree
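A minimal sketch of an abstract syntax tree, using the standard library's ast module on a one-line call. The sample source and its comment are assumptions for illustration. Note that Python 3.8 and later report string literals as Constant nodes with a value field, where older versions used Str with s.

```python
import ast

SOURCE = "print('Hello World')  # greet the user\n"

tree = ast.parse(SOURCE)
call = tree.body[0].value  # Module -> Expr -> Call

print(type(call).__name__)  # Call
print(call.func.id)         # print
print(call.args[0].value)   # Hello World
print(call.keywords)        # []

# The dump holds only semantic structure: the comment from SOURCE is gone.
print(ast.dump(tree))
```

The dump shows why an AST alone is awkward for refactoring: nothing in it records the comment or the original spacing.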
Slide 27
Slide 27 text
Call
  func: Name (id: print)
  args: [Str (s: ‘Hello World’)]
  keywords: []
Slide 29
Slide 29 text
• Tree structure, nodes and leaves
• Decomposed units of syntax and grammar
• Literal representation of on-disk code
• Whitespace, formatting, comments, etc.
Concrete Syntax Tree
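One way to see the difference is to round-trip the same source through both kinds of tree. This is a sketch under assumptions: the sample source and helper names are made up for illustration, ast.unparse needs Python 3.9 or later, and lib2to3 was deprecated in Python 3.9 and removed in 3.13, so the import is guarded.

```python
import ast

try:
    from lib2to3 import pygram, pytree
    from lib2to3.pgen2 import driver
    HAVE_LIB2TO3 = True
except ImportError:  # lib2to3 is not available on Python 3.13+
    HAVE_LIB2TO3 = False

SOURCE = "x  =  1   # odd spacing, kept on disk\n"


def ast_round_trip(source):
    """Parse to an AST and regenerate code; formatting and comments are lost."""
    return ast.unparse(ast.parse(source))


def cst_round_trip(source):
    """Parse to a lib2to3 concrete syntax tree and turn it back into text."""
    grammar = pygram.python_grammar_no_print_statement
    tree = driver.Driver(grammar, convert=pytree.convert).parse_string(source)
    return str(tree)


print(repr(ast_round_trip(SOURCE)))  # 'x = 1'
if HAVE_LIB2TO3:
    print(cst_round_trip(SOURCE) == SOURCE)  # True
```

Because the concrete tree reproduces the file exactly, a code mod can change one element and leave every other character untouched.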
Slide 30
Slide 30 text
lib2to3
Slide 31
Slide 31 text
• Concrete syntax tree
• Built for the 2to3 tool
• Can parse all Python grammars
lib2to3
Slide 32
Slide 32 text
• Part of the standard library
• Always up to date with new syntax
• Contains refactoring framework
Why lib2to3?
Slide 33
Slide 33 text
• Leaf for each distinct token
• Node for semantic groupings
• Nodes contain one or more children
• Generic objects, token/symbol type
• Collapsed grammar
Tree Structure
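The node and leaf layout can be inspected by walking a parsed tree. A minimal sketch, assuming a sample one-line source; the helper names outline and parse are hypothetical, and the lib2to3 import is guarded because the package was removed in Python 3.13.

```python
try:
    from lib2to3 import pygram, pytree
    from lib2to3.pgen2 import driver, token
    HAVE_LIB2TO3 = True
except ImportError:  # lib2to3 is not available on Python 3.13+
    HAVE_LIB2TO3 = False

SOURCE = "print('Hello World')\n"


def parse(source):
    """Parse source text into a lib2to3 tree of Node and Leaf objects."""
    grammar = pygram.python_grammar_no_print_statement
    return driver.Driver(grammar, convert=pytree.convert).parse_string(source)


def outline(node, depth=0):
    """Return one indented line per node or leaf in the tree."""
    indent = "  " * depth
    if isinstance(node, pytree.Leaf):
        # Leaves carry a token type and the literal text of the token.
        return [f"{indent}{token.tok_name[node.type]} {node.value!r}"]
    # Nodes carry a grammar symbol type and a list of children.
    lines = [f"{indent}{pytree.type_repr(node.type)}"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


if HAVE_LIB2TO3:
    print("\n".join(outline(parse(SOURCE))))
```

Grammar rules that would have only a single child do not show up as nodes in the output, which is the collapsing the slide refers to.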
Slide 34
Slide 34 text
power
  atom_expr
    atom
      NAME
    trailer
      arglist
        argument
          STRING
Slide 42
Slide 42 text
Building Code Mods
Slide 43
Slide 43 text
• Designed for the 2to3 tool
• Pattern match to find elements
• In-place transforms to tree
Fixers
Slide 45
Slide 45 text
• Search for grammar elements
• Can be arbitrarily nested, combined
• Capture specific nodes or leaves
• Include literals or token types
Pattern Matching
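A sketch of what such a pattern can look like, using lib2to3's pattern compiler directly. The pattern, the capture name arg, the sample source, and the helper find_print_args are all assumptions for illustration. The import is guarded because lib2to3 was removed in Python 3.13.

```python
try:
    from lib2to3 import patcomp, pygram, pytree
    from lib2to3.pgen2 import driver
    HAVE_LIB2TO3 = True
except ImportError:  # lib2to3 is not available on Python 3.13+
    HAVE_LIB2TO3 = False

# Nested pattern: a power node whose first child is the literal name
# "print", followed by a call trailer. The single argument is captured
# under the name "arg".
PATTERN = "power< 'print' trailer< '(' arg=any ')' > >"

SOURCE = "print('Hello World')\nlen('abc')\n"


def find_print_args(source):
    """Return the source text of the argument of each matching print call."""
    grammar = pygram.python_grammar_no_print_statement
    tree = driver.Driver(grammar, convert=pytree.convert).parse_string(source)
    pattern = patcomp.compile_pattern(PATTERN)
    found = []
    for node in tree.pre_order():
        results = {}
        if pattern.match(node, results):
            found.append(str(results["arg"]))
    return found


if HAVE_LIB2TO3:
    print(find_print_args(SOURCE))
```

The len call is skipped because the literal 'print' in the pattern only matches a name leaf with that exact value.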
Slide 47
Slide 47 text
• Called for each match
• Add, modify, remove, or replace elements
• Not restricted to matched elements
Transforms
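A minimal sketch of a fixer with a transform, applied by hand to a single string. The names old_api and new_api, the class FixOldApi, and the helper functions are hypothetical. To keep the pattern short, it only covers calls with exactly one argument. The import is guarded because lib2to3 was removed in Python 3.13.

```python
try:
    from lib2to3 import fixer_base, pygram, pytree
    from lib2to3.fixer_util import Name
    from lib2to3.pgen2 import driver
    HAVE_LIB2TO3 = True
except ImportError:  # lib2to3 is not available on Python 3.13+
    HAVE_LIB2TO3 = False

SOURCE = "result = old_api(42)  # keep this comment\n"


def make_fixer():
    """Build a fixer that renames calls to the hypothetical old_api()."""

    class FixOldApi(fixer_base.BaseFix):
        PATTERN = "power< name='old_api' trailer< '(' any ')' > >"

        def transform(self, node, results):
            old = results["name"]
            # Reuse the old prefix so surrounding whitespace is preserved.
            old.replace(Name("new_api", prefix=old.prefix))

    return FixOldApi(options={}, log=[])


def rename_calls(source):
    """Parse source, run the fixer on every match, and return the new text."""
    grammar = pygram.python_grammar_no_print_statement
    tree = driver.Driver(grammar, convert=pytree.convert).parse_string(source)
    fixer = make_fixer()
    # Collect matches first, then transform, so the walk never sees edits.
    matches = [(node, fixer.match(node)) for node in tree.pre_order()]
    for node, results in matches:
        if results:
            fixer.transform(node, results)
    return str(tree)


if HAVE_LIB2TO3:
    print(rename_calls(SOURCE), end="")
```

Only the name leaf is replaced, so the trailing comment and the spacing around the assignment come through unchanged.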
Slide 59
Slide 59 text
• Runs fixers on each file
• Runs transforms at matching nodes
• Collects final tree to diff/write
• Defaults to loading 2to3 fixers
Refactoring Tool
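As a rough sketch of the tool in use, the stock 2to3 print fixer can be run over a source string. The sample input and the helper run_print_fixer are assumptions for illustration; a real code mod would more likely pass its own fixer modules and refactor files on disk. The import is guarded because lib2to3 was removed in Python 3.13.

```python
try:
    from lib2to3 import refactor
    HAVE_LIB2TO3 = True
except ImportError:  # lib2to3 is not available on Python 3.13+
    HAVE_LIB2TO3 = False

SOURCE = "print 'Hello World'\n"  # Python 2 style input


def run_print_fixer(source):
    """Apply the stock 2to3 print fixer to a source string."""
    # Fixers are named by the dotted path of the module that defines them.
    tool = refactor.RefactoringTool(["lib2to3.fixes.fix_print"])
    tree = tool.refactor_string(source, "<example>")
    return str(tree)


if HAVE_LIB2TO3:
    # The stock 2to3 fixers live in the lib2to3.fixes package.
    print(len(refactor.get_fixers_from_package("lib2to3.fixes")), "stock fixers")
    print(run_print_fixer(SOURCE), end="")
```

refactor_string returns the modified tree, so the caller decides whether to diff it against the original or write it back.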
Slide 62
Slide 62 text
github.com/jreese/pycon
Slide 63
Slide 63 text
Safe refactoring for modern Python
Slide 64
Slide 64 text
• Code mod framework
• Built on lib2to3 primitives
• Fluent API to generate fixers
• Optimized for large codebases
• MIT Licensed
Bowler
Slide 65
Slide 65 text
• Automatic support for new Python releases
• Encourages reuse of components
• Productionizes common refactoring
• Useful as a tool and a library
Why Bowler?
Slide 66
Slide 66 text
• Selectors build a search pattern
• Optionally filter elements
• Modify matched elements
• Compose multiple transforms
• Generate diffs or interactive results
Query pipeline
Slide 76
Slide 76 text
• Facebook Incubator project
• Fluent API is fluid
• Incomplete set of selectors, filters, transforms
• Needs more unit testing
Early access
Slide 77
Slide 77 text
• Less boilerplate
• Linter features
• Integrations
• More testing
• More contributors!
Roadmap
Slide 78
Slide 78 text
https://pybowler.io