Refactoring Code with the Standard Library

What if you could refactor your entire codebase safely and automatically? How much old code could you fix or replace if you didn’t have to update every reference by hand? I’ll show you how a concrete syntax tree (CST) can help you do just that, using only the Python standard library.

Python’s standard library includes a concrete syntax tree (CST), which is useful for mass refactoring of codebases of all sizes. I’ll walk through the differences between abstract and concrete syntax trees (AST and CST), why a CST is useful for refactoring, and how you can build basic refactoring tools on top of a CST to modify your entire codebase quickly and safely. Lastly, I’ll demonstrate what’s possible with these tools, including upgrading code to new interfaces and moving code wholesale between modules.

Presented at PyCon Australia 2018 in Sydney: https://youtu.be/9USGh4Uy-xQ

John Reese

August 25, 2018

Transcript

  1. Why refactor?
     • Consistent style or formatting
     • Remove code smells
     • Enhance or replace an API
     • Support new use cases
     • Remove dead code
  2. Code mods
     • Usually automated refactoring
     • Atomic changes to the entire codebase
     • Update API and consumers simultaneously
     • Ensure no build/tests are broken
  3. Syntax tree refactoring
     • Modify code as nested objects
     • Based on Python grammar
     • Semantic context for elements
     • “Guaranteed” valid syntax
  4. Backus-Naur-ish
     • Slightly modified format
     • Rules can use (), [], *, +
     • Includes predefined “tokens”
  5. Abstract Syntax Tree
     • Tree structure, nodes and leaves
     • Decomposed units of grammar
     • Semantic representation of code
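
To make the "semantic representation" point concrete, here is a minimal sketch using the standard `ast` module. The sample source string is made up for illustration, and `ast.unparse` assumes Python 3.9 or newer. The tree keeps what the code means, but not its comments or layout.

```python
import ast

# Hypothetical sample source: odd spacing, redundant parentheses, and a comment.
SOURCE = "total = (price   *  quantity)  # cost before tax\n"

tree = ast.parse(SOURCE)

# The AST records meaning: an assignment whose value is a multiplication.
assign = tree.body[0]
node_types = (
    type(assign).__name__,
    type(assign.value).__name__,
    type(assign.value.op).__name__,
)

# Regenerating source from the AST loses the comment and the original layout.
regenerated = ast.unparse(tree)

print(node_types)
print(regenerated)
```

This is why an AST alone is awkward for refactoring: writing the tree back out would reformat every touched file and drop its comments.
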
  6. Concrete Syntax Tree
     • Tree structure, nodes and leaves
     • Decomposed units of syntax and grammar
     • Literal representation of on-disk code
     • Whitespace, formatting, comments, etc
  7. lib2to3
     • Concrete syntax tree
     • Built for the 2to3 tool
     • Can parse all Python grammars
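
The "literal representation" property can be checked with a short round trip through lib2to3. This is a sketch under assumptions: it needs an interpreter that still bundles `lib2to3` (it was removed from the standard library in Python 3.13, hence the import guard), and the sample source and the `parse` helper are illustrative names, not part of the talk.

```python
try:
    from lib2to3 import pygram, pytree
    from lib2to3.pgen2 import driver
    HAVE_LIB2TO3 = True
except ImportError:  # lib2to3 was removed from the standard library in Python 3.13
    HAVE_LIB2TO3 = False

# Hypothetical sample source with irregular spacing and a trailing comment.
SOURCE = "total = (price   *  quantity)  # cost before tax\n"


def parse(source):
    """Parse source text into a lib2to3 concrete syntax tree."""
    parser = driver.Driver(
        pygram.python_grammar_no_print_statement,  # treat print as a function
        convert=pytree.convert,
    )
    return parser.parse_string(source)


if HAVE_LIB2TO3:
    tree = parse(SOURCE)
    # Converting the tree back to text reproduces the input exactly,
    # including the comment and the extra spaces.
    print(str(tree) == SOURCE)
```
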
  8. Why lib2to3?
     • Part of the standard library
     • Always up to date with new syntax
     • Contains refactoring framework
  9. Tree Structure
     • Leaf for each distinct token
     • Node for semantic groupings
     • Nodes contain one or more children
     • Generic objects, token/symbol type
     • Collapsed grammar
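
The node and leaf layout described above can be inspected by walking a parsed tree. The sketch below assumes `lib2to3` is importable (Python 3.12 or older); `parse` and `describe` are hypothetical helper names. Each leaf prints its token type, its text, and its prefix, which is where lib2to3 keeps the whitespace and comments that precede a token.

```python
try:
    from lib2to3 import pygram, pytree
    from lib2to3.pgen2 import driver, token
    HAVE_LIB2TO3 = True
except ImportError:  # lib2to3 was removed from the standard library in Python 3.13
    HAVE_LIB2TO3 = False

SOURCE = "x = 1  # one\n"  # hypothetical sample input


def parse(source):
    """Parse source text into a lib2to3 concrete syntax tree."""
    parser = driver.Driver(
        pygram.python_grammar_no_print_statement, convert=pytree.convert
    )
    return parser.parse_string(source)


def describe(node, depth=0):
    """Yield one indented line per node or leaf in the tree."""
    indent = "  " * depth
    if isinstance(node, pytree.Leaf):
        # Leaves are generic objects: a token type, the token text, and a prefix.
        name = token.tok_name[node.type]
        yield f"{indent}{name} {node.value!r} prefix={node.prefix!r}"
    else:
        # Nodes are generic too: a grammar symbol type plus a list of children.
        yield f"{indent}{pytree.type_repr(node.type)}"
        for child in node.children:
            yield from describe(child, depth + 1)


if HAVE_LIB2TO3:
    for line in describe(parse(SOURCE)):
        print(line)
```

The literal `1` shows up as a bare `NUMBER` leaf directly under the statement rather than inside a chain of single-child expression nodes, which is the "collapsed grammar" at work.
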
  10. Fixers
     • Designed for 2to3 tools
     • Pattern match to find elements
     • In-place transforms to tree
  11. Pattern Matching
     • Search for grammar elements
     • Can be arbitrarily nested, combined
     • Capture specific nodes or leaves
     • Include literals or token types
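
As a rough illustration of these points, lib2to3's pattern language can be compiled with `patcomp.PatternCompiler` and matched against each node in a tree. This sketch assumes `lib2to3` is importable (Python 3.12 or older); the function name `old_name`, the sample source, and the helpers `parse` and `find_calls` are hypothetical.

```python
try:
    from lib2to3 import patcomp, pygram, pytree
    from lib2to3.pgen2 import driver
    HAVE_LIB2TO3 = True
except ImportError:  # lib2to3 was removed from the standard library in Python 3.13
    HAVE_LIB2TO3 = False

# Match a call to old_name(...): a nested pattern with a literal ('old_name'),
# a named capture (name=), an optional argument ([any]), and a wildcard (any*).
PATTERN = "power< name='old_name' trailer< '(' [any] ')' > any* >"

SOURCE = "x = old_name(1)\ny = other(2)\nold_name()  # no args\n"


def parse(source):
    """Parse source text into a lib2to3 concrete syntax tree."""
    parser = driver.Driver(
        pygram.python_grammar_no_print_statement, convert=pytree.convert
    )
    return parser.parse_string(source)


def find_calls(source):
    """Return the text of every node that matches PATTERN."""
    pattern = patcomp.PatternCompiler().compile_pattern(PATTERN)
    found = []
    for node in parse(source).pre_order():
        results = {}  # filled with captured nodes on a successful match
        if pattern.match(node, results):
            found.append(str(node).strip())
    return found


if HAVE_LIB2TO3:
    print(find_calls(SOURCE))
```
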
  12. Transforms
     • Called for each match
     • Add, modify, remove, or replace elements
     • Not restricted to matched elements
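
A transform can be sketched by hand as "match, then edit the tree in place". The example below is an assumption-laden illustration, not the fixer machinery itself: it needs `lib2to3` (Python 3.12 or older), and `old_name`, `new_name`, `parse`, and `rename_calls` are hypothetical names. It replaces the captured name leaf while carrying over its prefix, so surrounding whitespace and comments survive.

```python
try:
    from lib2to3 import patcomp, pygram, pytree
    from lib2to3.fixer_util import Name
    from lib2to3.pgen2 import driver
    HAVE_LIB2TO3 = True
except ImportError:  # lib2to3 was removed from the standard library in Python 3.13
    HAVE_LIB2TO3 = False

PATTERN = "power< name='old_name' trailer< '(' [any] ')' > any* >"
SOURCE = "x = old_name(1)  # keep me\n"  # hypothetical sample input


def parse(source):
    """Parse source text into a lib2to3 concrete syntax tree."""
    parser = driver.Driver(
        pygram.python_grammar_no_print_statement, convert=pytree.convert
    )
    return parser.parse_string(source)


def rename_calls(source, new_name):
    """Rename every call matched by PATTERN and return the new source."""
    pattern = patcomp.PatternCompiler().compile_pattern(PATTERN)
    tree = parse(source)

    # Collect captures first so the tree is not modified while being walked.
    captured = []
    for node in tree.pre_order():
        results = {}
        if pattern.match(node, results):
            captured.append(results["name"])

    for leaf in captured:
        # Swap in a new NAME leaf, keeping the original prefix (whitespace).
        leaf.replace(Name(new_name, prefix=leaf.prefix))

    return str(tree)


if HAVE_LIB2TO3:
    print(rename_calls(SOURCE, "new_name"), end="")
```

Because the edit happens on the CST, only the renamed token changes; the spacing and the trailing comment are written back untouched.
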
  13. Refactoring Tool
     • Runs fixers on each file
     • Runs transforms at matching nodes
     • Collects final tree to diff/write
     • Defaults to loading 2to3 fixers
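
For a feel of how the pieces fit together, the sketch below drives `lib2to3.refactor.RefactoringTool` with one of the stock 2to3 fixers, `fix_has_key`, on an in-memory string. It assumes `lib2to3` is importable (Python 3.12 or older); the sample source and the `run_fixers` helper are hypothetical.

```python
try:
    from lib2to3.refactor import RefactoringTool
    HAVE_LIB2TO3 = True
except ImportError:  # lib2to3 was removed from the standard library in Python 3.13
    HAVE_LIB2TO3 = False

# Hypothetical legacy snippet; refactor_string expects a trailing newline.
SOURCE = "if config.has_key(name):  # legacy check\n    pass\n"


def run_fixers(source, fixer_names):
    """Run the named fixer modules over a source string and return the result."""
    tool = RefactoringTool(fixer_names)
    # The second argument is only a display name used in log messages.
    tree = tool.refactor_string(source, "<example>")
    return str(tree)


if HAVE_LIB2TO3:
    print(run_fixers(SOURCE, ["lib2to3.fixes.fix_has_key"]), end="")
```

The tool parses the input, runs each fixer's transform at the nodes its pattern matches, and hands back the final tree, which can then be diffed against the original or written to disk.
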
  14. Bowler
     • Code mod framework
     • Built on lib2to3 primitives
     • Fluent API to generate fixers
     • Optimized for large codebases
     • MIT Licensed
  15. Why Bowler?
     • Automatic support for new Python releases
     • Encourages reuse of components
     • Productionizes common refactoring
     • Useful as a tool and a library
  16. Query pipeline
     • Selectors build a search pattern
     • Optionally filter elements
     • Modify matched elements
     • Compose multiple transforms
     • Generate diffs or interactive results
  17. Early access
     • Facebook Incubator project
     • Fluent API is fluid
     • Incomplete set of selectors, filters, transforms
     • Needs more unit testing