Refactoring Code with the Standard Library

What if you could refactor your entire code base, safely and automatically? How much old code could you fix or replace if you didn’t need to worry about updating every reference by hand? I’ll show you how a concrete syntax tree (CST) can help you do just that, using only the Python standard library.

Python includes a concrete syntax tree (CST) in the standard library, useful for mass refactoring of code bases of all sizes. I’ll walk through the differences between abstract and concrete syntax trees (AST and CST), why a CST is useful for refactoring, and how you can build basic refactoring tools on top of a CST to modify your entire code base quickly and safely. Lastly, I’ll demonstrate what’s possible with these tools, such as upgrading code to new interfaces or moving code wholesale between modules.
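To make the AST side of that comparison concrete, here is a minimal sketch using the standard library's `ast` module. The sample source line and the name `compute` are made up for illustration; the only point is that comments and the original spacing do not survive in an abstract tree, which is why an AST alone is awkward for rewriting files in place.

```python
import ast

# Hypothetical input: odd spacing plus a trailing comment.
source = "x = compute( 1,2 )  # TODO: rename compute\n"

tree = ast.parse(source)

# The first statement is an assignment whose value is a call node.
call = tree.body[0].value

# ast.dump() shows the semantic structure only.
dumped = ast.dump(tree)
print(dumped)

# The comment text is not stored anywhere in the tree.
print("comment kept:", "TODO" in dumped)

# Node types found while walking the tree (Module, Assign, Call, ...).
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```

The tree knows that `compute` is called with two arguments, but nothing in it records the extra spaces inside the parentheses or the `TODO` comment.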

Presented at PyCon Australia 2018 in Sydney: https://youtu.be/9USGh4Uy-xQ


John Reese

August 25, 2018


  1. None
  2. Refactoring Code With the Standard Library • John Reese, Production Engineer, Facebook • @n7cmdr
  3. Refactoring: • Modify source code • Change names or interfaces • Update all references
  4. Why refactor? • Consistent style or formatting • Remove code smells • Enhance or replace an API • Support new use cases • Remove dead code
  5. None
  6. None
  7. Code mods: • Usually automated refactoring • Atomic changes to the entire codebase • Update API and consumers simultaneously • Ensure no builds or tests are broken
  8. Syntax tree refactoring: • Modify code as nested objects • Based on Python grammar • Semantic context for elements • “Guaranteed” valid syntax
  9. Python Grammar

  10. Backus-Naur Form: • Set of rules • Rules expand to literals or rules
  11. Backus-Naur Form

  12. Backus-Naur-ish: • Slightly modified format • Rules can use (), [], *, + • Includes predefined “tokens”
  13. Backus-Naur-ish Python 3.7 grammar (abridged)

  14. None
  15. None
  16. power

  17. power atom_expr

  18. power atom_expr atom

  19. power atom_expr atom NAME

  20. power atom_expr trailer atom NAME

  21. power atom_expr trailer arglist atom NAME

  22. power atom_expr trailer arglist argument atom NAME

  23. power atom_expr trailer arglist argument atom NAME STRING

  24. Syntax Trees

  25. Abstract Syntax Tree: • Tree structure, nodes and leaves • Decomposed units of grammar • Semantic representation of code
  26. None
  27. Call(func=Name(id='print'), args=[Str(s='Hello World')], keywords=[])
  28. None
  29. Concrete Syntax Tree: • Tree structure, nodes and leaves • Decomposed units of syntax and grammar • Literal representation of on-disk code • Whitespace, formatting, comments, etc.
  30. lib2to3

  31. lib2to3: • Concrete syntax tree • Built for the 2to3 tool • Can parse all Python grammars
  32. Why lib2to3? • Part of the standard library • Always up to date with new syntax • Contains refactoring framework
  33. Tree Structure: • Leaf for each distinct token • Node for semantic groupings • Nodes contain one or more children • Generic objects, token/symbol type • Collapsed grammar
  34. power atom_expr trailer arglist argument atom NAME STRING

  35. power atom_expr trailer arglist argument atom NAME STRING

  36. None
  37. None
  38. None
  39. None
  40. None
  41. None
  42. Building Code Mods

  43. Fixers: • Designed for 2to3 tools • Pattern match to find elements • In-place transforms to tree
  44. None
  45. Pattern Matching: • Search for grammar elements • Can be arbitrarily nested, combined • Capture specific nodes or leaves • Include literals or token types
  46. None
  47. Transforms: • Called for each match • Add, modify, remove, or replace elements • Not restricted to matched elements
  48. None
  49. None
  50. None
  51. None
  52. None
  53. None
  54. None
  55. None
  56. None
  57. None
  58. None
  59. Refactoring Tool: • Runs fixers on each file • Runs transforms at matching nodes • Collects final tree to diff/write • Defaults to loading 2to3 fixers
  60. None
  61. None
  62. github.com/jreese/pycon

  63. Safe refactoring for modern Python

  64. Bowler: • Code mod framework • Built on lib2to3 primitives • Fluent API to generate fixers • Optimized for large codebases • MIT Licensed
  65. Why Bowler? • Automatic support for new Python releases • Encourages reuse of components • Productionizes common refactoring • Useful as a tool and a library
  66. Query pipeline: • Selectors build a search pattern • Optionally filter elements • Modify matched elements • Compose multiple transforms • Generate diffs or interactive results
  67. None
  68. None
  69. None
  70. None
  71. None
  72. None
  73. None
  74. None
  75. None
  76. Early access: • Facebook Incubator project • Fluent API is fluid • Incomplete set of selectors, filters, transforms • Needs more unit testing
  77. Roadmap: • Less boilerplate • Linter features • Integrations • More testing • More contributors!
  78. https://pybowler.io

  79. John Reese, Production Engineer, Facebook • @n7cmdr • github.com/jreese • https://pybowler.io

  80. None
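
The lib2to3 slides in the transcript (concrete syntax tree, a leaf for each token, in-place transforms) can be illustrated with a small sketch. This is not the code from the talk's image-only slides: the sample source, the names `old_api` and `new_api`, and the helpers `parse` and `rename_name` are all hypothetical. lib2to3 was deprecated in Python 3.11 and removed in Python 3.13, so the import is guarded and the example only does real work on interpreters that still ship it.

```python
# Hypothetical input: a leading comment, odd spacing, and a trailing comment.
SOURCE = "# greet the user\nresult = old_api( 'Hello World' )  # keep me\n"

try:
    from lib2to3 import pygram, pytree
    from lib2to3.pgen2 import driver, token
except ImportError:
    # lib2to3 is not available on Python 3.13 and later.
    pygram = None


def parse(source):
    """Parse source text into a lib2to3 concrete syntax tree."""
    d = driver.Driver(
        pygram.python_grammar_no_print_statement,
        convert=pytree.convert,
    )
    return d.parse_string(source)


def rename_name(tree, old, new):
    """Naive rename: rewrite every NAME leaf whose text equals `old`."""
    for leaf in tree.leaves():
        if leaf.type == token.NAME and leaf.value == old:
            leaf.value = new
            leaf.changed()  # mark the leaf (and its parents) as modified
    # str() on a CST node reproduces the source, whitespace and comments included.
    return str(tree)


if pygram is not None:
    round_trip = str(parse(SOURCE))
    renamed = rename_name(parse(SOURCE), "old_api", "new_api")
    print(renamed, end="")
else:
    round_trip = renamed = None
    print("lib2to3 is not available on this Python version")
```

Because whitespace and comments are stored alongside each leaf, the rewritten output differs from the input only in the renamed token. The rename here touches every matching NAME token, including attribute and keyword names; the fixers described in the slides avoid that by pattern matching on grammar elements before transforming.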