Joe Gordon - Syntax Trees and Python - Automated Code Transformations

Manually updating a million-line codebase is tedious. Thankfully, syntax trees provide a safe and quick way to automatically apply repetitive transformations. Syntax-tree-based tooling (built on lib2to3) has been a critical component of Pinterest's Python 3 upgrade strategy and has saved us countless hours of work. Learn how syntax trees work, how they are used to transform code, and how you can quickly write your own transformations.

https://us.pycon.org/2019/schedule/presentation/205/

PyCon 2019

May 04, 2019

Transcript

  1. Syntax Trees and Python - Automated Code Transformations

    Joe Gordon, Site Reliability Engineer
    © 2019 Pinterest. All rights reserved.
  2. Our mission

    To bring everyone the inspiration to create a life they love.
  3. Python at Pinterest

    • 250 million monthly active users
    • Used for every request
    • Over 2.6 million lines, along with 600,000 lines of comments
  4. Python 3: Refactoring at scale

    Problem Statement: Manually porting 2.6 million lines of Python 2 to support Python 3 is tedious and would take too long.

    Generalization: Refactoring anything at scale has these requirements:
    • Safe: the transformation shouldn't introduce new issues
    • Quick: minimal developer time required to apply
  5. Automated Code Transformations

    • Source → Syntax Tree → transform (on tree) → Source
    • Apply a series of fixers to transform source code
    • Safely automate tedious tasks
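The source → syntax tree → transform → source pipeline can be sketched with the standard library's `ast` module. This is only an illustration under stated assumptions: it uses `ast` rather than lib2to3 (so comments and original formatting are not preserved), it needs Python 3.9+ for `ast.unparse`, and `old_api` / `new_api` are hypothetical names invented for the example.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """A toy 'fixer': rename calls to old_api() into new_api().

    old_api and new_api are hypothetical names used only for illustration.
    """

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "old_api":
            node.func = ast.copy_location(
                ast.Name(id="new_api", ctx=ast.Load()), node.func
            )
        return node


def transform(source):
    tree = ast.parse(source)         # Source -> syntax tree
    tree = RenameCall().visit(tree)  # transform (on tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)         # syntax tree -> source (Python 3.9+)


result = transform("value = old_api(1, old_api(2))")
print(result)
```

Because the fixer matches call nodes rather than text, a string literal that merely contains `old_api(` is left alone.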
  6. Automated Code Transformations

    Applications
    • Applying style guides
    • Porting code to Python 3
    • Refactoring code
      ◦ Removing a dependency
      ◦ Moving to a new API

    At Pinterest
    • Porting code to Python 3
  7. Automated Code Transformations

    • Go
      ◦ gofmt
      ◦ go tool fix
    • JavaScript
      ◦ https://babeljs.io
    • C/C++/C#
      ◦ clang-format
    • ...
  8. Regular Expressions

    • Quick - for simple cases
    • Unsafe
      ◦ `__author__ = 'bob'`
      ◦ Comments
      ◦ Docstrings
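A minimal sketch of why a regular expression is unsafe here, assuming a hypothetical module in which the same text appears in a docstring, a comment, and a real assignment. The pattern and the sample source are illustrative, not taken from the talk.

```python
import re

# Hypothetical module: the pattern text appears in a docstring, a comment,
# and one real assignment.
SOURCE = '''"""Module docs.

Example:
    __author__ = 'bob'
"""
# __author__ = 'bob' was the old maintainer
__author__ = 'bob'
'''

AUTHOR_RE = re.compile(r"__author__\s*=\s*'[^']*'")

matches = AUTHOR_RE.findall(SOURCE)
print(len(matches))  # 3 matches, but only the last one is an actual assignment
```

A search-and-replace built on this pattern would also rewrite the docstring and the comment, which is exactly the kind of unintended change the slide warns about.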
  9. Abstract Syntax Trees

    Linting
    https://github.com/jparise/flake8-author/blob/master/flake8_author.py#L71
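As a rough sketch of AST-based linting (not the flake8-author implementation linked above), the standard `ast` module can locate real `__author__` assignments while ignoring docstrings and comments. The sample source and the helper name `find_author_assignments` are assumptions made for this example.

```python
import ast

# Hypothetical module: the text "__author__ = 'bob'" appears three times,
# but only the last occurrence is real code.
SOURCE = '''"""Module docs.

Example:
    __author__ = 'bob'
"""
# __author__ = 'bob' was the old maintainer
__author__ = 'bob'
'''


def find_author_assignments(source):
    """Return the line numbers of assignments to the name __author__."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__author__":
                    hits.append(node.lineno)
    return hits


author_lines = find_author_assignments(SOURCE)
print(author_lines)  # [7]
```

The docstring is a single string node and the comment never reaches the tree, so only the assignment on line 7 is reported.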
  10. Syntax Trees: Automated Code Transformation

    • Safer
    • Multi-line transformations
    • Can get complex quickly
  11. Syntax Trees: Concrete vs Abstract

    “This is a very concrete parse tree; we need to keep every token and even the comments and whitespace between tokens.”
    https://github.com/python/cpython/blob/c57e6e2e52d5d8b4005753bed789d99ebe407fb6/Lib/lib2to3/pytree.py

    Solves recreating original code from syntax tree
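To see what an abstract tree throws away, the sketch below round-trips a line through the standard `ast` module. It assumes Python 3.9+ for `ast.unparse`; the sample line is made up for the example.

```python
import ast

original = "x   =  1  # keep me\n"

# Parse to an abstract syntax tree, then regenerate source from it.
roundtrip = ast.unparse(ast.parse(original))

print(repr(roundtrip))  # 'x = 1'
```

The comment and the unusual spacing are gone, so an abstract tree alone cannot recreate the original file. A concrete tree keeps that information, which is what makes it suitable for rewriting code in place.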
  12. Syntax Trees: Python libraries

    lib2to3
    • Concrete syntax tree
    • Added in Python 2.6
    • Bundled with fixers for porting code to Python 3
      ◦ Example: `except X, T` to `except X as T`
    • Preserves formatting information
      ◦ node.prefix
      ◦ node.get_suffix()
    • Tracks whether a node was changed

    >>> node
    Leaf(22, '=')
    >>> node.get_suffix()
    ' '

    ast
    • Abstract syntax tree
    • Added in Python 2.6
    • Good for static code analysis
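A small sketch of lib2to3's formatting preservation, under two assumptions worth stating: lib2to3 is deprecated and was removed from the standard library in Python 3.13, so the import is guarded and the helper returns `None` when it is unavailable; and the helper name `round_trip` is invented for this example.

```python
import warnings

try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # lib2to3 warns about its deprecation
        from lib2to3 import pygram, pytree
        from lib2to3.pgen2 import driver
    HAVE_LIB2TO3 = True
except ImportError:  # lib2to3 was removed in Python 3.13
    HAVE_LIB2TO3 = False


def round_trip(source):
    """Parse source into a lib2to3 concrete tree and render it back.

    Returns None when lib2to3 is not available in this interpreter.
    """
    if not HAVE_LIB2TO3:
        return None
    d = driver.Driver(pygram.python_grammar, convert=pytree.convert)
    tree = d.parse_string(source)
    # Whitespace and comments live in each leaf's .prefix attribute.
    for leaf in tree.leaves():
        print(repr(leaf.prefix), repr(leaf.value))
    return str(tree)


original = "x   =  1  # keep me\n"
rendered = round_trip(original)
```

Where lib2to3 is importable, `rendered` should equal `original` exactly, comment and spacing included, because every leaf carries the text that preceded it.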
  13. Using lib2to3

    • Automated Python 2 to 3 code translation
    • Concrete syntax tree
    • Complex interface
    • Powerful and safe
    • Useful framework around fixers

    Reference:
    http://python3porting.com/fixers.html
    https://docs.python.org/2/library/2to3.html
  14. lib2to3-based tools

    • python-future
    • python-modernize
    • Black
    • Bowler
  15. python-future

    • Compatibility layer to concurrently support Py2 and Py3
    • Py3 idioms
    • Uses lib2to3

    python-future.org
  16. python-modernize

    • Converts Py2 code into a common subset of Py2 and Py3
    • Uses six and lib2to3
    • Futurize converts Py2 into (almost) standard Py3 code

    python-modernize.readthedocs.io
  17. bowler

    • Requires Py3.6; can be run against Py2 code
    • lib2to3-based
    • Simple to execute: bowler run ...

    pybowler.io
  18. black

    • Requires Py3.6; can be run against Py2 code
    • lib2to3-based
    • Validates CST transformation with AST

    black.readthedocs.io
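The idea of validating a concrete-tree rewrite against the abstract tree can be sketched as follows. This is an illustration of the general technique using the standard `ast` module, not Black's actual implementation, and `same_ast` is a hypothetical helper name.

```python
import ast


def same_ast(before, after):
    """Return True if two source strings parse to equivalent abstract trees.

    ast.dump() omits line and column attributes by default, so pure
    formatting changes compare equal while changes in meaning do not.
    """
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


# A formatting-only rewrite keeps the abstract tree intact.
print(same_ast("x   =  1  # comment\n", "x = 1\n"))

# A rewrite that changes a value is detected.
print(same_ast("x = 1\n", "x = 2\n"))
```

Running a check like this after every automated rewrite gives a cheap safety net: if the abstract trees differ, the transformation changed more than formatting.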
  19. Conclusion

    • Syntax trees make code transformations quick and safe
    • Saved countless hours of tedious labor
    • Complex edge cases are still complex