
Joe Gordon - Syntax Trees and Python - Automated Code Transformations

Manually updating a million-line code base is tedious. Thankfully, syntax trees provide a safe and quick way to automatically apply repetitive transformations. Syntax-tree-based tooling built on lib2to3 has been a critical component of Pinterest's Python 3 upgrade strategy and has saved us countless hours of work. Learn how syntax trees work, how they are used to transform code, and how you can quickly write your own transformations.

https://us.pycon.org/2019/schedule/presentation/205/

PyCon 2019

May 04, 2019

Transcript

  1. None
  2. © 2019 Pinterest. All rights reserved.

    Syntax Trees and Python - Automated Code Transformations
    Joe Gordon, Site Reliability Engineer
  3. Introduction
  4. Our mission: To bring everyone the inspiration to create a life they love.
  5. None

  6. Python at Pinterest

    • 250 million monthly active users
    • Used for every request
    • Over 2.6 million lines of Python, along with 600,000 lines of comments
  7. Problem Statement: Refactoring at scale

    Python 3: Manually porting 2.6 million lines of Python 2 to support Python 3 is tedious and would take too long.
    Generalization: Refactoring anything at scale has these requirements:
    • Safe: Transformation shouldn’t introduce new issues.
    • Quick: Minimal developer time required to apply.
  8. Theory
  9. Automated Code Transformations

    • Source → Syntax Tree → transform (on tree) → Source
    • Apply a series of fixers to transform source code
    • Safely automate tedious tasks
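The source → tree → transform → source pipeline on this slide can be sketched with the standard library alone. This is only an illustration, not the tooling from the talk: it uses the `ast` module (an abstract tree, so comments and original formatting do not survive the trip) and `ast.unparse`, which needs Python 3.9 or later. `XrangeToRange` and `transform` are hypothetical names chosen for this example.

```python
import ast


class XrangeToRange(ast.NodeTransformer):
    """A tiny 'fixer': turn calls to xrange(...) into range(...)."""

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "xrange":
            node.func.id = "range"
        return node


def transform(source):
    tree = ast.parse(source)            # source -> syntax tree
    tree = XrangeToRange().visit(tree)  # transform on the tree
    return ast.unparse(tree)            # syntax tree -> source


print(transform("total = sum(xrange(10))"))  # total = sum(range(10))
```

Because the fixer only looks at call nodes, a string literal such as `'xrange'` is left alone, which is the safety property the slide refers to.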
  10. Automated Code Transformations

    Applications
    • Applying style guides
    • Porting code to Python 3
    • Refactoring code
      ◦ Removing a dependency
      ◦ Moving to a new API

    At Pinterest
    • Porting code to Python 3
  11. Automated Code Transformations

    • Go
      ◦ gofmt
      ◦ go tool fix
    • Javascript
      ◦ https://babeljs.io
    • C/C++/C#
      ◦ clang-format
    • ...
  12. Automated Code Transformations: How They Work
  13. Regular Expressions

  14. Regular Expressions

    • Quick - for simple cases
    • Unsafe
      ◦ `__author__ = 'bob'`
      ◦ Comments
      ◦ Docstrings
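To make the "unsafe" point concrete, here is a small sketch. The task (deleting `__author__` lines) is an assumption about what the slide's example was getting at, and the sample module text is made up. A line-based regular expression cannot tell a real assignment from the same text inside a comment or a docstring.

```python
import re

# Made-up module: the text "__author__ = ..." appears in a docstring,
# as a real assignment, and inside a comment.
SOURCE = (
    '"""Module docs.\n'
    "\n"
    "Do not write __author__ = 'bob' in new modules.\n"
    '"""\n'
    "__author__ = 'bob'\n"
    "# __author__ = 'alice' was the previous owner\n"
    "VERSION = 1\n"
)

# Naive approach: delete every line that contains an __author__ assignment.
naive = re.sub(r"^.*__author__\s*=.*\n", "", SOURCE, flags=re.MULTILINE)

print(naive)
```

The regular expression removes all three lines, so the docstring sentence and the comment are lost along with the one assignment that was meant to go.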
  15. Abstract Syntax Trees

    Rendered with show_ast
  16. Abstract Syntax Trees

    Rendered with show_ast
  17. Abstract Syntax Trees

    Linting
  18. Abstract Syntax Trees

    Linting
    https://github.com/jparise/flake8-author/blob/master/flake8_author.py#L71
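A lint check of this kind can be sketched in a few lines with the standard `ast` module. This is not the flake8-author implementation linked above, only an illustration of the idea; `find_author_assignments` and the sample module text are hypothetical.

```python
import ast

# Made-up module: "__author__ = ..." appears in the docstring, as a real
# assignment on line 5, and inside a comment.
SOURCE = (
    '"""Module docs.\n'
    "\n"
    "Do not write __author__ = 'bob' in new modules.\n"
    '"""\n'
    "__author__ = 'bob'\n"
    "# __author__ = 'alice' was the previous owner\n"
    "VERSION = 1\n"
)


def find_author_assignments(source):
    """Return line numbers of module-level `__author__ = ...` assignments."""
    lines = []
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__author__":
                    lines.append(node.lineno)
    return lines


print(find_author_assignments(SOURCE))
```

Only the real assignment is reported, because the docstring is a string constant in the tree and the comment never reaches the tree at all.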
  19. Automated Code Transformation: Syntax Trees

    • Safer
    • Multi-line transformations
    • Can get complex quickly
  20. Syntax Trees: Concrete vs Abstract

    “This is a very concrete parse tree; we need to keep every token and even the comments and whitespace between tokens.”
    https://github.com/python/cpython/blob/c57e6e2e52d5d8b4005753bed789d99ebe407fb6/Lib/lib2to3/pytree.py

    A concrete syntax tree solves the problem of recreating the original code from the syntax tree.
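The difference is easy to see by round-tripping a line through the abstract tree in the standard `ast` module. This is a minimal sketch that assumes Python 3.9 or later for `ast.unparse`; the sample line is made up.

```python
import ast

SOURCE = "x   =   1  # the answer\n"

# The abstract tree keeps only the structure of the code, so the comment
# and the unusual spacing cannot be recovered from it.
round_tripped = ast.unparse(ast.parse(SOURCE))

print(repr(round_tripped))  # 'x = 1'
```

A tool that rewrote files this way would discard every comment it touched, which is why transformation tooling needs the concrete tree.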
  21. Syntax Trees: Python libraries

    lib2to3
    • Concrete syntax tree
    • Added in Python 2.6
    • Bundled with fixers for porting code to Python 3
      ◦ Example: `except X, T` to `except X as T`
    • Preserves formatting information
      ◦ node.prefix
      ◦ node.get_suffix()
    • Tracks whether a node was changed

        >>> node
        Leaf(22, '=')
        >>> node.get_suffix()
        ' '

    ast
    • Abstract syntax tree
    • Added in Python 2.6
    • Good for static code analysis
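The formatting-preserving behaviour of lib2to3 can be sketched as follows. Note the assumptions: lib2to3 was deprecated in Python 3.9 and removed from the standard library in Python 3.13, so the import is guarded and the example does nothing where the package is missing. `parse_concrete` is a hypothetical helper name and the sample line is made up.

```python
try:
    from lib2to3 import pygram, pytree
    from lib2to3.pgen2 import driver
except ImportError:  # lib2to3 is not available on Python 3.13+
    driver = None

SOURCE = "x   =   1  # the answer\n"


def parse_concrete(source):
    """Parse source into a lib2to3 tree, or return None without lib2to3."""
    if driver is None:
        return None
    d = driver.Driver(pygram.python_grammar, convert=pytree.convert)
    return d.parse_string(source)


tree = parse_concrete(SOURCE)
if tree is not None:
    # Each leaf carries the whitespace and comments that precede it.
    for leaf in tree.leaves():
        print(repr(leaf.prefix), repr(leaf.value))
    # Converting the tree back to text reproduces the input exactly.
    print(str(tree) == SOURCE)
```

Since whitespace and comments live in each leaf's `prefix`, an untouched tree converts back to the exact original text.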
  22. Tooling
  23. Using lib2to3

    • Automated Python 2 to 3 code translation
    • Concrete syntax tree
    • Complex interface
    • Powerful and safe
    • Useful framework around fixers

    Reference:
    http://python3porting.com/fixers.html
    https://docs.python.org/2/library/2to3.html
  24. Using lib2to3

    https://github.com/python/cpython/blob/c57e6e2e52d5d8b4005753bed789d99ebe407fb6/Lib/lib2to3/fixes/fix_numliterals.py
  25. Using lib2to3

    https://github.com/python/cpython/blob/c57e6e2e52d5d8b4005753bed789d99ebe407fb6/Lib/lib2to3/fixes/fix_asserts.py
  26. Using lib2to3

    https://github.com/python/cpython/blob/c57e6e2e52d5d8b4005753bed789d99ebe407fb6/Lib/lib2to3/fixes/fix_throw.py

  27. Using lib2to3

    Diagram labels: Input, Output, Fixer, Runner
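One way to see input, fixer, runner and output together is to drive a bundled fixer from Python. This sketch assumes lib2to3 is importable (it was removed from the standard library in Python 3.13, so the import is guarded) and uses the stock `fix_except` fixer. `run_except_fixer` is a hypothetical helper name and the input snippet is made up.

```python
try:
    from lib2to3.refactor import RefactoringTool
except ImportError:  # lib2to3 is not available on Python 3.13+
    RefactoringTool = None

# Input: Python 2 style exception handling, with a trailing comment.
PY2_SOURCE = (
    "try:\n"
    "    risky()\n"
    "except ValueError, err:  # keep this comment\n"
    "    handle(err)\n"
)


def run_except_fixer(source):
    """Apply lib2to3's bundled fix_except fixer and return the new source."""
    if RefactoringTool is None:
        return None
    # Runner: loads the named fixers and applies them to the parsed tree.
    tool = RefactoringTool(["lib2to3.fixes.fix_except"])
    tree = tool.refactor_string(source, "<example>")
    return str(tree)  # Output: source regenerated from the concrete tree


print(run_except_fixer(PY2_SOURCE))
```

The comment on the `except` line survives because the fixer edits nodes in a concrete tree and does not regenerate the file from scratch.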
  28. lib2to3 based tools

    • python-future
    • python-modernize
    • Black
    • Bowler
  29. python-future

    • Compatibility layer to concurrently support Py2 and Py3
    • Py3 idioms
    • Uses lib2to3

    python-future.org
  30. python-modernize

    • Converts Py2 code into a common subset of Py2 and Py3
    • Uses six and lib2to3
    • Futurize converts Py2 into (almost) standard Py3 code

    python-modernize.readthedocs.io
  31. Modernize vs futurize

    Side-by-side panels: modernize, futurize
  32. bowler

    • Requires Py3.6, but can be run against Py2 code
    • lib2to3 based
    • Simple to execute: bowler run ...

    pybowler.io
  33. bowler

    pybowler.io

  34. bowler

  35. Bowler vs lib2to3

  36. black

    • Requires Py3.6, but can be run against Py2 code
    • lib2to3 based
    • Validates the CST transformation with the AST

    black.readthedocs.io
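The validation idea in the last bullet can be sketched with the standard `ast` module: a formatting-only change should leave the abstract syntax tree unchanged. This illustrates the principle and is not Black's actual implementation; `same_ast` and the sample strings are hypothetical.

```python
import ast


def same_ast(before, after):
    """True if both sources parse to the same abstract syntax tree."""
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


original = "x = {  'a':1,\n   'b':2 }\n"
reformatted = 'x = {"a": 1, "b": 2}\n'   # whitespace and quotes changed
broken = 'x = {"a": 1, "b": 3}\n'        # a value changed by mistake

print(same_ast(original, reformatted))
print(same_ast(original, broken))
```

A check like this lets a tool rewrite the concrete tree freely and then confirm, using the abstract tree, that the meaning of the program was not altered.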
  37. Conclusion
  38. Conclusion

    • Syntax trees make code transformations quick and safe
    • Saved countless hours of tedious labor
    • Complex edge cases are still complex
  39. None