Joe Gordon - Syntax Trees and Python - Automated Code Transformations

Manually updating a million-line codebase is tedious. Thankfully, syntax trees provide a safe and quick way to apply repetitive transformations automatically. Syntax-tree-based tooling built on lib2to3 has been a critical component of Pinterest's Python 3 upgrade strategy and has saved us countless hours of work. Learn how syntax trees work, how they are used to transform code, and how you can quickly write your own transformations.

https://us.pycon.org/2019/schedule/presentation/205/

PyCon 2019

May 04, 2019

Transcript

  1.

  2.
    Syntax Trees and Python - Automated Code Transformations
    Joe Gordon, Site Reliability Engineer
    © 2019 Pinterest. All rights reserved.

  3.
    Introduction (section 1 of 4)

  4.
    Our mission: to bring everyone the inspiration to create a life they love.

  5.

  6.
    Python at Pinterest
    ● 250 million monthly active users
    ● Used for every request
    ● Over 2.6 million lines of Python, along with 600,000 lines of comments

  7.
    Problem Statement
    Refactoring at scale
    Python 3
    Manually porting 2.6 million lines of Python 2 to support Python 3 is tedious and would take too long.
    Safe
    Transformations shouldn't introduce new issues.
    Quick
    Minimal developer time required to apply.
    Generalization
    Refactoring anything at scale has these requirements.

  8.
    Theory (section 2 of 4)

  9.
    Automated Code Transformations
    ● Source → Syntax Tree → transform (on tree) → Source
    ● Apply a series of fixers to transform source code
    ● Safely automate tedious tasks
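
The Source → Syntax Tree → transform → Source pipeline above can be sketched with the standard library. This is a minimal illustration, not the tooling from the talk: it uses the `ast` module (with `ast.unparse`, which needs Python 3.9+) rather than lib2to3, so comments and original formatting are not preserved. The `RenameCall` fixer and the `transform` helper are hypothetical names invented for this example.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Hypothetical fixer: rewrite calls to old_name(...) as new_name(...)."""

    def __init__(self, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            new_func = ast.Name(id=self.new_name, ctx=ast.Load())
            node.func = ast.copy_location(new_func, node.func)
        return node


def transform(source, fixers):
    """Source -> syntax tree -> apply each fixer on the tree -> source."""
    tree = ast.parse(source)
    for fixer in fixers:
        tree = fixer.visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


source = "total = xrange(10)\nlabel = 'xrange'\n"
result = transform(source, [RenameCall("xrange", "range")])
print(result)
```

Only the call is renamed; the string literal `'xrange'` is a different kind of node and is left alone, which is the safety property a text search cannot give.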

  10.
    Automated Code Transformations
    Applications
    ● Applying style guides
    ● Porting code to Python 3
    ● Refactoring code
    ○ Removing a dependency
    ○ Moving to a new API
    At Pinterest
    Porting code to Python 3

  11. Automated Code Transformations
    ● Go
    ○ gofmt
    ○ go tool fix
    ● JavaScript
    ○ https://babeljs.io
    ● C/C++/C#
    ○ clang-format
    ● ...

  12.
    Automated Code Transformations
    How They Work

  13.
    Regular Expressions

  14.
    Regular Expressions
    ● Quick - for simple cases
    ● Unsafe
    ○ `__author__ = 'bob'`
    ○ Comments
    ○ docstrings
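
A small sketch of why the regex approach is unsafe, assuming the goal is to find module-level `__author__` assignments. The sample source and the pattern are made up for illustration; the point is that a regex cannot tell code from comments or docstrings.

```python
import re

# Hypothetical module: the same text appears in a docstring, a comment,
# and one real assignment.
SOURCE = '''"""Docs: this module used to set __author__ = 'bob'."""
# __author__ = 'bob' was here too
__author__ = 'bob'
'''

AUTHOR_RE = re.compile(r"__author__\s*=\s*'[^']*'")

regex_hits = AUTHOR_RE.findall(SOURCE)
stripped = AUTHOR_RE.sub("", SOURCE)

print(len(regex_hits))  # number of textual matches, not assignments
print(stripped)
```

The pattern matches three times, but only the last match is an actual assignment; a substitution based on it would also rewrite the docstring and the comment.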

  15.
    Abstract Syntax Trees
    Rendered with show_ast

  16.
    Abstract Syntax Trees
    Rendered with show_ast

  17.
    Abstract Syntax Trees
    Linting

  18.
    Abstract Syntax Trees
    Linting
    https://github.com/jparise/flake8-author/blob/master/flake8_author.py#L71
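
As a rough idea of what AST-based linting looks like, here is a minimal sketch using `ast.NodeVisitor`. It is not the flake8-author implementation linked above; the `AuthorChecker` class, the `lint` helper, and the message text are hypothetical.

```python
import ast


class AuthorChecker(ast.NodeVisitor):
    """Hypothetical lint check: report assignments to __author__."""

    def __init__(self):
        self.problems = []

    def visit_Assign(self, node):
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == "__author__":
                self.problems.append(
                    (node.lineno, node.col_offset, "module sets __author__")
                )
        self.generic_visit(node)


def lint(source):
    checker = AuthorChecker()
    checker.visit(ast.parse(source))
    return checker.problems


problems = lint(
    "import os\n"
    "\n"
    "__author__ = 'bob'\n"
    "# __author__ in a comment is ignored\n"
)
print(problems)
```

Because the check walks the tree, the comment on the last line never reaches the visitor, so it cannot produce a false positive.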

  19.
    Automated Code Transformation
    Syntax Trees
    ● Safer
    ● Multi-line transformations
    ● Can get complex quickly

  20.
    Syntax Trees
    Concrete vs Abstract
    “This is a very concrete parse tree; we need to keep every token and even the comments and whitespace between tokens.”
    https://github.com/python/cpython/blob/c57e6e2e52d5d8b4005753bed789d99ebe407fb6/Lib/lib2to3/pytree.py
    A concrete tree solves the problem of recreating the original code from the syntax tree.
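
The difference can be seen with the standard library alone. In this sketch, `ast` stands for the abstract side, and `tokenize` is used only as a stand-in to show the information a concrete tree has to keep (it produces a token stream, not a tree). `ast.unparse` requires Python 3.9+.

```python
import ast
import io
import tokenize

source = "x  =  1  # keep me\n"

# Abstract: the round trip normalizes spacing and drops the comment.
abstract = ast.unparse(ast.parse(source))

# Concrete information: the token stream still carries the comment.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
comments = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]

print(repr(abstract))
print(comments)
```

Regenerating source from the abstract tree would silently delete every comment in a codebase, which is why a transformation tool needs the concrete form.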

  21.
    Syntax Trees
    Python libraries
    lib2to3
    ● Concrete syntax tree
    ● Added in Python 2.6
    ● Bundled with fixers for porting code to Python 3
    ○ Example: `except X, T` to `except X as T`
    ● Preserves formatting information
    ○ node.prefix
    ○ node.get_suffix()
    ● Tracks whether a node was changed
    >>> node
    Leaf(22, '=')
    >>> node.get_suffix()
    ' '
    ast
    ● Abstract syntax tree
    ● Added in Python 2.6
    ● Good for static code analysis
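
The formatting-preservation behavior of lib2to3 can be tried out with a short sketch like the one below. Note the assumptions: lib2to3 was deprecated in Python 3.9 and removed in Python 3.13, so the import is guarded and the example does nothing where the module is unavailable. The `parse_concrete` helper is a hypothetical name.

```python
try:
    from lib2to3 import pygram, pytree
    from lib2to3.pgen2 import driver
except ImportError:  # lib2to3 is not available on this interpreter
    driver = None


def parse_concrete(source):
    """Parse source into a lib2to3 concrete syntax tree, or None if unavailable."""
    if driver is None:
        return None
    d = driver.Driver(pygram.python_grammar, convert=pytree.convert)
    return d.parse_string(source)


source = "x =   1  # keep me\n"
tree = parse_concrete(source)

if tree is not None:
    # Whitespace and comments live in each leaf's prefix, so converting the
    # tree back to a string reproduces the input.
    print(str(tree) == source)
    for leaf in tree.leaves():
        print(repr(leaf.prefix), repr(leaf.value))
```

Each leaf stores the whitespace and comments that precede it in `prefix`, which is what lets the unmodified parts of a file survive a transformation byte for byte.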

  22.
    Tooling (section 3 of 4)

  23.
    Using lib2to3
    ● Automated Python 2 to 3 code translation
    ● Concrete Syntax Tree
    ● Complex interface
    ● Powerful and safe
    ● Useful framework around fixers
    Reference: http://python3porting.com/fixers.html
    https://docs.python.org/2/library/2to3.html

  24.
    Using lib2to3
    https://github.com/python/cpython/blob/c57e6e2e52d5d8b4005753bed789d99ebe407fb6/Lib/lib2to3/fixes/fix_numliterals.py

  25.
    Using lib2to3
    https://github.com/python/cpython/blob/c57e6e2e52d5d8b4005753bed789d99ebe407fb6/Lib/lib2to3/fixes/fix_asserts.py

  26.
    Using lib2to3
    https://github.com/python/cpython/blob/c57e6e2e52d5d8b4005753bed789d99ebe407fb6/Lib/lib2to3/fixes/fix_throw.py

  27.
    Using lib2to3
    Input
    Output
    Fixer
    Runner
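
The input, fixer, runner, and output pieces can be wired together roughly as follows. This sketch runs one of the fixers bundled with lib2to3 (`fix_except`) through `RefactoringTool`; it is not the code shown on the slide. lib2to3 was removed in Python 3.13, so the import is guarded, and `run_fixers` is a hypothetical helper that returns `None` when the module is missing.

```python
try:
    from lib2to3.refactor import RefactoringTool
except ImportError:  # lib2to3 is not available on this interpreter
    RefactoringTool = None


def run_fixers(source, fixer_names):
    """Runner: parse the input, apply the named fixers, return the output."""
    if RefactoringTool is None:
        return None
    runner = RefactoringTool(fixer_names)
    tree = runner.refactor_string(source, "<example>")
    return str(tree)


# Input: Python 2 style exception handling.
py2_source = (
    "try:\n"
    "    risky()\n"
    "except ValueError, err:\n"
    "    handle(err)\n"
)

# Fixer: the bundled "except X, T" -> "except X as T" transformation.
output = run_fixers(py2_source, ["lib2to3.fixes.fix_except"])
print(output)
```

Fixers are referenced by module path, so a custom fixer would be passed the same way once its module is importable.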

  28.
    lib2to3-based tools
    ● python-future
    ● python-modernize
    ● Black
    ● Bowler

  29.
    python-future
    ● Compatibility layer to concurrently support Py2 and Py3
    ● Py3 idioms
    ● Uses lib2to3
    python-future.org

  30.
    python-modernize
    ● Converts Py2 code into a common subset of Py2 and Py3
    ● Uses six and lib2to3
    ● By contrast, futurize (from python-future) converts Py2 into (almost) standard Py3 code
    python-modernize.readthedocs.io

  31.
    Modernize vs futurize
    modernize futurize

  32.
    bowler
    ● Requires Py3.6+; can be run against Py2 code
    ● lib2to3-based
    ● Simple to execute: bowler run ...
    pybowler.io

  33.
    bowler
    pybowler.io

  34.
    bowler

  35.
    Bowler vs lib2to3

  36.
    black
    ● Requires Py3.6+; can be run against Py2 code
    ● lib2to3-based
    ● Validates each CST transformation by comparing the AST before and after
    black.readthedocs.io
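
The idea of validating a concrete-tree rewrite against the abstract tree can be sketched in a few lines. This is a simplified stand-in, not Black's actual check: it compares `ast.dump` output for the code before and after a formatting change. The `same_ast` helper and the sample strings are hypothetical.

```python
import ast


def same_ast(before, after):
    """Return True if both sources parse to structurally identical ASTs."""
    # ast.dump omits line and column attributes by default, so pure
    # formatting changes compare equal.
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


original = "x = {  'a':1,\n 'b':2 }\n"
reformatted = 'x = {"a": 1, "b": 2}\n'
broken = 'x = {"a": 1, "b": 3}\n'

print(same_ast(original, reformatted))
print(same_ast(original, broken))
```

A formatter that only touches whitespace and quoting passes the check, while a rewrite that changes a value fails it, which makes the comparison a cheap safety net for automated transformations.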

  37.
    Conclusion (section 4 of 4)

  38.
    Conclusion
    ● Syntax trees make code transformations quick and safe
    ● Saved countless hours of tedious labor
    ● Complex edge cases are still complex

  39.