Jordan Adler, Joe Gordon - Migrating Pinterest from Python2 to Python3

Over the course of nearly a year, we migrated Pinterest's primary systems from Python2 to Python3. With over 2 million lines of tightly coupled code, the Pinterest codebase contained nearly every edge case a Py2-to-Py3 migration might encounter.

We'll cover our approach, gotchas, and tools, as well as the incredible impact the migration has had on infra spend and code quality.

https://us.pycon.org/2019/schedule/presentation/147/

PyCon 2019

May 03, 2019

Transcript

  2. 2.

    © 2019 Pinterest. All rights reserved.

    Migrating Pinterest from Python 2 to Python 3
    Jordan Adler, Software Engineer; Joe Gordon, Site Reliability Engineer
  3. 4.

    Our Mission

    To bring everyone the inspiration to create a life they love.
  4. 8.

    Python at Pinterest

    • Started with
    • Python now used to serve over 250 million monthly active users
    • Python’s speed and flexibility enable quick experimentation
  5. 9.

    Problem Statement: Large-Scale Python Codebase Migration

    • Large: 2.6 million LOC (excluding dependencies)
    • Aged: 10-year-old codebase with over 1,000 authors over its lifetime
    • Dynamic: ~3,500 changes monthly from over 450 developers
    • Multi-stakeholder
    • Tightly coupled
  6. 10.

    Engineering Principles

    • Start Simple
      ◦ We strive to build the simplest viable solution.
      ◦ We learn, iterate and scale quickly.
      ◦ We value progress over perfection.
    • Build for Impact
      ◦ We look beyond our team’s goals for the highest impact opportunities.
      ◦ We are metrics-informed, not metrics-driven.
      ◦ We prioritize Pinners over technology in our decisions.
    • Own It!
      ◦ We are responsible for driving our work forward.
      ◦ When there is ambiguity, we take initiative.
      ◦ We take pride in improving our work, keeping it high-quality and performant.
  7. 11.

    Gradual Py3 Rollout

    1. Make Py3 available
    2. Upgrade requirements
    3. Futurize codebase
    4. Test under Py2 and Py3
    5. Migrate production environments to Py3
    6. Drop support for Py2
    7. Add Py3-only features
  8. 12.

    Upgrade Requirements

    • Start at the bottom of the dependency graph
    • caniusepython3 (version classifier troves)
    • Unmaintained dependencies
    • >8 years of changes in some cases
    • CI test of requirements.txt as a backstop
    • Environment markers: python_version < '3'
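A minimal sketch of "start at the bottom of the dependency graph", assuming the requirement graph is already available as a dict mapping each package to the packages it depends on (the package names below are hypothetical). `graphlib.TopologicalSorter` (standard library, Python 3.9+) yields each package only after all of its dependencies, which gives an upgrade order from leaves to roots.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical requirement graph: package -> packages it depends on.
DEPENDENCIES = {
    "webapp": {"orm", "http-client"},
    "orm": {"db-driver"},
    "http-client": {"six"},
    "db-driver": {"six"},
    "six": set(),
}


def upgrade_order(deps):
    """Return packages so that each one appears after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for package in upgrade_order(DEPENDENCIES):
        print(package)  # leaves first: "six" comes before "webapp"
```

Packages that cannot move yet can stay pinned for Python 2 with an environment marker such as `somepackage==1.0; python_version < '3'` in requirements.txt (`somepackage` is a placeholder).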
  9. 15.

    Fixers

    Stage 1:
    • lib2to3.fixes.fix_apply
    • lib2to3.fixes.fix_except
    • lib2to3.fixes.fix_exitfunc
    • lib2to3.fixes.fix_funcattrs
    • lib2to3.fixes.fix_has_key
    • lib2to3.fixes.fix_idioms
    • lib2to3.fixes.fix_intern
    • lib2to3.fixes.fix_isinstance
    • lib2to3.fixes.fix_methodattrs
    • lib2to3.fixes.fix_ne
    • lib2to3.fixes.fix_numliterals
    • lib2to3.fixes.fix_paren
    • lib2to3.fixes.fix_reduce
    • lib2to3.fixes.fix_renames
    • lib2to3.fixes.fix_repr
    • lib2to3.fixes.fix_standarderror
    • lib2to3.fixes.fix_sys_exc
    • lib2to3.fixes.fix_throw
    • lib2to3.fixes.fix_tuple_params
    • lib2to3.fixes.fix_types
    • lib2to3.fixes.fix_ws_comma
    • lib2to3.fixes.fix_xreadlines
    • libfuturize.fixes.fix_absolute_import
    • libfuturize.fixes.fix_next_call
    • libfuturize.fixes.fix_print_with_import
    • libfuturize.fixes.fix_raise

    Stage 2:
    • lib2to3.fixes.fix_basestring
    • lib2to3.fixes.fix_dict
    • lib2to3.fixes.fix_exec
    • lib2to3.fixes.fix_getcwdu
    • lib2to3.fixes.fix_input
    • lib2to3.fixes.fix_itertools
    • lib2to3.fixes.fix_itertools_imports
    • lib2to3.fixes.fix_filter
    • lib2to3.fixes.fix_long
    • lib2to3.fixes.fix_map
    • lib2to3.fixes.fix_nonzero
    • lib2to3.fixes.fix_operator
    • lib2to3.fixes.fix_raw_input
    • lib2to3.fixes.fix_zip
    • libfuturize.fixes.fix_cmp
    • libfuturize.fixes.fix_division
    • libfuturize.fixes.fix_execfile
    • libfuturize.fixes.fix_future_builtins
    • libfuturize.fixes.fix_future_standard_library
    • libfuturize.fixes.fix_future_standard_library_urllib
    • libfuturize.fixes.fix_metaclass
    • libpasteurize.fixes.fix_newstyle
    • libfuturize.fixes.fix_object
    • libfuturize.fixes.fix_unicode_keep_u
    • libfuturize.fixes.fix_xrange_with_import
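For illustration only, a small function written in the Py2/Py3-compatible style that several of the stage 1 fixers above produce. The function and its names are hypothetical; the comments show the Python 2 form each fixer rewrites.

```python
from __future__ import absolute_import, print_function


def first_value(mapping, key, items):
    """Hypothetical helper written in futurize stage 1 style."""
    # fix_has_key: mapping.has_key(key) -> key in mapping
    if key in mapping:
        return mapping[key]
    iterator = iter(items)
    try:
        # fix_next_call: iterator.next() -> next(iterator)
        return next(iterator)
    # fix_except: "except StopIteration, exc:" -> "except StopIteration as exc:"
    except StopIteration as exc:
        # fix_raise: raise ValueError, "msg" -> raise ValueError("msg")
        raise ValueError("no value for %r (%r)" % (key, exc))


# fix_print_with_import: print "x" -> print("x"), enabled by the import above
print(first_value({"a": 1}, "a", []))
```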
  10. 17.

    Futurize Codebase

    • Linters and CI to prevent regressions
    • Apply each fix individually
    • Run unit tests under both Python 2 and 3
    • Upgrading to Py3 is an exercise in discovering what code is tested
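One way to keep futurized code from regressing is a small custom lint check run in CI. A minimal sketch using the standard `ast` module; the set of flagged method names is an assumption, and in practice such a rule would more likely live in a Flake8 plugin.

```python
import ast

# Assumed list of Python 2-only dict methods to flag; extend as needed.
PY2_ONLY_METHODS = {"has_key", "iteritems", "iterkeys", "itervalues"}


def find_py2_method_calls(source, filename="<string>"):
    """Return sorted (line number, method name) pairs for Py2-only method calls."""
    tree = ast.parse(source, filename)
    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in PY2_ONLY_METHODS):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)  # ast.walk is breadth-first, so sort by line
```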
  11. 18.

    Dependency Graph

    • Introspect the internal dependency graph
      ◦ Test-driven migration
    • Monkey patch __import__()
    • Build a list of modules that run under Py3
    • Find test modules with the fewest dependencies on Py2-only modules
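A minimal sketch of recording the internal dependency graph by monkey patching `__import__()`, written for Python 3 (under Python 2 the module is `__builtin__` rather than `builtins`). The context manager name is hypothetical; it records `(importing module, imported name)` pairs and always restores the original import function.

```python
import builtins
import contextlib


@contextlib.contextmanager
def record_imports():
    """Temporarily wrap builtins.__import__ and record (importer, imported) names."""
    original_import = builtins.__import__
    edges = set()

    def recording_import(name, globals=None, locals=None, fromlist=(), level=0):
        importer = (globals or {}).get("__name__", "<unknown>")
        edges.add((importer, name))
        return original_import(name, globals, locals, fromlist, level)

    builtins.__import__ = recording_import
    try:
        yield edges
    finally:
        builtins.__import__ = original_import  # always restore the real import
```

Running each test module inside `record_imports()` yields edges that can be merged into a graph and intersected with a list of known Py2-only modules.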
  12. 19.

    Test under Py2 and Py3

    • Test runner and base class use hundreds of modules
    • Bootstrapping problem
    • Add a smaller test runner and base class
  13. 20.

    Fail Paths

    • Detectable by Flake8
      ◦ Syntax errors
      ◦ Scope
    • Detectable at import time
      ◦ Bad dependencies
      ◦ Code evaluated on import
    • Detected at runtime
      ◦ Via unit tests
      ◦ In production
      ◦ Everything else
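The three fail paths can be illustrated by running a snippet under Python 3 in stages. A minimal sketch; the function name and the `main` entry-point convention are assumptions for illustration, not part of the original tooling.

```python
import types


def failure_stage(source, entry_point="main"):
    """Report where a snippet first fails under Python 3.

    Returns "syntax", "import", "runtime", or None if nothing failed.
    """
    try:
        code = compile(source, "<snippet>", "exec")
    except SyntaxError:
        return "syntax"  # a linter such as Flake8 would catch this
    module = types.ModuleType("snippet")
    try:
        exec(code, module.__dict__)  # module-level code runs "on import"
    except Exception:
        return "import"
    try:
        getattr(module, entry_point)()
    except Exception:
        return "runtime"  # only tests or production traffic find this
    return None
```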
  14. 21.

    Potential Complications

    • Large and complex codebase
    • Test coverage
    • Complex business logic (int/float, str/bytes)
    • Limitations in code transformation
      ◦ old_div_safe
      ◦ fix_dict
      ◦ str/bytes
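The int/float complication comes from Python 2's `/`, which floor-divides two integers. The implementation of `old_div_safe` isn't shown in the slides; as a reference point, here is a sketch of the semantics that `past.utils.old_div` from the `future` package provides, which is what futurize's division fixer inserts.

```python
import numbers


def old_div(a, b):
    """Python 2 `/` semantics: floor division for two integers, true division otherwise."""
    if isinstance(a, numbers.Integral) and isinstance(b, numbers.Integral):
        return a // b
    return a / b
```

Note that floor division rounds toward negative infinity, so `old_div(-7, 2)` is `-4`, not `-3`; business logic that relied on this is easy to break when replacing `old_div` with plain `/`.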
  15. 32.

    Py2 vs. Py3 Code: Numbers

        x = None
        x > 1

    Py2: False
    Py3: TypeError: '>' not supported between instances of 'NoneType' and 'int'
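Python 2 ordered `None` before every number; Python 3 refuses the comparison. A minimal sketch of making the old behavior explicit (both helper names are hypothetical): a sort key that places `None` first, and a guarded comparison.

```python
def none_first(value):
    """Sort key that places None before every number, as Python 2 ordering did."""
    return (value is not None, value)


def greater_than(value, threshold):
    """Explicit replacement for a Py2-style `value > threshold` where value may be None."""
    return value is not None and value > threshold
```

The tuple key only compares the second elements when the first elements match, so `None` is never compared with an `int`.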
  16. 33.

    Py2 vs. Py3 Code: Numbers

        month = 01
        print(month)

    Py2: 1
    Py3: SyntaxError: invalid token
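Python 3 rejects integer literals with leading zeros; octal values need the `0o` prefix, which Python 2.6+ also accepts. A small sketch (the `parses` helper is hypothetical) that checks whether a literal compiles:

```python
month = 1          # plain decimal literal
file_mode = 0o644  # explicit octal, valid on Python 2.6+ and 3


def parses(source):
    """Return True if source compiles under the running interpreter."""
    try:
        compile(source, "<literal>", "exec")
    except SyntaxError:
        return False
    return True
```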
  17. 34.

    PEP 3141: A Type Hierarchy for Numbers

    • Numeric tower of ABCs: Number → Complex → Real → Rational → Integral
    • Numeric primitives: int, bool, float, complex
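The tower is exposed by the standard `numbers` module, which is also how type-sensitive helpers such as `old_div` decide between integer and true division. A short sketch listing which ABCs each primitive satisfies (the helper name is hypothetical):

```python
import numbers

TOWER = (numbers.Number, numbers.Complex, numbers.Real,
         numbers.Rational, numbers.Integral)


def numeric_abcs(value):
    """Names of the PEP 3141 ABCs that value is an instance of, outermost first."""
    return [abc.__name__ for abc in TOWER if isinstance(value, abc)]
```

`bool` is a subclass of `int`, so `True` satisfies every level, while `float` stops at `Real`.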
  18. 35.

    Py2 vs. Py3 Code: Bytes

        user_id_int = 2345434565434565434
        user_id_string = bytes(user_id_int)

    Py2: '2345434565434565434'
    Py3: Python(36431,0x7fff894fc380) malloc: ... MemoryError
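In Python 3, `bytes(n)` with an integer allocates `n` zero bytes rather than formatting the number, which is why the slide's call tries to allocate about 2.3 exabytes. A sketch of the explicit conversions that produce the Python 2 result:

```python
user_id_int = 2345434565434565434

# bytes(n) allocates n zero bytes; it does not format the number.
three_zero_bytes = bytes(3)  # b'\x00\x00\x00'

# Format the integer as text, then encode it explicitly (works on Py2 and Py3).
user_id_bytes = str(user_id_int).encode("ascii")

# %-formatting of bytes is also available on Python 3.5+ (PEP 461).
user_id_bytes_alt = b"%d" % user_id_int
```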
  19. 36.

    Py2 vs. Py3 Code: Bytes

        for i in b'123':
            print(i)

    Py2: 1 2 3
    Py3: 49 50 51
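Iterating over `bytes` yields integers in Python 3, whereas Python 2 `str` yielded one-character strings. Slicing always returns `bytes`, so a small generator (name hypothetical) restores the old shape on both versions:

```python
def iter_bytes(data):
    """Yield length-1 bytes objects, matching Python 2 iteration over str."""
    for index in range(len(data)):
        yield data[index:index + 1]
```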
  20. 37.

    Py2 vs. Py3 Code: Strings

        import string
        print(string.ascii_letters)
        print(string.letters)

    Py2:
        abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
        abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
    Py3:
        abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
        AttributeError: module 'string' has no attribute 'letters'
  21. 38.

    Py2 vs. Py3 Code: Bytes/Strings

        # -*- encoding: utf-8 -*-
        try:
            raise Exception(u' ')
        except Exception as e:
            print(e)

    Py2: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
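The Python 2 failure comes from converting the exception's unicode message to a byte string with the default ASCII codec. The original message character isn't recoverable from the slide, so the sketch below assumes a hypothetical non-ASCII message (U+1F4A9) and reproduces the same encode step explicitly.

```python
# Hypothetical non-ASCII exception message, written as an escape.
message = u"\U0001f4a9"


def ascii_encodable(text):
    """True if text survives the implicit ASCII encode Python 2 applies in str()."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True
```

In Python 3, `str(Exception(message))` simply returns the text; encoding for output should name a codec such as UTF-8 explicitly.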
  22. 39.

    Sequence Sanity

    Py2 (container: contents)
    • list, tuple: any objects
    • unicode: Unicode text
    • str: bytes

    Py3 (container: contents)
    • list, tuple: any objects
    • str: Unicode text
    • bytes: encoded bytes
  23. 40.

    Py2 vs. Py3 Code: Scopes

        [i for i in range(10)]
        print(i)

    Py2: 9
    Py3: NameError: name 'i' is not defined
  24. 41.

    Py2 vs. Py3 Code: Scopes

        class Foo(object):
            A = [1]
            B = [i for i in [1, 2] if i not in A]
            print(B)

    Py2: [2]
    Py3: NameError: name 'A' is not defined
  25. 42.

    “This is because list comprehensions are now implemented with their own function object like generator expressions have always been.”
    https://bugs.python.org/issue21161
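Because the comprehension body runs in its own function scope, it cannot see class-level names such as `A` (only the outermost iterable is evaluated in the class body). A sketch of two workarounds: an explicit loop, which runs directly in the class body, or a module-level name (`_A` is hypothetical).

```python
class Foo(object):
    A = [1]
    # A plain for loop runs in the class body itself, so A is visible here.
    B = []
    for _i in [1, 2]:
        if _i not in A:
            B.append(_i)
    del _i  # avoid leaving the loop variable behind as a class attribute


# Alternatively, keep the data at module level, where the comprehension can see it.
_A = [1]


class Bar(object):
    A = _A
    B = [i for i in [1, 2] if i not in _A]
```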
  26. 44.

    Py2 vs. Py3 Code: Scopes

        class Foo(object):
            def __init__(self):
                print(list(locals().keys()))
                super(Foo, self).__init__()

        Foo()

    Py2: ['self']
    Py3: ['self', '__class__']
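In Python 3, a method that mentions `super` gets an implicit `__class__` cell (used by zero-argument `super()`), and `locals()` reports it. Code that serializes or inspects `locals()` can filter dunder names; a minimal sketch:

```python
class Foo(object):
    def __init__(self):
        # Because this method mentions `super`, Python 3 gives it an implicit
        # __class__ cell, which locals() reports. Drop dunder names before use.
        snapshot = dict(locals())
        self.local_names = sorted(k for k in snapshot if not k.startswith("__"))
        super(Foo, self).__init__()
```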
  27. 45.

    Py2 vs. Py3 Code: Dictionaries

        def foo(**kwargs):
            print(kwargs)

        foo(b=1, a=2)

    Py2: {'a': 2, 'b': 1}
    Py3: {'b': 1, 'a': 2} or {'a': 2, 'b': 1}
    Py3.6+: {'b': 1, 'a': 2}
  28. 46.

    “Hash randomization is intended to provide protection against a denial-of-service … O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.”
    https://docs.python.org/3.3/using/cmdline.html#cmdoption-R
  29. 47.

    “The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon”
    https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-compactdict
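When output must be identical across Python versions and hash seeds (for example in tests that compare printed dicts), sorting keys removes the dependence on dict ordering. A minimal sketch; both function names are hypothetical.

```python
import json


def stable_repr(mapping):
    """Serialize a dict with sorted keys so output is independent of dict ordering."""
    return json.dumps(mapping, sort_keys=True)


def foo(**kwargs):
    # Sorting makes the result independent of keyword order and dict implementation.
    return sorted(kwargs.items())
```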
  30. 48.

    Py2 vs. Py3 Code: Dictionaries

        mydict = {1: 'z', 2: 'a'}
        print(mydict.keys() + [3])

    Py2: [1, 2, 3]
    Py3: TypeError: unsupported operand type(s) for +: 'dict_keys' and 'list'
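`dict.keys()` returns a view in Python 3, so list operations need an explicit `list()`, which is what `fix_dict` inserts. Views also support set operations directly, which is sometimes the clearer replacement:

```python
mydict = {1: 'z', 2: 'a'}

# Materialize the view before list arithmetic.
keys_plus = list(mydict) + [3]

# Key views behave like sets, so intersection works without a list.
shared = mydict.keys() & {2, 3}
```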
  31. 49.

    Py2 vs. Py3 Code: Exceptions

        import sys

        try:
            raise Exception()
        except Exception:
            print(sys.exc_info()[0])
        print(sys.exc_info()[0])

    Py2:
        <type 'exceptions.Exception'>
        <type 'exceptions.Exception'>
    Py3:
        <class 'Exception'>
        None
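Python 3 clears `sys.exc_info()` when the `except` block ends, so exception details must be read (or bound with `as`) inside the block. A minimal sketch with a hypothetical helper:

```python
import sys


def run_and_capture(func):
    """Call func and return the exception type it raised, or None."""
    try:
        func()
    except Exception:
        # Read exc_info here; after the block it is reset in Python 3.
        return sys.exc_info()[0]
    return None
```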
  32. 51.

    Py2 vs. Py3 Code: Exceptions

        from exceptions import KeyError

    Py3: ModuleNotFoundError: No module named 'exceptions'
  33. 53.

    Py2 vs. Py3 Code: Hashability

        class MyString(str):
            def __eq__(self, other):
                return super(MyString, self).__eq__(other)

        print(hash(MyString('pycon')))

    Py2: 5778351363512243486
    Py3: TypeError: unhashable type: 'MyString'
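In Python 3, a class that defines `__eq__` without `__hash__` gets `__hash__ = None`. Reusing the parent's hash keeps instances usable as dict keys and consistent with plain strings:

```python
class MyString(str):
    def __eq__(self, other):
        return super(MyString, self).__eq__(other)

    # Defining __eq__ sets __hash__ to None in Python 3 unless it is also
    # defined; reuse str's hash so equal strings still hash equally.
    __hash__ = str.__hash__
```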
  34. 54.

    Py2 vs. Py3 Code: Unicode

        # -*- encoding: utf-8 -*-
        s = u"a 💩"
        for char in s:
            print(hex(ord(char)))

    Py2 (narrow build): 0x61 0x20 0xd83d 0xdca9
    Py3: 0x61 0x20 0x1f4a9
    Py2 (wide build): 0x61 0x20 0x1f4a9
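A narrow Python 2 build stored text as UTF-16 code units, so characters outside the Basic Multilingual Plane appeared as surrogate pairs and counted twice in `len()`. A sketch (helper name hypothetical) that computes the narrow-build length on Python 3:

```python
def utf16_code_units(text):
    """Number of UTF-16 code units, i.e. what len() returned on a narrow Py2 build."""
    return len(text.encode("utf-16-le")) // 2


s = u"a \U0001f4a9"
```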
  35. 55.

    Py2 vs. Py3 Code: StringIO

        from cStringIO import StringIO
        StringIO(b'imagedata')
        StringIO(u'text')

    Py3: ModuleNotFoundError: No module named 'cStringIO'
  36. 56.

    Py2 vs. Py3 Code: StringIO

        from io import StringIO
        StringIO(b'sdf')
        StringIO(u'sdf')

    Py2 and Py3: TypeError: initial_value must be str or None, not bytes
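`cStringIO.StringIO` accepted both bytes and text; `io` splits that into `BytesIO` for binary data and `StringIO` for text. A minimal sketch with a hypothetical helper that picks the buffer type from the payload:

```python
import io


def make_buffer(data):
    """Hypothetical helper: choose the in-memory file type from the payload type."""
    if isinstance(data, bytes):
        return io.BytesIO(data)  # e.g. image data
    return io.StringIO(data)     # text
```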
  37. 57.

    Output Code: Numbers - JavaScript Edition!

        Welcome to Node.js v12.1.0.
        > 2**55+10
        36028797018963976
        > 2**55+9
        36028797018963976
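JavaScript numbers are IEEE 754 doubles, so integers above 2**53 lose precision, while Python ints are exact. Converting to `float` in Python shows what a JavaScript client would receive. The sketch below assumes, as a common practice rather than anything from the slides, that large IDs are sent to clients as strings.

```python
import json

user_id = 2**55 + 10  # exact in Python: 36028797018963978

# float() rounds to the nearest double, as a JavaScript client would.
as_double = int(float(user_id))

# Sending large IDs as strings keeps them exact for JavaScript consumers.
payload = json.dumps({"id": str(user_id)})
```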
  38. 58.

    Py2 vs. Py3 Code: Mock

        import mock
        with mock.patch('__builtin__.open'):
            with open('notafile'):
                pass

    Py3:
        Traceback (most recent call last):
        ...
        ModuleNotFoundError: No module named '__builtin__'
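In Python 3 the built-in namespace is the `builtins` module, and `mock` ships in the standard library as `unittest.mock` (3.3+). A sketch of the equivalent patch; `read_config` is a hypothetical function under test.

```python
from unittest import mock


def read_config(path):
    with open(path) as handle:
        return handle.read()


with mock.patch("builtins.open", mock.mock_open(read_data="key=value")):
    contents = read_config("notafile")
```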