Slide 1

Google Confidential and Proprietary

Maintaining Python at Google Scale
Thomas Wouters
Yhg1s on Freenode IRC

Agenda
● Google Scale
● Python at Google
● … in 2006
● Solving the problems
● Building Python
● Embedding Python
● Unsolved issues
● Questions

Slide 2

Google Scale
● Lots of machines
  ○ servers and workstations
● Lots of code
  ○ over 100 million lines of code
● Single shared codebase (mostly)
  ○ lots of reuse, lots of moving targets
● Lots of third-party software use
  ○ including open-source software
  ○ Python, GCC, LLVM, countless libraries
  ○ strict adherence to licenses
● Lots of exceptions

Build ideals
● Hermetic programs
  ○ completely self-contained binaries
  ○ run anywhere
  ○ get the same result everywhere
● Reproducible builds
  ○ build at the same revision, get bit-identical binaries
  ○ easier to cherry-pick changes
  ○ easier to detect unintended changes
● No shared libraries
  ○ static linking everywhere
  ○ no ABI concerns
● Build everything from head
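A minimal sketch of the "reproducible builds" ideal, assuming a ZIP-based archive (PAR is described as a "JAR for Python", and JAR files are ZIP files; the real PAR layout and Blaze's packaging steps are not described in these slides). Pinning entry order, timestamps, and permission bits makes two builds of the same inputs bit-identical, so comparing SHA-256 digests is enough to detect an unintended change. `build_archive` and `digest` are hypothetical names, not Blaze APIs.

```python
import hashlib
import io
import zipfile

# ZIP timestamps cannot predate 1980; pin every entry to this value.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def build_archive(files):
    """Deterministically build a ZIP archive from a {name: bytes} mapping.

    Sorted entry order, a fixed timestamp and fixed permission bits mean
    identical inputs produce bit-identical output on the same toolchain.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # rw-r--r-- for every entry
            zf.writestr(info, files[name])
    return buf.getvalue()


def digest(data):
    """Return the SHA-256 hex digest used to compare build outputs."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    inputs = {"main.py": b"print('hi')\n", "lib/util.py": b"X = 1\n"}
    first = build_archive(inputs)
    # Same content supplied in a different order: still bit-identical.
    second = build_archive(dict(reversed(list(inputs.items()))))
    print(digest(first) == digest(second))
    # A one-byte source change shows up as a different digest.
    changed = build_archive({**inputs, "lib/util.py": b"X = 2\n"})
    print(digest(changed) != digest(first))
```

Sorting the entries matters as much as the timestamps: without it, the archive would depend on the order in which a build tool happened to visit the files.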

Slide 3

Python at Google
● Third-largest language
  ○ one third as many lines of code as C++
  ○ still many millions of lines of code
  ○ heavy use of C++ libraries (via SWIG)
● One giant Python package
● Blaze (build tool)
  ○ builds everything (in the cloud)
  ○ creates “entrypoint” scripts
  ○ makes interactive use of Python harder
● PAR
  ○ “JAR for Python”
  ○ executable, distributable format
  ○ hermetic except for Python itself
● Lots of third-party use
  ○ sys.path trickery to avoid changing module names

Python at Google in 2007
● Python programs controlled by shebang line
  ○ mostly Python 2.2
  ○ 2.4 slowly growing
  ○ some 2.3, never officially supported
● Different versions of 2.4 on different machines
  ○ workstations, Red Hat-based, using 2.4.1, 32-bit
  ○ servers using 2.4.3 with a different libc version, 32-bit
  ○ new workstations, Ubuntu-based, using 2.4.5… but 64-bit
● Not hermetic at all
  ○ system-installed third-party dependencies available
● No way to use 64-bit Python
  ○ C++ was mostly 64-bit, except when building Python programs
● All extension modules built for Python 2.2
  ○ and then used in 2.4
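The "sys.path trickery" for third-party code can be sketched as follows. The slides do not say how Google's tree is laid out, so the `third_party/py/<name>` directory and both helper functions are hypothetical. The idea: prepend the directory that holds vendored packages to `sys.path`, so code keeps writing `import foo` instead of a repository-qualified module name.

```python
import importlib
import os
import sys
import tempfile


def make_vendored_package(repo_root, name):
    """Create a hypothetical vendored package at third_party/py/<name>."""
    pkg_dir = os.path.join(repo_root, "third_party", "py", name)
    os.makedirs(pkg_dir, exist_ok=True)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("ORIGIN = 'vendored copy'\n")
    return pkg_dir


def expose_third_party(repo_root):
    """Prepend <repo_root>/third_party/py to sys.path.

    Packages below it become importable under their upstream top-level
    names, so vendored code needs no import rewriting.
    """
    third_party = os.path.join(repo_root, "third_party", "py")
    if third_party not in sys.path:
        sys.path.insert(0, third_party)
        importlib.invalidate_caches()
    return third_party


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # stands in for the checked-out repository
    make_vendored_package(root, "vendored_example")
    expose_third_party(root)
    mod = importlib.import_module("vendored_example")
    print(mod.__name__, mod.ORIGIN)
```

Prepending rather than appending means the vendored copy shadows any system-installed package of the same name, which is one way to get closer to the hermetic ideal.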

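PAR itself is Google-internal, but the standard library's `zipapp` module (Python 3.5+) builds a similar kind of artifact: one ZIP file with a `__main__.py` and a shebang line, runnable as a program. A minimal sketch, assuming that analogy; `build_par_like` is a hypothetical helper, not a PAR tool. In this reading, the shebang is also where "hermetic except for Python" shows: the archive carries all the code, but the interpreter still comes from the host.

```python
import os
import subprocess
import sys
import tempfile
import zipapp


def build_par_like(source_dir, target, interpreter="/usr/bin/env python3"):
    """Bundle source_dir (which must contain __main__.py) into one archive.

    The program's Python code travels inside the archive; the interpreter
    named on the shebang line is still resolved on the host machine.
    """
    zipapp.create_archive(source_dir, target, interpreter=interpreter)
    return target


def demo():
    """Build a tiny archive and run it with the current interpreter."""
    src = tempfile.mkdtemp()
    with open(os.path.join(src, "__main__.py"), "w") as f:
        f.write("print('hello from the archive')\n")
    # Keep the output outside the source directory so it is not bundled.
    archive = os.path.join(tempfile.mkdtemp(), "hello.pyz")
    build_par_like(src, archive)
    out = subprocess.run([sys.executable, archive],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


if __name__ == "__main__":
    print(demo())
```

On POSIX systems `create_archive` marks the file executable when an interpreter is given, so `./hello.pyz` also works; it then runs under whatever `python3` the host provides.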
Slide 4

Unifying Environments
● Google Runtime Environment (GRTE)
  ○ runtime libraries (glibc, Python) independent from the system
  ○ version controlled by configuration in the source tree
● Python as part of GRTE
  ○ Python 2.4 only, but 32-bit and 64-bit
  ○ ignores shebang lines of .py files
  ○ build tool selects the right Python to use
    ■ for extension modules as well
  ○ still not quite hermetic
● Difficult to update once in use
  ○ with millions of lines of code, many bugs seem like features
  ○ long release cycle
● Flag day to flip Python version
  ○ along with glibc and GCC
  ○ preceded by lots of testing

GRTE and Python versions
● GRTEv1: Python 2.4 (2008)
  ○ to save space, symlinks identical files between 32-bit and 64-bit
  ○ can’t symlink .pyc files
● GRTEv2: Python 2.6 (2010)
  ○ first major upgrade of Python in many years
  ○ disables writing .pyc/.pyo files by default (-B option)
  ○ to save space, uses the same stdlib for both 32-bit and 64-bit
    ■ (causes confusing tracebacks)
● GRTEv3: Python 2.7 (2012)
  ○ turns on hash randomization by default
    ■ flushed out a surprising number of bugs
  ○ to save space, puts the stdlib in a ZIP file
    ■ flushed out a surprising number of bugs
  ○ builds with PGO: +20% performance
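CPython 2.7.3 and later support hash randomization but leave it off by default (enable it with `-R` or `PYTHONHASHSEED=random`), so a GRTE that turns it on exposes code that silently relied on hash order. A minimal sketch of that class of bug, assuming a modern Python 3 interpreter, where randomization is on by default and `PYTHONHASHSEED` pins the seed: the iteration order of a set of strings depends on the seed. `SNIPPET`, `run_with_seed`, and `distinct_orders` are illustrative names.

```python
import os
import subprocess
import sys

# Output depends on the iteration order of a set of strings, which in
# turn depends on the string hash seed.
SNIPPET = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def run_with_seed(seed):
    """Run SNIPPET in a fresh interpreter with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def distinct_orders(seeds):
    """Return the set of distinct outputs observed across the given seeds."""
    return {run_with_seed(seed) for seed in seeds}


if __name__ == "__main__":
    # Usually prints several different orderings of the same five words.
    for line in sorted(distinct_orders(range(1, 6))):
        print(line)
```

A fixed seed gives a repeatable order, which is why such bugs can hide for years; the durable fix in the affected code is to sort explicitly (e.g. `sorted(names)`) wherever order matters.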

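The slides do not say which bugs the zipped stdlib flushed out; one common class, assumed here for illustration, is code that treats `__file__` as a real directory on disk. When a package is loaded by `zipimport`, `os.path.dirname(__file__)` points inside the archive, so plain `open()` fails, while `pkgutil.get_data` asks the package's loader for the bytes and works either way. The package name `zipdemo_pkg` and its `defaults.cfg` file are hypothetical.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def build_zipped_package(zip_path):
    """Create a ZIP holding a hypothetical package plus a data file."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("zipdemo_pkg/__init__.py", "")
        zf.writestr("zipdemo_pkg/defaults.cfg", "answer = 42\n")


def read_data_fragile(module):
    # Assumes the module lives in a real directory; raises OSError when
    # module.__file__ points inside a ZIP archive.
    path = os.path.join(os.path.dirname(module.__file__), "defaults.cfg")
    with open(path) as f:
        return f.read()


def read_data_robust(package_name):
    # Delegates to the package's loader, which handles both plain
    # directories and zipimport.
    return pkgutil.get_data(package_name, "defaults.cfg").decode("utf-8")


if __name__ == "__main__":
    zip_path = os.path.join(tempfile.mkdtemp(), "stdlib_like.zip")
    build_zipped_package(zip_path)
    sys.path.insert(0, zip_path)
    import zipdemo_pkg
    print(read_data_robust("zipdemo_pkg"))
    try:
        read_data_fragile(zipdemo_pkg)
    except OSError as exc:
        print("fragile read failed:", type(exc).__name__)
```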
Slide 5

Building Python is Hard
● Two-step process
  ○ build python
  ○ run setup.py with the built python
● setup.py is messy code
  ○ searches the filesystem
    ■ third-party packages on the host affect build output
  ○ can’t statically link dependencies
    ■ need static linking for OpenSSL, readline, etc.
● Pre-distutils way: Modules/Setup
  ○ still used for built-in (statically linked) extension modules
  ○ can also produce shared extension modules
  ○ can control exact compiler arguments (up to a point)
● PGO: profile-guided optimization
  ○ make profile-opt
  ○ who knew?

Embedding Python
● Not as easy as it looks
● Even with static linking, Python still needs its standard library
  ○ including extension modules
● Standard library is searched for relative to the executable
  ○ run program from inside /usr, find system Python stdlib
  ○ run program elsewhere, find GRTE Python stdlib
  ○ same is true for ‘exec -a process_name python ...’
    ■ used by PAR
  ○ see Modules/getpath.c:search_for_prefix in CPython source
● Solution: embed all the things
  ○ statically link extension modules
  ○ embed Python stdlib in a ZIP file in the executable
  ○ modified zipimport loads stdlib from the executable
● Solved by volunteer from another team
  ○ 20% time
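The stdlib lookup that makes embedding tricky can be sketched as a simplified Python rendering of our reading of `search_for_prefix` in CPython 2.x's `Modules/getpath.c`: starting from the directory of the executable, walk toward the root and stop at the first directory containing `lib/python<version>/os.py`. So a binary somewhere under `/usr` can pick up `/usr/lib/python2.7`, while one elsewhere falls through to other locations. As far as we can tell, when `argv[0]` cannot be resolved to a file (as with `exec -a process_name`), the search starts from the current working directory instead. The sketch omits `PYTHONHOME`, the build-directory check, `.pyc` landmarks, and the compiled-in `PREFIX` fallback.

```python
import os
import tempfile

LANDMARK = "os.py"  # file whose presence marks a stdlib directory


def search_for_prefix(start_dir, version="2.7"):
    """Simplified sketch of CPython 2.x's stdlib prefix search.

    Walks from start_dir toward the filesystem root and returns the first
    directory that contains lib/python<version>/os.py, or None. An empty
    start_dir models an argv[0] that could not be resolved, in which case
    the search begins at the current working directory.
    """
    prefix = os.path.abspath(start_dir) if start_dir else os.getcwd()
    lib_python = os.path.join("lib", "python" + version)
    while True:
        if os.path.isfile(os.path.join(prefix, lib_python, LANDMARK)):
            return prefix
        parent = os.path.dirname(prefix)
        if parent == prefix:  # reached the root without finding the landmark
            return None
        prefix = parent


if __name__ == "__main__":
    # Hypothetical layout: <root>/lib/python2.7/os.py and <root>/bin/tools/
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "lib", "python2.7"))
    open(os.path.join(root, "lib", "python2.7", "os.py"), "w").close()
    exe_dir = os.path.join(root, "bin", "tools")
    os.makedirs(exe_dir)
    print(search_for_prefix(exe_dir) == root)
```

Because the answer depends on where the binary sits (or on the current directory), the same program can load two different standard libraries, which is why embedding the stdlib removes the search altogether.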

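Embedding the stdlib in the executable relies on a property of the ZIP format: readers locate an archive through the end-of-central-directory record at the end of the file, so a ZIP can be appended to another file. A minimal sketch, assuming a modern CPython 3 whose stock `zipimport` tolerates leading bytes (the same property that lets `zipapp` archives begin with a shebang line); the slides say Google used a modified zipimport, and its changes are not described here. The leading bytes stand in for a real native binary; `build_fake_executable`, `import_from_executable`, and the module names are hypothetical.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_fake_executable(path, modules):
    """Write a stand-in executable with a ZIP archive appended to it.

    modules maps archive member names (e.g. "foo.py") to source text.
    The leading bytes are a placeholder for a real native binary.
    """
    with open(path, "wb") as out:
        out.write(b"\x7fELF placeholder for the native executable\n")
        # ZipFile starts writing at the current position, after the prefix.
        with zipfile.ZipFile(out, "w") as zf:
            for name, source in modules.items():
                zf.writestr(name, source)
    return path


def import_from_executable(exe_path, module_name):
    """Put the executable itself on sys.path and import a module from it."""
    if exe_path not in sys.path:
        sys.path.insert(0, exe_path)
        importlib.invalidate_caches()
    return importlib.import_module(module_name)


if __name__ == "__main__":
    exe = os.path.join(tempfile.mkdtemp(), "myprogram")  # hypothetical name
    build_fake_executable(
        exe, {"embedded_hello.py": "GREETING = 'hello from the binary'\n"})
    mod = import_from_executable(exe, "embedded_hello")
    print(mod.GREETING)
    print(mod.__file__)  # a path inside the executable file itself
```

With the archive inside the binary, the stdlib travels with the program, so the directory search sketched for `getpath.c` no longer decides which standard library gets loaded.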
Slide 6

Unsolved issues
● Actual hermetic builds
  ○ include Python in the PAR file
    ■ like py2exe and PyInstaller
    ■ allows for gradual evolution of Python at Google
● Extension modules are problematic for PAR files
  ○ change glibc to accommodate Python
● Python 3
  ○ treated as a parallel Python version
  ○ talk to Greg (Gregory P. Smith)
● Windows / MacOS / non-Google machines
  ○ ignoring for now
● Finding unused/dead code
  ○ some is flushed out during GRTE upgrades
● Keeping track of all uses of our Python
  ○ interesting new uses sneak in

Questions
● Google Engineering Tools blog
  ○ all about the build system and the tools
  ○ http://google-engtools.blogspot.com/
● Questions (if there is time)
● Come talk to us
  ○ Thomas Wouters
  ○ Gregory P. Smith