Wouters <[email protected]> <[email protected]>
Yhg1s on Freenode IRC

Google Confidential and Proprietary

Agenda
• Google Scale
• Python at Google
• … in 2006
• Solving the problems
• Building Python
• Embedding Python
• Unsolved issues
• Questions
  ◦ servers and workstations
• Lots of code
  ◦ over 100 million lines of code
• Single shared codebase (mostly)
  ◦ lots of re-use, lots of moving targets
• Lots of third-party software use
  ◦ including open-source software
  ◦ Python, GCC, LLVM, countless libraries
  ◦ strict adherence to licenses
• Lots of exceptions

Build ideals
• Hermetic programs
  ◦ completely self-contained binaries
  ◦ run anywhere
  ◦ get the same result everywhere
• Reproducible builds
  ◦ build at the same revision, get bit-identical binaries
  ◦ easier to cherry-pick changes
  ◦ easier to detect unintended changes
• No shared libraries
  ◦ static linking everywhere
  ◦ no ABI concerns
• Build everything from head
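
The "bit-identical binaries" ideal can be illustrated with a toy build step. This is a minimal sketch, not Google's build tooling: the hypothetical `build_archive` helper packs sources into a ZIP in sorted order with a fixed timestamp, so the output depends only on the inputs, and a SHA-256 digest is enough to detect an unintended change.

```python
import hashlib
import io
import zipfile

# Fixed timestamp so the archive never records the wall-clock build time.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def build_archive(sources):
    """Pack {name: bytes} into a ZIP whose bytes depend only on the inputs."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for name in sorted(sources):  # fixed order, independent of dict order
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            archive.writestr(info, sources[name])
    return buffer.getvalue()


def digest(data):
    """Return the SHA-256 hex digest used to compare two build outputs."""
    return hashlib.sha256(data).hexdigest()


# Same inputs in a different order: the outputs should be bit-identical.
first = build_archive({"a.py": b"x = 1\n", "b.py": b"y = 2\n"})
second = build_archive({"b.py": b"y = 2\n", "a.py": b"x = 1\n"})

# One changed input: the digest should change.
changed = build_archive({"a.py": b"x = 1\n", "b.py": b"y = 3\n"})
```

Comparing `digest(first)` with `digest(second)` is the "build at the same revision" check; comparing against `digest(changed)` is the "detect unintended changes" check.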
  ◦ 1/3rd the lines of code of C++
  ◦ still many millions of lines of code
  ◦ lots of use of C++ libraries (SWIG)
• One giant Python package
• Blaze (build tool)
  ◦ builds everything (in the cloud)
  ◦ creates “entrypoint” scripts
  ◦ makes interactive use of Python harder
• PAR
  ◦ “JAR for Python”
  ◦ executable, distributable format
  ◦ hermetic except for Python
• Lots of third-party use
  ◦ sys.path trickery to not change module names

Python at Google in 2007
• Python programs controlled by shebang line
  ◦ mostly Python 2.2
  ◦ 2.4 slowly growing
  ◦ some 2.3, never officially supported
• Different versions of 2.4 on different machines
  ◦ workstations, RedHat-based, using 2.4.1, 32-bit
  ◦ servers using 2.4.3 with a different libc version, 32-bit
  ◦ new workstations, Ubuntu-based, using 2.4.5… but 64-bit
• Not hermetic at all
  ◦ system-installed third-party dependencies available
• No way to use 64-bit Python
  ◦ C++ was mostly 64-bit, except when building Python programs
• All extension modules built for Python 2.2
  ◦ and then used in 2.4
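
The slides do not say how the "sys.path trickery" for third-party code works. One plausible reading, sketched below with entirely hypothetical names (`bigpkg`, `vendored_lib`, `upstream_mod`), is that the directory holding a vendored library is added to `sys.path`, so the library keeps its upstream top-level import name even though it lives deep inside one giant package.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: bigpkg/third_party/vendored_lib/upstream_mod.py
root = tempfile.mkdtemp()
vendored_dir = os.path.join(root, "bigpkg", "third_party", "vendored_lib")
os.makedirs(vendored_dir)

with open(os.path.join(vendored_dir, "upstream_mod.py"), "w") as handle:
    handle.write("NAME = 'upstream_mod'\n")

# The "trickery": expose the vendored directory as an import root, so the
# module is importable under its original name rather than as
# bigpkg.third_party.vendored_lib.upstream_mod.
sys.path.append(vendored_dir)
importlib.invalidate_caches()

upstream_mod = importlib.import_module("upstream_mod")
```

The cost of this approach is that import behaviour now depends on path order, which is part of why a non-hermetic environment with system-installed packages is risky.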
  ◦ runtime libraries (glibc, Python) independent from the system
  ◦ version controlled by configuration in source tree
• Python as part of GRTE
  ◦ Python 2.4 only, but 32-bit and 64-bit
  ◦ ignore shebang lines of .py files
  ◦ build tool selects right Python to use
    ▪ for extension modules as well
  ◦ still not quite hermetic
• Difficult to update once in use
  ◦ with millions of lines of code, many bugs seem like features
  ◦ long release cycle
• Flag-day to flip Python version
  ◦ along with glibc and gcc
  ◦ preceded by lots of testing

GRTE and Python versions
• GRTEv1: Python 2.4 (2008)
  ◦ to save space, symlinks identical files between 32-bit and 64-bit
  ◦ can’t symlink .pyc files
• GRTEv2: Python 2.6 (2010)
  ◦ first major upgrade of Python in many years
  ◦ disables writing .pyc/.pyo files by default (-B option)
  ◦ to save space, uses the same stdlib for both 32-bit and 64-bit
    ▪ (causes confusing tracebacks)
• GRTEv3: Python 2.7 (2012)
  ◦ turns on hash randomization by default
    ▪ flushed out a surprising number of bugs
  ◦ to save space, puts the stdlib in a ZIP file
    ▪ flushed out a surprising number of bugs
  ◦ builds with PGO: +20% performance
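
The kind of bug that hash randomization flushes out is code that silently depends on set or dict iteration order. The sketch below is illustrative only and is written for a current Python 3, where randomization is already on by default; the function names are hypothetical. `hash_in_child` uses the real `PYTHONHASHSEED` environment variable to show that string hashes vary between interpreter runs.

```python
import os
import subprocess
import sys


def hash_in_child(seed):
    """Return hash('spam') as computed by a fresh interpreter with this seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    output = subprocess.check_output(
        [sys.executable, "-c", "print(hash('spam'))"], env=env
    )
    return int(output)


def fragile_report(names):
    """Order follows set iteration, which depends on the string hashes."""
    return ",".join(set(names))


def stable_report(names):
    """Sorting first makes the output independent of hash values."""
    return ",".join(sorted(set(names)))
```

Calling `hash_in_child("1")` and `hash_in_child("2")` would normally return different values, which is why output from `fragile_report` can change from run to run while `stable_report` does not.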
step process
  ◦ build python
  ◦ run setup.py with built python
• setup.py is messy code
  ◦ searches filesystem
    ▪ third-party packages on host affect build output
  ◦ can’t static link dependencies
    ▪ need static linking for openssl, readline, etc.
• Pre-distutils way: Modules/Setup
  ◦ still used for built-in (static linked) extension modules
  ◦ can also produce shared extension modules
  ◦ can control exact compiler arguments (up to a point)
• PGO: profile-guided optimization
  ◦ make profile-opt
  ◦ who knew?

Embedding Python
• Not as easy as it looks
• Even with static linking, Python still needs standard library
  ◦ including extension modules
• Standard library is searched for relative to executable
  ◦ run program from inside /usr, find system Python stdlib
  ◦ run program elsewhere, find GRTE Python stdlib
  ◦ same is true for ‘exec -a process_name python ...’
    ▪ used by PAR
  ◦ see Modules/getpath.c:search_for_prefix in CPython source
• Solution: embed all the things
  ◦ static link extension modules
  ◦ embed Python stdlib in ZIP file in executable
  ◦ modified zipimport loads stdlib from executable
• Solved by volunteer from another team
  ◦ 20% time
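
The "ZIP file in executable" idea relies on a property of the ZIP format: the directory sits at the end of the file, so an archive can be appended to other data and still be read. The sketch below uses stock `zipimport` (not the modified one from the talk) and a fake binary made of placeholder bytes; the file and module names are hypothetical.

```python
import importlib
import io
import os
import sys
import tempfile
import zipfile


def build_fake_binary(path, modules):
    """Write stand-in 'executable' bytes followed by a ZIP of Python modules."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        for name, source in modules.items():
            zf.writestr(name + ".py", source)
    with open(path, "wb") as handle:
        # Placeholder for real machine code; any leading bytes would do.
        handle.write(b"\x7fELF-not-a-real-binary\n" * 4)
        handle.write(archive.getvalue())


workdir = tempfile.mkdtemp()
binary = os.path.join(workdir, "fake_binary")
build_fake_binary(
    binary, {"embedded_greeting": "MESSAGE = 'loaded from the binary'\n"}
)

# Putting the file itself on sys.path lets zipimport find the appended archive.
sys.path.insert(0, binary)
importlib.invalidate_caches()
embedded_greeting = importlib.import_module("embedded_greeting")
```

Here the import path is set by hand. The harder part described in the slides is making the interpreter find its own standard library this way at startup, before any user code runs.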
  ◦ include Python in PAR file
    ▪ like py2exe and pyinstaller
    ▪ allows for gradual evolution of Python in Google
• Extension modules are problematic for PAR files
  ◦ change glibc to accommodate Python
• Python 3
  ◦ treated as parallel Python version
  ◦ talk to Greg
• Windows / MacOS / non-Google machines
  ◦ ignoring for now
• Finding unused/dead code
  ◦ some is flushed out during GRTE upgrades
• Keeping track of all uses of our Python
  ◦ interesting new uses sneak in

Questions
• Google Engineering Tools blog
  ◦ all about the build system and the tools
  ◦ http://google-engtools.blogspot.com/
• Questions (if there is time)
• Come talk to us
  ◦ Thomas Wouters <[email protected]>
  ◦ Gregory P. Smith <[email protected]>