Slide 1

Slide 1 text

Mic Test My Name is Joongi Kim. The title of my talk is Modernizing development workflow for a 7-year old 74K LoC Python project using Pantsbuild. My presentation will be in English. The presentation materials are in English. I will publish the presentation materials. I agree to having my picture taken during my presentation. I will comply with the PyCon JP Code of Conduct.

Slide 2

Slide 2 text

Modernizing development workflow for a 7-year old 74K LoC Python project using Pantsbuild Joongi Kim (김준기, 金駿起) Lablup Inc. ("래블업" aka "ラボをアップグレード")

Slide 3

Slide 3 text

About Me • Working as CTO / co-founder of Lablup Inc. since 2015 Has been developing Backend.AI for 7+ years An open-source enthusiast Domain: backend engineering, systems programming, distributed & accelerated computing Interests: writing code that is still manageable 3 years later • Recent status (updating code I wrote 5 years ago...) • Spoke at PyCon KR for 8 consecutive years... Mostly about asyncio-related topics 🤯 😵‍💫

Slide 4

Slide 4 text

Table of Contents • Mono-repo vs. Multi-repo: That is the problem • Problems with the prior art in my team • Short introduction to Pantsbuild • Mono-repo migration process using Pantsbuild • Customization to adapt to our cases • Experience after migration • Recap

Slide 5

Slide 5 text

Mono-repo vs. Multi-repo: That is the problem! Background

Slide 6

Slide 6 text

Dependency mgmt is hard • "Software Engineering at Google" Long-term SW engineering is all about keeping the pace of upgradability. The key to upgradability is dependency management. Backend.AI is passing this point (7 years old, ~74K LoC) ref) 😵

Slide 7

Slide 7 text

Mono-repo to the rescue? • Well-known dependency managers yarn, npm, cargo, poetry, pipenv, go get, ... Diamond dependency conflicts ✓ Sandboxing all n-th order dependencies vs. full resolution (NP-complete[1]) • Two axes of dependency management Internal: Cross-dependency between components written by us External: Dependency on components written by others Most existing build systems take care of external dependencies only! • Managing internal dependencies Multi-repo: multiple per-component repositories Mono-repo: single merged repository of all components [1]

Slide 8

Slide 8 text

Mono-repo to the rescue? • There is no single right answer! How do your component teams collaborate? ✓ Mono-repo may be better if a single team develops multiple components. How synchronous are the release cycles of related components? How closely coupled are the components? (e.g., direct type refs, unversioned APIs)
Mono-Repo: ✓ Less pain for refactoring across components ✓ Sharing the same development culture & process ✓ Easier onboarding with a unified view of systems ✗ Scalability issues if the repo becomes very large ✗ Difficult to fork individual components ✗ Difficult to set up CI/CD workflows
Multi-Repo: ✓ Independent release cycle & versioning ✓ Per-repository team access control ✓ Taking advantage of existing build systems ✗ Team fragmentation by repo boundaries ✗ Sync overheads of internal dependencies ✗ Difficult to have a holistic view
ref)

Slide 9

Slide 9 text

Mono-repo + Modern build system • Mono-repo simplifies internal dependency mgmt. Easy to find duplications and refactor all API usage occurrences at once e.g., "One version rule" • Mono-repo at large scale needs a "modern" build system. Support both: a single unified build vs. per-component builds Reproducible builds Minimize human errors & mistakes ✓ Declarative dependency configurations ✓ Automatic dependency resolution & inference Speed up builds and CI/CD pipelines ✓ Detecting the affected modules for a changeset in CI workflows ✓ Parallelized & distributed execution with artifact caching
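To make "detecting the affected modules for a changeset" concrete, here is a minimal sketch of the idea: walk the internal dependency graph in reverse, starting from the changed modules. The component names come from this talk, but the edges in `DEPENDENCIES` and the function name `affected_modules` are assumptions for illustration, not Backend.AI's real graph or a Pants API.

```python
from collections import defaultdict, deque

# Hypothetical internal dependency graph: module -> modules it depends on.
DEPENDENCIES = {
    "common": [],
    "client": ["common"],
    "agent": ["common"],
    "manager": ["common", "client"],
    "webserver": ["client"],
}


def affected_modules(changed, dependencies):
    """Return the changed modules plus everything that transitively depends on them."""
    # Invert the edges so we can walk from a module to its dependents.
    dependents = defaultdict(set)
    for module, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(module)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected
```

With this graph, a change in `client` would select `client`, `manager`, and `webserver` for testing, while `agent` is skipped. A real build system derives the graph from the source tree instead of a hand-written table.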

Slide 10

Slide 10 text

Problems with the prior art in my team Why did we begin to consider mono-repo?

Slide 11

Slide 11 text

Previous practice • Per-package GitHub repositories One Python wheel → one repository Release each package independently using the standard setuptools/pip toolchain • Setting up Backend.AI[1] development env. The minimum set of components for the server-side (6 repositories) ✓ manager, agent, common, client-py, webserver, storage-proxy ✓ (There is yet another long story for the frontend...) A single-line installation script ( ✓ Installs database containers using docker-compose ("halfstack") ✓ Clones multiple repositories, creates venvs, runs "editable-install" in each venv, and populates database schema with fixtures [1]

Slide 12

Slide 12 text

Problems with prior art (1/2) • Difficult to write and review multiple PRs for a single issue A single issue often consists of multiple PRs to multiple repositories We often forget to switch to the same git branch across the multi-repo clones ✓ There is an implicit rule to match the PR branch names in different repos, and new contributors often forget this, breaking CI/CD. Difficult to keep the mental context when switching repositories ✓ e.g., Forgetting to add the corresponding client function for a new server API ✓ Often reviewers forget things as well. 🤯 Not very compatible with GitHub ✓ The issue resolution from multiple linked PRs: "OR" instead of "AND" ✓ GitHub Codespaces works for a single repository only. ✓ GitHub Projects v2 is still missing cross-repo label and milestone configurations.

Slide 13

Slide 13 text

Problems with prior art (2/2) • Hesitation to refactor Reducing the maintenance points > Splitting components by clear purpose & semantics No way to specify explicit internal dependencies • Time-consuming release process Painful to repeat the same release workflow for 6 repositories... (error-prone) ✓ Need to repeat CI/CD config updates 6 times... (reduced motivation to improve) Waiting for the packages we depend on to get released when there are internal dependencies • Difficult to keep track of compatible combinations of component versions Often a minor patch release breaks compatibility because different components are closely coupled. Causing headaches for on-site engineering staff when upgrading and applying custom patches for individual customer sites

Slide 14

Slide 14 text

Solution • The problem Reduced motivation to refactor across components & improve dev process (cynicism) Too high context switching overheads for managing issues & PRs • Let's migrate to (semi-)mono-repo! Backend.AI is not yet as big as Google's repository, so we don't need to worry about the extreme scalability issues. Target repos: open-source core components that share the same release cycle and have internal cross-dependencies • Challenges How to automate internal dependency management? (e.g., parsing/generating setup.cfg?) How to run tests against only changed modules on commits? (e.g., git sparse checkout?) We need a modern build system tailored for mono-repo!

Slide 15

Slide 15 text

Introduction to Pantsbuild Is it our savior?

Slide 16

Slide 16 text

What is Pants? • Main features Automatic dependency inference by static analysis First-class support for the Python ecosystem Graph-based parallel & async task execution Extensible with a plugin subsystem • Overview Pants 2 is a fast, scalable, user-friendly build system for codebases of all sizes. It's currently focused on Python, Go, Java, Scala, Shell, and Docker, with support for other languages and frameworks coming soon. ref)

Slide 17

Slide 17 text

Pants: Demo

Slide 18

Slide 18 text

Pants: Architecture Rust-based DAG scheduler & async-parallel execution engine (monadic, pure, cancellable/interruptible, concurrent, cached) Python-based BUILD rule engine & intrinsic multi-language plugins Filesystem and OS pants.toml + BUILD configs in Git repositories (Target build may use an arbitrary Python version) Included inside PyPI's pantsbuild-pants wheel package Pre-installed Python Runtime (one of 3.7, 3.8, 3.9) PEX venv generator & dependency resolver

Slide 19

Slide 19 text

Pants: Basic Usage • Requirements to start using Pants[1] ./pants script (download from pants.toml & **/BUILD files ✓ What to build (including source & resource files) and what they depend on • ./pants [global-options] {goal} [goal-options] [targets] What it does: ✓ Self-bootstraps Pants itself at ~/.cache/pants/ & ./.pants.d ✓ Generates a task DAG from BUILD files ✓ Asynchronously runs the DAG with parallelization when possible Refer to our team's cheatsheet to see how it fits into daily development workflows[2] [1] [2]
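As an illustration of "generate a task DAG, then run it asynchronously with parallelization when possible", the sketch below schedules a toy graph with Python's standard `graphlib` and `asyncio`. It only shows the scheduling idea; it is not how Pants' Rust engine is implemented, and the task names in `TASKS` are made up.

```python
import asyncio
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical task graph: task -> set of tasks that must finish first.
TASKS = {
    "resolve-deps": set(),
    "lint": {"resolve-deps"},
    "typecheck": {"resolve-deps"},
    "test": {"lint", "typecheck"},
}


async def run_task(name, log):
    await asyncio.sleep(0)  # stand-in for real work
    log.append(name)
    return name


async def run_dag(graph):
    """Run every task once, starting each as soon as its prerequisites are done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    log = []
    pending = set()
    while sorter.is_active():
        # Start all tasks whose prerequisites are complete; they run concurrently.
        for name in sorter.get_ready():
            pending.add(asyncio.create_task(run_task(name, log)))
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for finished in done:
            sorter.done(finished.result())
    return log
```

Calling `asyncio.run(run_dag(TASKS))` returns the completion order: `resolve-deps` comes first, `lint` and `typecheck` may finish in either order, and `test` comes last.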

Slide 20

Slide 20 text

The migration process How did we do it?

Slide 21

Slide 21 text

Restructuring GitHub repos Backend.AI Core Backend.AI Frontend Unify!

Slide 22

Slide 22 text

Restructuring dev-setup

Slide 23

Slide 23 text

Mono-repo structure
VERSION: Unified package version
**/BUILD: Pants build config for each directory (like Bazel)
configs/{manager,agent,common,...}: "local-config" templates for Backend.AI Core
docs/: Backend.AI developer documentation
plugins/: Backend.AI plugin development workspace
scripts/: Utility shell scripts for developers
src/ai/backend/{manager,agent,common,...}: Backend.AI Core source code
tests/{manager,agent,common,...}: Backend.AI Core test code
pants.toml: Pants main config
pyproject.toml, .flake8: Toolchain configs (flake8, mypy, pytest)
requirements.txt: Unified requirements for all components
python.lock: Unified requirements dependency lock
tools/*.lock: Toolchain requirements dependency lock
tools/pants-plugin/setupgen/: Our Pants plugin for custom generation
./pants, ./py, ./ : Main entry scripts for daily use
dist/: venvs & build artifacts generated by Pants

Slide 24

Slide 24 text

Migrating multi-repo
Before (per-component repository): setup.cfg, requirements/*.txt; changes/, configs/; scripts/; src/ai/backend/{component}/; tests/
After (mono-repo): src/ai/backend/{component}/BUILD (unified); VERSION, pants.toml, BUILD, requirements.txt (unified); src/ai/backend/{component}/ (moved); changes/, configs/{component}/ (moved); scripts/{component}/ (moved); src/ai/backend/{component}/ (moved); tests/{component}/ (moved)
Version definition, before:
    __version__ = '22.03.1'
Version definition, after:
    from pathlib import Path
    __version__ = (Path(__file__).parent / 'VERSION').read_text().strip()
Linking the unified VERSION file:
    $ cd src/ai/backend/{component}
    $ ln -s ../../../../VERSION src/ai/backend/{component}/
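The version single-sourcing above can be tried in isolation. The sketch below builds a throwaway directory tree, links a component's VERSION to the root VERSION with the same relative path as in the slide, and reads it back. The helper names `read_version` and `demo` and the choice of `manager` as the component are assumptions for illustration; it also assumes a filesystem where creating symlinks is permitted.

```python
import tempfile
from pathlib import Path


def read_version(package_dir):
    """Read the version string from a VERSION file (or symlink) in package_dir."""
    return (Path(package_dir) / "VERSION").read_text().strip()


def demo():
    # Build a throwaway tree that mimics the layout: a root VERSION file and a
    # component directory that refers to it through a relative symlink.
    with tempfile.TemporaryDirectory() as root:
        root_path = Path(root)
        (root_path / "VERSION").write_text("22.03.1\n")
        component = root_path / "src" / "ai" / "backend" / "manager"
        component.mkdir(parents=True)
        (component / "VERSION").symlink_to("../../../../VERSION")
        return read_version(component)
```

Because every component reads the same file through its symlink, bumping the root VERSION is enough to change the version of every package at once.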

Slide 25

Slide 25 text

The whole history • Started: Apr 27 / Merged: May 31 (168 commits) / many follow-up PRs afterwards ✓ The initial plan was two weeks, but as always... 😅 More than 60 Q&A exchanges in the Pantsbuild community Slack Pants: 5 bug reports (all fixed now), 2 feature requests, 2 doc patches Pex: 3 bug reports triggering new releases ✓ Afternoon KST: bug report / Dinner KST: talk with developers / Night-morning KST: developers fix the issue and release / Next morning KST: apply the release • The size of mono-repo Backend.AI Core LoC: 74K+ LoC including all external dependencies: 1.5M+

Slide 26

Slide 26 text

Customizing Pants There is no silver bullet, as always...

Slide 27

Slide 27 text

Generator • Pants plugin: tools/pants-plugins/setupgen • What's added Single-source the version number from the root's VERSION file Change long_description_content_type depending on the extension of README (.md, .rst) Change the trove classifier depending on the version number suffix (a, b, rc) Add the license type argument so that each wheel package may have a different license
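A rough sketch of the two metadata rules listed above, written as plain helper functions instead of a Pants plugin. The content-type strings are the standard values for setuptools' `long_description_content_type`; the suffix-to-classifier mapping and the function names are assumptions, since the slide does not say which classifier each suffix selects.

```python
import re
from pathlib import Path

# Standard content types for the long_description_content_type metadata field.
CONTENT_TYPES = {".md": "text/markdown", ".rst": "text/x-rst"}

# Assumed mapping from pre-release suffix to trove classifier.
STATUS_BY_SUFFIX = {
    "a": "Development Status :: 3 - Alpha",
    "b": "Development Status :: 4 - Beta",
    "rc": "Development Status :: 4 - Beta",
}
STABLE = "Development Status :: 5 - Production/Stable"


def long_description_content_type(readme_name):
    """Pick the content type from the README file extension."""
    suffix = Path(readme_name).suffix.lower()
    return CONTENT_TYPES.get(suffix, "text/plain")


def development_status(version):
    """Pick a trove classifier from a trailing a/b/rc pre-release suffix."""
    match = re.search(r"(a|b|rc)\d*$", version)
    return STATUS_BY_SUFFIX[match.group(1)] if match else STABLE
```

For example, `development_status("22.03.0rc1")` selects the beta classifier under this assumed mapping, while a plain `"22.03.1"` falls through to the stable one.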

Slide 28

Slide 28 text

towncrier Tool • Pants plugin: tools/pants-plugins/towncrier • What's added Like black, isort, and flake8, defined a new "PythonTool" for towncrier Allows using independent venvs and dependency lockfiles for towncrier

Slide 29

Slide 29 text

Platform-specific Deps • Pants plugin: tools/pants-plugins/platform_resources • What's added Use different resource files (pre-built executables) for Backend.AI Agent depending on the target platform argument Needed to rewrite the code upon Pants minor version updates because the dependency management implementation in Pants changes frequently (expected to stabilize soon)

Slide 30

Slide 30 text

Dynamic Module Loading • Wrote a module loader that searches & parses BUILD files using AST Backend.AI largely depends on entry_points of the package metadata for plugin and replaceable module discovery. • Use ./pants export :: and ./py wrapper script instead of ./pants run ... In PEX envs, there are neither BUILD files nor package metadata.
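A minimal sketch of what "parsing BUILD files using AST" can look like: BUILD files use Python syntax, so the standard `ast` module can read the `entry_points` field of a `python_distribution` target without executing anything. The sample BUILD text, the group name `example_plugin_group`, and the function `scan_entry_points` are hypothetical and are not the actual Backend.AI loader; the sample target is trimmed to the fields this example needs.

```python
import ast

# Hypothetical BUILD file content; target, group, and module names are
# made up for illustration.
SAMPLE_BUILD = '''
python_sources(name="lib")

python_distribution(
    name="dist",
    dependencies=[":lib"],
    entry_points={
        "example_plugin_group": {
            "mgr": "example.manager.cli:main",
        },
    },
)
'''


def scan_entry_points(build_source, group):
    """Collect {name: "module:attr"} for one entry point group in a BUILD file."""
    found = {}
    tree = ast.parse(build_source)
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if node.func.id != "python_distribution":
            continue
        for keyword in node.keywords:
            if keyword.arg == "entry_points":
                # literal_eval accepts only plain literals, so no BUILD code runs.
                groups = ast.literal_eval(keyword.value)
                found.update(groups.get(group, {}))
    return found
```

A loader built this way can discover plugins straight from the source tree, which helps in a development checkout where the packages are not installed and thus have no package metadata.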

Slide 31

Slide 31 text

Experience after migration Importance of open source ecosystem

Slide 32

Slide 32 text

Satisfying points • Decreased the time needed for making a new release (hours → ~10 min.) Automated release-related workflows e.g., Generate GitHub's release note by extracting the latest section of • Reduced code review burdens & context switching overheads One issue completes with one PR! (single file tree & single diff) Review all things together, including documentation Now we can utilize GitHub better: Projects v2 & Codespaces • Minimized the impact on CI/CD execution times by taking diffs ./pants test --changed-since=main
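The release-note automation can be pictured as follows: take the changelog text and keep only the newest section. This is a sketch under assumptions. The slide does not show the changelog format, so the `## ` version headings, the sample text, and the function name `latest_section` are all hypothetical.

```python
# Hypothetical changelog text; the heading format and entries are assumptions.
SAMPLE_CHANGELOG = """\
Changes
=======

## 22.09.1 (2022-10-01)
### Fixes
* Fix a crash on startup

## 22.09.0 (2022-09-15)
### Features
* Add a new API
"""


def latest_section(changelog):
    """Return the body under the first '## ' heading, up to the next one."""
    lines = changelog.splitlines()
    starts = [i for i, line in enumerate(lines) if line.startswith("## ")]
    if not starts:
        return ""
    begin = starts[0] + 1
    end = starts[1] if len(starts) > 1 else len(lines)
    return "\n".join(lines[begin:end]).strip()
```

The returned text can then be passed as the release body by whatever release tooling the CI workflow uses.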

Slide 33

Slide 33 text

Adaptation required (1/3) • Requiring multiple installs of Python versions Pants requires Python 3.9 on Apple Silicon Macs / Backend.AI requires Python 3.10 macOS Monterey (via Xcode CLI tools) provides Python 3.8 by default Depending on when & how you have used Homebrew or pyenv, Python 3.9 may be missing! ✓ Fastest workaround: brew install python@3.9 or pyenv local 3.9.13 For new contributors with less experience in managing multiple Python versions, this becomes a huge hurdle!

Slide 34

Slide 34 text

Adaptation required (2/3) • ImportError due to forgetting to pass PYTHONPATH environment variable Subprocesses should also take pants.toml's source_roots into account • ImportError due to dynamic imports importlib.import_module() SQLAlchemy selects which engine module to import based on the database URL Such dependencies should be manually specified in BUILD files.
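Why dynamic imports escape dependency inference can be shown with a small experiment: scan a module's import statements the way a static analyzer roughly would, then compare with what the module actually loads at run time. The sample uses stdlib modules (`json`, `csv`) as stand-ins for database driver modules; `SOURCE`, `static_imports`, and `load_driver` are hypothetical names, and this is not Pants' actual inference code.

```python
import ast
import importlib

# Source of a tiny module that picks a "driver" module at run time.
# The scheme-to-module table uses stdlib modules as stand-ins for DB drivers.
SOURCE = '''
import importlib

DRIVERS = {"json": "json", "csv": "csv"}

def load_driver(url):
    scheme = url.split("://", 1)[0]
    return importlib.import_module(DRIVERS[scheme])
'''


def static_imports(source):
    """Names found by scanning import statements only."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names


namespace = {}
exec(SOURCE, namespace)  # define load_driver() from the sample source
```

Here `static_imports(SOURCE)` sees only `importlib`, yet `namespace["load_driver"]("csv://data")` loads the `csv` module. This is the gap that a manually specified dependency in a BUILD file closes.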

Slide 35

Slide 35 text

Adaptation required (3/3) • Unexpected amount of effort to parallelize the test suite Port number conflicts of database containers created as a test fixture ✓ pants.toml: [pytest].execution_slot_var = "BACKEND_TEST_EXEC_SLOT" ✓ Use ephemeral port numbers and/or add the slot number to a fixed constant • Ubuntu 22.04 + Snap + Docker + /tmp Snap forces Docker to use a private /tmp instead of the host /tmp. Mounting a /tmp sub-directory to fixture containers → unexpected failure This is not a problem of Pants itself, but test parallelization would lead to this pitfall in many cases. ✓ Workaround by using ./.tmp instead of /tmp
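Both port strategies mentioned above fit in a few lines. The sketch assumes the test runner exports the slot number in the `BACKEND_TEST_EXEC_SLOT` environment variable, as configured by the `execution_slot_var` setting; the base port `15432` and the helper names are hypothetical.

```python
import os
import socket

BASE_PORT = 15432  # hypothetical fixed constant for the test database


def slot_port(base=BASE_PORT, env=os.environ):
    """Offset a fixed port by the execution slot number exported by the test runner."""
    slot = int(env.get("BACKEND_TEST_EXEC_SLOT", "0"))
    return base + slot


def ephemeral_port():
    """Ask the OS for a currently free port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]
```

The slot-based port is deterministic, which makes failures easier to debug. The ephemeral port avoids collisions with unrelated processes, but another process could still grab it between this call and the container start.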

Slide 36

Slide 36 text

Summary Recap & what's left

Slide 37

Slide 37 text

Recap • Well worth the effort and time No slow-down of the development process after migration, thanks to Pants! ✓ Introduced black + isort + git hook, automation of release note generation Opened many ways to exploit new features of GitHub ✓ Larger action runner to speed up CI, Projects v2, and Codespaces • New concerns: unifying tracking of public & private issues • Friendly technical support from the Pantsbuild community Slack The core members and contributors welcome questions. I'm trying to contribute back as well! (bug reports, PyCon talks, etc.) • Pants is a highly recommended option if you are considering a Python-based mono-repo!

Slide 38

Slide 38 text

About our project: Backend.AI • In short: "An all-in-one enterprise platform to develop and operate AI services" • It is an open-source project with enterprise plugins. Contributed to & created many open-source libraries to support this project ✓ aiodocker, aiohttp, aiomonitor-ng, aiotools, aiotusclient, async-timeout, callosum (async RPC), click, etcetra (async etcd3 client), janus, pyzmq, ...

Slide 39

Slide 39 text

About our project: Backend.AI • In short: "An all-in-one enterprise platform to develop and operate AI services" • We are opening a small exhibition booth at Japan IT Week (Oct 26-28)!