[PyCon JP] Modernizing development workflows for a 7-year old 74K LoC Python project using Pantsbuild

Joongi Kim
October 17, 2022

Transcript

  1. Mic Test
    My name is Joongi Kim.
    The title of my talk is Modernizing development workflow for a 7-year old 74K LoC
    Python project using Pantsbuild.
    My presentation will be in English.
    The presentation materials are in English.
    I will publish the presentation materials.
    I agree to having my picture taken during my presentation.
    I will comply with the PyCon JP Code of Conduct.

  2. Modernizing development workflow
    for a 7-year old 74K LoC
    Python project using Pantsbuild
    Joongi Kim (김준기, 金駿起)
    Lablup Inc. ("래블업" aka "ラボをアップグレード")

  3. About Me
    • Working as CTO / co-founder of Lablup Inc. since 2015
    Has been developing Backend.AI for 7+ years
    An open-source enthusiast
    Domain: backend engineering, systems programming,
    distributed & accelerated computing
    Interests: writing code that is still manageable 3 years later
    • Recent status
    (updating code I wrote 5 years ago...)
    • Spoke at PyCon KR for 8 consecutive years...
    Mostly about asyncio-related topics
    🤯 😵💫

  4. Table of Contents
    • Mono-repo vs. Multi-repo: That is the problem
    • Problems with the prior art in my team
    • Short introduction to Pantsbuild
    • Mono-repo migration process using Pantsbuild
    • Customizations to fit our use cases
    • Experience after migration
    • Recap

  5. Mono-repo vs. Multi-repo:
    That is the problem!
    Background

  6. Dependency mgmt is hard
    • "Software Engineering at Google"
    Long-term SW engineering is all about keeping up the pace of upgradability.
    The key to upgradability is dependency management.
    Backend.AI is passing this point
    (7 years old, ~74K LoC)
    ref) https://abseil.io/resources/swe-book/html/ch01.html#time_and_change
    😵

  7. Mono-repo to the rescue?
    • Well-known dependency managers
    yarn, npm, cargo, poetry, pipenv, go get, ...
    Diamond dependency conflicts
    ✓ Sandboxing all n-th order dependencies vs. full resolution (NP-complete[1])
    • Two axes of dependency management
    Internal: Cross-dependency between components written by us
    External: Dependency on components written by others
    Most existing build systems take care of external dependencies only!
    • Managing internal dependencies
    Multi-repo: multiple per-component repositories
    Mono-repo: single merged repository of all components
    [1] https://research.swtch.com/version-sat
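
To make the diamond conflict concrete, here is a minimal sketch that walks a dependency graph and reports packages requested at more than one pinned version. The graph, package names, and versions are all hypothetical, and real resolvers work with version ranges rather than exact pins, which is where the NP-complete search comes from.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical dependency graph: package -> list of (dependency, pinned version).
# All names and versions are made up for illustration.
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "app": [("lib_a", "1.0"), ("lib_b", "1.0")],
    "lib_a": [("common", "1.0")],
    "lib_b": [("common", "2.0")],
    "common": [],
}


def find_conflicts(graph: Dict[str, List[Tuple[str, str]]], root: str) -> Dict[str, List[str]]:
    """Walk the graph from `root` and report packages requested at multiple versions."""
    requested: Dict[str, Set[str]] = {}
    seen: Set[str] = set()
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        for dep, version in graph.get(pkg, []):
            requested.setdefault(dep, set()).add(version)
            stack.append(dep)
    return {dep: sorted(vs) for dep, vs in requested.items() if len(vs) > 1}


conflicts = find_conflicts(GRAPH, "app")
print(conflicts)  # {'common': ['1.0', '2.0']}
```

Here `app` reaches `common` through both `lib_a` and `lib_b`, and the two paths disagree on the version, which is the diamond shape.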

  8. Mono-repo to the rescue?
    • There is no single right answer!
    How do your component teams collaborate?
    ✓ Mono-repo may be better if a single team develops multiple components.
    How synchronous are the release cycles of related components?
    How closely coupled are the components? (e.g., direct type refs, unversioned APIs)
    Multi-repo
    ✓ Independent release cycle & versioning
    ✓ Per-repository team access control
    ✓ Taking advantage of existing build systems
    ✗ Team fragmentation by repo boundaries
    ✗ Sync overheads of internal dependencies
    ✗ Difficult to have a holistic view
    Mono-repo
    ✓ Less pain for refactoring across components
    ✓ Sharing the same development culture & process
    ✓ Easier onboarding with a unified view of systems
    ✗ Scalability issues if the repo becomes very large
    ✗ Difficult to fork individual components
    ✗ Difficult to set up CI/CD workflows
    ref) https://kinsta.com/blog/monorepo-vs-multi-repo/

  9. Mono-repo + Modern build system
    • Mono-repo simplifies internal dependency mgmt.
    Easy to find duplications and refactor all API usage occurrences at once
    e.g., "One version rule"
    • Mono-repo at large scale needs a "modern" build system.
    Support both: a single unified build vs. per-component builds
    Reproducible builds
    Minimize human errors & mistakes
    ✓ Declarative dependency configurations
    ✓ Automatic dependency resolution & inference
    Speed up builds and CI/CD pipelines
    ✓ Detecting the affected modules for a changeset in CI workflows
    ✓ Parallelized & distributed execution with artifact caching
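
Detecting the modules affected by a changeset comes down to a reverse-dependency traversal. The sketch below assumes a hand-written internal dependency graph. The component names echo Backend.AI's, but the edges are illustrative assumptions, and a build system would infer them rather than hard-code them.

```python
from collections import deque
from typing import Dict, Set

# Assumed internal dependency graph: module -> modules it depends on.
# The edges are illustrative, not the project's real dependency graph.
DEPENDS_ON: Dict[str, Set[str]] = {
    "manager": {"common"},
    "agent": {"common"},
    "client": set(),
    "webserver": {"client", "common"},
    "common": set(),
}


def affected_by(changed: Set[str], depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return the changed modules plus everything that transitively depends on them."""
    # Invert the edges so each dependency points at its dependents.
    dependents: Dict[str, Set[str]] = {mod: set() for mod in depends_on}
    for mod, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(mod)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


print(sorted(affected_by({"client"}, DEPENDS_ON)))  # ['client', 'webserver']
```

Only the tests of the returned modules need to run in CI. A change to `common` would select every module that depends on it.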

  10. Problems with the prior art in my team
    Why did we begin to consider mono-repo?

  11. Previous practice
    • Per-package GitHub repositories
    One Python wheel → one repository
    Release each package independently using the standard setuptools/pip toolchain
    • Setting up Backend.AI[1] development env.
    The minimum set of components for the server-side (6 repositories)
    ✓ manager, agent, common, client-py, webserver, storage-proxy
    ✓ (There is yet another long story for the frontend...)
    A single-line installation script (install-dev.sh)
    ✓ Installs database containers using docker-compose ("halfstack")
    ✓ Clones multiple repositories, creates venvs, runs "editable-install" in each venv,
    and populates database schema with fixtures
    [1] https://github.com/lablup/backend.ai

  12. Problems with prior art (1/2)
    • Difficult to write and review multiple PRs for a single issue
    A single issue often consists of multiple PRs to multiple repositories
    We often forget to switch to the same git branch across the multi-repo clones
    ✓ There is an implicit rule to match the PR branch names in different repos, and
    new contributors often forget this, breaking CI/CD.
    Difficult to keep the mental context when switching repositories
    ✓ e.g., Forgetting to add the corresponding client function for a new server API
    ✓ Often reviewers forget things as well. 🤯
    Not very compatible with GitHub
    ✓ The issue resolution from multiple linked PRs: "OR" instead of "AND"
    ✓ GitHub Codespaces works for a single repository only.
    ✓ GitHub Projects v2 still lacks cross-repo label and milestone configurations.

  13. Problems with prior art (2/2)
    • Hesitation to refactor
    Reducing the maintenance points took priority over splitting components by clear purpose & semantics
    No way to specify explicit internal dependencies
    • Time-consuming release process
    Painful to repeat the same release workflow for 6 repositories... (error-prone)
    ✓ Need to repeat CI/CD config updates 6 times... (reduced motivation to improve)
    Waiting for dependee packages to get released when there are internal dependencies
    • Difficult to keep track of compatible set of component version combinations
    Often a minor patch release breaks compatibility because different components are
    closely coupled.
    Causing headaches for on-site engineering staff when upgrading and applying custom
    patches for individual customer sites

  14. Solution
    • The problem
    Reduced motivation to refactor across components & improve dev process (cynicism)
    Too high context switching overheads for managing issues & PRs
    • Let's migrate to (semi-)mono-repo!
    Backend.AI is not yet as big as Google's repository — don't need to worry about the
    extreme scalability issues.
    Target repos: open-source core components that share the same release cycle and
    have internal cross-dependencies
    • Challenges
    How to automate internal dependency management? (e.g., parsing/generating setup.cfg?)
    How to run tests against only changed modules on commits? (e.g., git sparse checkout?)
    We need a modern build system tailored for mono-repo!

  15. Introduction to Pantsbuild
    Is it our savior?

  16. What is Pants?
    • Main features
    Automatic dependency inference by static analysis
    First-class support for the Python ecosystem
    Graph-based parallel & async task execution
    Extensible with a plugin subsystem
    • Overview
    https://www.pantsbuild.org/docs/how-does-pants-work
    https://blog.pantsbuild.org/pycon-us-2022-talk/
    https://blog.pantsbuild.org/pants-vs-bazel/
    Pants 2 is a fast, scalable, user-friendly build system for codebases of all
    sizes. It's currently focused on Python, Go, Java, Scala, Shell, and Docker,
    with support for other languages and frameworks coming soon.
    ref) https://www.pantsbuild.org/

  17. Pants: Demo

  18. Pants: Architecture
    Rust-based DAG scheduler & async-parallel execution engine
    (monadic, pure, cancellable/interruptible, concurrent, cached)
    Python-based BUILD rule engine
    & intrinsic multi-language plugins
    Filesystem and OS
    pants.toml + BUILD configs in Git repositories
    (Target build may use an arbitrary Python version)
    Included inside PyPI's
    pantsbuild-pants wheel package
    Pre-installed
    Python Runtime
    (one of 3.7, 3.8, 3.9)
    PEX venv generator & dependency resolver

  19. Pants: Basic Usage
    • Requirements to start using Pants[1]
    ./pants script (download from https://static.pantsbuild.org/setup/pants)
    pants.toml & pants.ci.toml
    **/BUILD files
    ✓ What to build (including source & resource files) and what it depends on
    • ./pants [global-options] {goal} [goal-options] [targets]
    What it does:
    ✓ Self-bootstrap Pants itself at ~/.cache/pants/ & ./.pants.d
    ✓ Generate a task DAG from BUILD files
    ✓ Asynchronously run the DAG with parallelization when possible
    See our team's cheatsheet for how it fits into daily development workflows[2]
    [1] https://www.pantsbuild.org/docs/installation
    [2] https://docs.backend.ai/en/latest/dev/daily-workflows.html
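
The "generate a task DAG, then run it asynchronously with parallelization" idea can be illustrated with the standard library alone. This is a conceptual sketch, not how the Pants engine is implemented: the task names are hypothetical, the work is simulated with a short sleep, and tasks run wave by wave rather than with fine-grained scheduling.

```python
import asyncio
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Hypothetical task graph: task -> set of tasks it depends on.
TASKS: Dict[str, Set[str]] = {
    "resolve-deps": set(),
    "lint": {"resolve-deps"},
    "typecheck": {"resolve-deps"},
    "test": {"resolve-deps"},
    "package": {"lint", "typecheck", "test"},
}


async def run_task(name: str, log: List[str]) -> None:
    await asyncio.sleep(0.01)  # stand-in for real work
    log.append(name)


async def run_dag(tasks: Dict[str, Set[str]]) -> List[str]:
    """Run every task whose dependencies are done, one concurrent wave at a time."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()
    log: List[str] = []
    while sorter.is_active():
        ready = sorter.get_ready()
        # Tasks within one wave are independent, so they can run concurrently.
        await asyncio.gather(*(run_task(name, log) for name in ready))
        sorter.done(*ready)
    return log


order = asyncio.run(run_dag(TASKS))
print(order)
```

In this graph `lint`, `typecheck`, and `test` run concurrently once `resolve-deps` finishes, and `package` runs last.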

  20. The migration process
    How did we do it?

  21. Restructuring GitHub repos
    backend.ai-manager
    backend.ai-agent
    backend.ai-common
    backend.ai-webserver backend.ai-client-py
    backend.ai-storage-proxy
    backend.ai-webui backend.ai-client-js
    Backend.AI Core
    https://github.com/lablup/...
    Backend.AI Frontend
    https://github.com/lablup/backend.ai
    Unify!

  22. Restructuring dev-setup

  23. Mono-repo structure
    VERSION: Unified package version
    **/BUILD: Pants build config for each directory (like Bazel)
    configs/{manager,agent,common,...}: "local-config" templates for Backend.AI Core
    docs/: Backend.AI developer documentation
    plugins/: Backend.AI plugin development workspace
    scripts/: Utility shell scripts for developers
    src/ai/backend/{manager,agent,common,...}: Backend.AI Core source code
    tests/{manager,agent,common,...}: Backend.AI Core test code
    pants.toml, pants.ci.toml: Pants main config
    pyproject.toml, .flake8: Toolchain configs (flake8, mypy, pytest)
    requirements.txt: Unified requirements for all components
    python.lock: Unified requirements dependency lock
    tools/*.lock: Toolchain requirements dependency lock
    tools/pants-plugins/setupgen/: Our Pants plugin for custom setup.py generation
    ./pants, ./py, ./backend.ai: Main entry scripts for daily use
    dist/: venvs & build artifacts generated by Pants

  24. Migrating multi-repo
    Multi-repo (before) → Mono-repo (after)
    setup.cfg, requirements/*.txt → src/ai/backend/{component}/BUILD,
    VERSION, pants.toml, BUILD, requirements.txt (unified)
    README.md → src/ai/backend/{component}/README.md (moved)
    changes/, CHANGELOG.md → changes/, CHANGELOG.md (unified)
    configs/ → configs/{component}/ (moved)
    scripts/ → scripts/{component}/ (moved)
    src/ai/backend/{component}/ → src/ai/backend/{component}/ (moved)
    tests/ → tests/{component}/ (moved)
    src/ai/backend/{component}/__init__.py:
    Before:
    __version__ = '22.03.1'
    After:
    from pathlib import Path
    __version__ = (
        Path(__file__).parent / 'VERSION'
    ).read_text().strip()
    Symlink the root VERSION file into each component:
    $ cd src/ai/backend/{component}
    $ ln -s ../../../../VERSION

  25. The whole history
    • https://github.com/lablup/backend.ai/pull/417
    Started: Apr 27 / Merged: May 31 (168 commits) / many follow-up PRs afterwards
    ✓ The initial plan was two weeks, but as always... 😅
    More than 60 Q&A exchanges in the Pantsbuild community Slack
    Pants: 5 bug reports (all fixed now), 2 feature requests, 2 doc patches
    Pex: 3 bug reports triggering new releases
    ✓ Afternoon KST: bug report / Dinner KST: talk with developers / Night-morning
    KST: developers fix the issue and release / Next morning KST: apply the release
    • The size of mono-repo
    Backend.AI Core LoC: 74K+
    LoC including all external dependencies: 1.5M+

  26. Customizing Pants
    There is no silver bullet, as always...

  27. setup.py Generator
    • Pants plugin : tools/pants-plugins/setupgen
    • What's added
    Single-source the version number from
    the root's VERSION file
    Change long_description_content_type
    depending on the extension of README
    (.md, .rst)
    Change the trove classifier depending on the
    version number suffix (a, b, rc)
    Add the license type argument so that
    each wheel package may have different
    licenses
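
The metadata rules above can be sketched as two small helpers. This is not the actual setupgen plugin code: the function names are hypothetical, and the mapping from pre-release suffix to a specific "Development Status" classifier is an assumption made for illustration.

```python
import re
from pathlib import Path

# Assumed mapping from pre-release suffix to trove classifier;
# the real plugin may choose different classifiers.
_STATUS_BY_SUFFIX = {
    "a": "Development Status :: 3 - Alpha",
    "b": "Development Status :: 4 - Beta",
    "rc": "Development Status :: 4 - Beta",
}
_STABLE = "Development Status :: 5 - Production/Stable"

# Content types for the two README formats mentioned on the slide.
_CONTENT_TYPES = {".md": "text/markdown", ".rst": "text/x-rst"}


def status_classifier(version: str) -> str:
    """Pick a classifier from a trailing a/b/rc suffix such as '22.09.0rc1'."""
    match = re.search(r"(a|b|rc)\d*$", version)
    return _STATUS_BY_SUFFIX[match.group(1)] if match else _STABLE


def description_content_type(readme: str) -> str:
    """Pick the long description content type from the README extension."""
    return _CONTENT_TYPES.get(Path(readme).suffix.lower(), "text/plain")


print(status_classifier("22.09.0rc1"))        # Development Status :: 4 - Beta
print(description_content_type("README.md"))  # text/markdown
```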

  28. towncrier Tool
    • Pants plugin: tools/pants-plugins/towncrier
    • What's added
    Like black, isort, and flake8, defined a new "PythonTool" for towncrier
    Allows using independent venvs and dependency lockfiles for towncrier

  29. Platform-specific Deps
    • Pants plugin: tools/pants-plugins/platform_resources
    • What's added
    Use different resource files (pre-built executables)
    for Backend.AI Agent depending on the target platform
    argument
    Needed to rewrite the code upon Pants minor
    version updates, as the dependency management
    implementation in Pants changes frequently
    (expected to stabilize soon)

  30. Dynamic Module Loading
    • Wrote a module loader that searches & parses BUILD files using AST
    Backend.AI largely depends on entry_points of the package metadata for plugin
    and replaceable module discovery.
    https://github.com/lablup/backend.ai/blob/main/src/ai/backend/plugin/entrypoint.py
    • Use ./pants export :: and ./py wrapper script instead of ./pants run ...
    In PEX envs, there are neither BUILD files nor package metadata.
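
As a rough idea of what scanning BUILD files with the ast module can look like, the sketch below extracts an entry point group from a python_distribution() call without executing the file. It is not the code in entrypoint.py: the BUILD content, the group name, and the function name are hypothetical, and it only handles entry_points written as a literal dict.

```python
import ast
from typing import Dict

# A made-up BUILD file; the target, group, and entry point names are illustrative only.
BUILD_TEXT = '''
python_sources(name="lib")

python_distribution(
    name="dist",
    dependencies=[":lib"],
    entry_points={
        "example_plugin_v1": {
            "demo": "example.pkg.cli:main",
        },
    },
)
'''


def scan_entry_points(build_text: str, group: str) -> Dict[str, str]:
    """Collect the entry points of `group` declared in python_distribution() calls."""
    found: Dict[str, str] = {}
    tree = ast.parse(build_text)
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if node.func.id != "python_distribution":
            continue
        for keyword in node.keywords:
            if keyword.arg == "entry_points":
                # literal_eval accepts an AST node and only evaluates literals,
                # so the BUILD file itself is never executed.
                entry_points = ast.literal_eval(keyword.value)
                found.update(entry_points.get(group, {}))
    return found


print(scan_entry_points(BUILD_TEXT, "example_plugin_v1"))
# {'demo': 'example.pkg.cli:main'}
```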

  31. Experience after migration
    Importance of open source ecosystem

  32. Satisfying points
    • Decreased the time needed for making a new release (hours → ~10 min.)
    Automated release-related workflows
    e.g., Generate GitHub's release note by extracting the latest section of
    CHANGELOG.md
    • Reduced code review burdens & context switching overheads
    One issue completes with one PR! (single file tree & single diff)
    Review all things together, including documentation
    Now we can utilize GitHub better: Projects v2 & Codespaces
    • Minimized the impact on CI/CD execution times by testing only what changed
    ./pants test --changed-since=main
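
Extracting the latest section of CHANGELOG.md for a release note can be sketched as below. The heading convention (each release starts with a "## " line) and the sample text are assumptions for illustration; the project's actual release script may work differently.

```python
# Made-up changelog content, assuming each release section starts with "## ".
SAMPLE = """# Changes

## 22.09.1 (2022-10-01)

### Fixes
* Fix a bug (#2)

## 22.09.0 (2022-09-28)

### Features
* Add a feature (#1)
"""


def latest_section(changelog: str) -> str:
    """Return the text from the first '## ' heading up to the next one."""
    lines = changelog.splitlines()
    start = None
    for index, line in enumerate(lines):
        if line.startswith("## "):
            if start is None:
                start = index
            else:
                return "\n".join(lines[start:index]).strip()
    # Only one section (or none) was found.
    return "\n".join(lines[start:]).strip() if start is not None else ""


release_note = latest_section(SAMPLE)
print(release_note)
```

Sub-headings such as "### Fixes" do not start with "## " followed by a space, so they stay inside the extracted section.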

  33. Adaptation required (1/3)
    • Requiring multiple installs of Python versions
    Pants requires Python 3.9 on Apple Silicon Macs / Backend.AI requires Python 3.10
    macOS Monterey (via Xcode CLI tools) provides Python 3.8 by default
    Depending on when & how you have used Homebrew or pyenv,
    Python 3.9 may be missing!
    ✓ Fastest workaround: brew install python@3.9 or pyenv local 3.9.13
    For new contributors with less experience in managing multiple Python versions,
    this becomes a huge hurdle!

  34. Adaptation required (2/3)
    • ImportError due to forgetting to pass the PYTHONPATH environment variable
    Subprocesses should also take pants.toml's source_roots into account
    • ImportError due to dynamic imports
    importlib.import_module()
    SQLAlchemy selects which engine module to import based on the database URL
    Such dependencies should be manually specified in BUILD files.
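
The PYTHONPATH pitfall can be reproduced and fixed in a few lines. In this sketch a temporary directory stands in for a source root, and the helper name and throwaway module are hypothetical. The point is only that a child process sees the source roots when they are passed explicitly through its environment.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def env_with_source_roots(source_roots: Iterable[str]) -> Dict[str, str]:
    """Copy the current environment and prepend the source roots to PYTHONPATH."""
    env = os.environ.copy()
    parts = [str(Path(root).resolve()) for root in source_roots]
    existing = env.get("PYTHONPATH")
    if existing:
        parts.append(existing)
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


with tempfile.TemporaryDirectory() as root:
    # A throwaway module under a fake source root (hypothetical name).
    Path(root, "demo_mod.py").write_text("VALUE = 42\n")
    result = subprocess.run(
        [sys.executable, "-c", "import demo_mod; print(demo_mod.VALUE)"],
        env=env_with_source_roots([root]),
        capture_output=True,
        text=True,
        check=True,
    )

output = result.stdout.strip()
print(output)  # 42
```

Without the `env=` argument carrying the source root, the child interpreter would raise ImportError for `demo_mod`.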

  35. Adaptation required (3/3)
    • Unexpected amount of effort to parallelize the test suite
    Port number conflicts of database containers created as a test fixture
    ✓ pants.toml: [pytest].execution_slot_var = "BACKEND_TEST_EXEC_SLOT"
    ✓ Use ephemeral port numbers and/or add the slot number to a fixed constant
    • Ubuntu 22.04 + Snap + Docker + /tmp
    Snap forces Docker to use a private /tmp instead of the host /tmp.
    Mounting a /tmp sub-directory to fixture containers → unexpected failure
    This is not a problem of Pants itself, but test parallelization leads to this
    pitfall in many cases.
    ✓ Workaround by using ./.tmp instead of /tmp
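
Both port strategies mentioned above can be sketched in a test helper. The environment variable name comes from the pants.toml setting on this slide; the base port and function names are hypothetical.

```python
import os
import socket
from typing import Mapping, Optional

BASE_PORT = 15432  # hypothetical fixed constant for a database fixture


def slot_port(base_port: int = BASE_PORT, environ: Optional[Mapping[str, str]] = None) -> int:
    """Offset a fixed base port by the execution slot assigned to this test process."""
    environ = os.environ if environ is None else environ
    slot = int(environ.get("BACKEND_TEST_EXEC_SLOT", "0"))
    return base_port + slot


def ephemeral_port() -> int:
    """Ask the OS for a currently free port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


print(slot_port(environ={"BACKEND_TEST_EXEC_SLOT": "3"}))  # 15435
```

The slot-based port is deterministic, which helps when debugging. The ephemeral port avoids collisions with unrelated processes, but it can be taken by another process between the lookup and the container start.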

  36. Summary
    Recap & what's left

  37. Recap
    • Well worth the effort and time
    No slow-down of development process after migration, thanks to Pants!
    ✓ Introduced black + isort + git hook, automation of release note generation
    Opened many ways to exploit new features of GitHub
    ✓ Larger action runners to speed up CI, Projects v2, and Codespaces
    • New concerns: unified tracking of public & private issues
    • Friendly technical support from the Pantsbuild community Slack
    The core members and contributors welcome questions.
    I'm trying to contribute back as well! (bug reports, PyCon talks, etc.)
    • Pants is a highly recommended option if you are considering a Python-based mono-repo!

  38. About our project: Backend.AI
    • In short: "An all-in-one enterprise platform to develop and operate AI services"
    • It is an open-source project with enterprise plugins.
    https://github.com/lablup/backend.ai
    Contributed to & created many open-source libraries to support this project
    ✓ aiodocker, aiohttp, aiomonitor-ng, aiotools, aiotusclient, async-timeout, callosum
    (async RPC), click, etcetra (async etcd3 client), janus, pyzmq, ...

  39. About our project: Backend.AI
    • In short: "An all-in-one enterprise platform to develop and operate AI services"
    • We are opening a small exhibition booth in the Japan IT Week (Oct 26-28)!
