Monorepos in Python

PyCascades 2025

Avik Basu

March 01, 2025

Transcript

  1. About Me • Staff Data Scientist at Intuit • Engineering + Data Science • Love PS5 games • Soccer + Tennis • Driving is therapy!
  2. What is a mono-repository? • a.k.a. Monolith • A single code base for multiple projects • Projects can be • Libraries • Microservices • Jupyter notebooks/explorations • Projects can be in more than 1 language
  3. Mono-repos: When does it make sense? • Inter-related projects with a common goal • Small to medium level code-base • Early stage projects • Better development velocity • Easier promotion • Different components have similar change rate • System-level extensions, e.g. C/C++/Rust
  4. Python Mono-repos: Advantages • One place to look through everything • Can improve development velocity (sometimes) • Easier promotion of open-source projects (think GitHub ⭐) • Better management of other language extensions, e.g. C/C++/Rust • Popular mono-repositories • pytorch-lightning • azure-sdk-for-python
  5. Python Mono-repos: Disadvantages • Versioning decisions • CI/CD complexity • Merging and conflicts • Code ownership issues • Need for specialized tools • Unintended code breakage
  6. Project Scenario: Data Science • 2 libraries • 1 core library containing ML models and related tools • 1 supporting library for connecting to different data sources • 1 microservice • FastAPI app with a docker container • Serves real-time inference • 1 notebook project • Contains training jobs in Jupyter notebooks
  7. Open-source tools for managing mono-repos • uv workspace • Allows managing multiple projects • As a bonus, uv is a great package management tool • Poetry mono-ranger plugin • Early stage • nx • Language agnostic • Pants • Multi-language
  8. Version management for multiple libraries in the repo
     A. One version for the whole repo ‣ Simpler to manage ‣ Easier to keep track ‣ Unnecessary updates for libraries with no changes
     B. Each library has its own version ‣ No unnecessary updates ‣ Difficult to maintain ‣ Essential to have a compatibility matrix ‣ Needs git submodules
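
The version-management slide says a compatibility matrix is essential when each library has its own version. The deck does not show one, so the following is only a minimal sketch of how such a matrix might be checked in Python. The library roles (a core library and a connectors library, loosely following the project scenario), the version numbers, and the names `COMPATIBILITY`, `parse_version`, and `is_compatible` are all hypothetical, and the parser assumes plain dotted numeric versions.

```python
from typing import Dict, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn '1.4.2' into (1, 4, 2). Assumes plain dotted numeric versions."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical matrix (not from the talk): for each core-library minor
# series, the half-open range [minimum, maximum) of connectors-library
# versions it is assumed to have been tested against.
COMPATIBILITY: Dict[str, Tuple[str, str]] = {
    "1.0": ("0.1.0", "0.3.0"),
    "1.1": ("0.2.0", "0.4.0"),
    "2.0": ("0.4.0", "1.0.0"),
}


def is_compatible(core: str, connectors: str) -> bool:
    """Return True if the two library versions appear together in the matrix."""
    # Reduce the core version to its 'major.minor' series for the lookup.
    series = ".".join(core.split(".")[:2])
    if series not in COMPATIBILITY:
        # Unknown core series: treat as incompatible rather than guess.
        return False
    low, high = COMPATIBILITY[series]
    return parse_version(low) <= parse_version(connectors) < parse_version(high)


if __name__ == "__main__":
    print(is_compatible("1.1.3", "0.3.2"))
    print(is_compatible("1.0.0", "0.3.0"))
```

A check like this could run in CI before a release, so that a version bump in one library fails fast when the matrix has no entry for the combination.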