Python Monorepos:
What, Why and How
Benjy Weinberger
Maintainer, Pants Build
EuroPython 2021
Slide 2
About me
● 25 years' experience as a
Software Engineer.
● Worked at Check Point, Google,
Twitter, Foursquare.
● Maintainer of the Pants OSS project.
● Co-founder of Toolchain.
Slide 3
Overview
1. What is a monorepo?
2. Why would I want one?
3. Tooling for a Python monorepo
Slide 4
1. What is a monorepo?
Slide 5
A common codebase characteristic
They grow over time.
Slide 6
A common consequence of growth
Builds get harder: slower, less manageable
Slide 7
Two ways to scale your codebase
Multi-repo vs. Monorepo
Slide 8
Multi-repo
Split the codebase into a growing number of small
repos, along team or project boundaries.
Slide 9
Monorepo
A monorepo is a unified codebase containing
code for multiple projects that share underlying
dependencies, data models, functionality, tooling
and processes.
Slide 10
monorepo != monolithic server
Monorepos are often great for microservices.
Slide 11
2. Why should I want a monorepo?
Slide 12
Multi-repo kinda sounds better at first
More decentralized. More bottom-up.
I can do my own thing in my own repo.
Slide 13
But, for some core problems...
Multi-repo doesn't solve them.
It hides them.
And it creates new ones.
Multi-repo relies on publishing
For other repos to consume code from repo A,
repo A must publish an artifact, such as an sdist or wheel.
A-1.2.0
Slide 16
Multi-repo relies on versioning
When repo A makes a change, it has to re-publish
under a new version.
A-1.3.0
A-1.2.0
Slide 17
Say repo B depends on repo A
It does so at a specific version:
B-4.1.0 depends on A-1.2.0
Slide 18
When repo B needs a change in repo A
Modify A, publish it at a new version, and consume
that new version in a new version of B.
Now, you have two choices...
B-4.2.0 depends on A-1.3.0
Slide 19
Change management: virtuous choice
1. Find all the consumers of repo A
2. Ensure that they still work at A-1.3.0
3. Make changes as needed until tests pass
4. Repeat - recursively! - for all repos you changed
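The "find all the consumers, recursively" step can be sketched as a walk over a reverse dependency graph. This is only an illustration: the repo names and the DEPENDS_ON table are hypothetical, and in a real multi-repo setup that information is usually scattered across many repos rather than available in one place.

```python
from collections import defaultdict, deque

# Hypothetical data: each repo maps to the repos it consumes.
DEPENDS_ON = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["A"],
    "E": [],
}


def transitive_consumers(changed, depends_on):
    """Return every repo that directly or indirectly consumes `changed`."""
    # Invert the edges so we can walk from a repo to its consumers.
    consumers = defaultdict(set)
    for repo, deps in depends_on.items():
        for dep in deps:
            consumers[dep].add(repo)

    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in consumers[current]:
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


print(sorted(transitive_consumers("A", DEPENDS_ON)))  # prints ['B', 'C', 'D']
```

Every repo in the result would need to be re-tested, and possibly re-published, against the new version of A.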
Slide 20
Change management: lazy choice
Don't worry about the other consumers of repo A.
After all, they're safely pinned to A-1.2.0.
Let them deal with the problems when they upgrade.
But...
Slide 21
Dependency hell
This causes a huge dependency resolution problem.
C-1.8.0
B-4.2.0 A-1.3.0
A-1.2.0
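A minimal sketch of how the conflict shows up, assuming that C-1.8.0 depends on both B-4.2.0 and A-1.2.0 (one reading of the diagram). The PINS table and the find_conflicts helper are hypothetical and do not model any real resolver.

```python
# Hypothetical pins: each published artifact lists the exact versions
# of the artifacts it depends on.
PINS = {
    ("C", "1.8.0"): {"B": "4.2.0", "A": "1.2.0"},
    ("B", "4.2.0"): {"A": "1.3.0"},
    ("A", "1.2.0"): {},
    ("A", "1.3.0"): {},
}


def find_conflicts(root, pins):
    """Return artifacts that are requested at more than one version."""
    requested = {}  # artifact name -> set of requested versions
    visited = set()
    stack = [root]
    while stack:
        artifact = stack.pop()
        if artifact in visited:
            continue
        visited.add(artifact)
        for name, version in pins[artifact].items():
            requested.setdefault(name, set()).add(version)
            stack.append((name, version))
    return {
        name: sorted(versions)
        for name, versions in requested.items()
        if len(versions) > 1
    }


print(find_conflicts(("C", "1.8.0"), PINS))  # prints {'A': ['1.2.0', '1.3.0']}
```

Only one version of A can be installed in a given environment, so C cannot satisfy both pins at once.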
Slide 22
But in a monorepo
There is no versioning or publishing.
All the consumers are right there in the same repo.
Breakages are immediately visible.
Slide 23
Monorepos can be more flexible
Easier to refactor
Easier to debug
Easier to discover and reuse code
Unified change history
Slide 24
Your codebase ➜ your organization
Balkanized codebase ➜ balkanized org
Unified codebase ➜ unified org
Slide 25
3. Tooling for a Python monorepo
Slide 26
Build Performance At Scale
Standard Python tools were not designed for monorepos.
● Global state.
● Side effects.
● Small changes trigger full reruns.
Slide 27
How to speed things up
Do less work
● Fine-grained invalidation
● Caching
Do more work at once
● Concurrency
● Remote execution
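To illustrate "do less work", here is a rough sketch of caching keyed on the content of a task's inputs, so that only tasks whose inputs changed are re-run. ResultCache and count_lines are hypothetical names invented for this example; this is not how any particular build tool implements its cache.

```python
import hashlib


class ResultCache:
    """Cache results keyed by a hash of the task name and its input contents."""

    def __init__(self):
        self._results = {}
        self.misses = 0

    def run(self, task_name, func, inputs):
        # `inputs` maps file names to their bytes content.
        digest = hashlib.sha256(task_name.encode())
        for name in sorted(inputs):
            digest.update(name.encode() + b"\0")
            digest.update(inputs[name] + b"\0")
        key = digest.hexdigest()
        if key not in self._results:
            self.misses += 1  # inputs not seen before: do the work
            self._results[key] = func(inputs)
        return self._results[key]


def count_lines(inputs):
    """Stand-in for an expensive task such as running tests or a linter."""
    return sum(content.count(b"\n") for content in inputs.values())


cache = ResultCache()
foo = {"foo.py": b"x = 1\n"}
bar = {"bar.py": b"y = 2\nz = 3\n"}

cache.run("count", count_lines, foo)
cache.run("count", count_lines, bar)
cache.run("count", count_lines, foo)  # unchanged input: served from the cache
foo["foo.py"] = b"x = 1\nw = 4\n"
cache.run("count", count_lines, foo)  # edited input: only this task re-runs
print(cache.misses)  # prints 3
```

Because the key depends only on the inputs, the same scheme could in principle be shared between machines, which is the idea behind remote caching.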
Slide 28
What kind of tooling has these features?
To work effectively, you need a build system
designed for monorepos.
It sits on top of existing standard tools, and
orchestrates them for you.
Slide 29
Examples of such tools include
● Pants
● Bazel
● Buck
Slide 30
How do these tools work?
● Goal-based command interface
● Reliance on build graph metadata
● Extensible workflow with no side effects
Slide 31
Goals
A monorepo build system typically supports
requesting goals on specific inputs.
$ pants test src/python/foo/bar/test.py
$ pants package src/python/foo/**
$ pants lint fmt --changed-since=HEAD
Slide 32
Code dependencies
A monorepo build system requires extra metadata to
describe the build graph: the units of code and the
dependencies between them.
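As a sketch of what such metadata makes possible, the build graph can be modelled as a mapping from each unit of code to the units it depends on. The directory names below are hypothetical, and real build systems use their own metadata formats; this only shows the shape of the data, using the standard library's graphlib (Python 3.9+).

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical build graph: each unit of code maps to the units it depends on.
BUILD_GRAPH = {
    "src/python/foo/util": set(),
    "src/python/foo/bar": {"src/python/foo/util"},
    "src/python/app": {"src/python/foo/bar", "src/python/foo/util"},
}


def build_order(graph):
    """Return the units in an order where dependencies come first."""
    return list(TopologicalSorter(graph).static_order())


for unit in build_order(BUILD_GRAPH):
    print(unit)
# prints:
# src/python/foo/util
# src/python/foo/bar
# src/python/app
```

A cycle in the graph makes static_order() raise graphlib.CycleError, which is one reason explicit metadata is useful: ill-formed dependencies are detected before any work runs.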
Slide 33
Task dependencies
A monorepo build system maintains the rule graph:
the units of work and the dependencies between
them.
Custom rules can be plugged in, for extensibility.
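One way to picture a rule graph is as a registry of functions, each declaring what it produces and what it needs. The toy engine below is an assumption made for illustration: the rule decorator, the request function and the product names are all invented here and are not the plugin API of Pants, Bazel or Buck.

```python
RULES = {}  # product name -> (function, names of the products it needs)


def rule(output, needs=()):
    """Register a function as the way to produce `output` from `needs`."""
    def register(func):
        RULES[output] = (func, tuple(needs))
        return func
    return register


@rule("sources")
def load_sources():
    return ["a.py", "b.py"]


@rule("lint_report", needs=["sources"])
def lint(sources):
    return {name: "ok" for name in sources}


@rule("test_report", needs=["sources"])
def run_tests(sources):
    return f"{len(sources)} files tested"


# A custom rule plugged in alongside the built-in ones.
@rule("summary", needs=["lint_report", "test_report"])
def summarize(lint_report, test_report):
    return f"{len(lint_report)} files linted, {test_report}"


def request(output, memo=None):
    """Produce `output` by recursively producing whatever its rule needs."""
    memo = {} if memo is None else memo
    if output not in memo:
        func, needs = RULES[output]
        memo[output] = func(*(request(name, memo) for name in needs))
    return memo[output]


print(request("summary"))  # prints 2 files linted, 2 files tested
```

Each rule only sees its declared inputs, so the engine knows exactly which units of work depend on which, and "sources" is computed once even though two rules need it.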
Slide 34
Build workflow
Code dependencies + task dependencies = workflow.
Recursively maps initial inputs to final outputs.
● No side effects
● No global state
Slide 35
The explicitly-modelled workflow enables
● fine-grained invalidation
● caching
● concurrency
● remote execution
Which is what makes builds scale with your
codebase!
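As a rough illustration of how an explicit graph enables concurrency, the sketch below runs every unit of work as soon as its dependencies have finished. The graph, the build function and run_concurrently are hypothetical; a real build system would add caching, sandboxing and remote execution on top of this idea.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical graph: each unit of work maps to the units it depends on.
GRAPH = {
    "util": set(),
    "models": set(),
    "service": {"util", "models"},
    "app": {"service"},
}


def build(target):
    """Stand-in for a side-effect-free unit of work."""
    return f"built {target}"


def run_concurrently(graph, work, max_workers=4):
    """Run `work` on every node, starting each once its dependencies are done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    finished = []
    pending = {}  # future -> node
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit everything whose dependencies have all completed.
            for node in sorter.get_ready():
                pending[pool.submit(work, node)] = node
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                node = pending.pop(future)
                future.result()  # re-raise any failure from the worker
                sorter.done(node)
                finished.append(node)
    return finished


order = run_concurrently(GRAPH, build)
print(order)
```

Here "util" and "models" may run in parallel and finish in either order, while "service" always waits for both and "app" always comes last.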
Slide 36
Summary
● Monorepos are an effective codebase
architecture
● They require appropriate tooling for performance
and reliability at scale
● This tooling exists!
Slide 37
Thanks for attending!
You can find us at https://www.pantsbuild.org/. We're
a friendly OSS community, always happy to assist.
I'll be happy to take any questions.