Slide 1

Slide 1 text

Python Monorepos: What, Why and How
Benjy Weinberger
Maintainer, Pants Build
EuroPython 2021

Slide 2

Slide 2 text

About me
● 25 years' experience as a Software Engineer.
● Worked at Check Point, Google, Twitter, Foursquare.
● Maintainer of the Pants OSS project.
● Co-founder of Toolchain.

Slide 3

Slide 3 text

Overview
1. What is a monorepo?
2. Why would I want one?
3. Tooling for a Python monorepo

Slide 4

Slide 4 text

1. What is a monorepo?

Slide 5

Slide 5 text

A common codebase characteristic
They grow over time.

Slide 6

Slide 6 text

A common consequence of growth
Builds get harder: slower, less manageable.

Slide 7

Slide 7 text

Two ways to scale your codebase
Multi-repo vs. Monorepo

Slide 8

Slide 8 text

Multi-repo
Split the codebase into a growing number of small repos, along team or project boundaries.

Slide 9

Slide 9 text

Monorepo
A monorepo is a unified codebase containing code for multiple projects that share underlying dependencies, data models, functionality, tooling and processes.

Slide 10

Slide 10 text

monorepo != monolithic server
Monorepos are often great for microservices.

Slide 11

Slide 11 text

2. Why should I want a monorepo?

Slide 12

Slide 12 text

Multi-repo kinda sounds better at first
More decentralized.
More bottom-up.
I can do my own thing in my own repo.

Slide 13

Slide 13 text

But, for some core problems...
Multi-repo doesn't solve them.
It hides them.
And it creates new ones.

Slide 14

Slide 14 text

The hardest codebase problems are...
Managing Changes 😤
Managing Dependencies 😡

Slide 15

Slide 15 text

Multi-repo relies on publishing
For code from repo A to be consumed by other repos, repo A must publish an artifact, such as an sdist or wheel.
[Diagram: A-1.2.0]

Slide 16

Slide 16 text

Multi-repo relies on versioning
When repo A makes a change, it has to re-publish under a new version.
[Diagram: A-1.3.0, A-1.2.0]

Slide 17

Slide 17 text

Say repo B depends on repo A
It does so at a specific version:
[Diagram: B-4.1.0, A-1.2.0]

Slide 18

Slide 18 text

When repo B needs a change in repo A
Modify A, publish it at a new version, and consume that new version in a new version of B.
[Diagram: B-4.2.0, A-1.3.0]
Now, you have two choices...

Slide 19

Slide 19 text

Change management: virtuous choice
1. Find all the consumers of repo A
2. Ensure that they still work at A-1.3.0
3. Make changes as needed until tests pass
4. Repeat - recursively! - for all repos you changed

Slide 20

Slide 20 text

Change management: lazy choice
Don't worry about the other consumers of repo A.
After all, they're safely pinned to A-1.2.0.
Let them deal with the problems when they upgrade.
But...

Slide 21

Slide 21 text

Dependency hell
This causes a huge dependency resolution problem.
[Diagram: C-1.8.0, B-4.2.0, A-1.3.0, A-1.2.0]
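
The conflict on this slide can be sketched in a few lines of Python. The package names and versions are the ones from the slide, but the edges between them are an assumed reading of the diagram (C pins both B and the old A, while B pins the new A). The `PINS` table and the `find_conflicts` helper are hypothetical and only illustrate the idea; real resolvers such as pip work with version ranges, not just exact pins.

```python
from collections import defaultdict

# Hypothetical exact pins: (package, version) -> pinned requirements.
# Assumed edges: C needs B and the old A directly, while B needs the new A.
PINS = {
    ("C", "1.8.0"): [("B", "4.2.0"), ("A", "1.2.0")],
    ("B", "4.2.0"): [("A", "1.3.0")],
    ("A", "1.3.0"): [],
    ("A", "1.2.0"): [],
}


def find_conflicts(root):
    """Return {package: sorted versions} for packages pinned at more than one version."""
    required = defaultdict(set)
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        name, version = node
        required[name].add(version)
        stack.extend(PINS[node])  # walk the transitive requirements
    return {name: sorted(vs) for name, vs in required.items() if len(vs) > 1}


print(find_conflicts(("C", "1.8.0")))  # {'A': ['1.2.0', '1.3.0']}
```

Only one version of A can be installed in a single environment, so C cannot be resolved until either C or B changes its pin.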

Slide 22

Slide 22 text

But in a monorepo
There is no versioning or publishing.
All the consumers are right there in the same repo.
Breakages are immediately visible.

Slide 23

Slide 23 text

Monorepos can be more flexible
● Easier to refactor
● Easier to debug
● Easier to discover and reuse code
● Unified change history

Slide 24

Slide 24 text

Your codebase ➜ your organization
Balkanized codebase ➜ balkanized org
Unified codebase ➜ unified org

Slide 25

Slide 25 text

3. Tooling for a Python monorepo

Slide 26

Slide 26 text

Build Performance At Scale
Standard Python tools are not designed for monorepos.
● Global state.
● Side effects.
● Small changes trigger full reruns.

Slide 27

Slide 27 text

How to speed things up
Do less work
● Fine-grained invalidation
● Caching
Do more work at once
● Concurrency
● Remote execution
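
A minimal sketch of the "do less work" half, assuming results are cached under a hash of each input file's content. The `lint` function, the file names and the in-memory `_cache` are hypothetical stand-ins; a real build system would also include the tool version and options in the cache key.

```python
import hashlib

_cache = {}  # fingerprint -> result
calls = []   # records which files were actually processed


def fingerprint(content: bytes) -> str:
    """Identify an input by its content, not by its path or timestamp."""
    return hashlib.sha256(content).hexdigest()


def lint(path: str, content: bytes) -> str:
    """Stand-in for an expensive per-file step; reruns only on new content."""
    key = fingerprint(content)
    if key not in _cache:
        calls.append(path)
        _cache[key] = f"{len(content.splitlines())} lines checked"
    return _cache[key]


files = {"a.py": b"x = 1\n", "b.py": b"y = 2\nz = 3\n"}
for path, content in files.items():
    lint(path, content)

files["b.py"] = b"y = 2\nz = 4\n"  # edit a single file
for path, content in files.items():
    lint(path, content)

print(calls)  # ['a.py', 'b.py', 'b.py']
```

On the second pass only `b.py` is processed again: the unchanged `a.py` is served from the cache, which is what fine-grained invalidation means in practice.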

Slide 28

Slide 28 text

What kind of tooling has these features?
To work effectively, you need a build system designed for monorepos.
It sits on top of the existing standard tools and orchestrates them for you.

Slide 29

Slide 29 text

Examples of such tools include
● Pants
● Bazel
● Buck

Slide 30

Slide 30 text

How do these tools work?
● Goal-based command interface
● Reliance on build graph metadata
● Extensible workflow with no side effects

Slide 31

Slide 31 text

Goals
A monorepo build system typically supports requesting goals on specific inputs.
$ pants test src/python/foo/bar/test.py
$ pants package src/python/foo/**
$ pants lint fmt --changed-since=HEAD

Slide 32

Slide 32 text

Code dependencies
A monorepo build system requires extra metadata to describe the build graph: the units of code and the dependencies between them.
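
As a rough illustration of what that metadata makes possible, the sketch below models a tiny build graph and asks which units are affected by a changed file. The `Target` class, the target names and the file paths are all hypothetical; this is not the metadata format of Pants or any other tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A unit of code: a name, the files it owns, and the targets it depends on."""
    name: str
    sources: tuple
    dependencies: tuple = ()


# Hypothetical build graph metadata.
TARGETS = [
    Target("util", ("src/util.py",)),
    Target("models", ("src/models.py",), ("util",)),
    Target("api", ("src/api.py",), ("models",)),
    Target("cli", ("src/cli.py",), ("util",)),
]


def affected_by(changed_file):
    """Names of targets that own the file or depend on its owner transitively."""
    # Invert the dependency edges: target name -> names of its direct dependents.
    dependents = {t.name: set() for t in TARGETS}
    for t in TARGETS:
        for dep in t.dependencies:
            dependents[dep].add(t.name)

    affected = set()
    frontier = [t.name for t in TARGETS if changed_file in t.sources]
    while frontier:
        name = frontier.pop()
        if name not in affected:
            affected.add(name)
            frontier.extend(dependents[name])
    return sorted(affected)


print(affected_by("src/models.py"))  # ['api', 'models']
print(affected_by("src/util.py"))    # ['api', 'cli', 'models', 'util']
```

A change to `src/models.py` leaves `cli` and `util` untouched, so a build system with this graph can skip their tests entirely.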

Slide 33

Slide 33 text

Task dependencies
A monorepo build system maintains the rule graph: the units of work and the dependencies between them.
Custom rules can be plugged in, for extensibility.

Slide 34

Slide 34 text

Build workflow
Code dependencies + task dependencies = workflow.
Recursively maps initial inputs to final outputs.
● Side-effect-free
● No global state

Slide 35

Slide 35 text

The explicitly-modelled workflow enables
● fine-grained invalidation
● caching
● concurrency
● remote execution
Which is what makes builds scale with your codebase!
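
The concurrency point can be sketched with the standard library alone: once the dependencies are explicit, every unit whose dependencies are finished can run at the same time. The graph and the `build` function below are hypothetical, and `graphlib` needs Python 3.9 or later; a real build system schedules far more finely than this batch-by-batch loop.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical graph: each unit maps to the set of units it depends on.
GRAPH = {
    "util": set(),
    "models": {"util"},
    "cli": {"util"},
    "api": {"models"},
}


def build(name):
    """Stand-in for a side-effect-free unit of work."""
    return f"built {name}"


def run_all(graph, max_workers=4):
    """Run every unit, executing each batch of ready units concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # units whose dependencies are all done
            for name, output in zip(ready, pool.map(build, ready)):
                results[name] = output
                sorter.done(name)
    return results


results = run_all(GRAPH)
print(results["api"])  # built api
```

Here `models` and `cli` are built in the same batch because neither depends on the other, while `api` waits for `models`.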

Slide 36

Slide 36 text

Summary
● Monorepos are an effective codebase architecture
● They require appropriate tooling for performance and reliability at scale
● This tooling exists!

Slide 37

Slide 37 text

Thanks for attending!
You can find us at https://www.pantsbuild.org/. We're a friendly OSS community, always happy to assist.
I'll be happy to take any questions.