Python monorepos: what, why and how

Benjy
June 09, 2021

This talk will describe the monorepo codebase architecture, explain why you might want to use it for your Python code, and cover the kind of tooling you need to work effectively in it.

As organizations and repos grow, we have to choose how to manage codebases in a scalable way. We have two architectural alternatives:

- Multi-repo: split the codebase into increasing numbers of small repos, along team or project boundaries.
- Monorepo: maintain one large repository containing code for many projects and libraries, with multiple teams collaborating across it.

In this talk we'll discuss the pros and cons of monorepos for Python codebases, and the kinds of tooling and processes we can use to make working in a Python monorepo effective.

Transcript

  1. Python Monorepos:
    What, Why and How
    Benjy Weinberger
    Maintainer, Pants Build
    PyCon Israel 2021

  2. About me
    ● 25 years' experience as a
    Software Engineer.
    ● Worked at Check Point, Google,
    Twitter, Foursquare.
    ● Maintainer of the Pants OSS project.
    ● Co-founder of Toolchain.

  3. Overview
    1. What is a monorepo?
    2. Why would I want one?
    3. Tooling for a Python monorepo

  4. 1. What is a monorepo?

  5. A common codebase characteristic
    They
    g r o w
    over time.

  6. A common consequence of growth
    Builds get harder: slower, less manageable

  7. Two ways to scale your codebase
    Multi-repo vs. Monorepo

  8. Multi-repo
    Split the codebase into growing numbers of small
    repos, along team or project boundaries.

  9. Monorepo
    A monorepo is a unified codebase containing
    code for multiple projects that share underlying
    dependencies, data models, functionality, tooling
    and processes.

  10. monorepo != monolithic server
    Monorepos are often great for microservices.

  11. 2. Why should I want a monorepo?

  12. Multi-repo kinda sounds better at first
    More decentralized. More bottom-up.
    I can do my own thing in my own repo.

  13. But, for some core problems...
    Multi-repo doesn't solve them.
    It hides them.
    And it creates new ones.

  14. The hardest codebase problems are...
    Managing
    Changes
    😤
    Managing
    Dependencies
    😡

  15. Multi-repo relies on publishing
    For code from repo A to be consumed by other repos,
    it must publish an artifact, such as an sdist or wheel.
    A-1.2.0

  16. Multi-repo relies on versioning
    When repo A makes a change, it has to re-publish
    under a new version.
    A-1.3.0
    A-1.2.0

  17. Say repo B depends on repo A
    It does so at a specific version:
    B-4.1.0 A-1.2.0

  18. When repo B needs a change in repo A
    Modify A, publish it at a new version, and consume
    that new version in a new version of B.
    Now, you have two choices...
    B-4.2.0 A-1.3.0

  19. Change management: virtuous choice
    1. Find all the consumers of repo A
    2. Ensure that they still work at A-1.3.0
    3. Make changes as needed until tests pass
    4. Repeat - recursively! - for all repos you changed
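
The recursion in step 4 amounts to walking the reverse dependency graph outward from the changed repo. A minimal sketch of that walk, assuming a hypothetical hand-maintained map of which repo depends on which (repos C and D and the `DEPENDS_ON` table are made up for illustration; in a real multi-repo setup this map usually has to be assembled by hand):

```python
from collections import deque

# Hypothetical map: repo -> repos it depends on directly.
DEPENDS_ON = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["A", "C"],
}


def transitive_consumers(changed, depends_on):
    """Return every repo that directly or indirectly consumes `changed`."""
    # Invert the edges: repo -> repos that consume it directly.
    consumers = {repo: set() for repo in depends_on}
    for repo, deps in depends_on.items():
        for dep in deps:
            consumers.setdefault(dep, set()).add(repo)

    # Breadth-first walk over the inverted edges.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, ()):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


print(sorted(transitive_consumers("A", DEPENDS_ON)))
```

Every repo in the returned set needs to be checked, and possibly republished, after a change to A.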

  20. Change management: lazy choice
    Don't worry about the other consumers of repo A.
    After all, they're safely pinned to A-1.2.0.
    Let them deal with the problems when they upgrade.
    But...

  21. Dependency hell
    This causes a huge dependency resolution problem.
    C-1.8.0
    B-4.2.0 A-1.3.0
    A-1.2.0
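
The conflict can be made concrete with a small sketch. It assumes the diagram means that C-1.8.0 depends on both B-4.2.0 and A-1.2.0, while B-4.2.0 depends on A-1.3.0; the `PINS` table and `find_conflicts` helper are hypothetical, and real resolvers handle version ranges rather than only exact pins:

```python
# Hypothetical exact pins: (name, version) -> {dependency name: pinned version}.
PINS = {
    ("C", "1.8.0"): {"B": "4.2.0", "A": "1.2.0"},
    ("B", "4.2.0"): {"A": "1.3.0"},
    ("A", "1.2.0"): {},
    ("A", "1.3.0"): {},
}


def find_conflicts(root, pins):
    """Walk the pins from `root`; return names required at more than one version."""
    required = {}  # name -> set of versions requested somewhere in the graph
    visited = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        for name, version in pins[node].items():
            required.setdefault(name, set()).add(version)
            stack.append((name, version))
    return {name: versions for name, versions in required.items() if len(versions) > 1}


conflicts = find_conflicts(("C", "1.8.0"), PINS)
for name, versions in conflicts.items():
    print(name, "is required at", sorted(versions))
```

Only one version of A can be installed into a single Python environment, so C cannot satisfy both pins at once.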

  22. But in a monorepo
    There is no versioning or publishing.
    All the consumers are right there in the same repo.
    Breakages are immediately visible.

  23. Monorepos are more flexible
    Easier to refactor
    Easier to debug
    Easier to discover and reuse code
    Unified change history

  24. Your codebase ➜ your organization
    Balkanized codebase ➜ balkanized org
    Unified codebase ➜ unified org

  25. 3. Tooling for a Python monorepo

  26. Build Performance At Scale
    Standard Python tools are not designed for monorepos.
    Small changes trigger full rebuilds.
    As your codebase grows, so do your build times.

  27. How to speed things up
    Do less work
    ● Fine-grained invalidation
    ● Caching
    Do more work at once
    ● Concurrency
    ● Remote execution
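
To illustrate the "do less work" half, here is a minimal sketch of caching keyed on input content, which is one common way to get fine-grained invalidation. The `ContentCache` class and `count_lines` step are hypothetical stand-ins, not the API of any particular build tool:

```python
import hashlib


class ContentCache:
    """Memoize a function of file contents, keyed by a hash of those contents."""

    def __init__(self):
        self._results = {}
        self.misses = 0

    def run(self, func, sources):
        # `sources` maps path -> bytes. The key covers only these inputs,
        # so edits to unrelated files never invalidate this entry.
        digest = hashlib.sha256()
        for path in sorted(sources):
            digest.update(path.encode())
            digest.update(b"\0")
            digest.update(sources[path])
            digest.update(b"\0")
        key = (func.__name__, digest.hexdigest())
        if key not in self._results:
            self.misses += 1
            self._results[key] = func(sources)
        return self._results[key]


def count_lines(sources):
    # Stand-in for an expensive step such as running tests or a linter.
    return sum(content.count(b"\n") for content in sources.values())


cache = ContentCache()
foo = {"foo.py": b"x = 1\n"}
bar = {"bar.py": b"y = 2\nz = 3\n"}

cache.run(count_lines, foo)   # first run: computed
cache.run(count_lines, bar)   # first run: computed
bar["bar.py"] = b"y = 2\n"    # edit bar.py only
cache.run(count_lines, foo)   # unchanged inputs: served from the cache
cache.run(count_lines, bar)   # changed inputs: recomputed
print(cache.misses)
```

Only the step whose inputs changed is re-run; the work for `foo.py` is reused.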

  28. What kind of tooling has these features?
    To work effectively, you need a build system
    designed for monorepos.
    It sits on top of existing standard tools, and
    orchestrates them for you.

  29. Examples of such tools include
    Pants
    Bazel
    Buck

  30. How do these tools work?
    ● Goal-based command interface
    ● Build graph metadata
    ● Extensible workflow with no side-effects

  31. Goals
    A monorepo build system typically supports
    requesting goals on specific inputs.
    $ pants test src/python/foo/bar/test.py
    $ pants package src/python/foo/**
    $ pants lint fmt --changed-since=HEAD

  32. Code dependencies
    A monorepo build system requires extra metadata to
    describe the build graph: the units of code and the
    dependencies between them.
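
As an illustration of what such a build graph contains, the sketch below derives one from import statements using the standard `ast` module. The module names and the in-memory `SOURCES` table are hypothetical, and this is only a picture of the concept, not how any particular build system stores or computes its metadata:

```python
import ast

# Hypothetical in-memory sources: module name -> source text.
SOURCES = {
    "foo.models": "import json\n",
    "foo.util": "",
    "foo.api": "from foo import models\nimport foo.util\n",
}


def first_party_deps(module, sources):
    """Return the modules in `sources` that `module` imports directly."""
    deps = set()
    for node in ast.walk(ast.parse(sources[module])):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from pkg import name` may name the module pkg.name or a symbol in pkg.
            candidates = [f"{node.module}.{alias.name}" for alias in node.names]
            candidates.append(node.module)
        else:
            continue  # not an import, or a relative import (ignored in this sketch)
        # Keep only imports that resolve to code in this repo.
        deps.update(c for c in candidates if c in sources and c != module)
    return deps


build_graph = {module: first_party_deps(module, SOURCES) for module in SOURCES}
for module in sorted(build_graph):
    print(module, "->", sorted(build_graph[module]))
```

Third-party and standard-library imports such as `json` are dropped here; a real tool would map them to external requirements instead.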

  33. Task dependencies
    A monorepo build system maintains the rule graph:
    The units of work and the dependencies between
    them.
    Custom rules can be plugged in, for extensibility.

  34. Build workflow
    Code dependencies + task dependencies = workflow.
    Workflow recursively maps initial inputs to final
    outputs - the goals the user requested.
    Workflow is side effect-free.

  35. The explicitly-modelled workflow enables
    ● fine-grained invalidation
    ● caching
    ● concurrency
    ● remote execution
    Which is what makes builds scale with your
    codebase!
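
A minimal sketch of why an explicit workflow graph enables concurrency: once the dependencies between steps are known, every step whose inputs are finished can run at the same time. It uses the standard-library `graphlib` module (Python 3.9+); the step names and the `run_step` function are hypothetical placeholders:

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical workflow: step -> the steps whose outputs it needs.
WORKFLOW = {
    "lint foo": set(),
    "test foo": set(),
    "test bar": set(),
    "package app": {"test foo", "test bar"},
}


def run_step(name):
    # Stand-in for real work. It has no side effects, so it is safe
    # to run concurrently with any other step.
    return f"done: {name}"


def execute(workflow, max_workers=4):
    """Run all steps, executing each batch of ready steps in parallel."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # steps whose dependencies are all done
            for name, result in zip(ready, pool.map(run_step, ready)):
                results[name] = result
                sorter.done(name)
    return results


results = execute(WORKFLOW)
print(list(results)[-1])
```

This version runs steps in waves for simplicity; a production scheduler would start each step as soon as its own dependencies finish, and could dispatch it to a remote worker instead of a local thread.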

  36. Summary
    ● Monorepos are an effective codebase
    architecture
    ● They require appropriate tooling for performance
    and reliability at scale
    ● This tooling exists!

  37. Thanks for attending!
    I'll be happy to take any questions.
    You can also find us at https://www.pantsbuild.org/
