
How the Pants build system leverages Python 3 features


Pants is a scalable, stable build system for monorepos, written in Rust and Python. Traditionally, hermeticity, invalidation, caching, concurrency, and remote execution have been difficult to implement generically in an extensible system. In this talk we'll show how the new Pants execution engine leverages Python 3 features, such as async coroutines, dataclasses, and type annotations, to provide these benefits automatically.

Benjy

June 09, 2021



Transcript

  1. How the Pants build system
    leverages Python 3 features
    Benjy Weinberger
    Co-founder, Toolchain


  2. Overview
    1. What is Pants?
    2. How does Pants leverage Python 3?


  3. 1. What is Pants?


  4. Pants
    Pants is an open-source build system, designed for:
    ● Performance
    ● Scalability
    ● Reproducibility
    ● Heterogeneity
    ● Extensibility
    Pants supports repos of all sizes, and is particularly useful if you expect your repo
    to scale, even into a full-fledged "monorepo".


  5. Pants functionality
    Pants supports a wide variety of build functionality, including:
    ● Resolving transitive dependencies
    ● Running tests
    ● Code generation
    ● Linting and formatting
    ● Packaging
    ● Debugging
    It's easy to add your own custom build steps to Pants.


  6. Pants Python support
    In the Python domain, Pants uses standard underlying tools and formats:
    ● Resolving: pex, pip
    ● Running tests: pytest
    ● Code generation: protoc
    ● Linting and formatting: bandit, black, docformatter, flake8, isort, pylint.
    ● Packaging: Executable .pex files, AWS Lambdas.
    ● Debugging: Python REPL, IPython, pdb.


  7. Pants History
    ● Started at Twitter in 2010. Picked up at Foursquare in 2011 and developed
    into a full-fledged open-source project.
    ● Inspired by Google’s internal build system, Blaze (itself since open-sourced
    under the name Bazel).


  8. Invoking Pants
    Pants invocations use a "goal" paradigm rather than a "command" paradigm:
    ./pants test 'src/python/**/*_integration.py'
    means "give me the results of all the matching tests".
    This may mean actually running tests, but could also mean fetching results from
    cache. So rerunning the same goal with the same arguments is generally
    instantaneous.


  9. Pants OSS Governance
    ● Apache 2 License
    ● Regular releases
    ● Stable API with strict deprecation cycles
    ● Easy to install
    ● Code of conduct
    ● A Slack workspace
    See pants.readme.io


  10. Pants v1 vs. v2
    The v1 Pants execution engine was almost ten years old, and showing its age.
    It did not sufficiently isolate build work from global state and side effects, so
    getting things like caching, concurrency and remote execution right was hard.
    The v2 engine is a complete redesign of Pants, based on lessons learned from v1.


  11. Implementation of Pants v2
    There are two main layers:
    ● An execution engine.
    ○ Figures out the minimum work that needs to be done to achieve the requested goal, and
    sequences the work steps with as much concurrency as possible.
    ● Actual build logic to perform that work.
    ○ A lot of useful functionality is provided out of the box, and it's easy to add your own.
    The engine is written in Rust, for performance.
    Build logic is written in Python, for ease of use.


  12. Language Support in Pants v2
    ● Support for building Python code has been ported to v2
    ● Other language support is coming (fortunately this is a lot easier than in v1)


  13. 2. How does Pants leverage Python 3?


  14. What does the new engine provide?
    The engine model guarantees straightforward semantics for:
    ● Fine-grained invalidation
    ● Caching
    ● Concurrency
    ● Remote execution
    These are key to both performance and correctness in a build.
    As a custom build logic author, you don't need to worry about these. They fall out
    of the design, and you get them for free.


  15. Python 3 features
    The Pants v2 engine leverages the following Python 3 features to achieve those
    robust invalidation, caching, concurrency and remote execution semantics:
    ● async/await coroutines
    ● type annotations
    ● dataclasses
    Let's see how!


  16. How are goals satisfied in v2?
    The engine operates on a collection of rules.
    A rule is a pure function (or, more precisely, a coroutine) that maps a set of
    statically-declared input types to an output type.
    @rule
    async def run_python_test(
        test_target: PythonTests,
        pytest: PyTest,
        python_setup: PythonSetup,
        test_options: TestOptions,
    ) -> TestResult:
        """Runs pytest for one target."""
        ...
    Note the standard Python async and type annotation syntax.
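To make the role of the annotations concrete, here is a minimal sketch of how a decorator could read a rule's declared input and output types. It is not Pants' actual `@rule` implementation: `RuleSignature`, `REGISTRY`, and the checks performed are assumptions made for illustration, and the classes are empty placeholders.

```python
import inspect
import typing
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSignature:
    """What a hypothetical engine might record about one rule."""
    name: str
    input_types: tuple
    output_type: type


REGISTRY = []  # hypothetical registry, for illustration only


def rule(func):
    """Toy stand-in for a @rule decorator: record the annotated signature."""
    if not inspect.iscoroutinefunction(func):
        raise TypeError(f"{func.__name__} must be declared with 'async def'")
    hints = typing.get_type_hints(func)
    if "return" not in hints:
        raise TypeError(f"{func.__name__} needs a return annotation")
    params = list(inspect.signature(func).parameters)
    missing = [p for p in params if p not in hints]
    if missing:
        raise TypeError(f"{func.__name__} has unannotated parameters: {missing}")
    REGISTRY.append(
        RuleSignature(func.__name__, tuple(hints[p] for p in params), hints["return"])
    )
    return func


# Empty placeholder types, named after those on the slide.
class PythonTests: ...
class PyTest: ...
class TestResult: ...


@rule
async def run_python_test(test_target: PythonTests, pytest: PyTest) -> TestResult:
    return TestResult()
```

After the decorator runs, `REGISTRY[0]` holds the input types `(PythonTests, PyTest)` and the output type `TestResult`, which is all the information a type-driven engine would need to wire this rule to others.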


  17. The rule graph
    ● Rules are registered with the engine.
    ● A rule says "given inputs of these types, I produce an output of this type".
    ● The engine computes a graph representing these available type transitions.
    ● A set of root types is provided by the system itself.
    ● A goal is mapped to a final type that represents a result.
    ● The engine recursively computes a path from "types we have" to
    "type we need".
    Computing this rule graph requires full type annotation on all rules.
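The search from "types we have" to "type we need" can be sketched as a small recursive planner. This is a simplification, not the engine's algorithm: it assumes exactly one rule per output type, represents types by name, and the rule and type names other than `run_python_test`, `PythonTests`, `PyTest`, and `TestResult` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    inputs: tuple  # names of the input types
    output: str    # name of the output type


def plan(goal, roots, rules):
    """Return rule names in an order that produces `goal` from `roots`.

    Raises LookupError if a needed type has no producing rule, or if the
    rules depend on each other in a cycle.
    """
    producers = {r.output: r for r in rules}  # assumes one rule per output
    order, have, in_progress = [], set(roots), set()

    def satisfy(type_name):
        if type_name in have:
            return
        if type_name in in_progress:
            raise LookupError(f"cycle while resolving {type_name}")
        if type_name not in producers:
            raise LookupError(f"no rule produces {type_name}")
        in_progress.add(type_name)
        rule = producers[type_name]
        for needed in rule.inputs:
            satisfy(needed)
        in_progress.discard(type_name)
        have.add(type_name)
        order.append(rule.name)

    satisfy(goal)
    return order


RULES = [
    Rule("find_targets", ("Specs",), "PythonTests"),
    Rule("resolve_pytest", ("Options",), "PyTest"),
    Rule("run_python_test", ("PythonTests", "PyTest"), "TestResult"),
]

print(plan("TestResult", {"Specs", "Options"}, RULES))
# prints ['find_targets', 'resolve_pytest', 'run_python_test']
```

Because the planner only looks at declared input and output types, a rule with a missing annotation could not be placed in the graph at all, which is why full annotation is required.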


  18. Rule graph validation and extension
    ● The rules are statically validated for ambiguity, reachability, satisfiability.
    ● Anyone can write and register additional rules, to extend functionality. No
    wiring necessary!
    ● In case it seemed familiar: this is basically statically checked dependency
    injection ("static" in the sense that the entire rule graph is validated up-front,
    and won't fail arbitrarily at runtime).
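As a rough illustration of what "validated up-front" can mean, the sketch below checks a set of rules before any of them runs. It is a deliberately simplified assumption, not the engine's validation: it treats two rules with the same output type as ambiguous (the real engine may distinguish rules by their inputs), and every name in it is hypothetical.

```python
from collections import Counter


def validate(rules, roots):
    """Check rules given as (name, input_types, output_type) tuples.

    Returns a list of error strings; an empty list means the set passed
    these two toy checks.
    """
    errors = []
    counts = Counter(output for _, _, output in rules)
    for output, n in counts.items():
        if n > 1:
            errors.append(f"ambiguous: {n} rules produce {output}")
    available = set(roots) | set(counts)
    for name, inputs, _ in rules:
        for needed in inputs:
            if needed not in available:
                errors.append(f"unsatisfiable: {name} needs {needed}")
    return errors


problems = validate(
    [("a", ("Specs",), "X"), ("b", ("Missing",), "X")],
    roots={"Specs"},
)
print(problems)
# prints ['ambiguous: 2 rules produce X', 'unsatisfiable: b needs Missing']
```

The point of the sketch is the timing: both problems are reported from the declared signatures alone, before any build work starts.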


  19. Type transitions
    ● The engine transitions between types by invoking a rule on a set of inputs of
    given input types, resulting in an output of the desired output type.
    ● The input types must be immutable and hashable.
    ● This is typically achieved by making them frozen dataclasses.
    ● Rules cannot rely on side-effects.
    Result: The output of any rule can be safely cached on the hash of its inputs.
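A short sketch of why frozen dataclasses make this caching safe. The `run_cached` helper and the in-memory `CACHE` are assumptions for illustration and say nothing about how Pants stores results; the `plugins` field is a tuple here because a dataclass holding a list would not be hashable.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class PytestConfig:
    version: str
    plugins: tuple  # a tuple, not a list, so instances stay hashable


CACHE = {}  # hypothetical in-memory cache keyed on (rule name, inputs)
CALLS = []  # records every time a rule body actually runs


def run_cached(rule_fn, *inputs):
    """Run rule_fn(*inputs) once per distinct set of input values."""
    key = (rule_fn.__name__, inputs)
    if key not in CACHE:
        CALLS.append(key)
        CACHE[key] = rule_fn(*inputs)
    return CACHE[key]


def describe(config):
    return f"{config.version} with {len(config.plugins)} plugin(s)"


a = PytestConfig("pytest>=5.3.5,<5.4", ("pytest-timeout>=1.3.4,<1.4",))
b = PytestConfig("pytest>=5.3.5,<5.4", ("pytest-timeout>=1.3.4,<1.4",))

run_cached(describe, a)
run_cached(describe, b)  # equal value, equal hash: served from the cache

try:
    a.version = "something else"  # frozen instances reject mutation
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False
```

`a` and `b` are distinct objects with equal field values, so they hash alike and `describe` runs only once. Because instances cannot be mutated after construction, a cache key can never silently drift away from the value it was computed for.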


  20. Rules are coroutines
    Rules declare inputs they know about in advance. But as a rule runs, if it decides it
    needs some other input, it yields back to the engine.
    pytest_binary = await Get[PyTest](
        PytestConfig(
            version="pytest>=5.3.5,<5.4",
            plugins=["pytest-timeout>=1.3.4,<1.4", "pytest-cov>=2.8.1,<2.9"],
        )
    )
    Again, this is standard Python 3 async and type annotation syntax.
    Note, however, that the event loop is run by the Pants engine, in Rust code, and
    not by asyncio.run().
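The following sketch shows how a coroutine can be driven without asyncio, which is the general mechanism that lets an engine own the event loop. The `Get` class here is a toy awaitable, not Pants' `Get`, and `run` and `resolve` are hypothetical stand-ins for the engine, which in Pants is implemented in Rust.

```python
class Get:
    """Toy awaitable request; hypothetical, not Pants' actual Get."""

    def __init__(self, output_type, subject):
        self.output_type = output_type
        self.subject = subject

    def __await__(self):
        # Hand this request to whoever is driving the coroutine, then
        # resume with whatever value the driver sends back.
        result = yield self
        return result


def run(coro, resolve):
    """Drive `coro` to completion, answering each request via resolve()."""
    try:
        request = coro.send(None)  # start: runs up to the first await
        while True:
            request = coro.send(resolve(request))
    except StopIteration as done:
        return done.value  # the coroutine's return value


async def describe_pytest(version):
    binary = await Get(str, version)
    return f"resolved {binary}"


def resolve(get):
    # A real engine could consult a cache or run other rules here.
    return f"binary-for-{get.subject}"


print(run(describe_pytest("pytest>=5.3.5,<5.4"), resolve))
# prints resolved binary-for-pytest>=5.3.5,<5.4
```

Each `await` suspends the rule and returns control to the driver, which is free to decide how and when to produce the requested value before resuming the rule.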


  21. Rules are coroutines (contd.)
    This is very powerful! Rules are applied dynamically, on the fly, rather than
    execution being precomputed statically.
    However, even in this case, rules are still statically validated for ambiguity,
    reachability, satisfiability.
    Result: Rule authors can apply a natural control flow, including branching and
    looping, in the context of a statically validated rule graph.


  22. Rules can express concurrency
    A rule can await multiple engine requests at once:
    test_results = await MultiGet(
        Get[TestResult](TestTarget, target)
        for target in targets
    )
    The engine will execute these concurrently. And because I/O, process execution,
    and caching are implemented in Rust, those portions will frequently execute in
    parallel.
    (This is analogous to asyncio.gather.)
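For readers who know asyncio better than Pants, the same pattern looks like this with `asyncio.gather`. It is only an analogy: Pants rules are driven by its own engine rather than an asyncio loop, and `run_test` with its sleep is a hypothetical stand-in for real test execution.

```python
import asyncio


async def run_test(target):
    await asyncio.sleep(0.01)  # stand-in for running a test process
    return f"{target}: passed"


async def run_all(targets):
    # One awaitable per target, awaited together. Results come back in
    # the same order as the inputs, regardless of completion order.
    return await asyncio.gather(*(run_test(t) for t in targets))


results = asyncio.run(run_all(["a_test.py", "b_test.py"]))
print(results)
# prints ['a_test.py: passed', 'b_test.py: passed']
```

The two sleeps overlap rather than running back to back, which is the same benefit a rule gets from requesting many results at once instead of awaiting them one by one in a loop.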


  23. Rules must be pure
    Generally, rules must be deterministic, and must not have, or rely on, side-effects.
    E.g., instead of accessing the filesystem directly, you ask the engine to do it:
    init_files = await Get[Snapshot](PathGlobs(["**/__init__.py"]))
    Or, to access the network:
    snapshot = await Get[Snapshot](UrlToFetch("https://pants.readme.io"))
    Or, to run a process:
    result = await Get[ProcessResult](Process(argv=["/bin/echo", "hello world"]))
    Only the outermost rule that computes the final result may have certain side
    effects (e.g., to write results to local disk).


  24. In short
    Type annotations:
    Allow rules to be wired together automatically.
    Frozen dataclasses:
    Provide a stable fingerprint for all inputs and outputs, so that invalidation and
    caching are always correct (assuming rules are in fact side-effect free).
    async/await:
    Provide control points for the engine to:
    ● retrieve cached results
    ● execute rules concurrently
    ● execute processes remotely


  25. Summary
    Python 3 features allow us to expose a simple programming model for a complex
    system: You write natural-looking Python 3 code, and things like caching,
    concurrency and remote execution "just happen".
    This is a testament to the design of Python 3!


  26. Thanks for listening!
    I'll be happy to take any questions.
    We're always happy to hear from you at:
    pants.readme.io
    [email protected]
