Slide 1

Slide 1 text

How the Pants build system leverages Python 3 features
Benjy Weinberger
Co-founder, Toolchain

Slide 2

Slide 2 text

Overview
1. What is Pants?
2. How does Pants leverage Python 3?

Slide 3

Slide 3 text

1. What is Pants?

Slide 4

Slide 4 text

Pants
Pants is an open-source build system, designed for:
● Performance
● Scalability
● Reproducibility
● Heterogeneity
● Extensibility
Pants supports repos of all sizes, and is particularly useful if you expect your repo to scale, even into a full-fledged "monorepo".

Slide 5

Slide 5 text

Pants functionality
Pants supports a wide variety of build functionality, including:
● Resolving transitive dependencies
● Running tests
● Code generation
● Linting and formatting
● Packaging
● Debugging
It's easy to add your own custom build steps to Pants.

Slide 6

Slide 6 text

Pants Python support
In the Python domain, Pants uses standard underlying tools and formats:
● Resolving: pex, pip
● Running tests: pytest
● Code generation: protoc
● Linting and formatting: bandit, black, docformatter, flake8, isort, pylint
● Packaging: executable .pex files, AWS Lambdas
● Debugging: Python REPL, IPython, pdb

Slide 7

Slide 7 text

Pants History
● Started at Twitter in 2010. Picked up at Foursquare in 2011 and developed into a full-fledged open-source project.
● Inspired by Google’s internal build system, Blaze (itself since open-sourced under the name Bazel).

Slide 8

Slide 8 text

Invoking Pants
Pants invocations use a "goal" paradigm rather than a "command" paradigm:

    ./pants test 'src/python/**/*_integration.py'

means "give me the results of all the matching tests".
This may mean actually running tests, but could also mean fetching results from cache. So rerunning the same goal with the same arguments is generally instantaneous.

Slide 9

Slide 9 text

Pants OSS Governance
● Apache 2 License
● Regular releases
● Stable API with strict deprecation cycles
● Easy to install
● Code of conduct
● A Slack workspace
See pants.readme.io

Slide 10

Slide 10 text

Pants v1 vs. v2
The v1 Pants execution engine was almost ten years old, and showing its age.
It did not sufficiently isolate build work from global state and side effects, so getting things like caching, concurrency and remote execution right was hard.
The v2 engine is a complete redesign of Pants, based on lessons learned from v1.

Slide 11

Slide 11 text

Implementation of Pants v2
There are two main layers:
● An execution engine.
  ○ Figures out the minimum work that needs to be done to achieve the requested goal, and sequences the work steps with as much concurrency as possible.
● Actual build logic to perform that work.
  ○ A lot of useful functionality is provided out of the box, and it's easy to add your own.
The engine is written in Rust, for performance.
Build logic is written in Python, for ease of use.

Slide 12

Slide 12 text

Language Support in Pants v2
● Support for building Python code has been ported to v2
● Other language support is coming (fortunately this is a lot easier than in v1)

Slide 13

Slide 13 text

2. How does Pants leverage Python 3?

Slide 14

Slide 14 text

What does the new engine provide?
The engine model guarantees straightforward semantics for:
● Fine-grained invalidation
● Caching
● Concurrency
● Remote execution
These are key to both performance and correctness in a build.
As a custom build logic author, you don't need to worry about these. They fall out of the design, and you get them for free.

Slide 15

Slide 15 text

Python 3 features
The Pants v2 engine leverages the following Python 3 features to achieve those robust invalidation, caching, concurrency and remote execution semantics:
● async/await coroutines
● type annotations
● dataclasses
Let's see how!

Slide 16

Slide 16 text

How are goals satisfied in v2?
The engine operates on a collection of rules.
A rule is a pure function (or, more precisely, a coroutine) that maps a set of statically-declared input types to an output type.

    @rule
    async def run_python_test(
        test_target: PythonTests,
        pytest: PyTest,
        python_setup: PythonSetup,
        test_options: TestOptions,
    ) -> TestResult:
        """Runs pytest for one target."""
        ...

Note the standard Python async and type annotation syntax.
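
As a rough illustration of how a decorator can recover a rule's declared inputs and output from ordinary annotations, here is a minimal, self-contained sketch. It is not the Pants implementation: the `rule` decorator, the `RULES` registry, and the `SourceFile` and `LineCount` types are hypothetical stand-ins built only on the standard library.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import get_type_hints

# Hypothetical registry: output type -> (input types, rule function).
RULES = {}


def rule(func):
    """Record a coroutine function's input and output types from its annotations."""
    if not inspect.iscoroutinefunction(func):
        raise TypeError(f"{func.__name__} must be declared with 'async def'")
    hints = get_type_hints(func)
    if "return" not in hints:
        raise TypeError(f"{func.__name__} needs a return annotation")
    output_type = hints.pop("return")
    params = inspect.signature(func).parameters
    missing = [name for name in params if name not in hints]
    if missing:
        raise TypeError(f"{func.__name__} has unannotated parameters: {missing}")
    input_types = tuple(hints[name] for name in params)
    RULES[output_type] = (input_types, func)
    return func


@dataclass(frozen=True)
class SourceFile:
    path: str


@dataclass(frozen=True)
class LineCount:
    count: int


@rule
async def count_lines(source: SourceFile) -> LineCount:
    # Placeholder logic for the sketch: uses the path length as a fake count.
    return LineCount(count=len(source.path))


# The registry now knows that LineCount can be produced from a SourceFile.
registered_inputs, registered_func = RULES[LineCount]
result = asyncio.run(registered_func(SourceFile("a.py")))
```

Because the decorator rejects unannotated parameters, every registered rule carries enough type information to be wired to other rules without any manual configuration.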

Slide 17

Slide 17 text

The rule graph
● Rules are registered with the engine.
● A rule says "given inputs of these types, I produce an output of this type".
● The engine computes a graph representing these available type transitions.
● A set of root types is provided by the system itself.
● A goal is mapped to a final type that represents a result.
● The engine recursively computes a path from "types we have" to "type we need".
Computing this rule graph requires full type annotation on all rules.
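
The recursive search from "types we have" to "type we need" can be sketched in a few lines. This is a toy model under stated assumptions, not the engine's algorithm: types are represented by plain strings, each output type has exactly one producing rule, and the `RULES` table, `ROOT_TYPES` set and `plan` function are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rule table: output type name -> input type names.
RULES: Dict[str, Tuple[str, ...]] = {
    "TestResult": ("PythonTests", "PyTest"),
    "PyTest": ("PytestConfig",),
    "PythonTests": ("Specs",),
}

# Hypothetical root types, assumed to be supplied by the system itself.
ROOT_TYPES: Set[str] = {"Specs", "PytestConfig"}


def plan(goal: str, roots: Set[str], rules: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Return the types to compute, in dependency order, to reach `goal`."""
    order: List[str] = []
    visiting: Set[str] = set()

    def visit(type_name: str) -> None:
        if type_name in roots or type_name in order:
            return  # already available
        if type_name in visiting:
            raise ValueError(f"cycle detected at {type_name}")
        if type_name not in rules:
            raise ValueError(f"no rule produces {type_name}")
        visiting.add(type_name)
        for dependency in rules[type_name]:
            visit(dependency)
        visiting.discard(type_name)
        order.append(type_name)

    visit(goal)
    return order


steps = plan("TestResult", ROOT_TYPES, RULES)
# steps == ["PythonTests", "PyTest", "TestResult"]
```

A missing annotation would leave a hole in this table, which is why the graph can only be computed when every rule is fully annotated.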

Slide 18

Slide 18 text

Rule graph validation and extension
● The rules are statically validated for ambiguity, reachability, satisfiability.
● Anyone can write and register additional rules, to extend functionality. No wiring necessary!
● In case it seemed familiar: this is basically statically checked dependency injection ("static" in the sense that the entire rule graph is validated up-front, and won't fail arbitrarily at runtime).
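
To make up-front validation concrete, the sketch below checks a toy rule set before anything runs. It uses a deliberately simplified notion of the checks (ambiguity means two rules produce the same output type; unsatisfiable means an input type is neither a root nor produced by any rule), which is an assumption for illustration and likely coarser than what the real engine does. The `validate` function and the tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple

# Hypothetical rule description: (rule name, input type names, output type name).
Rule = Tuple[str, Tuple[str, ...], str]


def validate(rules: Iterable[Rule], roots: Set[str]) -> List[str]:
    """Return a list of problems found in a toy rule set, without running any rule."""
    rules = list(rules)
    problems: List[str] = []

    producers = defaultdict(list)
    for name, _inputs, output in rules:
        producers[output].append(name)

    # Simplified ambiguity check: more than one rule for the same output type.
    for output, names in producers.items():
        if len(names) > 1:
            problems.append(f"ambiguous: {output} is produced by {sorted(names)}")

    # Simplified satisfiability check: every input must come from somewhere.
    for name, inputs, _output in rules:
        for needed in inputs:
            if needed not in roots and needed not in producers:
                problems.append(f"unsatisfiable: {name} needs {needed}")

    return problems


example_rules = [
    ("a", ("Root",), "X"),
    ("b", ("Root",), "X"),
    ("c", ("Missing",), "Y"),
]
report = validate(example_rules, {"Root"})
```

The point of the sketch is the timing: all problems are reported when rules are registered, rather than surfacing midway through a build.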

Slide 19

Slide 19 text

Type transitions
● The engine transitions between types by invoking a rule on a set of inputs of given input types, resulting in an output of the desired output type.
● The input types must be immutable and hashable.
● This is typically achieved by making them frozen dataclasses.
● Rules cannot rely on side-effects.
Result: The output of any rule can be safely cached on the hash of its inputs.
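
A small standard-library sketch of why frozen dataclasses make caching safe: equal inputs hash equally, and they cannot be mutated after they have been used as a cache key. The `run_cached` helper, the `CACHE` dictionary and the `describe` function are hypothetical illustrations, and this `PytestConfig` is a stand-in defined here, not the Pants class.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Tuple


@dataclass(frozen=True)
class PytestConfig:
    version: str
    plugins: Tuple[str, ...]  # a tuple rather than a list, so instances stay hashable


CACHE = {}
CALLS = []  # records each real (non-cached) invocation


def run_cached(rule_func, request):
    """Invoke a pure function at most once per distinct (function, input) pair."""
    key = (rule_func.__name__, request)
    if key not in CACHE:
        CALLS.append(key)
        CACHE[key] = rule_func(request)
    return CACHE[key]


def describe(config: PytestConfig) -> str:
    return f"{config.version} with {len(config.plugins)} plugin(s)"


first = run_cached(describe, PytestConfig("pytest>=5.3.5,<5.4", ("pytest-timeout>=1.3.4,<1.4",)))
# A separately constructed but equal input hits the cache.
second = run_cached(describe, PytestConfig("pytest>=5.3.5,<5.4", ("pytest-timeout>=1.3.4,<1.4",)))

# Frozen instances reject mutation, so a cache key can never change under us.
try:
    PytestConfig("pytest", ()).version = "other"
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

The cache is only correct if `describe` is free of side effects, which is exactly the constraint the slide places on rules.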

Slide 20

Slide 20 text

Rules are coroutines
Rules declare inputs they know about in advance. But as a rule runs, if it decides it needs some other input, it yields back to the engine.

    pytest_binary = await Get[PyTest](
        PytestConfig(
            version="pytest>=5.3.5,<5.4",
            plugins=["pytest-timeout>=1.3.4,<1.4", "pytest-cov>=2.8.1,<2.9"],
        )
    )

Again, this is standard Python 3 async and type annotation syntax.
Note, however, that the event loop is run by the Pants engine, in Rust code, and not by asyncio.run().
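
Python coroutines do not require asyncio: any code that calls `send()` on a coroutine object can act as its event loop. The sketch below shows that mechanism with a tiny driver written in Python. It is only an analogy for what the slide describes, and everything in it is hypothetical: this `Get` takes the product type as a plain argument instead of using the `Get[...]` subscript syntax, and `drive`, `resolve`, `Version` and `Banner` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Get:
    """Hypothetical request: 'give me `product`, computed from `subject`'."""

    product: type
    subject: object

    def __await__(self):
        # Hand this request to whoever is driving the coroutine, then
        # return whatever value the driver sends back in.
        result = yield self
        return result


@dataclass(frozen=True)
class Version:
    text: str


@dataclass(frozen=True)
class Banner:
    text: str


async def make_banner(name: str) -> Banner:
    # The coroutine suspends here until the driver supplies a Version.
    version = await Get(Version, name)
    return Banner(f"{name} {version.text}")


def resolve(request: Get) -> object:
    """Toy resolver; a real engine could consult a cache or run another rule."""
    if request.product is Version:
        return Version("1.0")
    raise LookupError(f"cannot produce {request.product.__name__}")


def drive(coro, resolver):
    """Run `coro` to completion, answering each yielded Get via `resolver`."""
    try:
        request = coro.send(None)  # start the coroutine
        while True:
            request = coro.send(resolver(request))
    except StopIteration as stop:
        return stop.value


banner = drive(make_banner("pants"), resolve)
```

Each `await` is a point where the driver regains control, which is what lets an engine decide whether to compute, fetch from cache, or delegate the request.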

Slide 21

Slide 21 text

Rules are coroutines (contd.)
This is very powerful! Rules are applied dynamically, on the fly, rather than execution being precomputed statically.
However, even in this case, rules are still statically validated for ambiguity, reachability, satisfiability.
Result: Rule authors can apply a natural control flow, including branching and looping, in the context of a statically validated rule graph.

Slide 22

Slide 22 text

Rules can express concurrency
A rule can await multiple engine requests at once:

    test_results = await MultiGet(
        Get[TestResult](TestTarget, target) for target in targets
    )

The engine will execute these concurrently. (MultiGet is the engine's equivalent of asyncio.gather.)
And because I/O, process execution, and caching are implemented in Rust, those portions will frequently execute in parallel.
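
For readers who know asyncio better than Pants, the same fan-out pattern looks like this with `asyncio.gather`. This is the standard-library analogue only: Pants runs its own event loop, and `run_test` and `run_all` are hypothetical helpers in which a short sleep stands in for real work.

```python
import asyncio
from typing import List, Sequence


async def run_test(target: str) -> str:
    # The sleep stands in for I/O or a subprocess; other tasks run meanwhile.
    await asyncio.sleep(0.01)
    return f"{target}: passed"


async def run_all(targets: Sequence[str]) -> List[str]:
    # gather schedules all coroutines concurrently and returns their
    # results in the same order as the inputs.
    return await asyncio.gather(*(run_test(target) for target in targets))


results = asyncio.run(run_all(["a_test.py", "b_test.py"]))
# results == ["a_test.py: passed", "b_test.py: passed"]
```

Results come back in input order regardless of which request finishes first, so the caller can pair each result with its target.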

Slide 23

Slide 23 text

Rules must be pure
Generally, rules must be deterministic, and must not have, or rely on, side-effects.
E.g., instead of accessing the filesystem directly, you ask the engine to do it:

    init_files = await Get[Snapshot](PathGlobs(["**/__init__.py"]))

Or, to access the network:

    snapshot = await Get[Snapshot](UrlToFetch("https://pants.readme.io"))

Or, to run a process:

    result = await Get[ProcessResult](Process(argv=["/bin/echo", "hello world"]))

Only the outermost rule that computes the final result may have certain side effects (e.g., to write results to local disk).

Slide 24

Slide 24 text

In short
Type annotations: Allow rules to be wired together automatically.
Frozen dataclasses: Provide a stable fingerprint for all inputs and outputs, so that invalidation and caching are always correct (assuming rules are in fact side-effect free).
async/await: Provide control points for the engine to:
● retrieve cached results
● execute rules concurrently
● execute processes remotely

Slide 25

Slide 25 text

Summary
Python 3 features allow us to expose a simple programming model to a complex system:
You write natural-looking Python 3 code, and things like caching, concurrency and remote execution "just happen".
This is a testament to the design of Python 3!

Slide 26

Slide 26 text

Thanks for listening!
I'll be happy to take any questions.
We're always happy to hear from you at:
pants.readme.io
[email protected]