How the Pants build system leverages Python 3 features

Pants is a scalable, stable build system for monorepos, written in Rust and Python. Traditionally, hermeticity, invalidation, caching, concurrency, and remote execution have been difficult to implement generically in an extensible system. In this talk we'll show how the new Pants execution engine leverages Python 3 features, such as async coroutines, dataclasses, and type annotations, to provide these benefits automatically.

Benjy

June 09, 2021

Transcript

  1. Pants

    Pants is an open-source build system, designed for:
    • Performance
    • Scalability
    • Reproducibility
    • Heterogeneity
    • Extensibility
    Pants supports repos of all sizes, and is particularly useful if you expect your repo to scale, even into a full-fledged "monorepo".
  2. Pants functionality

    Pants supports a wide variety of build functionality, including:
    • Resolving transitive dependencies
    • Running tests
    • Code generation
    • Linting and formatting
    • Packaging
    • Debugging
    It's easy to add your own custom build steps to Pants.
  3. Pants Python support

    In the Python domain, Pants uses standard underlying tools and formats:
    • Resolving: pex, pip
    • Running tests: pytest
    • Code generation: protoc
    • Linting and formatting: bandit, black, docformatter, flake8, isort, pylint
    • Packaging: executable .pex files, AWS Lambdas
    • Debugging: Python REPL, IPython, pdb
  4. Pants History

    • Started at Twitter in 2010. Picked up at Foursquare in 2011 and developed into a full-fledged open-source project.
    • Inspired by Google’s internal build system, Blaze (itself since open-sourced under the name Bazel).
  5. Invoking Pants

    Pants invocations use a "goal" paradigm rather than a "command" paradigm:

        ./pants test 'src/python/**/*_integration.py'

    means "give me the results of all the matching tests". This may mean actually running tests, but could also mean fetching results from cache. So rerunning the same goal with the same arguments is generally near-instantaneous.
  6. Pants OSS Governance

    • Apache 2 License
    • Regular releases
    • Stable API with strict deprecation cycles
    • Easy to install
    • Code of conduct
    • A Slack workspace
    See pants.readme.io
  7. Pants v1 vs. v2

    The v1 Pants execution engine was almost ten years old, and showing its age. It did not sufficiently isolate build work from global state and side effects, so getting things like caching, concurrency and remote execution right was hard. The v2 engine is a complete redesign of Pants, based on lessons learned from v1.
  8. Implementation of Pants v2

    There are two main layers:
    • An execution engine.
      ◦ Figures out the minimum work that needs to be done to achieve the requested goal, and sequences the work steps with as much concurrency as possible.
    • Actual build logic to perform that work.
      ◦ A lot of useful functionality is provided out of the box, and it's easy to add your own.
    The engine is written in Rust, for performance.
    Build logic is written in Python, for ease of use.
  9. Language Support in Pants v2

    • Support for building Python code has been ported to v2.
    • Other language support is coming (fortunately this is a lot easier than in v1).
  10. What does the new engine provide?

    The engine model guarantees straightforward semantics for:
    • Fine-grained invalidation
    • Caching
    • Concurrency
    • Remote execution
    These are key to both performance and correctness in a build. As a custom build logic author, you don't need to worry about these. They fall out of the design, and you get them for free.
  11. Python 3 features

    The Pants v2 engine leverages the following Python 3 features to achieve robust invalidation, caching, concurrency and remote execution semantics:
    • async/await coroutines
    • type annotations
    • dataclasses
    Let's see how!
  12. How are goals satisfied in v2?

    The engine operates on a collection of rules. A rule is a pure function (or, more precisely, a coroutine) that maps a set of statically-declared input types to an output type.

        @rule
        async def run_python_test(
            test_target: PythonTests,
            pytest: PyTest,
            python_setup: PythonSetup,
            test_options: TestOptions,
        ) -> TestResult:
            """Runs pytest for one target."""
            ...

    Note the standard Python async and type annotation syntax.
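To make "statically-declared input types" concrete, here is a minimal sketch of how a decorator could read a coroutine's annotations using only the standard library. It is not the Pants implementation: `RULES`, this toy `rule` decorator, `Greeting`, `Banner` and `make_banner` are all hypothetical names chosen for illustration.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import get_type_hints

# Hypothetical registry: output type -> (input types, coroutine function).
RULES = {}


def rule(func):
    """Toy decorator: record the types a coroutine declares in its signature."""
    if not inspect.iscoroutinefunction(func):
        raise TypeError(f"{func.__name__} must be defined with 'async def'")
    hints = get_type_hints(func)
    params = list(inspect.signature(func).parameters)
    # Every parameter and the return value must be annotated.
    missing = [name for name in params + ["return"] if name not in hints]
    if missing:
        raise TypeError(f"{func.__name__} lacks annotations for: {missing}")
    RULES[hints["return"]] = (tuple(hints[name] for name in params), func)
    return func


@dataclass(frozen=True)
class Greeting:
    name: str


@dataclass(frozen=True)
class Banner:
    text: str


@rule
async def make_banner(greeting: Greeting) -> Banner:
    return Banner(f"Hello, {greeting.name}!")


# The registry now knows that a Banner can be produced from a Greeting.
input_types, func = RULES[Banner]
banner = asyncio.run(func(Greeting("Pants")))
```

Here `asyncio.run` drives the coroutine only to keep the sketch runnable on its own; the point is that the signature alone tells a registry what a rule consumes and produces.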
  13. The rule graph

    • Rules are registered with the engine.
    • A rule says "given inputs of these types, I produce an output of this type".
    • The engine computes a graph representing these available type transitions.
    • A set of root types is provided by the system itself.
    • A goal is mapped to a final type that represents a result.
    • The engine recursively computes a path from "types we have" to "type we need".
    Computing this rule graph requires full type annotation on all rules.
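The recursive search from "types we have" to "type we need" can be sketched in a few lines. This is a simplified illustration, not the engine's algorithm: types are plain strings, each type has at most one producing rule, and the `RULES` table, the `Options` and `Addresses` root types, and the `plan` function are assumptions made up for the example.

```python
# Hypothetical rule table: output type name -> names of the input types it needs.
RULES = {
    "TestResult": ("PythonTests", "PyTest"),
    "PyTest": ("PytestConfig",),
    "PytestConfig": ("Options",),
    "PythonTests": ("Addresses",),
}


def plan(needed, roots, rules, _seen=()):
    """Return the types to compute, in order, to get `needed` from `roots`."""
    if needed in roots:
        return []  # Already available: nothing to compute.
    if needed in _seen:
        raise ValueError(f"cycle while resolving {needed}")
    if needed not in rules:
        raise ValueError(f"no rule produces {needed}")
    steps = []
    for input_type in rules[needed]:
        # Resolve each input first, skipping steps that are already planned.
        for step in plan(input_type, roots, rules, _seen + (needed,)):
            if step not in steps:
                steps.append(step)
    steps.append(needed)
    return steps


order = plan("TestResult", frozenset({"Options", "Addresses"}), RULES)
# order == ["PythonTests", "PytestConfig", "PyTest", "TestResult"]
```

Because the table is built purely from declared types, the whole plan can be computed (or rejected) before any rule body runs.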
  14. Rule graph validation and extension

    • The rules are statically validated for ambiguity, reachability, and satisfiability.
    • Anyone can write and register additional rules, to extend functionality. No wiring necessary!
    • In case it seemed familiar: this is basically statically checked dependency injection ("static" in the sense that the entire rule graph is validated up-front, and won't fail arbitrarily at runtime).
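As a rough picture of what up-front validation means, the sketch below checks a rule table before anything executes. It covers only two deliberately simplified checks (two rules producing the same type, and an input that nothing provides); the real engine's notions of ambiguity and reachability are more refined. The `validate` function and the tuple layout `(rule name, input types, output type)` are hypothetical.

```python
from collections import Counter


def validate(rules, roots):
    """Return a list of problems found before any rule runs."""
    problems = []
    # Simplified ambiguity check: more than one rule produces the same type.
    counts = Counter(output for _, _, output in rules)
    for output, n in counts.items():
        if n > 1:
            problems.append(f"ambiguous: {n} rules produce {output}")
    # Simplified satisfiability check: every input is a root or some rule's output.
    available = set(roots) | set(counts)
    for name, inputs, _ in rules:
        for input_type in inputs:
            if input_type not in available:
                problems.append(f"unsatisfiable: {name} needs {input_type}")
    return problems


good = [
    ("find_tests", ("Addresses",), "PythonTests"),
    ("run_tests", ("PythonTests",), "TestResult"),
]
bad = good + [
    ("run_tests_again", ("PythonTests",), "TestResult"),
    ("report", ("Coverage",), "Report"),
]

good_problems = validate(good, roots={"Addresses"})
bad_problems = validate(bad, roots={"Addresses"})
```

With these inputs, `good_problems` is empty, while `bad_problems` reports one ambiguity and one unsatisfiable input, all without invoking a single rule.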
  15. Type transitions

    • The engine transitions between types by invoking a rule on a set of inputs of given input types, resulting in an output of the desired output type.
    • The input types must be immutable and hashable.
    • This is typically achieved by making them frozen dataclasses.
    • Rules cannot rely on side-effects.
    Result: The output of any rule can be safely cached on the hash of its inputs.
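The link between frozen dataclasses and caching can be shown with the standard library alone. In this sketch, `ToolRequest`, `resolve`, `_cache` and `call_log` are hypothetical names; the only assumption about the real system is the one stated above, that inputs are immutable and hashable.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToolRequest:
    version: str
    plugins: Tuple[str, ...]  # A tuple rather than a list, so instances hash.


_cache = {}
call_log = []  # Records each time the "expensive" work actually runs.


def resolve(request):
    """Do the work once per distinct input, keyed on the input itself."""
    if request not in _cache:
        call_log.append(request)
        _cache[request] = f"{request.version} + {len(request.plugins)} plugin(s)"
    return _cache[request]


a = ToolRequest("pytest>=5.3.5,<5.4", ("pytest-cov>=2.8.1,<2.9",))
b = ToolRequest("pytest>=5.3.5,<5.4", ("pytest-cov>=2.8.1,<2.9",))

first = resolve(a)
second = resolve(b)  # Equal fields mean equal hash, so this is a cache hit.

try:
    a.version = "pytest>=6"  # Frozen instances reject mutation.
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Two separately constructed but equal requests share one cache entry, and because instances cannot be mutated, a cached key can never drift out of sync with its value.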
  16. Rules are coroutines

    Rules declare inputs they know about in advance. But as a rule runs, if it decides it needs some other input, it yields back to the engine.

        pytest_binary = await Get[PyTest](
            PytestConfig(
                version="pytest>=5.3.5,<5.4",
                plugins=["pytest-timeout>=1.3.4,<1.4", "pytest-cov>=2.8.1,<2.9"],
            )
        )

    Again, this is standard Python 3 async and type annotation syntax. Note, however, that the event loop is run by the Pants engine, in Rust code, and not by asyncio.run().
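Python coroutines do not need asyncio to run: anything that calls `send()` on them can act as the event loop. The following is a minimal sketch of that mechanism, with a toy driver written in Python standing in for the engine. This `Get` class, the `run` driver, `Version`, `Binary`, `describe_tool` and the `providers` table are all hypothetical and use a simplified call syntax, not the Pants API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Get:
    """Hypothetical request: 'please give me a product_type for this subject'."""

    product_type: type
    subject: object

    def __await__(self):
        # Hand the request to whoever is driving the coroutine. The value the
        # driver sends back becomes the result of the `await` expression.
        result = yield self
        return result


def run(coro, providers):
    """Toy driver: step a coroutine by hand, answering each Get it yields."""
    try:
        request = coro.send(None)  # Run until the first await on a Get.
        while True:
            answer = providers[request.product_type](request.subject)
            request = coro.send(answer)  # Resume until the next Get.
    except StopIteration as done:
        return done.value  # The coroutine's return value.


@dataclass(frozen=True)
class Version:
    text: str


@dataclass(frozen=True)
class Binary:
    path: str


async def describe_tool(name: str) -> str:
    version = await Get(Version, name)
    if version.text.startswith("5."):
        # Only request the binary on this branch: control flow decides
        # at runtime which extra inputs are needed.
        binary = await Get(Binary, name)
        return f"{name} {version.text} at {binary.path}"
    return f"{name} {version.text} (unsupported)"


providers = {
    Version: lambda subject: Version("5.3.5"),
    Binary: lambda subject: Binary(f"/tools/{subject}"),
}
result = run(describe_tool("pytest"), providers)
# result == "pytest 5.3.5 at /tools/pytest"
```

Each `await` is a point where the driver regains control, which is where a real engine could consult a cache or schedule the work elsewhere before resuming the rule.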
  17. Rules are coroutines (contd.)

    This is very powerful! Rules are applied dynamically, on the fly, rather than execution being precomputed statically. However, even in this case, rules are still statically validated for ambiguity, reachability, and satisfiability.
    Result: Rule authors can apply a natural control flow, including branching and looping, in the context of a statically validated rule graph.
  18. Rules can express concurrency

    A rule can await multiple engine requests at once:

        test_results = await MultiGet(
            Get[TestResult](TestTarget, target) for target in targets
        )

    (MultiGet is the engine's equivalent of asyncio.gather.) The engine will execute these requests concurrently. And because I/O, process execution, and caching are implemented in Rust, those portions will frequently execute in parallel.
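For readers who know asyncio better than Pants, the same fan-out pattern looks like this with `asyncio.gather`. This is an analogy only: here asyncio runs the loop, whereas in Pants the engine does. The `run_test` and `run_all` coroutines and the target names are hypothetical.

```python
import asyncio


async def run_test(target: str) -> str:
    # Stand-in for real work; sleeping lets the other coroutines make progress.
    await asyncio.sleep(0.01)
    return f"{target}: passed"


async def run_all(targets):
    # Start every request, then wait for all of them together.
    # Results come back in the same order as the inputs.
    return await asyncio.gather(*(run_test(target) for target in targets))


results = asyncio.run(run_all(["a_test.py", "b_test.py", "c_test.py"]))
```

The requests overlap instead of running back to back, and the results still line up with the order of the targets.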
  19. Rules must be pure

    Generally, rules must be deterministic, and must not have, or rely on, side-effects. E.g., instead of accessing the filesystem directly, you ask the engine to do it:

        init_files = await Get[Snapshot](PathGlobs(["**/__init__.py"]))

    Or, to access the network:

        snapshot = await Get[Snapshot](UrlToFetch("https://pants.readme.io"))

    Or, to run a process:

        result = await Get[ProcessResult](Process(argv=["/bin/echo", "hello world"]))

    Only the outermost rule that computes the final result may have certain side effects (e.g., to write results to local disk).
  20. In short

    Type annotations: Allow rules to be wired together automatically.
    Frozen dataclasses: Provide a stable fingerprint for all inputs and outputs, so that invalidation and caching are always correct (assuming rules are in fact side-effect free).
    async/await: Provide control points for the engine to:
    • retrieve cached results
    • execute rules concurrently
    • execute processes remotely
  21. Summary

    Python 3 features allow us to expose a simple programming model on top of a complex system: You write natural-looking Python 3 code, and things like caching, concurrency and remote execution "just happen". This is a testament to the design of Python 3!
  22. Thanks for listening!

    I'll be happy to take any questions. We're always happy to hear from you at: pants.readme.io [email protected]