Declarative Configs for Maintainable Reproducible Code

Slides for the talk at PyCon US 2021.

The code examples can be found at https://github.com/jstriebel/declarative-configs

Jonathan Striebel

April 30, 2021

Transcript

  1. Large Scale Data-Analysis Experiments

    • 50+ TB image data
    • Machine Learning Systems: Training Data, Evaluation Data, Week-long Training
    • Segmentation & Agglomeration Algorithms
    • Scalability in HPC Clusters
    • Neuron Reconstructions for Biological Analysis
  2. Experiments Hierarchy

    • Months / Years
    • Changing, adding & removing Features

    [Diagram: a hierarchy of experiments. Feature sets A, A + B, B + C and B + D are run on dataset 1; B + D is also run on dataset 2.]
  3. Maintainability & Reproducibility

    • Separation of Config & Code
    • Config Verification
    • Code Verification (Config Usage)
    • Automated Migrations
  4. Declarative (data-only) vs. Imperative

    config (declarative, data-only):
        path: some_path
        thresh: 10
        plot_x: X
        plot_y: Y

    experiment.py (imperative):
        data = load("some_path")
        …
        find_outliers(data, thresh=10)
        …
        plot(data["X"], data["Y"])
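To make the contrast on this slide concrete, here is a minimal sketch in which the data-only config drives the experiment instead of values hard-coded in `experiment.py`. The bodies of `load` and `find_outliers` and the helper `run_experiment` are hypothetical stand-ins; only the config keys and the call shapes come from the slide.

```python
# Declarative, data-only config: values only, no logic (keys taken from the slide).
config = {"path": "some_path", "thresh": 10, "plot_x": "X", "plot_y": "Y"}


def load(path):
    # Hypothetical stand-in for a real data loader; ignores the path.
    return {"X": [1, 2, 3, 100], "Y": [2, 4, 6, 8]}


def find_outliers(data, thresh):
    # Hypothetical rule: every X value above the threshold is an outlier.
    return [x for x in data["X"] if x > thresh]


def run_experiment(config):
    # The imperative code reads every tunable value from the config.
    data = load(config["path"])
    outliers = find_outliers(data, thresh=config["thresh"])
    series = (data[config["plot_x"]], data[config["plot_y"]])
    return outliers, series


outliers, series = run_experiment(config)
```

Changing an experiment then means editing the config, while the code stays the same.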
  5. Input Formats

    CLI Arguments:
        $ python experiment.py \
            --path somepath --thresh 10 --plot_x X …

    With argparse:
        import argparse
        parser = argparse.ArgumentParser()
        parser.add_argument("path")
        args = parser.parse_args()
        args.path
        args.wrong_key  # 💥❌😱 Runtime Error

    With typer:
        import typer
        def main(path: str):
            path
        typer.run(main)
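The runtime error on this slide can be reproduced with a short sketch. It assumes the options are registered as `--path` and `--thresh` so they match the command line shown, and it parses an explicit argument list instead of `sys.argv` so it runs anywhere.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--path")
parser.add_argument("--thresh", type=int)

# Explicit argument list (an assumption for the sketch) instead of sys.argv.
args = parser.parse_args(["--path", "somepath", "--thresh", "10"])

print(args.path)

# The parsed namespace has no schema a type checker can inspect, so a
# mistyped attribute is only detected when this line actually executes.
try:
    args.wrong_key
except AttributeError as error:
    failure = error
```

In a week-long training run, a typo like this may surface hours after the job was started.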
  6. Input Formats

    CLI Arguments:
        $ python experiment.py \
            --path somepath --thresh 10 --plot_x X …

        import typer
        def main(path: str):
            path
        typer.run(main)

    Files:
        $ python experiment.py config.yaml  # or .json, .toml, …

        import yaml
        with open("…") as f:
            c = yaml.safe_load(f)
        c["path"]

    Environment Variables:
        $ export PATH=somepath THRESH=10 …
        $ python experiment.py

        import os
        os.environ["path"]
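The file and environment-variable inputs above can be sketched with the standard library alone. This sketch swaps in JSON for YAML (`json.load` in place of `yaml.safe_load`) to avoid a third-party dependency. The `EXPERIMENT_` prefix and both helper names are hypothetical; the prefix keeps the sketch from colliding with real shell variables such as `PATH`.

```python
import json
import os
import tempfile


def config_from_file(file_path):
    # Read a data-only config from a JSON file.
    with open(file_path) as f:
        return json.load(f)


def config_from_env(environ, prefix="EXPERIMENT_"):
    # Collect prefixed variables; environment values are always strings.
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


# Write a temporary config file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "config.json")
    with open(file_path, "w") as f:
        json.dump({"path": "somepath", "thresh": 10}, f)
    file_config = config_from_file(file_path)

# A plain dict stands in for os.environ here.
env_config = config_from_env(
    {"EXPERIMENT_PATH": "somepath", "EXPERIMENT_THRESH": "10", "HOME": "/root"}
)
```

The file keeps `thresh` as an integer, while the environment delivers the string `"10"`. That difference is one reason a conversion and verification step after loading is useful.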
  7. Representations

    Basic Python Types: dict, list, string, int, float, bool
        config = {
            "path": "somepath",
            "thresh": 10,
            "plot_x": "X",
            "plot_y": "Y",
        }
        config["wrong_key"]  # 💥❌😱 Runtime Error

    Objects (e.g. with attrs):
        import attr
        @attr.s(auto_attribs=True)
        class ConfigSchema:
            path: str
            thresh: int
            plot_x: str
            plot_y: str
        config.wrong_key  # ⚠ Static Type-Checks (e.g. with mypy)

    Structuring with cattrs 🎉:
        cattr.structure(config, ConfigSchema)
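To illustrate what structuring a raw dict into a typed object involves, here is a sketch that uses only the standard library. It is not the cattrs API: `dataclasses` stands in for attrs, and the hand-written `structure` function is a simplified, hypothetical stand-in for `cattr.structure` that only handles flat schemas with plain types.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class ConfigSchema:
    path: str
    thresh: int
    plot_x: str
    plot_y: str


def structure(raw, schema):
    # Simplified stand-in for a structuring library: verify keys and types,
    # then build the typed object.
    expected = get_type_hints(schema)
    unknown = sorted(set(raw) - set(expected))
    missing = sorted(set(expected) - set(raw))
    if unknown or missing:
        raise ValueError(f"unknown keys: {unknown}, missing keys: {missing}")
    for name, expected_type in expected.items():
        if not isinstance(raw[name], expected_type):
            raise TypeError(f"{name} must be of type {expected_type.__name__}")
    return schema(**raw)


raw_config = {"path": "somepath", "thresh": 10, "plot_x": "X", "plot_y": "Y"}
config = structure(raw_config, ConfigSchema)

# A config with an unexpected key is rejected at load time,
# before any experiment code runs.
try:
    structure({**raw_config, "wrong_key": 1}, ConfigSchema)
except ValueError as error:
    rejected = error
```

Once the config is an object with declared fields, a static type checker such as mypy can flag `config.wrong_key` without running the program.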
  8. Input:

    • CLI-parameters
    • environment variables
    • json / yaml / toml / ini / …

    Converters: typedload, dacite, pydantic, cattrs
    Representations: TypedDict, NamedTuple, dataclasses, pydantic, attrs
    Type Checkers: mypy, pytype, …
    (Tracking): sacred, MLflow, Guild
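Three of the representations listed above ship with the standard library. As a rough comparison, the sketch below declares the same two-field config as a `TypedDict`, a `NamedTuple`, and a dataclass. The class names are hypothetical.

```python
from dataclasses import dataclass
from typing import NamedTuple, TypedDict


class ConfigDict(TypedDict):
    # Stays a plain dict at runtime; key names and value types are
    # only enforced by a static type checker.
    path: str
    thresh: int


class ConfigTuple(NamedTuple):
    # Immutable, with attribute access and tuple unpacking.
    path: str
    thresh: int


@dataclass(frozen=True)
class ConfigData:
    # Regular class with generated __init__ and __eq__; frozen makes it read-only.
    path: str
    thresh: int


raw = {"path": "somepath", "thresh": 10}

as_dict: ConfigDict = {"path": "somepath", "thresh": 10}
as_tuple = ConfigTuple(**raw)
as_data = ConfigData(**raw)
```

None of the three validates value types at runtime on its own; that is the job of the converters and type checkers in the same overview.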
  9. Maintainability & Reproducibility

    • Separation of Config & Code: declarative configs ✔
    • Config Verification: cattrs ✔
    • Code Verification (Config Usage): mypy ✔
    • Automated Migrations: evolutions ✔

    Jonathan Striebel | @jostriebel
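The slides name automated migrations ("evolutions") without showing their mechanics. One possible shape is sketched below, under stated assumptions: configs carry a hypothetical `version` field, and each migration function upgrades a config by exactly one version. The function names and the two example changes are invented for illustration and are not taken from the talk's code.

```python
def add_plot_axes(config):
    # Hypothetical migration 0 -> 1: plot axes become configurable,
    # with defaults matching the previous hard-coded behaviour.
    return {**config, "plot_x": "X", "plot_y": "Y"}


def rename_threshold(config):
    # Hypothetical migration 1 -> 2: "threshold" was renamed to "thresh".
    config = dict(config)
    config["thresh"] = config.pop("threshold")
    return config


# The migration at index i upgrades a config from version i to version i + 1.
MIGRATIONS = [add_plot_axes, rename_threshold]


def migrate(config):
    # Apply every migration the config has not seen yet, then stamp the version.
    config = dict(config)
    version = config.pop("version", 0)
    for step in MIGRATIONS[version:]:
        config = step(config)
    config["version"] = len(MIGRATIONS)
    return config


old_config = {"path": "some_path", "threshold": 10}
new_config = migrate(old_config)
```

Because the config is plain data, an old experiment can be brought up to date by replaying the migrations, and migrating an already current config leaves it unchanged.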