Declarative Configs for Maintainable Reproducible Code

Slides for the talk at PyCon US 2021.

The code examples can be found at https://github.com/jstriebel/declarative-configs

Jonathan Striebel

April 30, 2021

Transcript

  1. Large Scale Data-Analysis Experiments
    • 50+ TB image data
    • Machine Learning Systems: Training Data, Evaluation Data, Week-long Training
    • Segmentation & Agglomeration Algorithms
    • Scalability in HPC Clusters
    • Neuron Reconstructions for Biological Analysis
  2. Experiments Hierarchy
    • Months / Years
    • Changing, adding & removing Features
    • (Diagram: dataset 1, feature A; dataset 1, feature A + B; dataset 1, feature B + C; dataset 1, feature B + D; dataset 1; dataset 2, feature B + D; dataset 2)
  3. Maintainability & Reproducibility
    • Separation of Config & Code
    • Config Verification
    • Code Verification (Config Usage)
    • Automated Migrations
  4. config: Declarative (data-only)
        path: some_path
        thresh: 10
        plot_x: X
        plot_y: Y
    experiment.py: Imperative
        data = load("some_path")
        …
        find_outliers(data, thresh=10)
        …
        plot(data["X"], data["Y"])
  5. Input Formats
    • CLI Arguments:
        $ python experiment.py \
            --path somepath --thresh 10 --plot_x X …

        import argparse
        parser = argparse.ArgumentParser()
        parser.add_argument("path")
        args = parser.parse_args()
        args.path
        args.wrong_key  💥❌😱 Runtime Error

        import typer
        def main(path: str):
            path
        typer.run(main)
  6. Input Formats
    • CLI Arguments:
        $ python experiment.py \
            --path somepath --thresh 10 --plot_x X …

        import typer
        def main(path: str):
            path
        typer.run(main)
    • Files:
        $ python experiment.py config.yaml # or .json, .toml, …

        import yaml
        with open("…") as f:
            c = yaml.safe_load(f)
        c["path"]
    • Environment Variables:
        $ export PATH=somepath THRESH=10 …
        $ python experiment.py

        import os
        os.environ["path"]
  7. Representations
    • Basic Python Types: dict, list, string, int, float, bool
        config = {
            "path": "somepath",
            "thresh": 10,
            "plot_x": "X",
            "plot_y": "Y",
        }
        config["wrong_key"]  💥❌😱 Runtime Error
    • Objects (e.g. with attrs)
        import attr
        @attr.s(auto_attribs=True)
        class ConfigSchema:
            path: str
            thresh: int
            plot_x: str
            plot_y: str
        config.wrong_key  ⚠ Static Type-Checks (e.g. with mypy)
    • structuring: cattrs 🎉
        cattr.structure(config, ConfigSchema)
  8. Input: CLI-parameters, environment variables, json / yaml / toml / ini / …
    • Converters: typedload, dacite, pydantic, cattrs
    • Representations: TypedDict, NamedTuple, dataclasses, pydantic, attrs
    • Type Checkers: mypy, pytype, …
    • (Tracking): sacred, MLflow, Guild
  9. Maintainability & Reproducibility
    • Separation of Config & Code: declarative configs ✔
    • Config Verification: cattrs ✔
    • Code Verification (Config Usage): mypy ✔
    • Automated Migrations: evolutions ✔

    Jonathan Striebel | @jostriebel
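
Slides 5 and 6 above list CLI arguments, files, and environment variables as config inputs. As a rough sketch of how all three can be reduced to the same plain dict, the following uses only the standard library (JSON instead of YAML). The helper names `from_cli`, `from_file`, and `from_env`, as well as the `EXPERIMENT_` prefix for environment variables, are assumptions of this example and do not come from the talk.

```python
import argparse
import json
import os
import tempfile


def from_cli(argv):
    """Read config values from flags such as `--path somepath --thresh 10`."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--path", required=True)
    parser.add_argument("--thresh", type=int, required=True)
    # vars() turns the argparse.Namespace into a plain dict.
    return vars(parser.parse_args(argv))


def from_file(filename):
    """Read config values from a JSON file."""
    with open(filename) as f:
        return json.load(f)


def from_env(environ, prefix="EXPERIMENT_"):
    """Read config values from prefixed environment variables.

    The prefix (hypothetical) keeps config keys apart from variables such
    as PATH. Values always arrive as strings and still need conversion.
    """
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


# CLI arguments, passed explicitly here instead of reading sys.argv.
cli_config = from_cli(["--path", "somepath", "--thresh", "10"])

# A config file, written to a temporary location for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"path": "somepath", "thresh": 10}, f)
file_config = from_file(f.name)
os.remove(f.name)

# Environment variables, passed as a dict instead of os.environ.
env_config = from_env({"EXPERIMENT_PATH": "somepath", "EXPERIMENT_THRESH": "10"})
```

All three calls yield a dict with the keys `path` and `thresh`. Only the environment variant leaves `thresh` as the string `"10"`, which is one reason a separate conversion and verification step is useful.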
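
Slide 7 above structures a plain dict into a typed object with cattrs. To illustrate the idea without third-party packages, the sketch below uses a standard-library dataclass (one of the representations listed on slide 8) and a hand-written `structure` helper as a simplified stand-in for `cattr.structure`. The helper, its checks, and its error messages are assumptions of this example and do not reproduce cattrs behavior.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConfigSchema:
    path: str
    thresh: int
    plot_x: str
    plot_y: str


def structure(raw, schema):
    """Build `schema` from the dict `raw`, rejecting bad keys and types.

    Simplified: handles only flat schemas whose fields are plain classes.
    """
    expected = {f.name: f.type for f in fields(schema)}
    unknown = set(raw) - set(expected)
    missing = set(expected) - set(raw)
    if unknown or missing:
        raise ValueError(
            f"unknown keys: {sorted(unknown)}, missing keys: {sorted(missing)}"
        )
    for name, typ in expected.items():
        if not isinstance(raw[name], typ):
            raise TypeError(f"{name!r} should be {typ.__name__}, got {raw[name]!r}")
    return schema(**raw)


config = structure(
    {"path": "somepath", "thresh": 10, "plot_x": "X", "plot_y": "Y"},
    ConfigSchema,
)
```

With the typed object, an access such as `config.wrong_key` can be reported by a static type checker like mypy before the experiment runs, whereas `config["wrong_key"]` on a plain dict only fails at runtime. A config with a misspelled or missing key is rejected when it is loaded, not in the middle of a long run.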