Declarative Configs for Maintainable Reproducible Code

Slides for the talk at PyCon US 2021.

The code examples can be found at https://github.com/jstriebel/declarative-configs

Jonathan Striebel

April 30, 2021

Transcript

  1. Declarative Configs for Maintainable Reproducible Code Jonathan Striebel

  2. Hi, I’m Jonathan Striebel. @jostriebel jonathan@scalableminds.com

  3. https://webknossos.org

  4. Large Scale Data-Analysis Experiments
    50+ TB image data
    Machine Learning Systems
    • Training Data
    • Evaluation Data
    • Week-long Training
    Segmentation & Agglomeration Algorithms
    Scalability in HPC Clusters
    Neuron Reconstructions for Biological Analysis
  5. Experiments Hierarchy
    Months / Years
    Changing, adding & removing Features
    [Diagram: experiment variants on dataset 1 and dataset 2 with feature sets A, A + B, B + C and B + D]
  6. Maintainability & Reproducibility
    • Separation of Config & Code
    • Config Verification
    • Code Verification (Config Usage)
    • Automated Migrations
  7. Toy Experiment Outlier Detection …

  8. config (Declarative, data-only):
        path: some_path
        thresh: 10
        plot_x: X
        plot_y: Y
    experiment.py (Imperative):
        data = load("some_path")
        …
        find_outliers(data, thresh=10)
        …
        plot(data["X"], data["Y"])
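The split on this slide can be sketched as follows. This is only an illustration under stated assumptions: JSON stands in for the config file format so that nothing beyond the standard library is needed, and `load`, `find_outliers` (including its `column` parameter and its outlier rule) and `run_experiment` are hypothetical toy helpers, not the code from the talk.

```python
import json

# Declarative part: data only. JSON stands in for the config file here
# so that the sketch needs nothing beyond the standard library.
CONFIG_TEXT = '{"path": "some_path", "thresh": 10, "plot_x": "X", "plot_y": "Y"}'


def load(path):
    """Hypothetical loader: returns fixed toy data instead of reading `path`."""
    return {"X": [1, 2, 3, 4], "Y": [2, 50, 4, -30]}


def find_outliers(data, column, thresh):
    """Toy rule (an assumption): values whose magnitude exceeds `thresh`."""
    return [value for value in data[column] if abs(value) > thresh]


def run_experiment(config):
    """Imperative part: every tunable value is read from the config."""
    data = load(config["path"])
    return find_outliers(data, config["plot_y"], config["thresh"])


config = json.loads(CONFIG_TEXT)
outliers = run_experiment(config)
print(outliers)
```

With this layout, rerunning the experiment with a different threshold means editing the data, while the code stays untouched.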
  9. declarative input format representation deserialization

  10. Input Formats
    CLI Arguments:
        $ python experiment.py \
            --path somepath --thresh 10 --plot_x X …
    argparse:
        import argparse
        parser = argparse.ArgumentParser()
        parser.add_argument("path")
        args = parser.parse_args()
        args.path
        args.wrong_key  💥❌😱 Runtime Error
    typer:
        import typer
        def main(path: str):
            path
        typer.run(main)
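The runtime error flagged on this slide can be reproduced with a short sketch. It assumes optional `--path` and `--thresh` flags and parses an explicit argument list instead of `sys.argv`, so it runs without a command line.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--path")
parser.add_argument("--thresh", type=int)

# Parse an explicit list instead of sys.argv so the sketch runs anywhere.
args = parser.parse_args(["--path", "somepath", "--thresh", "10"])
print(args.path, args.thresh)

# argparse.Namespace carries no schema: a misspelled attribute is only
# detected when this line executes, where it raises AttributeError.
try:
    args.wrong_key
    failed_at_runtime = False
except AttributeError:
    failed_at_runtime = True
print(failed_at_runtime)
```

Because the namespace accepts arbitrary attribute names, a static type checker generally has nothing to compare `args.wrong_key` against.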
  11. Input Formats
    CLI Arguments:
        $ python experiment.py \
            --path somepath --thresh 10 --plot_x X …
        import typer
        def main(path: str):
            path
        typer.run(main)
    Files:
        $ python experiment.py config.yaml # or .json, .toml, …
        import yaml
        with open("…") as f:
            c = yaml.safe_load(f)
        c["path"]
    Environment Variables:
        $ export PATH=somepath THRESH=10 …
        $ python experiment.py
        import os
        os.environ["path"]
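Whichever input format is used, the result is usually a plain dictionary. The sketch below reads the same toy config from a file and from environment variables. It is a sketch under stated assumptions: JSON stands in for YAML to avoid a third-party dependency, the `EXPERIMENT_` prefix and both helper functions are hypothetical, and a plain dict is passed in place of `os.environ` so the result does not depend on the machine.

```python
import json
import os
import tempfile

KEYS = ("path", "thresh", "plot_x", "plot_y")


def config_from_file(filename):
    """Read a config dict from a JSON file."""
    with open(filename) as f:
        return json.load(f)


def config_from_env(environ, prefix="EXPERIMENT_"):
    """Collect known keys from variables such as EXPERIMENT_PATH.

    Environment values are always strings; no conversion happens here.
    """
    config = {}
    for key in KEYS:
        name = prefix + key.upper()
        if name in environ:
            config[key] = environ[name]
    return config


with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "config.json")
    with open(filename, "w") as f:
        json.dump({"path": "somepath", "thresh": 10, "plot_x": "X", "plot_y": "Y"}, f)
    file_config = config_from_file(filename)

# A plain dict stands in for os.environ to keep the sketch deterministic.
env_config = config_from_env({"EXPERIMENT_PATH": "somepath", "EXPERIMENT_THRESH": "10"})
print(file_config)
print(env_config)
```

Note that `thresh` arrives as the integer `10` from the file but as the string `"10"` from the environment, which is one reason a separate conversion and verification step is useful.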
  12. Representations
    Basic Python Types: dict, list, string, int, float, bool
        config = {
            "path": "somepath",
            "thresh": 10,
            "plot_x": "X",
            "plot_y": "Y",
        }
        config["wrong_key"]  💥❌😱 Runtime Error
    Objects (e.g. with attrs)
        import attr
        @attr.s(auto_attribs=True)
        class ConfigSchema:
            path: str
            thresh: int
            plot_x: str
            plot_y: str
        config.wrong_key  ⚠ Static Type-Checks (e.g. with mypy)
    structuring: cattrs 🎉
        cattr.structure(config, ConfigSchema)
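The idea of structuring a raw dict into a typed object can be sketched with the standard library alone. Here `dataclasses` stands in for attrs, and the hand-written `structure` function stands in for a converter library. Its strictness (rejecting unknown keys and refusing to coerce types) is an assumption made for this illustration and is not meant to describe how cattrs behaves.

```python
from dataclasses import dataclass, fields


@dataclass
class ConfigSchema:
    path: str
    thresh: int
    plot_x: str
    plot_y: str


def structure(raw, schema):
    """Hand-rolled stand-in for a converter (hypothetical helper).

    Rejects unknown keys, missing keys and values of the wrong type,
    then builds the schema object.
    """
    expected = {field.name: field.type for field in fields(schema)}
    unknown = set(raw) - set(expected)
    missing = set(expected) - set(raw)
    if unknown or missing:
        raise ValueError(
            f"unknown keys: {sorted(unknown)}, missing keys: {sorted(missing)}"
        )
    for name, expected_type in expected.items():
        if not isinstance(raw[name], expected_type):
            raise TypeError(f"{name!r} should be {expected_type.__name__}")
    return schema(**raw)


config = structure(
    {"path": "somepath", "thresh": 10, "plot_x": "X", "plot_y": "Y"}, ConfigSchema
)
print(config.thresh)

# A string where an int is expected is rejected at load time.
try:
    structure(
        {"path": "somepath", "thresh": "10", "plot_x": "X", "plot_y": "Y"},
        ConfigSchema,
    )
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```

Once the config is an instance of `ConfigSchema`, a misspelled attribute such as `config.wrong_key` can be reported by a static type checker before the experiment starts.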
  13. Input:
    • CLI-parameters
    • environment variables
    • json / yaml / toml / ini / …
    Converters
    • typedload
    • dacite
    • pydantic
    • cattrs
    Representations
    • TypedDict
    • NamedTuple
    • dataclasses
    • pydantic
    • attrs
    Type Checkers:
    • mypy
    • pytype
    • …
    (Tracking) sacred, MLflow, Guild
  14. Schema Versions & Evolutions
    [Diagram labels: versioned config files, dicts etc., objects, up-to-date schema classes, evolutions]
  15. Schema Versions & Evolutions
    [Diagram labels: versioned config files, dicts etc., objects, old + up-to-date schema classes, evolutions]
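One way to read the evolutions idea is as a chain of small functions, each upgrading a config dict by one schema version. The sketch below is an assumption about how this could look, not the implementation from the talk: the `version` key, the two schema changes and all function names are hypothetical.

```python
def evolve_v1_to_v2(config):
    """Hypothetical change: version 2 renamed `threshold` to `thresh`."""
    config = dict(config)  # copy, so the stored old config stays untouched
    config["thresh"] = config.pop("threshold")
    config["version"] = 2
    return config


def evolve_v2_to_v3(config):
    """Hypothetical change: version 3 groups the plot axes under `plot`."""
    config = dict(config)
    config["plot"] = {"x": config.pop("plot_x"), "y": config.pop("plot_y")}
    config["version"] = 3
    return config


# Maps a schema version to the function that upgrades it by one step.
EVOLUTIONS = {1: evolve_v1_to_v2, 2: evolve_v2_to_v3}
LATEST_VERSION = 3


def migrate(config):
    """Apply evolutions one after another until the config is up to date."""
    while config["version"] < LATEST_VERSION:
        config = EVOLUTIONS[config["version"]](config)
    return config


old = {"version": 1, "path": "somepath", "threshold": 10, "plot_x": "X", "plot_y": "Y"}
new = migrate(old)
print(new)
```

In this sketch old config files never need to be edited by hand: they are upgraded on load and can then be structured with the current schema class.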
  16. Maintainability & Reproducibility
    • Separation of Config & Code: declarative configs ✔
    • Config Verification: cattrs ✔
    • Code Verification (Config Usage): mypy ✔
    • Automated Migrations: evolutions ✔
    Jonathan Striebel | jonathan@scalableminds.com @jostriebel