Slide 1

Slide 1 text

Declarative Configs for Maintainable Reproducible Code
Jonathan Striebel

Slide 2

Slide 2 text

Hi, I’m Jonathan Striebel.
@jostriebel
[email protected]

Slide 3

Slide 3 text

https://webknossos.org

Slide 4

Slide 4 text

Large Scale Data-Analysis Experiments
50+ TB image data
Machine Learning Systems
● Training Data
● Evaluation Data
● Week-long Training
Segmentation & Agglomeration Algorithms
Scalability in HPC Clusters
Neuron Reconstructions for Biological Analysis

Slide 5

Slide 5 text

Experiments Hierarchy
Months / Years
Changing, adding & removing Features
dataset 1 feature A dataset 1 feature A + B dataset 1 feature B + C dataset 1 feature B + D dataset 1 dataset 2 feature B + D dataset 2

Slide 6

Slide 6 text

Maintainability & Reproducibility
● Separation of Config & Code
● Config Verification
● Code Verification (Config Usage)
● Automated Migrations

Slide 7

Slide 7 text

Toy Experiment
Outlier Detection
…

Slide 8

Slide 8 text

Declarative (data-only): config
    path: some_path
    thresh: 10
    plot_x: X
    plot_y: Y

Imperative: experiment.py
    data = load("some_path")
    …
    find_outliers(data, thresh=10)
    …
    plot(data["X"], data["Y"])
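A minimal sketch of the separation this slide contrasts, assuming a toy in-memory dataset. The `load`, `find_outliers` and `run_experiment` functions below are hypothetical stand-ins written for illustration, not the speaker's implementation. The config stays pure data, and the code takes every parameter from it instead of hard-coding values.

```python
from statistics import median

# Declarative part: data only, no logic.
config = {"path": "some_path", "thresh": 10, "plot_x": "X", "plot_y": "Y"}


def load(path):
    """Hypothetical loader: ignores `path` and returns a fixed toy dataset."""
    return {"X": [1, 2, 3, 4, 5], "Y": [2, 3, 50, 4, 1]}


def find_outliers(values, thresh):
    """Return the indices of values further than `thresh` from the median."""
    center = median(values)
    return [i for i, v in enumerate(values) if abs(v - center) > thresh]


def run_experiment(config):
    """Imperative part: every parameter comes from the config."""
    data = load(config["path"])
    return find_outliers(data[config["plot_y"]], thresh=config["thresh"])


outliers = run_experiment(config)
```

Rerunning the experiment with a different threshold then only requires a different config, for example `run_experiment({**config, "thresh": 100})`, while the code stays untouched.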

Slide 9

Slide 9 text

declarative
input format
representation
deserialization

Slide 10

Slide 10 text

Input Formats

CLI Arguments:
    $ python experiment.py \
        --path somepath --thresh 10 --plot_x X …

With argparse:
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--path")  # option flag, matching the command above
    args = parser.parse_args()
    args.path
    args.wrong_key  # 💥❌😱 Runtime Error

With typer:
    import typer
    def main(path: str = typer.Option(...)):  # required --path option
        path
    typer.run(main)
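The runtime error from the slide can be reproduced in a few lines. This is a sketch under assumptions: the `--thresh` option and its default are added for illustration, and the argument list is passed explicitly instead of being read from the command line so that the snippet is self-contained.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--path", required=True)
    # Assumed for illustration: an integer threshold with a default value.
    parser.add_argument("--thresh", type=int, default=10)
    return parser


# An explicit argument list stands in for sys.argv.
args = build_parser().parse_args(["--path", "somepath", "--thresh", "10"])

# A misspelled or removed key is only detected when the line is executed.
try:
    args.wrong_key
    wrong_key_error = None
except AttributeError as error:
    wrong_key_error = error
```

Because `args` is a plain `argparse.Namespace`, a static type checker has no schema to compare `args.wrong_key` against, which is why the mistake surfaces only at runtime.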

Slide 11

Slide 11 text

Input Formats

CLI Arguments:
    $ python experiment.py \
        --path somepath --thresh 10 --plot_x X …

    import typer
    def main(path: str = typer.Option(...)):  # required --path option
        path
    typer.run(main)

Files:
    $ python experiment.py config.yaml  # or .json, .toml, …

    import yaml
    with open("…") as f:
        c = yaml.safe_load(f)
    c["path"]

Environment Variables:
    $ export PATH=somepath THRESH=10 …  # illustrative names; overriding the shell's real PATH would break command lookup
    $ python experiment.py

    import os
    os.environ["PATH"]  # variable names are case-sensitive on POSIX systems
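As a rough sketch of the file and environment-variable inputs, the snippet below reads the same toy config from both. Several things are assumptions made for illustration: JSON replaces YAML so that only the standard library is needed, the `EXPERIMENT_` prefix is a hypothetical naming scheme that avoids clashing with real variables such as `PATH`, and the environment is passed in as a dict instead of being exported in a shell.

```python
import json
import os
import tempfile


def load_config_file(file_path):
    """Read a config file into basic Python types."""
    with open(file_path) as f:
        return json.load(f)


def load_config_env(keys, environ=os.environ, prefix="EXPERIMENT_"):
    """Collect the given keys from prefixed, upper-case environment variables."""
    config = {}
    for key in keys:
        name = prefix + key.upper()
        if name in environ:
            config[key] = environ[name]
    return config


# Write a temporary config file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_file = os.path.join(tmp_dir, "config.json")
    with open(config_file, "w") as f:
        json.dump({"path": "somepath", "thresh": 10}, f)
    file_config = load_config_file(config_file)

env_config = load_config_env(
    ["path", "thresh"],
    environ={"EXPERIMENT_PATH": "somepath", "EXPERIMENT_THRESH": "10"},
)
```

Note that `file_config["thresh"]` is the integer `10`, while `env_config["thresh"]` is the string `"10"`, since environment variables are always strings. Both results are untyped dicts at this point.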

Slide 12

Slide 12 text

Representations

Basic Python Types: dict, list, string, int, float, bool
    config = {
        "path": "somepath",
        "thresh": 10,
        "plot_x": "X",
        "plot_y": "Y",
    }
    config["wrong_key"]  # 💥❌😱 Runtime Error

Objects (e.g. with attrs)
    import attr
    @attr.s(auto_attribs=True)
    class ConfigSchema:
        path: str
        thresh: int
        plot_x: str
        plot_y: str
    config.wrong_key  # ⚠ Static Type-Checks (e.g. with mypy)

structuring: cattrs 🎉
    cattr.structure(
        config, ConfigSchema
    )
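To illustrate what structuring does, here is a simplified stand-in written with standard-library dataclasses. It is not how cattrs is implemented: the `structure` function below is a hypothetical helper that handles only flat schemas with simple types, and its coercion (calling the annotated type on the raw value) is deliberately naive.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass(frozen=True)
class ConfigSchema:
    path: str
    thresh: int
    plot_x: str
    plot_y: str


def structure(raw, schema):
    """Turn a flat dict into a schema object, rejecting unknown or missing keys."""
    hints = get_type_hints(schema)
    names = [f.name for f in fields(schema)]
    unknown = sorted(set(raw) - set(names))
    missing = sorted(set(names) - set(raw))
    if unknown or missing:
        raise ValueError(f"unknown keys: {unknown}, missing keys: {missing}")
    # Naive coercion: call the annotated type on each raw value.
    return schema(**{name: hints[name](raw[name]) for name in names})


config = structure(
    {"path": "somepath", "thresh": "10", "plot_x": "X", "plot_y": "Y"},
    ConfigSchema,
)
```

After structuring, the config is verified once at load time, and an access such as `config.wrong_key` can be flagged by a static type checker before the experiment runs.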

Slide 13

Slide 13 text

Input:
● CLI-parameters
● environment variables
● json / yaml / toml / ini / …

Converters
● typedload
● dacite
● pydantic
● cattrs

Representations
● TypedDict
● NamedTuple
● dataclasses
● pydantic
● attrs

Type Checkers:
● mypy
● pytype
● …

(Tracking) sacred, MLflow, Guild
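For comparison, the three standard-library representations from this overview can hold the same toy config. This is only a sketch: the class names are made up for illustration, and pydantic and attrs are left out to keep the snippet dependency-free.

```python
from dataclasses import dataclass
from typing import NamedTuple, TypedDict


class ConfigDict(TypedDict):
    path: str
    thresh: int


class ConfigTuple(NamedTuple):
    path: str
    thresh: int


@dataclass
class ConfigData:
    path: str
    thresh: int


raw = {"path": "somepath", "thresh": 10}

# A TypedDict is a plain dict at runtime; its keys are checked only statically.
as_typed_dict: ConfigDict = {"path": "somepath", "thresh": 10}
# A NamedTuple is an immutable tuple with named fields.
as_tuple = ConfigTuple(**raw)
# A dataclass is a regular class with generated __init__ and __eq__.
as_dataclass = ConfigData(**raw)
```

None of these three validates value types at runtime on its own, which is the gap that the converters listed on the slide are meant to fill.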

Slide 14

Slide 14 text

Schema Versions & Evolutions
versioned config files
dicts etc.
objects
up-to-date schema classes
evolutions
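One way to picture evolutions on the dict level is a chain of small functions, each lifting a config from one version to the next. The sketch below rests on assumptions: the `version` key, the rename from `threshold` to `thresh`, and the added plot defaults are all hypothetical changes invented for illustration.

```python
LATEST_VERSION = 3


def evolve_v1_to_v2(raw):
    """Hypothetical change: `threshold` was renamed to `thresh`."""
    evolved = dict(raw)
    evolved["thresh"] = evolved.pop("threshold")
    evolved["version"] = 2
    return evolved


def evolve_v2_to_v3(raw):
    """Hypothetical change: the plotted columns became configurable."""
    evolved = dict(raw)
    evolved.setdefault("plot_x", "X")
    evolved.setdefault("plot_y", "Y")
    evolved["version"] = 3
    return evolved


# Maps a version number to the evolution that upgrades from it.
EVOLUTIONS = {1: evolve_v1_to_v2, 2: evolve_v2_to_v3}


def migrate(raw):
    """Apply evolutions one by one until the config is at the latest version."""
    while raw["version"] < LATEST_VERSION:
        raw = EVOLUTIONS[raw["version"]](raw)
    return raw


old_config = {"version": 1, "path": "somepath", "threshold": 10}
migrated = migrate(old_config)
```

Each evolution copies its input, so the old config file contents stay available for reproducing earlier experiments, and only the migrated dict needs to match the up-to-date schema class.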

Slide 15

Slide 15 text

Schema Versions & Evolutions
versioned config files
dicts etc.
objects
evolutions
old + up-to-date schema classes
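Keeping old schema classes next to the up-to-date one lets evolutions operate on typed objects instead of raw dicts. A minimal sketch, assuming a hypothetical earlier schema `ConfigSchemaV1` with a field named `threshold` and assumed plot defaults; dataclasses are used here in place of attrs to avoid a dependency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSchemaV1:
    """Hypothetical old schema, kept only so old configs can still be loaded."""
    path: str
    threshold: int


@dataclass(frozen=True)
class ConfigSchema:
    """Up-to-date schema used by the experiment code."""
    path: str
    thresh: int
    plot_x: str
    plot_y: str


def evolve(old: ConfigSchemaV1) -> ConfigSchema:
    """Typed evolution from the old schema to the up-to-date one."""
    return ConfigSchema(
        path=old.path,
        thresh=old.threshold,
        plot_x="X",  # assumed default for configs that predate this field
        plot_y="Y",
    )


config = evolve(ConfigSchemaV1(path="somepath", threshold=10))
```

Since both ends of `evolve` are annotated classes, a type checker such as mypy can check the migration code itself, for example flagging a reference to a field that the old schema never had.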

Slide 16

Slide 16 text

Maintainability & Reproducibility
● Separation of Config & Code: declarative configs ✔
● Config Verification: cattrs ✔
● Code Verification (Config Usage): mypy ✔
● Automated Migrations: evolutions ✔

Jonathan Striebel | [email protected] @jostriebel