
Declarative Configs for Maintainable Reproducible Code


Slides for the talk at PyCon US 2021.

The code examples can be found at https://github.com/jstriebel/declarative-configs

Jonathan Striebel

April 30, 2021


Transcript

  1. Declarative Configs for
    Maintainable Reproducible Code
    Jonathan Striebel


  2. Hi, I’m Jonathan Striebel.
    @jostriebel


  3. https://webknossos.org


  4. Large Scale Data-Analysis Experiments
    50+ TB image data
    Machine Learning Systems
    ● Training Data
    ● Evaluation Data
    ● Week-long Training
    Segmentation & Agglomeration Algorithms
    Scalability in HPC Clusters
    Neuron Reconstructions for Biological Analysis


  5. Experiments Hierarchy
    Months / Years
    Changing, adding & removing Features
    dataset 1
    ● feature A
    ● feature A + B
    ● feature B + C
    ● feature B + D
    dataset 2
    ● feature B + D


  6. Maintainability & Reproducibility
    ● Separation of Config & Code
    ● Config Verification
    ● Code Verification (Config Usage)
    ● Automated Migrations


  7. Toy Experiment
    Outlier Detection


  8. Declarative (data-only): config
    path: some_path
    thresh: 10
    plot_x: X
    plot_y: Y

    Imperative: experiment.py
    data = load("some_path")

    find_outliers(
        data,
        thresh=10
    )

    plot(
        data["X"],
        data["Y"]
    )
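One way to connect the two halves of this slide is to let the imperative code read every tunable value from the declarative config instead of hard-coding it. The following is a minimal sketch under stated assumptions: `load` returns fixed toy data rather than reading a file, the outlier rule (absolute value above `thresh`) is invented for illustration, and `run_experiment` is a hypothetical name that does not come from the talk.

```python
from typing import Dict, List

# Declarative part: plain data, no logic. Values mirror the slide's config.
config = {
    "path": "some_path",
    "thresh": 10,
    "plot_x": "X",
    "plot_y": "Y",
}


def load(path: str) -> Dict[str, List[float]]:
    """Hypothetical loader: returns fixed toy data instead of reading `path`."""
    return {"X": [1.0, 2.0, 3.0, 4.0], "Y": [2.0, 50.0, 3.0, -40.0]}


def find_outliers(values: List[float], thresh: float) -> List[int]:
    """Toy rule (an assumption): indices whose absolute value exceeds `thresh`."""
    return [i for i, v in enumerate(values) if abs(v) > thresh]


def run_experiment(cfg: dict) -> List[int]:
    """Imperative part: every tunable value is taken from `cfg`."""
    data = load(cfg["path"])
    return find_outliers(data[cfg["plot_y"]], thresh=cfg["thresh"])


outliers = run_experiment(config)
print(outliers)
```

Changing the experiment then means changing the config data only, for example `run_experiment({**config, "thresh": 45})`, while `experiment.py` stays untouched.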


  9. declarative
    input format → deserialization → representation


  10. Input Formats
    CLI Arguments:
    $ python experiment.py \
        --path somepath --thresh 10 --plot_x X …

    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("path")
    args = parser.parse_args()
    args.path
    args.wrong_key  # 💥❌😱 Runtime Error

    import typer
    def main(path: str):
        path
    typer.run(main)
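The runtime error on this slide can be reproduced with the standard library alone. This is a small sketch, not the talk's code: it registers `--path` and `--thresh` as optional flags (an assumption, chosen to match the command line shown on the slide) and parses an explicit argument list so the result does not depend on how the script is invoked.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--path")
parser.add_argument("--thresh", type=int)

# Parse a fixed list instead of sys.argv to keep the sketch reproducible.
args = parser.parse_args(["--path", "somepath", "--thresh", "10"])
print(args.path, args.thresh)

# A misspelled attribute is only detected when this line actually runs.
error_message = None
try:
    args.wrong_key
except AttributeError as err:
    error_message = str(err)
    print("runtime error:", error_message)
```

Because `args` is an untyped `argparse.Namespace`, a static type checker has no schema to check `args.wrong_key` against, so the mistake surfaces only at runtime.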


  11. Input Formats
    CLI Arguments:
    $ python experiment.py \
        --path somepath --thresh 10 --plot_x X …
    import typer
    def main(path: str):
        path
    typer.run(main)

    Files:
    $ python experiment.py config.yaml
    # or .json, .toml, …
    import yaml
    with open("…") as f:
        c = yaml.safe_load(f)
    c["path"]

    Environment Variables:
    $ export PATH=somepath THRESH=10 …
    $ python experiment.py
    import os
    os.environ["PATH"]
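Files and environment variables can both be deserialized into the same plain dict. The sketch below makes several assumptions that are not in the talk: it uses JSON instead of YAML so that it runs with the standard library only, the helper names `load_config_file` and `load_config_env` are hypothetical, and the `EXPERIMENT_` prefix is an invented convention that avoids clashing with existing variables such as `PATH`.

```python
import json
import os
import tempfile


def load_config_file(filename: str) -> dict:
    """Read a JSON config file into basic Python types."""
    with open(filename) as f:
        return json.load(f)


def load_config_env(prefix: str = "EXPERIMENT_") -> dict:
    """Collect variables such as EXPERIMENT_PATH into {"path": ...}.

    Environment values are always strings; converting them is left to a later step.
    """
    return {
        key[len(prefix):].lower(): value
        for key, value in os.environ.items()
        if key.startswith(prefix)
    }


# Demo: write a temporary config file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "config.json")
    with open(filename, "w") as f:
        json.dump({"path": "somepath", "thresh": 10}, f)
    file_config = load_config_file(filename)

# Demo: set the variables in-process instead of using `export`.
os.environ["EXPERIMENT_PATH"] = "somepath"
os.environ["EXPERIMENT_THRESH"] = "10"
env_config = load_config_env()

print(file_config)
print(env_config["path"], env_config["thresh"])
```

Note the difference between the two results: the file yields `thresh` as an integer, while the environment yields the string `"10"`, which is one reason a conversion and verification step after loading is useful.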


  12. Representations
    Basic Python Types:
    dict, list, string, int, float, bool
    config = {
        "path": "somepath",
        "thresh": 10,
        "plot_x": "X",
        "plot_y": "Y",
    }
    config["wrong_key"]  # 💥❌😱 Runtime Error

    Objects (e.g. with attrs)
    import attr
    @attr.s(auto_attribs=True)
    class ConfigSchema:
        path: str
        thresh: int
        plot_x: str
        plot_y: str
    config.wrong_key  # caught by Static Type-Checks (e.g. with mypy)

    structuring: cattrs 🎉
    import cattr
    cattr.structure(
        config,
        ConfigSchema
    )
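The idea of structuring a dict into a typed object can be illustrated without third-party packages. The slide uses attrs and cattrs; the sketch below deliberately swaps in the standard library's `dataclasses` and a hand-written `structure` helper (a hypothetical stand-in, far simpler than what cattrs does) so that it runs anywhere. It only handles flat schemas with plain types such as `str` and `int`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class ConfigSchema:
    path: str
    thresh: int
    plot_x: str
    plot_y: str


def structure(raw: Dict[str, Any]) -> ConfigSchema:
    """Hand-written stand-in for a converter: verify keys and types, then build the object."""
    expected = {f.name: f.type for f in fields(ConfigSchema)}
    unknown = set(raw) - set(expected)
    missing = set(expected) - set(raw)
    if unknown or missing:
        raise ValueError(
            f"unknown keys: {sorted(unknown)}, missing keys: {sorted(missing)}"
        )
    for name, typ in expected.items():
        if not isinstance(raw[name], typ):
            raise TypeError(
                f"{name!r} should be {typ.__name__}, got {type(raw[name]).__name__}"
            )
    return ConfigSchema(**raw)


config = structure({"path": "somepath", "thresh": 10, "plot_x": "X", "plot_y": "Y"})
print(config.thresh)

# A wrongly typed value is rejected when the config is loaded, not deep inside the experiment.
type_error = None
try:
    structure({"path": "somepath", "thresh": "10", "plot_x": "X", "plot_y": "Y"})
except TypeError as err:
    type_error = str(err)
    print("config rejected:", type_error)
```

With a schema class in place, an access such as `config.wrong_key` can be reported by a static type checker like mypy before the experiment runs, which is the contrast with `config["wrong_key"]` that the slide draws.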


  13. Input
    ● CLI-parameters
    ● environment variables
    ● json / yaml / toml / ini / …
    Converters
    ● typedload
    ● dacite
    ● pydantic
    ● cattrs
    Representations
    ● TypedDict
    ● NamedTuple
    ● dataclasses
    ● pydantic
    ● attrs
    Type Checkers
    ● mypy
    ● pytype
    ● …
    (Tracking)
    sacred, MLflow, Guild


  14. Schema Versions & Evolutions
    versioned config files → dicts etc. → objects
    evolutions
    up-to-date schema classes


  15. Schema Versions & Evolutions
    versioned config files → dicts etc. → objects
    evolutions
    old + up-to-date schema classes
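The slides do not show how evolutions are implemented, so the following is only one possible reading: each config dict carries a version number, and a chain of small functions migrates an old config one version at a time until it matches the up-to-date schema. All names here (`CURRENT_VERSION`, `EVOLUTIONS`, `migrate`, the `version` key) and both schema changes are hypothetical.

```python
from typing import Any, Callable, Dict

Config = Dict[str, Any]
CURRENT_VERSION = 3


def evolve_1_to_2(old: Config) -> Config:
    """Hypothetical change: `threshold` was renamed to `thresh`."""
    new = {k: v for k, v in old.items() if k != "threshold"}
    new["thresh"] = old["threshold"]
    new["version"] = 2
    return new


def evolve_2_to_3(old: Config) -> Config:
    """Hypothetical change: plot axes became configurable, defaulting to X and Y."""
    return {**old, "plot_x": "X", "plot_y": "Y", "version": 3}


# Maps a version number to the function that upgrades it to the next version.
EVOLUTIONS: Dict[int, Callable[[Config], Config]] = {
    1: evolve_1_to_2,
    2: evolve_2_to_3,
}


def migrate(config: Config) -> Config:
    """Apply evolutions step by step until the config is up to date."""
    while config["version"] < CURRENT_VERSION:
        config = EVOLUTIONS[config["version"]](config)
    return config


old_config = {"version": 1, "path": "somepath", "threshold": 10}
new_config = migrate(old_config)
print(new_config)
```

Each evolution returns a new dict rather than mutating its input, so the original versioned config stays available for reproducing old experiments, and the migrated dict can then be structured into the up-to-date schema class.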


  16. Maintainability & Reproducibility
    ● Separation of Config & Code: declarative configs
    ● Config Verification: cattrs
    ● Code Verification (Config Usage): mypy
    ● Automated Migrations: evolutions
    Jonathan Striebel | @jostriebel
