

Managing complex data science experiment configurations with Hydra

Data science experiments have a lot of moving parts: datasets, models, and hyperparameters all have multiple knobs and dials. Keeping track of the exact parameter values used in each run can be tedious and error-prone.

Thankfully, you're not the only one facing this problem, and solutions are becoming available. One of them is Hydra from Meta AI Research. Hydra is an open-source application framework that helps you handle complex configurations in an easy and elegant way. Experiments written with Hydra are traceable and reproducible, with minimal boilerplate code.

In my talk I will go over the main features of Hydra and the OmegaConf configuration system it is based on. I will show examples of elegant code written with Hydra and talk about ways to integrate it with other open-source tools such as MLflow.

Michał Karzyński

July 14, 2022


Transcript

  1. MANAGING DATA SCIENCE EXPERIMENTS WITH HYDRA
     Configuration management framework
     Michał Karzyński • EuroPython 2022
  2. THE PROBLEM
     Data science experiments:
     • Have complex configurations
     • Easy to confuse which values worked best (not traceable)
     • Results may be difficult to reproduce
  3. my_experiment.py

     import logging
     import hydra
     from omegaconf import DictConfig, OmegaConf

     logger = logging.getLogger(__name__)

     @hydra.main(version_base="1.2", config_path=".", config_name="config")
     def my_experiment(cfg: DictConfig) -> None:
         logger.info("Hello, EuroPython!")
         logger.info(f"model: {cfg.model}")

     if __name__ == "__main__":
         my_experiment()

     config.yaml

     model:
       a: 1
       b: 2
       c: 3

     [2022-07-07 20:45:08,262][__main__][INFO] - Hello, EuroPython!
     [2022-07-07 20:45:08,262][__main__][INFO] - model: {'a': 1, 'b': 2, 'c': 3}
  4. $ python my_experiment.py
     [2022-07-07 20:45:08,262][__main__][INFO] - Hello, EuroPython!
     [2022-07-07 20:45:08,262][__main__][INFO] - model: {'a': 1, 'b': 2, 'c': 3}

     $ python my_experiment.py --help
     my_experiment is powered by Hydra.

     == Config ==
     Override anything in the config (foo.bar=value)

     model:
       a: 1
       b: 2
       c: 3

     Powered by Hydra (https://hydra.cc)
     Use --hydra-help to view Hydra specific help

     $ python my_experiment.py model.a=64
     [2022-07-07 21:15:10,536][__main__][INFO] - Hello, EuroPython!
     [2022-07-07 21:15:10,536][__main__][INFO] - model: {'a': 64, 'b': 2, 'c': 3}
  5. HYDRA OUTPUTS DIRECTORY

     outputs/2022-07-07/20-45-12
     ├── .hydra
     │   ├── config.yaml
     │   ├── hydra.yaml
     │   └── overrides.yaml
     └── my_experiment.log

     $ python my_experiment.py --config-dir=outputs/2022-07-07/20-45-12/.hydra \
         --config-name=config
     [2022-07-07 21:15:08,262][__main__][INFO] - Hello, EuroPython!
     [2022-07-07 21:15:08,536][__main__][INFO] - model: {'a': 64, 'b': 2, 'c': 3}

     Traceability • Reproducibility
  6. HYDRA MULTIRUN

     $ python my_experiment.py --multirun model.a=1,3 model.b=2,4
     [2022-07-07 21:44:19,834][HYDRA] Launching 4 jobs locally
     [2022-07-07 21:44:19,834][HYDRA] #0 : model.a=1 model.b=2
     [2022-07-07 21:44:19,958][__main__][INFO] - model: {'a': 1, 'b': 2, 'c': 3}
     [2022-07-07 21:44:19,959][HYDRA] #1 : model.a=1 model.b=4
     [2022-07-07 21:44:20,097][__main__][INFO] - model: {'a': 1, 'b': 4, 'c': 3}
     [2022-07-07 21:44:20,098][HYDRA] #2 : model.a=3 model.b=2
     [2022-07-07 21:44:20,222][__main__][INFO] - model: {'a': 3, 'b': 2, 'c': 3}
     [2022-07-07 21:44:20,223][HYDRA] #3 : model.a=3 model.b=4
     [2022-07-07 21:44:20,351][__main__][INFO] - model: {'a': 3, 'b': 4, 'c': 3}
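Conceptually, --multirun expands each comma-separated override into a list of values and launches one job per element of their Cartesian product. A minimal plain-Python sketch of that expansion (the override names come from the slide; the expansion logic is my illustration, not Hydra's actual implementation):

```python
import itertools

# comma-separated overrides from the command line above
overrides = {"model.a": [1, 3], "model.b": [2, 4]}

# one job per element of the Cartesian product of all value lists
keys = list(overrides)
jobs = [dict(zip(keys, values)) for values in itertools.product(*overrides.values())]

for i, job in enumerate(jobs):
    print(f"#{i} : " + " ".join(f"{k}={v}" for k, v in job.items()))
```

This reproduces the four job descriptions in the slide's log output, in the same order.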
  7. OMEGACONF
     • The YAML configuration manager Hydra is based on
     • Also created by @omry

     $ pip install omegaconf
  8. OMEGACONF

     config.yaml

     foo:
       bar:
         baz: "Hello!"

     from omegaconf import OmegaConf

     cfg = OmegaConf.load("config.yaml")
     assert cfg.foo.bar.baz == "Hello!"
     assert cfg["foo"]["bar"]["baz"] == "Hello!"
     assert OmegaConf.select(cfg, "foo.bar.baz") == "Hello!"
  9. VARIABLE INTERPOLATION

     config.yaml

     foo: "Hello"
     bar: "EuroPython"
     baz: "${foo}, ${bar}!"

     from omegaconf import OmegaConf

     cfg = OmegaConf.load("config.yaml")
     assert cfg.baz == "Hello, EuroPython!"
 10. RESOLVER FUNCTIONS

     config.yaml

     foo: 1
     bar: 2
     baz: ${add:${foo},${bar}}

     from omegaconf import OmegaConf

     OmegaConf.register_new_resolver(
         "add", lambda *numbers: sum(numbers)
     )
     cfg = OmegaConf.load("config.yaml")
     assert cfg.baz == 3
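A resolver is essentially a named function that OmegaConf looks up by name and calls lazily, when the interpolated value is accessed. A rough plain-Python sketch of that dispatch (my simplification for illustration; OmegaConf's real parser is far more robust):

```python
# registry of resolver functions, keyed by name
resolvers = {"add": lambda *numbers: sum(numbers)}

def resolve(expression, cfg):
    """Evaluate a '${name:arg1,arg2}' style expression against cfg."""
    name, _, arg_str = expression.strip("${}").partition(":")
    # each argument is itself an interpolation like ${foo}
    args = [cfg[a.strip("${}")] for a in arg_str.split(",")]
    return resolvers[name](*args)

cfg = {"foo": 1, "bar": 2}
print(resolve("${add:${foo},${bar}}", cfg))
```

The point of the sketch is only the lookup-and-apply step: the value of baz is never stored in the file, it is computed from foo and bar on access.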
 11. HYDRA
     • Application development framework
     • Focused on configuration management
     • Minimal boilerplate
 12. COMPOSING CONFIGURATIONS
     • Split big configurations into multiple small files
     • Compose the final configuration by combining them
     • Each subsection ("package") has a subdirectory and namespace (similar to Python modules)
     • Configuration search path (similar to PYTHONPATH)
 13. COMPOSING CONFIGURATIONS

     my_experiment.yaml

     defaults:
       - training_settings
     lr: 0.03

     training_settings.yaml

     epochs: 20
     optimizer_type: adam
     lr: 0.01
     early_stopping: false

     @hydra.main(config_name="my_experiment")
     def my_experiment(cfg: DictConfig) -> None:
         print(OmegaConf.to_yaml(cfg))

     epochs: 20
     optimizer_type: adam
     lr: 0.03
     early_stopping: false
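The composition step behaves roughly like a recursive dictionary merge in which the later config wins. A plain-Python sketch of that idea (illustrative only; Hydra/OmegaConf's real merge also handles typing, interpolation, and lists):

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override's leaves win."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

# the two files from the slide, as dicts
training_settings = {"epochs": 20, "optimizer_type": "adam",
                     "lr": 0.01, "early_stopping": False}
my_experiment = {"lr": 0.03}

composed = merge(training_settings, my_experiment)
print(composed)
```

This reproduces the composed result on the slide: lr comes from my_experiment.yaml, everything else from training_settings.yaml.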
 14. CONFIG GROUPS AND OPTIONS

     my_experiment.yaml

     defaults:
       - dataset: imagenet

     dataset/imagenet.yaml

     images: /…/imagenet/
     labels: /…/labels.txt

     dataset/cifar.yaml

     images: /…/cifar/
     labels: /…/labels.txt

     Composed result:

     dataset:
       images: /…/imagenet/
       labels: /…/labels.txt
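With the dataset group in place, switching options is a one-line change to the defaults list (standard Hydra syntax; the file names are the ones from the slide):

```yaml
# my_experiment.yaml — select the cifar option instead of imagenet
defaults:
  - dataset: cifar
```

Or, without editing any file, override the group on the command line: python my_experiment.py dataset=cifar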
 15. @PACKAGE OVERRIDES

     my_experiment.yaml

     defaults:
       - dataset@training_dataset: imagenet
       - dataset@validation_dataset: imagenet

     Composed result:

     training_dataset:
       images: /…/imagenet/
       labels: /…/labels.txt
     validation_dataset:
       images: /…/imagenet/
       labels: /…/labels.txt
 16. @PACKAGE DIRECTIVE

     plugins/…/colorlog.yaml

     # @package hydra.job_logging
     version: 1
     (...)
     handlers:
       console:
         class: logging.StreamHandler
         formatter: colorlog
         stream: ext://sys.stdout
       file:
         class: logging.FileHandler
         formatter: simple
         filename: ${hydra.runtime.output_dir}/${hydra.job.name}.log
     root:
       level: INFO
       handlers: [console, file]
 17. HYDRA LOGGING
     • Python logging out of the box
     • Each run creates a log file
     • Highly configurable

     import logging

     logger = logging.getLogger(__name__)
     logger.info("Hello, EuroPython!")
 18. HYDRA PARTIAL INSTANTIATE

     optimizer:
       _target_: torch.optim.SGD
       lr: 0.01
       momentum: 0.9
       _partial_: true

     >>> optim_partial = hydra.utils.instantiate(cfg.optimizer)
     >>> optim_partial
     functools.partial(<class 'torch.optim.sgd.SGD'>, lr=0.01, momentum=0.9)
     >>> optimizer = optim_partial(model.parameters())
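The _partial_: true flag makes instantiate return a functools.partial instead of a constructed object, so arguments only known at runtime (here, the model's parameters) can be supplied later. A plain-Python sketch of the same idea, using a stand-in SGD class instead of torch (my illustration, not Hydra's code):

```python
import functools

class SGD:
    """Stand-in for torch.optim.SGD, for illustration only."""
    def __init__(self, params, lr=0.1, momentum=0.0):
        self.params = list(params)
        self.lr = lr
        self.momentum = momentum

# roughly what instantiate(cfg.optimizer) returns when _partial_ is true:
# the config values are bound now, the positional argument is left open
optim_partial = functools.partial(SGD, lr=0.01, momentum=0.9)

# the missing argument is provided later, at runtime
optimizer = optim_partial(["weight", "bias"])
```

Without _partial_, instantiate would have to construct the optimizer immediately, which is impossible before the model exists.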
 19. TYPE CHECKING

     from dataclasses import dataclass, field

     import hydra
     from hydra.core.config_store import ConfigStore

     @dataclass
     class ModelConfig:
         a: int = 1
         b: int = 2
         c: int = 3

     @dataclass
     class ExperimentConfig:
         model: ModelConfig = field(default_factory=ModelConfig)

     cs = ConfigStore.instance()
     cs.store(name="config", node=ExperimentConfig)

     @hydra.main(version_base="1.2", config_name="config")
     def my_experiment(cfg: ExperimentConfig) -> None:
         logger.info("Hello, EuroPython!")
         logger.info(f"model: {cfg.model}")
 20. TYPE CHECKING

     $ python my_experiment.py model.a=Hello
     Error merging override model.a=Hello
     Value 'Hello' of type 'str' could not be converted to Integer
         full_key: model.a
         reference_type=Model
         object_type=Model
 21. HYDRA PLUGINS
     • Launchers:
       • Joblib
       • Ray
       • Redis Queue (RQ)
       • Submitit (for SLURM)
     • Sweepers:
       • Adaptive Experimentation Platform (Ax)
       • Nevergrad
       • Optuna
 22. INTEGRATION WITH MLFLOW

     $ pip install mlflow
     $ mlflow server
     [2022-07-13 23:13:56 +0100] [36398] [INFO] Starting gunicorn 20.1.0
     [2022-07-13 23:13:56 +0100] [36398] [INFO] Listening at: http://127.0.0.1:5000 (36398)
     [2022-07-13 23:13:56 +0100] [36398] [INFO] Using worker: sync
     [2022-07-13 23:13:56 +0100] [36399] [INFO] Booting worker with pid: 36399
     [2022-07-13 23:13:56 +0100] [36400] [INFO] Booting worker with pid: 36400
     [2022-07-13 23:13:56 +0100] [36401] [INFO] Booting worker with pid: 36401
     [2022-07-13 23:13:56 +0100] [36402] [INFO] Booting worker with pid: 36402
 23. INTEGRATION WITH MLFLOW

     import hydra
     import mlflow
     from omegaconf import DictConfig

     @hydra.main(version_base="1.1", config_path="conf", config_name="my_experiment")
     def my_experiment(cfg: DictConfig) -> None:
         ...
         mlflow.log_metric("epoch_train_loss", loss)
         mlflow.log_artifact(".hydra/config.yaml")
 24. INTEGRATION WITH MLFLOW

     import os

     import hydra
     import mlflow
     from hydra.core.hydra_config import HydraConfig
     from omegaconf import DictConfig

     @hydra.main(version_base="1.2", config_path=".", config_name="config")
     def my_experiment(cfg: DictConfig) -> None:
         ...
         mlflow.log_metric("epoch_train_loss", loss)
         config_yaml_path = os.path.join(
             HydraConfig.get().runtime.output_dir, ".hydra/config.yaml"
         )
         mlflow.log_artifact(config_yaml_path)
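Beyond logging the config file as an artifact, it can be convenient to log individual values as MLflow parameters so runs are searchable in the UI. mlflow.log_params takes a flat dict, so nested configs are typically flattened into dotted keys first. A minimal flattening helper (my addition, not from the talk):

```python
def flatten(cfg: dict, prefix: str = "") -> dict:
    """Flatten a nested config into dotted keys, e.g. {'model.a': 1}."""
    flat = {}
    for key, value in cfg.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=dotted + "."))
        else:
            flat[dotted] = value
    return flat

params = flatten({"model": {"a": 1, "b": 2, "c": 3}})
# inside a Hydra app you would first convert the DictConfig to a plain
# container with OmegaConf.to_container(cfg), then:
# mlflow.log_params(params)
```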
 25. TAKEAWAYS
     Hydra makes your experiments:
     • Easy to configure
     • Traceable
     • Reproducible