

Managing complex data science experiment configurations with Hydra

Data science experiments have a lot of moving parts: datasets, models, and hyperparameters all have multiple knobs and dials. Keeping track of the exact parameter values used in each run can be tedious and error-prone.

Thankfully, you're not the only one facing this problem, and solutions are becoming available. One of them is Hydra from Meta AI Research. Hydra is an open-source application framework that helps you handle complex configurations in an easy and elegant way. Experiments written with Hydra are traceable and reproducible, with minimal boilerplate code.

In my talk I will go over the main features of Hydra and the OmegaConf configuration system it is based on. I will show examples of elegant code written with Hydra and talk about ways to integrate it with other open-source tools such as MLflow.

Michał Karzyński

July 14, 2022


Transcript

  1. MANAGING DATA SCIENCE EXPERIMENTS WITH HYDRA
     Configuration management framework
     Michał Karzyński • EuroPython 2022
  2. THE PROBLEM
     Data science experiments:
     • Have complex configurations
     • Easy to confuse which values worked best (not traceable)
     • Results may be difficult to reproduce
  3. my_experiment.py

     import logging
     import hydra
     from omegaconf import DictConfig, OmegaConf

     logger = logging.getLogger(__name__)

     @hydra.main(version_base="1.2", config_path=".", config_name="config")
     def my_experiment(cfg: DictConfig) -> None:
         logger.info("Hello, EuroPython!")
         logger.info(f"model: {cfg.model}")

     if __name__ == "__main__":
         my_experiment()

     config.yaml

     model:
       a: 1
       b: 2
       c: 3

     [2022-07-07 20:45:08,262][__main__][INFO] - Hello, EuroPython!
     [2022-07-07 20:45:08,262][__main__][INFO] - model: {'a': 1, 'b': 2, 'c': 3}
  4. $ python my_experiment.py
     [2022-07-07 20:45:08,262][__main__][INFO] - Hello, EuroPython!
     [2022-07-07 20:45:08,262][__main__][INFO] - model: {'a': 1, 'b': 2, 'c': 3}

     $ python my_experiment.py --help
     my_experiment is powered by Hydra.

     == Config ==
     Override anything in the config (foo.bar=value)

     model:
       a: 1
       b: 2
       c: 3

     Powered by Hydra (https://hydra.cc)
     Use --hydra-help to view Hydra specific help

     $ python my_experiment.py model.a=64
     [2022-07-07 21:15:10,536][__main__][INFO] - Hello, EuroPython!
     [2022-07-07 21:15:10,536][__main__][INFO] - model: {'a': 64, 'b': 2, 'c': 3}
  5. HYDRA OUTPUTS DIRECTORY

     outputs/2022-07-07/20-45-12
     ├── .hydra
     │   ├── config.yaml
     │   ├── hydra.yaml
     │   └── overrides.yaml
     └── my_experiment.log

     $ python my_experiment.py --config-dir=outputs/2022-07-07/20-45-12/.hydra \
         --config-name=config
     [2022-07-07 21:15:08,262][__main__][INFO] - Hello, EuroPython!
     [2022-07-07 21:15:08,536][__main__][INFO] - model: {'a': 64, 'b': 2, 'c': 3}

     Traceability • Reproducibility
  6. HYDRA MULTIRUN

     $ python my_experiment.py --multirun model.a=1,3 model.b=2,4
     [2022-07-07 21:44:19,834][HYDRA] Launching 4 jobs locally
     [2022-07-07 21:44:19,834][HYDRA] #0 : model.a=1 model.b=2
     [2022-07-07 21:44:19,958][__main__][INFO] - model: {'a': 1, 'b': 2, 'c': 3}
     [2022-07-07 21:44:19,959][HYDRA] #1 : model.a=1 model.b=4
     [2022-07-07 21:44:20,097][__main__][INFO] - model: {'a': 1, 'b': 4, 'c': 3}
     [2022-07-07 21:44:20,098][HYDRA] #2 : model.a=3 model.b=2
     [2022-07-07 21:44:20,222][__main__][INFO] - model: {'a': 3, 'b': 2, 'c': 3}
     [2022-07-07 21:44:20,223][HYDRA] #3 : model.a=3 model.b=4
     [2022-07-07 21:44:20,351][__main__][INFO] - model: {'a': 3, 'b': 4, 'c': 3}
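Conceptually, --multirun expands each comma-separated override into a list of values and launches one job per element of their Cartesian product. A minimal plain-Python sketch of that expansion (the override names come from the slide; the expansion logic is my illustration, not Hydra's actual implementation):

```python
import itertools

# comma-separated overrides from the command line above
overrides = {"model.a": [1, 3], "model.b": [2, 4]}

# one job per element of the Cartesian product of all value lists
keys = list(overrides)
jobs = [dict(zip(keys, values)) for values in itertools.product(*overrides.values())]

for i, job in enumerate(jobs):
    print(f"#{i} : " + " ".join(f"{k}={v}" for k, v in job.items()))
```

This reproduces the four job descriptions in the slide's log output, in the same order.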
  7. OMEGACONF
     • The YAML configuration manager Hydra is based on
     • Also created by @omry

     $ pip install omegaconf
  8. OMEGACONF

     config.yaml

     foo:
       bar:
         baz: "Hello!"

     from omegaconf import OmegaConf

     cfg = OmegaConf.load("config.yaml")
     assert cfg.foo.bar.baz == "Hello!"
     assert cfg["foo"]["bar"]["baz"] == "Hello!"
     assert OmegaConf.select(cfg, "foo.bar.baz") == "Hello!"
  9. VARIABLE INTERPOLATION

     config.yaml

     foo: "Hello"
     bar: "EuroPython"
     baz: "${foo}, ${bar}!"

     from omegaconf import OmegaConf

     cfg = OmegaConf.load("config.yaml")
     assert cfg.baz == "Hello, EuroPython!"
 10. RESOLVER FUNCTIONS

     config.yaml

     foo: 1
     bar: 2
     baz: ${add:${foo},${bar}}

     from omegaconf import OmegaConf

     OmegaConf.register_new_resolver(
         "add", lambda *numbers: sum(numbers)
     )
     cfg = OmegaConf.load("config.yaml")
     assert cfg.baz == 3
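A resolver is essentially a named function that OmegaConf looks up by name and calls lazily, when the interpolated value is accessed. A rough plain-Python sketch of that dispatch (my simplification for illustration; OmegaConf's real parser is far more robust):

```python
# registry of resolver functions, keyed by name
resolvers = {"add": lambda *numbers: sum(numbers)}

def resolve(expression, cfg):
    """Evaluate a '${name:arg1,arg2}' style expression against cfg."""
    name, _, arg_str = expression.strip("${}").partition(":")
    # each argument is itself an interpolation like ${foo}
    args = [cfg[a.strip("${}")] for a in arg_str.split(",")]
    return resolvers[name](*args)

cfg = {"foo": 1, "bar": 2}
print(resolve("${add:${foo},${bar}}", cfg))
```

The point of the sketch is only the lookup-and-apply step: the value of baz is never stored in the file, it is computed from foo and bar on access.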
 11. HYDRA
     • Application development framework
     • Focused on configuration management
     • Minimal boilerplate
 12. COMPOSING CONFIGURATIONS
     • Split big configurations into multiple small files
     • Compose the final configuration by combining them
     • Each subsection ("package") has a subdirectory and namespace (similar to Python modules)
     • Configuration search path (similar to PYTHONPATH)
 13. COMPOSING CONFIGURATIONS

     my_experiment.yaml

     defaults:
       - training_settings
     lr: 0.03

     training_settings.yaml

     epochs: 20
     optimizer_type: adam
     lr: 0.01
     early_stopping: false

     @hydra.main(config_name="my_experiment")
     def my_experiment(cfg: DictConfig) -> None:
         print(OmegaConf.to_yaml(cfg))

     epochs: 20
     optimizer_type: adam
     lr: 0.03
     early_stopping: false
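The composition step behaves roughly like a recursive dictionary merge in which the later config wins. A plain-Python sketch of that idea (illustrative only; Hydra/OmegaConf's real merge also handles typing, interpolation, and lists):

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override's leaves win."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

# the two files from the slide, as dicts
training_settings = {"epochs": 20, "optimizer_type": "adam",
                     "lr": 0.01, "early_stopping": False}
my_experiment = {"lr": 0.03}

composed = merge(training_settings, my_experiment)
print(composed)
```

This reproduces the composed result on the slide: lr comes from my_experiment.yaml, everything else from training_settings.yaml.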
 14. CONFIG GROUPS AND OPTIONS

     my_experiment.yaml

     defaults:
       - dataset: imagenet

     dataset/imagenet.yaml

     images: /…/imagenet/
     labels: /…/labels.txt

     dataset/cifar.yaml

     images: /…/cifar/
     labels: /…/labels.txt

     Composed result:

     dataset:
       images: /…/imagenet/
       labels: /…/labels.txt
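With the dataset group in place, switching options is a one-line change to the defaults list (standard Hydra syntax; the file names are the ones from the slide):

```yaml
# my_experiment.yaml — select the cifar option instead of imagenet
defaults:
  - dataset: cifar
```

Or, without editing any file, override the group on the command line: python my_experiment.py dataset=cifar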
 15. @PACKAGE OVERRIDES

     my_experiment.yaml

     defaults:
       - dataset@training_dataset: imagenet
       - dataset@validation_dataset: imagenet

     Composed result:

     training_dataset:
       images: /…/imagenet/
       labels: /…/labels.txt
     validation_dataset:
       images: /…/imagenet/
       labels: /…/labels.txt
 16. @PACKAGE DIRECTIVE

     plugins/…/colorlog.yaml

     # @package hydra.job_logging
     version: 1
     (...)
     handlers:
       console:
         class: logging.StreamHandler
         formatter: colorlog
         stream: ext://sys.stdout
       file:
         class: logging.FileHandler
         formatter: simple
         filename: ${hydra.runtime.output_dir}/${hydra.job.name}.log
     root:
       level: INFO
       handlers: [console, file]
 17. HYDRA LOGGING
     • Python logging out of the box
     • Each run creates a log file
     • Highly configurable

     import logging

     logger = logging.getLogger(__name__)
     logger.info("Hello, EuroPython!")
 18. HYDRA PARTIAL INSTANTIATE

     optimizer:
       _target_: torch.optim.SGD
       lr: 0.01
       momentum: 0.9
       _partial_: true

     >>> optim_partial = hydra.utils.instantiate(cfg.optimizer)
     >>> optim_partial
     functools.partial(<class 'torch.optim.sgd.SGD'>, lr=0.01, momentum=0.9)
     >>> optimizer = optim_partial(model.parameters())
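The _partial_: true flag makes instantiate return a functools.partial instead of a constructed object, so arguments only known at runtime (here, the model's parameters) can be supplied later. A plain-Python sketch of the same idea, using a stand-in SGD class instead of torch (my illustration, not Hydra's code):

```python
import functools

class SGD:
    """Stand-in for torch.optim.SGD, for illustration only."""
    def __init__(self, params, lr=0.1, momentum=0.0):
        self.params = list(params)
        self.lr = lr
        self.momentum = momentum

# roughly what instantiate(cfg.optimizer) returns when _partial_ is true:
# the config values are bound now, the positional argument is left open
optim_partial = functools.partial(SGD, lr=0.01, momentum=0.9)

# the missing argument is provided later, at runtime
optimizer = optim_partial(["weight", "bias"])
```

Without _partial_, instantiate would have to construct the optimizer immediately, which is impossible before the model exists.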
 19. TYPE CHECKING

     from dataclasses import dataclass, field

     import hydra
     from hydra.core.config_store import ConfigStore

     @dataclass
     class ModelConfig:
         a: int = 1
         b: int = 2
         c: int = 3

     @dataclass
     class ExperimentConfig:
         model: ModelConfig = field(default_factory=ModelConfig)

     cs = ConfigStore.instance()
     cs.store(name="config", node=ExperimentConfig)

     @hydra.main(version_base="1.2", config_name="config")
     def my_experiment(cfg: ExperimentConfig) -> None:
         logger.info("Hello, EuroPython!")
         logger.info(f"model: {cfg.model}")
 20. TYPE CHECKING

     $ python my_experiment.py model.a=Hello
     Error merging override model.a=Hello
     Value 'Hello' of type 'str' could not be converted to Integer
         full_key: model.a
         reference_type=Model
         object_type=Model
 21. HYDRA PLUGINS
     • Launchers:
       • Joblib
       • Ray
       • Redis Queue (RQ)
       • Submitit (for SLURM)
     • Sweepers:
       • Adaptive Experimentation Platform (Ax)
       • Nevergrad
       • Optuna
 22. INTEGRATION WITH MLFLOW

     $ pip install mlflow
     $ mlflow server
     [2022-07-13 23:13:56 +0100] [36398] [INFO] Starting gunicorn 20.1.0
     [2022-07-13 23:13:56 +0100] [36398] [INFO] Listening at: http://127.0.0.1:5000 (36398)
     [2022-07-13 23:13:56 +0100] [36398] [INFO] Using worker: sync
     [2022-07-13 23:13:56 +0100] [36399] [INFO] Booting worker with pid: 36399
     [2022-07-13 23:13:56 +0100] [36400] [INFO] Booting worker with pid: 36400
     [2022-07-13 23:13:56 +0100] [36401] [INFO] Booting worker with pid: 36401
     [2022-07-13 23:13:56 +0100] [36402] [INFO] Booting worker with pid: 36402
 23. INTEGRATION WITH MLFLOW

     import hydra
     import mlflow
     from omegaconf import DictConfig

     @hydra.main(version_base="1.1", config_path="conf", config_name="my_experiment")
     def my_experiment(cfg: DictConfig) -> None:
         ...
         mlflow.log_metric("epoch_train_loss", loss)
         mlflow.log_artifact(".hydra/config.yaml")
 24. INTEGRATION WITH MLFLOW

     import os

     import hydra
     import mlflow
     from hydra.core.hydra_config import HydraConfig
     from omegaconf import DictConfig

     @hydra.main(version_base="1.2", config_path=".", config_name="config")
     def my_experiment(cfg: DictConfig) -> None:
         ...
         mlflow.log_metric("epoch_train_loss", loss)
         config_yaml_path = os.path.join(
             HydraConfig.get().runtime.output_dir, ".hydra/config.yaml"
         )
         mlflow.log_artifact(config_yaml_path)
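Beyond logging the config file as an artifact, it can be convenient to log individual values as MLflow parameters so runs are searchable in the UI. mlflow.log_params takes a flat dict, so nested configs are typically flattened into dotted keys first. A minimal flattening helper (my addition, not from the talk):

```python
def flatten(cfg: dict, prefix: str = "") -> dict:
    """Flatten a nested config into dotted keys, e.g. {'model.a': 1}."""
    flat = {}
    for key, value in cfg.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=dotted + "."))
        else:
            flat[dotted] = value
    return flat

params = flatten({"model": {"a": 1, "b": 2, "c": 3}})
# inside a Hydra app you would first convert the DictConfig to a plain
# container with OmegaConf.to_container(cfg), then:
# mlflow.log_params(params)
```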
 25. TAKEAWAYS
     Hydra makes your experiments:
     • Easy to configure
     • Traceable
     • Reproducible