Slide 1

Slide 1 text

MANAGING DATA SCIENCE EXPERIMENTS WITH HYDRA
Configuration management framework
Michał Karzyński • EuroPython 2022

Slide 2

Slide 2 text

WELCOME
• Michał Karzyński (@postrational)
• Hydra (hydra.cc, @omry)
• MLflow (mlflow.org)

Slide 3

Slide 3 text

THE PROBLEM
Data science experiments:
• Have complex configurations
• Make it easy to lose track of which values worked best (not traceable)
• Produce results that may be difficult to reproduce

Slide 4

Slide 4 text

THE SOLUTION
hydra.cc

Slide 5

Slide 5 text

my_experiment.py

    import logging

    import hydra
    from omegaconf import DictConfig, OmegaConf

    logger = logging.getLogger(__name__)


    @hydra.main(version_base="1.2", config_path=".", config_name="config")
    def my_experiment(cfg: DictConfig) -> None:
        logger.info("Hello, EuroPython!")
        logger.info(f"model: {cfg.model}")


    if __name__ == "__main__":
        my_experiment()

config.yaml

    model:
      a: 1
      b: 2
      c: 3

    [2022-07-07 20:45:08,262][__main__][INFO] - Hello, EuroPython!
    [2022-07-07 20:45:08,262][__main__][INFO] - model: {'a': 1, 'b': 2, 'c': 3}

Slide 6

Slide 6 text

    $ python my_experiment.py
    [2022-07-07 20:45:08,262][__main__][INFO] - Hello, EuroPython!
    [2022-07-07 20:45:08,262][__main__][INFO] - model: {'a': 1, 'b': 2, 'c': 3}

    $ python my_experiment.py --help
    my_experiment is powered by Hydra.

    == Config ==
    Override anything in the config (foo.bar=value)

    model:
      a: 1
      b: 2
      c: 3

    Powered by Hydra (https://hydra.cc)
    Use --hydra-help to view Hydra specific help

    $ python my_experiment.py model.a=64
    [2022-07-07 21:15:10,536][__main__][INFO] - Hello, EuroPython!
    [2022-07-07 21:15:10,536][__main__][INFO] - model: {'a': 64, 'b': 2, 'c': 3}
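The `model.a=64` override shown on this slide can be pictured as a walk down a nested dictionary along a dotted key. The following is only a minimal plain-Python sketch of that idea; `apply_override` is a hypothetical helper written for illustration, not Hydra's parser, which supports a much richer override grammar.

```python
from copy import deepcopy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one `dotted.key=value` override applied."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    result = deepcopy(config)
    node = result
    for name in parents:
        node = node[name]  # raises KeyError if the path does not exist
    if leaf not in node:
        raise KeyError(f"unknown config key: {dotted_key}")
    # Reuse the type of the existing value so "64" becomes the integer 64.
    node[leaf] = type(node[leaf])(raw_value)
    return result


config = {"model": {"a": 1, "b": 2, "c": 3}}
overridden = apply_override(config, "model.a=64")
print(overridden)
```

The original `config` is left untouched, which mirrors the fact that a command-line override changes one run and not the YAML file on disk.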

Slide 7

Slide 7 text

HYDRA OUTPUTS DIRECTORY

    outputs/2022-07-07/20-45-12
    ├── .hydra
    │   ├── config.yaml
    │   ├── hydra.yaml
    │   └── overrides.yaml
    └── my_experiment.log

    $ python my_experiment.py --config-dir=outputs/2022-07-07/20-45-12/.hydra \
        --config-name=config
    [2022-07-07 21:15:08,262][__main__][INFO] - Hello, EuroPython!
    [2022-07-07 21:15:08,536][__main__][INFO] - model: {'a': 64, 'b': 2, 'c': 3}

• Traceability
• Reproducibility

Slide 8

Slide 8 text

COMMAND-LINE COMPLETION

    $ eval "$(python my_experiment.py --shell-completion install=bash)"
    $ python my_experiment.py model.[TAB][TAB]
    model.a=  model.b=  model.c=

Slide 9

Slide 9 text

HYDRA MULTIRUN

    $ python my_experiment.py --multirun model.a=1,3 model.b=2,4
    [2022-07-07 21:44:19,834][HYDRA] Launching 4 jobs locally
    [2022-07-07 21:44:19,834][HYDRA] #0 : model.a=1 model.b=2
    [2022-07-07 21:44:19,958][__main__][INFO] - model: {'a': 1, 'b': 2, 'c': 3}
    [2022-07-07 21:44:19,959][HYDRA] #1 : model.a=1 model.b=4
    [2022-07-07 21:44:20,097][__main__][INFO] - model: {'a': 1, 'b': 4, 'c': 3}
    [2022-07-07 21:44:20,098][HYDRA] #2 : model.a=3 model.b=2
    [2022-07-07 21:44:20,222][__main__][INFO] - model: {'a': 3, 'b': 2, 'c': 3}
    [2022-07-07 21:44:20,223][HYDRA] #3 : model.a=3 model.b=4
    [2022-07-07 21:44:20,351][__main__][INFO] - model: {'a': 3, 'b': 4, 'c': 3}
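The four jobs in the multirun log are the Cartesian product of the comma-separated values given for `model.a` and `model.b`. A rough sketch of that expansion, using only the standard library, might look like this; `expand_sweep` is a hypothetical name, and this is an illustration of the grid expansion rather than the code of Hydra's sweeper.

```python
from itertools import product


def expand_sweep(overrides: list[str]) -> list[list[str]]:
    """Expand comma-separated sweep values into one override list per job."""
    choices = []
    for override in overrides:
        key, values = override.split("=", 1)
        choices.append([f"{key}={value}" for value in values.split(",")])
    # product() varies the last argument fastest, matching the job order in the log.
    return [list(job) for job in product(*choices)]


jobs = expand_sweep(["model.a=1,3", "model.b=2,4"])
for index, job in enumerate(jobs):
    print(f"#{index} : {' '.join(job)}")
```

Adding a third swept parameter with two values would double the job count to eight, which is why grid sweeps grow quickly.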

Slide 10

Slide 10 text

HYDRA COMPONENTS
• OmegaConf
• Python Logging
• Launchers
• Sweepers
• Plugins

Slide 11

Slide 11 text

OMEGACONF

Slide 12

Slide 12 text

OMEGACONF
• The YAML configuration manager Hydra is based on
• Also created by @omry

    $ pip install omegaconf

Slide 13

Slide 13 text

OMEGACONF

config.yaml

    foo:
      bar:
        baz: "Hello!"

    from omegaconf import OmegaConf

    cfg = OmegaConf.load("config.yaml")

    assert cfg.foo.bar.baz == "Hello!"
    assert cfg["foo"]["bar"]["baz"] == "Hello!"
    assert OmegaConf.select(cfg, "foo.bar.baz") == "Hello!"

Slide 14

Slide 14 text

VARIABLE INTERPOLATION

config.yaml

    foo: "Hello"
    bar: "EuroPython"
    baz: "${foo}, ${bar}!"

    from omegaconf import OmegaConf

    cfg = OmegaConf.load("config.yaml")

    # baz is the interpolated key; foo on its own is just "Hello"
    assert cfg.baz == "Hello, EuroPython!"
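To make the substitution step concrete, here is a small plain-Python sketch of how the `baz` entry on this slide could be resolved from `foo` and `bar`. It handles only flat keys and has no cycle detection; `resolve` is a hypothetical helper, not OmegaConf's implementation.

```python
import re

# Matches ${name} where name contains no nested interpolation.
INTERPOLATION = re.compile(r"\$\{([^${}]+)\}")


def resolve(config: dict, key: str) -> str:
    """Return config[key] with every ${other_key} replaced by its resolved value."""
    value = config[key]
    if not isinstance(value, str):
        return str(value)
    # Resolve referenced keys recursively so chains of references also work.
    return INTERPOLATION.sub(lambda match: resolve(config, match.group(1)), value)


config = {"foo": "Hello", "bar": "EuroPython", "baz": "${foo}, ${bar}!"}
greeting = resolve(config, "baz")
print(greeting)
```

Resolution happens when the value is read, so changing `foo` afterwards changes what `baz` resolves to.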

Slide 15

Slide 15 text

RESOLVER FUNCTIONS

config.yaml

    foo: 1
    bar: 2
    baz: ${add:${foo},${bar}}

    from omegaconf import OmegaConf

    OmegaConf.register_new_resolver(
        "add", lambda *numbers: sum(numbers)
    )

    cfg = OmegaConf.load("config.yaml")

    assert cfg.baz == 3
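One way to picture how a nested expression such as `${add:${foo},${bar}}` is evaluated is to resolve the innermost `${...}` first and work outwards. The sketch below does this with a hypothetical `RESOLVERS` registry and integer-only arguments; it is an illustration under those assumptions, not OmegaConf's grammar, and unlike OmegaConf it returns the result as a string.

```python
import re

# Matches an innermost ${...} expression (one with no nested interpolation).
INNERMOST = re.compile(r"\$\{([^${}]+)\}")

# Hypothetical registry playing the role of OmegaConf.register_new_resolver.
RESOLVERS = {"add": lambda *numbers: sum(numbers)}


def evaluate(expression: str, config: dict) -> str:
    """Evaluate either `name:arg1,arg2` (resolver call) or `key` (plain lookup)."""
    if ":" in expression:
        name, raw_args = expression.split(":", 1)
        args = [int(arg) for arg in raw_args.split(",")]  # integers only in this sketch
        return str(RESOLVERS[name](*args))
    return str(config[expression])


def resolve(text: str, config: dict) -> str:
    """Repeatedly replace the innermost interpolation until none remain."""
    while (match := INNERMOST.search(text)) is not None:
        replacement = evaluate(match.group(1), config)
        text = text[: match.start()] + replacement + text[match.end():]
    return text


config = {"foo": 1, "bar": 2, "baz": "${add:${foo},${bar}}"}
result = resolve(config["baz"], config)
print(result)
```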

Slide 16

Slide 16 text

HYDRA

Slide 17

Slide 17 text

HYDRA
• Application development framework
• Focused on configuration management
• Minimal boilerplate

Slide 18

Slide 18 text

COMPOSING CONFIGURATIONS
• Split big configurations into multiple small files
• Compose the final configuration by combining them
• Each subsection (“package”) has a subdirectory and namespace (similar to Python modules)
• Configuration search path (similar to PYTHONPATH)

Slide 19

Slide 19 text

COMPOSING CONFIGURATIONS

my_experiment.yaml

    defaults:
      - training_settings

    lr: 0.03

training_settings.yaml

    epochs: 20
    optimizer_type: adam
    lr: 0.01
    early_stopping: false

    @hydra.main(config_name="my_experiment")
    def my_experiment(cfg: DictConfig) -> None:
        # Print the composed configuration as YAML
        print(OmegaConf.to_yaml(cfg))

    epochs: 20
    optimizer_type: adam
    lr: 0.03
    early_stopping: false
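The composed result on this slide (`lr: 0.03` winning over `lr: 0.01`) is consistent with merging the defaults first and the primary config's own values last. A minimal sketch of that merge order with plain dictionaries follows; `deep_merge` is a hypothetical helper, and the real ordering rules of Hydra's defaults list are more detailed than this.

```python
def deep_merge(base: dict, update: dict) -> dict:
    """Return a new dict in which values from `update` win over `base`."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested sections
        else:
            merged[key] = value
    return merged


# Stand-ins for training_settings.yaml and the body of my_experiment.yaml.
training_settings = {
    "epochs": 20,
    "optimizer_type": "adam",
    "lr": 0.01,
    "early_stopping": False,
}
my_experiment = {"lr": 0.03}

# Defaults are merged first, then the primary config's own values.
final = deep_merge(training_settings, my_experiment)
print(final)
```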

Slide 20

Slide 20 text

CONFIG GROUPS AND OPTIONS

my_experiment.yaml

    defaults:
      - dataset: imagenet

dataset/imagenet.yaml

    images: /…/imagenet/
    labels: /…/labels.txt

dataset/cifar.yaml

    images: /…/cifar/
    labels: /…/labels.txt

Composed configuration

    dataset:
      images: /…/imagenet/
      labels: /…/labels.txt

Slide 21

Slide 21 text

@PACKAGE OVERRIDES

my_experiment.yaml

    defaults:
      - dataset@training_dataset: imagenet
      - dataset@validation_dataset: imagenet

Composed configuration

    training_dataset:
      images: /…/imagenet/
      labels: /…/labels.txt
    validation_dataset:
      images: /…/imagenet/
      labels: /…/labels.txt
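The `group@package: option` entries can be read as "take this option from the config group and mount it under this key". The sketch below models that with an in-memory dictionary standing in for the `dataset/` directory; `CONFIG_GROUPS` and `compose` are hypothetical names for illustration only, and the elided paths are kept exactly as they appear on the slides.

```python
# Hypothetical in-memory stand-ins for dataset/imagenet.yaml and dataset/cifar.yaml.
CONFIG_GROUPS = {
    "dataset": {
        "imagenet": {"images": "/…/imagenet/", "labels": "/…/labels.txt"},
        "cifar": {"images": "/…/cifar/", "labels": "/…/labels.txt"},
    }
}


def compose(defaults: list[dict]) -> dict:
    """Build a config from defaults entries shaped like {"group@package": "option"}."""
    config = {}
    for entry in defaults:
        for selector, option in entry.items():
            group, _, package = selector.partition("@")
            # Without an explicit @package, the group name doubles as the package.
            config[package or group] = dict(CONFIG_GROUPS[group][option])
    return config


single = compose([{"dataset": "cifar"}])
paired = compose(
    [
        {"dataset@training_dataset": "imagenet"},
        {"dataset@validation_dataset": "imagenet"},
    ]
)
print(single)
print(paired)
```

Mounting the same option twice under different keys is what lets one dataset definition serve as both the training and the validation source.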

Slide 22

Slide 22 text

@PACKAGE DIRECTIVE

plugins/…/colorlog.yaml

    # @package hydra.job_logging
    version: 1
    (...)
    handlers:
      console:
        class: logging.StreamHandler
        formatter: colorlog
        stream: ext://sys.stdout
      file:
        class: logging.FileHandler
        formatter: simple
        filename: ${hydra.runtime.output_dir}/${hydra.job.name}.log
    root:
      level: INFO
      handlers: [console, file]

Slide 23

Slide 23 text

HYDRA LOGGING
• Python logging out of the box
• Each run creates a log file
• Highly configurable

    import logging

    logger = logging.getLogger(__name__)
    logger.info("Hello, EuroPython!")

Slide 24

Slide 24 text

HYDRA INSTANTIATE

    loss_function:
      _target_: torch.nn.modules.loss.CrossEntropyLoss
      label_smoothing: 0.1

    >>> hydra.utils.instantiate(cfg.loss_function)
    CrossEntropyLoss()

Slide 25

Slide 25 text

HYDRA PARTIAL INSTANTIATE

    optimizer:
      _target_: torch.optim.SGD
      lr: 0.01
      momentum: 0.9
      _partial_: true

    >>> optim_partial = hydra.utils.instantiate(cfg.optimizer)
    >>> optim_partial
    functools.partial(<class 'torch.optim.sgd.SGD'>, lr=0.01, momentum=0.9)
    >>> optimizer = optim_partial(model.parameters())
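The `_target_` and `_partial_` keys on the two instantiate slides can be understood as "import this dotted name, then either call it or wrap it in `functools.partial`". Here is a standard-library sketch of that idea, using `datetime.timedelta` as a stand-in target so it runs without PyTorch; this local `instantiate` is a simplified illustration and not `hydra.utils.instantiate`, which supports further options.

```python
import importlib
from functools import partial


def instantiate(node: dict):
    """Import the object named by `_target_` and call it with the remaining keys."""
    # Keys that start with an underscore are treated as directives, not arguments.
    kwargs = {key: value for key, value in node.items() if not key.startswith("_")}
    module_name, _, attribute = node["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attribute)
    if node.get("_partial_", False):
        # Defer the call: the caller supplies the missing arguments later.
        return partial(target, **kwargs)
    return target(**kwargs)


delay = instantiate({"_target_": "datetime.timedelta", "minutes": 90})
make_delay = instantiate(
    {"_target_": "datetime.timedelta", "minutes": 90, "_partial_": True}
)
longer_delay = make_delay(seconds=30)
print(delay, longer_delay)
```

The partial form is useful for exactly the case on the slide: an optimizer needs `model.parameters()`, which is not known when the configuration is written.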

Slide 26

Slide 26 text

TYPE CHECKING

    import logging
    from dataclasses import dataclass, field

    import hydra
    from hydra.core.config_store import ConfigStore

    logger = logging.getLogger(__name__)


    @dataclass
    class ModelConfig:
        a: int = 1
        b: int = 2
        c: int = 3


    @dataclass
    class ExperimentConfig:
        # default_factory gives each config its own ModelConfig instance;
        # a bare ModelConfig() default is rejected by dataclasses on Python 3.11+.
        model: ModelConfig = field(default_factory=ModelConfig)


    cs = ConfigStore.instance()
    cs.store(name="config", node=ExperimentConfig)


    @hydra.main(version_base="1.2", config_name="config")
    def my_experiment(cfg: ExperimentConfig) -> None:
        logger.info("Hello, EuroPython!")
        logger.info(f"model: {cfg.model}")

Slide 27

Slide 27 text

TYPE CHECKING

    $ python my_experiment.py model.a=Hello
    Error merging override model.a=Hello
    Value 'Hello' of type 'str' could not be converted to Integer
        full_key: model.a
        reference_type=Model
        object_type=Model
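The rejection of `model.a=Hello` comes from checking the override against the type annotation on the dataclass field. A rough standard-library sketch of that check is shown below; `set_typed` is a hypothetical helper, and the error text is this sketch's own wording rather than OmegaConf's.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelConfig:
    a: int = 1
    b: int = 2
    c: int = 3


def set_typed(config, name: str, raw_value: str) -> None:
    """Convert `raw_value` to the annotated type of field `name`, then assign it."""
    field_types = {field.name: field.type for field in fields(config)}
    expected = field_types[name]
    try:
        converted = expected(raw_value)
    except ValueError as error:
        raise TypeError(
            f"Value {raw_value!r} could not be converted to {expected.__name__}"
        ) from error
    setattr(config, name, converted)


model = ModelConfig()
set_typed(model, "a", "64")  # accepted: "64" converts cleanly to an int
print(model)

try:
    set_typed(model, "a", "Hello")  # rejected: not an integer
except TypeError as error:
    print(f"Error merging override: {error}")
```

Failing at override time means a typo on the command line is caught before any training starts.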

Slide 28

Slide 28 text

HYDRA PLUGINS
• Launchers
  • Joblib
  • Ray
  • Redis Queue (RQ)
  • Submitit (for Slurm)
• Sweepers
  • Adaptive Experimentation Platform (Ax)
  • Nevergrad
  • Optuna

Slide 29

Slide 29 text

INTEGRATION WITH MLFLOW

Slide 30

Slide 30 text

INTEGRATION WITH MLFLOW

    $ pip install mlflow
    $ mlflow server
    [2022-07-13 23:13:56 +0100] [36398] [INFO] Starting gunicorn 20.1.0
    [2022-07-13 23:13:56 +0100] [36398] [INFO] Listening at: http://127.0.0.1:5000 (36398)
    [2022-07-13 23:13:56 +0100] [36398] [INFO] Using worker: sync
    [2022-07-13 23:13:56 +0100] [36399] [INFO] Booting worker with pid: 36399
    [2022-07-13 23:13:56 +0100] [36400] [INFO] Booting worker with pid: 36400
    [2022-07-13 23:13:56 +0100] [36401] [INFO] Booting worker with pid: 36401
    [2022-07-13 23:13:56 +0100] [36402] [INFO] Booting worker with pid: 36402

Slide 31

Slide 31 text

INTEGRATION WITH MLFLOW

Slide 32

Slide 32 text

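INTEGRATION WITH MLFLOW

    import hydra
    import mlflow
    from omegaconf import DictConfig


    @hydra.main(version_base="1.1", config_path="conf", config_name="my_experiment")
    def my_experiment(cfg: DictConfig) -> None:
        ...  # training code that computes `loss`
        mlflow.log_metric("epoch_train_loss", loss)
        # With version_base="1.1" the job runs inside its output directory,
        # so this relative path points at the saved config.
        mlflow.log_artifact(".hydra/config.yaml")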

Slide 33

Slide 33 text

INTEGRATION WITH MLFLOW

Slide 34

Slide 34 text

INTEGRATION WITH MLFLOW

    import os

    import hydra
    import mlflow
    from hydra.core.hydra_config import HydraConfig
    from omegaconf import DictConfig


    @hydra.main(version_base="1.2", config_path=".", config_name="config")
    def my_experiment(cfg: DictConfig) -> None:
        ...  # training code that computes `loss`
        mlflow.log_metric("epoch_train_loss", loss)
        # With version_base="1.2" the working directory is not changed,
        # so build the path from Hydra's runtime output directory.
        config_yaml_path = os.path.join(
            HydraConfig.get().runtime.output_dir, ".hydra/config.yaml"
        )
        mlflow.log_artifact(config_yaml_path)

Slide 35

Slide 35 text

TAKEAWAYS
• Hydra makes your experiments:
  • Easy to configure
  • Traceable
  • Reproducible

Slide 36

Slide 36 text

THANK YOU