Slide 1

Slide 1 text

Robust Configuration Management with Pydantic's Data Validation
Philipp Stephan, mediaire, PyCon Berlin 2024

Slide 2

Slide 2 text


Slide 3

Slide 3 text

Configuration

PYTHON
    DEFAULT_PORT = 80

YAML
    sender:
      port: 80

    --header "Accept-Language: de"

Slide 4

Slide 4 text

Simple dict

PYTHON
    config = {
        'sender_enabled': True,
        'sender': {'host': 'example.org', 'port': 80},
        'receiver_enabled': True,
        'receiver': {'port': 1337},
        'license_key': 'fsf89sdf98s9dff0ssdf09fs'
    }

Slide 5

Slide 5 text

Simple dict

PYTHON
    config = {
        'sender_enabled': True,
        'sender': {'host': 'example.org', 'port': 80},
        'receiver_enabled': True,
        'receiver': {'port': 1337},
        'license_key': 'fsf89sdf98s9dff0ssdf09fs'
    }

JSON
    {
      "sender_enabled": true,
      "sender": {
        "host": "example.org",
        "port": 80
      },
      "receiver_enabled": true,
      "receiver": {
        "port": 1337
      },
      "license_key": "ff89df98s9dff0df09f"
    }

YAML
    sender_enabled: yes
    sender:
      host: example.org
      port: 80
    receiver_enabled: yes
    receiver:
      port: 1337
    license_key: "ff89df98s9dff0df09f"

TOML
    sender_enabled = true
    receiver_enabled = true
    license_key = "ff89df98s9dff0df09f"

    [sender]
    host = "example.org"
    port = 80

    [receiver]
    port = 1337

Slide 6

Slide 6 text

Simple dict

PYTHON
    config = {
        'sender_enabled': True,
        'sender': {'host': 'example.org', 'port': 80},
        'receiver_enabled': True,
        'receiver': {'port': 1337},
        'license_key': 'fsf89sdf98s9dff0ssdf09fs'
    }

Slide 7

Slide 7 text

dict Loading

PYTHON
    config = {
        'sender_enabled': True,
        'sender': {'host': 'example.org', 'port': 80},
        'receiver_enabled': True,
        'receiver': {'port': 1337},
        'license_key': 'fsf89sdf98s9dff0ssdf09fs'
    }

PYTHON
    # Signature: dict.get(key, default=None)
    config.get('sender_enabled', False)

Slide 8

Slide 8 text

dict Loading

PYTHON
    config = {
        'sender_enabled': True,
        'sender': {'host': 'example.org', 'port': 80},
        'receiver_enabled': True,
        'receiver': {'port': 1337},
        'license_key': 'fsf89sdf98s9dff0ssdf09fs'
    }

PYTHON
    config.get('sender_enabled', False)

Slide 9

Slide 9 text

Problem: Defaults

PYTHON
    config = {
        'sender_enabled': True,
        'sender': {'host': 'example.org', 'port': 80},
        'receiver_enabled': True,
        'receiver': {'port': 1337},
        'license_key': 'fsf89sdf98s9dff0ssdf09fs'
    }

PYTHON
    DEFAULT_SENDER_ENABLED = False

    # before: config.get('sender_enabled', False)
    config.get('sender_enabled', DEFAULT_SENDER_ENABLED)

Slide 10

Slide 10 text

Problem: Defaults

PYTHON
    config = {
        'sender_enabled': True,
        'sender': {'host': 'example.org', 'port': 80},
        'receiver_enabled': True,
        'receiver': {'port': 1337},
        'license_key': 'fsf89sdf98s9dff0ssdf09fs'
    }

PYTHON
    DEFAULT_SENDER_ENABLED = False
    DEFAULT_SENDER_PORT = 80

    config.get('sender_enabled', DEFAULT_SENDER_ENABLED)

    if config.get('sender_enabled', DEFAULT_SENDER_ENABLED):
        sender_port = config.get('sender', {}).get('port', DEFAULT_SENDER_PORT)
        logging.info(f"sender enabled on port {sender_port}")

Slide 11

Slide 11 text

Problem: Fallbacks

PYTHON
    config = {
        'sender_enabled': True,
        'sender': {'host': 'example.org', 'port': 80},
        'receiver_enabled': True,
        'receiver': {'port': 1337},
        'license_key': 'fsf89sdf98s9dff0ssdf09fs'
    }

PYTHON
    DEFAULT_SENDER_ENABLED = False
    DEFAULT_SENDER_PORT = 80

    config.get('sender_enabled', DEFAULT_SENDER_ENABLED)

    if config.get('sender_enabled', DEFAULT_SENDER_ENABLED):
        sender_port = config.get('sender', {}).get('port', DEFAULT_SENDER_PORT)
        logging.info(f"sender enabled on port {sender_port}")

    if config.get('receiver_enabled', DEFAULT_RECEIVER_ENABLED):
        receiver_port = (config.get('receiver', {})
                               .get('port', (config.get('sender', {})
                                                   .get('port', DEFAULT_RECEIVER_PORT))))
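The nested `.get()` chain above grows with every additional fallback. One way to keep it readable with plain dicts is a small lookup helper that tries several key paths in order. This is only a sketch: `lookup` is a hypothetical helper, not something shown in the talk, and the default value 42 is borrowed from the next slide.

```python
from typing import Any, Sequence

DEFAULT_RECEIVER_PORT = 42

_MISSING = object()  # sentinel so that stored None values are not mistaken for "absent"


def lookup(config: dict, *paths: Sequence[str], default: Any = None) -> Any:
    """Return the value at the first key path that exists in ``config``."""
    for path in paths:
        node: Any = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                node = _MISSING
                break
            node = node[key]
        if node is not _MISSING:
            return node
    return default


config = {'sender': {'host': 'example.org', 'port': 80}}

# Receiver port: own setting first, then the sender's port, then the constant.
receiver_port = lookup(
    config,
    ('receiver', 'port'),
    ('sender', 'port'),
    default=DEFAULT_RECEIVER_PORT,
)
print(receiver_port)
```

Even with such a helper, the fallback order still lives in the calling code rather than next to the definition of the setting, which is the underlying problem this slide points at.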

Slide 12

Slide 12 text

Problem: Validation

PYTHON
    DEFAULT_RECEIVER_ENABLED = False
    DEFAULT_RECEIVER_PORT = 42

    if config.get('receiver_enabled', DEFAULT_RECEIVER_ENABLED):
        receiver_port = config.get('receiver', {}).get('port', DEFAULT_RECEIVER_PORT)
        if isinstance(receiver_port, str):
            receiver_port = int(receiver_port)
        elif not isinstance(receiver_port, int):
            raise ValueError("receiver_port not configured")
        logging.info(f"receiver listening on port {receiver_port}")
        with TCPServer(("", receiver_port), SimpleHTTPRequestHandler) as httpd:
            httpd.serve_forever()

    Traceback (most recent call last):
      File "/Users/phistep/Projects/mediaire/pycon-2024/server.py", line 8, in <module>
        with socketserver.TCPServer(("", PORT), Handler) as httpd:
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/homebrew/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/socketserver.py", line 456, in __init__
        self.server_bind()
      File "/opt/homebrew/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/socketserver.py", line 472, in server_bind
        self.socket.bind(self.server_address)
    OverflowError: bind(): port must be 0-65535.
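The traceback shows what the hand-written check misses: it tests the type of the port but never its range, so an out-of-range number only fails later inside `bind()`. A minimal standard-library sketch of a stricter check follows; `parse_port` is a hypothetical helper name, and the accepted range is taken from the error message above.

```python
def parse_port(value: object) -> int:
    """Convert ``value`` to a TCP port number or raise ``ValueError``."""
    # bool is a subclass of int, so it has to be rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise ValueError(f"port must be an integer, got {value!r}")
    port = int(value)  # raises ValueError for strings such as "http"
    if not 0 <= port <= 65535:
        raise ValueError(f"port must be 0-65535, got {port}")
    return port


print(parse_port("1337"))
```

Every setting needs a similar block of code, which is the repetition a validation library is meant to remove.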

Slide 13

Slide 13 text

Problem: Documentation

Default Constants
Input Transformations
Keys During Loading
Signature kwargs
Docstrings
README
Defaults File
Rendered Docs / Wiki

YAML
    sender:
      port: 80

PYTHON
    DEFAULT_SENDER_ENABLED = True

    if __name__ == '__main__':
        if config.get('sender_enabled', DEFAULT_SENDER_ENABLED):
            sender_port = (config.get('sender', {})
                                 .get('port', DEFAULT_SENDER_PORT))
            logging.info(f"sender enabled on port {sender_port}")

    class Sender:
        def __init__(self, port: int = DEFAULT_SENDER_PORT):
            """
            :param port: The Port to send the data to.
            """

Slide 14

Slide 14 text

Options

Fully Fledged Config Manager
    facebookresearch/hydra
    ElektraInitiative/libelektra
    dynaconf/dynaconf
    beetbox/confuse
    dnephin/PyStaticConfiguration

Data Validation
    python-jsonschema/jsonschema
    keleshev/schema
    lidatong/dataclasses-json
    marshmallow-code/marshmallow
    pyeve/cerberus

Slide 15

Slide 15 text

Pydantic

*v1

    class SenderConfig(BaseModel):
        host: str
        port: int = 80

    1 validation error for SenderConfig
    host
      must not be empty (type=value_error)

Slide 16

Slide 16 text

Pydantic Models

PYTHON
    external_data = {
        'sender': {'host': 'example.org'},
        'receiver_enabled': True,
        'receiver': {'port': 1337},
        'license_key': 'fsf89sdf98s9dff0ssdf09fs'
    }

PYTHON
    from typing import Optional

    from pydantic import BaseModel

    class SenderConfig(BaseModel):
        host: str
        port: int = 80

    class ReceiverConfig(BaseModel):
        port: int

    class Config(BaseModel):
        sender_enabled: bool = True
        sender: SenderConfig
        receiver_enabled: bool = False
        receiver: Optional[ReceiverConfig] = None
        license_key: str

    config = Config(**external_data)

REPL
    Config(sender_enabled=True,
           sender=SenderConfig(host="example.org", port=80),
           receiver_enabled=True,
           receiver=ReceiverConfig(port=1337),
           license_key="fsf89sdf98s9dff0ssdf09fs")
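The slide shows the successful case. The sketch below feeds the same models deliberately broken input to illustrate that all problems are reported together in one `ValidationError`. It assumes `pydantic` is installed and only uses calls available in both v1 and v2; `bad_data` is made up for this illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class SenderConfig(BaseModel):
    host: str
    port: int = 80


class ReceiverConfig(BaseModel):
    port: int


class Config(BaseModel):
    sender_enabled: bool = True
    sender: SenderConfig
    receiver_enabled: bool = False
    receiver: Optional[ReceiverConfig] = None
    license_key: str


bad_data = {
    'sender': {'port': 'eighty'},  # host is missing, port is not a number
    'receiver': {'port': 1337},
    # license_key is missing
}

failed = []
try:
    Config(**bad_data)
except ValidationError as exc:
    # Each error carries the path ("loc") of the offending field.
    failed = sorted('.'.join(str(part) for part in err['loc'])
                    for err in exc.errors())

print(failed)
```

The dotted paths make it possible to point an operator at the exact key in the config file instead of failing somewhere deep inside the application.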

Slide 17

Slide 17 text

Constraints

PYTHON
    from typing import Dict, List, Literal, Union

    Orientation = Literal['sagittal', 'axial', 'coronal', 'original']

    #: DICOM header tag name. See `DICOM Ch6 Registry of DICOM Data Elements
    #: `_.
    DicomTag = str

    #: Maps DICOM tag to value filter
    Filter = Dict[DicomTag, str]

    #: Single or multiple :data:`Filter` s
    FilterList = Union[Filter, List[Filter]]

    #: Report language
    Language = Literal['de', 'en', 'it']

PYTHON
    from pydantic import conint, constr

    #: Time of the day, in the format ``HH:MM``.
    TimeOfDay = constr(regex=r'\d\d:\d\d')

    #: Port number
    Port = conint(ge=1, le=65535)

Slide 18

Slide 18 text

BaseSettings

Value sources, highest priority first:
1. Initializer
2. Environment
3. .env
4. Secrets Directory
5. Default Values

PYTHON
    class SenderConfig(BaseModel):
        host: str
        port: int = 80

    class MyConfig(BaseSettings):
        class Config:
            env_nested_delimiter = '__'

        sender_enabled: bool = True
        sender: SenderConfig
        license_key: SecretStr

SHELL
    export SENDER_ENABLED=true
    export SENDER__PORT=8080

REPL
    >>> config = MyConfig(
    ...     sender=SenderConfig(host="localhost"),
    ...     license_key="supersecret"
    ... )
    >>> print(config)
    host="example.org" port=80 key=SecretStr("*********")
    >>> print(config.license_key)
    *********
    >>> print(config.license_key.get_secret_value())
    supersecret
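The lookup order and the nested delimiter can be pictured with the standard library alone. The following is an illustration of the idea, not Pydantic's implementation: `nest` is a hypothetical helper, the environment is passed in as a plain dict to keep the example deterministic, and values stay strings because no validation happens here.

```python
from collections import ChainMap
from typing import Mapping


def nest(flat: Mapping[str, str], delimiter: str = '__') -> dict:
    """Turn ``{'SENDER__PORT': '8080'}`` into ``{'sender': {'port': '8080'}}``."""
    nested: dict = {}
    for name, value in flat.items():
        *parents, leaf = name.lower().split(delimiter)
        node = nested
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value
    return nested


defaults = {'sender_enabled': 'true', 'receiver_enabled': 'false'}
environment = nest({'SENDER_ENABLED': 'false', 'SENDER__PORT': '8080'})
init_kwargs = {'license_key': 'supersecret'}

# ChainMap searches its mappings from left to right: the first hit wins.
settings = ChainMap(init_kwargs, environment, defaults)

print(settings['sender_enabled'], settings['sender'], settings['receiver_enabled'])
```

Note that `ChainMap` does not merge nested dicts across sources; a real settings loader has to combine partially specified sub-sections as well.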

Slide 19

Slide 19 text

Transformations

Fallbacks

PYTHON
    @validator('port', always=True)
    def fallback_sender_port(cls, v, values):
        if not v and 'sender' in values:
            return values['sender'].port
        return v

Dynamic Values

PYTHON
    #: Number of parallel processes.
    #: Default: number of CPU cores in system
    n_proc: Optional[PositiveInt]

    @validator('n_proc', always=True)
    def n_proc_cpu_count(cls, v, values):
        return v if v is not None else cpu_count()

Deprecation Warnings

PYTHON
    #: Deprecated
    build_ppc: Optional[bool] = None

    @validator('build_ppc', always=True)
    def deprecated_build_ppc(cls, v, values):
        if v is not None:
            warnings.warn(
                "Architecture PPC is deprecated",
                DeprecationWarning
            )
        return v
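The snippets above are fragments of a model class. A complete, minimal version of the dynamic-value case might look as follows. It is written against the v1-style `validator` decorator the talk uses (Pydantic v2 still accepts it but emits a deprecation warning and recommends `field_validator`). `WorkerConfig` is a hypothetical model, and the field is given an explicit `None` default so that it is optional under both major versions.

```python
from multiprocessing import cpu_count
from typing import Optional

from pydantic import BaseModel, validator


class WorkerConfig(BaseModel):  # hypothetical model for illustration
    #: Number of parallel processes.
    #: Default: number of CPU cores in system
    n_proc: Optional[int] = None

    # always=True makes the validator run even when the field was omitted,
    # which is what allows it to compute the default at load time.
    @validator('n_proc', always=True)
    def n_proc_cpu_count(cls, v, values):
        return v if v is not None else cpu_count()


print(WorkerConfig().n_proc, WorkerConfig(n_proc=2).n_proc)
```

The default is computed on the machine that loads the config, so the same file yields sensible values on differently sized hosts.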

Slide 20

Slide 20 text

Tooling: Deserialization

PYTHON
    class EnvConfig(MdsuiteConfig):

        @classmethod
        def load(cls, log: bool = False) -> 'EnvConfig':
            envconfig_mdsuite = read_yaml(
                MdsuiteConstants._get_envconfig_path('', config_dir=path)
            )
            obj = {**envconfig_mdsuite}
            for product in cls._PRODUCT_MODELS:
                try:
                    if product == MdsuiteConstants.MDSUITE:
                        continue
                    envconfig_path = \
                        MdsuiteConstants._get_envconfig_path(product)
                    obj[product] = read_yaml(envconfig_path)
                except FileNotFoundError:
                    logger.warning(
                        f"'{product}' config not found in '{envconfig_path}'")

            config = cls.parse_obj(obj)
            if log:
                logger.info(pformat(config.dict()))
            return config
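The loading pattern above (one main file, plus one optional file per product nested under the product's name) can be sketched without the project-specific helpers. The example uses JSON instead of YAML so that it needs only the standard library; `load_merged` and the file names are hypothetical.

```python
import json
import logging
import tempfile
from pathlib import Path
from typing import List

logger = logging.getLogger(__name__)


def load_merged(config_dir: Path, main: str, products: List[str]) -> dict:
    """Load ``<main>.json`` and nest each ``<product>.json`` under its name."""
    merged = json.loads((config_dir / f'{main}.json').read_text(encoding='utf-8'))
    for product in products:
        path = config_dir / f'{product}.json'
        try:
            merged[product] = json.loads(path.read_text(encoding='utf-8'))
        except FileNotFoundError:
            # A missing product file is not fatal; the model's defaults apply.
            logger.warning("'%s' config not found in '%s'", product, path)
    return merged


with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)
    (config_dir / 'suite.json').write_text('{"license_key": "abc"}', encoding='utf-8')
    (config_dir / 'sender.json').write_text('{"port": 80}', encoding='utf-8')
    # 'receiver.json' does not exist and is skipped with a warning.
    merged = load_merged(config_dir, 'suite', ['sender', 'receiver'])

print(merged)
```

The merged dict is what would then be handed to the model for validation, as `cls.parse_obj(obj)` does in the slide.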

Slide 21

Slide 21 text

Tooling: Default Dumping

Full Default Config
Easier Initial Setup
Setting Required to None and Omitting

PYTHON
    @classmethod
    def dump(cls, out_dir: Optional[str] = None):
        def _get_default_path(path: str) -> str:
            root, ext = os.path.splitext(path)
            return root + '.default' + ext

        for product, Model in cls._PRODUCT_MODELS.items():
            path = MdsuiteConstants._get_envconfig_path(product, out_dir)
            default_path = _get_default_path(path)
            os.makedirs(os.path.dirname(default_path), exist_ok=True)

            # required fields need placeholder values to build the model
            model_args = {}
            if product == MdsuiteConstants.MDSUITE:
                model_args = dict(client_api=ClientApiConfig(
                    client_name='CLIENT_NAME', api_token='API_TOKEN'
                ))
            default_config = Model(**model_args).dict(exclude_none=True)
            # reset the placeholders to None in the dumped file
            if product == MdsuiteConstants.MDSUITE:
                default_config['client_api']['client_name'] = None
                default_config['client_api']['api_token'] = None

            with open(default_path, 'w', encoding='utf-8') as f:
                yaml.dump(default_config, f)
            logger.info(
                f"'{product}' default config written to '{default_path}'")
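Two small pieces of the dump logic can be tried in isolation with the standard library: deriving the `.default` file name, and dropping unset values before writing. In this sketch `default_path` mirrors the nested helper from the slide, while `drop_none` is a hypothetical stand-in that imitates the effect of `.dict(exclude_none=True)` for plain nested dicts only.

```python
import os


def default_path(path: str) -> str:
    """Insert '.default' before the file extension."""
    root, ext = os.path.splitext(path)
    return root + '.default' + ext


def drop_none(value):
    """Recursively remove dict entries whose value is None."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    return value


defaults = {
    'sender_enabled': True,
    'sender': {'host': None, 'port': 80},
    'receiver': None,
}

print(default_path('config/mdsuite.yml'))
print(drop_none(defaults))
```

Omitting `None` entries keeps the generated file limited to settings that actually have a default, so required values stand out as the ones an operator must fill in.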

Slide 22

Slide 22 text

Tooling: Permissive Loading

PYTHON
    if validate:
        config = cls.parse_obj(obj)
    else:
        logger.warning('EnvConfig validation is not enforced')
        # use validate_model to expand the input transformations and
        # log the errors without raising
        expanded, _, errors = validate_model(cls, obj)
        logger.warning(str(errors))
        # construct the object with the expanded values,
        # invalid ones will be replaced with default choices
        config = cls.construct_relaxed(expanded)

PYTHON
    @classmethod
    def construct_relaxed(cls,
                          values: dict,
                          _fields_set: Optional[Set[str]] = None,
                          model: Optional[BaseModel] = None) -> MdsuiteConfig:
        """Support nested models without validation.

        Pydantic's native `construct` method does not unpack nested models,
        but we need this for the transition period.
        """
        if model is None:
            model = cls
        m = model.__new__(model)
        fields_values = {}
        for name, field in m.__fields__.items():
            key = field.alias
            # this check is necessary or Optional fields will crash
            if key in values:
                try:
                    if not issubclass(field.type_, BaseModel):
                        raise AttributeError
                    if field.type_.__custom_root_type__:
                        fields_values[name] = field.outer_type_(
                            __root__=values[key])
                    elif field.shape == 2:
                        fields_values[name] = \
                            [cls.construct_relaxed(e, model=field.type_)
                             for e in values[key]]
                    else:
                        fields_values[name] = cls.construct_relaxed(
                            values[key], model=field.outer_type_
                        )
                except AttributeError:
                    if values[key] is None and not field.required:
                        fields_values[name] = field.default
                    else:
                        fields_values[name] = values[key]
                except Exception as e:
                    logger.exception(e)
            elif not field.required:
                try:
                    fields_values[name] = field.default
                except AttributeError:
                    # .get_default was introduced in 1.7
                    fields_values[name] = m.__field_defaults__[name]
        # add extra fields (not defined in model)
        for k, v in values.items():
            if k not in fields_values:
                fields_values[k] = v
        object.__setattr__(m, '__dict__', fields_values)
        if _fields_set is None:
            _fields_set = set(values.keys())
        object.__setattr__(m, '__fields_set__', _fields_set)
        try:
            m._init_private_attributes()
        except AttributeError:
            # Private model attributes were introduced in 1.7 and can be
            # ignored when not supported
            pass
        return m

Slide 23

Slide 23 text

Tooling: Doc Rendering

sphinx-doc.org
sphinx-contrib/confluencebuilder
sphinx-contrib/autodoc_pydantic
Gitlab CI

index.rst
    Full Module Documentation
    -------------------------

    For the loader class, see :class:`md_commons.envconfig.envconfig.EnvConfig`.

    .. toctree::
       :hidden:

       mdsuite.rst

    .. autosummary::
       :toctree: _autosummary
       :recursive:

       envconfig

conf.py
    extensions = [
        'sphinx.ext.autodoc',
        'sphinx.ext.autosummary',
        'sphinx.ext.intersphinx',
        'sphinxcontrib.autodoc_pydantic',
        'sphinxcontrib.confluencebuilder',
    ]

    add_module_names = False
    python_use_unqualified_type_names = True

    autosummary_generate = True

    autoclass_content = 'class'
    autodoc_class_signature = 'separated'
    autodoc_inherit_docstrings = False
    autodoc_default_options = {
        'members': True,
        'undoc-members': True,
        'private-members': True,
        'exclude-members': '_abc_impl',
    }
    autodoc_typehints_format = 'short'
    autodoc_preserve_defaults = True
    autodoc_type_aliases = {
        'Port': 'md_commons.envconfig.types.Port',
    }

    autodoc_pydantic_model_show_validator_members = False
    autodoc_pydantic_model_show_validator_summary = False
    autodoc_pydantic_model_show_json = False
    autodoc_pydantic_model_show_field_summary = False

    intersphinx_mapping = {
        'python': ('https://docs.python.org/3', None),
    }

.gitlab-ci.yml
    pages:
      stage: release
      image: ubuntu:20.04
      before_script:
        - apt-get update
        - apt-get install -y make python3 python3-venv
      script:
        - make envconfig_docs
        - cp -r envconfig_docs/_build/html public
      artifacts:
        paths:
          - public
      rules:
        - if: >-
            $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH
            && $CI_COMMIT_TAG

Slide 24

Slide 24 text

Tooling: Incremental Adoption

Use BaseModel.dict() to just get validation
Replace defaults code
Pass Model objects
Use TypeChecker

PYTHON (before)
    DEFAULT_SENDER_ENABLED = True
    DEFAULT_SENDER_CONFIG = {'host': None, 'port': 80}

    class Sender:
        def __init__(self, config: dict):
            self.port = config.get('port', DEFAULT_SENDER_CONFIG['port'])

    if __name__ == '__main__':
        with open('config.yml') as f:
            config = yaml.safe_load(f)
        if config.get('sender_enabled', DEFAULT_SENDER_ENABLED):
            sender = Sender(config.get('sender', DEFAULT_SENDER_CONFIG))

PYTHON (after)
    class SenderConfig(BaseModel):
        host: str
        port: int = 80

    class Config(BaseModel):
        sender_enabled: bool = True
        sender: SenderConfig

    class Sender:
        def __init__(self, config: dict):
            self.port = config.get('port', DEFAULT_SENDER_CONFIG['port'])

    if __name__ == '__main__':
        with open('config.yml') as f:
            config_dict = yaml.safe_load(f)
        config = Config(**config_dict)
        if config.sender_enabled:
            sender = Sender(config.sender.dict())

Passing config.sender without .dict() is flagged by the type checker:

    adopt.py:22: error: Argument 1 to "Sender" has incompatible type "SenderConfig"; expected "dict[Any, Any]"  [arg-type]
    Found 1 error in 1 file (checked 1 source file)
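The first adoption step, validating with a model but still handing plain dicts to existing code, can be tried in isolation. This sketch assumes `pydantic` is installed; it uses `.dict()` as in the talk (under Pydantic v2 the call still works but is deprecated in favour of `model_dump()`). The `Sender` class here is a simplified stand-in for legacy code.

```python
from pydantic import BaseModel


class SenderConfig(BaseModel):
    host: str
    port: int = 80


class Sender:
    """Legacy consumer that still expects a plain dict."""

    def __init__(self, config: dict):
        self.host = config['host']
        self.port = config.get('port', 80)


# Validation and type conversion happen once, at the edge of the program.
validated = SenderConfig(host='example.org', port='8080')

# The legacy class keeps its dict interface but now receives clean values.
sender = Sender(validated.dict())

print(sender.host, sender.port)
```

Once every consumer is fed this way, their signatures can be switched to accept the model object directly, at which point a type checker starts catching mismatches.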

Slide 25

Slide 25 text

Summary

Pydantic Settings and Transformations
Strict Spec for Config
Automation and Tooling

Slide 26

Slide 26 text

thanks, connect, src

www.mediaire.ai
mediaire.jobs.personio.de
github.com/phistep
docs.pydantic.dev
philippstephan.de/blog/posts/pycon24-talk/

Slide 27

Slide 27 text

Bonus Content

Slide 28

Slide 28 text

Problem: Migrations

Slide 29

Slide 29 text

Tooling: Migrations

Slide 30

Slide 30 text

Pydantic V2