Robust Configuration Management with Pydantic's Data Validation

We describe how we moved our configuration management system from a simple, unstructured YAML format loaded into dictionaries to a fully formalized, typed, class-based system using [`Pydantic`'s][pydantic] data validation.

Our early configuration system was simple enough to begin with, but its lack of a tight specification caused problems, which we discuss: missing ahead-of-time validation and the resulting runtime errors; code and browsable user documentation drifting out of sync; incompatible defaults and subtle differences between the separate parsers scattered across many microservices; and duplicated, brittle fallback logic. A strict specification mitigates these issues by enabling static validation of configuration files, automatic documentation generation, centralized defaults, and flexible data transformation.
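
To make these problems concrete, here is a minimal sketch in plain Python. It is not code from our system: the service names, constants, and values are hypothetical. It shows two parsers that each carry their own fallback for the same key, and an unvalidated value that only fails at the point of use.

```python
# Hypothetical configuration as it might come out of yaml.safe_load():
# the receiver section is empty and the sender port is not a number.
config = {
    "sender": {"host": "example.org", "port": "eighty"},
    "receiver": {},
}

# Two hypothetical microservices, each with its own fallback constant.
SERVICE_A_DEFAULT_RECEIVER_PORT = 42
SERVICE_B_DEFAULT_RECEIVER_PORT = 1337


def receiver_port_service_a(cfg: dict) -> int:
    """Fallback logic as duplicated in service A."""
    return cfg.get("receiver", {}).get("port", SERVICE_A_DEFAULT_RECEIVER_PORT)


def receiver_port_service_b(cfg: dict) -> int:
    """The same lookup in service B, with a different default."""
    return cfg.get("receiver", {}).get("port", SERVICE_B_DEFAULT_RECEIVER_PORT)


def sender_port(cfg: dict) -> int:
    """Nothing checks the value at load time; int() fails only when called."""
    return int(cfg.get("sender", {}).get("port", 80))


# Both services read the same file but disagree on the effective port.
ports_disagree = receiver_port_service_a(config) != receiver_port_service_b(config)

# The invalid sender port is only discovered when the value is used.
try:
    sender_port(config)
    late_error = None
except ValueError as exc:
    late_error = exc
```

A single typed specification removes both failure modes: there is one place for the default, and the bad port is rejected when the file is loaded.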

After discussing various available configuration management systems, we explain
the motivation to hand-roll a simple system based on the data validation
library [`Pydantic`][pydantic]. Popularized by its usage in [`FastAPI`][fastapi], it has become the de facto standard for data validation in Python. Its deep integration into Python's type annotation system makes it a powerful tool for configuration management.
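
As a rough illustration of how type annotations drive validation, here is a minimal sketch built on `pydantic.BaseModel`. The model names and fields are illustrative rather than our production schema, and the sketch sticks to features that behave the same in Pydantic v1 and v2.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class SenderConfig(BaseModel):
    host: str        # required, no default
    port: int = 80   # optional, defaults to 80


class ReceiverConfig(BaseModel):
    port: int


class Config(BaseModel):
    sender_enabled: bool = True
    sender: SenderConfig
    receiver_enabled: bool = False
    receiver: Optional[ReceiverConfig] = None
    license_key: str


# Illustrative input, e.g. the result of parsing a YAML file.
external_data = {
    "sender": {"host": "example.org"},
    "receiver_enabled": True,
    "receiver": {"port": 1337},
    "license_key": "fsf89sdf98s9dff0ssdf09fs",
}

# Nested dicts are converted into nested model instances.
config = Config(**external_data)

# Invalid input is rejected when the model is constructed, not later.
try:
    Config(sender={"host": "example.org", "port": "eighty"}, license_key="x")
    error = None
except ValidationError as exc:
    error = exc
```

The defaults live in the class definitions, so `config.sender.port` is available without any `dict.get()` fallback at the call site.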

After an introduction to [`Pydantic`][pydantic]'s capabilities and usage, specifically its features tailored to configuration management ([`pydantic.BaseSettings`][basesettings]), we share some tips and tricks we picked up while speccing out our configuration file format. Additionally, we share some inspiration from our internal tooling for loading and validating configuration, rendering up-to-date browsable user documentation, and integrating with CI systems, as well as lessons learned from an incremental transition from the loose `dict`-based system to the strictly typed, class-based system powered by [`Pydantic`][pydantic].

[pydantic]: https://pydantic.dev/
[fastapi]: https://fastapi.tiangolo.com/
[basesettings]: https://docs.pydantic.dev/latest/api/pydantic_settings/

Philipp Stephan

April 23, 2024

Transcript

  1. Philipp Stephan, mediaire, PyCon Berlin 2024

    Robust Configuration Management with Pydantic's Data Validation
  2. (slide without text)

  3. Configuration

    PYTHON

        DEFAULT_PORT = 80

    YAML

        sender:
          port: 80

    --header "AcceptLanguage: de"
  4. Simple dict

    PYTHON

        config = {
            'sender_enabled': True,
            'sender': {'host': 'example.org', 'port': 80},
            'receiver_enabled': True,
            'receiver': {'port': 1337},
            'license_key': 'fsf89sdf98s9dff0ssdf09fs'
        }
  5. Simple dict

    PYTHON

        config = {
            'sender_enabled': True,
            'sender': {'host': 'example.org', 'port': 80},
            'receiver_enabled': True,
            'receiver': {'port': 1337},
            'license_key': 'fsf89sdf98s9dff0ssdf09fs'
        }

    JSON

        {
          "sender_enabled": true,
          "sender": {
            "host": "example.org",
            "port": 80
          },
          "receiver_enabled": true,
          "receiver": {
            "port": 1337
          },
          "license_key": "ff89df98s9dff0df09f"
        }

    YAML

        sender_enabled: yes
        sender:
          host: example.org
          port: 80
        receiver_enabled: yes
        receiver:
          port: 1337
        license_key: "ff89df98s9dff0df09f"

    TOML

        sender_enabled = true
        receiver_enabled = true
        license_key = "ff89df98s9dff0df09f"

        [sender]
        host = "example.org"
        port = 80

        [receiver]
        port = 1337
  6. Simple dict

    PYTHON

        config = { ... }  # same dict as on slide 4
  7. dict Loading

    PYTHON

        config = { ... }  # same dict as on slide 4

    PYTHON

        dict.get(key, default=None)

        config.get('sender_enabled', False)
  8. dict Loading

    PYTHON

        config = { ... }  # same dict as on slide 4

    PYTHON

        config.get('sender_enabled', False)
  9. Problem: Defaults

    PYTHON

        config = { ... }  # same dict as on slide 4

    PYTHON

        DEFAULT_SENDER_ENABLED = False

        # the literal False is replaced by the named constant
        config.get('sender_enabled', DEFAULT_SENDER_ENABLED)
  10. Problem: Defaults

    PYTHON

        config = { ... }  # same dict as on slide 4

    PYTHON

        DEFAULT_SENDER_ENABLED = False
        DEFAULT_SENDER_PORT = 80

        config.get('sender_enabled', DEFAULT_SENDER_ENABLED)

        if config.get('sender_enabled', DEFAULT_SENDER_ENABLED):
            sender_port = config.get('sender', {}).get('port', DEFAULT_SENDER_PORT)
            logging.info(f"sender enabled on port {sender_port}")
  11. Problem: Fallbacks

    PYTHON

        config = { ... }  # same dict as on slide 4

    PYTHON

        DEFAULT_SENDER_ENABLED = False
        DEFAULT_SENDER_PORT = 80

        config.get('sender_enabled', DEFAULT_SENDER_ENABLED)

        if config.get('sender_enabled', DEFAULT_SENDER_ENABLED):
            sender_port = config.get('sender', {}).get('port', DEFAULT_SENDER_PORT)
            logging.info(f"sender enabled on port {sender_port}")

        if config.get('receiver_enabled', DEFAULT_RECEIVER_ENABLED):
            # the receiver port falls back to the sender port, then to the default
            receiver_port = (config.get('receiver', {})
                             .get('port', (config.get('sender', {})
                                           .get('port', DEFAULT_RECEIVER_PORT))))
  12. Problem: Validation

    PYTHON

        DEFAULT_RECEIVER_ENABLED = False
        DEFAULT_RECEIVER_PORT = 42

        if config.get('receiver_enabled', DEFAULT_RECEIVER_ENABLED):
            receiver_port = config.get('receiver', {}).get('port', DEFAULT_RECEIVER_PORT)
            if isinstance(receiver_port, str):
                receiver_port = int(receiver_port)
            elif not isinstance(receiver_port, int):
                raise ValueError("receiver_port not configured")
            logging.info(f"receiver listening on port {receiver_port}")

        with TCPServer(("", PORT), SimpleHTTPRequestHandler) as httpd:
            httpd.serve_forever()

    Output

        Traceback (most recent call last):
          File "/Users/phistep/Projects/mediaire/pycon-2024/server.py", line 8, in <module>
            with socketserver.TCPServer(("", PORT), Handler) as httpd:
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          File "/opt/homebrew/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/socketserver.py", line 456, in __init__
            self.server_bind()
          File "/opt/homebrew/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/socketserver.py", line 472, in server_bind
            self.socket.bind(self.server_address)
        OverflowError: bind(): port must be 0-65535.
  13. Problem: Documentation

    Default Constants; Input Transformations; Keys During Loading; Signature kwargs; Docstrings; README; Defaults File; Rendered Docs / Wiki

    YAML

        sender:
          port: 80

    PYTHON

        DEFAULT_SENDER_ENABLED = True

        if __name__ == '__main__':
            if config.get('sender_enabled', DEFAULT_SENDER_ENABLED):
                sender_port = (config.get('sender', {})
                               .get('port', DEFAULT_SENDER_PORT))
                logging.info(f"sender enabled on port {sender_port}")

        class Sender:
            def __init__(self, port: int = DEFAULT_SENDER_PORT):
                """
                :param port: The Port to send the data to.
                """
  14. Options

    Fully Fledged Config Manager: facebookresearch/hydra, ElektraInitiative/libelektra, dynaconf/dynaconf, beetbox/confuse, dnephin/PyStaticConfiguration

    Data Validation: python-jsonschema/jsonschema, keleshev/schema, lidatong/dataclasses-json, marshmallow-code/marshmallow, pyeve/cerberus
  15. Pydantic (*v1)

    PYTHON

        class SenderConfig(BaseModel):
            host: str
            port: int = 80

    Output

        1 validation error for SenderConfig
        host
          must not be empty (type=value_error)
  16. Pydantic Models

    PYTHON

        external_data = {
            'sender': {'host': 'example.org'},
            'receiver_enabled': True,
            'receiver': {'port': 1337},
            'license_key': 'fsf89sdf98s9dff0ssdf09fs'
        }

    PYTHON

        from typing import Optional

        from pydantic import BaseModel

        class SenderConfig(BaseModel):
            host: str
            port: int = 80

        class ReceiverConfig(BaseModel):
            port: int

        class Config(BaseModel):
            sender_enabled: bool = True
            sender: SenderConfig
            receiver_enabled: bool = False
            receiver: Optional[ReceiverConfig] = None
            license_key: str

        config = Config(**external_data)

    REPL

        Config(sender_enabled=True,
               sender=SenderConfig(host="example.org", port=80),
               receiver_enabled=True,
               receiver=ReceiverConfig(port=1337),
               license_key="fsf89sdf98s9dff0ssdf09fs")
  17. Constraints

    PYTHON

        Orientation = Literal['sagittal', 'axial', 'coronal', 'original']

        #: DICOM header tag name. See `DICOM Ch6 Registry of DICOM Data Elements
        #: <https://dicom.nema.org/medical/dicom/current/output/chtml/part06/
        #: chapter_6.html>`_.
        DicomTag = str

        #: Maps DICOM tag to value filter
        Filter = Dict[DicomTag, str]

        #: Single or multiple :data:`Filter` s
        FilterList = Union[Filter, List[Filter]]

        #: Report language
        Language = Literal['de', 'en', 'it']

    PYTHON

        #: Time of the day, in the format ``HH:MM``.
        TimeOfDay = constr(regex=r'\d\d:\d\d')

        #: Port number
        Port = conint(ge=1, le=65535)
  18. BaseSettings

    1. Initializer
    2. Environment
    3. .env
    4. Secrets Directory
    5. Default Values

    PYTHON

        class SenderConfig(BaseModel):
            host: str
            port: int = 80

        class MyConfig(BaseSettings):
            class Config:
                env_nested_delimiter = '__'

            sender_enabled: bool = True
            sender: SenderConfig
            license_key: SecretStr

    SHELL

        export SENDER_ENABLED=true
        export SENDER__PORT=8080

    REPL

        >>> config = MyConfig(
        ...     sender=SenderConfig(host="localhost"),
        ...     license_key="supersecret"
        ... )
        >>> print(config)
        host="example.org" port=80 key=SecretStr("*********")
        >>> print(config.license_key)
        *********
        >>> print(config.license_key.get_secret_value())
        supersecret
  19. Transformations

    Fallbacks

    PYTHON

        @validator('port', always=True)
        def fallback_sender_port(cls, v, values):
            if not v and 'sender' in values:
                return values['sender'].port
            return v

    Dynamic Values

    PYTHON

        #: Number of parallel processes.
        #: Default: number of CPU cores in system
        n_proc: Optional[PositiveInt]

        @validator('n_proc', always=True)
        def n_proc_cpu_count(cls, v, values):
            return v if v is not None else cpu_count()

    Deprecation Warnings

    PYTHON

        #: Deprecated
        build_ppc: Optional[bool] = None

        @validator('build_ppc', always=True)
        def deprecated_build_ppc(cls, v, values):
            if v is not None:
                warnings.warn(
                    "Architecture PPC is deprecated",
                    DeprecationWarning
                )
            return v
  20. Tooling: Deserialization

    PYTHON

        class EnvConfig(MdsuiteConfig):
            @classmethod
            def load(cls, log: bool = False) -> 'EnvConfig':
                envconfig_mdsuite = read_yaml(
                    MdsuiteConstants._get_envconfig_path('', config_dir=path)
                )
                obj = {**envconfig_mdsuite}
                for product in cls._PRODUCT_MODELS:
                    try:
                        if product == MdsuiteConstants.MDSUITE:
                            continue
                        envconfig_path = \
                            MdsuiteConstants._get_envconfig_path(product)
                        obj[product] = read_yaml(envconfig_path)
                    except FileNotFoundError:
                        logger.warning(
                            f"'{product}' config not found in '{envconfig_path}'")
                config = cls.parse_obj(obj)
                if log:
                    logger.info(pformat(config.dict()))
                return config
  21. Tooling: Default Dumping

    Full Default Config; Easier Initial Setup; Setting Required to None and Omitting

    PYTHON

        @classmethod
        def dump(cls, out_dir: Optional[str] = None):
            def _get_default_path(path: str) -> str:
                root, ext = os.path.splitext(path)
                return root + '.default' + ext

            for product, Model in cls._PRODUCT_MODELS.items():
                path = MdsuiteConstants._get_envconfig_path(product, out_dir)
                default_path = _get_default_path(path)
                os.makedirs(os.path.dirname(default_path), exist_ok=True)
                model_args = {}
                if product == MdsuiteConstants.MDSUITE:
                    # required fields get placeholders so the model can be built
                    model_args = dict(client_api=ClientApiConfig(
                        client_name='CLIENT_NAME', api_token='API_TOKEN'
                    ))
                default_config = Model(**model_args).dict(exclude_none=True)
                if product == MdsuiteConstants.MDSUITE:
                    default_config['client_api']['client_name'] = None
                    default_config['client_api']['api_token'] = None
                with open(default_path, 'w', encoding='utf-8') as f:
                    yaml.dump(default_config, f)
                logger.info(
                    f"'{product}' default config written to '{default_path}'")
  22. Tooling: Permissive Loading

    PYTHON

        if validate:
            config = cls.parse_obj(obj)
        else:
            logger.warning('EnvConfig validation is not enforced')
            # use validate_model to expand the input transformations and
            # log the errors without raising
            expanded, _, errors = validate_model(cls, obj)
            logger.warning(str(errors))
            # construct the object with the expanded values,
            # invalid ones will be replaced with default choices
            config = cls.construct_relaxed(expanded)

    PYTHON

        @classmethod
        def construct_relaxed(cls, values: dict,
                              _fields_set: Optional[Set[str]] = None,
                              model: Optional[BaseModel] = None) -> MdsuiteConfig:
            """Support nested models without validation.

            Pydantic's native `construct` method does not unpack nested models,
            but we need this for the transition period.
            """
            if model is None:
                model = cls
            m = model.__new__(model)
            fields_values = {}
            for name, field in m.__fields__.items():
                key = field.alias
                # this check is necessary or Optional fields will crash
                if key in values:
                    try:
                        if not issubclass(field.type_, BaseModel):
                            raise AttributeError
                        if field.type_.__custom_root_type__:
                            fields_values[name] = field.outer_type_(
                                __root__=values[key])
                        elif field.shape == 2:
                            fields_values[name] = \
                                [cls.construct_relaxed(e, model=field.type_)
                                 for e in values[key]]
                        else:
                            fields_values[name] = cls.construct_relaxed(
                                values[key], model=field.outer_type_
                            )
                    except AttributeError:
                        if values[key] is None and not field.required:
                            fields_values[name] = field.default
                        else:
                            fields_values[name] = values[key]
                    except Exception as e:
                        logger.exception(e)
                elif not field.required:
                    try:
                        fields_values[name] = field.default
                    except AttributeError:
                        # .get_default was introduced in 1.7
                        fields_values[name] = m.__field_defaults__[name]
            # add extra fields (not defined in model)
            for k, v in values.items():
                if k not in fields_values:
                    fields_values[k] = v
            object.__setattr__(m, '__dict__', fields_values)
            if _fields_set is None:
                _fields_set = set(values.keys())
            object.__setattr__(m, '__fields_set__', _fields_set)
            try:
                m._init_private_attributes()
            except AttributeError:
                # Private model attributes were introduced in 1.7 and can be
                # ignored when not supported
                pass
            return m
  23. Tooling: Doc Rendering

    sphinx-doc.org; sphinx-contrib/confluencebuilder; sphinx-contrib/autodoc_pydantic; Gitlab CI

    index.rst

        Full Module Documentation
        -------------------------

        For the loader class, see :class:`md_commons.envconfig.envconfig.EnvConfig`.

        .. toctree::
           :hidden:

           mdsuite.rst

        .. autosummary::
           :toctree: _autosummary
           :recursive:

           envconfig

    conf.py

        extensions = [
            'sphinx.ext.autodoc',
            'sphinx.ext.autosummary',
            'sphinx.ext.intersphinx',
            'sphinxcontrib.autodoc_pydantic',
            'sphinxcontrib.confluencebuilder',
        ]
        add_module_names = False
        python_use_unqualified_type_names = True
        autosummary_generate = True
        autoclass_content = 'class'
        autodoc_class_signature = 'separated'
        autodoc_inherit_docstrings = False
        autodoc_default_options = {
            'members': True,
            'undoc-members': True,
            'private-members': True,
            'exclude-members': '_abc_impl',
        }
        autodoc_typehints_format = 'short'
        autodoc_preserve_defaults = True
        autodoc_type_aliases = {
            'Port': 'md_commons.envconfig.types.Port',
        }
        autodoc_pydantic_model_show_validator_members = False
        autodoc_pydantic_model_show_validator_summary = False
        autodoc_pydantic_model_show_json = False
        autodoc_pydantic_model_show_field_summary = False
        intersphinx_mapping = {
            'python': ('https://docs.python.org/3', None),
        }

    .gitlab-ci.yml

        pages:
          stage: release
          image: ubuntu:20.04
          before_script:
            - apt-get update
            - apt-get install -y make python3 python3-venv
          script:
            - make envconfig_docs
            - cp -r envconfig_docs/_build/html public
          artifacts:
            paths:
              - public
          rules:
            - if: >-
                $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH
                && $CI_COMMIT_TAG
  24. Tooling: Incremental Adoption

    Use BaseModel.dict() to just get validation; Replace defaults code; Pass Model objects; Use TypeChecker

    PYTHON (dict-based starting point)

        DEFAULT_SENDER_ENABLED = True
        DEFAULT_SENDER_CONFIG = {'host': None, 'port': 80}

        class Sender:
            def __init__(self, config: dict):
                self.port = config.get('port', DEFAULT_SENDER_CONFIG['port'])

        if __name__ == '__main__':
            with open('config.yml') as f:
                config = yaml.safe_load(f)
            if config.get('sender_enabled', DEFAULT_SENDER_ENABLED):
                sender = Sender(config.get('sender', DEFAULT_SENDER_CONFIG))

    PYTHON (validated through Pydantic models)

        class SenderConfig(BaseModel):
            host: str
            port: int = 80

        class Config(BaseModel):
            sender_enabled: bool = True

        if __name__ == '__main__':
            with open('config.yml') as f:
                config_dict = yaml.safe_load(f)
            config = Config(**config_dict)
            if config.sender_enabled:
                sender = Sender(config.sender.dict())

    Type checker output once the model object is passed without .dict()

        adopt.py:22: error: Argument 1 to "Sender" has incompatible type "SenderConfig"; expected "dict[Any, Any]"  [arg-type]
        Found 1 error in 1 file (checked 1 source file)
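
The adoption steps on the last slide can be sketched end to end as follows. This is a minimal illustration under assumptions, not our actual code: `LegacySender` and the inline `config_dict` are hypothetical stand-ins, and `.dict()` is the Pydantic v1 method name used in the slides (Pydantic v2 calls it `.model_dump()` and keeps `.dict()` as a deprecated alias).

```python
from pydantic import BaseModel


class SenderConfig(BaseModel):
    host: str
    port: int = 80


class Config(BaseModel):
    sender_enabled: bool = True
    sender: SenderConfig


class LegacySender:
    """Hypothetical component that still expects a plain dict."""

    def __init__(self, config: dict):
        # No fallback constants needed: the model already filled in defaults.
        self.host = config["host"]
        self.port = config["port"]


class Sender:
    """The same component after migration: it takes the typed model."""

    def __init__(self, config: SenderConfig):
        self.host = config.host
        self.port = config.port


# Stand-in for the dict returned by yaml.safe_load().
config_dict = {"sender": {"host": "example.org"}}
config = Config(**config_dict)

# Step 1: validate with the model, but hand a plain dict to legacy code.
legacy_sender = LegacySender(config.sender.dict())

# Step 2: pass the model object itself so a type checker can verify the call.
sender = Sender(config.sender)
```

Step 1 gives validation and centralized defaults without touching downstream signatures; step 2 can then be rolled out one component at a time, with the type checker flagging call sites that still pass or expect a `dict`.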