Slide 1

Slide 1 text

Data Contracts in Practice: from definition to enforcement

Slide 2

Slide 2 text

Data contract
Data what?
A data contract is a set of formal agreements between a data producer and a data consumer about the data generated and shared for production-grade use cases.
A good data contract should be:
○ clearly identifiable by name and version number (addressable)
○ explicitly defined in a clear and unambiguous way, avoiding unnecessary information (expressive and scoped)
○ reasonably stable over time and achievable by all parties involved (stable and reliable)
○ machine-readable (computable)
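As a rough illustration of the "addressable" and "computable" properties, a contract can be modelled as a small immutable record that is identified by name plus version and serializes to JSON. This is only a sketch: the class, its fields, and the `name:version` address format are assumptions made for the example, not part of any specification mentioned in these slides.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)  # frozen: a published contract version should not change
class DataContract:
    name: str
    version: str  # assumed to follow MAJOR.MINOR.PATCH
    producer: str
    consumer: str
    agreements: Tuple[str, ...]

    @property
    def address(self) -> str:
        # Addressable: name and version together identify the contract.
        return f"{self.name}:{self.version}"

    def to_json(self) -> str:
        # Computable: the whole contract is available in a machine-readable form.
        return json.dumps(asdict(self), sort_keys=True)


contract = DataContract(
    name="orders",
    version="1.2.0",
    producer="sales-team",
    consumer="analytics-team",
    agreements=("order_id is a non-null string",),
)
print(contract.address)
print(contract.to_json())
```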

Slide 3

Slide 3 text

Data contract’s anatomy
What’s inside?
The agreements that compose a data contract can be grouped into three main categories:
○ API: agreements on how to access and consume the exposed data. Different standards exist to describe an API (e.g., OpenAPI for REST services, AsyncAPI for streaming services, and DatastoreAPI for connection-based data services).
○ Constraints: agreements on how data is generated, exposed, and consumed.
○ Semantics: agreements on the meaning of the exposed data.
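The three categories can be pictured as three sections of one contract document. The sketch below is purely illustrative: the section keys, the sample values, and the `missing_sections` helper are hypothetical names chosen for the example.

```python
from typing import List

# Hypothetical section names mirroring the three categories above.
REQUIRED_SECTIONS = ("api", "constraints", "semantics")

contract = {
    # API: how to access and consume the data.
    "api": {"standard": "OpenAPI", "definition": "orders-api.yaml"},
    # Constraints: how data is generated, exposed, and consumed.
    "constraints": {"freshness_hours": 24, "max_null_ratio": 0.01},
    # Semantics: what the exposed data means.
    "semantics": {"order_total": "Gross order amount in EUR, taxes included"},
}


def missing_sections(doc: dict) -> List[str]:
    """Return the categories that are absent or empty in a contract document."""
    return [section for section in REQUIRED_SECTIONS if not doc.get(section)]


print(missing_sections(contract))
print(missing_sections({"api": {"standard": "AsyncAPI"}}))
```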

Slide 4

Slide 4 text

Data contracts
Why bother? Why now?
Data management is moving toward more modular and decentralized architectures. Data contracts are a key pillar of this paradigm shift.
[Diagram labels: monolithic data platform, modularized data platform, decentralized data platform; Producers, Consumers, Integrations; paradigm shift from the application-centric paradigm to the data-centric paradigm; Data Contract with "expose" and "ingest" arrows]

Slide 5

Slide 5 text

Data contracts & data product
Are they the same thing? No, but…
A data product is the smallest unit that can be independently deployed and managed in a data architecture (i.e., an architectural quantum).
The services exposed by a data product are described using service agreements. The service agreements associated with the services that expose data constitute a data contract.
Exposing data contracts is necessary for a data platform module to be considered a data product, but it is not sufficient.
[Diagram labels: Data Product, made of Data & Metadata, Code, and Infrastructure, with Input Ports, Output Ports, Control Ports, Discovery Ports, and Observability Ports]

Slide 6

Slide 6 text

Adopting data contracts
A socio-technical challenge
Data contracts have real impact when they are defined between producers and consumers owned by different teams, so adopting them is a socio-technical challenge.

Slide 7

Slide 7 text

Adopting data contracts
The tech side
○ Define: through a formal and machine-readable specification (Data Product Descriptor Specification)
○ Enforce: through a platform capable of managing the contract lifecycle (Open Data Mesh Platform)
[Diagram labels: publish, evolve]

Slide 8

Slide 8 text

Data contract definitions
Data product descriptor specification
The Data Product Descriptor Specification (DPDS) is an open specification that declaratively defines a data product in all its components using a JSON or YAML descriptor document.
More info here: dpds.opendatamesh.org

Slide 9

Slide 9 text

Data contract definitions
Data product descriptor specification
The DPDS divides the structure of a data product descriptor document into three main parts:
1. general info
2. interface components
3. internal components
Note: The content of the general info and interface components parts is shared with other data products and with the platform to enable product discoverability and self-service usage. Internal components, instead, are accessible only to the product team and to the platform.
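The visibility rule in the note can be sketched as a simple projection over the descriptor. The key names below (`info`, `interfaceComponents`, `internalComponents`) and all sample values are assumptions for illustration; check the DPDS documentation for the actual field names.

```python
# Hypothetical descriptor with the three main parts (key names are assumptions).
descriptor = {
    "info": {"name": "orders", "version": "1.0.0", "domain": "sales"},
    "interfaceComponents": {"outputPorts": [{"name": "orders-api"}]},
    "internalComponents": {"applicationComponents": [{"name": "orders-etl"}]},
}

# Parts shared with other data products; internal components stay private.
PUBLIC_PARTS = ("info", "interfaceComponents")


def public_view(doc: dict) -> dict:
    """Return only the parts of the descriptor visible to other data products."""
    return {key: doc[key] for key in PUBLIC_PARTS if key in doc}


print(sorted(public_view(descriptor)))
```

The product team and the platform would still work with the full `descriptor`; only other data products would be limited to `public_view(descriptor)`.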

Slide 10

Slide 10 text

Data Product Descriptor
Data Product Public Interface Definitions
DPDS uses the following concepts from promise theory to formally describe the interface components of a data product:
○ Promises: through promises, the data product declares the port's intent. Examples of promises are descriptions of the services’ API, SLO, deprecation policy, etc.
○ Expectations: through expectations, the data product declares how it wants the port to be used by its consumers. Examples of expectations are intended usage, intended audience, etc.
○ Contracts: through contracts, the data product declares the promises and expectations that must be respected by itself and by its consumers, respectively. Examples of contracts are terms and conditions, SLA, billing policy, etc.
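To make the three blocks concrete, here is a sketch of a single port carrying promises, expectations, and contracts. The port, the item names, and every value are invented for the example and should not be read as the normative DPDS layout.

```python
# Hypothetical output port; key names and values are illustrative assumptions.
output_port = {
    "name": "orders-api",
    "promises": {  # what the data product declares about the port
        "api": "OpenAPI definition of the orders service",
        "slo": "p95 latency under 500 ms",
        "deprecationPolicy": "previous major version supported for 6 months",
    },
    "expectations": {  # how the product wants consumers to use the port
        "audience": "internal analytics teams",
        "usage": "batch reads, at most one full export per day",
    },
    "contracts": {  # promises and expectations that both sides must respect
        "termsAndConditions": "internal use only",
        "sla": "99.5% monthly availability",
        "billingPolicy": "free for internal consumers",
    },
}

BLOCKS = ("promises", "expectations", "contracts")


def declared_items(port: dict) -> dict:
    """Map each block to the sorted names of the items the port declares in it."""
    return {block: sorted(port.get(block, {})) for block in BLOCKS}


print(declared_items(output_port))
```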

Slide 11

Slide 11 text

Data Product Descriptor
Data Product Public Interface API
The promises block describes the port API. APIs are described using external standards according to the type of service (e.g., OpenAPI, AsyncAPI, etc.).
The API definition generally contains:
1. data schema
2. endpoints
3. access modalities
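For a REST port described with OpenAPI, the three ingredients can be read straight from the definition document. The sketch below uses a made-up, minimal OpenAPI 3 document. Treating `components.securitySchemes` as the "access modalities" is an assumption of this example, not something stated on the slide.

```python
# Minimal, invented OpenAPI 3 document used only for illustration.
openapi_doc = {
    "openapi": "3.0.3",
    "info": {"title": "Orders API", "version": "1.0.0"},
    "paths": {
        "/orders": {"get": {"responses": {"200": {"description": "List of orders"}}}},
        "/orders/{id}": {"get": {"responses": {"200": {"description": "One order"}}}},
    },
    "components": {
        "schemas": {
            "Order": {
                "type": "object",
                "properties": {"id": {"type": "string"}, "amount": {"type": "number"}},
            }
        },
        "securitySchemes": {
            "apiKey": {"type": "apiKey", "in": "header", "name": "X-API-Key"}
        },
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def summarize_api(doc: dict) -> dict:
    """Extract schema names, endpoints, and security schemes from an OpenAPI dict."""
    components = doc.get("components", {})
    endpoints = sorted(
        f"{method.upper()} {path}"
        for path, item in doc.get("paths", {}).items()
        for method in item
        if method in HTTP_METHODS  # skip non-operation keys such as "parameters"
    )
    return {
        "data_schema": sorted(components.get("schemas", {})),
        "endpoints": endpoints,
        "access_modalities": sorted(components.get("securitySchemes", {})),
    }


print(summarize_api(openapi_doc))
```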

Slide 12

Slide 12 text

Data contract enforcing
Open data mesh platform
Open Data Mesh Platform is an open-source platform that manages the full lifecycle of a data product, from deployment to retirement.
More info here: platform.opendatamesh

Slide 13

Slide 13 text

Data contract enforcing (validation)
Open data mesh platform
[Diagram, reconstructed from its labels]
Git Repository: descriptor files on the Dev branch are merged into the Main branch; a Git hook or action calls the /register endpoint.
ODM Platform Registry Module:
○ References resolution
○ Syntax validation
○ Compatibility validation
○ Compliance validation
○ Syntax normalization
○ Metadata publication
○ Contract registration
Services (Standard Interface): Policy Service, Notification Service
Pluggable Implementation: Open Policy Agent, Cue Lang, …; Blindata, Collibra, …
Layers: Data Product Experience Plane, Infra Utility Plane, Customer Infra
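The validation stage can be thought of as a chain of checks that a descriptor must pass before it is registered. The sketch below is not the ODM Platform implementation: it shows only two of the steps (syntax and compatibility validation), and the function names, the top-level `fields` key, and the rule "removing a field requires a major version bump" are all assumptions made for illustration.

```python
from typing import Callable, List

Check = Callable[[dict], List[str]]  # a check returns a list of error messages


def syntax_validation(doc: dict) -> List[str]:
    """Hypothetical syntax check: the public parts of the descriptor must exist."""
    return [
        f"missing required part: {key}"
        for key in ("info", "interfaceComponents")
        if key not in doc
    ]


def make_compatibility_validation(previous_fields: List[str], previous_major: int) -> Check:
    """Build a check that rejects field removals without a major version bump."""

    def check(doc: dict) -> List[str]:
        version = str(doc.get("info", {}).get("version", "0"))
        major = int(version.split(".")[0])  # assumes MAJOR.MINOR.PATCH
        removed = set(previous_fields) - set(doc.get("fields", []))
        if removed and major <= previous_major:
            return [f"fields removed without a major version bump: {sorted(removed)}"]
        return []

    return check


def validate(doc: dict, checks: List[Check]) -> List[str]:
    """Run every check in order and collect all errors."""
    errors: List[str] = []
    for check in checks:
        errors.extend(check(doc))
    return errors


checks = [syntax_validation, make_compatibility_validation(["id", "amount"], 1)]
candidate = {"info": {"version": "1.1.0"}, "interfaceComponents": {}, "fields": ["id"]}
for error in validate(candidate, checks):
    print(error)
```

A Git hook or CI action could run such a chain on merge and call the registration endpoint only when the error list is empty.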

Slide 14

Slide 14 text

Data contract enforcing (monitoring)
Open data mesh platform
[Diagram, reconstructed from its labels]
Runtime environment: Data Product Container with Output port and Observability port
ODM Platform Observability Module:
○ Schema checks
○ Quality checks
○ SLO & SLA checks
○ Trust Score Calculation
○ Event management
○ Metadata publication
Services (Standard Interface): Quality Service, Notification Service
Pluggable Implementation: Great Expectations, SODA, …; Blindata, Collibra, …
Layers: Data Product Experience Plane, Infra Utility Plane, Customer Infra
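At runtime, the monitoring checks could be sketched as small predicates over the data exposed by an output port, combined into a single score. The slide does not define how the trust score is calculated, so the "fraction of passed checks" formula below is an assumption, as are the schema, the sample rows, and the thresholds. In a real setup these checks would typically be delegated to a tool such as Great Expectations or SODA.

```python
from typing import Dict, List, Optional

# Hypothetical expected schema for the rows served by an output port.
EXPECTED_SCHEMA = {"order_id": str, "amount": float}


def schema_check(rows: List[dict]) -> bool:
    """Every row has exactly the expected columns, with the expected types or None."""
    return all(
        set(row) == set(EXPECTED_SCHEMA)
        and all(
            row[col] is None or isinstance(row[col], typ)
            for col, typ in EXPECTED_SCHEMA.items()
        )
        for row in rows
    )


def null_ratio_check(rows: List[dict], column: str, max_ratio: float) -> bool:
    """Quality check: the share of nulls in a column stays under a threshold."""
    if not rows:
        return True
    nulls = sum(1 for row in rows if row.get(column) is None)
    return nulls / len(rows) <= max_ratio


def trust_score(results: Dict[str, bool]) -> Optional[float]:
    """Assumed formula: fraction of passed checks; None when nothing was checked."""
    if not results:
        return None
    return sum(results.values()) / len(results)


rows = [
    {"order_id": "a1", "amount": 10.0},
    {"order_id": "a2", "amount": None},
]
results = {
    "schema": schema_check(rows),
    "amount_nulls": null_ratio_check(rows, "amount", max_ratio=0.1),
}
print(results, trust_score(results))
```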

Slide 15

Slide 15 text

Adopting data contracts
The organizational side
Producers must be accountable for the data they expose to their consumers, not the other way around. Operational producers are NOT an exception.
[Diagram labels: Data Contract with "expose" and "ingest" arrows; Anti-corruption layer; Data Contract Registry]

Slide 16

Slide 16 text

Data contracts
Adoption journey
○ Automate, automate, automate
○ Expand the scope with a value-driven approach
○ Start small, demonstrate the value
○ Build a business case to sustain the scale-out of the new operating model
○ Align operational team incentives with the new operating model
○ Measure, demonstrate the value, iterate
○ Cross the chasm

Slide 17

Slide 17 text

Via A. Mauri, 22 20900 Monza (MB) www.quantyca.it