Data Contract in Practice: from definition to enforcing

In today’s decentralized data world, data contracts are essential for ensuring predictability and preventing uncontrolled entropy. In this talk at the Data Innovation Summit 2023, I explore what data contracts are, why they matter, and how to structure and enforce them effectively. You’ll walk away with a deeper understanding of data contracts and the tools needed to manage them successfully within organizations.

📽️ The recording of the talk is available here: https://www.youtube.com/watch?v=CKqSNn-7wiw

Andrea Gioia

May 25, 2023

Transcript

  1. Data contract: Data what?

    A data contract is a set of formal agreements between a data producer and a data consumer about the data generated and shared for production-grade use cases. A good data contract should be:
    ◦ clearly identifiable by name and version number (addressable)
    ◦ explicitly defined in a clear and unambiguous way, avoiding unnecessary information (expressive and scoped)
    ◦ reasonably stable over time and achievable by all parties involved (stable and reliable)
    ◦ machine-readable (computable)
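
As a minimal sketch of the "addressable" and "computable" properties listed for slide 1, a contract can be modelled as a small structure that carries a name and version and serialises to JSON. The class name `DataContract`, its fields, and the `orders` example are illustrative assumptions; they do not come from the talk or from any specification.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class DataContract:
    """Hypothetical, minimal contract model (all names are illustrative)."""
    name: str
    version: str  # e.g. a semantic version string
    producer: str
    agreements: dict = field(default_factory=dict)

    @property
    def contract_id(self) -> str:
        # Addressable: name plus version identifies exactly one contract.
        return f"{self.name}:{self.version}"

    def to_json(self) -> str:
        # Computable: the contract can be serialised for tools to read.
        return json.dumps(asdict(self), sort_keys=True)


contract = DataContract(
    name="orders",
    version="1.0.0",
    producer="sales-team",
    agreements={"schema": {"order_id": "string", "amount": "number"}},
)
print(contract.contract_id)
print(contract.to_json())
```

Making the structure immutable (`frozen=True`) is one way to reflect the "stable" property: a change means publishing a new version rather than editing an existing one.
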
  2. Data contract’s anatomy: What’s inside?

    The set of agreements that compose a data contract can be grouped into three main categories:
    ◦ API: Agreements on how to access and consume the exposed data. Different standards exist to describe APIs (e.g., OpenAPI for REST services, AsyncAPI for streaming services, and DatastoreAPI for connection-based data services).
    ◦ Constraints: Agreements on how data is generated, exposed and consumed.
    ◦ Semantics: Agreements on the meaning of the exposed data.
  3. Data contracts: Why bother? Why now?

    Data management is moving toward more modular and decentralized architectures. Data contracts are a key pillar of this paradigm shift.
    Diagram labels: monolithic data platform, modularized data platform, decentralized data platform; Consumers, Producers, Integrations; Paradigm shift, application-centric paradigm, data-centric paradigm; Data Contract, expose, ingest.
  4. Data contracts & data product: Are they the same thing? No, but…

    A data product is the smallest unit that can be independently deployed and managed in a data architecture (i.e., an architectural quantum). The services exposed by a data product are described using service agreements. The service agreements associated with services that expose data constitute a data contract. Exposing data contracts is necessary for a data platform module to be considered a data product, but it is not sufficient.
    Diagram labels: Data Product (Data & Metadata, Infrastructure, Code); Input Ports, Output Ports, Control Ports, Discovery Ports, Observability Ports.
  5. Adopting data contract: A socio-technological challenge

    Data contracts have real impact when defined between producers and consumers owned by different teams, so adopting them is a socio-technical challenge.
  6. Adopting data contract: The tech side

    ◦ Define … through a formal and machine-readable specification (Data Product Descriptor Specification)
    ◦ Enforce … through a platform capable of managing the contract lifecycle (Open Data Mesh Platform)
    Diagram labels: publish, evolve.
  7. Data contract definitions: Data product descriptor specification

    DPDS is an open specification that declaratively defines a data product in all its components using a JSON or YAML descriptor document. More info here: dpds.opendatamesh.org
  8. Data contract definitions: Data product descriptor specification

    The DPDS divides the structure of a data product descriptor document into three main parts:
    1. general info
    2. interface components
    3. internal components
    Note: The content of the general info and interface components parts is shared with other data products and the platform to enable product discoverability and self-service usage. Internal components are instead accessible only to the product team and the platform.
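
The split between shared and team-only parts described on slide 8 can be sketched roughly as follows. The three top-level keys mirror the three parts named on the slide, but their exact spelling and every nested value are assumptions made for illustration; the specification itself is the place to check the real field names.

```python
# Hypothetical descriptor: key names and values are illustrative assumptions.
descriptor = {
    "info": {"name": "orders", "version": "1.0.0"},
    "interfaceComponents": {"outputPorts": [{"name": "orders-api"}]},
    "internalComponents": {"applicationComponents": [{"name": "orders-etl"}]},
}

# Parts shared with other data products and the platform.
PUBLIC_PARTS = ("info", "interfaceComponents")


def public_view(doc: dict) -> dict:
    """Return only the parts meant to be shared for discovery and self-service."""
    return {key: doc[key] for key in PUBLIC_PARTS if key in doc}


print(sorted(public_view(descriptor)))
```

Filtering by an explicit allow-list, rather than removing the internal part, means that any new top-level section stays private until someone deliberately marks it as shared.
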
  9. Data Product Descriptor: Data Product Public Interface Definitions

    DPDS uses the following concepts of promise theory to formally describe the interface components of a data product:
    ◦ Promises: Through promises, the data product declares the port's intent. Examples of promises are descriptions of services’ API, SLO, deprecation policy, etc.
    ◦ Expectations: Through expectations, the data product declares how it wants the port to be used by its consumers. Examples of expectations are intended usage, intended audience, etc.
    ◦ Contracts: Through contracts, the data product declares the promises and expectations that must be respected by itself and by its consumers, respectively. Examples of contracts are terms and conditions, SLA, billing policy, etc.
  10. Data Product Descriptor: Data Product Public Interface, API

    The promise block describes the port API. The APIs are described using external standards according to the type of service (e.g., OpenAPI, AsyncAPI, etc.). The API definition generally contains:
    1. data schema
    2. endpoints
    3. access modalities
  11. Data contract enforcing: Open data mesh platform

    Open Data Mesh Platform is an open-source platform that manages the full lifecycle of a data product from deployment to retirement. More info here: platform.opendatamesh
  12. Data contract enforcing (validation): Open data mesh platform

    Steps shown: References resolution, Syntax validation, Compatibility validation, Compliance validation, Syntax normalization, Metadata publication, Contract registration.
    Diagram labels: ODM Platform Registry Module; Notification Service, Policy Service; Standard Interface, Pluggable Implementation; Open Policy Agent, Blindata, Collibra, Cue Lang, …; Data Product Experience Plane, Infra Utility Plane, Customer Infra; Git Repository, descriptor files (Dev branch), descriptor files (Main branch), Merge, Git hook or action, /register endpoint.
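
To make the "Compatibility validation" step named on slide 12 concrete, here is a minimal sketch of one possible backward-compatibility rule between two schema versions. It is not the ODM Platform's implementation: the function name `breaking_changes`, the flat `{field: type}` schema shape, and the rule that added fields are non-breaking are all assumptions.

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List changes that would break consumers of old_schema.

    Schemas are plain {field_name: type_name} mappings (an assumption).
    """
    problems = []
    for name, old_type in old_schema.items():
        if name not in new_schema:
            problems.append(f"field removed: {name}")
        elif new_schema[name] != old_type:
            problems.append(f"type changed: {name} {old_type} -> {new_schema[name]}")
    return problems  # fields that only exist in new_schema are not reported


v1 = {"order_id": "string", "amount": "number"}
v2 = {"order_id": "string", "amount": "string", "currency": "string"}
print(breaking_changes(v1, v2))
```

A check of this kind could run before a contract is registered, so that an incompatible change is rejected at merge time instead of being discovered by consumers.
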
  13. Data contract enforcing (monitoring): Open data mesh platform

    Checks shown: Schema checks, Quality checks, SLO & SLA checks, Trust Score Calculation.
    Diagram labels: ODM Platform Observability Module; Data Product Container, Output port, Observability port, Runtime environment; Quality Service, Notification Service; Event management, Metadata publication; Standard Interface, Pluggable Implementation; Great Expectations, SODA, Blindata, Collibra, …; Data Product Experience Plane, Infra Utility Plane, Customer Infra.
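
The monitoring checks named on slide 13 can be illustrated with a small sketch. The talk does not say how the checks or the trust score are computed, so everything here is an assumption: the expected fields, the one-hour freshness SLO, and in particular the score formula (the share of checks that passed).

```python
from datetime import datetime, timedelta, timezone

EXPECTED_FIELDS = {"order_id", "amount"}  # hypothetical output port schema
MAX_AGE = timedelta(hours=1)              # hypothetical freshness SLO


def run_checks(records: list, last_update: datetime, now: datetime) -> dict:
    """Run one schema check, one quality check and one SLO check."""
    return {
        "schema": all(set(r) == EXPECTED_FIELDS for r in records),
        "quality": all(r.get("amount", -1) >= 0 for r in records),
        "freshness_slo": now - last_update <= MAX_AGE,
    }


def trust_score(results: dict) -> float:
    """Assumed formula: share of checks that passed (not from the talk)."""
    return sum(results.values()) / len(results)


now = datetime(2023, 5, 25, 12, 0, tzinfo=timezone.utc)
records = [{"order_id": "a1", "amount": 10.0}, {"order_id": "a2", "amount": -5.0}]
results = run_checks(records, last_update=now - timedelta(minutes=30), now=now)
print(results, trust_score(results))
```

In practice the individual checks would more likely be delegated to a dedicated quality tool behind a standard interface, as the slide's pluggable implementations suggest; the sketch only shows how separate check results might be combined into a single score.
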
  14. Adopting data contract: The organizational side

    Producers must be accountable for the data they expose to their consumers, not the other way around. Operational producers are NOT an exception.
    Diagram labels: Data Contract, expose, ingest, Anti corruption layer, Data Contract Registry.
  15. Data contracts: Adoption journey

    ◦ Automate, Automate, Automate
    ◦ Expand the scope with a value-driven approach
    ◦ Start small, demonstrate the value
    ◦ Build a business case to sustain the scale-out of the new operating model
    ◦ Align operational team incentives with the new operating model
    ◦ Measure, demonstrate the value, iterate
    ◦ Cross the chasm