Slide 1

Slide 1 text

Semantic Data Products
Why, who and how

Slide 2

Slide 2 text

Modern data management
Key principles
Organizational Plane
• Domain oriented decentralized data ownership
• Federated Computational Governance
Technological Plane
• Domain Agnostic Self-serve Data Platform
• Data Fabric
Data Product
• Data & Metadata, Infrastructure, Code
• Ports: Discovery & Observability Ports, Input Ports, Output Ports, Control Ports
• Data Product Owner

Slide 3

Slide 3 text

Quantyca
Who we are
We are a privately owned technology consulting firm specialized in data and metadata management.
We help our customers fully exploit the potential hidden in their data in order to:
• implement innovative omnichannel services
• improve the quality and velocity of the decision-making process

Slide 4

Slide 4 text

Data product centred architectures
Adoption journey
Stages (plotted against VALUE):
• Foundation: source aligned data products
• Mobilization: domain aligned data products
• Optimize: value stream aligned data products
• Expand: domain aligned data products (just more domains)

Slide 5

Slide 5 text

Data product centred architectures
Where semantics come into play
Stages (plotted against VALUE):
• Foundation: source aligned data products
• Mobilization: domain aligned data products
• Expand: domain aligned data products (just more domains)
• Optimize: value stream aligned data products
A SEMANTIC CHASM separates Domain Semantics (aka ubiquitous languages) from Unified Semantics.

Slide 6

Slide 6 text

Semantics
Who owns it?
Domain Aligned Data Products (DP), grouped by domain:
• Marketing Domain
• Financial Domain
• Sales Domain
• XYZ Domain
Value Stream Aligned Data Products, by stream:
• Order to Cash Stream
• Procure to Pay Stream
• Customer Lifecycle Stream
Between the two sits the semantic shift.

Slide 7

Slide 7 text

Semantics
Management models
Fully decentralized semantic management model
• Semantics is defined at the source
• Consumers do the semantic shift
• Drawbacks: semantic silos, semantic inconsistencies across consumers
Fully centralized semantic management model
• Semantics is NOT defined at the source
• Consumers DON’T need to do the semantic shift
• A central modelling team defines the shared semantics and does the semantic shift
• Drawback: the central team is a bottleneck, it doesn’t scale
Federated semantic management model
• Semantic shift is performed at the source through semantic linking
• Consumers DON’T need to do the semantic shift
• A central modelling team defines the shared semantics

Slide 8

Slide 8 text

Federated semantic management model
A value driven approach to implementation
Plan
• Map all use cases for which centralizing conceptual modelling can produce value
• Prioritize mapped use cases in terms of value and feasibility
• Select the use case to implement
Execute
• Map the data needed by the use case
• Extend the central ontology with the concepts needed to describe the required data
• Annotate the needed data with semantic links to the proper ontology concepts
Iterate

Slide 9

Slide 9 text

Federated semantic management model
How it is implemented on the web
• A central team at schema.org defines and evolves the shared ontology
• Web pages are annotated at the source using JSON-LD
• Consumers leverage the ontology and the semantic linking
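
As an illustration of annotating at the source, here is a minimal sketch of a JSON-LD annotation for a product page. The types and properties (Product, Offer, name, offers, price, priceCurrency) are schema.org terms; the product itself and the helper name to_script_tag are made up for the example.

```python
import json

# Hypothetical product page data; the values are illustrative only.
annotation = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example espresso machine",
    "offers": {
        "@type": "Offer",
        "price": "199.00",
        "priceCurrency": "EUR",
    },
}


def to_script_tag(doc: dict) -> str:
    """Wrap a JSON-LD document in the script tag used to embed it in a web page."""
    return '<script type="application/ld+json">' + json.dumps(doc) + "</script>"


# The publisher embeds the tag; a consumer parses it back and reads the shared types.
tag = to_script_tag(annotation)
payload = tag[len('<script type="application/ld+json">'):-len("</script>")]
parsed = json.loads(payload)
print(parsed["@type"], parsed["offers"]["priceCurrency"])
```

The consumer never needs to know how the page is built: it only relies on the shared ontology terms.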

Slide 10

Slide 10 text

Federated semantic management model
How to implement it internally
Data Product Descriptor:
• Data Product Interfaces (aka Ports)
• Infrastructural Resources
• Application Resources
Port:
• API
• Expectations (ex. Terms & Conditions)
• Promises (ex. SLO)
Schema:
• Physical Model
• Semantic linking
• Constraints
Owners: Product Team (data product descriptor), Modelling Team (Enterprise Ontology)
This model decentralizes physical modelling and centralizes conceptual modelling to find the best balance between agility and control
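
The descriptor structure above can be sketched as plain data classes. This is a minimal sketch, not an actual descriptor specification: every class, field name and IRI below is a hypothetical choice that only mirrors the boxes on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SchemaField:
    name: str                            # physical column name
    physical_type: str                   # physical model: storage type
    semantic_link: Optional[str] = None  # IRI of the linked ontology concept
    constraints: List[str] = field(default_factory=list)


@dataclass
class Port:
    name: str
    schema: List[SchemaField]            # schema of the port API
    promises: Dict[str, str] = field(default_factory=dict)      # ex. SLO
    expectations: Dict[str, str] = field(default_factory=dict)  # ex. Terms & Conditions


@dataclass
class DataProductDescriptor:
    name: str
    ports: List[Port]
    infrastructural_resources: List[str] = field(default_factory=list)
    application_resources: List[str] = field(default_factory=list)


def unlinked_fields(descriptor: DataProductDescriptor) -> List[str]:
    """List the port fields the product team has not yet linked to the ontology."""
    return [
        f"{port.name}.{col.name}"
        for port in descriptor.ports
        for col in port.schema
        if col.semantic_link is None
    ]


# Hypothetical data product with one linked and one unlinked field.
orders = DataProductDescriptor(
    name="orders",
    ports=[
        Port(
            name="orders_out",
            schema=[
                SchemaField("order_id", "STRING", "https://ontology.example.com/Order", ["not null"]),
                SchemaField("cust_ref", "STRING"),
            ],
            promises={"freshness": "24h"},
            expectations={"usage": "internal analytics only"},
        )
    ],
)
print(unlinked_fields(orders))
```

A check like unlinked_fields is one way the modelling team could monitor linking coverage without touching the physical model, which stays with the product team.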

Slide 11

Slide 11 text

Identify the ontology framework
Start from a standard!
• The schemas are a set of 'types' (aka Business Concepts), each associated with a set of properties
• The types are arranged in a hierarchy
• Schemas are described by a versioned collection of JSON-LD and SHACL files served through a REST API
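
A toy sketch of what "types with properties, arranged in a hierarchy" means in practice: a type inherits the properties of its ancestors. The three types below are a simplified, assumed example (Customer in particular is a hypothetical internal extension), not the content of any real schema file.

```python
# Each type names its parent and the properties it adds on top of the parent's.
TYPES = {
    "Thing": {"parent": None, "properties": ["name", "description"]},
    "Person": {"parent": "Thing", "properties": ["email"]},
    "Customer": {"parent": "Person", "properties": ["customerId"]},
}


def properties_of(type_name: str) -> list:
    """Collect a type's properties, inherited ones first."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = TYPES[current]["parent"]
    props = []
    for ancestor in reversed(chain):  # walk from the root down to the type itself
        props.extend(TYPES[ancestor]["properties"])
    return props


print(properties_of("Customer"))
# ['name', 'description', 'email', 'customerId']
```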

Slide 12

Slide 12 text

Define the semantic structure
Lightweight Ontologies and Glossaries
• Store defined concepts in a central catalogue
• Leverage catalogue functionalities to optionally add: controlled vocabularies, concept classification, business rules
• Define and connect high-level conceptual entities (ontology)

Slide 13

Slide 13 text

Semantic Linking
Connect physical data to ontology
• Connect data to semantic structure
• Use ontology to reason on data
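
A minimal sketch of "using the ontology to reason on data": once columns are linked to concepts, a question about a general concept can be answered with columns linked to its more specific sub-concepts. The ex: concepts, the hierarchy and the column names are all hypothetical.

```python
# Hypothetical concept hierarchy: child concept -> parent concept.
SUBCLASS_OF = {
    "ex:Customer": "ex:Person",
    "ex:Person": "ex:Party",
    "ex:Supplier": "ex:Organization",
    "ex:Organization": "ex:Party",
}

# Hypothetical semantic links: physical column -> ontology concept.
LINKS = {
    "crm.customers.customer_id": "ex:Customer",
    "erp.vendors.vendor_code": "ex:Supplier",
    "hr.employees.badge": "ex:Person",
}


def is_a(concept, ancestor):
    """True if concept equals ancestor or is one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def columns_about(concept):
    """All columns linked to the concept or to any of its sub-concepts."""
    return sorted(col for col, linked in LINKS.items() if is_a(linked, concept))


print(columns_about("ex:Person"))
# ['crm.customers.customer_id', 'hr.employees.badge']
```

No column is linked to ex:Party directly, yet columns_about("ex:Party") returns all three: that inference comes from the ontology, not from the physical schemas.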

Slide 14

Slide 14 text

Semantic Linking
Example
Here we link to a public ontology, but it is also possible to link to an internal ontology, or to both.
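
The original example is not reproduced in this text, so the following is only an assumed sketch of what linking to both kinds of ontology could look like. The schema: prefix points to the public schema.org vocabulary; the corp: prefix, its URL, the column names and the "concept" key are hypothetical.

```python
# Prefixes for a public ontology and a hypothetical internal one.
CONTEXT = {
    "schema": "https://schema.org/",
    "corp": "https://ontology.example.com/",
}

# Hypothetical physical schema whose columns carry semantic links.
CUSTOMER_SCHEMA = {
    "name": "customer",
    "columns": [
        {"name": "first_nm", "type": "string", "concept": "schema:givenName"},
        {"name": "email_addr", "type": "string", "concept": "schema:email"},
        {"name": "loyalty_tier", "type": "string", "concept": "corp:loyaltyTier"},
    ],
}


def expand(term, context=CONTEXT):
    """Expand a prefixed term such as 'schema:email' into a full IRI."""
    prefix, _, local = term.partition(":")
    return context[prefix] + local


def linked_concepts(schema):
    """Map each physical column name to the full IRI of its ontology concept."""
    return {col["name"]: expand(col["concept"]) for col in schema["columns"]}


for column, iri in linked_concepts(CUSTOMER_SCHEMA).items():
    print(column, "->", iri)
```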

Slide 15

Slide 15 text

Build and index the complete graph
LLM, graphs and search engines

Slide 16

Slide 16 text

Using semantics to supercharge LLM
Overview
• Decentralized physical modelling: Data Products with their DP Descriptors, Physical Modelling Agent
• Centralized conceptual modelling: Enterprise Ontology, Conceptual Modelling Agent
• Enterprise Knowledge Graph (data product registry)
• Vector Database (concept embedding)
• Serving Agent
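
To make the role of concept embeddings concrete, here is a toy sketch of retrieving the ontology concepts closest to a question. It is a stand-in under loud assumptions: a word-count vector replaces a real embedding model, an in-memory dict replaces the vector database, and the corp: concepts and their descriptions are invented.

```python
import math
from collections import Counter

# Hypothetical ontology concepts with short textual descriptions.
CONCEPTS = {
    "corp:Customer": "customer party that buys goods or services",
    "corp:Payment": "payment money transferred by a customer to settle an invoice",
    "corp:Invoice": "invoice document requesting money for delivered goods",
}


def embed(text):
    """Toy embedding: a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[token] * b[token] for token in a)  # Counter yields 0 for missing tokens
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# The "vector database": concept IRI -> embedding of its description.
INDEX = {iri: embed(description) for iri, description in CONCEPTS.items()}


def nearest(question, k=2):
    """Return the k concepts whose descriptions are most similar to the question."""
    query = embed(question)
    ranked = sorted(INDEX, key=lambda iri: cosine(query, INDEX[iri]), reverse=True)
    return ranked[:k]


print(nearest("payment made by a customer"))
# ['corp:Payment', 'corp:Customer']
```

The retrieved concepts can then be looked up in the knowledge graph to find the data products linked to them.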

Slide 17

Slide 17 text

LLM Agents
Class and goals
Physical Modelling Agent (Product Team)
Given the following physical schema {schema}, what are the concepts present in the corporate ontology onto which the properties in the schema can be mapped?
Conceptual Modelling Agent (Modelling Team)
Given the following description of the "Order to Cash" process {description}, what are the key concepts involved? If possible, use the concepts already present in the corporate ontology; otherwise, propose new definitions as extensions of the existing ones.
Serving Agent (Consumers)
Where in the GCP Data Warehouse can I find information about customers who have made a payment in the last month? Please generate a SQL query!
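
The two modelling prompts are templates with a placeholder, so they can be filled programmatically before being sent to a model. A minimal sketch: the template texts come from the slide, while the dictionary keys, the build_prompt helper and the sample schema are hypothetical, and no LLM call is made.

```python
PROMPT_TEMPLATES = {
    "physical_modelling": (
        "Given the following physical schema {schema}, what are the concepts "
        "present in the corporate ontology onto which the properties in the "
        "schema can be mapped?"
    ),
    "conceptual_modelling": (
        'Given the following description of the "Order to Cash" process '
        "{description}, what are the key concepts involved? If possible, use the "
        "concepts already present in the corporate ontology; otherwise, propose "
        "new definitions as extensions of the existing ones."
    ),
}


def build_prompt(agent, **values):
    """Fill the template of the given agent; raises KeyError if a value is missing."""
    return PROMPT_TEMPLATES[agent].format(**values)


# Hypothetical physical schema passed by a product team.
prompt = build_prompt(
    "physical_modelling",
    schema="orders(order_id STRING, cust_ref STRING, paid_at TIMESTAMP)",
)
print(prompt)
```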

Slide 18

Slide 18 text

LLM & Semantics
Reinforcing feedback loop
• LLM to Semantics: Modelling Agents (Product & Modelling Teams) extend & generate the semantics dynamically
• Semantics to LLM: Serving Agents (Consumers) insert semantic context into the prompt

Slide 19

Slide 19 text

Semantic data products
Wrapping up
Why
• To reduce semantic fragmentation while maintaining operational scalability
• To boost the automation of data management activities, following the data fabric approach
Who
• A centralized modelling team is responsible for defining the corporate ontology
• Distributed product teams are responsible for semantically linking their data to the corporate ontology
How
• With a pragmatic and value-driven approach to semantic modelling
• Utilizing the semantic model in conjunction with GenAI to initiate a positive reinforcement loop

Slide 20

Slide 20 text

Via A. Mauri, 22 20900 Monza (MB) [email protected] www.quantyca.it