
Semantic Data Products: why, who and how

Andrea Gioia
September 25, 2023


In today’s data-driven world, semantic data products are key to unlocking the full potential of your data. In this talk, I dive into what semantic data products are, why they’re important, and how to create and manage them effectively. You’ll leave with a clearer understanding of how to leverage semantic models to unlock the full potential of modern AI and improve data interoperability across your organization.

📽️ The recording of the talk is available here: https://www.youtube.com/watch?v=ap8pWiRBvJQ


Transcript

  1. Modern data management: key principles

    Data Product: Data & Metadata, Code, Infrastructure; Input Ports, Output Ports, Control Ports, Discovery & Observability Ports. Data Product Owner. Domain-oriented decentralized data ownership. Organizational Plane. Federated Computational Governance. Domain-agnostic self-serve data platform. Technological Plane. Data Fabric.
  2. Quantyca: who we are

    We are a privately owned technological consulting firm specialized in data and metadata management. We help our customers to fully exploit the potential hidden in their data in order to implement innovative omnichannel services and improve the quality and the velocity of the decision-making process.
  3. Data products centred architectures: adoption journey

    Stages plotted against value. Foundation: source aligned data products. Mobilization: domain aligned data products. Optimize: value stream aligned data products. Expand: domain aligned data products (just more domains).
  4. Data products centred architectures: where semantics come into play

    Stages plotted against value: Foundation, Mobilization, Expand, Optimize. Data product types: source aligned, domain aligned, value stream aligned, domain aligned (just more domains). Domain semantics (aka ubiquitous languages) versus unified semantics: the semantic chasm.
  5. Semantics: who owns it?

    Domain aligned data products (DP): Marketing Domain, Financial Domain, Sales Domain, XYZ Domain. Value stream aligned data products: Order to Cash Stream, Procure to Pay Stream, Customer Lifecycle Stream. Semantic shift between domain aligned and value stream aligned data products.
  6. Semantics: management models

    • Fully decentralized semantic management model: semantics are defined at the source; consumers do the semantic shift. Drawbacks: semantic silos, semantic inconsistencies across consumers.
    • Fully centralized semantic management model: semantics are NOT defined at the source; consumers DON'T need to do the semantic shift; a central modelling team defines the shared semantics and does the semantic shift. Drawback: the central team is a bottleneck, it doesn't scale.
    • Federated semantic management model: the semantic shift is performed at the source through semantic linking; consumers DON'T need to do the semantic shift; a central modelling team defines the shared semantics.
  7. Federated semantic management model: a value-driven approach to implementation

    Plan: map all use cases for which centralizing conceptual modelling can produce value; prioritize the mapped use cases in terms of value and feasibility; select the use case to implement. Execute: map the data needed by the use case; extend the central ontology with the concepts needed to describe the required data; annotate the needed data with semantic links to the proper ontology concepts. Iterate.
  8. Federated semantic management model: how it is implemented on the web

    A central team at schema.org defines and evolves the shared ontology. Web pages are annotated at the source using JSON-LD. Consumers leverage the ontology and the semantic linking.
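The web pattern on slide 8 can be illustrated with a minimal sketch: a publisher embeds a JSON-LD annotation whose `@context` points at schema.org, and a consumer interprets it through the shared vocabulary without knowing the publisher's internal model. The product values and the `describe` helper are invented for illustration; `Product`, `Offer`, `name`, `offers`, `price` and `priceCurrency` are schema.org terms.

```python
import json

# JSON-LD annotation as a publisher might embed it in a web page.
# The product data is invented for illustration.
annotation = json.dumps({
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Espresso machine",
    "offers": {"@type": "Offer", "price": "199.00", "priceCurrency": "EUR"},
})


def describe(jsonld_text: str) -> str:
    """Consumer side: read the annotation through the shared vocabulary."""
    doc = json.loads(jsonld_text)
    if doc.get("@context") != "https://schema.org":
        raise ValueError("unknown vocabulary")
    offer = doc["offers"]
    return f'{doc["@type"]}: {doc["name"]} at {offer["price"]} {offer["priceCurrency"]}'


print(describe(annotation))
```

The consumer depends only on the vocabulary, not on the publisher, which is the property the federated model tries to reproduce inside an organization.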
  9. Federated semantic management model: how to implement it internally

    Data Product Descriptor: Data Product Interfaces (aka Ports), Infrastructural Resources, Application Resources. Port: API, Expectations (e.g. Terms & Conditions), Promises (e.g. SLO). Schema: Physical Model, Semantic linking, Constraints. Product Team. Modelling Team. Enterprise Ontology. This model decentralizes physical modelling and centralizes conceptual modelling to find the best balance between agility and control.
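As a rough sketch of what slide 9 describes, a descriptor could carry, for each port, its promises, expectations and a schema whose columns link to ontology concepts. The layout and every field name below (`outputPorts`, `semanticLink`, and so on) are assumptions made for illustration, not a published descriptor specification. The helper shows one automated check such a structure would allow: listing columns the product team has not yet linked.

```python
# Hypothetical descriptor layout: all field names and values are illustrative.
descriptor = {
    "name": "orders",
    "outputPorts": [
        {
            "name": "orders_table",
            "promises": {"slo": "refreshed daily"},
            "expectations": {"terms": "internal analytics only"},
            "schema": [
                {"column": "ord_id", "type": "STRING", "semanticLink": "Order.identifier"},
                {"column": "cust_id", "type": "STRING", "semanticLink": "Customer.identifier"},
                {"column": "tmp_flag", "type": "BOOL", "semanticLink": None},
            ],
        }
    ],
}


def unlinked_columns(descriptor: dict) -> list[str]:
    """Return 'port.column' names that have no semantic link yet."""
    missing = []
    for port in descriptor["outputPorts"]:
        for col in port["schema"]:
            if not col.get("semanticLink"):
                missing.append(f'{port["name"]}.{col["column"]}')
    return missing


print(unlinked_columns(descriptor))
```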
  10. Identify the ontology framework: start from a standard!

    The schemas are a set of 'types' (aka Business Concepts), each associated with a set of properties. The types are arranged in a hierarchy. Schemas are described by a versioned collection of JSON-LD and SHACL files served through a REST API.
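The idea of types with properties arranged in a hierarchy can be sketched in a few lines. The `ConceptType` class is a hypothetical stand-in for a real JSON-LD/SHACL schema store; the three types and their properties are borrowed from schema.org, with only a couple of properties listed per type.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConceptType:
    """A business concept: its own properties plus those inherited from its parent."""
    name: str
    properties: list[str] = field(default_factory=list)
    parent: Optional["ConceptType"] = None

    def all_properties(self) -> list[str]:
        inherited = self.parent.all_properties() if self.parent else []
        return inherited + self.properties


thing = ConceptType("Thing", ["name", "description"])
organization = ConceptType("Organization", ["legalName"], parent=thing)
corporation = ConceptType("Corporation", ["tickerSymbol"], parent=organization)

print(corporation.all_properties())
```

A subtype inherits every property of its ancestors, so an internal extension of a standard type can add concepts without redefining what already exists.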
  11. Define the semantic structure: lightweight ontologies and glossaries

    Store defined concepts in a central catalogue. Leverage catalogue functionalities to optionally add controlled vocabularies, concept classification and business rules. Define and connect high-level conceptual entities (ontology).
  12. Semantic Linking: connect physical data to ontology

    Use ontology to reason on data. Connect data to semantic structure.
  13. Semantic Linking: example

    Here we link to a public ontology, but of course it is also possible to link to an internal ontology or to link to both.
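The slide's example image is not part of the transcript, so the following is only a sketch of the idea under stated assumptions: physical columns are linked to concepts from a public ontology (schema.org), an internal one, or both, and the links are then used to find data by concept instead of by column name. The column names and the `ontology.example.com` IRI are hypothetical; `https://schema.org/name` and `https://schema.org/email` are schema.org properties.

```python
# Semantic links from physical columns to ontology concepts.
# Column names and the internal IRI are hypothetical.
links = {
    "crm.customers.cust_nm": ["https://schema.org/name"],
    "crm.customers.cust_mail": [
        "https://schema.org/email",                            # public ontology
        "https://ontology.example.com/Customer#contactEmail",  # internal ontology
    ],
    "billing.invoices.payer_mail": ["https://schema.org/email"],
}


def columns_for(concept_iri: str) -> list[str]:
    """Find every physical column linked to a given concept."""
    return sorted(col for col, concepts in links.items() if concept_iri in concepts)


print(columns_for("https://schema.org/email"))
```

Two columns with unrelated physical names in different data products resolve to the same concept, which is what lets a consumer skip the semantic shift.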
  14. Using semantics to supercharge LLM: overview

    Decentralized physical modelling. Centralized conceptual modelling. Enterprise Knowledge Graph. Vector Database. Data Products with their DP Descriptors. Physical Modelling Agent. Serving Agent. Conceptual Modelling Agent. Enterprise Ontology. Concept embedding. Data product registry.
  15. LLM Agents: class and goals

    • Physical Modelling Agent (Product Team): "Given the following physical schema {schema}, what are the concepts present in the corporate ontology onto which the properties in the schema can be mapped?"
    • Conceptual Modelling Agent (Modelling Team): "Given the following description of the "Order to Cash" process {description}, what are the key concepts involved? If possible, use the concepts already present in the corporate ontology; otherwise, propose new definitions as extensions of the existing ones."
    • Serving Agent (Consumers): "Where can I find information on the GCP's Data Warehouse regarding customers who have made a payment in the last month? Please generate a SQL!"
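The `{schema}` placeholder in the Physical Modelling Agent prompt suggests a template that is filled in before the model is called. A minimal sketch, assuming the ontology concepts are passed along as plain text context: the `build_prompt` helper, the sample schema and the concept names are hypothetical, and no LLM client is invoked here.

```python
import json

# Prompt text taken from the slide; {schema} is filled in at call time.
PHYSICAL_MODELLING_PROMPT = (
    "Given the following physical schema {schema}, what are the concepts present "
    "in the corporate ontology onto which the properties in the schema can be mapped?"
)


def build_prompt(schema: dict, ontology_concepts: list[str]) -> str:
    """Insert semantic context (known concepts) ahead of the agent's question."""
    context = "Corporate ontology concepts: " + ", ".join(ontology_concepts)
    question = PHYSICAL_MODELLING_PROMPT.format(schema=json.dumps(schema))
    return context + "\n\n" + question


# Hypothetical physical schema and concept list.
prompt = build_prompt(
    {"table": "orders", "columns": ["ord_id", "cust_id", "amount"]},
    ["Customer", "Order", "Payment"],
)
print(prompt)
```

The resulting string would then be sent to whichever model the agent uses.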
  16. LLM & Semantics: reinforcing feedback loop

    Semantics and LLM reinforce each other. Modelling Agents (Product & Modelling Teams): extend and generate the semantics dynamically. Serving Agents (Consumers): insert semantic context into the prompt.
  17. Semantic data products: wrapping up

    Why
    • To reduce semantic fragmentation while maintaining operational scalability
    • To boost the automation of data management activities, following the data fabric approach
    Who
    • A centralized modelling team is responsible for defining the corporate ontology
    • Distributed product teams are responsible for semantically linking their data to the corporate ontology
    How
    • With a pragmatic and value-driven approach to semantic modelling
    • Utilizing the semantic model in conjunction with GenAI to initiate a positive reinforcement loop