Building your Enterprise Knowledge Graph one data product at a time

This slide deck, used for my Connected Data London 2024 masterclass, explores a distributed, iterative, and value-driven approach to building Enterprise Knowledge Graphs.

Traditional top-down ontology design doesn’t scale, leaving many data assets disconnected. This presentation demonstrates how principles from data mesh can be extended to knowledge management (knowledge mesh), enabling teams to manage simpler ontologies as products and compose them into an enterprise-wide knowledge graph.

Key topics include the role of data products, mapping data to ontologies to create a virtual knowledge graph, and using it to augment generative AI models. Participants will learn how to incrementally build scalable, connected knowledge ecosystems to drive innovation and actionable insights.

📽️ The recording of the masterclass is available here: https://2024.connected-data.london/talks/building-your-enterprise-knowledge-graph-one-data-product-at-a-timee/

Andrea Gioia

December 11, 2024

Transcript

  1. Andrea Gioia Hi there 👋 I'm Andrea Gioia, CTO at

    Quantyca, and Co-founder of blindata.io With 20+ years in the game, I have navigated the data universe up and down, one project at a time. LinkedIn: /andreagioia Github: /andrea-gioia
  2. Andrea Gioia Hi there 👋 I'm Andrea Gioia, CTO at

    Quantyca, and Co-founder of blindata.io With 20+ years in the game, I have navigated the data universe up and down, one project at a time. LinkedIn: /andreagioia Github: /andrea-gioia https://amzn.eu/d/7WsNmFC
  3. Knowledge

    Knowledge refers to the information, understanding, and skills that people acquire through experience, education, or reasoning. Knowledge can be thought of as the internal representation of external realities, whether they are physical, conceptual, or abstract. Diagram labels: String, Thing, Concept; Evokes, Refers to, Stands for; Knowledge Model, Language System, External Reality; Sign, Unit of Meaning, “T R E E”.
  4. The meaning of meaning

    The relationships between different concepts within the knowledge model are what produce meaning. The meaning is in the structure, not in the content. I know the concept of tree because I know that … a tree is not a mineral … a tree is a plant … a tree is not a bush … a tree has a trunk … plus other facts that I have learned through experience or reasoning.
  5. Semiotics and structuralism Lecture 8 of Introduction to Theory of

    Literature by Yale Professor Paul Fry is a great introduction to semiotics and structuralism. YouTube: https://youtu.be/VsMfaIOsT3M?si=mn-zFkIXEo_lwshQ
  6. Implicit & explicit knowledge

    I know the concept of tree because I know that … a tree is not a mineral … a tree is a plant … a tree is not a bush … a tree has a trunk … plus other facts that I have learned through experience or reasoning. This Implicit & Personal Knowledge Model is formalized into an Explicit & Shared Knowledge Model, which in turn is grounded on it. Diagram labels: Mineral, Natural thing, Plant, Bush, Tree, Trunk, linked by four :type of relationships and one :has relationship.
  7. Ontology

    An ontology is … an explicit specification of a conceptualization (Gruber 93) … a formal specification of a shared conceptualization (Borst 97) … a formal, explicit specification of a shared conceptualization (Studer 98). Keywords: SHARED, FORMALIZED. Diagram labels: Mineral, Natural thing, Plant, Bush, Tree, Trunk, linked by four :type of relationships and one :has relationship.
  8. Ontology representations

    Triples:
    A [Tree] <has> a [Trunk]
    A [Tree] <is> a [Plant]
    A [Bush] <is> a [Plant]
    A [Plant] <is> a [Natural Thing]
    A [Stone] <is> a [Natural Thing]

    Triples encoded in Turtle:
    :Tree a rdfs:Class ;
        rdfs:subClassOf :Plant ;
        dcterms:hasPart :Trunk .
    :Trunk a rdfs:Class .
    :Bush a rdfs:Class ;
        rdfs:subClassOf :Plant .
    :Plant a rdfs:Class ;
        rdfs:subClassOf :NaturalThing .
    :Stone a rdfs:Class ;
        rdfs:subClassOf :NaturalThing .
    :NaturalThing a rdfs:Class .
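To make the idea of a triple concrete, here is a minimal Python sketch that stores the slide's five triples as plain tuples and looks them up by pattern. It is not an RDF library: the `TRIPLES` set and the `match` helper are hypothetical names introduced only for illustration, and the prefixes from the Turtle version are dropped for brevity.

```python
# Each triple is a (subject, predicate, object) tuple, mirroring the slide.
TRIPLES = {
    ("Tree", "subClassOf", "Plant"),
    ("Tree", "hasPart", "Trunk"),
    ("Bush", "subClassOf", "Plant"),
    ("Plant", "subClassOf", "NaturalThing"),
    ("Stone", "subClassOf", "NaturalThing"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # Which concepts are direct subclasses of Plant?
    for subject, _, _ in match(TRIPLES, p="subClassOf", o="Plant"):
        print(subject)
```

The wildcard pattern is the same basic idea that triple stores expose through query languages, here reduced to a list filter.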
  9. Ontology types An ontology is a formal representation of knowledge,

    defining concepts and their complex relationships. • A named authority is a type of ontology that standardizes names of entities without defining relationships between them. • A taxonomy is a type of ontology that organizes concepts through hierarchical relationships. • A thesaurus is a type of ontology that connects concepts through hierarchical and associative relationships.
  10. Ontology as graph

    A triple is made up of a subject, a predicate, and an object. In a graph • subjects and objects can be represented as nodes, • predicates can be represented as directional relationships. This means a collection of triples can naturally be represented as a graph. While triples are ideal for exchanging information about ontologies, graphs excel at putting that information into action. Diagram: [TREE] :subClassOf [PLANT], [BUSH] :subClassOf [PLANT], [PLANT] :subClassOf [NATURAL THING], [STONE] :subClassOf [NATURAL THING], [TREE] :hasPart [TRUNK].
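The step from triples to a graph can be sketched in a few lines of Python. Subjects and objects become nodes, predicates become directed edges, and "putting the information into action" is illustrated by walking `subClassOf` edges transitively. The function names `build_graph` and `superclasses` are assumptions made for this example, not part of any library.

```python
from collections import defaultdict

TRIPLES = [
    ("Tree", "subClassOf", "Plant"),
    ("Bush", "subClassOf", "Plant"),
    ("Plant", "subClassOf", "NaturalThing"),
    ("Stone", "subClassOf", "NaturalThing"),
    ("Tree", "hasPart", "Trunk"),
]


def build_graph(triples):
    """Turn triples into an adjacency list: node -> [(predicate, target)]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def superclasses(graph, concept):
    """Follow subClassOf edges transitively, starting from one concept."""
    found, stack = [], [concept]
    while stack:
        node = stack.pop()
        for predicate, target in graph.get(node, []):
            if predicate == "subClassOf" and target not in found:
                found.append(target)
                stack.append(target)
    return found


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    print(superclasses(graph, "Tree"))
```

The traversal derives that a tree is a natural thing even though no single triple states it, which is the kind of inference a flat list of triples does not offer directly.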
  11. Knowledge graph 13 Ontology Data stores Instances Concepts Mappings Mineral

    Natural thing Plant Bush Tree Trunk :type of :type of :type of :type of :has :instance of :instance of
  12. Enterprise Knowledge Graph 14 Upper ontology Number of concepts Applicability

    High Low Context Dependent Context Independent Enterprise Ontology Domain ontology Physical Data Subdomain ontologies An enterprise knowledge graph is built on an enterprise ontology that defines the core concepts shared across the organization. Different business domains can extend the concepts of this enterprise ontology. This ensures semantic interoperability of ontologies and data across domains.
  13. Other knowledge models

    While ontologies are focused on static, structured knowledge representation within a specific domain, world models represent a more dynamic, agent-centric understanding of the world that adapts over time.
    Scope: ONTOLOGIES represent knowledge within a specific domain, defining concepts and relationships. WORLD MODELS represent an agent's broader understanding of the world, including interactions and dynamic processes.
    Structure: ONTOLOGIES use static, formal relationships (e.g., hierarchy, part-whole). WORLD MODELS are dynamic, incorporating causal and temporal relationships.
    Purpose: ONTOLOGIES enable shared understanding and reasoning within a domain. WORLD MODELS help agents predict, learn, and make decisions about the world.
  14. Data is the new oil, right? 18 According to the

    International Accounting Standards Board (IASB), an asset is a “resource controlled by an entity as a result of past events, from which future economic benefits are expected to flow to the entity.” So data is an asset like oil, but does not obey the same laws of economics that other assets do.
  15. The nature of data as an asset

    Like other organisational assets, data has a cost (how much it costs to acquire, store and maintain it) and a value (how much it is worth to the organisation). However, this is where the similarity ends. Data is in fact... 1. infinitely sharable 2. more valuable the more it is used 3. perishable 4. more valuable the more it is accurate 5. more valuable the more it is combined with other data 6. less valuable when there is too much of it 7. not depletable. Source: https://www.researchgate.net/profile/Faris_Alshubiri/post/How_to_determine_information_asset_value/attachment/59d6278679197b8077985d05/AS%3A326144877449217%401454770408208/download/1000.pdf
  16. Data only has value in use 20 -$$$ +$$$ Without

    use, data is just a liability
  17. Information architecture DATA ONTOLOGIES + Understand KNOWLEDGE INTELLIGENCE INSIGHT +

    Act METADATA Comprehend INFORMATION + Data management cannot be limited to just managing data Data management must manage the whole information architecture Collect Processing Cognizing Reasoning Sensing
  18. It’s all about context 22 Data Knowledge Intelligence Data Asset

    Potential Value 28 Collect Information C° London 18/9/2024 Comprehend City Product Country Weather Report Weather Condition Understand Let’s do a promotion on ice creams Act
  19. Greater the Context, Greater the Value 23 An ounce of

    data is worth a pound of information, an ounce of information is worth a pound of knowledge, and an ounce of knowledge is worth a pound of wisdom Source: https://youtu.be/EbLh7rZ3rhU?si=VF_8fvQMSpu9tapS&t=345
  20. Enterprise Ontology DATA KNOWLEDGE INFORMATION Processing Cognizing Reasoning Sensing Modeling

    the information architecture An enterprise knowledge graph is an elegant way to represent the entire information architecture in a single model that is • Shared • Formal • Computable Upper ontology Domain ontology Physical Data Subdomain ontologies
  21. Why build an EKG?

    Data increases in value the more it is used; that is, it exhibits increasing returns to use. The major cost of information is in its capture, storage and maintenance; the marginal costs of using it are almost negligible. Data generally becomes more valuable when it can be compared and combined with other data. Connecting different data sources enables deeper insights, uncovers patterns, and supports better decision-making.
  22. The semantic gap 26 Different data assets are composable when

    they are interoperable at the following levels: 1. Technological 2. Syntactic 3. Semantic Semantic interoperability is a major challenge.
  23. Lost in translation 27 Modeling knowledge has a cost, but

    how much does it cost not to do it?
  24. Economic perspective 28 The control of money and the control

    of things no longer dominate the economy. Today, the real control rests with the flow of information and knowledge. The means of production have shifted to those who can handle and process information. From being organized around the flow of things and the flow of money, the economy is being organized around the flow of information. Increasingly, businesses are knowledge-based and knowledge-driven. Peter Drucker
  25. Why build an EKG, now?

    Gen AI raises five challenges. Domain Ignorance: LLMs know nothing about your specific domain. Accuracy: LLMs are often prone to hallucinations. Costs: augmenting or fine-tuning an LLM with custom data can be expensive. Explainability: LLMs often operate as black boxes, making it difficult to understand or trace how they arrive at their conclusions. Market Volatility: it’s difficult to build a future-proof technology foundation.
  26. EKGF to the rescue

    Gen AI (Statistical AI). Pros: • General Knowledge • Language Processing • Generalizability. Cons: • Implicit Knowledge • Hallucination • Indecisiveness • Black-box • Lacking Domain-specific & New Knowledge. Knowledge Graph (Symbolic AI). Pros: • Structured Knowledge • Accuracy • Decisiveness • Interoperability • Domain-specific Knowledge. Cons: • Incompleteness • Lacking Language Understanding • Unseen Facts. Together: Neuro-Symbolic AI.
  27. Bipolar organizations 33 Increase variety (Engineer to Order) Decrease variations

    (Make to Stock) Economy of Differentiation Economy of Scale Monolithic data platform Siloed data applications
  28. Recommoning data management 34 Increase variety (Engineer to Order) Decrease

    variations (Make to Stock) Standardize to differentiate (Compose to Order) Economy of Differentiation Economy of Scope Economy of Scale Monolithic data platform Siloed data applications Information Architecture
  29. Data management is everybody's business 35 From here on, we

    will see how to mobilize the entire organization to build an enterprise knowledge graph in a value-driven and sustainable way. DATA INFORMATION KNOWLEDGE INTELLIGENCE Data Scientists Data Engineers Information Architects & Data Stewards Ontologists & Business Experts Cross Functional Team
  30. Data Product

    A data product is a software product that facilitates an end goal through the use of data. A data product is a software product that facilitates an end goal driven by data.
  31. Pure Data Product A pure data product is a modular

    unit within the data architecture, tailored to the cognitive capacity of the responsible team and developed following product management principles to make a data asset accurate, relevant, combinable, and readily usable for future value creation.
  32. Pure Data Product’s “ilities” Data Product Data Asset Accuracy Relevance

    Reusability Composability VALUE Identify and maintain Share and multiply
  33. Beware of stock Pure data products that don’t support analytical

    applications (stock to order) don’t make sense, as unused data becomes a non-productive cost.
  34. Pure data product anatomy Applications Data Asset +Metadata Infrastructure Interfaces

    Interfaces INTERNAL COMPONENTS Application components acquire, transform, and share data Infrastructural components provide storage and compute resources INTERFACE COMPONENTS Input Ports Output Ports Discovery Ports Observability ports Control Ports
  35. Data Contracts Syntactic & tech. interoperability Data Product Data Contract

    Schema Constraints API Populate Accepts & consume Shared Lifecycle Metadata Data
  36. Data contract specifications 48 Data Contract Specification Open Data Contract

    Standard Open Data Product Specification Data Product Descriptor Specification DATA PRODUCT DATA CONTRACT
  37. Data Product Descriptor Specification DPDS 50 DPDS is an open

    specification that declaratively defines a data product in all its components using a JSON or YAML descriptor document. More info here: dpds.opendatamesh.org
  38. Interface component definitions

    DPDS uses the following concepts of promise theory to formally describe the interface components of a data product. Promises: through promises, the data product declares the port's intent. Examples of promises are descriptions of services’ API, SLO, deprecation policy, etc. Expectations: through expectations, the data product declares how it wants the port to be used by its consumers. Examples of expectations are intended usage, intended audience, etc. Obligations: through obligations, the data product declares promises and expectations that must be respected by itself and by its consumers respectively. Examples of obligations are terms and conditions, SLA, billing policy, etc.
  39. Example of API specs 54 DATA PRODUCT Output Ports AsyncAPI

    for realtime notifications OpenAPI for point reads DatastoreAPI for analytical queries
  40. Schema definition

    Every API specification contains a section to describe the schema of the exposed data. Typically, this description is based on a standard schema definition language such as: • JSON Schema • Avro • Protobuf
  41. Schema annotations 56 In a Schema Definition Language, annotations are

    special keywords that provide additional information about the schema without affecting its validation process.
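As a rough illustration of how annotations can carry semantics without touching validation, the sketch below uses a JSON Schema written as a Python dict. The `x-concept` keyword, the `ex:` concept names, and the `collect_annotations` helper are all hypothetical and chosen only for this example; the deck does not say which keyword it uses.

```python
# JSON Schema for a hypothetical customer output port.
# "x-concept" is an illustrative custom annotation, not a standard keyword:
# it links a field to a concept in an (assumed) enterprise ontology.
CUSTOMER_SCHEMA = {
    "type": "object",
    "x-concept": "ex:Customer",
    "properties": {
        "customer_id": {"type": "string", "x-concept": "ex:customerIdentifier"},
        "full_name": {"type": "string", "x-concept": "ex:name"},
        "signup_ts": {"type": "string", "format": "date-time"},
    },
    "required": ["customer_id"],
}


def collect_annotations(schema, keyword="x-concept", path="$"):
    """Walk the schema and return {field_path: annotation_value}."""
    found = {}
    if keyword in schema:
        found[path] = schema[keyword]
    for name, subschema in schema.get("properties", {}).items():
        found.update(collect_annotations(subschema, keyword, f"{path}.{name}"))
    return found


if __name__ == "__main__":
    for field_path, concept in collect_annotations(CUSTOMER_SCHEMA).items():
        print(field_path, "->", concept)
```

Because the annotation lives next to the field it describes, the semantic link travels with the schema and can be harvested by tooling without a separate mapping file.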
  42. Unit of data and metadata management 57 DATA PRODUCT Discovery

    ports Output ports Data Plane Information Plane Data Product Team
  43. Knowledge Mesh 59 Self-serve Platform X as a Product X

    Domain Ownership Computational Governance Knowledge Mesh Data Mesh Socio Technical
  44. Knowledge as a product 60 Enterprise Ontology Upper Ontology Domain

    Ontologies Subdomain Ontologies Ontology Lifecycle
  45. Knowledge domain ownership

    A federated modeling team composed of representatives from each business domain manages the definition of the enterprise ontology. This team can further organize by dividing responsibilities based on data domains. Diagram labels: Business Domains (Marketing, Sales EMEA, Sales Nordics, Operations), each with a Business Domain Owner; Knowledge Domains (Customer, Product, Order), each with a Knowledge Domain Owner; Ontology; Factual Data.
  46. Self-serve Platform 63 Utility Plane Control Plane Experience Plane IT

    Landscape SELF-SERVE PLATFORM Provides access to underlying infrastructural resources (e.g., storage and compute). Automates the design, development, deployment, and monitoring of knowledge products. Provides access to the enterprise knowledge graph
  47. Knowledge graph architectures 65 Data Centric Architecture LOAD MAP Knowledge

    Warehouse Architecture Logical Knowledge Warehouse Architecture Materialized Knowledge Graph Virtualized Knowledge Graph
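One way to read the LOAD and MAP labels on this slide is as the difference between copying triples into a store ahead of time (materialized) and translating source rows into triples only when a query arrives (virtualized). The Python sketch below illustrates that reading under stated assumptions: the rows, the `MAPPING` structure, and all function names are hypothetical, and real systems would typically use a mapping language rather than a dict.

```python
# Hypothetical source rows and a hypothetical column-to-predicate mapping.
ROWS = [
    {"id": "c1", "name": "Ada", "country": "IT"},
    {"id": "c2", "name": "Bob", "country": "UK"},
]
MAPPING = {
    "class": "Customer",
    "key": "id",
    "columns": {"name": "hasName", "country": "livesIn"},
}


def row_to_triples(row, mapping):
    """Translate one row into triples according to the mapping."""
    subject = f"{mapping['class']}/{row[mapping['key']]}"
    yield (subject, "instanceOf", mapping["class"])
    for column, predicate in mapping["columns"].items():
        yield (subject, predicate, row[column])


def materialize(rows, mapping):
    """LOAD: copy every triple into a separate store ahead of time."""
    return [t for row in rows for t in row_to_triples(row, mapping)]


def virtual_query(rows, mapping, predicate):
    """MAP: translate rows at query time; nothing is copied or stored."""
    for row in rows:
        for triple in row_to_triples(row, mapping):
            if triple[1] == predicate:
                yield triple


if __name__ == "__main__":
    print(len(materialize(ROWS, MAPPING)), "triples materialized")
    print(list(virtual_query(ROWS, MAPPING, "livesIn")))
```

The trade-off this sketch hints at: a materialized copy answers queries without touching the source but goes stale until reloaded, while the virtualized path always reflects the current rows at the cost of translating on every query.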
  48. Federated conceptual modelling Federated Governance Federated Modelling Team Self-serve platform

    Schema Constraints API Enterprise Ontology Data Contracts Data Data Product Defines Populate Links to Semantic interoperability Syntactic & tech. interoperability Uses Enforces Promotes Data Product Team
  49. From data products to EKG 67 Upper ontology Semantic Interop.

    Data products Enterprise Ontology Data products 1. enable access to physical data asset 2. aggregate technical metadata related to exposed data 3. create the semantic link between physical data asset and business concepts modeled in the enterprise ontology Data products are a pivotal element in the incremental and distributed construction of a knowledge graph Domain ontology Physical Data Subdomain ontologies Syntactic Interop.
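The incremental, distributed construction described here can be sketched as each data product publishing its own semantic links, which are then merged into one concept index. Everything in the example is hypothetical: the product names, field paths, concepts, and the helpers `build_concept_index` and `products_for`.

```python
from collections import defaultdict

# Hypothetical semantic links published by each data product:
# physical field -> business concept in the enterprise ontology.
DATA_PRODUCTS = {
    "active-customers": {
        "crm.customer.id": "Customer",
        "crm.customer.name": "Customer",
    },
    "orders": {
        "shop.order.id": "Order",
        "shop.order.customer_id": "Customer",
    },
}


def build_concept_index(products):
    """Merge per-product links into concept -> [(product, field)]."""
    index = defaultdict(list)
    for product, links in sorted(products.items()):
        for field, concept in sorted(links.items()):
            index[concept].append((product, field))
    return dict(index)


def products_for(index, concept):
    """List the data products that expose data about a concept."""
    return sorted({product for product, _ in index.get(concept, [])})


if __name__ == "__main__":
    index = build_concept_index(DATA_PRODUCTS)
    print(products_for(index, "Customer"))
```

Adding a new data product only adds entries to the index; existing products and their links are left untouched, which is what makes the construction incremental.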
  50. Conceptual modeling process 70 Ontology Data products Knowledge Graph Linking

    Knowledge Plane Concepts + Relationships Information Plane Data + Metadata Data Management Solution Deploy Deploy Iterate Business Cases/Questions Modeling Team Data Product Team Business Analysis Knowledge Modeling Data Product(s) Implementation 70
  51. DATA INFORMATION KNOWLEDGE Data Product Developer Platform Data Product Catalog

    Knowledge Product Developer Platform XOps Platform 72
  52. DPROD The Data Product (DPROD) specification is a profile of

    the Data Catalog (DCAT) Vocabulary, designed to describe Data Products. 73
  53. DPROD (Example) Distribution Dataset Data Product :distribution of :processed By

    :has output ports :managed asset Customer REST API Customer Schema :has input ports :instance of :instance of :instance of :conform to :instance of Product Person Thing Data Service Active Customers DP Instance Food Products DP Instance FoodDP ACustDP Business level Ontologies Metadata-level Ontologies Factual Metadata DPs Upper Ontology DOLCE, BFO, SUMO, gist, … Domain ontologies FIBO Schema.org, Good Rel, … Metadata ontologies DCAT DPROD PROV-O ODRL R2RML … 74
  54. Unstructured data product 77 Structured Metadata Unstructured data /getText /searchText

    EXTRACT Structured Metadata Tf–idf index Embeddings Semantic Links 77
  55. Semantic linking GraphGeeks Talk Ep8 How To Create Knowledge Graphs

    from Unstructured Data (Paco Nathan) Going Meta S02E04 Ontology-driven end-to-end GraphRAG with the GraphRAG Python Package (Jesus Barrasa) 79
  56. Graph RAG GenAI Application Knowledge Graph Embeddings Question Response Graph

    Retrieval Relevant Context Question & Context Validate & Complete Response LLM 3X Improvement 4X Improvement No KG With KG - 1 With KG - 2 16,7% 54,2% 72,6%
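The retrieval half of the Graph RAG flow (question, graph retrieval, relevant context, question plus context) can be sketched without calling any model. The example below is a simplification under stated assumptions: concepts are matched by plain keyword lookup rather than embeddings, the toy graph and the functions `retrieve_context` and `build_prompt` are hypothetical, and the LLM call and the validation step are left out.

```python
# Toy knowledge graph; concept and predicate names are illustrative only.
KG = [
    ("Customer", "places", "Order"),
    ("Order", "contains", "Product"),
    ("Customer", "opens", "Ticket"),
    ("Ticket", "concerns", "Product"),
]


def retrieve_context(kg, question, hops=1):
    """Select triples touching concepts mentioned in the question."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    frontier = {n for s, _, o in kg for n in (s, o) if n.lower() in words}
    selected = []
    for _ in range(hops):
        reached = set()
        for triple in kg:
            subject, _predicate, obj = triple
            if (subject in frontier or obj in frontier) and triple not in selected:
                selected.append(triple)
                reached.update((subject, obj))
        frontier |= reached  # widen the neighbourhood for the next hop
    return selected


def build_prompt(question, context):
    """Combine the question with the retrieved facts into one prompt."""
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in context)
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {question}"


if __name__ == "__main__":
    question = "Which product does a ticket concern?"
    print(build_prompt(question, retrieve_context(KG, question)))
```

Grounding the prompt in explicit facts from the graph is also what makes the answer traceable: the triples passed as context can be shown alongside the response.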
  57. Adhocracy = Agentic mess Customers have opened N tickets this

    week Customers have purchased N items this week We have shipped N items to customers this week When they talk about customers, are they referring to exactly the same thing? Siloed ad-hoc context RAG 82
  58. IA SHARED DATA MESH KNOWLEDGE MESH Shared context = Agentic

    mesh Customers have opened N tickets this week Customers have purchased N items this week We have shipped N items to customers this week AI 83
  59. Meshing is an option, productizing is a must Data Products

    Knowledge Products AI Products VALUE Decentralization 84
  60. Metadata activation and automation 85 “Data Fabric Must Have the

    Ability to Create a Knowledge Graph That Can Operationalize the Data Fabric Design” Knowledge Graph Gen AI Provide context Activate Metadata • Metadata Hydration • Knowledge Engineering • Intelligent Automation • Chat with data • Domain-specific AI Applications Man in the loop (Hybrid Intelligence)
  61. Information Architecture Flywheel 86 INTELLIGENCE Generate insight and drive actions

    KNOWLEDGE Extend the enterprise ontology INFORMATION Define data contracts DATA Implement data products Use Case gets context from assists with modeling answers questions start here… …and iterate