Knowledge-Driven Data Management

This slide deck, used for a talk at Big Data London 2024, explores the importance of building modular, distributed, and sustainable socio-technological architectures in data management.

In an era where data drives innovation, it's not enough to simply manage data; organizations must also manage the domain-specific knowledge necessary to interpret, use, and integrate data effectively in support of business strategies. The presentation delves into the integration of advanced knowledge frameworks into data management practices, demonstrating how organizations can transform raw data into actionable insights with improved speed and accuracy.

Key topics covered include the role of artificial intelligence and machine learning in enhancing data processing, the application of semantic technologies to improve data interoperability, and the use of knowledge graphs to create a more connected and intelligent data ecosystem. By embracing a knowledge-driven approach to data management, organizations can unlock the full potential of their data, driving innovation and achieving strategic goals in today’s fast-paced, volatile world.

📽️ The recording of the talk is available here: https://www.youtube.com/watch?v=ethMBybkkCI

Andrea Gioia

September 12, 2024

Transcript

  1. Without knowledge, data is just a liability… so we need to start treating knowledge as a first-class citizen in our information architectures… NOW
  2. From data to knowledge: It's all about context. Data, Knowledge, Intelligence, Potential Value. 28 (Collect). Information: C°, London, 18/9/2024 (Comprehend). City, Product, Country, Weather Report, Weather Condition (Understand). "Let's do a promotion on ice creams" (Act).
  3. What is knowledge anyway? Semantic triangle: String, Thing, Concept (Evokes, Refers to, Stands for). Knowledge refers to the information, understanding, and skills that people acquire through experience, education, or reasoning. Knowledge can be thought of as the internal representation of external realities, whether they are physical, conceptual, or abstract. Knowledge Model, Language System, External Reality; Sign, Unit of Meaning; "T R E E".
  4. Knowledge representation: Ontologies and Knowledge Graphs. Ontology, Data stores, Instances, Concepts, Mappings. Concepts: Mineral, Natural thing, Plant, Bush, Tree, Trunk. Relations: :type of, :has, :instance of.
  5. Data mesh: Lost in translation. Operations Domain, Marketing Domain, Sales Domain, Logistics Domain. #@% xyz $€£ a->b ???
  6. Data Mesh: Moving from interoperability to composability. Enterprise Ontology: Upper ontology, Domain ontology, Subdomain ontologies. Semantic Interop., Syntactic Interop. Data products, Physical Data. To be interoperable, data products must standardize • access to the physical data asset • access to the technical metadata related to the exposed data. To be composable, data products must • define the semantic link between the physical data asset and the business concepts modeled in a shared enterprise ontology.
  7. GenAI: Brace for impact. Domain Ignorance: LLMs know nothing about your specific domain. Accuracy: LLMs are often prone to hallucinations. Costs: Augmenting or fine-tuning an LLM with custom data can be expensive. Explainability: LLMs often operate as black boxes, making it difficult to understand or trace how they arrive at their conclusions. Market Volatility: It's difficult to build a future-proof technology foundation.
  8. GenAI: Knowledge Graph to the rescue. Gen AI (Statistical AI). Pros: • General Knowledge • Language Processing • Generalizability. Cons: • Implicit Knowledge • Hallucination • Indecisiveness • Black-box • Lacking Domain-specific & New Knowledge. Knowledge Graph (Symbolic AI). Pros: • Structured Knowledge • Accuracy • Decisiveness • Interoperability • Domain-specific Knowledge. Cons: • Incompleteness • Lacking Language Understanding • Unseen Facts. Combined: Neuro-Symbolic AI.
  9. GenAI: Slowing down to go faster. System 1 (Fast thinking): Fast but error-prone; Continuously scans the environment; Works via shortcuts and intuitions; Good at processing signals (perception); Abstracts via interpolation (continuous and geometry based). System 2 (Slow thinking): Slow but reliable; Used for specific problems, only if necessary; Works via planning; Good at processing symbols (reasoning); Abstracts via imagination (discrete and topology based). Visual Cortex, Prefrontal Cortex; Sensory inputs; Programming and execution of behaviour; Bottom-up Signals, Top-down Signals; Data-driven, Knowledge-driven.
  10. GenAI: Graph RAG. GenAI Application, Knowledge Graph, Embeddings, LLM. Question, Graph Retrieval, Relevant Context, Question & Context, Validate & Complete, Response. No KG: 16.7%; With KG - 1: 54.2% (3X Improvement); With KG - 2: 72.6% (4X Improvement).
  11. Paradigm shift: From data-first to knowledge-first. Treating KNOWLEDGE as a first-class citizen. STRINGS (Data), THINGS (Concepts). Business, IT. TOOLS, PEOPLE & PROCESSES. HOW??? SOCIO-TECHNICAL PROBLEM.
  12. People: Knowledge modeling. String, Thing, Concept (Evokes, Refers to, Stands for). Implicit Knowledge Model(s), Language System, External Reality; Sign, Unit of Meaning; "T R E E". Every person represents knowledge through their own mental model. Developing a shared formal model for representing domain-specific knowledge is a deliberate, purposeful, and effort-intensive human-centric process that cannot be fully automated. Explicit Knowledge Model, Knowledge Modeling.
  13. Modeling process: Responsibilities. Federated Governance, Federated Modelling Team, Self-serve platform, Data Product Team. Enterprise Ontology, Data Contracts, Data Product (Schema, Constraints, API, Data). Relations: Defines, Populate, Links to, Uses, Enforces, Promotes. Semantic interoperability; Syntactic & tech. interoperability.
  14. People: Data domain ownership. A federated modeling team composed of representatives from each business domain manages the definition of the enterprise ontology. This team can further organize by dividing responsibilities based on data domains. Business Domains: Marketing, Sales EMEA, Sales Nordics, Operations (each with a Business Domain Owner). Data Domains: Customer, Product, Order (each with a Data Domain Owner). Ontology, Factual Data, Federated Modeling Team.
  15. Modeling process: Iterative and value driven. Knowledge Plane: Concepts + Relationships. Information Plane: Data + Metadata. Ontology, Data products, Knowledge Graph, Linking, Data Management Solution. Business Cases/Questions; Modeling Team, Data Product Team; Business Analysis, Knowledge Modeling, Data Product(s) Implementation; Deploy, Deploy, Iterate.
  16. Knowledge mesh: Knowledge as a product. Enterprise Ontology: Upper Ontology, Domain Ontologies, Subdomain Ontologies. Ontology Lifecycle.
  17. Knowledge Mesh: Principles & Practices. Self-serve Platform, X as a Product, X Domain Ownership, Computational Governance. Knowledge Mesh, Data Mesh; Socio, Technical. X = Data, X = Data, X = Business, X = Knowledge.
  18. Knowledge Mesh: Technical architectures. Data Centric; LOAD, MAP; Knowledge Warehouse; Ontology-based data access; Materialized Knowledge Graph; Virtualized Knowledge Graph.
  19. Knowledge Mesh: Knowledge Management Platform. DATA, INFORMATION, KNOWLEDGE. Concepts: Distribution, Dataset, Data Product, Data Service, Product, Person, Thing. Relations: :distribution of, :processed By, :has output ports, :has input ports, :managed asset, :instance of, :conform to. Customer REST API, Customer Schema; Active Customers DP Instance (ACustDP), Food Products DP Instance (FoodDP). Business level Ontologies, Metadata-level Ontologies, Factual Metadata, DPs. Upper Ontology: DOLCE, BFO, SUMO, gist, … Domain ontologies: FIBO, Schema.org, Good Rel, … Metadata ontologies: DCAT, DPROD, PROV-O, ODRL, R2RML, … Search FOR data, COMPOSE data, Search IN data. Data Product Developer Platform, Data Product Catalog, Knowledge Base.
  20. Beyond slideware: DPDS, ODM and Blindata. Ensure governance, compliance and promote composability. Computational Governance Policies. Standardization of Transactions: Blueprints, Contracts, Metadata & Semantic linking. Self Serve Data platform: Deployment & Infrastructure Orchestration. Information Plane: Enterprise Data Product Catalog. Knowledge Plane: Ontologies & Metadata Management. Knowledge Base, Data Developer Platform, Data Product Catalog. Data Product Descriptor Specification, Open Data Mesh Platform.
  21. Beyond slideware: DPDS, ODM and Blindata. Knowledge management: Manage the definition of the enterprise ontology. Data Product Catalog: Collaborate, discover, and understand your data products. Control plane: Blueprints, DevOps, Computational Policies.
  22. TAKEAWAYS: If you really want to leverage GenAI and scale Data Mesh: Treat knowledge as a first-class citizen within your information architecture. Manage knowledge as a graph of linked products. Keep in mind that knowledge management is a socio-technical problem.
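
Slide 4 describes an ontology of concepts linked by ":type of" and ":has" relations, with instances attached through ":instance of". As a minimal sketch of that idea, the concepts can be stored as plain subject-predicate-object triples and queried by walking the ":type of" links. The exact edges of the diagram are not recoverable from the transcript, so the edges below are an assumed reading, and the instance name `oak_42` and the helper functions `ancestors` and `concepts_of` are hypothetical, introduced only for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Assumed reading of the slide 4 diagram; the real edges may differ.
ONTOLOGY: Set[Triple] = {
    ("Mineral", "type_of", "NaturalThing"),
    ("Plant", "type_of", "NaturalThing"),
    ("Bush", "type_of", "Plant"),
    ("Tree", "type_of", "Plant"),
    ("Tree", "has", "Trunk"),
}

# Hypothetical instance data, standing in for a record in a data store.
INSTANCES: Set[Triple] = {
    ("oak_42", "instance_of", "Tree"),
}

GRAPH: Set[Triple] = ONTOLOGY | INSTANCES


def ancestors(concept: str, triples: Set[Triple]) -> Set[str]:
    """Return every concept reachable from `concept` via type_of links."""
    found: Set[str] = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "type_of" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def concepts_of(instance: str, triples: Set[Triple]) -> Set[str]:
    """Return the direct concepts of an instance plus all their ancestors."""
    direct = {o for s, p, o in triples if s == instance and p == "instance_of"}
    result = set(direct)
    for concept in direct:
        result |= ancestors(concept, triples)
    return result


if __name__ == "__main__":
    print(sorted(concepts_of("oak_42", GRAPH)))
```

Under these assumptions, the instance is linked directly only to Tree, yet the query also returns Plant and NaturalThing. This is the kind of inference a shared ontology makes available to every data product that links its data to it. A production setup would more likely use an RDF store and a standard vocabulary than an in-memory set of tuples.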