From Monolith to Mesh: How to Model Data in the Age of Data Products and AI Agents

This slide deck accompanies a talk presented at Data Mesh Live, exploring a question that is becoming increasingly important in the age of data products and AI agents: how should we model data to make it truly reusable?

The primary purpose of a data product is not simply to expose data, but to make that data easy to discover, understand, consume, and compose with other products. For this reason, data modeling is not a technical afterthought. It is a fundamental aspect of product design.

Over the last decade, the widespread adoption of schema-on-read approaches has often pushed modeling practices into the background. Yet, as organizations move toward distributed architectures built around data products, and as AI agents increasingly become consumers of those products, the need for explicit and well-designed data models is becoming central once again.

Fortunately, we are not starting from scratch. Established modeling approaches such as dimensional modeling, Data Vault, and the Unified Star Schema provide valuable foundations and proven design principles. The challenge lies in adapting these techniques to a world that is no longer organized around monolithic data warehouses and centralized data lakes, but around modular, distributed, and independently managed data products.

In this presentation, we revisit the most widely adopted data modeling techniques, examining their strengths, limitations, and areas of applicability. We then explore how these approaches can evolve to support modern data product architectures, enabling the creation of datasets that are not only fit for purpose, but also easy to reuse, combine, and govern at scale.

Finally, we discuss why data modeling remains one of the most critical human responsibilities in the era of AI. While agents may increasingly automate the implementation of physical models, people must still provide the principles, constraints, and architectural guidance that ensure data products remain understandable, maintainable, and composable over time.

Andrea Gioia

June 13, 2026

Transcript

  1. THE SPEAKERS

    WHO WE ARE
    Andrea Gioia, CTO at Quantyca and co-founder of Blindata. 20+ years navigating the data universe, one project at a time. Author of Managing Data as a Product.
    Giorgio Tavecchia, Data Strategy Advisor at Quantyca. Builds distributed data platforms in the field, where modeling choices either hold or break.
  2. 2026: THE YEAR OF THE CONTEXT LAYER

    In theory, the context layer is a shared conceptual model for the entire organization, enabling both humans and AI agents to reason about reality in a consistent and coherent manner. In practice, today, it is usually a pile of wide, denormalized tables built ad hoc, with some docs on top to make text-to-SQL work.
    Diagram: data plane, knowledge plane, and intelligence plane, with ad-hoc data views over orders_wide, finance_denorm, and sales_flat.
  3. WAIT… WE'VE BEEN HERE BEFORE

    … data is:
    1. A by-product of projects
    2. Fragmented
    3. Difficult to integrate across platforms
    4. Mostly an IT problem
    5. Not governed
  4. CAN WE DO BETTER THIS TIME? OF COURSE, WE MUST!

    Shameless reference: last year (2025), I proposed bringing Data Mesh principles to knowledge management.
  5. BACK TO THE FUTURE

    The goal of the knowledge mesh is to build a corporate-wide, harmonized information architecture that maps meaning onto data, so agents can reason across the whole organization.
    Today's focus: the data plane and physical data modeling.
    Diagram: intelligence plane, knowledge plane, information plane, and data plane, connected by semantic linking and marked "2025" and "Today".
  6. CONCEPTUAL VS PHYSICAL MODELING: SHIFTING THE BURDEN

    Diagram: two stacks of knowledge plane, information plane, and data plane joined by semantic linking, contrasting Search FOR data, Search INTO data, and COMPOSE data.
    If the context sits too far from how data is actually collected and modeled, complexity doesn't vanish: it relocates into the mapping, and we're back to square one.
  7. THE OBVIOUS OBJECTION: CAN'T THE AGENTS JUST MODEL IT?

    Sure, they will. But in the age of AI, the scarce skill is no longer modeling data; it is knowing how data should be modeled. We are moving from data modeling to model governance.
    Diagram: from building the model to governing it. People explain and guide, review and steer.
  8. TIME TO MODEL AGAIN

    WE ALREADY HAVE A FOUNDATION TO BUILD ON
    Great analytical modeling techniques already exist: dimensional modeling, Data Vault, and the Unified Star Schema. The problem is that they were born for centralized, monolithic architectures. Can we still use them in a distributed, data-product world, and how can we adapt them?
  9. FOUNDATIONS

    A MODEL IS PURPOSEFUL SIMPLIFICATION
    "A simplified representation of a thing or phenomenon that intentionally emphasizes certain aspects while ignoring others. Abstraction with a specific use in mind."
    01 Quality = how well it solves the problem, not how faithfully it mirrors reality.
    02 Best model = the fewest details that still solve the problem.
    03 In data, the model is part of the solution: consumers need it to understand and use the data.
    Diagram: domain reality → purposeful abstraction → model (A, B, C), keeping what matters for the problem.
    When each product model is explicit and alive, the models can compose into a coherent Distributed Enterprise Model.
  10. THE CORE PROBLEM: DATA PRODUCTS BROKE THE OLD ASSUMPTIONS (?)

    BEFORE: MONOLITH DATA PLATFORM
    One team. One model. One truth. A central authority designs, owns, and enforces consistency across the full model lifecycle.
    UNIFIED DATA MODEL
    All consumers share the same data model, no exceptions. One size fits all, whether it fits or not.
  11. THE CORE PROBLEM: DATA PRODUCTS BROKE THE OLD ASSUMPTIONS (?)

    MESH
    N teams. N models. No central authority. Ownership is distributed by design, but interoperability between models is never guaranteed.
    SILOED DATA MODELS (SDM)
    Each data model is owned and defined by its data product and optimized for its own needs, with no guarantee of alignment with the other models on the platform. Consuming a single, self-contained data model is easy; joining data across multiple, independently owned models is not. How do these models interoperate?
  12. THE CORE PROBLEM: DATA PRODUCTS BROKE THE OLD ASSUMPTIONS (?)

    MESH
    N teams. N models. No central authority.
    COMPOSABLE DATA MODELS (CDM)
    Federated governance defines shared policies and guidelines to ensure models are composable and interoperable.
    THE QUESTION: How do distributed models stay coherent without a central team?
  13. THE DISTRIBUTED MODELING GAP: TWO CONSTRAINTS

    OWNERSHIP: Who owns the model? Each data product team (Sales DP, Mktg DP, Finance DP, Customer DP) owns its physical model, but shared concepts belong to nobody, or to everybody.
    EVOLUTION
    Traditional modeling: model once, ship, move on. Changes are expensive, and the model is frozen after delivery.
    Data Mesh modeling: ship v1, learn, iterate, forever. Every new consumer and every source change triggers a new version. The model is a living artifact.
    Timeline: v1.0 launch → v1.1 new consumer → v1.2 new consumer → v2.0 source changed. THE MODEL IS NEVER DONE.
    KEY SHIFT: Stop thinking of modeling as a phase. Start thinking of it as a product practice, as continuous as the data itself.
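The slide's timeline hints at a versioning policy: new consumers bump the minor version, while a source change bumps the major one. A minimal sketch of such a policy follows, assuming semantic-versioning-style rules (additive columns mean a minor bump, removed or retyped columns mean a major bump); the function `next_version` and the column names are hypothetical, not part of the talk.

```python
def next_version(version: str, old: dict[str, str], new: dict[str, str]) -> str:
    """Suggest the next MAJOR.MINOR version of a data product schema (hypothetical policy)."""
    major, minor = (int(part) for part in version.split("."))
    # Breaking: a column that consumers rely on disappears or changes type.
    breaking = any(col not in new or new[col] != typ for col, typ in old.items())
    # Additive: new columns appear; existing consumers are unaffected.
    additive = any(col not in old for col in new)
    if breaking:
        return f"{major + 1}.0"
    if additive:
        return f"{major}.{minor + 1}"
    return version


# Hypothetical schemas: column name -> type.
v1_0 = {"sku": "TEXT", "revenue": "REAL"}
v1_1 = {**v1_0, "discount": "REAL"}                              # a new consumer asks for discounts
v2_0 = {"sku": "TEXT", "revenue": "DECIMAL", "discount": "REAL"}  # the source changes a type
print(next_version("1.0", v1_0, v1_1))  # 1.1
print(next_version("1.1", v1_1, v2_0))  # 2.0
```

A check like this could run in a data product's CI so that every schema change is forced to declare its impact on consumers.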
  14. THE CORE PROBLEM: THE BIG PICTURE

    Raw data from one source system: syntactic transforms only, semantics unchanged.
    Connects multiple products through semantic normalization.
    Enriches upstream data with business logic; owns what it generates.
  15. THE METHOD: THREE LENSES FOR EVERY TECHNIQUE

    1 The technique: a focused look at how each modeling approach works, applied to our LuX retail example across facts, dimensions, and hubs.
    2 In the distributed world: how the technique maps onto domains and data products, and what it means for ownership boundaries in a mesh architecture.
    3 The limits and tradeoffs: where the approach starts to break, what tradeoffs it forces, and how its failure mode naturally points to the next technique.
  16. 1 DIMENSIONAL MODELING: FACTS AND DIMENSIONS

    Fact table at the center: measurable business events. Dimension tables around it: descriptive context. Optimized for analytical queries, with fast, simple JOINs.
    STRENGTHS IN CONTEXT: intuitive for business users; fast aggregations with OLAP tools; iterative, business-driven development.
    FACT_SALES (Sale_key, Date_key, Customer_key, Product_key, Store_key, Revenue, Units, Discount)
    DIM_STORE (Store_key, Name, Region, Channel)
    DIM_CUSTOMER (Customer_key, Name, Segment, Region)
    DIM_DATE (Date_key, Year, Month, Quarter)
    DIM_PRODUCT (Product_key, Category, Brand)
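To make the star shape concrete, here is a minimal sketch using Python's built-in `sqlite3`, reduced to FACT_SALES plus two of the slide's dimensions. The table and column names follow the slide; the sample rows (categories, brands, dates, amounts) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DIM_PRODUCT (Product_key INTEGER PRIMARY KEY, Category TEXT, Brand TEXT);
CREATE TABLE DIM_DATE (Date_key INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER, Quarter INTEGER);
CREATE TABLE FACT_SALES (
    Sale_key INTEGER PRIMARY KEY,
    Date_key INTEGER REFERENCES DIM_DATE(Date_key),
    Product_key INTEGER REFERENCES DIM_PRODUCT(Product_key),
    Revenue REAL, Units INTEGER, Discount REAL
);
-- Hypothetical sample rows.
INSERT INTO DIM_PRODUCT VALUES (1, 'Shoes', 'BrandX'), (2, 'Bags', 'BrandY');
INSERT INTO DIM_DATE VALUES (20260109, 2026, 1, 1), (20260410, 2026, 4, 2);
INSERT INTO FACT_SALES VALUES
    (1, 20260109, 1, 100.0, 1, 0.0),
    (2, 20260109, 2, 70.0, 1, 0.0),
    (3, 20260410, 1, 300.0, 3, 10.0);
""")

# Typical star query: one join per dimension, then aggregate the fact's measures.
rows = conn.execute("""
    SELECT d.Quarter, p.Category, SUM(f.Revenue) AS revenue, SUM(f.Units) AS units
    FROM FACT_SALES f
    JOIN DIM_DATE d ON f.Date_key = d.Date_key
    JOIN DIM_PRODUCT p ON f.Product_key = p.Product_key
    GROUP BY d.Quarter, p.Category
    ORDER BY d.Quarter, p.Category
""").fetchall()
print(rows)  # [(1, 'Bags', 70.0, 1), (1, 'Shoes', 100.0, 1), (2, 'Shoes', 300.0, 3)]
```

Every question follows the same pattern (join the fact to the dimensions it needs, then group), which is why the model reads so intuitively for business users.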
  17. 2 IN THE DISTRIBUTED WORLD: STAR SCHEMA & DATA PRODUCTS

    Facts and dimensions within a star schema map naturally onto data products.
    Source-aligned DPs: raw staging area.
    Aggregated DPs: FACT_SALES, FACT_SHIPMENT, DIM_PRODUCT, DIM_DATE.
    Consumer-aligned DPs: the QUARTER_SALES and YEAR_LOGISTIC datamarts, built as virtualized data products. Responsibility for each piece of data they use remains with the upstream data product team that provides it.
  18. 3 THE LIMITS: WHO OWNS THE PRODUCT DIMENSION?

    Without a central integration layer, conformed dimensions and facts must still be owned by someone, and that raises hard questions.
    Diagram: DIM_PRODUCT, a conformed dimension in the aggregated DPs with an unknown owner (???), feeds both the QUARTER_SALES datamart (Sales domain) and the YEAR_LOGISTIC datamart (Logistics domain).
    By the way, does it make sense to have a separate data product for each conformed dimension?
  19. 3 THE LIMITS: WHO OWNS THE PRODUCT DIMENSION? (CONTINUED)

    Diagram: FACT_SALES (Sale_key, …, Product_key, Revenue, Units, Discount) and FACT_SHIPMENT (Ship_key, …, Product_key, Volume, Weight) both reference DIM_PRODUCT (Product_key, Category, Brand), across the Sales and Logistics business domains.
    Duplicate data, split ownership: DIM_PRODUCT exists in two places with different attributes, different owners, and no guarantee of consistency.
    Ownership is arbitrary: assigning DIM_PRODUCT to Sales means Logistics manages data outside its domain, and vice versa.
    Conformed facts, same trap: FACT_SALES spans in-store and online sales. Shared metrics need cross-domain coordination, or they get duplicated too.
    Ownership fails, evolution stalls (?)
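"No guarantee of consistency" is easy to see once the two copies are compared side by side. The sketch below assumes each domain keeps its own DIM_PRODUCT as a dictionary keyed by Product_key; both copies, their values, and the helper `conformance_issues` are hypothetical.

```python
# Two hypothetical copies of DIM_PRODUCT, one maintained by each domain.
sales_dim_product = {
    "PR01": {"Category": "Shoes", "Brand": "BrandX"},
    "PR02": {"Category": "Bags", "Brand": "BrandY"},
}
logistics_dim_product = {
    "PR01": {"Category": "Footwear", "Brand": "BrandX", "Weight": 0.8},
    "PR03": {"Category": "Bags", "Brand": "BrandZ", "Weight": 1.2},
}


def conformance_issues(a: dict, b: dict) -> list[str]:
    """List the ways two copies of the same dimension disagree."""
    issues = []
    # Members known to only one copy.
    for key in sorted(a.keys() ^ b.keys()):
        issues.append(f"{key}: present in only one copy")
    # Shared members whose shared attributes carry different values.
    for key in sorted(a.keys() & b.keys()):
        for attr in sorted(a[key].keys() & b[key].keys()):
            if a[key][attr] != b[key][attr]:
                issues.append(f"{key}.{attr}: {a[key][attr]!r} vs {b[key][attr]!r}")
    return issues


for issue in conformance_issues(sales_dim_product, logistics_dim_product):
    print(issue)
```

Someone has to own the resolution of every line this prints, which is exactly the ownership question the slide leaves open.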
  20. 1 DATA VAULT: HUBS, LINKS AND SATELLITES

    Keys, relationships, and descriptive attributes are separated, but explicitly linked.
    01 HUBS: business keys, the identity layer.
    02 LINKS: relationships, the integration layer.
    03 SATELLITES: context and history, the attribute layer.
    HUB_CUSTOMER (business entity): CUSTOMER_HK hash key, CUSTOMER_ID natural key, LOAD_DATE timestamp, RECORD_SOURCE source system.
    HUB_PRODUCT (business entity): PRODUCT_HK hash key, SKU natural key, LOAD_DATE timestamp, RECORD_SOURCE source system.
    LINK_SALE (relationship): SALE_HK hash key, CUSTOMER_HK → HUB_CUSTOMER, PRODUCT_HK → HUB_PRODUCT, LOAD_DATE timestamp, RECORD_SOURCE source system.
    SAT_CUSTOMER: CUSTOMER_HK → HUB_CUSTOMER, LOAD_DATE (with the FK, forms the PK), LOAD_END_DATE (null = current), NAME, SEGMENT. Append-only: a new row on every change.
    SAT_PRODUCT_INFO: PRODUCT_HK → HUB_PRODUCT, LOAD_DATE (with the FK, forms the PK), NAME, BRAND, CATEGORY. Append-only: full history preserved.
    SAT_SALE_MEASURES: SALE_HK → LINK_SALE, LOAD_DATE (with the FK, forms the PK), QTY, AMOUNT, DISCOUNT. Append-only: full history preserved.
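The three layers can be sketched in plain Python to show how they load. This assumes a common hashing convention (trim and upper-case the business keys, join them with `||`, take an MD5 hex digest), which the talk does not specify; the in-memory dictionaries and the `load_*` helpers are hypothetical stand-ins for real tables. LOAD_END_DATE is not stored here: in an append-only satellite it can be derived as the next row's LOAD_DATE for the same key.

```python
import hashlib


def hash_key(*business_keys: str) -> str:
    # Assumed convention: trim and upper-case each key, join with '||', MD5 hex digest.
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


hub_customer: dict[str, dict] = {}  # CUSTOMER_HK -> hub row
hub_product: dict[str, dict] = {}   # PRODUCT_HK -> hub row
link_sale: dict[str, dict] = {}     # SALE_HK -> link row
sat_customer: list[dict] = []       # append-only satellite rows


def load_hub(hub: dict, nk_column: str, natural_key: str, source: str, load_date: str) -> str:
    hk = hash_key(natural_key)
    # A hub stores each business key once; reloading a known key changes nothing.
    hub.setdefault(hk, {"HK": hk, nk_column: natural_key, "LOAD_DATE": load_date, "RECORD_SOURCE": source})
    return hk


def load_link_sale(customer_id: str, sku: str, source: str, load_date: str) -> str:
    sale_hk = hash_key(customer_id, sku)  # link key derived from both parents' business keys
    link_sale.setdefault(sale_hk, {
        "SALE_HK": sale_hk, "CUSTOMER_HK": hash_key(customer_id), "PRODUCT_HK": hash_key(sku),
        "LOAD_DATE": load_date, "RECORD_SOURCE": source})
    return sale_hk


def load_sat_customer(customer_hk: str, attrs: dict, load_date: str) -> bool:
    history = [r for r in sat_customer if r["CUSTOMER_HK"] == customer_hk]
    latest = max(history, key=lambda r: r["LOAD_DATE"], default=None)
    if latest is not None and all(latest.get(k) == v for k, v in attrs.items()):
        return False  # nothing changed: no new row
    sat_customer.append({"CUSTOMER_HK": customer_hk, "LOAD_DATE": load_date, **attrs})
    return True


customer_hk = load_hub(hub_customer, "CUSTOMER_ID", "C-001", "CRM", "2026-01-09")
product_hk = load_hub(hub_product, "SKU", "PR01", "ERP", "2026-01-09")
sale_hk = load_link_sale("C-001", "PR01", "POS", "2026-01-09")
load_sat_customer(customer_hk, {"NAME": "Acme", "SEGMENT": "B2B"}, "2026-01-09")
load_sat_customer(customer_hk, {"NAME": "Acme", "SEGMENT": "B2B"}, "2026-01-10")  # unchanged: skipped
load_sat_customer(customer_hk, {"NAME": "Acme", "SEGMENT": "Enterprise"}, "2026-02-01")
print(len(sat_customer))  # 2
```

Because hash keys are computed from business keys rather than looked up from a sequence, any loader that follows the same convention produces the same key, which is what makes the later distribution of ownership possible.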
  21. 2 IN THE DISTRIBUTED WORLD: ONE HUB, TWO SATELLITES, TWO OWNERS

    Satellites belong to the data products, so the conformed-dimension ownership problem simply dissolves.
    STRENGTHS IN CONTEXT
    Adding a new source = adding Satellites, never changing Hubs.
    Full history preserved: every change is timestamped.
    Natural partitioning by business entity → clean ownership.
    HUB_PRODUCT (business entity): PRODUCT_HK hash key, SKU natural key, LOAD_DATE timestamp, RECORD_SOURCE source system.
    Sales business domain: SAT_PRODUCT_SALES (PRODUCT_HK → HUB_PRODUCT, LOAD_DATE, …), append-only, full history preserved.
    Logistics business domain: SAT_PRODUCT_LOGISTIC (PRODUCT_HK → HUB_PRODUCT, LOAD_DATE, …), append-only, full history preserved.
  22. 2 IN THE DISTRIBUTED WORLD: HUBS AND LINKS ARE PLATFORM CONCERNS

    Hubs and Links are managed by the X-Ops platform services:
    01 The DP sends natural keys to the platform. On each load, the data product forwards the natural keys of new objects or relationships (e.g., SKU, customer ID).
    02 The platform integrates the keys into a Hub or Link. If the keys are new, the platform registers them in the Hub or Link and generates a hash key; existing keys are simply looked up.
    03 The platform returns hash keys, and the DP populates its Satellite. The DP replaces natural keys with hash keys in its Satellite records; natural keys should still be retained alongside the hash keys for resilience.
    KEY REGISTRATION: idempotent; the same natural key always maps to the same hash key.
    DEDUPLICATION: the same customer from ERP and CRM becomes a single HUB_CUSTOMER entry.
    LINK RESOLUTION: relationships declared by any domain are resolved once by the platform.
    SATELLITE FREEDOM: a domain can add attributes at any time, with no coordination with other teams.
    Diagram: HUB_PRODUCT shared by SAT_PRODUCT_SALES and SAT_PRODUCT_LOGISTIC, both append-only with full history preserved.
  23. THE HARD RULE: RAW STAYS APART FROM BUSINESS LOGIC

    Source-aligned products expose the Raw Data Vault: Order Info, Shipment Info, and Product Info raw satellites. No business rules; append-only and stable; owned by source domain teams.
    Aggregated data products expose the Business Data Vault: Store KPI, e-comm KPI, and Sales Margin business satellites. Business rules applied; validated and enriched; owned by analytical teams.
    KPIs and transformations live in dedicated satellites, owned by specialized products, and are never tangled with the raw feed.
    KEY SHIFT: Clear ownership at every layer. Source teams evolve independently, with no central bottleneck.
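One way to keep raw and business layers apart in code is to make raw satellite rows immutable and to write derived values into a separate business satellite with the same key. The sketch below assumes a hypothetical rule (net amount = AMOUNT minus an absolute DISCOUNT, using the SAT_SALE_MEASURES attributes from the Data Vault slide); the classes `RawSaleMeasures` and `BizSaleNet` and the function `derive_net_amount` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawSaleMeasures:
    """Row of a raw satellite such as SAT_SALE_MEASURES: exactly what the source sent."""
    sale_hk: str
    load_date: str
    qty: int
    amount: float
    discount: float


@dataclass(frozen=True)
class BizSaleNet:
    """Row of a hypothetical business satellite, owned by an analytical data product."""
    sale_hk: str
    load_date: str
    net_amount: float
    rule: str  # which business rule produced the value


def derive_net_amount(raw_rows: list[RawSaleMeasures]) -> list[BizSaleNet]:
    # Hypothetical business rule: net = amount - discount (discount assumed to be absolute).
    # Raw rows are only read; frozen dataclasses make accidental edits raise an error.
    return [BizSaleNet(r.sale_hk, r.load_date, r.amount - r.discount, "net = amount - discount")
            for r in raw_rows]


raw = [RawSaleMeasures("hk-1", "2026-01-09", 1, 100.0, 10.0),
       RawSaleMeasures("hk-2", "2026-01-09", 40, 4000.0, 0.0)]
biz = derive_net_amount(raw)
print([b.net_amount for b in biz])  # [90.0, 4000.0]
```

If the rule changes, only the business satellite is rebuilt; the raw feed, and every other product that reads it, is untouched.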
  24. 3 THE LIMITS: AN UNDERCOVER GRAPH

    Hubs are nodes, Links are edges, Satellites are attributes. The physical and conceptual models are naturally aligned.
    Diagram: a conceptual model mapped by semantic linking onto a physical model of HUB, LINK, and SAT tables.
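The node/edge/attribute correspondence is mechanical enough to sketch. This assumes binary links and uses short hypothetical strings in place of real hash keys; the function `vault_to_graph` is illustrative, not part of any library.

```python
from collections import defaultdict

# Hypothetical vault contents, keyed by hash key (short strings stand in for real hashes).
hubs = {"c1": "HUB_CUSTOMER", "p1": "HUB_PRODUCT", "p2": "HUB_PRODUCT"}
links = [("LINK_SALE", "c1", "p1"), ("LINK_SALE", "c1", "p2")]
satellites = {"c1": {"NAME": "Acme"}, "p1": {"BRAND": "BrandX"}, "p2": {"BRAND": "BrandY"}}


def vault_to_graph(hubs, links, satellites):
    """Hubs become nodes, links become undirected edges, satellites become node attributes."""
    nodes = {hk: {"label": hub, **satellites.get(hk, {})} for hk, hub in hubs.items()}
    edges = defaultdict(list)
    for link, left, right in links:  # assumes binary links; n-ary links would need hyperedges
        edges[left].append((link, right))
        edges[right].append((link, left))
    return nodes, edges


nodes, edges = vault_to_graph(hubs, links, satellites)
# Traverse: which brands has customer c1 bought?
print(sorted(nodes[n]["BRAND"] for link, n in edges["c1"] if link == "LINK_SALE"))  # ['BrandX', 'BrandY']
```

Because no reshaping is needed, a conceptual model expressed as a graph can be linked to these tables almost one to one.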
  25. 3 THE LIMITS: GREAT TO INTEGRATE, HARD TO CONSUME

    The Data Vault is highly normalized: perfect for distributing ownership, but not easily consumable as-is. Someone has to reassemble it ("…how do I query all this?"), so we need a presentation layer.
    Not a problem for the AI agent. A problem for your wallet.
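Even the simple case of one hub and two satellites (as in the "one hub, two satellites, two owners" slide) needs a join per satellite plus a "latest version" filter per satellite. A minimal `sqlite3` sketch of such a presentation view follows; LIST_PRICE and WEIGHT_KG are hypothetical stand-ins for the satellite attributes the slides leave elided, and the rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HUB_PRODUCT (PRODUCT_HK TEXT PRIMARY KEY, SKU TEXT);
CREATE TABLE SAT_PRODUCT_SALES (PRODUCT_HK TEXT, LOAD_DATE TEXT, LIST_PRICE REAL,
                                PRIMARY KEY (PRODUCT_HK, LOAD_DATE));
CREATE TABLE SAT_PRODUCT_LOGISTIC (PRODUCT_HK TEXT, LOAD_DATE TEXT, WEIGHT_KG REAL,
                                   PRIMARY KEY (PRODUCT_HK, LOAD_DATE));
-- Hypothetical rows: the sales satellite holds two versions, the logistics one a single version.
INSERT INTO HUB_PRODUCT VALUES ('hk1', 'PR01');
INSERT INTO SAT_PRODUCT_SALES VALUES ('hk1', '2026-01-01', 100.0), ('hk1', '2026-03-01', 110.0);
INSERT INTO SAT_PRODUCT_LOGISTIC VALUES ('hk1', '2026-01-01', 0.8);

-- Presentation view: latest version of each satellite, joined back through the hub.
CREATE VIEW PRODUCT_CURRENT AS
SELECT h.SKU, s.LIST_PRICE, l.WEIGHT_KG
FROM HUB_PRODUCT h
LEFT JOIN SAT_PRODUCT_SALES s
  ON s.PRODUCT_HK = h.PRODUCT_HK
 AND s.LOAD_DATE = (SELECT MAX(s2.LOAD_DATE) FROM SAT_PRODUCT_SALES s2
                    WHERE s2.PRODUCT_HK = h.PRODUCT_HK)
LEFT JOIN SAT_PRODUCT_LOGISTIC l
  ON l.PRODUCT_HK = h.PRODUCT_HK
 AND l.LOAD_DATE = (SELECT MAX(l2.LOAD_DATE) FROM SAT_PRODUCT_LOGISTIC l2
                    WHERE l2.PRODUCT_HK = h.PRODUCT_HK);
""")
print(conn.execute("SELECT * FROM PRODUCT_CURRENT").fetchall())  # [('PR01', 110.0, 0.8)]
```

Each additional satellite adds another join and another subquery, which is where the cost the slide alludes to comes from.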
  26. 1 UNIFIED STAR SCHEMA: JOINING FACTS INFLATES THE NUMBERS

    Combine sales and shipments on one star, and the fan and chasm traps multiply your measures.
    FAN TRAP: occurs when merging entities connected by a one-to-many relationship, where both entities contain measures.
    CHASM TRAP: two facts share a dimension at different grains, and their JOIN generates an uncontrolled Cartesian product.
    KEY RISK: in a mesh, FACT_SALES and FACT_INVENTORY are owned by separate teams, so no central guardian catches the grain mismatch before it silently corrupts the analysis.

    Sales
    SalesID  Client  SalesDate  ProductID  SalesQTY  SalesAmount
    1        A       09-Jan     PR01       1         100
    2        A       09-Jan     PR02       1         70
    3        B       09-Jan     PR02       2         140
    4        B       09-Jan     PR03       1         300
    5        B       09-Jan     PR01       40        4000

    Shipments
    ShipmentID  SalesID  ShipDate  ProductID  ShipQTY  ShipAmount
    1           1        09-Jan    PR01       1        100
    2           2        09-Jan    PR02       1        70
    3           3        09-Jan    PR02       2        140
    4           4        09-Jan    PR03       1        300
    5           5        09-Jan    PR01       10       1000
    6           5        09-Jan    PR01       30       3000
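The slide's own tables are enough to reproduce the fan trap. Sale 5 is shipped in two parcels, so joining sales to shipments on SalesID repeats sale 5's measures, and the total sold quantity jumps from 45 to 85. This is a minimal `sqlite3` sketch; the table names FACT_SALES and FACT_SHIPMENT are assumed, and the rows are copied from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FACT_SALES (SalesID INTEGER, Client TEXT, SalesDate TEXT, ProductID TEXT,
                         SalesQTY INTEGER, SalesAmount INTEGER);
CREATE TABLE FACT_SHIPMENT (ShipmentID INTEGER, SalesID INTEGER, ShipDate TEXT, ProductID TEXT,
                            ShipQTY INTEGER, ShipAmount INTEGER);
INSERT INTO FACT_SALES VALUES
  (1,'A','09-Jan','PR01',1,100), (2,'A','09-Jan','PR02',1,70), (3,'B','09-Jan','PR02',2,140),
  (4,'B','09-Jan','PR03',1,300), (5,'B','09-Jan','PR01',40,4000);
INSERT INTO FACT_SHIPMENT VALUES
  (1,1,'09-Jan','PR01',1,100), (2,2,'09-Jan','PR02',1,70), (3,3,'09-Jan','PR02',2,140),
  (4,4,'09-Jan','PR03',1,300), (5,5,'09-Jan','PR01',10,1000), (6,5,'09-Jan','PR01',30,3000);
""")

# Measures read from their own fact table: correct totals.
correct = conn.execute("SELECT SUM(SalesQTY), SUM(SalesAmount) FROM FACT_SALES").fetchone()

# Same measures after a one-to-many join: sale 5 is counted once per shipment.
joined = conn.execute("""
    SELECT SUM(s.SalesQTY), SUM(s.SalesAmount), SUM(sh.ShipQTY)
    FROM FACT_SALES s JOIN FACT_SHIPMENT sh ON sh.SalesID = s.SalesID
""").fetchone()

print(correct)  # (45, 4610)
print(joined)   # (85, 8610, 45)
```

Note that the query raises no error: the shipment totals are still right, and only the sales measures are silently inflated.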
  27. 2 IN THE DISTRIBUTED WORLD: THE BRIDGE TO THE SOLUTION

    The Unified Star Schema offers a modeling approach for data in the presentation layer, designed to streamline the structure and improve the querying experience by eliminating errors caused by join traps.

    PUPPINI BRIDGE
    Stage     ProductID  SalesID  ShipmentID  SalesQTY  SalesAmount  ShipQTY  ShipAmount
    Product   PR01
    Product   PR02
    Product   PR03
    Sales     PR01       1                    1         100
    Sales     PR02       2                    1         70
    Sales     PR02       3                    2         140
    Sales     PR03       4                    1         300
    Sales     PR01       5                    40        4000
    Shipment  PR01                1                                  1        100
    Shipment  PR02                2                                  1        70
    Shipment  PR02                3                                  2        140
    Shipment  PR03                4                                  1        300
    Shipment  PR01                5                                  10       1000
    Shipment  PR01                6                                  30       3000

    01 One central fact table consolidates all correlated sources; there is no separate fact table per use case.
    02 UNION, not JOIN, at load time: this preserves the original granularity and eliminates fan and chasm traps.
    03 Sparse by design: the bridge stores only keys and measures; descriptive attributes stay in their own tables.
    04 Hubs, links, and satellites already carry clean surrogate keys, so the platform generates a Puppini Bridge on demand when a consumer picks products from the marketplace (e.g., Sales DP ✓, Shipments DP ✓, Client DP not selected).
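The bridge above can be rebuilt with a plain `UNION ALL`, one `SELECT` per stage, all projected onto the same sparse column list. In this minimal `sqlite3` sketch the stage definitions and `build_bridge` are hypothetical, and the `selected` list stands in for the consumer's marketplace choice from point 04. Summed per product, sales and shipments now agree (PR01: 41 sold, 41 shipped) instead of being inflated by a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (ProductID TEXT PRIMARY KEY);
CREATE TABLE SALES (SalesID INTEGER, ProductID TEXT, SalesQTY INTEGER, SalesAmount INTEGER);
CREATE TABLE SHIPMENT (ShipmentID INTEGER, SalesID INTEGER, ProductID TEXT, ShipQTY INTEGER, ShipAmount INTEGER);
INSERT INTO PRODUCT VALUES ('PR01'), ('PR02'), ('PR03');
INSERT INTO SALES VALUES (1,'PR01',1,100), (2,'PR02',1,70), (3,'PR02',2,140), (4,'PR03',1,300), (5,'PR01',40,4000);
INSERT INTO SHIPMENT VALUES (1,1,'PR01',1,100), (2,2,'PR02',1,70), (3,3,'PR02',2,140),
                            (4,4,'PR03',1,300), (5,5,'PR01',10,1000), (6,5,'PR01',30,3000);
""")

# One SELECT per stage; columns a stage does not own are NULL (sparse by design).
STAGES = {
    "Product":  "SELECT 'Product', ProductID, NULL, NULL, NULL, NULL, NULL, NULL FROM PRODUCT",
    "Sales":    "SELECT 'Sales', ProductID, SalesID, NULL, SalesQTY, SalesAmount, NULL, NULL FROM SALES",
    "Shipment": "SELECT 'Shipment', ProductID, NULL, ShipmentID, NULL, NULL, ShipQTY, ShipAmount FROM SHIPMENT",
}


def build_bridge(conn: sqlite3.Connection, selected: list[str]) -> None:
    """(Re)create the BRIDGE view from the selected stages, combined with UNION ALL, never JOIN."""
    body = "\nUNION ALL\n".join(STAGES[s] for s in selected)
    conn.execute("DROP VIEW IF EXISTS BRIDGE")
    conn.execute("CREATE VIEW BRIDGE (Stage, ProductID, SalesID, ShipmentID, "
                 "SalesQTY, SalesAmount, ShipQTY, ShipAmount) AS " + body)


build_bridge(conn, ["Product", "Sales", "Shipment"])
totals = conn.execute("""
    SELECT ProductID, SUM(SalesQTY), SUM(ShipQTY)
    FROM BRIDGE GROUP BY ProductID ORDER BY ProductID
""").fetchall()
print(totals)  # [('PR01', 41, 41), ('PR02', 3, 3), ('PR03', 1, 1)]
```

Each stage keeps its own grain (3 + 5 + 6 = 14 rows), so no measure is ever repeated; the cost is a wide, mostly empty table.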
  28. 3 THE LIMITS: A PRESENTATION LAYER, NOTHING MORE

    01 High cardinality. The bridge table spans every hub entity involved in a business process, producing a very sparse structure. Row counts can explode when many hubs participate. However, because the bridge is partitionable by business key and fully indexable, query engines handle it efficiently, keeping it manageable even at enterprise scale.
    02 Needs a good Data Vault underneath. The USS does not generate its own keys: it inherits them. Clean, collision-free hash keys, computed consistently across all satellites and links, are the strict prerequisite that makes the bridge automatable. Dirty or inconsistent key logic in the vault propagates directly into unresolvable joins at the USS layer.
    03 It's presentation. The Unified Star Schema is optimized for analytical consumption: fast, self-service, and BI-friendly. It does not absorb raw events, manage historization, or enforce data contracts. The integrated model, whether Data Vault or another approach, must exist beneath it and remain the source of truth.
  29. TAKE-AWAY: EVERY TECHNIQUE IN ITS PLACE

    Diagram, by layer, across the Sales and Marketing domains:
    XOps platform: Customer HUB, Sales Order HUB, Sales LNK, Puppini Bridge.
    Source-aligned (Raw Data Vault): SAT SAP CUSTOMER, SAT ADOBE CUSTOMER, SAT SF CUSTOMER.
    Aggregated (Business Data Vault): SAT SALES CUSTOMER, SAT MARKETING CUSTOMER.
    Consumer-aligned (Star/USS): B2B NEW CUSTOMERS 2026.
    No single technique is enough. They combine.
  30. TAKE-AWAYS: PHYSICAL DATA MODELING STILL MATTERS

    Agents will do the modeling, but you still need to tell them how to model.
    When building AI agents…
    01 Context is king: don't let your agents guess.
    02 Avoid semantic silos: don't trap your agent in a box.
    03 Align the conceptual and physical models: don't let your agent get lost.
    When designing the physical model…
    04 Don't reinvent the wheel.
    05 Use the right technique in the right place.
    06 The modeling technique influences data product boundaries.