Intelligent Data Platform : Powered by Semantic Layers, MCP & Agents - FOST API Days 2026 Singapore

A talk on the self-service analytics dream and the challenges that arise when it is attempted by connecting agents directly to a data layer. It introduces a unified semantic layer in between, with a context-assisted pattern as well as a fully mediated pattern, to make agents trustworthy. A demo shows how a data-layer-only agent and a data + semantic layer agent behave when answering business questions.

Zabeer Farook

April 14, 2026

Transcript

  1. Intelligent Data Platform: Powered by Semantic Layers, MCP & Agents

    How a strong semantic layer makes AI agents trustworthy
  2. HELLO! I'm Zabeer Farook

    Principal Engineering Architect @ Credit Agricole CIB
    - Passionate about data architecture, including stream data processing and event-driven architecture, as well as cloud and DevOps.
    - Love travelling and exploring places.
  3. Today's Journey

    01 The Self-Service Analytics Dream
    02 The Problem: Direct Agent Access Fails
    03 The Missing Layer: Semantic Layer
    04 The Semantic Layer as an API
    05 MCP: The Standard That Connects It All
    06 Lakehouse Architecture - Recap
    07 Reference Architecture & Demo
    08 Key Takeaways, Q & A
  4. The Self-Service Analytics Dream

    The promise that has always been just out of reach.

    BI TOOLS ERA (2000s): Drag-and-drop dashboards. Faster than SQL. But still required analysts to build reports. Business users still couldn't ask their own questions.
    NATURAL LANGUAGE ERA (2015 - 2022): NLP-powered BI. Ask questions in plain English. Close, but fragile. Narrow intent recognition. Fell apart on anything complex or ambiguous.
    AI AGENTS ERA (now): Agents can reason, plan, and execute across data sources. For the first time, the gap between a business question and a trusted answer feels genuinely closable.

    BUT: Connecting agents directly to your data layer doesn't work the way we expect.
  5. The Rise of Data & Analytic Agents

    AI agents are transforming data & analytics:
    - Natural language queries
    - Autonomous data analysis
    - Self-service analytics
    - Conversational interfaces

    THE OPPORTUNITY - business users ask directly:
    - "Show me revenue trends by region"
    - "Why did sales drop last month?"
    - "Which customers are at risk of churn?"
    - "What's our order completion rate?"

    THE CHALLENGE - building agents that are:
    ✅ Accurate (not hallucinating answers)
    ✅ Consistent (same metric = same answer)
    ✅ Explainable (shows logic & lineage)
    ✅ Trustworthy (transparent)
  6. AI Data Agents: Dream vs Reality

    USER: "Show revenue by region for premium customers"

    THE DREAM - AGENT (understands instantly): "Here is the net revenue for North America & EMEA"
    ✅ Response within seconds
    ✅ Correct definition (net vs gross)
    ✅ Explainable
    ✅ Trustworthy

    THE REALITY - AGENT (clarification needed):
    • Sum total price or net?
    • Which column is revenue?
    • How to identify premium customers?
    ❌ Goes back and forth
    ❌ 5 different revenue formulas
    ❌ Black-box SQL
    ❌ Users don't trust it

    THE GAP: Agents need more than table schemas. They need BUSINESS CONTEXT.
  7. Why It Fails: The Hallucination Problem

    Query: "Show Q1 revenue for Premium Customers"

    ✗ AMBIGUITY IN FORMULA: totalPrice, or extendedPrice, or extendedPrice × (1 − discount)?
    ✗ AMBIGUOUS BUSINESS TERM: Might guess 'PREMIUM'. Is it ('AUTOMOBILE', 'BUILDING')?
    ✗ WRONG JOIN PATH: Used orders. Should use lineitem for the revenue calculation.
    ★ CONTEXTUAL BLINDNESS: Returns an inflated amount with full confidence. No signal that it's wrong.

    What could the agent generate?

        SELECT SUM(o.totalprice) AS revenue
        FROM orders o
        JOIN customer c ON o.custkey = c.custkey
        WHERE c.mktsegment IN ('AUTOMOBILE', 'BUILDING')
          AND o.orderdate BETWEEN '2026-01-01' AND '2026-03-31'
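To make the failure mode on this slide concrete, here is a minimal sketch that runs the agent's guessed query next to a governed one on a tiny in-memory SQLite database. Everything about the data is an assumption for illustration: the TPC-H-like column names (`acctbal`, `extendedprice`), the made-up rows, and the idea that `totalprice` carries charges beyond net line revenue. The governed query borrows the definitions stated later in the deck (Revenue = extendedPrice * (1 - discount), Premium Customer = AccountBalance > 8000).

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory, TPC-H-like schema with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (custkey INTEGER PRIMARY KEY, mktsegment TEXT, acctbal REAL);
        CREATE TABLE orders (orderkey INTEGER PRIMARY KEY, custkey INTEGER,
                             totalprice REAL, orderdate TEXT);
        CREATE TABLE lineitem (orderkey INTEGER, extendedprice REAL, discount REAL);

        -- Hypothetical rows: customer 1 has a high balance, customer 2 does not.
        INSERT INTO customer VALUES (1, 'AUTOMOBILE', 8500.0), (2, 'BUILDING', 1200.0);
        -- totalprice is assumed to include charges beyond net line revenue.
        INSERT INTO orders VALUES (123, 1, 270.0, '2026-03-20'), (124, 2, 110.0, '2026-02-10');
        INSERT INTO lineitem VALUES (123, 200.0, 0.0), (123, 50.0, 0.2), (124, 100.0, 0.0);
        """
    )
    return conn


# What a schema-only agent might guess: wrong column, guessed segment filter.
NAIVE_SQL = """
SELECT SUM(o.totalprice)
FROM orders o JOIN customer c ON o.custkey = c.custkey
WHERE c.mktsegment IN ('AUTOMOBILE', 'BUILDING')
  AND o.orderdate BETWEEN '2026-01-01' AND '2026-03-31'
"""

# The same question using the governed revenue formula and premium definition.
GOVERNED_SQL = """
SELECT SUM(l.extendedprice * (1 - l.discount))
FROM lineitem l
JOIN orders o ON l.orderkey = o.orderkey
JOIN customer c ON o.custkey = c.custkey
WHERE c.acctbal > 8000
  AND o.orderdate BETWEEN '2026-01-01' AND '2026-03-31'
"""


def q1_revenue(conn: sqlite3.Connection, sql: str) -> float:
    """Run one of the queries above and return the single aggregated value."""
    (value,) = conn.execute(sql).fetchone()
    return round(value, 2)


if __name__ == "__main__":
    conn = build_demo_db()
    print("naive   :", q1_revenue(conn, NAIVE_SQL))
    print("governed:", q1_revenue(conn, GOVERNED_SQL))
```

Both queries run without error, which is the point: on these toy rows the naive one returns the larger figure, and nothing in its output signals that the formula or the filter was a guess.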
  8. The Familiar Fix

    An army of agents?

    THE ANSWER: A unified semantic layer to provide the BUSINESS CONTEXT.

    [Cartoon labels: LEGACY DB SCHEMA · BUSINESS CONTEXT. Image credits: © marketoonist.com]
  9. What Is a Semantic Layer?

    Not a single tool - a unified framework of components.

    "A standardised framework that organises and abstracts organisational data, giving machines a way to understand data in context, not just access it."

    - Metadata: Context about data (lineage, quality, ownership, freshness). Enables agents to choose the right data source.
    - Taxonomy / Business Glossary: Shared vocabulary, term definitions, naming standards. Stops agents guessing terminology.
    - Ontology: Formal model of how concepts relate. Enables agents to understand the rules of your domain.
    - Knowledge Graph: Entities & relationships applied to real data at scale. Enables agents to navigate real-world data relationships.
    - Metrics / KPIs: KPI formulas, filters, join logic (executable actions). Enables agents to get accurate metric values.

    Your platform may implement one or all of these components. Most organisations have fragments, but these are not always exposed to agents.
  10. One order. Five lenses.

    Scenario: Bob places Order #123 for $240 on March 20, 2026. His account balance = 8500.

    01 Metadata (structure & lineage): orders.totalPrice · Type: decimal · Currency: USD · Grain: order · Refresh: daily · Owner: Finance
    02 Glossary / Taxonomy (governed definition): "Premium Customer" = high-value Customer · "Customer" = Account with at least 1 order → Bob qualifies as a Customer
    03 Ontology (concept graph): Customer -places-> Order · Order -contains-> LineItem · Premium Customer -is-a-> Customer · Customer -has-> AccountBalance · Premium Customer = AccountBalance > 8000
    04 Knowledge Graph (grounded instance): Bob placed Order #123 · $240 · Mar 20, 2026 → Bob is a Premium Customer
    05 Metrics / KPIs (trusted numbers): Revenue = extendedPrice * (1 - discount) · This order contributes to: Revenue +$240, Gross Margin ~38%, Retention rate ↑

    A unified semantic layer is the trust layer for every consumer, human or agent:
    - Single source of truth: Humans, agents & BI tools consume identical definitions.
    - Deterministic for agents: No hallucinated metrics. Every answer traces to a governed definition.
    - Breaks data silos: One semantic layer across teams, tools and consumer types.
    - Full auditability: Every agent output is traceable to a metric, grain and refresh cycle.
    - Governed access control: Agents only access what the semantic layer explicitly exposes.
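The definitions on this slide are small enough to write down as executable rules. The following is a minimal sketch, not the speaker's implementation: the class names, the `GLOSSARY` and `METRICS` registries, and the single undiscounted line item standing in for Order #123 are all hypothetical; only the three rules themselves (Customer, Premium Customer, Revenue) come from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Customer:
    name: str
    account_balance: float
    order_count: int


@dataclass(frozen=True)
class LineItem:
    extended_price: float
    discount: float  # fraction between 0 and 1


def is_customer(c: Customer) -> bool:
    # Slide definition: "Customer" = Account with at least 1 order.
    return c.order_count >= 1


def is_premium_customer(c: Customer) -> bool:
    # Slide definition: Premium Customer = AccountBalance > 8000 (and is-a Customer).
    return is_customer(c) and c.account_balance > 8000


def revenue(items: List[LineItem]) -> float:
    # Slide definition: Revenue = extendedPrice * (1 - discount).
    return sum(i.extended_price * (1 - i.discount) for i in items)


# Hypothetical registries: every consumer resolves a term or metric by name,
# so humans, BI tools and agents all hit the same definition.
GLOSSARY: Dict[str, Callable[[Customer], bool]] = {
    "Customer": is_customer,
    "Premium Customer": is_premium_customer,
}
METRICS: Dict[str, Callable[[List[LineItem]], float]] = {"Revenue": revenue}

# Scenario from the slide; the line-item breakdown of Order #123 is assumed.
bob = Customer(name="Bob", account_balance=8500, order_count=1)
order_123 = [LineItem(extended_price=240.0, discount=0.0)]

if __name__ == "__main__":
    print("Premium?", GLOSSARY["Premium Customer"](bob))
    print("Revenue:", METRICS["Revenue"](order_123))
```

Keeping the rules behind name lookups is one way to get the "single source of truth" property: changing the premium threshold in one place changes it for every caller.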
  11. Why Now? The Invisible Layer Becomes Visible

    The semantic layer was never missing - it was built for BI tools, not for machines.

    WHERE IT LIVED? (Served one consumer · Tightly coupled · Invisible to outside)
    ✗ Embedded in BI tools: LookML · Power BI measures · Tableau calcs · MSBI metrics
    ✗ In analyst heads: Team wikis · onboarding docs · Slack threads · tribal knowledge
    ✗ Data dictionaries: Spreadsheets · Confluence pages · scattered & unstructured

    WHAT AGENTS COULDN'T REACH? (The knowledge was always there, but in the wrong place)
    ○ Agents need context: Can't reach inside a BI tool
    ○ Tight coupling breaks: Works for dashboards, but fails for agents
    ○ Knowledge exists: But trapped & non-callable

    THE ARCHITECTURAL SHIFT (Not a replacement for BI - a foundation for all consumers)
    ✅ Detach from BI tools: Semantic definition no longer owned by a single tool
    ✅ Unify as a standalone layer: One platform-agnostic semantic layer for all consumers
    ✅ Expose to all consumers: Serves BI, agents, search & any future consumers equally
  12. The Semantic Layer as API

    Contract between your organisational knowledge and your agents.

    AI Agent (Claude / GPT / any LLM)
    UNIFIED SEMANTIC LAYER - EXPOSED AS API: getLineage() · lookupTerm() · listComponents() · traverseGraph() · queryMetric() · getOntology()

    PATTERN 1 - CONTEXT ASSISTED (returns context). When a query needs some detailed context:
    - Returns definitions, relationships, join paths, rules
    - Agent reasons from semantic context
    - Writes informed SQL, not blind guesses

    PATTERN 2 - FULLY MEDIATED (returns answer). When a complete executable definition exists:
    - Semantic layer executes against the data source
    - Returns trusted answer directly
    - Agent never writes SQL
    - Zero hallucination against covered definitions
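The difference between the two patterns can be sketched in a few lines. This is an illustrative toy, not the API from the talk: the `SemanticLayer` class, its in-memory rows, the context fields, and the `agent_sql_from_context` helper are all assumptions; the method names simply mirror `lookupTerm()` and `queryMetric()` from the slide in Python naming style.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def _revenue(rows: List[Row]) -> float:
    # Governed formula from the deck: extendedPrice * (1 - discount).
    return sum(r["extendedprice"] * (1 - r["discount"]) for r in rows)


class SemanticLayer:
    """Toy semantic layer over in-memory rows (hypothetical, for illustration)."""

    # Context handed to agents in pattern 1.
    _CONTEXT: Dict[str, Dict[str, str]] = {
        "revenue": {
            "expression": "extendedprice * (1 - discount)",
            "table": "lineitem",
            "description": "extended price net of discount",
        }
    }
    # Executable definitions used in pattern 2.
    _EXECUTORS: Dict[str, Callable[[List[Row]], float]] = {"revenue": _revenue}

    def __init__(self, lineitems: List[Row]) -> None:
        self._lineitems = lineitems

    def lookup_term(self, name: str) -> Dict[str, str]:
        """Pattern 1 (context assisted): return the definition, not an answer."""
        return dict(self._CONTEXT[name])

    def query_metric(self, name: str) -> float:
        """Pattern 2 (fully mediated): execute the definition and return the answer."""
        if name not in self._EXECUTORS:
            # Agents only reach what the layer explicitly exposes.
            raise KeyError(f"metric not exposed: {name}")
        return self._EXECUTORS[name](self._lineitems)


def agent_sql_from_context(context: Dict[str, str]) -> str:
    """Stand-in for the agent's reasoning step in pattern 1."""
    return f"SELECT SUM({context['expression']}) AS revenue FROM {context['table']}"


if __name__ == "__main__":
    layer = SemanticLayer(
        [{"extendedprice": 200.0, "discount": 0.0}, {"extendedprice": 50.0, "discount": 0.2}]
    )
    print(agent_sql_from_context(layer.lookup_term("revenue")))  # pattern 1
    print(layer.query_metric("revenue"))                         # pattern 2
```

In pattern 1 the agent still authors SQL, so correctness depends on how well it uses the returned context; in pattern 2 the only thing the agent can get wrong is which metric it asks for.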
  13. The Complete Solution

    Works on any data platform - the semantic layer is the constant.

    User / Application: "Show me Q1 revenue for premium customers"
    AI Agent (Claude / GPT / any LLM): Plans · Reasons · Calls MCP tools · Synthesises results
    MCP Tools: listMetrics() · queryMetric() · lookupTerm() · executeQuery()
    Unified Semantic Layer: Metadata · Taxonomy · Ontology · Knowledge Graph · Metrics
    ANY DATA PLATFORM: Snowflake · Databricks · BigQuery · Redshift · Lakehouse
  14. MCP - The bridge for agents to access the semantic layer

    MCP (Model Context Protocol) is an open, standardized interface that enables LLMs to interact seamlessly and securely with external systems, APIs and data sources.

    Communication protocol -> JSON-RPC
    Transport protocol -> stdio / SSE / Streamable HTTP

    MCP makes the semantic API conversational: one server, any agent, any LLM.

    "USB-C for AI agents"
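Since MCP messages are JSON-RPC 2.0, a tool invocation is just a small JSON document. The sketch below builds a `tools/call` request for the `queryMetric` tool named in the deck and answers it with a deliberately simplified in-process handler. The argument names and the hard-coded reply text are hypothetical placeholders, and a real server would normally be built on an MCP SDK that also handles initialization, `tools/list` and the transport.

```python
import json
from typing import Any, Callable, Dict


def make_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request that invokes an MCP tool."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


def query_metric(arguments: Dict[str, Any]) -> str:
    # Placeholder: a real tool would delegate to the semantic layer here.
    return f"metric={arguments['metric']} period={arguments.get('period', 'all')}"


# Tool registry keyed by the names the agent sees.
TOOLS: Dict[str, Callable[[Dict[str, Any]], str]] = {"queryMetric": query_metric}


def handle_request(raw: str) -> str:
    """Simplified server side: dispatch one request and build the response."""
    request = json.loads(raw)
    response: Dict[str, Any] = {"jsonrpc": "2.0", "id": request.get("id")}
    params = request.get("params", {})
    if request.get("method") != "tools/call":
        response["error"] = {"code": -32601, "message": "Method not found"}
    elif params.get("name") not in TOOLS:
        response["error"] = {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}
    else:
        text = TOOLS[params["name"]](params.get("arguments", {}))
        # Tool results are returned as a list of content items.
        response["result"] = {"content": [{"type": "text", "text": text}], "isError": False}
    return json.dumps(response)


if __name__ == "__main__":
    raw_request = make_tool_call(1, "queryMetric", {"metric": "revenue", "period": "2026-Q1"})
    print(raw_request)
    print(handle_request(raw_request))
```

Because the request carries only a tool name and arguments, the same server can sit in front of any agent or LLM that speaks the protocol, which is the "one server, any agent" point on the slide.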
  15. Lakehouse Architecture - Recap

    Open architecture without vendor lock-in, offering the best of DWH + data lake.

    - Ingestion Layer: Ingestion from real-time sources like Kafka, plus batch ingestion.
    - Storage Layer: Object storage like S3, GCS, MinIO. Data files stored as Parquet, Avro, ORC.
    - Metadata Layer: Made of open table formats like Iceberg and a catalog to manage metadata files.
    - Processing Layer: Raw data can be further processed in batch or streaming mode with engines like Spark and Flink.
    - Serving Layer: Made of query engines like Trino to expose query and API capabilities to the consumption layer.
    - Consumption Layer: AI/ML, BI/reporting, analytics & visualization.
  16. Key Takeaways

    Data platform + Semantic Layer + MCP + Agent

    1) DIRECT AGENT ACCESS FAILS - SYSTEMATICALLY: Connecting agents directly to your data layer produces confident wrong answers. No business context. No validation. Silent failures that nobody catches.
    2) UNIFIED SEMANTIC LAYER IS THE MEANING API: Metadata, Taxonomy, Knowledge Graph, Ontology, Metrics & rules in one unified framework.
    3) LET AGENTS ACCESS SEMANTIC LAYER VIA MCP: Expose the semantic layer via MCP and let agents call meaning, not tables.
    4) START SMALL, BUILD THE FULL LAYER INCREMENTALLY: Build the semantic layer incrementally. Let it grow along with your agents.

    INTELLIGENT DATA PLATFORM ==> The data layer stores your data. The semantic layer stores your meaning. Expose both, and your agents become trustworthy.