Expanding Horizons with Gen-AI: From Retrieval-Aug Pattern Apps to Real-Time Predictions and Beyond

Khelan Modi, Product Manager, Microsoft, USA
Richa Gaur, Product Manager, Microsoft, India

Agenda
• Concepts
• Gen-AI use-cases
• Why Cosmos DB for Gen-AI?
• Customer Scenarios
• Demos
• Resources

Vector Embedding
Feature Vector: an ordered array of numbers, typically created by a human to train a model.
Example: [Height, Weight, Age, Fur Length, Energy Level]
Embedding: a vector generated by a model that has semantic meaning.
• "I took my dog for a walk": [1.5, -0.8, 2.1, ...]
• dog: [0.9, 0.3, 0.2, ...]
• Image of a dog: [0.9, 0.3, 0.2, ...]
• puppy: [0.88, 0.33, 0.21, ...]

Similarity Searching using Cosine
• Apple [10, 50]
• Banana [12, 48]
• Dog [48, 12]
• Cat [50, 10]
cos(θ) = 0.99, which is similar
cos(θ) = 0.40, which is not similar
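The cosine scores on this slide can be reproduced with a few lines of Python. The sketch below is illustrative only: the slide does not say which pairs its two scores refer to, so comparing apple with banana and apple with dog is an assumption, and the function name is hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Two-dimensional example vectors taken from the slide.
vectors = {
    "apple": [10, 50],
    "banana": [12, 48],
    "dog": [48, 12],
    "cat": [50, 10],
}

# Assumed pairings: two fruits, then a fruit and an animal.
fruit_score = cosine_similarity(vectors["apple"], vectors["banana"])
mixed_score = cosine_similarity(vectors["apple"], vectors["dog"])
```

The fruit pair scores close to 1 and the fruit/animal pair scores well below it, which is the contrast the slide is drawing.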

Azure Cosmos DB + AI Today

Azure Cosmos DB is the world’s most scalable AI database
• Serverless, or provisioned throughput with autoscale
• Seamless global distribution
• Guaranteed speed at any scale
• Mission-critical reliability and security
• Built-in multi-tenancy

Vector Search in Azure Cosmos DB for NoSQL
• Store data + vectors together: reduced complexity and cost; transactional data and vectors; optimized for app developers
• Vector Search + Query Filters: combine with equality, range and spatial filters; optimize query focus
• Flexible Indexing: flat, quantized flat, and DiskANN indexing available
Azure Cosmos DB for NoSQL Capabilities
• Serverless or provisioned throughput
• Built-in multitenancy
• Instant & dynamic autoscale
• <10ms point-reads
• Globally-replicated
• Industry-leading 99.999% SLA
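To make "data + vectors together" and "vector search + query filters" concrete, here is a minimal sketch of the container settings and a filtered vector query, written as plain Python data so it runs without an Azure account. The property names (`embedding`, `category`, `price`), the 1536 dimension count, and the helper function are hypothetical. The policy shapes and the `VectorDistance` function follow the Azure Cosmos DB for NoSQL documentation as best understood here and should be checked against the current docs before use.

```python
# Vector embedding policy: describes the vector property on each item.
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",          # hypothetical property name
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 1536,            # assumed embedding size
        }
    ]
}

# Indexing policy: one vector index; the slide lists flat, quantized flat,
# and DiskANN as the available index types.
indexing_policy = {
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/embedding/*"}],
    "vectorIndexes": [{"path": "/embedding", "type": "diskANN"}],
}

def build_filtered_vector_query(query_vector, category, max_price, top_k=5):
    """Return query text plus parameters combining filters with vector search."""
    top_k = int(top_k)  # keep the TOP clause a plain integer literal
    return {
        "query": (
            f"SELECT TOP {top_k} c.id, c.name, "
            "VectorDistance(c.embedding, @queryVector) AS score "
            "FROM c "
            "WHERE c.category = @category AND c.price <= @maxPrice "
            "ORDER BY VectorDistance(c.embedding, @queryVector)"
        ),
        "parameters": [
            {"name": "@queryVector", "value": list(query_vector)},
            {"name": "@category", "value": category},
            {"name": "@maxPrice", "value": max_price},
        ],
    }

spec = build_filtered_vector_query([0.1, 0.2], "shoes", 50.0, top_k=3)
```

The equality filter on `category` and the range filter on `price` sit in the same query as the similarity ordering, which is what lets one query serve both the operational and the vector side.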

Scalable and cost effective, ideal for multi-tenant apps
Multi-modal database with built-in DiskANN*
*In preview, GA November ‘24
• Algorithms: vector compression (quantization). Large vectors { D1, D2, D3, D4, D5, …, D99, D100 } become compressed vectors { D1, D2, …, D10 }
• RAM: compressed vectors
• SSD: storage and graph construction; full vectors + graph
• Unlimited scale
• Low latency
• Robust to data changes
• Serverless
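The split between compressed vectors in RAM and full vectors on SSD can be illustrated with a two-stage search: a cheap pass over lossy codes, then an exact rerank of a short candidate list. This toy sketch is not DiskANN (there is no graph index here) and the scalar quantizer, function names, and sizes are all assumptions chosen for illustration.

```python
import random

def quantize(vec, lo=-1.0, hi=1.0, levels=16):
    """Lossy compression: map each float in [lo, hi] to a small integer code."""
    step = (hi - lo) / (levels - 1)
    return [round((min(max(x, lo), hi) - lo) / step) for x in vec]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(query, full_vectors, codes, k=3, shortlist=10):
    q_code = quantize(query)
    # Stage 1: cheap scan over compressed codes (stands in for the RAM side).
    coarse = sorted(range(len(codes)), key=lambda i: sq_dist(q_code, codes[i]))
    candidates = coarse[:shortlist]
    # Stage 2: exact rerank using full vectors (stands in for the SSD side).
    return sorted(candidates, key=lambda i: sq_dist(query, full_vectors[i]))[:k]

random.seed(0)
full_vectors = [[random.uniform(-1, 1) for _ in range(100)] for _ in range(200)]
codes = [quantize(v) for v in full_vectors]

# Query with a vector that is already stored, so it should rank first.
top = search(full_vectors[42], full_vectors, codes)
```

Only the shortlist is compared at full precision, so most of the work touches the small codes rather than the large vectors.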

GenAI use cases with Azure Cosmos DB (What / Why / When)
• Semantic Caching
  Why: drastically reduces latency; saves on token consumption; reduces costs and latency for LLM
  When: slow moving / static content (FAQs, policies…)
• Chat History
  Why: conversational context; UX improvements; LLM optimizations; auditing
  When: a MUST for chat sessions; improving cost & performance
• Retrieval Augmented Generation (RAG)
  Why: personalize LLM on your data; cheaper than fine tuning; faster iteration on new data
  When: any workload for GenAI apps
• Vector + Operational Database
  Why: no ETL; consistent data; reduce complexity & costs
  When: data & vectors together; Cosmos DB scale & performance
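Semantic caching saves latency and tokens by returning a stored response when a new prompt is close enough in meaning to an earlier one. The sketch below shows the idea with an in-memory list; the class, the 0.95 threshold, and the `call_llm` callback are hypothetical, and a real application would keep the entries in a database with a vector index rather than scanning a list.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold  # assumed similarity cut-off
        self.entries = []           # list of (embedding, response)

    def get(self, embedding):
        """Return the best cached response at or above the threshold, else None."""
        best_score, best_response = 0.0, None
        for cached_embedding, response in self.entries:
            score = cosine(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, embedding, response):
        self.entries.append((list(embedding), response))

def answer(prompt_embedding, cache, call_llm):
    """Serve from the cache when possible; otherwise call the model and store."""
    cached = cache.get(prompt_embedding)
    if cached is not None:
        return cached, True
    response = call_llm()
    cache.put(prompt_embedding, response)
    return response, False
```

The second element of the returned tuple reports whether the model call was skipped, which is a convenient hook for measuring the cache hit rate.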

AI scenarios with Azure Cosmos DB for NoSQL
• Chat history
• Retrieval Augmented Generation (RAG)
• Real-time Recommendations
• Real-time Anomaly Detection
• Multi-Agent AI
• Multi-tenant AI apps

Cosmos DB for GenAI applications

Azure Cosmos DB for NoSQL (Oct 2024)
• Native Vector Indexing and Search (Public Preview)
• DiskANN Index (Public Preview)
• Integrations: Semantic Kernel, LangChain

ChatGPT scales with Azure Cosmos DB
OpenAI stores ChatGPT conversations and all other user interactions in Azure Cosmos DB (40+ workloads).
OpenAI Challenge
• Meet incredible demand from traffic spikes, without having to worry about database operations
Outcomes
• Rapidly and seamlessly scaled as service grew, with zero downtime
• Able to iterate fast on data shapes thanks to schemaless flexibility
• Maintained high performance and availability
Key Azure products used: Azure Cosmos DB, Azure Kubernetes Service, Azure AI Search

KymChat
KymChat is an AI agent that streamlines KPMG employees' operational tasks such as research and drafting proposals, documents, and communications. Leveraging Vector Search in Azure Cosmos DB for MongoDB (vCore) enabled KPMG to provide value to their employees at scale.
• Accuracy: PCI, a key relevancy metric, increased from 50% to 90%+
• Scalability: performance improvements enabled rollout to all KPMG member firms
• Performance: 7,000+ employees; up to 50% productivity gain
KymChat demo at Ignite 2023

Dynamic Scaling (Per-region and per-partition billing)

Retrieval Augmented Generation
Grounding the searches with vector data seamlessly
• Support large knowledge bases for ingestion and retrieval
• Semantic caching
• Prompt history
• Empower LLMs with operational data context
[Diagram: retrieval augmented generation flow. Documents are chunked and passed through an embedding model, and the resulting embeddings are stored. A user query (example: “Create a quiz with 10 questions based on xyz data”) passes through the embedding model to become a vectorized prompt. Vector search retrieves matching context, the prompt plus context go to the large language model (LLM), and the generated output is returned to the chat as the response. Prompt history and cache are stored alongside.]
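The retrieval half of the flow described on this slide (chunk, embed, search, assemble prompt plus context) can be sketched end to end in plain Python. Everything here is a stand-in: `embed` is a toy word-count function rather than a real embedding model, the index is an in-memory list, and the final call to a language model is left out. All function names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lower-cased word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(document, size=8):
    """Split a document into chunks of at most `size` words."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(documents):
    """Ingestion: chunk every document and store each chunk with its embedding."""
    return [(c, embed(c)) for d in documents for c in chunk(d)]

def retrieve(index, query, k=2):
    """Vector search: rank stored chunks by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query, context_chunks):
    """Assemble the prompt plus context that would be sent to the LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"

documents = [
    "Cosmos DB stores operational data and vectors together in one container",
    "Bananas and apples are fruit while dogs and cats are animals",
]
index = build_index(documents)
query = "operational data and vectors"
prompt = build_prompt(query, retrieve(index, query, k=1))
```

Swapping `embed` for a real embedding model and the list for a vector-indexed container changes the quality and scale of retrieval, not the shape of the pipeline.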

Real-time Recommendation System
Building a real-time recommendation system for a retail application is a challenge that needs huge engineering and operational effort. Earlier, it required specialized ML models and extensive data pipelines to build features for the models. Azure Cosmos DB uses transactional data to find similar products in real time with its DiskANN-based vector search.
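Finding similar products reduces to a top-k nearest-neighbour lookup over product vectors. The sketch below does this by brute force over a small in-memory catalog; the product ids and vectors are invented for illustration, and a production system would delegate the lookup to an indexed vector search instead of scanning every item.

```python
import heapq
import math

def normalize(v):
    """Scale a non-zero vector to unit length so dot product equals cosine."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def similar_products(product_id, catalog, k=2):
    """Return the ids of the k products most similar to `product_id`."""
    target = normalize(catalog[product_id])
    scored = (
        (sum(a * b for a, b in zip(target, normalize(vec))), pid)
        for pid, vec in catalog.items()
        if pid != product_id
    )
    return [pid for _, pid in heapq.nlargest(k, scored)]

# Hypothetical catalog: product id -> embedding vector.
catalog = {
    "running-shoes": [0.9, 0.1, 0.0],
    "trail-shoes": [0.8, 0.2, 0.1],
    "espresso-maker": [0.0, 0.1, 0.9],
    "coffee-grinder": [0.1, 0.2, 0.8],
}
```

Calling `similar_products("running-shoes", catalog, k=1)` picks the other shoe, since its vector points in nearly the same direction.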

This is what it looked like earlier

This is what it looks like now

Fraud detection
• Vector-Based Anomaly Detection: uses embeddings to detect suspicious transactions by analyzing location and transaction patterns.
• Cosmos DB Integration: stores transactions and location embeddings for efficient querying and anomaly detection.
Why Choose Azure Cosmos DB?
• Transaction Storage: data stored in Cosmos DB with location vectors.
• Embeddings Generation: converts geographical locations into vector embeddings using Azure OpenAI.
• Vector Search: identifies anomalies by comparing current transaction vectors with historical data.
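The comparison of a current transaction vector with historical data can be sketched as a nearest-neighbour distance check. This is an assumption-laden illustration: the slide does not name a distance metric or a threshold, so Euclidean distance and a cut-off of 0.5 are arbitrary choices, and the hand-written vectors stand in for embeddings that the slide says come from Azure OpenAI.

```python
import math

def is_anomalous(current, history, threshold=0.5):
    """Flag a transaction whose vector is far from every historical vector."""
    if not history:
        # Assumption: with no history there is nothing to match, so flag it.
        return True
    nearest = min(math.dist(current, past) for past in history)
    return nearest > threshold

# Hypothetical location vectors for one customer's past transactions.
history = [[0.9, 0.1, 0.0], [0.85, 0.15, 0.05]]

usual = [0.88, 0.12, 0.02]    # close to the customer's normal pattern
unusual = [0.0, 0.1, 0.95]    # far from anything seen before
```

A transaction near any earlier one passes, while one that is distant from all of them is flagged for review.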

Multi-tenancy needs
• Good isolation
• Low cost per tenant

How do we ensure tenant isolation?
• Isolate tenants by database account: shared throughput at the database level and/or dedicated throughput at the container level
• Isolate tenants by partition key: share throughput across tenants grouped into the same container

Isolate by partition key
Tenant 1, Tenant 2, and Tenant 3 share one Azure Cosmos DB account. Throughput is shared across tenants grouped into the same container.
Benefits
• Lowest cost per tenant
• Easy querying across tenants
Tradeoffs
• Noisy neighbor
Appropriate for workloads that do not need guaranteed RUs on a single tenant and can share.
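With partition-key isolation, every item carries its tenant id and every query is scoped to one tenant. The sketch below shows one possible item shape and query specification as plain Python data. The `tenantId` property, the assumed partition key path `/tenantId`, and both helper names are hypothetical, and the returned dictionary simply groups the query text, parameters, and partition key that an SDK query call would need.

```python
def tenant_item(tenant_id, item_id, payload):
    """Build an item whose tenantId doubles as the partition key value."""
    return {"id": item_id, "tenantId": tenant_id, **payload}

def tenant_query(tenant_id, status):
    """Build a query spec that only touches one tenant's logical partition."""
    return {
        "query": (
            "SELECT * FROM c "
            "WHERE c.tenantId = @tenantId AND c.status = @status"
        ),
        "parameters": [
            {"name": "@tenantId", "value": tenant_id},
            {"name": "@status", "value": status},
        ],
        # Passing the partition key keeps the query within a single partition.
        "partition_key": tenant_id,
    }

item = tenant_item("tenant-1", "order-9", {"status": "open"})
spec = tenant_query("tenant-1", "open")
```

Dropping the tenant filter turns the same query into a cross-tenant one, which is the "easy querying across tenants" benefit and also why the filter must be enforced in application code.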

Isolate by database account
Tenant 1, Tenant 2, and Tenant 3 each get their own account. Shared throughput at the database level and/or dedicated throughput at the container level.
Benefits
• Very easy tenant management
• Independent control of account level features
• Best security isolation (customer managed keys add an extra layer of security)
Tradeoffs
• High maintenance and dollar costs per tenant
• Hard to query across tenants

Hybrid Approach
• Enterprise customers: isolate by database account
• Free trial customers: isolate by partition key
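A hybrid approach needs a small routing step that decides where each tenant's data lives. The following sketch maps a tenant tier to a placement; the tier names, account and container naming scheme, and the `Placement` type are all hypothetical and only illustrate the decision, not any provisioning API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    account: str        # which database account holds the tenant
    container: str      # which container inside that account
    partition_key: str  # partition key value used for the tenant's items

SHARED_ACCOUNT = "shared-account"  # hypothetical account for pooled tenants

def place_tenant(tenant_id, tier):
    """Pick an isolation model based on the tenant's tier."""
    if tier == "enterprise":
        # Dedicated account per tenant: strongest isolation, highest cost.
        return Placement(f"acct-{tenant_id}", "data", tenant_id)
    if tier == "free_trial":
        # Pooled container: tenants separated only by partition key.
        return Placement(SHARED_ACCOUNT, "trial-tenants", tenant_id)
    raise ValueError(f"unknown tier: {tier}")
```

Keeping the decision in one function makes it straightforward to move a tenant between models later, for example when a trial customer upgrades.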

Resources
• Learn more about Vector search on Azure Cosmos DB for NoSQL: aka.ms/CosmosDBVectorSearch
• Learn more about vCore-based Azure Cosmos DB for MongoDB: aka.ms/tryvcore
• Azure Cosmos DB AI samples: aka.ms/CosmosAISamples
• Dynamic Scaling: aka.ms/dynamicscaling
• AI Advantage offer: aka.ms/AIAdvantageBlog
