Khelan Modi
September 27, 2024

Expanding Horizons with Gen-AI: From Retrieval-Aug Pattern Apps to Real-Time Predictions and Beyond

Gen-AI has revolutionized the way we interact with technology, enabling natural language interactions between users and Large Language Models (LLMs) in Retrieval-Augmented Generation (RAG) Pattern applications. But the potential of Gen-AI extends far beyond this now-familiar use case.
In this session, we will explore how Gen-AI can be harnessed to architect and build a variety of innovative applications, including Real-Time Recommendation Engines, Real-Time Anomaly and Fraud Detection systems, and Multi-tenant AI apps. We will provide practical guidance on how to implement these applications, offering insights into the architectural considerations and best practices involved.
Whether you’re a business decision-maker looking to understand the potential of Gen-AI, a technical decision-maker planning your next AI project, or a developer keen to expand your skills, this session will provide valuable insights and practical guidance.
Join us as we delve into the exciting world of Gen-AI and discover how it can be used to create cutting-edge applications that push the boundaries of what’s possible.

Transcript

  1. Expanding Horizons with Gen-AI: From Retrieval-Aug Pattern Apps to Real-Time Predictions and Beyond
    Khelan Modi, Product Manager, Microsoft, USA
    Richa Gaur, Product Manager, Microsoft, India
  2. Agenda
    • Concepts
    • Gen-AI use-cases
    • Why Cosmos DB for Gen-AI?
    • Customer Scenarios
    • Demos
    • Resources
  3. Vector Embedding
    • Feature Vector: ordered array of numbers, typically created by a human to train a model
      [Height, Weight, Age, Fur Length, Energy Level]
    • Embedding: vector generated by a model that has semantic meaning (“dog”)
      "I took my dog for a walk": [1.5, -0.8, 2.1, ...]
      dog: [0.9, 0.3, 0.2, ...]
      Image of a dog: [0.9, 0.3, 0.2, ...]
      puppy: [0.88, 0.33, 0.21, ...]
  4. Similarity Searching using Cosine
    • Apple [10, 50]
    • Banana [12, 48]
    • Dog [48, 12]
    • Cat [50, 10]
    cos() = 0.99, which is similar
    cos() = 0.40, which is not similar
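As a rough illustration of the cosine comparison on this slide, the sketch below computes cosine similarity for the four 2-D vectors shown. The function names `cosine_similarity` and `most_similar` are made up for this example, and the transcript does not say which pairs the slide's two cos() values refer to, so no pairing is assumed here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# The toy 2-D vectors from the slide.
vectors = {
    "Apple": [10, 50],
    "Banana": [12, 48],
    "Dog": [48, 12],
    "Cat": [50, 10],
}

def most_similar(name):
    """Return the other item whose vector has the highest cosine similarity."""
    others = [n for n in vectors if n != name]
    return max(others, key=lambda n: cosine_similarity(vectors[name], vectors[n]))

if __name__ == "__main__":
    for name in vectors:
        print(name, "->", most_similar(name))
```

Vectors pointing in nearly the same direction (Apple and Banana, Dog and Cat) score close to 1, while the fruit-versus-animal pairs score well below 0.5.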
  5. Azure Cosmos DB is the world’s most scalable AI database
    • Serverless, or provisioned throughput with autoscale
    • Seamless global distribution
    • Guaranteed speed at any scale
    • Mission-critical reliability and security
    • Built-in multi tenancy
  6. Vector Search in Azure Cosmos DB for NoSQL
    • Store data + vectors together: Reduced Complexity & Cost; Transactional Data & Vectors; Optimized for App Developers
    • Vector Search + Query Filters: Combine with equality, range & spatial filters; Optimize query focus
    • Flexible Indexing: Flat, quantized flat, and DiskANN indexing available
    • Azure Cosmos DB for NoSQL Capabilities: Serverless or provisioned throughput; Built-in multitenancy; Instant & dynamic autoscale; <10ms point-reads; Globally-replicated; Industry-leading 99.999% SLA
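To make "Vector Search + Query Filters" and "Flexible Indexing" a little more concrete, here is a hedged sketch of what a container's vector settings and a filtered vector query might look like. The property names (`/embedding`, `category`, `price`), the 1536 dimensions, and the parameter values are assumptions for illustration only. The policy shapes and the `VectorDistance` function follow the Azure Cosmos DB for NoSQL vector search documentation as best understood, and should be checked against the current docs before use.

```python
# Hypothetical vector embedding policy: which path holds vectors and how to compare them.
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",          # assumed property name
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 1536,            # assumed embedding size
        }
    ]
}

# Hypothetical indexing policy: the slide lists flat, quantized flat, and DiskANN indexes.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/embedding/*"}],
    "vectorIndexes": [{"path": "/embedding", "type": "diskANN"}],
}

# A vector search combined with equality and range filters in one query.
query = (
    "SELECT TOP 5 c.id, c.name, "
    "VectorDistance(c.embedding, @queryVector) AS score "
    "FROM c "
    "WHERE c.category = @category AND c.price < @maxPrice "
    "ORDER BY VectorDistance(c.embedding, @queryVector)"
)

parameters = [
    {"name": "@queryVector", "value": [0.0] * 1536},  # placeholder vector
    {"name": "@category", "value": "shoes"},          # assumed filter value
    {"name": "@maxPrice", "value": 100},              # assumed filter value
]
```

Keeping the filter and the vector ranking in the same query is what lets the operational data and the vectors live in one item rather than in two stores.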
  7. Scalable and cost effective, ideal for multi-tenant apps
    Multi-modal database with built-in DiskANN* (*In preview, GA November ‘24)
    • Algorithms: vector compression (quantization), large vectors { D1, D2, D3, D4, D5, …, D99, D100 } to compressed vectors { D1, D2 .., D10 }
    • RAM: compressed vectors
    • SSD: storage and graph construction, full vectors + graph
    • Unlimited scale
    • Low latency
    • Robust to data changes
    • Serverless
  8. GenAI use cases with Azure Cosmos DB (columns: What, Why, When)
    Semantic Caching; Drastically reduces latency; Saves on Token consumption; Reduces costs and latency for LLM; Conversational context; UX improvements; LLM optimizations; Auditing; Chat History; Retrieval Augmented Generation (RAG); Vector + Operational Database; Personalize LLM on your data; Cheaper than fine tuning; Faster iteration on new data; No ETL; Consistent data; Reduce complexity & costs; Slow moving / static content; FAQs, Policies…; A MUST for Chat sessions; Improving cost & performance; Any workload for GenAI apps; Data & vectors together; Cosmos DB scale & performance
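The semantic caching idea on this slide can be sketched as follows: before calling the LLM, look for a previously answered prompt whose embedding is close enough to the new one, and reuse its response. This is a minimal in-memory sketch, not the Cosmos DB implementation. `SemanticCache`, `toy_embed`, the keyword vocabulary, and the 0.95 threshold are all assumptions made for illustration, and `toy_embed` stands in for a real embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def toy_embed(text):
    """Stand-in for an embedding model: smoothed counts of a few keywords."""
    words = text.lower().split()
    vocab = ["refund", "policy", "shipping", "time"]
    return [words.count(w) + 0.01 for w in vocab]  # +0.01 avoids zero vectors

class SemanticCache:
    """Stores (embedding, response) pairs and answers near-duplicate prompts."""

    def __init__(self, embed, threshold=0.95):
        self.embed = embed
        self.threshold = threshold
        self.entries = []

    def put(self, prompt, response):
        self.entries.append((self.embed(prompt), response))

    def get(self, prompt):
        """Return a cached response if a similar prompt was seen, else None."""
        query = self.embed(prompt)
        best_score, best_response = -1.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

if __name__ == "__main__":
    cache = SemanticCache(toy_embed)
    cache.put("what is the refund policy", "Refunds are accepted within 30 days.")
    print(cache.get("tell me the refund policy"))   # similar prompt, cache hit
    print(cache.get("what is the shipping time"))   # different topic, cache miss
```

A cache hit skips the LLM call entirely, which is where the latency and token savings on the slide come from. The threshold trades hit rate against the risk of returning an answer to a different question.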
  9. AI scenarios with Azure Cosmos DB for NoSQL
    • Chat history
    • Retrieval Augmented Generation (RAG)
    • Real-time Recommendations
    • Real-time Anomaly Detection
    • Multi-Agent AI
    • Multi-tenant AI apps
  10. Azure Cosmos DB for NoSQL (Oct 2024)
    • Native Vector Indexing and Search (Public Preview)
    • DiskANN Index (Public Preview)
    • Integrations: Semantic Kernel, LangChain
  11. ChatGPT scales with Azure Cosmos DB
    OpenAI stores ChatGPT conversations and all other user interactions in Azure Cosmos DB, 40+ workloads
    Challenge
    • Meet incredible demand from traffic spikes, without having to worry about database operations
    Outcomes
    • Rapidly and seamlessly scaled as service grew, with zero downtime
    • Able to iterate fast on data shapes thanks to schemaless flexibility
    • Maintained high performance and availability
    Key Azure products used: Azure Cosmos DB, Azure Kubernetes Service, Azure AI Search
  12. KymChat
    KymChat is an AI agent to streamline KPMG employee operational tasks such as research, drafting proposals, documents, and communications. Leveraging Vector Search in Azure Cosmos DB for MongoDB (vCore) enabled KPMG to provide value to their employees at scale.
    • Accuracy: PCI, a key relevancy metric, increased from 50% to 90%+
    • Scalability: Performance improvements enabled rollout to all KPMG member firms
    • Performance: 7,000+ employees; up to 50% productivity gain
    KymChat demo at Ignite 2023
  13. Retrieval Augmented Generation
    • Grounding the searches with vector data seamlessly
    • Support large Knowledge bases for ingestion and retrieval
    • Semantic caching
    • Prompt history
    • Empower LLMs with Operational Data context
    [Diagram: retrieval augmented generation flow, steps 1 to 9. Labels: Documents, Chunked docs, Embedding model, embeddings, User query (“Create a quiz with 10 questions based on xyz data”), Vectorized prompt, Vector search, Prompt + context, Large language model (LLM), Generated output, Response, Chat, Prompt history & cache]
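The flow in the diagram (chunk documents, embed the chunks, embed the user query, vector-search for context, then send prompt + context to the LLM) can be sketched in a few functions. Everything here is a toy stand-in under stated assumptions: `embed` counts keywords instead of calling an embedding model, the index is a Python list rather than a database, and the final LLM call is left out. All function names are hypothetical.

```python
import math

def embed(text, vocab=("cosmos", "vector", "tenant", "quiz")):
    """Toy stand-in for an embedding model: keyword counts."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def chunk(doc, size=8):
    """Split a document into chunks of at most `size` words."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(docs):
    """Steps 1-2 of the diagram: chunk documents and store (text, embedding) pairs."""
    return [(c, embed(c)) for d in docs for c in chunk(d)]

def retrieve(index, query, k=2):
    """Vector search: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query, context):
    """Combine retrieved context with the user query for the LLM."""
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query

if __name__ == "__main__":
    docs = [
        "cosmos stores vector data next to operational data",
        "each tenant gets a partition key",
    ]
    index = build_index(docs)
    question = "how is vector data stored in cosmos"
    print(build_prompt(question, retrieve(index, question, k=1)))
```

In a real application the prompt returned by `build_prompt` would be sent to the LLM, and the exchange would be written to the prompt history and cache shown in the diagram.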
  14. Real-time Recommendation System
    Building a real-time recommendation system for a retail application is a challenge that requires substantial engineering and operational effort. Previously, it required specialized ML models and extensive data pipelines to build features for those models. Azure Cosmos DB uses transactional data to find similar products in real time with its DiskANN-based vector search.
  15. Fraud detection
    • Vector-Based Anomaly Detection: Uses embeddings to detect suspicious transactions by analyzing location and transaction patterns.
    • Cosmos DB Integration: Stores transactions and location embeddings for efficient querying and anomaly detection.
    Why Choose Azure Cosmos DB?
    • Transaction Storage: Data stored in Cosmos DB with location vectors.
    • Embeddings Generation: Converts geographical locations into vector embeddings using Azure OpenAI.
    • Vector Search: Identifies anomalies by comparing current transaction vectors with historical data.
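One simple way to read "comparing current transaction vectors with historical data" is to flag a transaction when its vector is not close to any vector in the account's history. The sketch below does that with cosine similarity. It is an assumption-laden illustration rather than the method used in the demo: the 3-D vectors are made up, the 0.8 threshold is arbitrary, and `is_anomalous` is a hypothetical name.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def is_anomalous(current, history, threshold=0.8):
    """Flag a transaction whose vector is not close to any historical vector."""
    if not history:
        # Assumption: with no history there is nothing to match, so flag for review.
        return True
    best = max(cosine(current, past) for past in history)
    return best < threshold

if __name__ == "__main__":
    # Made-up location embeddings for one account's past transactions.
    history = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
    print(is_anomalous([0.85, 0.15, 0.05], history))  # near past locations
    print(is_anomalous([0.0, 0.1, 0.9], history))     # far from past locations
```

In a database-backed version, the `max` over history would be replaced by a top-1 vector search filtered to the account, which is the part a vector index is meant to speed up.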
  16. How do we ensure tenant isolation?
    • Isolate tenants by database account: Shared throughput at the database level and/or dedicated throughput at the container level
    • Isolate tenants by partition key: Share throughput across tenants grouped into the same container
  17. Isolate by partition key
    Share throughput across tenants grouped into the same container (diagram: Tenant 1, Tenant 2, Tenant 3 in one Azure Cosmos DB account)
    Benefits
    • Lowest cost per tenant
    • Easy querying across tenants
    Tradeoffs
    • Noisy neighbor
    Appropriate for workloads that do not need guaranteed RUs on a single tenant and can share
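A toy model can make the partition-key tradeoffs easier to see: all tenants share one container, each tenant's items live under its own partition key, and a cross-tenant query simply reads every partition. This is an in-memory sketch with hypothetical names (`SharedContainer`, `add`, `query_tenant`, `query_all`); it does not use the Azure Cosmos DB SDK and does not model throughput.

```python
from collections import defaultdict

class SharedContainer:
    """Toy model of one container shared by all tenants, partitioned by tenant id."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def add(self, tenant_id, item):
        """Store an item in the partition that belongs to its tenant."""
        self._partitions[tenant_id].append(item)

    def query_tenant(self, tenant_id):
        """Single-partition query: only this tenant's items are touched."""
        return list(self._partitions.get(tenant_id, []))

    def query_all(self):
        """Cross-tenant query: easy here because everything is in one container."""
        return [item for items in self._partitions.values() for item in items]

if __name__ == "__main__":
    container = SharedContainer()
    container.add("tenant1", {"id": "a", "text": "invoice"})
    container.add("tenant2", {"id": "b", "text": "contract"})
    container.add("tenant1", {"id": "c", "text": "receipt"})
    print(container.query_tenant("tenant1"))
    print(len(container.query_all()))
```

Because every tenant draws on the same container, one busy tenant can consume capacity the others need, which is the noisy-neighbor tradeoff named on the slide.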
  18. Isolate by database account
    Shared throughput at the database level and/or dedicated throughput at the container level (diagram: Tenant 1, Tenant 2, Tenant 3, each in its own account)
    Benefits
    • Very easy tenant management
    • Independent control of account level features
    • Best security isolation (customer managed keys adds extra layer of security)
    Tradeoffs
    • High maintenance and dollar costs per tenant
    • Hard to query across tenants
  19. Resources
    • Learn more about Vector search on Azure Cosmos DB for NoSQL: aka.ms/CosmosDBVectorSearch
    • Learn more about vCore-based Azure Cosmos DB for MongoDB: aka.ms/tryvcore
    • Azure Cosmos DB AI samples: aka.ms/CosmosAISamples
    • Dynamic Scaling: aka.ms/dynamicscaling
    • AI Advantage offer: aka.ms/AIAdvantageBlog