...form integration | State-of-the-art search ranking | Enterprise-ready foundation | Generally available | Public preview | Generally available | Vector search | Azure AI Search in Azure AI Studio | Semantic ranker | Integrated vectorization | Uses in Generative AI
... possibility → helps set a threshold for "good enough" grounding data
Source: Lost in the Middle: How Language Models Use Long Contexts, Liu et al., arXiv:2307.03172
[Figure: Accuracy versus number of documents in input context]
for OpenAI embeddings | OpenAI Cookbook
• Azure Cognitive Search - 🦜🔗 LangChain 0.0.198
• Azure Cognitive Search - LlamaIndex 🦙 0.8.46 (gpt-index.readthedocs.io)
• Azure-Samples/azure-search-openai-demo (github.com)
• Azure-Samples/chat-with-your-data-solution-accelerator: A Solution Accelerator for the RAG pattern running in Azure, using Azure Cognitive Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. This includes most common requirements and best practices. (github.com)
• Azure-Samples/azure-search-comparison-tool: A demo app showcasing Vector Search using Azure Cognitive Search, Azure OpenAI for text embeddings, and Azure AI Vision for image embeddings. (github.com)
Docs
• Vector search - Azure Cognitive Search | Microsoft Learn
• Azure/cognitive-search-vector-pr: Private repository for the Vector search feature in Azure Cognitive Search. (github.com)
• Azure OpenAI Service - Documentation, quickstarts, API reference - Azure Cognitive Services | Microsoft Learn
• Image Retrieval concepts - Image Analysis 4.0 - Azure Cognitive Services | Microsoft Learn
• Azure Cognitive Search: Outperforming vector search with hybrid retrieval and ranking capabilities - Microsoft Community Hub
statement representing your information need (e.g., search string) Object: Entity within your content collection (e.g., document, image, audio) Relevance: Quantitative measure of how well an object satisfies the intent of the query Ranking: Ordered list of relevant results based on their desirability or relevance score
Relevance via Boolean Search: Retrieve documents containing specific terms (e.g., "apples" AND "oranges")
Ranking via BM25: A ranking algorithm influenced by 3 key factors:
• Term Frequency (TF): More occurrences of the search term indicate higher relevance
• Inverse Document Frequency (IDF): The rarer a term across documents, the more important it is
• Field Length: Terms found in shorter fields (fewer words) are more likely to be relevant than terms in longer fields (more words)
[Figure: BM25 formula]
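The three BM25 factors can be illustrated with a minimal sketch. This uses the textbook Okapi BM25 form with the common defaults k1 = 1.2 and b = 0.75; these constants, the IDF smoothing, and the function name `bm25_scores` are assumptions for illustration, not necessarily what Azure Cognitive Search uses internally.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms (Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF: rarer terms get a larger weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # TF saturates via k1; longer documents are penalized via b.
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "apples and oranges".split(),
    "apples apples apples and a long list of other unrelated fruit words".split(),
    "bananas only".split(),
]
scores = bm25_scores(["apples", "oranges"], docs)
```

In this toy corpus the short document that contains both terms scores highest, the long document that repeats only "apples" scores lower, and the document with no matching term scores zero.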
Strengths of Lexical and Semantic Approaches in Retrieval Discrete (Lexical) Representations Advantages: Exact matching Precise control and easy explainability Limitations: Struggle to capture nuances in language Limited understanding of conceptual similarity Dense (Vector/Semantic) Representations Advantages: Capture conceptual similarity Better understanding of language nuances Limitations: Not built to match exact terms Reduced explainability compared to discrete Bottom Line: Achieve optimal recall by leveraging the strengths of both discrete and semantic representations for a comprehensive understanding of language
Scenario: ... books, each containing unique insights and knowledge
The Challenge: Finding a book on a specific topic or theme can be time-consuming and overwhelming, especially when the content is scattered.
The Solution: A skilled librarian can quickly connect you to books with similar topics or themes.
Scenario: ... semantically similar items from large-scale data sources.
The Challenge: Sifting through large-scale databases to find related items can be resource-intensive and time-consuming.
The Solution: By calculating similarity metrics like cosine similarity, vector search organizes data and retrieves semantically similar items within the high-dimensional space.
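As a small illustration of retrieval by cosine similarity, the sketch below ranks a few items against a query vector. The three-dimensional vectors and item names are made-up toy data; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the names of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query, v), name) for name, v in vectors.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Toy "embeddings" (hypothetical values for illustration only).
items = {
    "apple": [0.9, 0.1, 0.0],
    "orange": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.1, 0.9],
}
nearest = top_k([1.0, 0.0, 0.0], items, k=2)
```

This compares the query against every stored vector, which is the exhaustive approach; the approximate nearest neighbor indexes discussed later avoid scanning everything.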
for efficient analysis and processing in various applications. Definition Abstract, dense, compact, learned numerical representations of data Map complex structures into simpler, fixed-size vectors Applicable to diverse data types (text, images, audio, etc.) Purpose Facilitate analysis and processing of diverse data types Enable similarity measurement, clustering, and classification Power applications like Vector search and recommendation systems Benefits Efficient search and organization of vast datasets Improved accuracy and relevance of search results Scalable and adaptable to various industries and use cases
vectors (embeddings) and find the most similar objects according to metric
[Diagram: Data sources (images, audio, video, text, and more) and the App/UX query are each transformed using an embedding model into a vector representation; document vectors are stored in a vector index in Azure Cognitive Search, and the vectors closest to the query vector are returned as results.]
Model for Your Use Case
Model Characteristics: Task Specificity; Performance; Context Awareness; Model Size and Inference Speed; Language Support; Customizability (ability to fine-tune)
Implementation Considerations: Training Time and Complexity; Pre-Trained Models; Integration; Community Support and Updates; Cost
We recommend Azure OpenAI Service "text-embedding-ada-002" for text embeddings
We recommend Azure AI Vision Image Retrieval API for image embeddings
AI
Definition: A fast search method for finding approximate nearest neighbors in high-dimensional spaces
Purpose: Applicable in image search, NLP, recommendation systems
Benefits: Provide faster search results in high-dimensional data by trading off a small degree of recall for significant performance gains compared to exhaustive vector search
Vector indexes are data structures that let us perform approximate nearest neighbor search
Similarity Search
• HNSW (Hierarchical Navigable Small World): Hierarchical graph structure; fast search performance
• FLAT (Brute Force): Exhaustive search in high-dimensional data; slower but highly accurate
• LSH (Locality-Sensitive Hashing): Hash-based similarity search; trade-off between speed and accuracy
• IVF (Inverted File Index): Reduces search space using quantization; scalable and memory-efficient
https://github.com/erikbern/ann-benchmarks
When productionizing, what matters is not which algorithm you choose but the information retrieval system it is part of.
their own, are simply just a data structure. • Limited Scalability: Struggle with vertical and horizontal scaling in large-scale data sets • Memory Constraints: High memory consumption affecting search performance and resource efficiency • Persistence and Durability: Lack of built-in mechanisms for data storage, recovery, and metadata management • Simplistic Query Support: Limited capabilities for combining sparse and dense retrieval methods • Hosting Challenges: Complex setup and hosting requirements • No Built-in Security Features: Open-source solutions often lack advanced security features • Embedding Management: Limited support for managing the embedding functions themselves
Most Relevant Data Back The Dream Scenario for Vector Search Effortless Data Management and Relevant Search Results Common Challenges - Scalability - Preprocessing - Splitting/Chunking - Embedding management - Query understanding - Query flexibility - Ranking accuracy - Result Diversity - Search algorithm
Scaling
• Data Volume: Can the system handle increasing amounts of data?
  • Storage capacity and management
  • Indexing and search performance
• Query Load: How well does the system respond to growing query demands?
  • Query execution speed and response times
  • Handling concurrent queries and user connections
• Distributed Infrastructure: Does the system support distributed and parallel processing?
  • Horizontal scaling across multiple nodes
  • Load balancing and fault tolerance
• Cost Efficiency: How does the system optimize resource usage and cost management?
  • Balancing performance and cost requirements
  • Efficient use of hardware and cloud resources
Search
• Text Preprocessing: Ensuring clean and structured data for the embedding model
  • Tokenization (or segmentation): Breaking text into words, phrases, or symbols
  • Lowercasing and normalization: Standardizing text representation
  • Stopword removal: Eliminating common words with little semantic value
  • Stemming and lemmatization: Reducing words to their root forms
• Document Splitting: Adapting documents to fit within embedding model limits
  • Chunking: Dividing long documents into smaller, manageable sections
  • Passage extraction: Identifying and retaining meaningful segments
  • Overlap management: Ensuring continuity and context preservation
• Model Compatibility: Preparing data to align with the chosen embedding model
  • Input requirements: Adhering to model-specific formatting and length constraints
  • Vocabulary coverage: Maximizing the overlap between document vocabulary and model vocabulary
• Evaluation and Iteration: Continuously improving preprocessing and splitting strategies
  • Performance monitoring: Assessing the impact of preprocessing and splitting on search quality
  • Strategy refinement: Adjusting techniques based on observed results and user feedback
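Chunking with overlap can be sketched as a sliding window. This is a minimal illustration that counts whitespace-separated words; embedding model limits are normally expressed in tokens, so treating words as tokens is a simplifying assumption, and the function name `chunk_words` is hypothetical.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into word windows of chunk_size that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
```

Each chunk repeats the last word of the previous one, so a sentence that straddles a boundary keeps some context on both sides. Larger overlaps preserve more context at the cost of more vectors to store.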
• Embedding Quality: Ensuring high-quality and accurate vector representations
  • Selecting appropriate embedding models (e.g., OpenAI, BERT)
  • Fine-tuning models for domain-specific vocabulary and context
• Dimensionality: Balancing embedding size and search performance
  • Reducing dimensions while retaining semantic information
  • Implementing dimensionality reduction techniques (e.g., PCA, t-SNE)
• Indexing and Storage: Efficiently managing and storing embeddings
  • Using optimized data structures for quick look-up and retrieval (e.g., Approximate Nearest Neighbors)
• Embedding Updates: Keeping vector representations up-to-date with evolving data
  • Incremental updates to embeddings based on new or updated documents
  • Periodic model retraining for continuous improvement and/or model version updating
• Evaluation and Iteration: Continuously assessing and refining embedding management strategies
  • Monitoring performance metrics (e.g., search relevance, recall, precision)
  • Adjusting techniques based on observed results and user feedback
Query Understanding
• Beyond Similarity: Addressing complex search scenarios beyond "most similar documents"
  • Understanding user intent: Identifying specific search goals and requirements
• Query Flexibility: Supporting various search parameters and filters
  • Boolean operators: Handling AND, OR, and NOT conditions
  • Filtering and Faceting: Allowing users to filter results based on specific attributes
• Query Transformation: Converting user queries into vector representations
  • Text-to-vector conversion: Transforming query text into compatible embeddings
  • Query expansion: Incorporating additional keywords or phrases to improve search relevance
• Evaluation and Iteration: Continuously refining query language understanding
  • Monitoring query performance metrics (e.g., query success rate, user satisfaction)
  • Adjusting techniques based on observed results and user feedback
Diversity, and Adaptability
• Ranking Accuracy: Ensuring highly relevant results are ranked at the top
  • Hyperparameter tuning: Adjusting hyperparameters as needed to trade off recall against latency
  • Rank fusion (hybrid, re-ranker, HyDE): Combining multiple ranking signals for improved accuracy
• Result Diversity: Balancing the variety and relevance of search results
  • Diversification strategies: Introducing variety while maintaining relevance
  • Document-level vs. chunk-level search: Considering the impact of chunking long documents
    • More focused and relevant results from individual chunks (good or bad? -> depends on task)
    • Top results may all belong to the same document, reducing result diversity (good or bad? -> depends on task)
• Search Algorithm Adaptability: Customizing search behavior based on the task at hand
  • Task-oriented search: Adjusting search algorithms for specific tasks or user requirements
• Evaluation and Iteration: Continuously refining search relevance strategies
  • Monitoring search performance metrics (e.g., precision, recall, user satisfaction)
  • Adjusting techniques based on observed results and user feedback
retrieval augmented generation for LLM Apps Images Audio Video Graphs Text • Leverage data from any data store • Improve relevancy • Query across multiple types of data • Quickly search through large data sets • Deploy with enterprise-grade security • Easily scale with changing workloads • Build retrieval plugins for OpenAI's ChatGPT using Azure OpenAI service
ANN Search Integration • Hybrid search allows you to take advantage of multiple scoring algorithms such as BM25 and ANN vector similarity so you can get the benefits of both keyword search and semantic search
BM25 and ANN Search Integration • Reciprocal Rank Fusion (RRF): A technique for combining the results of multiple search strategies, resulting in improved search relevance and ranking • Azure Cognitive Search incorporates RRF by merging the ranked results of BM25 and ANN search, allowing the best features of both methods to contribute to the final search relevance https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf
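RRF itself is simple enough to sketch in a few lines: each document receives 1 / (k + rank) from every ranked list it appears in, and the sums are sorted. The constant k = 60 follows the Cormack et al. paper linked above; whether the service uses the same constant is an assumption here, and the document IDs are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # keyword (BM25) results
vector_ranking = ["doc_b", "doc_d", "doc_a"]  # ANN vector results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Because only ranks are used, the BM25 and vector similarity scores never need to be normalized onto a common scale. Documents that rank well in both lists rise to the top.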
No Additional Cost
There is no additional cost for using Vector search! You only pay for the storage of the vectors.
Note: Cognitive Search does not generate embeddings out of the box, so you are responsible for the cost of generating your embeddings.
Your Vector Index Size Needs
For more information on how to calculate Vector index size, please visit Vector index size limit - Azure Cognitive Search | Microsoft Learn

Tier     Vector index size (GB/partition)     Max index size for a service (GB)
Basic    1                                    1
S1       3                                    36
S2       12                                   144
S3       36                                   432
L1       12                                   144
L2       36                                   432
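For a rough first check against these quotas, the raw size of the vectors can be estimated as number of vectors × dimensions × 4 bytes, since Collection(Edm.Single) holds 32-bit floats. The sketch below is a lower bound only: it ignores the overhead of the HNSW graph and any other index data, uses 1 GB = 1024^3 bytes as an assumption, and the helper names are hypothetical. The linked documentation remains the authoritative calculation.

```python
BYTES_PER_FLOAT32 = 4  # Collection(Edm.Single) stores 32-bit floats

# Per-partition vector quotas in GB, copied from the tier figures above.
VECTOR_QUOTA_GB_PER_PARTITION = {
    "Basic": 1, "S1": 3, "S2": 12, "S3": 36, "L1": 12, "L2": 36,
}

def raw_vector_size_gb(num_vectors, dimensions):
    """Raw float32 payload only; real index size will be larger."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1024**3

def tiers_that_fit(size_gb, partitions=1):
    """Tiers whose vector quota covers size_gb with the given partition count."""
    return [
        tier
        for tier, quota in VECTOR_QUOTA_GB_PER_PARTITION.items()
        if size_gb <= quota * partitions
    ]

size_gb = raw_vector_size_gb(num_vectors=1_000_000, dimensions=1536)
candidates = tiers_that_fit(size_gb)
```

One million 1536-dimensional vectors come to roughly 5.7 GB of raw floats, which already exceeds a single Basic or S1 partition before any overhead is counted.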
• ... sources: Gather your data sources
• Generate document embeddings: Generate embeddings for your data using your own model
• Add to your index: Insert your vectors into your search index as a collection of floats via the Push API or the Indexer via a Custom Embedding Skill
• Create your vector configuration: Configure your algorithm, similarity function, and parameters
• Generate query embedding: Generate an embedding for your query using the same model as your docs
• Search using vectors: Search your index using a vector representation of your query
Index to Calculate Your Production Requirements
✓ Perform a PoC: Index a representative sample of your production workload using the desired schema
✓ Sample to Production Ratio: Calculate the ratio between the sample index size and raw data source size to estimate the corresponding production index size and data source size
✓ Analyze and Adjust: Consider the estimated number of documents, index size, and data source size for your production workload, and add a buffer for future growth
✓ Choose the Right SKU: Use the estimated production index size and required number of partitions and replicas to determine the appropriate Azure Cognitive Search SKU
✓ Estimate Monthly Cost: Use the pricing calculator to estimate the SKU cost
Note: This estimation is for Azure Cognitive Search service costs only and does NOT include the cost of generating embeddings or other AI Enrichment features. For a more back-of-envelope calculation, see Vector index size limit - Azure Cognitive Search | Microsoft Learn
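The sample-to-production arithmetic can be sketched as follows. All numbers are made-up placeholders, including the growth buffer and the per-partition storage figure; substitute the measurements from your own PoC and the limits of the SKU you are considering.

```python
import math

def estimate_production_index_gb(sample_index_gb, sample_source_gb,
                                 production_source_gb, growth_buffer=0.2):
    """Scale the sample's index-to-source ratio up to production, plus a buffer."""
    ratio = sample_index_gb / sample_source_gb
    return production_source_gb * ratio * (1 + growth_buffer)

def partitions_needed(index_gb, gb_per_partition):
    """Smallest whole number of partitions that can hold the index."""
    return math.ceil(index_gb / gb_per_partition)

# Hypothetical PoC: a 2 GB sample produced a 0.5 GB index.
estimate = estimate_production_index_gb(
    sample_index_gb=0.5,
    sample_source_gb=2.0,
    production_source_gb=400.0,
    growth_buffer=0.25,
)
# 25 GB per partition is an illustrative placeholder, not a quoted limit.
partitions = partitions_needed(estimate, gb_per_partition=25)
```

Here the ratio is 0.25, so 400 GB of source data maps to about 100 GB of index, or 125 GB with a 25% buffer, which needs 5 partitions at the placeholder size.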
Getting Data into Your Search Index: Comparing Push and Pull Approaches for Indexing
• ... custom skills to generate embeddings and process data during indexing; Streamlined Indexing
• "Push": Manual Data Ingestion giving you full control over the indexing process; Quick and easy to get started; High Flexibility
Small World • Performance-Optimized - HNSW is designed to offer a high-performance, memory-efficient solution for approximate nearest neighbor search in high-dimensional spaces • HNSW creates a multi-layer graph structure that enables fast search for nearest neighbors in high-dimensional data • Customize HNSW's behavior by adjusting key parameters for optimal performance and accuracy • Key Parameters: • "m": Controls the degree of the graph, affecting search speed and accuracy • "ef_construction": Influences the index construction time and quality • "ef_search": Determines the search time and accuracy trade-off • "metric": Specifies the distance function used, such as "cosine"
between Recall, Latency, and Indexing
1. Increase 'ef_search' to improve recall without reindexing; monitor for potential latency increases.
2. If increasing 'ef_search' isn't effective or causes high latency, consider reindexing with higher values of 'm' and/or 'ef_construction'.
3. Enhance the quality of the HNSW graph by increasing 'ef_construction', keeping in mind it may result in longer indexing latency.
4. Carefully increase the 'm' value only if other parameters don't sufficiently improve recall after trying previous steps.
Definition
I want to create vector field types that will be supported in my nearest neighbor search ("retrievable" may be true or false).
{
  "name": "contentVector",
  "type": "Collection(Edm.Single)",
  "dimensions": 1536,
  "vectorSearchProfile": "my-vector-profile",
  "searchable": true,
  "retrievable": true,
  "filterable": false,
  "sortable": false,
  "facetable": false
}
to exhaustively search all the vectors in my index to find the ground-truth values. Alternatively, you can use this with smaller index sizes.
{
  "vectorQueries": [
    {
      "kind": "text",
      "text": "healthy foods",
      "fields": "vector"
    }
  ],
  "exhaustive": true
}
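Once exhaustive results are available as ground truth, recall of the approximate (HNSW) results can be measured by comparing the two top-k lists. A minimal sketch, with made-up document IDs and a hypothetical helper name:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    hits = sum(1 for doc_id in approx_ids[:k] if doc_id in truth)
    return hits / len(truth)

exact = ["d1", "d2", "d3", "d4", "d5"]   # from the exhaustive query
approx = ["d1", "d3", "d2", "d9", "d5"]  # from the default ANN query
recall = recall_at_k(approx, exact, k=5)
```

Averaging this value over a representative set of queries gives a number to track while tuning parameters such as 'ef_search'.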
to use vectors with pre-filtering, so that I can limit the number of matched documents. I also want to use query vectorization.
{
  "vectorQueries": [
    {
      "kind": "text",
      "text": "healthy foods",
      "fields": "vector"
    }
  ],
  "vectorFilterMode": "preFilter",
  "filter": "category eq 'fruits'"
}
to only use query vectorizer and to do a vector search and rank my search results by cosine similarity score, so that I get the full user intent of my search results.
{
  "vectorQueries": [
    {
      "kind": "text",
      "text": "healthy foods",
      "fields": "vector"
    }
  ]
}
want to only use a raw query vector and to do a vector search and rank my search results by cosine similarity score, so that I get the full user intent of my search results.
{
  "vectorQueries": [
    {
      "kind": "vector",
      "vector": [1, 2, 3],
      "fields": "vector"
    }
  ]
}
use vectors and search over multiple fields so that I can leverage multiple vector fields into my similarity function.
{
  "vectorQueries": [
    {
      "kind": "vector",
      "vector": [1, 2, 3],
      "fields": "titlevector, contentVector"
    }
  ]
}
hybrid search (text + vectors) so that I can leverage both vectors and keywords for my search relevance.
{
  "vectorQueries": [
    {
      "kind": "text",
      "text": "healthy foods",
      "fields": "vector"
    }
  ],
  "search": "healthy foods"
}
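When these request bodies are built in application code, assembling them as data structures avoids the quoting and comma mistakes that hand-written JSON invites. A minimal sketch: the helper `build_hybrid_query` is hypothetical, it only constructs the body and does not send it, and the property names are the ones shown in the request bodies in this deck.

```python
import json

def build_hybrid_query(text, vector_field, filter_expr=None):
    """Build a hybrid (keyword + vector) request body as a plain dict."""
    body = {
        "search": text,  # keyword part, scored with BM25
        "vectorQueries": [
            {"kind": "text", "text": text, "fields": vector_field}
        ],
    }
    if filter_expr is not None:
        # Apply the filter before the vector search, as in the pre-filter example.
        body["vectorFilterMode"] = "preFilter"
        body["filter"] = filter_expr
    return body

body = build_hybrid_query("healthy foods", "vector",
                          filter_expr="category eq 'fruits'")
payload = json.dumps(body, indent=2)  # valid JSON, ready to send
```

Serializing with json.dumps guarantees straight quotes, correct commas, and real booleans in the payload.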
multi-vector queries to pass in two different query embeddings for my multi-modal search use case using CLIP.
{
  "vectorQueries": [
    {
      "kind": "text",
      "text": "yummy vanilla ice cream",
      "fields": "textVector"
    },
    {
      "kind": "text",
      "text": "vanilla.png",
      "fields": "imageVector"
    }
  ]
}
want to use hybrid search (text + vectors) so that I can take advantage of vectors, keywords, and Semantic search capabilities such as captions and answers.
{
  "vectorQueries": [
    {
      "kind": "text",
      "text": "healthy foods",
      "fields": "vector"
    }
  ],
  "search": "healthy foods",
  "semanticConfiguration": "config",
  "queryType": "semantic",
  "answers": "extractive",
  "captions": "extractive"
}