Slide 1

Slide 1 text

Azure AI Search: Overview
https://aka.ms/mfs_discord
https://aka.ms/daka_linkedin
https://aka.ms/daka_x
https://aka.ms/daka_qiita

Slide 2

Slide 2 text

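Common uses of Azure AI Search

Workplace Search: help internal teams explore databases and files
• Improve efficiency and productivity
• Strengthen data access
• Improve decision making

SaaS Search: build market-ready, customer-facing applications
• Improve the user experience
• Shorten development time

eCommerce: help customers find and buy products and services
• Provide personalized recommendations
• Improve the user experience
• Enhance product discovery
• Increase conversion rates

Website Search: help visitors find information quickly and easily
• Improve findability
• Better understand user behavior and needs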

Slide 3

Slide 3 text

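Azure AI Search
Platform as a service, no management required
• Semantic search
• Keyword search
• Faceting
• Linguistic analysis
• Geospatial support
• Suggestions / autocomplete
• Customizable scoring
• Proximity search
• Synonyms
• Cognitive skills
• And more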

Slide 4

Slide 4 text

• Misspellings
• Geospatial queries
• Filters and facets
• Snippets and highlights
• Suggestions and autocomplete
• Ranking
• Paging

Slide 5

Slide 5 text

Azure AI Search for generative AI
• A feature-rich vector database
• Ingest any data type, from any source
• Seamless data and platform integration
• State-of-the-art search ranking
• Enterprise-ready foundation

Features
• Vector search (generally available)
• Azure AI Search in Azure AI Studio
• Semantic ranker (generally available)
• Integrated vectorization (public preview)

Slide 6

Slide 6 text

Vector search in Azure AI Search: feature-rich and enterprise-ready

Slide 7

Slide 7 text

Vector search in Azure AI Search (generally available)
• A comprehensive vector search solution
• Enterprise-ready: scalability, security, compliance
• Integrated with Semantic Kernel, LangChain, LlamaIndex, Azure OpenAI Service, Azure AI Studio, and more

Slide 8

Slide 8 text

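Vector search strategies

ANN search
• Fast vector search at scale
• Uses HNSW, a graph method with an excellent performance-recall profile
• Fine-grained control over index parameters

    # RawVectorQuery as exposed by the preview azure-search-documents SDK
    from azure.search.documents.models import RawVectorQuery

    r = search_client.search(
        None,
        top=5,
        vector_queries=[RawVectorQuery(
            vector=search_vector, k=5, fields="embedding")])

Exhaustive KNN search
• Selected per query, or built into the schema
• Useful for creating a recall baseline
• Suits scenarios with highly selective filters, for example dense multi-tenant applications

    r = search_client.search(
        None,
        top=5,
        vector_queries=[RawVectorQuery(
            vector=search_vector, k=5, fields="embedding",
            exhaustive=True)])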

Slide 9

Slide 9 text

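Rich vector search query capabilities

Filtered vector search
• Supports date ranges, categories, geographic distance, and more
• Rich filter expressions
• Pre-/post-filtering
  - Pre-filter: well suited to selective filters; recall is not disrupted
  - Post-filter: well suited to less selective filters, but take care that the result set does not end up empty

    # Names as exposed by the preview azure-search-documents SDK
    from azure.search.documents.models import RawVectorQuery, VectorFilterMode

    r = search_client.search(
        None,
        top=5,
        vector_queries=[RawVectorQuery(
            vector=query_vector, k=5, fields="embedding")],
        vector_filter_mode=VectorFilterMode.PRE_FILTER,
        filter="category eq 'perks' and created gt 2023-11-15T00:00:00Z")

Multi-vector scenarios
• Multiple vector fields per document
• Multi-vector queries
• Can be combined as needed

    r = search_client.search(
        None,
        top=5,
        vector_queries=[
            RawVectorQuery(vector=query1, k=5, fields="embedding"),
            RawVectorQuery(vector=query2, k=5, fields="embedding")
        ])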
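The difference between pre- and post-filtering can be sketched in plain Python over a brute-force search. This is only an illustration of the trade-off described on the slide, not how Azure AI Search implements filtering; the toy documents and the helper names (`dot`, `top_k`, `pre_filter`, `post_filter`) are hypothetical.

```python
# Toy corpus: (id, category, embedding). All values are made up for illustration.
DOCS = [
    ("a", "perks", [1.0, 0.0]),
    ("b", "policy", [0.9, 0.1]),
    ("c", "policy", [0.8, 0.2]),
    ("d", "perks", [0.0, 1.0]),
]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def top_k(docs, query, k):
    """Brute-force k nearest neighbours by dot product."""
    return sorted(docs, key=lambda d: dot(d[2], query), reverse=True)[:k]

def pre_filter(docs, query, k, category):
    # Filter first, then take the k nearest among the survivors.
    return top_k([d for d in docs if d[1] == category], query, k)

def post_filter(docs, query, k, category):
    # Take the k nearest overall, then drop those that fail the filter.
    return [d for d in top_k(docs, query, k) if d[1] == category]

query = [1.0, 0.0]
pre = [d[0] for d in pre_filter(DOCS, query, 2, "perks")]
post = [d[0] for d in post_filter(DOCS, query, 2, "perks")]
print(pre)   # pre-filtering still returns k results
print(post)  # post-filtering can return fewer than k
```

With a selective filter, post-filtering loses results because the nearest neighbours are chosen before the filter is applied, which is why the slide warns about empty result sets.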

Slide 10

Slide 10 text

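Enterprise-ready vector database
• Data encryption: including the option of customer-managed encryption keys
• Secure authentication: support for managed identities and RBAC
• Network isolation: private endpoints, virtual networks
• Compliance certifications: broad certifications across many sectors, including finance, healthcare, and government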

Slide 11

Slide 11 text

Not just text
• Images, audio, graphs, and more
• Multimodal embeddings, for example images + text with Azure AI Vision
• These are all vectors → vector search applies
• RAG with images using GPT-4 Turbo with Vision

Slide 12

Slide 12 text

Azure AI Search: seamless data and platform integration

Slide 13

Slide 13 text

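Data preparation for RAG applications

Chunking
• Split long text into shorter passages
  - LLM context length limits
  - Focused subsets of the content
  - Multiple independent passages
• Basics
  - About 200 to 500 tokens per passage
  - Maintain lexical boundaries
  - Introduce overlap
• Layout
  - Layout information is valuable, for example tables

Vectorization
• At indexing time: convert passages into vectors
• At query time: convert the query into a vector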
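The chunking basics above (fixed-size passages with overlap) can be sketched as follows. This is a minimal illustration, assuming whitespace-separated words stand in for model tokens; `chunk_tokens` is a hypothetical helper, and a real pipeline would use the embedding model's tokenizer, passages of roughly 200 to 500 tokens, and sentence or paragraph boundaries.

```python
def chunk_tokens(tokens, size, overlap):
    """Split a token list into windows of `size` tokens sharing `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks

# Whitespace "tokens" are an assumption made to keep the sketch dependency-free.
words = "one two three four five six seven eight nine ten".split()
chunks = chunk_tokens(words, size=4, overlap=1)
for c in chunks:
    print(" ".join(c))
```

Each chunk repeats the last token of the previous one, so a sentence cut at a boundary still appears with some of its context in the next passage.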

Slide 14

Slide 14 text

Azure AI Studio & Azure AI SDK
• First-class integration
• Build indexes from data in Blob Storage, Microsoft Fabric, and more
• Attach to existing Azure AI Search indexes

Slide 15

Slide 15 text

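Integrated vectorization (in preview)
End-to-end data processing tailored to RAG

1. Data source access
   • Blob Storage
   • ADLSv2
   • SQL DB
   • CosmosDB
   • …
   + Incremental change tracking
2. File format parsing
   • PDFs
   • Office documents
   • JSON files
   • …
   + Image and text extraction, with OCR where needed
3. Chunking
   • Split text into passages
   • Propagate document metadata
4. Vectorization
   • Convert chunks into vectors
   • OpenAI embeddings or your own custom model
5. Indexing
   • Document index
   • Chunk index
   • Both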

Slide 16

Slide 16 text

Azure AI Search: a state-of-the-art retrieval system

Slide 17

Slide 17 text

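Semantic ranker (generally available; formerly semantic search)
• State-of-the-art reranking model
• Highest-performing retrieval mode
• New pay-as-you-go pricing: 1,000 requests per month free, then $1 per additional 1,000 requests
• Multilingual support
• Includes extractive answers, captions, and ranking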

Slide 18

Slide 18 text

Relevance
• Relevance is critical for RAG apps
• Many passages in the prompt → quality degrades → you cannot focus on recall alone
• Inaccurate passages in the prompt → answers that may look well grounded but are wrong → it helps to set a threshold for "good enough" grounding data

[Figure: accuracy (y-axis, 50 to 75) versus number of documents in the input context (x-axis, 5 to 30)]
Source: Lost in the Middle: How Language Models Use Long Contexts, Liu et al., arXiv:2307.03172

Slide 19

Slide 19 text

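Improving relevance
Every information retrieval trick applies. The complete search stack gives better results:
• Hybrid search (keyword + vector) > pure vector or keyword search
• Hybrid + reranking > hybrid

Telling good candidates from bad ones
• Normalized scores from the semantic ranker
• Exclude documents below a threshold

[Diagram: Vector and Keywords → Fusion (RRF) → Reranking]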

Slide 20

Slide 20 text

Retrieval relevance by method
[Figure: bar chart (scale 0 to 80) comparing Keyword, Vector (ada-002), Hybrid (Keyword + Vector), and Hybrid + Semantic ranker on Customer datasets [NDCG@3], Beir [NDCG@10], and Miracl [NDCG@10]]
Retrieval comparison using Azure AI Search in various retrieval modes on customer and academic benchmarks
Source: Outperforming vector search with hybrid + reranking

Slide 21

Slide 21 text

Impact of query type on relevance
Source: Outperforming vector search with hybrid + reranking

All values are NDCG@3.

Query type | Keyword | Vector | Hybrid | Hybrid + Semantic ranker
Concept seeking queries | 39 | 45.8 | 46.3 | 59.6
Fact seeking queries | 37.8 | 49 | 49.1 | 63.4
Exact snippet search | 51.1 | 41.5 | 51 | 60.8
Web search-like queries | 41.8 | 46.3 | 50 | 58.9
Keyword queries | 79.2 | 11.7 | 61 | 66.9
Low query/doc term overlap | 23 | 36.1 | 35.9 | 49.1
Queries with misspellings | 28.8 | 39.1 | 40.6 | 54.6
Long queries | 42.7 | 41.6 | 48.1 | 59.4
Medium queries | 38.1 | 44.7 | 46.7 | 59.9
Short queries | 53.1 | 38.8 | 53 | 63.9

Slide 22

Slide 22 text

Retrieval-augmented generation (RAG)

Slide 23

Slide 23 text

1. Find Teams messages about "Falcon Climate Finance" that are relevant to this week
2. Build the prompt: instructions, context, retrieved content
3. Show the results and propagate references

Slide 24

Slide 24 text

RAG – Retrieval Augmented Generation
[Diagram: your custom Copilot, a retrieval system over data sources (files, databases, etc.), and a large language model]

Slide 25

Slide 25 text

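Strengthen RAG with advanced retrieval
Invest in state-of-the-art retrieval technology to improve results

The quality of the retrieval system (the retriever) matters. Azure AI Search is committed to providing the best retrieval solution through:
- Vector search
- Hybrid search
- Advanced filtering
- Document security
- L2 reranking / optimization
- Built-in chunking
- Automatic vectorization
- And many more features!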

Slide 26

Slide 26 text

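Robust retrieval for RAG applications
• The quality of the retrieved data determines the quality of the response
• Keyword search has recall challenges
  - The "vocabulary gap"
  - Accuracy drops further with natural language questions
• Vector-based search finds documents by semantic similarity
  - Robust to variation in how a concept is expressed (word choice, morphology, specificity, and so on)

Example
Question: "I'm looking for lessons on underwater activities"
Won't match (with keyword search): "scuba diving classes", "snorkeling group sessions"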

Slide 27

Slide 27 text

Vectors and vector databases

Slide 28

Slide 28 text

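Vectors

Learned vector representations
• A model encodes items into vectors
• Similar items are mapped to nearby vectors
• Text, images, graphs, and more

Vector search
• Given a "query" vector, find the K nearest vectors
• Search exhaustively, or through an approximation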
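The exhaustive form of "find the K nearest vectors" can be sketched in a few lines. This is a minimal illustration using cosine similarity; the three-dimensional vectors and the `exhaustive_knn` helper are invented for the example, and real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def exhaustive_knn(index, query, k):
    """Score every stored vector against the query and keep the k best."""
    scored = [(cosine(vec, query), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]

# Hypothetical embeddings; similar items are placed close together by hand.
index = {
    "scuba class": [0.9, 0.1, 0.0],
    "snorkeling session": [0.8, 0.3, 0.1],
    "tax seminar": [0.0, 0.2, 0.9],
}
nearest = exhaustive_knn(index, query=[1.0, 0.2, 0.0], k=2)
print(nearest)
```

The cost grows linearly with the number of stored vectors, which is the motivation for the approximate methods mentioned in the last bullet.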

Slide 29

Slide 29 text

Vector databases
• Durably store and index vectors and metadata at scale
• Offer a range of indexing and search strategies
• Combine vector queries with metadata filters
• Enable access control

Slide 30

Slide 30 text

Vector databases in Azure

Vectors in Azure databases
• Keep your data where it is: native vector search capabilities
• Built into the Azure Cosmos DB for MongoDB vCore and Azure Cosmos DB for PostgreSQL services

Azure AI Search
• Best relevance: highest-quality results
• Automatically indexes data from Azure data sources: SQL DB, Cosmos DB, Blob Storage, ADLSv2, and more

Slide 31

Slide 31 text

RAG at scale
Use Azure AI Search to power very large, mission-critical RAG workloads

Slide 32

Slide 32 text

Presentation resources

Repos
• Azure Cognitive Search as a vector database for OpenAI embeddings | OpenAI Cookbook
• Azure Cognitive Search | 🦜🔗 LangChain 0.0.198
• Azure Cognitive Search | LlamaIndex 🦙 0.8.46 (gpt-index.readthedocs.io)
• Azure-Samples/azure-search-openai-demo (github.com)
• Azure-Samples/chat-with-your-data-solution-accelerator: A Solution Accelerator for the RAG pattern running in Azure, using Azure Cognitive Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. This includes most common requirements and best practices. (github.com)
• Azure-Samples/azure-search-comparison-tool: A demo app showcasing Vector Search using Azure Cognitive Search, Azure OpenAI for text embeddings, and Azure AI Vision for image embeddings. (github.com)

Docs
• Vector search - Azure Cognitive Search | Microsoft Learn
• Azure/cognitive-search-vector-pr: Private repository for the Vector search feature in Azure Cognitive Search. (github.com)
• Azure OpenAI Service - Documentation, quickstarts, API reference - Azure Cognitive Services | Microsoft Learn
• Image Retrieval concepts - Image Analysis 4.0 - Azure Cognitive Services | Microsoft Learn
• Azure Cognitive Search: Outperforming vector search with hybrid retrieval and ranking capabilities - Microsoft Community Hub

Slide 33

Slide 33 text

The following pages are reference material (English only)
• Understanding Vector Search
• Vector Search in Azure AI Search

Slide 34

Slide 34 text

How might we take our enterprise search or RAG scenarios to the next level?

Slide 35

Slide 35 text

Understanding Vector Search

Slide 36

Slide 36 text

Let's review the basics!
Traditional information retrieval
• Query: Formal statement representing your information need (e.g., search string)
• Object: Entity within your content collection (e.g., document, image, audio)
• Relevance: Quantitative measure of how well an object satisfies the intent of the query
• Ranking: Ordered list of relevant results based on their desirability or relevance score

Slide 37

Slide 37 text

Let's review the basics!
Search via Inverted Indexes

Document 1: "apple orange banana"
Document 2: "orange apple grape"
Document 3: "banana grape apple"

Dictionary (Term, Freq) and Postings Lists (Documents):

Term | Freq | Documents
apple | 3 | 1, 2, 3
orange | 2 | 1, 2
banana | 2 | 1, 3
grape | 2 | 2, 3
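The dictionary and postings lists for these three documents can be rebuilt with a short sketch. It assumes plain whitespace tokenization, and `build_inverted_index` is a hypothetical helper name; a production engine would also apply the linguistic analysis mentioned earlier in the deck.

```python
from collections import defaultdict

docs = {
    1: "apple orange banana",
    2: "orange apple grape",
    3: "banana grape apple",
}

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

index = build_inverted_index(docs)
for term, ids in index.items():
    print(term, len(ids), ids)

# A Boolean AND query is the intersection of two postings lists.
both = sorted(set(index["orange"]) & set(index["grape"]))
print(both)
```

Looking up a term is a dictionary access, so query cost depends on the length of the postings lists rather than on the size of the whole collection.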

Slide 38

Slide 38 text

Let's review the basics!
Ranking & Relevance in Traditional Search
• Relevance via Boolean Search: Retrieve documents containing specific terms (e.g., "apples" AND "oranges")
• Ranking via BM25: A ranking algorithm influenced by 3 key factors:
  - Term Frequency (TF): More occurrences of the search term indicate higher relevance
  - Inverse Document Frequency (IDF): The rarer a term across documents, the more important it is
  - Field Length: Terms found in shorter fields (fewer words) are more likely to be relevant than terms in longer fields (more words)

[Figure: BM25 formula]
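The three factors can be seen in a small scoring sketch. It uses one widely used BM25 formulation with the customary parameters `k1=1.2` and `b=0.75`; this is an assumption for illustration, since the slide's formula image is not available and the exact variant used by any given engine may differ.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query with a common BM25 variant."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)                          # term frequency
        df = sum(1 for d in corpus if term in d)      # document frequency
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        length_norm = 1 - b + b * len(doc) / avg_len  # penalizes longer fields
        score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
    return score

# Hypothetical corpus: the third document is longer than the other two.
corpus = [
    "apple orange banana".split(),
    "orange apple grape".split(),
    "banana grape apple apple pie recipe".split(),
]
scores = [bm25_score(["grape"], d, corpus) for d in corpus]
print(scores)
```

The second and third documents both contain "grape" once, but the shorter one scores higher, which is the field-length effect described above.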

Slide 39

Slide 39 text

Combining Lexical and Semantic Representations for Optimal Recall
Leveraging the Strengths of Lexical and Semantic Approaches in Retrieval

• Discrete (Lexical) Representations
  - Advantages: exact matching; precise control and easy explainability
  - Limitations: struggle to capture nuances in language; limited understanding of conceptual similarity
• Dense (Vector/Semantic) Representations
  - Advantages: capture conceptual similarity; better understanding of language nuances
  - Limitations: not built to match exact terms; reduced explainability compared to discrete

Bottom Line: Achieve optimal recall by leveraging the strengths of both discrete and semantic representations for a comprehensive understanding of language

Slide 40

Slide 40 text

Search + AI: Better Together
• Search → AI: better information in the context
• AI → Search: better representations of data (embeddings)

Slide 41

Slide 41 text

Vector Search at a High Level
• Scenario: A diverse collection of books, each containing unique insights and knowledge.
• The Challenge: Finding a book on a specific topic or theme can be time-consuming and overwhelming, especially when the content is scattered.
• The Solution: A skilled librarian can quickly connect you to books with similar topics or themes.

Slide 42

Slide 42 text

Vector Search: Deeper Dive
• Scenario: Organizations need efficient methods to retrieve semantically similar items from large-scale data sources.
• The Challenge: Sifting through large-scale databases to find related items can be resource-intensive and time-consuming.
• The Solution: By calculating similarity metrics like cosine similarity, vector search organizes data and retrieves semantically similar items within the high-dimensional space.

Slide 43

Slide 43 text

How can I create vector representations of my data?

Slide 44

Slide 44 text

Embeddings: Convert Data into Vector Representation
Simplifying complex data structures for efficient analysis and processing in various applications.

Definition
• Abstract, dense, compact, learned numerical representations of data
• Map complex structures into simpler, fixed-size vectors
• Applicable to diverse data types (text, images, audio, etc.)

Purpose
• Facilitate analysis and processing of diverse data types
• Enable similarity measurement, clustering, and classification
• Power applications like vector search and recommendation systems

Benefits
• Efficient search and organization of vast datasets
• Improved accuracy and relevance of search results
• Scalable and adaptable to various industries and use cases

Slide 45

Slide 45 text

Why do I need embeddings?
Vectors are a universal representation of data
[Figure: files of 3.4MB, 4.1MB, and 1.1GB, each mapped to a vector such as (0.1912, 0.4123, ..., 0.9128)]

Slide 46

Slide 46 text

©Microsoft Corporation Azure

Find relevant objects with embeddings
Convert data into vectors (embeddings) and find the most similar objects according to a metric.

[Diagram: data sources (images, audio, video, text, and more) are transformed by an embedding model into vector representations, for example (-2, -1, 0, 1), (2, 3, 4, 5), (6, 7, 8, 9), and stored in a vector index in Azure Cognitive Search. A query from the app/UX is transformed by an embedding model into a vector, for example (2, 3, 4, 5), and the most similar objects are returned as results.]

Slide 47

Slide 47 text

Choosing an Embedding Model
Key Factors for Selecting the Optimal Model for Your Use Case

Model Characteristics
• Task Specificity
• Performance
• Context Awareness
• Model Size and Inference Speed
• Language Support
• Customizability (ability to fine-tune)

Implementation Considerations
• Training Time and Complexity
• Pre-Trained Models
• Integration
• Community Support and Updates
• Cost

Recommendations
• Text embeddings: Azure OpenAI Service "text-embedding-ada-002"
• Image embeddings: Azure AI Vision Image Retrieval API

Slide 48

Slide 48 text

Approximate Nearest Neighbor Search
Efficient and Scalable Similarity Search with AI

• Definition: A fast search method for finding approximate nearest neighbors in high-dimensional spaces
• Purpose: Applicable in image search, NLP, recommendation systems
• Benefits: Provides faster search results in high-dimensional data by trading off a small degree of recall for significant performance gains compared to exhaustive vector search

Vector indexes are data structures that let us perform approximate nearest neighbor search.

Slide 49

Slide 49 text

Similarity Metrics
• Cosine similarity
• Dot product
• Euclidean distance
• Angular
• Jaccard
• Many more!

https://platform.openai.com/docs/guides/embeddings/which-distance-function-should-i-use
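The first three metrics are closely related, which a short sketch makes concrete: for unit-length vectors, cosine similarity equals the dot product, and squared Euclidean distance equals 2 - 2·cosine, so all three produce the same ranking. The vectors below are arbitrary examples, and the helper names are hypothetical; angular and Jaccard measures are not covered here.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine_similarity(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def euclidean_distance(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def normalize(u):
    n = norm(u)
    return [x / n for x in u]

# Two arbitrary vectors, scaled to unit length.
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

cos_ab = cosine_similarity(a, b)
dot_ab = dot(a, b)
dist_ab = euclidean_distance(a, b)
print(cos_ab, dot_ab, dist_ab)
```

When the embedding model already returns normalized vectors, the choice among these three metrics therefore affects speed more than result order.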

Slide 50

Slide 50 text

Common Approximate Nearest Neighbor Algorithms
Exploring AI Techniques for Efficient Similarity Search

HNSW (Hierarchical Navigable Small World)
• Hierarchical graph structure
• Fast search performance

FLAT (Brute Force)
• Exhaustive search in high-dimensional data
• Slower but highly accurate

LSH (Locality-Sensitive Hashing)
• Hash-based similarity search
• Trade-off between speed and accuracy

IVF (Inverted File Index)
• Reduces search space using quantization
• Scalable and memory-efficient

https://github.com/erikbern/ann-benchmarks

When productionizing, the algorithm you choose matters less than the information retrieval system it is part of.

Slide 51

Slide 51 text

Limitations of common OSS ANN indexes
ANN indexes on their own are simply a data structure.
• Limited Scalability: Struggle with vertical and horizontal scaling in large-scale data sets
• Memory Constraints: High memory consumption affecting search performance and resource efficiency
• Persistence and Durability: Lack of built-in mechanisms for data storage, recovery, and metadata management
• Simplistic Query Support: Limited capabilities for combining sparse and dense retrieval methods
• Hosting Challenges: Complex setup and hosting requirements
• No Built-in Security Features: Open-source solutions often lack advanced security features
• Embedding Management: Limited support for managing the embedding functions themselves

Slide 52

Slide 52 text

The Dream Scenario for Vector Search
Effortless Data Management and Relevant Search Results

1. Dump a Bunch of Data
2. Run a Query
3. Get the Most Relevant Data Back

Common Challenges
- Scalability
- Preprocessing
- Splitting/Chunking
- Embedding management
- Query understanding
- Query flexibility
- Ranking accuracy
- Result Diversity
- Search algorithm

Slide 53

Slide 53 text

Scalability in Vector Search
Key Questions and Considerations for Efficient Scaling
• Data Volume: Can the system handle increasing amounts of data?
  - Storage capacity and management
  - Indexing and search performance
• Query Load: How well does the system respond to growing query demands?
  - Query execution speed and response times
  - Handling concurrent queries and user connections
• Distributed Infrastructure: Does the system support distributed and parallel processing?
  - Horizontal scaling across multiple nodes
  - Load balancing and fault tolerance
• Cost Efficiency: How does the system optimize resource usage and cost management?
  - Balancing performance and cost requirements
  - Efficient use of hardware and cloud resources

Slide 54

Slide 54 text

Preprocessing and Document Chunking
Optimizing Data Preparation for Efficient Vector Search
• Text Preprocessing: Ensuring clean and structured data for the embedding model
  - Tokenization (or segmentation): Breaking text into words, phrases, or symbols
  - Lowercasing and normalization: Standardizing text representation
  - Stopword removal: Eliminating common words with little semantic value
  - Stemming and lemmatization: Reducing words to their root forms
• Document Splitting: Adapting documents to fit within embedding model limits
  - Chunking: Dividing long documents into smaller, manageable sections
  - Passage extraction: Identifying and retaining meaningful segments
  - Overlap management: Ensuring continuity and context preservation
• Model Compatibility: Preparing data to align with the chosen embedding model
  - Input requirements: Adhering to model-specific formatting and length constraints
  - Vocabulary coverage: Maximizing the overlap between document vocabulary and model vocabulary
• Evaluation and Iteration: Continuously improving preprocessing and splitting strategies
  - Performance monitoring: Assessing the impact of preprocessing and splitting on search quality
  - Strategy refinement: Adjusting techniques based on observed results and user feedback

Slide 55

Slide 55 text

Challenge of Embedding Management
Overcoming Embedding Management in Vector Search
• Embedding Quality: Ensuring high-quality and accurate vector representations
  - Selecting appropriate embedding models (e.g., OpenAI, BERT)
  - Fine-tuning models for domain-specific vocabulary and context
• Dimensionality: Balancing embedding size and search performance
  - Reducing dimensions while retaining semantic information
  - Implementing dimensionality reduction techniques (e.g., PCA, t-SNE)
• Indexing and Storage: Efficiently managing and storing embeddings
  - Using optimized data structures for quick look-up and retrieval (e.g., Approximate Nearest Neighbors)
• Embedding Updates: Keeping vector representations up-to-date with evolving data
  - Incremental updates to embeddings based on new or updated documents
  - Periodic model retraining for continuous improvement and/or model version updating
• Evaluation and Iteration: Continuously assessing and refining embedding management strategies
  - Monitoring performance metrics (e.g., search relevance, recall, precision)
  - Adjusting techniques based on observed results and user feedback

Slide 56

Slide 56 text

Addressing the Query Language Challenge
Enhancing Vector Search Through Improved Query Understanding
• Beyond Similarity: Addressing complex search scenarios beyond "most similar documents"
  - Understanding user intent: Identifying specific search goals and requirements
• Query Flexibility: Supporting various search parameters and filters
  - Boolean operators: Handling AND, OR, and NOT conditions
  - Filtering and Faceting: Allowing users to filter results based on specific attributes
• Query Transformation: Converting user queries into vector representations
  - Text-to-vector conversion: Transforming query text into compatible embeddings
  - Query expansion: Incorporating additional keywords or phrases to improve search relevance
• Evaluation and Iteration: Continuously refining query language understanding
  - Monitoring query performance metrics (e.g., query success rate, user satisfaction)
  - Adjusting techniques based on observed results and user feedback

Slide 57

Slide 57 text

Enhancing Search Relevance in Vector Search
Achieving Accurate Ranking, Result Diversity, and Adaptability
• Ranking Accuracy: Ensuring highly relevant results are ranked at the top
  - Hyperparameter tuning: Adjust hyperparameters as needed to trade off recall against latency
  - Rank fusion (hybrid, re-ranker, HyDE): Combining multiple ranking signals for improved accuracy
• Result Diversity: Balancing the variety and relevance of search results
  - Diversification strategies: Introducing variety while maintaining relevance
  - Document-level vs. chunk-level search: Considering the impact of chunking long documents
    · More focused and relevant results from individual chunks (good or bad? It depends on the task)
    · Top results may all belong to the same document, reducing result diversity (good or bad? It depends on the task)
• Search Algorithm Adaptability: Customizing search behavior based on the task at hand
  - Task-oriented search: Adjusting search algorithms for specific tasks or user requirements
• Evaluation and Iteration: Continuously refining search relevance strategies
  - Monitoring search performance metrics (e.g., precision, recall, user satisfaction)
  - Adjusting techniques based on observed results and user feedback

Slide 58

Slide 58 text

How do I deal with these common challenges?

Slide 59

Slide 59 text

Vector Search in Azure AI Search

Slide 60

Slide 60 text

Introducing vector search in Azure AI Search
Revolutionize indexing and retrieval augmented generation for LLM Apps

Supported data: images, audio, video, graphs, text
• Leverage data from any data store
• Improve relevancy
• Query across multiple types of data
• Quickly search through large data sets
• Deploy with enterprise-grade security
• Easily scale with changing workloads
• Build retrieval plugins for OpenAI's ChatGPT using Azure OpenAI service

Slide 61

Slide 61 text

What is vector search?
Convert data into vector representations where distances represent similarity.

[Diagram: data sources (images, audio, video, text, and more) are transformed into embeddings, for example (-2, -1, 0, 1), (2, 3, 4, 5), (6, 7, 8, 9), and indexed in Azure AI Search. A query from the app/UX is transformed into an embedding, for example (2, 3, 4, 5), and approximate nearest neighbor search returns the closest vectors as results.]

Slide 62

Slide 62 text

Retrieval Modes
• Full-text search (BM25)
• Pure vector search (ANN)
• Hybrid search (BM25 + ANN)

Capabilities compared across the modes: exact keyword match, proximity search, term weighting, semantic similarity search, multi-modal search, multi-lingual search

Vector search is good, but hybrid search is even better!

Slide 63

Slide 63 text

Why is Hybrid Search important?
Hybrid Queries with BM25 and ANN Search Integration
• Hybrid search lets you take advantage of multiple scoring algorithms, such as BM25 and ANN vector similarity, so you get the benefits of both keyword search and semantic search

Slide 64

Slide 64 text

Reciprocal Rank Fusion in Azure AI Search
Hybrid Queries with BM25 and ANN Search Integration
• Reciprocal Rank Fusion (RRF): A technique for combining the results of multiple search strategies, resulting in improved search relevance and ranking
• Azure AI Search incorporates RRF by merging the ranked results of BM25 and ANN search, allowing the best features of both methods to contribute to the final search relevance

https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf
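The fusion step can be sketched as follows. Each document receives 1 / (k + rank) from every ranked list it appears in, and the sums are sorted. The constant `k=60` is the value used in the linked paper; whether the service uses the same constant is not stated on the slide, and the document ids below are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids; each hit adds 1 / (k + rank), rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

keyword_hits = ["doc1", "doc2", "doc3"]   # e.g. BM25 order
vector_hits = ["doc3", "doc1", "doc4"]    # e.g. ANN order
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because only ranks are used, the BM25 and vector similarity scores never need to be put on a common scale, and documents found by both methods rise to the top.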

Slide 65

Slide 65 text

Cost and Pricing
Harness the Power of Vector Search at No Additional Cost
• No additional cost for using vector search: you only pay for the storage of the vectors.
• Note: Cognitive Search does not generate embeddings out of the box, so you are responsible for the cost of generating your embeddings.

Slide 66

Slide 66 text

Scale: OpenAI Text-Embedding-Ada-002 Example
Choose the Right Tier to Meet Your Vector Index Size Needs

Tier | Vector index size (GB/partition) | Max index size for a service (GB)
Basic | 1 | 1
S1 | 3 | 36
S2 | 12 | 144
S3 | 36 | 432
L1 | 12 | 144
L2 | 36 | 432

For more information on how to calculate vector index size, see Vector index size limit - Azure Cognitive Search | Microsoft Learn
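A rough capacity check against these limits can be sketched as below. It assumes 4-byte float32 components, treats 1 GB as 1024³ bytes, and ignores all index overhead (such as the HNSW graph), so the results are upper bounds only; the helper names are hypothetical, and the linked documentation describes the actual calculation.

```python
BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3  # assumption: the quota is counted in binary gigabytes

def raw_vector_bytes(num_vectors, dimensions):
    """Raw storage for float32 vectors, ignoring any index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32

def max_vectors(quota_gb, dimensions):
    """Upper bound on vectors that fit in a quota, again ignoring overhead."""
    return (quota_gb * GIB) // (dimensions * BYTES_PER_FLOAT32)

ADA_002_DIMENSIONS = 1536  # matches the "dimensions": 1536 field used in this deck
per_vector = raw_vector_bytes(1, ADA_002_DIMENSIONS)
print(per_vector)
print(max_vectors(1, ADA_002_DIMENSIONS))   # Basic: 1 GB per partition
print(max_vectors(36, ADA_002_DIMENSIONS))  # S1: 36 GB per service
```

Real capacity is lower than these bounds, so a proof-of-concept index remains the more reliable way to size a service.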

Slide 67

Slide 67 text

How do I get started with vector search?
1. Ingest data sources: Gather your data sources
2. Generate document embeddings: Generate embeddings for your data using your own model
3. Add to your index: Insert your vectors into your search index as a collection of floats, via the Push API or via the Indexer with a Custom Embedding Skill
4. Create your vector configuration: Configure your algorithm, similarity function, and parameters
5. Generate query embedding: Generate an embedding for your query using the same model as your docs
6. Search using vectors: Search your index using a vector representation of your query

Slide 68

Slide 68 text

Estimating the Right SKU for Your Needs
Utilize a PoC Index to Calculate Your Production Requirements
✓ Perform a PoC: Index a representative sample of your production workload using the desired schema
✓ Sample to Production Ratio: Calculate the ratio between the sample index size and raw data source size to estimate the corresponding production index size and data source size
✓ Analyze and Adjust: Consider the estimated number of documents, index size, and data source size for your production workload, and add a buffer for future growth
✓ Choose the Right SKU: Use the estimated production index size and required number of partitions and replicas to determine the appropriate Azure Cognitive Search SKU
✓ Estimate Monthly Cost: Use the pricing calculator to estimate the SKU cost

Note: This estimation is for Azure Cognitive Search service costs only and does NOT include the cost of generating embeddings or other AI Enrichment features. For a more back-of-envelope calculation, see Vector index size limit - Azure Cognitive Search | Microsoft Learn
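The ratio-based extrapolation can be sketched as simple arithmetic. All numbers below are hypothetical, including the 20% growth buffer and the 25 GB per-partition storage figure, which should be replaced with the limits of the tier under consideration.

```python
import math

def estimate_production_index_gb(sample_index_gb, sample_source_gb,
                                 production_source_gb, growth_buffer=0.2):
    """Scale the PoC index-to-source ratio up to the production data size."""
    ratio = sample_index_gb / sample_source_gb
    return production_source_gb * ratio * (1 + growth_buffer)

def partitions_needed(index_gb, gb_per_partition):
    """Smallest number of partitions whose combined storage holds the index."""
    return math.ceil(index_gb / gb_per_partition)

# Hypothetical PoC: 10 GB of source data produced a 2.5 GB index.
estimate = estimate_production_index_gb(2.5, 10, production_source_gb=400)
print(estimate)
print(partitions_needed(estimate, gb_per_partition=25))
```

The estimate assumes the sample is representative; a schema change or a different mix of document types between the PoC and production would change the ratio.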

Slide 69

Slide 69 text

Getting Data into Your Search Index
Comparing Push and Pull Approaches for Indexing

"Pull"
• Automated data ingestion using our Indexer
• Utilize custom skills to generate embeddings and process data during indexing
• Streamlined indexing

"Push"
• Manual data ingestion, giving you full control over the indexing process
• Quick and easy to get started
• High flexibility

Slide 70

Slide 70 text

Deep Dive into HNSW Algorithm
"Performance-Optimized" Search with Hierarchical Navigable Small World
• Performance-optimized: HNSW is designed to offer a high-performance, memory-efficient solution for approximate nearest neighbor search in high-dimensional spaces
• HNSW creates a multi-layer graph structure that enables fast search for nearest neighbors in high-dimensional data
• Customize HNSW's behavior by adjusting key parameters for optimal performance and accuracy
• Key parameters:
  - "m": Controls the degree of the graph, affecting search speed and accuracy
  - "ef_construction": Influences the index construction time and quality
  - "ef_search": Determines the search time and accuracy trade-off
  - "metric": Specifies the distance function used, such as "cosine"

Slide 71

Slide 71 text

Tuning HNSW Parameters for Optimal Performance
Striking the Right Balance between Recall, Latency, and Indexing
1. Increase 'ef_search' to improve recall without reindexing; monitor for potential latency increases.
2. If increasing 'ef_search' isn't effective or causes high latency, consider reindexing with higher values of 'm' and/or 'ef_construction'.
3. Enhance the quality of the HNSW graph by increasing 'ef_construction', keeping in mind it may result in longer indexing latency.
4. Carefully increase the 'm' value only if other parameters don't sufficiently improve recall after trying the previous steps.
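As a sketch of where these knobs live, the snippet below builds an HNSW algorithm entry of the kind used in a `vectorSearch` configuration. It assumes the REST property names `hnswParameters`, `m`, `efConstruction`, and `efSearch` (the camelCase counterparts of the names on the slide); the numeric values are illustrative only, and both names and allowed ranges should be checked against the API version in use.

```python
import json

def hnsw_algorithm(name, m=4, ef_construction=400, ef_search=500, metric="cosine"):
    """Build an HNSW algorithm entry for a vectorSearch configuration."""
    return {
        "name": name,
        "kind": "hnsw",
        # Property names are assumed from the REST API; verify before use.
        "hnswParameters": {
            "m": m,
            "efConstruction": ef_construction,
            "efSearch": ef_search,
            "metric": metric,
        },
    }

baseline = hnsw_algorithm("myHnsw")
# Step 1 of the tuning order: raise efSearch only, leaving the graph unchanged.
higher_recall = hnsw_algorithm("myHnswHighRecall", ef_search=800)
print(json.dumps(higher_recall, indent=2))
```

Keeping a baseline and a tuned configuration side by side makes it easier to compare recall against an exhaustive KNN run before committing to a reindex.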

Slide 72

Slide 72 text

Introducing semantic ranker
Outperform vector search with hybrid search + semantic re-ranking
• SOTA re-ranking encoder model
• Highest performing retrieval mode
• Free 1000 queries/month
• Multilingual capabilities
• Includes extractive answers, captions, and highlights just like Bing.com

Search Configuration | Customer datasets [NDCG@3] | Beir [NDCG@10] | Multilingual Academic (MIRACL) [NDCG@10]
Keyword | 40.6 | 40.6 | 49.6
Vector (Ada-002) | 43.8 | 45.0 | 58.3
Hybrid (Keyword + Vector) | 48.4 | 48.4 | 58.8
Hybrid + Semantic ranker | 60.1 | 50.0 | 72.0

Slide 73

Slide 73 text

Classified as Microsoft Confidential

Vector Configuration w/Azure OpenAI service Vectorizer
I want to create a vector search configuration so that I can set the appropriate parameters for my search experience.

"vectorSearch": {
    "algorithms": [
        { "name": "myHnsw", "kind": "hnsw" },
        { "name": "myExhaustiveKNN", "kind": "exhaustiveKnn" }
    ],
    "vectorizers": [
        {
            "name": "myAzureOpenAIVectorizer",
            "kind": "azureOpenAI",
            "azureOpenAIParameters": {
                "resourceUri": "https://my-openai.openai.azure.com",
                "apiKey": "xxx",
                "deploymentId": "text-embedding-ada-002"
            }
        }
    ],
    "profiles": [
        {
            "name": "myHnswProfile",
            "algorithm": "myHnsw",
            "vectorizer": "myAzureOpenAIVectorizer"
        }
    ]
}

Slide 74

Slide 74 text

Vector Configuration w/Custom Vectorizer
I want to create a vector search configuration so that I can set the appropriate parameters for my search experience.

"vectorSearch": {
    "algorithms": [
        { "name": "myHnsw", "kind": "hnsw" },
        { "name": "myExhaustiveKNN", "kind": "exhaustiveKnn" }
    ],
    "vectorizers": [
        {
            "name": "myCustomVectorizer",
            "kind": "customWebApi",
            "customVectorizerParameters": {
                "authIdentity": "user-assigned",
                "httpHeaders": "application/json",
                "httpMethod": "POST",
                "uri": "https://my-custom-embedding-model.azure.com"
            }
        }
    ],
    "profiles": [
        {
            "name": "myHnswProfile",
            "algorithm": "myHnsw",
            "vectorizer": "myCustomVectorizer"
        }
    ]
}

Slide 75

Slide 75 text

Configure Vector fields in your Index Definition
I want to create vector field types that will be supported in my nearest neighbor search.

{
    "name": "contentVector",
    "type": "Collection(Edm.Single)",
    "dimensions": 1536,
    "vectorSearchProfile": "my-vector-profile",
    "searchable": true,
    "retrievable": true,
    "filterable": false,
    "sortable": false,
    "facetable": false
}

"retrievable" may be set to either true or false.

Slide 76

Slide 76 text

Pure vector search (Exhaustive)
I want to exhaustively search all the vectors in my index to find the ground-truth values. Alternatively, you can use this with smaller index sizes.

{
    "vectorQueries": [
        {
            "kind": "text",
            "text": "healthy foods",
            "fields": "vector",
            "exhaustive": true
        }
    ]
}

Slide 77

Slide 77 text

Vector search with Filters
I want to use vectors with pre-filtering, so that I can limit the number of matched documents. I also want to use query vectorization.

{
    "vectorQueries": [
        {
            "kind": "text",
            "text": "healthy foods",
            "fields": "vector"
        }
    ],
    "vectorFilterMode": "preFilter",
    "filter": "category eq 'fruits'"
}

Slide 78

Slide 78 text

Pure Vector search w/Vectorizer
I want to use only the query vectorizer to run a vector search and rank my results by cosine similarity score, so that the results reflect the full intent of my query.

{
    "vectorQueries": [
        {
            "kind": "text",
            "text": "healthy foods",
            "fields": "vector"
        }
    ]
}

Slide 79

Slide 79 text

Pure Vector search w/Raw Vector
I want to pass in a raw query vector that I generated myself, run a vector search, and rank my results by cosine similarity score, so that the results reflect the full intent of my query.

{
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [1, 2, 3],
            "fields": "vector"
        }
    ]
}

Slide 80

Slide 80 text

Cross-Field Vector Query
I want to use vectors and search over multiple fields so that several vector fields contribute to my similarity ranking.

{
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [1, 2, 3],
            "fields": "titleVector, contentVector"
        }
    ]
}

Slide 81

Slide 81 text

Hybrid Search
I want to use hybrid search (text + vectors) so that I can leverage both vectors and keywords for my search relevance.

{
    "vectorQueries": [
        {
            "kind": "text",
            "text": "healthy foods",
            "fields": "vector"
        }
    ],
    "search": "healthy foods"
}

Slide 82

Slide 82 text

Multi-Vector Query
I want to use multi-vector queries to pass in two different query embeddings for my multi-modal search use case using CLIP.

{
    "vectorQueries": [
        {
            "kind": "text",
            "text": "yummy vanilla ice cream",
            "fields": "textVector"
        },
        {
            "kind": "text",
            "text": "vanilla.png",
            "fields": "imageVector"
        }
    ]
}

Slide 83

Slide 83 text

Hybrid Search with Semantic reranking
I want to use hybrid search (text + vectors) so that I can take advantage of vectors, keywords, and semantic search capabilities such as captions and answers.

{
    "vectorQueries": [
        {
            "kind": "text",
            "text": "healthy foods",
            "fields": "vector"
        }
    ],
    "search": "healthy foods",
    "semanticConfiguration": "config",
    "queryType": "semantic",
    "answers": "extractive",
    "captions": "extractive"
}

Slide 84

Slide 84 text

Generate document embeddings
docsEmbeddings.py

    import openai  # legacy openai<1.0 client, assumed to be configured for Azure OpenAI

    # generate document embeddings
    def generate_embeddings(text):
        response = openai.Embedding.create(
            input=text, engine="text-embedding-ada-002")
        embeddings = response['data'][0]['embedding']
        return embeddings

Slide 85

Slide 85 text

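Generate query embedding
queryEmbedding.py

    import openai  # legacy openai<1.0 client, assumed to be configured for Azure OpenAI

    # generate query embedding with the same model used for the documents
    response = openai.Embedding.create(
        input="healthy foods",
        engine="text-embedding-ada-002"
    )
    embeddings = response['data'][0]['embedding']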

Slide 86

Slide 86 text

Thank you