.NET Day 2025: Supercharged Search with Semantic Search and Vector Embeddings

dotnetday

August 29, 2025

Transcript

  1. Giorgi Dalakishvili

    • 17 years of experience
    • .NET, C#, Postgres, SQL Server
    • Follow me at https://github.com/Giorgi/
    • Website: https://giorgi.dev
    • Visualize query plans with EFCore.Visualizer
    • Handle exceptions with EntityFramework.Exceptions
  2. Semantic Search

    • Semantic search: returns results that match based on meaning. It might return results without direct word matches, but they will match the user's intent. Example: "Learning how to use a database" – "SQL Crash Course"
    • Lexical search: returns results that match the exact words or derived words
  3. Benefits of Semantic Search

    • Can distinguish between meanings: "chocolate milk" vs. "milk chocolate"
    • Easier for users: no need to remember exact terms or names; vague search queries work
    • Concepts are more robust than keywords: by matching concepts rather than keywords, semantic search produces more accurate results
    • Better for business: semantic search can boost sales and customer satisfaction
  4. Semantic Search – How it Works

    • LLMs: trained on vast amounts of text data, they understand context and relationships between words. LLMs form the foundation for capturing the semantics of language.
    • Embeddings: LLMs generate vector representations of text, called embeddings, that encapsulate its semantic meaning. Embeddings map textual data to a high-dimensional space, where items with similar meanings are closer together.
    • Similarity metrics: measure the similarity between vectors. The relevance of search results is determined by comparing the query's embedding to the embeddings of the documents in the search index.
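The three stages on this slide can be sketched in a few lines of Python. This is only an illustration: the document titles and the 3-dimensional vectors are made up by hand, whereas a real system would obtain embeddings with hundreds or thousands of dimensions from a model. The `search` helper is a hypothetical name, not part of any library.

```python
import math

# Hypothetical, hand-made 3-dimensional "embeddings" (assumption: a real
# embedding model would produce these vectors, with far more dimensions).
DOCUMENTS = {
    "SQL Crash Course": [0.9, 0.1, 0.0],
    "Baking Bread at Home": [0.0, 0.2, 0.9],
    "Intro to Relational Databases": [0.8, 0.3, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, documents, top_k=2):
    """Rank documents by similarity to the query embedding, best first."""
    scored = [
        (cosine_similarity(query_embedding, embedding), title)
        for title, embedding in documents.items()
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


# Pretend this is the embedding of "Learning how to use a database".
query = [0.85, 0.2, 0.05]
results = search(query, DOCUMENTS)
```

With these toy vectors, both database-related titles rank above the baking one, even though the query shares no words with "SQL Crash Course".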
  5. Vector Embeddings

    • Numeric representation of data, for example [0.2, 0.8, -0.4, 0.6, ...]. They represent the meaning and the context as processed by the model.
    • Can be generated from any kind of data: text, images, audio, etc.
    • Two objects with similar semantics will have vectors close to each other
    • Used for semantic search, text classification, anomaly detection, and recommendation systems
  6. Similarity Metrics

    • A function that takes two vectors as input and calculates a distance value between them
    • Use the calculated distance to judge how close or far apart two vector embeddings are
    • No "one size fits all" distance metric
    • Different similarity measures trade off speed and accuracy differently
    • Use the distance metric that matches the embedding model
  7. Similarity Metrics

    Metric               Meaning                                        Properties considered
    Euclidean distance   Distance between ends of vectors               Magnitudes and direction
    Cosine               Cosine of angle between vectors                Only direction
    Dot product          Cosine multiplied by lengths of both vectors   Magnitudes and direction
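To make the table concrete, here is a minimal Python sketch of the three metrics, applied to two made-up vectors that point in the same direction but differ in length. The function names are illustrative and not taken from any particular library.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; grows with both magnitude and alignment."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between the ends of the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product divided by both lengths, so only direction matters."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


a = [1.0, 2.0]
b = [2.0, 4.0]  # same direction as a, twice the length

cosine = cosine_similarity(a, b)      # ignores the length difference
euclidean = euclidean_distance(a, b)  # non-zero because the lengths differ
dot = dot_product(a, b)               # scales with the lengths
```

Cosine treats `a` and `b` as identical (similarity of 1), while the Euclidean distance and the dot product both change if either vector is scaled, which matches the "Properties considered" column.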
  8. Embedding Generator

    Third-Party API
    • OpenAI API
    • Azure OpenAI
    • Hugging Face Inference API

    Run Model Locally
    • Ollama, OllamaSharp
    • Run ONNX models with Semantic Kernel
    • SentenceTransformers (Python)

    Microsoft.Extensions.AI
    • Unified approach for representing AI components
    • IEmbeddingGenerator: abstraction over a generator of embeddings
    • Multiple implementations: Ollama, OpenAI, ONNX, etc.
  9. Vector Stores

    Vector Database
    • Chroma
    • Pinecone
    • Milvus
    • Qdrant
    • Weaviate

    Relational
    • SQL Server 2025
    • PostgreSQL
    • Oracle 23ai

    Other
    • Azure Cosmos DB
    • Redis
    • MongoDB
    • Couchbase
    • Elasticsearch
  10. Vector data type – SQL Server 2025

    Vector similarity search for SQL Server Database
    • Exact and approximate nearest neighbor search
    • Cosine distance, Euclidean distance, dot product, etc.
    • Vector indexes
    • All the other features of SQL Server
  11. Why SQL Server for vectors

    • Collocate data for easy querying and consistency
    • Fast and cost-effective solution
    • ACID compliance
    • Operational advantages: backup, better observability and tooling, point-in-time recovery
    • All drivers and languages are compatible with the vector type: C#, Java (JDBC), Entity Framework Core
  12. Vector data type (preview)

    -- Table with a 3-dimensional vector column
    CREATE TABLE items (id INT PRIMARY KEY, embedding VECTOR(3) NOT NULL);

    INSERT INTO items (id, embedding)
    VALUES (0, '[0.1, 2, 3]'), (1, '[-4.24, 5, 6]');

    DECLARE @target AS VECTOR(3) = '[3,1,2]';

    -- Order rows by cosine distance to the target vector
    SELECT * FROM items
    ORDER BY VECTOR_DISTANCE('cosine', embedding, @target);

    -- Same ordering, with the distance returned as a column
    SELECT *, VECTOR_DISTANCE('cosine', embedding, @target) AS distance
    FROM items
    ORDER BY distance;
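As a sanity check on what the queries above should return, the following Python sketch recomputes the ordering for the same three vectors. It assumes that the 'cosine' metric means cosine distance, that is, 1 minus cosine similarity; the helper name `cosine_distance` is illustrative and this is not how SQL Server computes it internally.

```python
import math


def cosine_distance(a, b):
    """Assumed definition: 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Same values as the INSERT and DECLARE statements on the slide.
items = {0: [0.1, 2, 3], 1: [-4.24, 5, 6]}
target = [3, 1, 2]

distances = {item_id: cosine_distance(vec, target) for item_id, vec in items.items()}
ranked = sorted(items, key=lambda item_id: distances[item_id])
```

Under that assumption, item 0 (distance of roughly 0.39) sorts before item 1 (roughly 0.87).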
  13. Vector data type – ADO.NET

    • SQL Server stores vectors in an optimized binary format but exposes them as JSON arrays for convenience
    • Older clients can work with a vector as if it were a JSON array
    • Microsoft.Data.SqlClient: version 6.1.1 introduces the SqlVector type, reducing payload size and eliminating the overhead of JSON
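The payload-size argument can be illustrated without any database. The sketch below compares a JSON array against a packed array of 32-bit floats for a made-up 1536-dimensional vector. The packed layout here is purely an assumption for illustration; it is not SQL Server's actual storage or wire format.

```python
import json
import math
import struct

# A made-up vector; 1536 dimensions is a size used by some embedding models.
vector = [math.sin(i) for i in range(1536)]

# Text representation: what a client sees when the vector is a JSON array.
json_payload = json.dumps(vector).encode("utf-8")

# Illustrative binary representation: 4 bytes per value (little-endian float32).
binary_payload = struct.pack(f"<{len(vector)}f", *vector)

# Both representations can be decoded back into numbers.
from_json = json.loads(json_payload)
from_binary = struct.unpack(f"<{len(vector)}f", binary_payload)
```

The binary form is a fixed 6144 bytes here, several times smaller than the JSON text, and it skips the cost of formatting and parsing decimal strings. The float32 values are slightly rounded compared with the original Python floats.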
  14. Vector data type – EF Core

    EF Core 9
    • Uses JSON compatibility for vector support
    • Requires EFCore.SqlServer.VectorSearch

    EF Core 10
    • Native support for the vector type
    • No extra dependencies
  15. Vector Indexes

    • Exact search without an index: kNN (k-nearest neighbors)
    • kNN search is computationally very expensive
    • Approximate Nearest Neighbors (ANN): trade off a bit of accuracy for performance
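A minimal sketch of exact kNN search in Python shows where the cost comes from: every query computes a distance to every stored vector. The data and the `knn` helper are made up for illustration.

```python
import heapq
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query, vectors, k):
    """Exact k-nearest neighbors: one distance computation per stored vector."""
    return heapq.nsmallest(
        k,
        range(len(vectors)),
        key=lambda i: euclidean_distance(query, vectors[i]),
    )


vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
nearest = knn([1.0, 1.0], vectors, k=2)  # ids of the two closest vectors
```

The work grows linearly with both the number of vectors and their dimensionality, which is why brute-force search becomes expensive on large collections.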
  16. Approximate Nearest Neighbors

    • Find similar vectors without searching all of them
    • ANN offers significant computational efficiency
    • Recall: the fraction of retrieved neighbors that are true nearest neighbors
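Recall can be computed by comparing the ids returned by an approximate search with the ids returned by an exact search for the same query. A small sketch, with a hypothetical helper name and made-up ids:

```python
def recall_at_k(approximate_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search found."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approximate_ids)) / len(exact)


exact_ids = [1, 3, 7, 9]        # from an exact kNN search
approximate_ids = [1, 3, 8, 9]  # from an ANN index; it missed id 7
recall = recall_at_k(approximate_ids, exact_ids)
```

Here three of the four true neighbors were retrieved, so recall is 0.75.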
  17. ANN Vector Indexes

    • IVFFlat: Inverted File with Flat Compression
    • HNSW: Hierarchical Navigable Small World
    • DiskANN: available in SQL Server 2025, Azure Cosmos DB, Azure Database for PostgreSQL
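To give a feel for how an ANN index avoids scanning everything, here is a heavily simplified sketch of the inverted-file idea behind IVFFlat. It assumes hand-picked centroids (real implementations learn them, typically with k-means) and uses hypothetical helper names; it is not the algorithm used by HNSW or DiskANN.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(vectors, centroids):
    """Put every vector id into the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vec_id, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: distance(vec, centroids[c]))
        lists[nearest].append(vec_id)
    return lists


def search(query, vectors, centroids, lists, k, probes):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    closest = sorted(range(len(centroids)), key=lambda c: distance(query, centroids[c]))
    candidates = [vec_id for c in closest[:probes] for vec_id in lists[c]]
    return sorted(candidates, key=lambda i: distance(query, vectors[i]))[:k]


vectors = [[0.0, 0.1], [0.2, 0.0], [4.9, 5.0], [5.1, 5.2], [2.0, 2.2]]
centroids = [[0.0, 0.0], [5.0, 5.0]]  # assumption: fixed by hand for the demo
lists = build_index(vectors, centroids)

query = [2.8, 2.8]
fast = search(query, vectors, centroids, lists, k=2, probes=1)   # scans one list
exact = search(query, vectors, centroids, lists, k=2, probes=2)  # scans all lists
```

With one probe the search looks at only two of the five vectors and misses the true nearest neighbor (id 4), which sits in the other list. Probing both lists recovers it, at the cost of scanning everything. That is the accuracy-for-speed trade-off in miniature.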
  18. DiskANN

    • Developed by Microsoft Research
    • High recall, high throughput, and low query latency
    • Surpasses HNSW and IVFFlat in latency and accuracy
    • Open source: https://github.com/microsoft/DiskANN
    • VECTOR_SEARCH function: searches for vectors similar to a given query vector using an ANN vector search algorithm

    CREATE VECTOR INDEX vec_idx ON NewsItems(Embedding)
    WITH (metric = 'cosine', type = 'diskann');
  19. THANKS! QUESTIONS?

    • https://github.com/Giorgi/Semantic-Search-Demo/
    • https://bsky.app/profile/giorgi.dev
    • https://github.com/Giorgi/
    • https://giorgi.dev

    CREDITS: This presentation template was created by Slidesgo, and includes icons by Flaticon, and infographics & images by Freepik