Reverse Image Search for Ecommerce Without Going Crazy

Traditional full-text search engines have been on the market for a while, and we are all currently trying to extend them with semantic search. Still, for some businesses it might be more beneficial to introduce reverse image search capabilities instead of relying only on text. However, semantic search and reverse image search may, and should, coexist!
You may encounter common pitfalls while implementing both, so why don't we discuss the best practices? Let's check how to extend your existing search system with reverse image search without getting lost.

Kacper Łukawski

July 17, 2023

Transcript

  1. When should we consider reverse image search?
    1. Visual aspects of our inventory are essential.
    2. The items we sell are distinguishable from each other.
    3. Our search is used by people unfamiliar with the language used in the product data.
    4. The majority of our traffic comes from mobile devices.
  2. Solution? Semantic search!
    - Based on vector representations (embeddings) of the input data
    - Uses neural networks to encode the data into embeddings
    - Catches the semantics, not words or pixels - two similar vectors should represent similar items
    - Might be used for text, images, audio, video, and virtually any other data type
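To make the "similar vectors represent similar items" idea concrete, here is a minimal sketch using the Sentence-Transformers library; the model name is just one small, popular choice and the sentences are made up for illustration.

```python
from sentence_transformers import SentenceTransformer, util

# Any sentence embedding model works here; all-MiniLM-L6-v2 is a small, popular choice.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "red leather handbag",
    "crimson purse made of leather",  # semantically close to the first item
    "stainless steel frying pan",     # semantically distant
]
embeddings = model.encode(sentences)

# Semantically similar texts end up with close vectors (high cosine similarity).
print(util.cos_sim(embeddings, embeddings))
```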
  3. How does the semantic search for images work?
    Offline phase:
    1. Convert all the images of our inventory into vectors, using the selected model.
    2. Store the vectors in a way that allows fast retrieval of the closest entries.
    3. Repeat the process on each change to keep everything in sync.
    Online phase:
    1. Take the image uploaded by the user and vectorize it using the same model.
    2. Perform a nearest neighbours search among the inventory vectors to retrieve the closest entries.
    3. Serve the closest items back to the user.
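A hedged sketch of both phases, assuming a CLIP model served through Sentence-Transformers and a brute-force numpy search; the directory path is purely illustrative, and in production the vectors would live in a vector database rather than an in-memory matrix.

```python
from pathlib import Path

import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")

# --- Offline phase: vectorize the whole inventory and keep the vectors retrievable.
image_paths = sorted(Path("inventory/").glob("*.jpg"))  # illustrative location
inventory = model.encode([Image.open(p).convert("RGB") for p in image_paths])
inventory = inventory / np.linalg.norm(inventory, axis=1, keepdims=True)

# --- Online phase: vectorize the uploaded image with the *same* model and
# find the nearest neighbours (cosine similarity via normalized dot products).
def search(uploaded: Image.Image, top_k: int = 5) -> list[Path]:
    query = model.encode(uploaded)
    query = query / np.linalg.norm(query)
    scores = inventory @ query
    best = np.argsort(-scores)[:top_k]
    return [image_paths[i] for i in best]
```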
  4. Available models
    Using pretrained models is easy if you choose a library that exposes them with a convenient interface. Some of the possibilities are:
    - torchvision - part of PyTorch
    - embetter - a great choice if you prefer a pandas-like API
    - Sentence-Transformers - one of the standard NLP libraries; it exposes the OpenAI CLIP model as well
    There are also SaaS solutions, such as Aleph Alpha.
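As an illustration of the torchvision route, a common trick (not spelled out on the slide) is to take a pretrained classifier and drop its final layer, so it outputs embeddings instead of class scores; the image path is hypothetical.

```python
import torch
from PIL import Image
from torchvision.models import ResNet50_Weights, resnet50

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.fc = torch.nn.Identity()  # drop the classification head -> 2048-d embeddings
model.eval()

preprocess = weights.transforms()  # the resizing/normalization the model expects

with torch.no_grad():
    image = Image.open("product.jpg").convert("RGB")   # illustrative path
    embedding = model(preprocess(image).unsqueeze(0))  # shape: (1, 2048)
```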
  5. Available tools
    - TensorFlow Serving
    - TensorRT
    - NVIDIA Triton
    - ONNXRuntime
    - CLIP-as-service
    - …
    Choosing SaaS might be the simplest way if there is no experience in serving ML models in the organization.
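For example, with ONNXRuntime (one of the listed options) a PyTorch encoder can be exported once and then served without any PyTorch dependency. This is only a sketch under the assumption of a ResNet-style encoder like the one above; the export details depend on the actual model.

```python
import numpy as np
import onnxruntime as ort
import torch
from torchvision.models import ResNet50_Weights, resnet50

# Build the encoder again (same construction as in the previous sketch).
encoder = resnet50(weights=ResNet50_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()
encoder.eval()

# Export to ONNX once, offline.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(encoder, dummy, "encoder.onnx",
                  input_names=["image"], output_names=["embedding"])

# At serving time, ONNXRuntime alone is enough.
session = ort.InferenceSession("encoder.onnx")
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in for a real image
(embedding,) = session.run(None, {"image": batch})
print(embedding.shape)  # (1, 2048)
```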
  6. Textual query vs image query
    With the proposed stack, we have two separate pipelines:
    1. Textual query - uses only the full-text search mechanism.
    2. Image query - uses only the vector database with the embedding model.
    CLIP, as a multimodal model, might be useful for semantic search on texts as well! We can use a single model to enrich our existing text-based search mechanism and build a hybrid search, as sketched below.
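Because CLIP embeds texts and images into the same vector space, a single model can answer both query types. A minimal sketch with the Sentence-Transformers CLIP wrapper; the file names and query are illustrative.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

# The very same model encodes images and texts into one shared space.
image_embeddings = model.encode([Image.open("bag.jpg"), Image.open("shoe.jpg")])
text_embedding = model.encode("a red leather handbag")

# A textual query can therefore be matched directly against image vectors.
print(util.cos_sim(text_embedding, image_embeddings))
```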
  7. Challenges of hybrid search
    1. When do we prefer full-text over semantic search, and the other way around?
    2. How do we perform the reranking of the candidates produced by both systems?
  8. Hybrid search
    A combination of full-text and semantic search usually requires some reranking step.
    https://qdrant.tech/articles/hybrid-search/
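The linked article discusses several fusion strategies. One widely used, score-agnostic option is Reciprocal Rank Fusion, sketched below; the constant k=60 is the value commonly used in the literature, and the SKU ids are made up for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(*rankings: list[str], k: int = 60) -> list[str]:
    """Merge several ranked candidate lists into one, using RRF.

    Each ranking is a list of item ids ordered from best to worst.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking):
            scores[item] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

full_text_hits = ["sku-1", "sku-7", "sku-3"]  # from the full-text engine
semantic_hits = ["sku-7", "sku-2", "sku-1"]   # from the vector search
print(reciprocal_rank_fusion(full_text_hits, semantic_hits))
```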
  9. Qdrant vector database
    Qdrant is a vector search database using HNSW for Approximate Nearest Neighbours.
    • Written in Rust and offers great performance.
    • Allows interaction over HTTP or gRPC; official SDKs are available.
    • Runs in single-node and cluster modes.
    • Incorporates category, geocoordinate, and full-text filters into the vector search.
    • Makes vector search affordable.
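A minimal end-to-end interaction with Qdrant through its official Python SDK (API as of mid-2023), assuming a local instance on the default port; the collection name, payload fields, and vectors are made up for the example. The 512-dimensional size matches the CLIP ViT-B/32 embeddings used earlier.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Create a collection sized for 512-d CLIP vectors compared by cosine distance.
client.recreate_collection(
    collection_name="products",
    vectors_config=VectorParams(size=512, distance=Distance.COSINE),
)

# Upsert an inventory item together with a filterable payload.
client.upsert(
    collection_name="products",
    points=[
        PointStruct(
            id=1,
            vector=[0.05] * 512,  # stand-in for a real image embedding
            payload={"category": "handbags", "city": "Berlin"},
        )
    ],
)

# Nearest neighbour search; payload-based filters can be layered on top.
hits = client.search(
    collection_name="products",
    query_vector=[0.05] * 512,
    limit=5,
)
```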
  10. Challenges of vector search
    1. How to choose the right model?
    2. How to host it?
    3. What if the selected model doesn’t work well in our domain?
    4. How do I update the embeddings if I change the model?
    5. Is there any way to measure the quality of embeddings?