Reverse Image Search for Ecommerce Without Going Crazy

Traditional full-text search engines have been on the market for a while, and we are all trying to extend them with semantic search. Still, some businesses might benefit more from introducing reverse image search capabilities instead of relying on text alone. However, semantic search and reverse image search can, and should, coexist!
You may encounter some common pitfalls while implementing both, so why not discuss the best practices? Let's check how to extend your existing search system with reverse image search without getting lost.

Kacper Łukawski

July 17, 2023

Transcript

  1. Reverse Image Search for
    Ecommerce Without Going Crazy
    Kacper Łukawski, Developer Advocate

  2. Status quo

  3. A typical structure of any ecommerce application

  4. Search mechanism based on textual queries only

  5. Reverse image search

  6. When should we consider reverse image search?
    1. Visual aspects of our inventory are essential.
    2. The items we sell are visually distinguishable from each other.
    3. Our customers may not know the language used in the product data.
    4. The majority of our traffic comes from mobile devices.

  7. Solution? Semantic search!
    - Based on vector representations (embeddings) of the input data
    - Uses neural networks to encode the data into embeddings
    - Captures the semantics, not words or pixels: two similar vectors should represent similar items (see the sketch below)
    - Can be used for text, images, audio, video, and virtually any other data type
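
A minimal sketch of that intuition, using made-up toy vectors rather than real model outputs: two nearby vectors stand for similar items, and cosine similarity measures that closeness.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Compare direction only; vector magnitude is ignored
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Toy 4-dimensional "embeddings"; real models output hundreds of dimensions
    sneaker_red = np.array([0.9, 0.1, 0.4, 0.0])
    sneaker_blue = np.array([0.8, 0.2, 0.5, 0.1])
    winter_coat = np.array([0.1, 0.9, 0.0, 0.7])

    print(cosine_similarity(sneaker_red, sneaker_blue))  # high: similar items
    print(cosine_similarity(sneaker_red, winter_coat))   # low: unrelated items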

  8. Converting images into vectors

  9. How does semantic search for images work?
    Offline phase:
    1. Convert all the images in our inventory into vectors, using the selected model.
    2. Store the vectors in a way that allows fast retrieval of the closest entries.
    3. Repeat the process on each change to keep everything in sync.
    Online phase:
    1. Take the image uploaded by the user and vectorize it with the same model.
    2. Perform a nearest neighbours search over the inventory vectors to find the closest entries.
    3. Serve the closest items back to the user.
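
Both phases can be sketched in a few lines. The snippet below assumes the clip-ViT-B-32 checkpoint from Sentence-Transformers and hypothetical image files; brute-force NumPy search stands in for a proper vector database.

    import numpy as np
    from PIL import Image
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("clip-ViT-B-32")

    # Offline phase: vectorize the whole inventory and keep the matrix around.
    inventory_paths = ["shoe.jpg", "dress.jpg", "watch.jpg"]  # hypothetical files
    inventory_vectors = model.encode(
        [Image.open(path) for path in inventory_paths],
        normalize_embeddings=True,
    )

    # Online phase: vectorize the uploaded image with the *same* model
    # and return the nearest inventory entries.
    query_vector = model.encode(Image.open("user_upload.jpg"), normalize_embeddings=True)
    scores = inventory_vectors @ query_vector  # cosine similarity on normalized vectors
    top_k = np.argsort(scores)[::-1][:3]
    print([inventory_paths[i] for i in top_k])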

  10. A typical structure of any ecommerce application

  11. Target architecture for reverse image search next to existing full-text search

  12. Available models

  13. Available models
    Using pretrained models is easy if you choose a library that exposes them through a convenient interface. Some of the possibilities are:
    - torchvision - part of PyTorch
    - embetter - a great choice if you prefer a pandas-like API
    - Sentence-Transformers - one of the standard NLP libraries; it exposes the OpenAI CLIP model as well
    There are also SaaS solutions, such as Aleph Alpha.
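
As an illustration of the torchvision route, a common trick is to take a pretrained classifier and use its penultimate layer as the embedding; this is a sketch with a made-up file name, not the only way to do it.

    import torch
    from PIL import Image
    from torchvision import models

    weights = models.ResNet50_Weights.DEFAULT
    model = models.resnet50(weights=weights)
    model.fc = torch.nn.Identity()  # drop the classification head, keep 2048-d features
    model.eval()

    preprocess = weights.transforms()  # the resize/crop/normalize pipeline the model expects

    image = Image.open("product.jpg").convert("RGB")  # hypothetical file
    with torch.no_grad():
        embedding = model(preprocess(image).unsqueeze(0)).squeeze(0)
    print(embedding.shape)  # torch.Size([2048])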

  14. (image-only slide)

  15. CLIP is available in Sentence-Transformers, even in a multilingual version (50+ languages)
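
A sketch of that multilingual setup: texts go through the multilingual text encoder, images through the original CLIP vision encoder, and both land in the same vector space, so cross-language text-to-image search works. The image file name is made up.

    from PIL import Image
    from sentence_transformers import SentenceTransformer, util

    image_model = SentenceTransformer("clip-ViT-B-32")
    text_model = SentenceTransformer("sentence-transformers/clip-ViT-B-32-multilingual-v1")

    image_embedding = image_model.encode(Image.open("red_sneakers.jpg"))
    queries = ["red sneakers", "rote Turnschuhe", "czerwone trampki"]
    text_embeddings = text_model.encode(queries)

    # One similarity per query; all of them should point at the same image
    print(util.cos_sim(text_embeddings, image_embedding))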

  16. Serving self-hosted models

  17. Available tools
    - TensorFlow Serving
    - TensorRT
    - NVIDIA Triton
    - ONNXRuntime
    - CLIP-as-service
    - …
    Choosing SaaS might be the simplest way if the organization has no experience in serving ML models.
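
To make one of the listed options concrete, here is a sketch of the ONNXRuntime route: export a PyTorch encoder once, then serve it without a PyTorch dependency. The file name and shapes are illustrative.

    import numpy as np
    import onnxruntime as ort
    import torch
    from torchvision import models

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = torch.nn.Identity()  # reuse the classifier backbone as an encoder
    model.eval()

    # Export once, offline
    torch.onnx.export(
        model, torch.randn(1, 3, 224, 224), "encoder.onnx",
        input_names=["image"], output_names=["embedding"],
        dynamic_axes={"image": {0: "batch"}},  # allow variable batch size
    )

    # Serve with ONNXRuntime only
    session = ort.InferenceSession("encoder.onnx", providers=["CPUExecutionProvider"])
    (embedding,) = session.run(None, {"image": np.random.rand(1, 3, 224, 224).astype(np.float32)})
    print(embedding.shape)  # (1, 2048)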

  18. Why is CLIP a good choice?

  19. Textual query vs image query
    With the proposed stack, we have two separate pipelines:
    1. Textual query - uses only the full-text search mechanism.
    2. Image query - uses only the vector database with the embedding model.
    CLIP, as a multimodal model, can be useful for semantic search on texts too! We can use a single model to enrich our existing text-based search mechanism and build a hybrid search.

  20. Challenges of hybrid
    search
    1. When do we prefer full-text
    over semantic search and the
    other way around?
    2. How do we perform the
    reranking of the candidates
    produced by both systems?

  21. Hybrid search
    Combining full-text and semantic search usually requires some reranking step.
    https://qdrant.tech/articles/hybrid-search/
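
One popular reranking choice (not necessarily the one from the linked article) is Reciprocal Rank Fusion, which merges ranked lists whose raw scores, such as BM25 and cosine similarity, are not directly comparable.

    from collections import defaultdict

    def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
        # Each document earns 1 / (k + rank) from every list it appears in
        scores = defaultdict(float)
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] += 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    full_text_hits = ["sku-1", "sku-7", "sku-3"]  # hypothetical result lists
    semantic_hits = ["sku-7", "sku-2", "sku-1"]
    print(rrf([full_text_hits, semantic_hits]))   # sku-7 and sku-1 rise to the top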

  22. Vector search in production

  23. Qdrant vector database
    Qdrant is a vector search database that uses HNSW for Approximate Nearest Neighbours.
    ● Written in Rust, offering great performance.
    ● Can be accessed over HTTP or gRPC; official SDKs are available.
    ● Runs in single-node and cluster modes.
    ● Incorporates category, geocoordinate, and full-text filters into the vector search.
    ● Makes vector search affordable.
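
A minimal sketch with the official Python SDK; the collection name, vector values, and payload are made up for illustration (CLIP ViT-B/32 produces 512-dimensional embeddings).

    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    client = QdrantClient(url="http://localhost:6333")

    # Collection sized for CLIP ViT-B/32 embeddings, compared by cosine distance
    client.recreate_collection(
        collection_name="products",
        vectors_config=VectorParams(size=512, distance=Distance.COSINE),
    )

    # Offline phase: store inventory vectors together with their payloads
    client.upsert(
        collection_name="products",
        points=[
            PointStruct(id=1, vector=[0.0] * 512, payload={"category": "shoes"}),
        ],
    )

    # Online phase: nearest neighbours for an (illustrative) query vector
    hits = client.search(collection_name="products", query_vector=[0.0] * 512, limit=5)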

  24. The importance of additional filters in semantic search
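
Continuing the hypothetical collection from the previous sketch, payload filters let the vector search return only items that match structured criteria; the field names here are made up.

    from qdrant_client import QdrantClient
    from qdrant_client.models import FieldCondition, Filter, MatchValue, Range

    client = QdrantClient(url="http://localhost:6333")

    hits = client.search(
        collection_name="products",
        query_vector=[0.0] * 512,  # the embedded query image would go here
        query_filter=Filter(
            must=[
                FieldCondition(key="category", match=MatchValue(value="shoes")),
                FieldCondition(key="price", range=Range(lte=100.0)),
            ]
        ),
        limit=5,
    )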

  25. Wrapping up

  26. An example of reverse image search

  27. An example of semantic search on texts

  28. Challenges of vector search
    1. How to choose the right model?
    2. How to host it?
    3. What if the selected model doesn't work well in our domain?
    4. How do I update the embeddings if I change the model?
    5. Is there any way to measure the quality of embeddings?

  29. Ecommerce notebook
    https://github.com/qdrant/examples

  30. Questions?
    Kacper Łukawski
    Developer Advocate
    Qdrant
    https://www.linkedin.com/in/kacperlukawski/
    https://twitter.com/LukawskiKacper
    https://github.com/kacperlukawski
