list of similar images by the content of the image itself rather than keywords, tags, or descriptions associated with the image. https://en.wikipedia.org/wiki/Content-based_image_retrieval
› Pros: fast computation, memory-friendly
› Cons: Accuracy trade-off

SOTA Approach: Represent images by float embeddings and retrieve similar images by ranking similarity scores under a distance metric
› Pros: The retrieval is accurate
› Cons: Float embeddings are computationally expensive and memory-inefficient

Traditional Approach: Color histogram, texture, and shape as features
› Pros: The idea is simple and fairly easy to implement
› Cons: Not accurate
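The trade-off between binary codes and float embeddings can be sketched as follows. This is a minimal illustration, not the system's code: binary codes are assumed to be packed into Python ints and compared by Hamming distance (XOR plus bit count), while float embeddings are compared by cosine similarity. The 64-bit code length and 512-dimensional float32 embedding size are hypothetical numbers chosen only to show the memory ratio.

```python
import math
from typing import List, Sequence


def hamming_distance(code_a: int, code_b: int) -> int:
    """Count differing bits between two binary codes packed into Python ints."""
    return bin(code_a ^ code_b).count("1")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two float embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_by_hamming(query: int, database: Sequence[int]) -> List[int]:
    """Database indices ordered from smallest to largest Hamming distance."""
    return sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


def rank_by_cosine(query: Sequence[float], database: Sequence[Sequence[float]]) -> List[int]:
    """Database indices ordered from highest to lowest cosine similarity."""
    return sorted(range(len(database)), key=lambda i: -cosine_similarity(query, database[i]))


# Storage per item, ASSUMING hypothetical sizes: 64-bit codes vs. 512-dim float32 embeddings.
BYTES_PER_BINARY_CODE = 64 // 8      # 8 bytes
BYTES_PER_FLOAT_EMBEDDING = 512 * 4  # 2048 bytes
```

Under these assumed sizes a binary code is 256 times smaller, and comparing two 64-bit codes costs roughly one XOR and one popcount, whereas the cosine version needs a multiply-add per dimension. The price is that a short binary code keeps less information than the float vector, which is the accuracy trade-off listed on the slide.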
preserving manners: Pairwise, Multi-wise
› Classification oriented

Our approach: hybrid
› Obtain the binary code by classifying whether a pair of images is similar or not
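The slide only states that the binary code is obtained by classifying whether a pair of images is similar. One common way to set this up, used in pairwise deep hashing methods such as DPSH, is to model the probability that a pair is similar as sigmoid(0.5 · ⟨u, v⟩) on the network outputs and train with binary cross-entropy, then take the sign to get the code. That modeling choice and the helper names below (`binarize`, `similar_probability`, `pairwise_loss`) are assumptions for illustration, not LINE's actual model.

```python
import math
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Map continuous network outputs to a {-1, +1} code by taking the sign (0 maps to +1)."""
    return [1 if x >= 0 else -1 for x in features]


def similar_probability(u: Sequence[float], v: Sequence[float]) -> float:
    """P(pair is similar) = sigmoid(0.5 * <u, v>), a DPSH-style modeling assumption."""
    theta = 0.5 * sum(a * b for a, b in zip(u, v))
    # Numerically stable sigmoid: avoid overflow in exp() for large |theta|.
    if theta >= 0:
        return 1.0 / (1.0 + math.exp(-theta))
    z = math.exp(theta)
    return z / (1.0 + z)


def pairwise_loss(u: Sequence[float], v: Sequence[float], similar: bool) -> float:
    """Binary cross-entropy for classifying whether the pair (u, v) is similar."""
    p = similar_probability(u, v)
    eps = 1e-12  # guard against log(0)
    return -math.log(p + eps) if similar else -math.log(1.0 - p + eps)
```

For ±1 codes of length K, the inner product equals K − 2·d_H(u, v), where d_H is the Hamming distance, so a classifier driven by inner products is consistent with ranking by Hamming distance at search time.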
search is slow even with binary codes at large scale
› Need to reduce the search space: Approximate Nearest Neighbor (ANN) Search
› nprobe: the number of closest centroids visited during the centroid search
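The centroid-based search space reduction can be sketched as an inverted-file (IVF) index over binary codes, conceptually similar to what libraries such as Faiss provide. This is a minimal sketch under stated assumptions: the centroids are given up front (in practice they would be learned, for example with k-means), Hamming distance is used both for centroid assignment and for the final ranking, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes packed into ints."""
    return bin(a ^ b).count("1")


def build_inverted_lists(codes: Sequence[int], centroids: Sequence[int]) -> Dict[int, List[int]]:
    """Assign each database code to its nearest centroid (ties go to the lower centroid index)."""
    lists: Dict[int, List[int]] = defaultdict(list)
    for idx, code in enumerate(codes):
        nearest = min(range(len(centroids)), key=lambda c: hamming(code, centroids[c]))
        lists[nearest].append(idx)
    return lists


def ivf_search(query: int, codes: Sequence[int], centroids: Sequence[int],
               lists: Dict[int, List[int]], nprobe: int, k: int) -> List[int]:
    """Return up to k database indices, scanning only the nprobe closest centroids' lists."""
    # Centroid search: keep only the nprobe centroids closest to the query.
    probe = sorted(range(len(centroids)), key=lambda c: hamming(query, centroids[c]))[:nprobe]
    candidates = [idx for c in probe for idx in lists.get(c, [])]
    # Exact Hamming ranking, but only over the reduced candidate set.
    candidates.sort(key=lambda idx: (hamming(query, codes[idx]), idx))
    return candidates[:k]
```

A larger nprobe scans more lists, which raises recall at the cost of search time; with nprobe equal to the number of centroids the search becomes exhaustive again.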
› 10M sticker packages
› Each package has 8 to 40 stickers
› Current search system: 20 CPUs, 256 GB memory
› Performance: 0.01 seconds per sticker
(Figure: Search time)

Sticker Search system
› Number of centroids N: 2^17 = 131,072 centroids
› nprobe: 1000
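The configuration numbers above can be related with a little arithmetic. This sketch ASSUMES that stickers are spread evenly across the centroids, which real inverted lists rarely are, so the candidate count is only a rough order of magnitude; the 300M database size is taken from the "over 300M stickers" figure, and the helper names are hypothetical. The reported 0.01 seconds per sticker is a measured result and is not derived here.

```python
from typing import Tuple


def sticker_count_range(packages: int, min_per_pkg: int, max_per_pkg: int) -> Tuple[int, int]:
    """Lower and upper bound on the number of stickers given package sizes."""
    return packages * min_per_pkg, packages * max_per_pkg


def probed_fraction(num_centroids: int, nprobe: int) -> float:
    """Fraction of inverted lists visited per query."""
    return nprobe / num_centroids


def expected_candidates(db_size: int, num_centroids: int, nprobe: int) -> float:
    """Codes scanned per query, ASSUMING items are spread evenly over centroids."""
    return db_size * nprobe / num_centroids


N = 2 ** 17     # number of centroids (from the slide)
NPROBE = 1000   # nprobe (from the slide)

low, high = sticker_count_range(10_000_000, 8, 40)
print(low, high)                                           # 80000000 400000000
print(probed_fraction(N, NPROBE))                          # 0.00762939453125
print(round(expected_candidates(300_000_000, N, NPROBE)))  # 2288818
```

Under the balanced-list assumption, each query scans about 0.76% of the database, roughly 2.3M codes out of 300M, instead of all of them.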
sticker search on a database of over 300M stickers.
› The system supports the sticker review process and saves each reviewer up to 3 hours of review time per day.
› A large-scale image similarity search system has been developed at LINE.