[CVPR 2020 Tutorial] A Large-Scale Visual Search System in the C2C Marketplace App Mercari

Slide 1

Slide 1 text

A Large-Scale Visual Search System in the C2C Marketplace App Mercari
Takuma Yamaguchi (Mercari, Inc.)
CVPR 2020 Tutorial - Image Retrieval in the Wild
2020.06.19

Slide 2

Slide 2 text

Slide 3

Slide 3 text

(c) 2020 Mercari, Inc.

Image Search or Visual Search?

● Image Retrieval / Image Search: search by metadata, keywords, tags, and images
● Visual Search / Reverse Image Search / Content-based Image Retrieval (CBIR): search by images

Slide 4

Slide 4 text

Slide 5

Slide 5 text

Visual Search Applications / Services

Visual Search Applications
● Google Images
● Bing Images
● Pinterest Visual Search
● Yandex Images
● TinEye
● eCommerce apps
● Retailers' apps
● etc.

Visual Search Services
● Google Cloud Vision Product Search
● Bing Image Search
● TinEye API
● Alibaba Cloud Image Search
● Visenze Visual Search
● Syte Camera Search
● etc.

Visual search may bring a better user experience and better discoverability of content and items.

Slide 6

Slide 6 text

Slide 7

Slide 7 text

What is Mercari?

The Mercari app is a C2C marketplace where individuals can easily sell used items.

● 16+ million monthly active users
● 1.5+ billion items in total

The system should be highly scalable.

Slide 8

Slide 8 text

Why Visual Search?

How do you query when you look for clothes like this?
● “Plaid” and “Shirt”?
● “Plaid” and “Blouse”?
● “Checks” and “Blouse”?
● “Checks” and “Blouse” and “Frill”?
● “Gingham Plaid” and “Blouse” and “Frill”?
● “Gingham Plaid” and “Blouse” and “Frill” and “Collar”?
● ...

If you find what you want, you are lucky. If not, you just get tired...

Slide 9

Slide 9 text

Why Visual Search?

In a consumer-to-consumer (C2C) marketplace, sellers do not always provide enough information about each item. Even if buyers know how to describe what they want properly, text-based search may not find items that lack this information.

Example: an item described only as “black and white plaid shirt” cannot be discovered by the query “gingham plaid blouse”, even though the query terms are correct.

Slide 10

Slide 10 text

Visual Search vs. Text-based Search on the Mercari App

● Text-based search, query: “black white plaid shirt”
● Text-based search, query: “Gingham Plaid Frill Blouse”
● Visual search, query: an image

(*) As of now, visual search is available only on the Mercari Japan iOS app.

Slide 11

Slide 11 text

Visual Search Surpasses Text-based Search?

● Text-based search, query: “iphone xs max 256gb”
● Visual search, query: an image (iPhone XS Max 256GB)

[Figure: search results labeled iPhone X, iPhone XS, iPhone 6s / 7, and iPhone XS Max 256GB]

Text-based search is better when:
● sellers and buyers describe products in the same way
● visual information is not enough to explain products

(*) As of now, visual search is available only on the Mercari Japan iOS app.

Slide 12

Slide 12 text

Visual Search Statistics

● By 2021, retailers that support visual and voice search will increase their e-commerce revenue by 30% (Gartner)
  ○ https://www.gartner.com/smarterwithgartner/gartner-top-strategic-predictions-for-2018-and-beyond/
● 62% of millennials want visual search to discover products on their mobile devices (ViSenze)
  ○ https://www.fastgrowthbrands.com/2018/08/retailers-must-optimise-omnichannel-mobile-for-millennials/
● More than 70% of online shoppers in the UK under 25 years old have used or plan to use a visual search tool (GlobalData)
  ○ https://www.retail-insight-network.com/comment/rise-of-visual-search-2019/
● 36% of consumers have performed or used visual search (Intent Lab)
  ○ https://www.businesswire.com/news/home/20190204005613/en/Visual-Search-Wins-Text-Consumers'-Trusted-Information

Slide 13

Slide 13 text

Slide 14

Slide 14 text

Slide 15

Slide 15 text

Visual Search Processing Flow

Indexing: Image Database, Image Feature Extraction, Search Index Builder, Search Index Storage
Search: Query Image, Image Feature Extraction, Similarity Search

In practice, approximate nearest neighbor (ANN) search should be used:
● Faiss
● nmslib
● Annoy
● etc.
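
To make the two paths of this flow concrete, here is a minimal sketch in plain Python. It uses exact (brute-force) cosine similarity instead of an ANN library, purely so that it runs without dependencies; the function names, image IDs, and 3-dimensional vectors are illustrative assumptions, not part of the original system. In practice the `search` step would be replaced by an ANN index such as those listed above.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def build_index(image_features):
    """Indexing path: map each image ID to its normalized feature vector."""
    return {image_id: l2_normalize(vec) for image_id, vec in image_features.items()}

def search(index, query_feature, top_k=3):
    """Search path: rank indexed images by cosine similarity to the query."""
    q = l2_normalize(query_feature)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), image_id)
        for image_id, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [(image_id, score) for score, image_id in scored[:top_k]]

# Hypothetical feature vectors standing in for real image features.
database = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
index = build_index(database)
results = search(index, [1.0, 0.05, 0.0], top_k=2)
print(results)
```

Brute-force search scales linearly with the number of images, which is why an ANN index is the practical choice at marketplace scale.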

Slide 16

Slide 16 text

Visual Search System

Components: Visual Search API, Image Feature Extraction, Similarity Search, Search Index Storage
Data exchanged: query image, image feature, image IDs, search index, search result

Multiple servers are needed for each component to keep the system running.

Requirements
● All servers are monitored.
● Unhealthy servers are replaced with new servers.
● When the system load becomes high, the number of servers is increased and/or more computation resources are allocated.

Slide 17

Slide 17 text

Dockerize the System

Components: Visual Search API, Image Feature Extraction, Similarity Search, Search Index Storage
Data exchanged: query image, image feature, image IDs, search index, search result

Docker makes your development, testing, and deployment safe and efficient.

Requirements
● All servers are monitored.
● Unhealthy servers are replaced with new servers.
● When the system load becomes high, the number of servers is increased and/or more computation resources are allocated.

Slide 18

Slide 18 text

Run the System on Kubernetes

Components: Visual Search API, Image Feature Extraction, Similarity Search, Search Index Storage
Data exchanged: query image, image feature, image IDs, search index, search result

Kubernetes provides self-healing, auto-scaling, and resource management.
● Google Kubernetes Engine
● Amazon Elastic Container Service for Kubernetes
● Azure Kubernetes Service

Slide 19

Slide 19 text

How to Update Search Index

Services: Visual Search API Service, Image Feature Extraction Service, Similarity Search Service
Data exchanged: query image, search result

A Pod is the smallest unit and consists of one or more containers, such as a feature extraction container and a logging agent container.
A Service is an abstraction that defines a group of Pods.

Slide 20

Slide 20 text

How to Update Search Index

Services: Visual Search API Service, Image Feature Extraction Service, Similarity Search Service

A Pod is the smallest unit and consists of one or more containers, such as a feature extraction container and a logging agent container.
A Service is an abstraction that defines a group of Pods.

Batch processing: Image Database (images), Image Feature Extractor (image feature vectors), Search Index Builder (search index), Similarity Search Docker Image Builder (Docker image), Docker Image Registry
Automated deployment: the Docker image is deployed from the Docker Image Registry to the Similarity Search Service.

Docker image registries:
● Google Container Registry
● Amazon Elastic Container Registry
● Azure Container Registry

If the search index doesn’t have to be updated in real time, this system would be practical enough to handle a few million images.

Slide 21

Slide 21 text

How to Scale the System

Services: Visual Search API Service, Image Feature Extraction Service, and three similarity search services deployed from the Docker Image Registry:
● Monthly Similarity Search Service: updated every month
● Daily Similarity Search Service: updated every day
● Hourly Similarity Search Service: updated every hour

Similarity searches are executed in all of the similarity search services, and the results are merged by similarity score.

At Mercari, the system has to handle at least hundreds of millions of images.
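
The merge step can be sketched as follows. This is a minimal illustration, assuming each service returns `(item_id, similarity)` pairs where a higher score means more similar and scores are comparable across services; the function name and the sample results are hypothetical.

```python
import heapq

def merge_results(per_service_results, top_k):
    """Merge (item_id, similarity) lists from several services.

    Higher similarity is treated as better; the overall top_k hits are kept.
    """
    all_hits = (hit for hits in per_service_results for hit in hits)
    return heapq.nlargest(top_k, all_hits, key=lambda hit: hit[1])

# Hypothetical per-service results.
monthly = [("m1", 0.91), ("m2", 0.80)]
daily = [("d1", 0.95), ("d2", 0.60)]
hourly = [("h1", 0.85)]

merged = merge_results([monthly, daily, hourly], top_k=3)
print(merged)
```

If the services return distances rather than similarities, `heapq.nsmallest` would be the matching choice.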

Slide 22

Slide 22 text

Latency Reduction by Edge Computing

Services: Visual Search API Service, Image Feature Extraction Service, and the Monthly, Daily, and Hourly Similarity Search Services (updated every month, every day, and every hour from the Docker Image Registry)

On the device: query image, feature extraction (MobileNet V2), feature vector

If the feature extraction model is small enough to run on mobile devices, you can reduce latency and network traffic.

Slide 23

Slide 23 text

Visual Search System of Mercari Japan

Index Building Kubernetes Cluster
● Monthly Index Builder, Daily Index Builder, Hourly Index Builder
● Pipeline components: Item Explorer, Item Image Downloader, Feature Extractor, Feature Vec Downloader, ANN Index Builder, Docker Image Builder

Serving Kubernetes Cluster
● Visual Search Service, Object Detection Service, Feature Extraction Service
● Monthly, Daily, and Hourly Similarity Search Services, each running multiple ANN services

Storage: Item DB, Item Image Storage, Feature Vec Storage, ANN Index Storage, Docker Registry
Platforms: Kubernetes Engine, Elastic Kubernetes Service

Since item images are in AWS (S3) and our services are running on GCP, we use both cloud services.

(*) The actual system architecture is slightly different from this.

Slide 24

Slide 24 text

Processing Time

Serving Kubernetes Cluster (Kubernetes Engine): Visual Search Service, Object Detection Service, Feature Extraction Service, and the Monthly, Daily, and Hourly Similarity Search Services, each made up of ANN services.

Timings shown on the diagram: 168ms, 62ms, 255ms

Assuming that items from the past year are searchable by the system, the system will have 11 monthly ANN services, 30 daily ANN services, and 24 hourly ANN services, processed in parallel and holding 362.4M image feature vectors in total.
● Monthly Similarity Search Service (30M items): 11 services, 20ms
● Daily Similarity Search Service (1M items): 30 services, 13ms
● Hourly Similarity Search Service (100K items): 24 services, 12ms

(*) The number of items in each ANN service differs from the actual numbers.

ANN (Similarity Search)
● Library: Faiss
● Index type: IVFADC (IndexIVFPQ)
● Code length per vector: 64B
● Number of cells: 8,192
● Number of cells visited for each query: 32

In this experiment, 4 CPU cores are allocated to each service. In practice, resource allocation and the ANN parameters should be optimized for each ANN service based on the number of items/images it holds. Docker allows us to allocate resources flexibly, for example 1.5 CPU cores.

(*) The actual system architecture is slightly different from this.
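
The total of 362.4M vectors follows directly from the service counts and per-service item counts on this slide. The short calculation below reproduces it and adds two derived figures that are not on the slide: the raw size of the 64-byte codes and the fraction of cells visited per query. The storage estimate is a rough lower bound only, since it ignores vector IDs, coarse centroids, and other index overhead.

```python
# (number of services, items per service), as stated on the slide
services = {
    "monthly": (11, 30_000_000),
    "daily": (30, 1_000_000),
    "hourly": (24, 100_000),
}

total_services = sum(count for count, _ in services.values())
total_vectors = sum(count * items for count, items in services.values())

code_bytes = 64                                    # code length per vector
total_code_gb = total_vectors * code_bytes / 1e9   # codes only, no overhead

cells, cells_visited = 8192, 32
probed_fraction = cells_visited / cells            # share of cells scanned per query

print(total_services, total_vectors, total_code_gb, probed_fraction)
```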

Slide 25

Slide 25 text

Can we simplify the system architecture?

[Diagram: the Mercari Japan visual search architecture, with the Index Building Kubernetes Cluster (Monthly, Daily, and Hourly Index Builders), the Serving Kubernetes Cluster (Visual Search Service, Object Detection Service, Feature Extraction Service, and the Monthly, Daily, and Hourly Similarity Search Services with their ANN services), Item DB, Item Image Storage, Feature Vec Storage, ANN Index Storage, and Docker Registry, running on Kubernetes Engine and Elastic Kubernetes Service]

Since item images are in AWS (S3) and our services are running on GCP, we use both cloud services.

(*) The actual system architecture is slightly different from this.

Slide 26

Slide 26 text

Visual Search with Elasticsearch

Components: Visual Search API Service, Image Feature Extractor, Image Database, Elasticsearch
● Batch processing: images from the Image Database are converted to image feature vectors by the Image Feature Extractor and indexed in Elasticsearch.
● Search: the Visual Search API Service sends a feature vector to Elasticsearch and receives the search result.

An Elasticsearch index consists of shards. Each shard is a Lucene index, which is made up of segments.

Fewer segments give better performance, so merging segments before rolling out an index is recommended.

The Open Distro k-NN plugin uses nmslib (HNSW).

If the memory consumption is acceptable and you are familiar with Elasticsearch, this may be an option for building a simpler visual search system.

https://medium.com/@kumon/similarity-search-and-similar-image-search-in-elasticsearch-14552a8a8dea
https://medium.com/@kumon/how-to-realize-similarity-search-with-elasticsearch-3dd5641b9adb
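
As a rough illustration of what this looks like with the Open Distro k-NN plugin, the sketch below only builds the JSON request bodies in Python and sends nothing over the network. The index name `item-images`, the field name `feature`, and the 4-dimensional vector are assumptions made for the example; the `knn_vector` field type, the `index.knn` setting, the `knn` query clause, and the `_forcemerge` endpoint are the parts taken from the plugin and from Elasticsearch itself.

```python
import json

def knn_index_body(field, dimension):
    """Index settings and mapping for a k-NN vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {field: {"type": "knn_vector", "dimension": dimension}}
        },
    }

def knn_query_body(field, query_vector, k):
    """Search body asking for the k nearest neighbors of query_vector."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(query_vector), "k": k}}},
    }

# Hypothetical index name; merging down to one segment before rollout.
FORCE_MERGE_PATH = "/item-images/_forcemerge?max_num_segments=1"

index_body = knn_index_body("feature", 4)
query_body = knn_query_body("feature", [0.1, 0.2, 0.3, 0.4], k=10)
print(json.dumps(index_body))
print(json.dumps(query_body))
```

These bodies would be sent with `PUT /item-images` and `POST /item-images/_search` respectively, using whichever HTTP or Elasticsearch client is already in use.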

Slide 27

Slide 27 text

Slide 28

Slide 28 text

Image Feature Extraction Model

● Model: MobileNet V2 (width multiplier: 1.4)
● Input: 224 x 224 x 3
● Training data: 9M images from Mercari
● Output layer: 14k classes
● Class labels: category x brand x texture, e.g.
  ○ Nike striped men’s golf polos
  ○ LuLaRoe floral girl’s dresses
  ○ Louis Vuitton women’s long wallets
● Feature vector: output of global average pooling (1,792D: 1,280D x 1.4)

Slide 29

Slide 29 text

Slide 30

Slide 30 text

Slide 31

Slide 31 text

Slide 32

Slide 32 text

Slide 33

Slide 33 text

What is the problem?

[Figure: (a) Query Image, (b) Bad Results, (c) Better Results]

Mercari is a C2C marketplace, where most sellers and buyers are not professionals.
● Most apparel items are not worn by models.
● Apparel is laid flat on a surface or hung on a hanger.

When the query image shows fitted apparel (apparel worn by a person), images of fitted apparel tend to be retrieved more often, even though such items make up a very small proportion of the marketplace.

Returning many items listed by professional sellers can cause problems for a C2C marketplace, for example by hurting the buyer experience and discouraging nonprofessional sellers from listing items.

Slide 34

Slide 34 text

Removing Human Features from the Image Features

[Figure: feature-vector arithmetic illustrated with example images: the feature of a fitted apparel image minus a human feature ≃ the feature of a flat apparel image]

The feature extractor outputs a feature vector that has no negative elements and whose L2 norm is 1.

T. Yamaguchi et al., Closing the Gap Between Query and Database through Query Feature Transformation in C2C e-Commerce Visual Search, SIGIR 2019 Workshop on eCommerce

Slide 35

Slide 35 text

Removing Human Features from the Image Features

[Figure: feature-vector arithmetic illustrated with example images: the feature of a fitted apparel image minus a human feature ≃ the feature of a flat apparel image, followed by negative value element removal and L2 norm normalization]

The feature extractor outputs a feature vector that has no negative elements and whose L2 norm is 1.

T. Yamaguchi et al., Closing the Gap Between Query and Database through Query Feature Transformation in C2C e-Commerce Visual Search, SIGIR 2019 Workshop on eCommerce

Slide 36

Slide 36 text

Removing Human Features from the Image Features

[Figure: feature-vector arithmetic illustrated with example images: the feature of a fitted apparel image minus a human feature ≃ the feature of a flat apparel image]

The feature extractor outputs a feature vector that has no negative elements and whose L2 norm is 1. Thanks to this characteristic, the feature transformation can be applied to any kind of image.

T. Yamaguchi et al., Closing the Gap Between Query and Database through Query Feature Transformation in C2C e-Commerce Visual Search, SIGIR 2019 Workshop on eCommerce
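
The transformation described on these slides (subtract a human feature vector, remove negative elements, then L2-normalize) can be sketched as below. This is a minimal illustration in plain Python with made-up 3-dimensional vectors; the function names are assumptions, and the cited paper should be consulted for the exact formulation.

```python
import math

def relu_l2_normalize(vec):
    """Zero out negative elements, then scale to unit L2 norm."""
    clipped = [max(0.0, x) for x in vec]
    norm = math.sqrt(sum(x * x for x in clipped))
    return [x / norm for x in clipped] if norm > 0 else clipped

def remove_human_feature(query_feature, human_vector):
    """Subtract the human vector from the query feature and re-normalize."""
    diff = [q - h for q, h in zip(query_feature, human_vector)]
    return relu_l2_normalize(diff)

# Hypothetical vectors: the second dimension stands in for "human" evidence.
query = [0.6, 0.8, 0.0]
human = [0.0, 0.9, 0.1]
transformed = remove_human_feature(query, human)
print(transformed)
```

Because the extracted features are non-negative, subtracting the human vector from an image that contains no person mostly produces negative values in the "human" dimensions, which the clipping step then discards.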

Slide 37

Slide 37 text

Image Feature Extraction Model

● Model: MobileNet V2 (width multiplier: 1.4)
● Input: 224 x 224 x 3
● Training data: 9M images from Mercari
● Output layer: 14k classes
● Class labels: category x brand x texture, e.g.
  ○ Nike striped men’s golf polos
  ○ LuLaRoe floral girl’s dresses
  ○ Louis Vuitton women’s long wallets
● Feature vector: output of global average pooling (1,792D: 1,280D x 1.4), followed by ReLU and L2 norm normalization

T. Yamaguchi et al., Closing the Gap Between Query and Database through Query Feature Transformation in C2C e-Commerce Visual Search, SIGIR 2019 Workshop on eCommerce

Slide 38

Slide 38 text

How to Generate the Human Feature Vector

[Figure: images of fitted tops and flat tops yield the human vector of tops; images of fitted bottoms and flat bottoms yield the human vector of bottoms]

T. Yamaguchi et al., Closing the Gap Between Query and Database through Query Feature Transformation in C2C e-Commerce Visual Search, SIGIR 2019 Workshop on eCommerce
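
The slide text does not spell out the formula. One plausible reading, stated here as an assumption rather than as the authors' exact method, is that the human vector for a category is the mean feature of fitted images minus the mean feature of flat images of that category. The sketch below implements that reading with made-up 2-dimensional features.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def human_vector(fitted_features, flat_features):
    """Assumed definition: mean(fitted features) - mean(flat features)."""
    mean_fitted = mean_vector(fitted_features)
    mean_flat = mean_vector(flat_features)
    return [a - b for a, b in zip(mean_fitted, mean_flat)]

# Hypothetical features for the "tops" category.
fitted_tops = [[0.2, 0.8], [0.4, 0.6]]
flat_tops = [[0.3, 0.1], [0.3, 0.3]]

human_tops = human_vector(fitted_tops, flat_tops)
print(human_tops)
```

A separate vector would be computed the same way for bottoms, matching the two human vectors shown on the slide.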

Slide 39

Slide 39 text

Experiment Results

[Figure: mAP@100 results]

The test set had 20,000 images (flat apparel: 10,000 / fitted apparel: 10,000). 2,000 of them were used as query images. A retrieved item was counted as correct only when it was an image of flat apparel in the same category as the query.

● Significant improvement for the fitted apparel queries in every category
● Flat apparel queries were also positively influenced

T. Yamaguchi et al., Closing the Gap Between Query and Database through Query Feature Transformation in C2C e-Commerce Visual Search, SIGIR 2019 Workshop on eCommerce
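
For readers unfamiliar with the metric, the sketch below shows one common way to compute mAP@k from per-query relevance lists. Several variants of average precision at k exist, and the slide does not say which one was used, so the normalization here (dividing by the number of relevant items found in the top k) is an assumption.

```python
def average_precision_at_k(relevance, k):
    """AP@k for one query; relevance[i] is True if the i-th result is correct."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision_at_k(all_relevance, k):
    """Mean of AP@k over all queries."""
    return sum(average_precision_at_k(r, k) for r in all_relevance) / len(all_relevance)

# Two hypothetical queries with three ranked results each.
queries = [
    [True, False, True],
    [False, False, True],
]
score = mean_average_precision_at_k(queries, k=3)
print(score)
```

In the experiment's setup, a result would be marked `True` only if it is a flat apparel image in the same category as the query.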

Slide 40

Slide 40 text

Slide 41

Slide 41 text

Slide 42

Slide 42 text

Slide 43

Slide 43 text

Visual Search Applications

● Search Items by Image
● Find Similar Items to Sold Item
● Item Information Prediction for Sellers
● Price Estimation by Image
● Item Monitoring / Prohibited Item Detection

Slide 44

Slide 44 text

Slide 45

Slide 45 text

Slide 46

Slide 46 text

Item Information Prediction

Item category, brand, and title are predicted using the item information of existing, visually similar listings.

Example prediction:
● Title: Amazon Echo Dot 3rd Generation
● Category: Smart Speakers & Assistants
● Brand: Amazon

(*) This feature is available in both the Mercari Japan and Mercari US apps.
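
One simple way to turn visually similar listings into a prediction is a majority vote over their metadata. The slide does not describe the actual aggregation method, so the voting scheme, function name, and sample listings below are assumptions for illustration only.

```python
from collections import Counter

def predict_item_info(neighbors, fields=("category", "brand")):
    """Pick the most common value of each field among similar listings."""
    prediction = {}
    for field in fields:
        values = [n[field] for n in neighbors if n.get(field)]
        prediction[field] = Counter(values).most_common(1)[0][0] if values else None
    return prediction

# Hypothetical metadata of the nearest visual neighbors of a new listing photo.
neighbors = [
    {"category": "Smart Speakers & Assistants", "brand": "Amazon"},
    {"category": "Smart Speakers & Assistants", "brand": "Amazon"},
    {"category": "Speakers", "brand": "Google"},
]

predicted = predict_item_info(neighbors)
print(predicted)
```

Weighting each neighbor's vote by its similarity score would be a natural refinement.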

Slide 47

Slide 47 text

What we learnt through the project

Are you considering integrating visual search features into your services?

1. Try some visual search services first.
2. If you find a service that meets your expectations, use it.
3. Even if the pricing is an issue, use such a service first.
   a. Confirm whether the feature is useful for your service.
   b. If necessary, consider developing your own visual search system for performance improvement and cost reduction.
4. If you need a highly flexible and scalable system, build it on your own.
5. The cost of developing and operating a visual search system would be high.

Slide 48

Slide 48 text

Thanks! Have fun with visual search system development