Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset (arXiv:1906.04087). 1st place solution, retrieval track. (University of Tsukuba; *equal contribution.)
Two key ideas in our solution:
1. Cosine-based softmax loss trained on a "cleaned subset".
   Related topic: (Wang+, ECCV'18) "The Devil of Face Recognition is in the Noise".
2. Rediscovering the "Discriminative QE" (query expansion) technique.
   Related topic: (Arandjelović & Zisserman, CVPR'12) "Three things everyone should know to improve object retrieval".
Automated Data Cleaning
With local feature matching & spatial verification (inlier count). For each train image x_i:
1. Retrieve the kNN (k=1000) of x_i from the train set. (The image representation is learned from Google-Landmarks-v1.)
2. Select up to the 100-NN assigned to the same label as x_i.
3. Spatial verification (w/ DELFv2) is performed on those up-to-100 neighbors.
4. Add x_i to our clean train set when the count of verified images is greater than the threshold (=2).
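The four steps above can be sketched as the following filter. This is a minimal illustration, not the authors' code: `is_verified(i, j)` stands in for DELFv2 local-feature matching plus spatial verification (returning True when the inlier count between images i and j is high enough), and the helper names are my own.

```python
import numpy as np

def clean_train_set(features, labels, is_verified, k_nn=1000, k_same=100, threshold=2):
    """Keep image i only if more than `threshold` same-label neighbors
    pass spatial verification (abstracted here as `is_verified`)."""
    feats = np.asarray(features, dtype=np.float64)
    sims = feats @ feats.T                      # cosine similarity (L2-normalized feats)
    np.fill_diagonal(sims, -np.inf)             # exclude the image itself
    clean = []
    for i in range(len(feats)):
        nn = np.argsort(-sims[i])[:k_nn]        # 1. kNN from the train set
        # 2. up to k_same neighbors with the same (noisy) label
        same = [j for j in nn if j != i and labels[j] == labels[i]][:k_same]
        # 3. spatial verification on those neighbors
        verified = sum(is_verified(i, j) for j in same)
        if verified > threshold:                # 4. keep x_i if enough verified neighbors
            clean.append(i)
    return clean
```

In practice the kNN step would use an approximate-nearest-neighbor index rather than a dense similarity matrix; the brute-force version keeps the sketch short.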
2. Rediscover "Discriminative QE"
• Problem: some positive samples are not retrieved by the similarity search.
• Idea: move the positive samples, including those the similarity search missed, ahead of the negative samples in the ranking.
[Figure: for a query, the initial ranking mixes positives and negatives and misses some positives; after discriminative QE, positive samples are moved to the left of (ranked above) the negative samples.]
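A minimal sketch of the discriminative QE idea (Arandjelović & Zisserman, CVPR'12): treat the query and its top-ranked neighbors as positives, the lowest-ranked database images as negatives, fit a linear classifier, and re-rank the database by the classifier score instead of raw similarity. The hyperparameters and the plain gradient-descent logistic regression here are illustrative choices, not the competition's exact implementation (which need not match this form).

```python
import numpy as np

def discriminative_qe(query, db, top_k=5, bottom_k=50, steps=200, lr=0.5):
    """Re-rank `db` (L2-normalized descriptors, one per row) for `query`
    using a linear classifier fit on pseudo-labeled positives/negatives."""
    sims = db @ query                          # initial ranking by cosine similarity
    order = np.argsort(-sims)
    pos = np.vstack([query[None, :], db[order[:top_k]]])   # assumed positives
    neg = db[order[-bottom_k:]]                            # assumed negatives
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):                     # logistic regression by gradient descent
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    scores = db @ w + b                        # score the whole database
    return np.argsort(-scores)                 # new ranking
```

Because the classifier generalizes beyond the expanded query, positives that the initial similarity search missed can still score above the negatives.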
Recognition
Our recognition method is based on accumulating the top-k (k=3) nearest neighbors in the train set. Given a query q, a Euclidean search retrieves its top-k neighbors N_q. For each label l, the score s_l sums a similarity term and an inlier-count term over the members of N_q assigned to l, and the prediction is ŷ = argmax_l s_l.
[Figure: a query matched against train images of Hamburg City Hall and the New Town Hall in Hanover (similarities 1.00, 0.85, 0.75, 1.00, 0.60, 0.50); the accumulated score selects Hamburg City Hall.]
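The label-accumulation rule can be sketched as follows; to keep the sketch self-contained, only the similarity term is shown, and the inlier-count term from spatial verification is omitted.

```python
import numpy as np
from collections import defaultdict

def recognize(query, train_feats, train_labels, k=3):
    """Predict y_hat = argmax_l s_l, where s_l accumulates the cosine
    similarities of the query's top-k train neighbors assigned to label l."""
    sims = train_feats @ query                # cosine similarity (L2-normalized feats)
    topk = np.argsort(-sims)[:k]              # top-k nearest neighbors N_q
    score = defaultdict(float)
    for j in topk:                            # s_l = sum of sims of neighbors with label l
        score[train_labels[j]] += sims[j]
    return max(score, key=score.get)          # y_hat = argmax_l s_l
```

In the figure's example, two strong matches to one landmark outweigh a single slightly stronger match to another, which is the point of accumulating per label rather than taking the single nearest neighbor.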
1. Cosine-based softmax loss
• Cosine-based softmax losses (ArcFace [1], CosFace [2]) serve as the metric-learning component in our solution.
• They are successful methods in face recognition.
• Also in landmark retrieval/recognition, we found cosine-based softmax losses to be very effective.
• Hyperparameters: m=0.3 and s=30 were used in both.
• Many winning solutions use cosine-based softmax losses:
  • Humpback Whale Identification - 1st place
  • Protein Classification - 1st place
[1] J. Deng, J. Guo, and S. Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. arXiv:1801.07698, 2018.
[2] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu. CosFace: Large margin cosine loss for deep face recognition. In CVPR, pages 5265–5274, 2018.
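For concreteness, here is a CosFace-style [2] large-margin cosine loss with the slide's hyperparameters (m=0.3, s=30); ArcFace [1] differs only in applying the margin to the angle, cos(θ+m), rather than to the cosine. This is an illustrative numpy sketch, not the training code.

```python
import numpy as np

def cosine_softmax_loss(feats, labels, W, m=0.3, s=30.0):
    """CosFace-style loss: cross-entropy over s*(cos - m) on the target
    class and s*cos elsewhere. `feats` (batch, d) and the class-weight
    columns of `W` (d, n_classes) are L2-normalized so their dot product
    is the cosine of the angle to each class center."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    W = W / np.linalg.norm(W, axis=0, keepdims=True)
    cos = feats @ W                                      # (batch, n_classes) cosines
    logits = s * cos
    rows = np.arange(len(labels))
    logits[rows, labels] = s * (cos[rows, labels] - m)   # subtract margin on target class
    logits -= logits.max(axis=1, keepdims=True)          # numerically stable softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()                # cross-entropy
```

Subtracting m from the target cosine forces the network to pull features more than m closer (in cosine) to their class center than to any other, which tightens the intra-class clusters the retrieval step depends on.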
• Backbone: SE-ResNeXt-101.
• Data augmentation: "soft" and "hard" strategies.
  • "Soft": 5 epochs with random cropping and scaling.
  • "Hard": 7 epochs with random brightness shift, random shear translation, random cropping, and scaling.
• Combine various techniques:
  • Aspect-ratio preservation of input images.
  • Cosine-annealing LR scheduler.
  • GeM pooling (generalized mean pooling).
  • Fine-tuning at full resolution in the last epoch with frozen BN.
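Of these, GeM pooling is the piece worth spelling out: it pools a CNN feature map with a p-th power mean per channel, interpolating between average pooling (p=1) and max pooling (p→∞). A minimal sketch (p=3 is a common default in the GeM literature, an assumption here, not stated on the slide):

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """GeM (generalized-mean) pooling over a (channels, H, W) feature map:
    each channel is reduced to (mean of activations^p)^(1/p).
    Activations are clamped to eps since they are assumed non-negative
    (post-ReLU) and the power mean is undefined at 0 for fractional p."""
    x = np.clip(feature_map, eps, None)
    return (x ** p).mean(axis=(1, 2)) ** (1.0 / p)
```

Larger p emphasizes the strongest activations in each channel, which tends to help retrieval of landmarks that occupy only part of the image.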