(Retrieval) Large-scale Landmark Retrieval/Recognition
under a Noisy and Diverse Dataset

@smly
June 16, 2019

Transcript

  1. Team smlyaka: Kohei Ozaki* (Recruit Technologies), Shuhei Yokoo* (University of Tsukuba). *Equal contribution.

     Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset (arXiv:1906.04087). (1st place solution, retrieval track.)
  2. Final Results. Our pipeline is based on the standard method: CNN-based global descriptor + Euclidean search + re-ranking. Single model (0.318) + ensemble (0.330) + re-ranking.
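The Euclidean-search step of the pipeline can be sketched as a brute-force nearest-neighbor search over L2-normalized global descriptors (a minimal illustration, not the deck's actual implementation; for normalized vectors, ranking by Euclidean distance matches ranking by cosine similarity):

```python
import numpy as np

def euclidean_search(query, index, topk=3):
    """Brute-force Euclidean nearest-neighbor search over global descriptors."""
    # Euclidean distances between one query and all index vectors.
    dists = np.linalg.norm(index - query[None, :], axis=1)
    order = np.argsort(dists)[:topk]
    return order, dists[order]

# Toy example with 512-d descriptors (the per-model dimension in this deck).
rng = np.random.default_rng(0)
index = rng.normal(size=(100, 512))
index /= np.linalg.norm(index, axis=1, keepdims=True)
query = index[42] + 0.01 * rng.normal(size=512)  # near-duplicate of image 42
query /= np.linalg.norm(query)
ids, _ = euclidean_search(query, index)
```

At competition scale this search is done with an ANN library rather than a dense matrix, but the ranking semantics are the same.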
  3. Two important things to improve landmark retrieval in 2019:

     1. Cosine-based softmax loss with a "cleaned subset".
        Related topic: (Arandjelović & Zisserman, CVPR'12) "Three things everyone should know to improve object retrieval".
        Related topic: (Wang+, ECCV'18) "The Devil of Face Recognition is in the Noise".
     2. Rediscovering the idea of the "Discriminative QE" technique.
  5. Cleaning the Google-Landmarks-v2 (landmark_id=140690). Google-Landmarks-v2 is a quite noisy and diverse dataset. Metric learning methods are usually sensitive to noise, so it is essential to clean the dataset before training on it.
  6. Cleaning the Google-Landmarks-v2 (landmark_id=140690). To address the noise issue, we developed an automated data cleaning system and applied it to Google-Landmarks-v2.
  7. 7.

    2. Select up to the 100-NN assigned to the same

    label to . xi Automated Data Cleaning With local feature matching & spatial verification (inlier-count). For each
 train image , 1. kNN (k=1000) from the train set.
 (image representation is learned from the Google-Landmarks-v1) 3. Spatial Verification (\w DELFv2) is performed on up to the 100-NN xi ✔ 4. Add into our clean train set
 when the count of verified images is greater than the threshold (=2). xi
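The four cleaning steps above can be sketched as follows. This is a minimal illustration: `count_inliers` stands in for DELF local-feature matching plus spatial verification, and `min_inliers=30` is an assumed verification cutoff, not a number from the slides.

```python
import numpy as np

def clean_train_set(feats, labels, count_inliers, k=1000, max_nn=100,
                    min_inliers=30, thresh=2):
    """Sketch of the automated data cleaning loop.

    feats: L2-normalized global descriptors, one row per train image.
    labels: noisy landmark_id per image.
    count_inliers(i, j): stand-in for local matching + spatial verification.
    """
    clean = []
    for i in range(len(feats)):
        # 1. kNN (k=1000) in descriptor space (cosine, since rows are normalized).
        sims = feats @ feats[i]
        nn = np.argsort(-sims)[1:k + 1]              # skip the image itself
        # 2. Keep up to 100 neighbors that share the same label.
        same = [j for j in nn if labels[j] == labels[i]][:max_nn]
        # 3. Spatially verify those neighbors.
        verified = sum(1 for j in same if count_inliers(i, j) >= min_inliers)
        # 4. Accept xi when verified count exceeds the threshold (=2).
        if verified > thresh:
            clean.append(i)
    return clean
```

Images whose label cannot be geometrically confirmed by at least a few same-label neighbors are simply dropped from the training subset.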
  12. Discriminative Re-ranking. Predict a landmark_id for each sample in the test set and the index set, using the recognition pipeline.
  14. Discriminative Re-ranking. Append positive samples from the entire index set that were not retrieved by the similarity search, and move positive samples ahead of the negative samples in the ranking.
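The re-ranking rule above can be sketched as a list manipulation (a minimal illustration with hypothetical names; the ordering assumption, retrieved positives first, then appended positives, then negatives, follows the slide's description of moving positives ahead of negatives):

```python
def discriminative_rerank(ranked, predicted_label, index_labels, extra_pool):
    """Sketch of discriminative re-ranking for one query.

    ranked: index-image ids retrieved by similarity search, best first.
    predicted_label: landmark_id predicted for the query.
    index_labels: landmark_id predicted for each index image.
    extra_pool: remaining index images not retrieved for this query.
    """
    # Split the retrieved list by agreement with the query's predicted label.
    positives = [i for i in ranked if index_labels[i] == predicted_label]
    negatives = [i for i in ranked if index_labels[i] != predicted_label]
    # Append same-label images that similarity search missed.
    missed = [i for i in extra_pool
              if index_labels[i] == predicted_label and i not in ranked]
    return positives + missed + negatives
```

Because mAP only cares about positions of positives relative to negatives, pushing label-consistent images forward can improve the metric even without touching the descriptors.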
  17. Key takeaways: two important things to improve landmark retrieval in 2019:

      1. Cosine-based softmax loss with a "cleaned subset".
         Related topic: (Arandjelović & Zisserman, CVPR'12) "Three things everyone should know to improve object retrieval".
         Related topic: (Wang+, ECCV'18) "The Devil of Face Recognition is in the Noise".
      2. Rediscovering the idea of the "Discriminative QE" technique.
  18. 18.
  19. 19.

    Soft-voting with spatial verification Similarity term Inlier-count term Confidence scoring:

    0.85 1.00 0.75 1.00 0.60 0.50 Query Euclidean
 search TOP k (k=3) nearest neighbors in the train set Similarity term Inlier-count term a set of q's neighbors (top3) and its members are assigned to l. Inlier-count The New Town Hall
 in Hanover Hamburg City Hall Our recognition method is based on accumulating top-K nearest neighbors in the train set. ˆ y = argmax = sl Hamburg City Hall l =
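The confidence scoring can be sketched as below. The exact form of the inlier-count term is not given on the slide, so normalizing the count with a cap (`max_inliers=70`) is an assumption made for illustration:

```python
def soft_vote(neighbor_labels, sims, inliers, max_inliers=70):
    """Soft-voting sketch: accumulate similarity x inlier-count terms per label.

    neighbor_labels: landmark_id of each top-k neighbor.
    sims: cosine similarity of each neighbor to the query.
    inliers: spatial-verification inlier count of each neighbor.
    """
    scores = {}
    for lbl, sim, inl in zip(neighbor_labels, sims, inliers):
        # Similarity term times a capped, normalized inlier-count term.
        scores[lbl] = scores.get(lbl, 0.0) + sim * min(inl, max_inliers) / max_inliers
    best = max(scores, key=scores.get)
    return best, scores
```

A geometrically well-verified neighbor thus outweighs a visually similar but unverified one, which is the point of mixing the two terms.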
  20. 20.

    Cosine-based Softmax Loss • Employ ArcFace and CosFace for learning

    metric in our solution. • Successful methods in face recognition. • Also in landmark retrieval/recognition, we found out cosine-based softmax losses are very effective. • Hyperparameter: m=0.3 and s=30 were used in both. • There are many winning solutions using cosine-based softmax losses: • Whale Humpbuck - 1st place • Protain Classification - 1st place [1] J. Deng, J. Guo, and S. Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. arXiv:1801.07698, 2018. [2] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu. Cosface: Large margin cosine loss for deep face recognition. In CVPR, pages 5265–5274, 2018.
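A minimal numpy sketch of the CosFace-style margin with the slide's hyperparameters (m=0.3, s=30). This is an illustration of the loss's logit computation, not the deck's training code; ArcFace differs only in putting the margin on the angle, cos(θ + m), instead of subtracting it from the cosine:

```python
import numpy as np

def cosface_logits(embeddings, weights, labels, m=0.3, s=30.0):
    """CosFace-style margin logits: s * (cos(theta) - m) on the true class.

    embeddings: (N, D) feature vectors; weights: (C, D) class centers;
    labels: (N,) true class index per sample.
    """
    # Normalize both sides so the dot product is a cosine similarity.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = e @ w.T
    # Subtract the margin m only at each sample's true class, then scale by s.
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m
    return s * (cos - margin)
```

The margined logits are then fed to an ordinary softmax cross-entropy, which forces each class to win by a cosine gap of at least m.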
  21. 21.

    Modeling | Overview • Backbones: • FishNet-150 • ResNet-101 •

    SE-ResNeXt-101 • Data augmentation, “soft” and “hard” strategy. • “Soft”: 5 epochs with random cropping and scaling. • “Hard”: 7 epochs with random brightness shift, random sheer translation, random cropping, and scaling. • Combine various techniques: • Aspect preserving of input images. • Cosine annealing LR scheduler. • GeM-pooling (generalized mean pooling). • Fine-tuning at full resolution on the last epoch with freezing BN.
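GeM pooling, mentioned above, generalizes average and max pooling over the spatial dimensions of a CNN feature map. A minimal sketch (the slides do not state the exponent; p=3 is a commonly used default and an assumption here):

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized mean (GeM) pooling over a (C, H, W) feature map.

    p=1 recovers average pooling; p -> infinity approaches max pooling.
    """
    x = np.clip(feature_map, eps, None)           # GeM needs positive inputs
    # Mean of p-th powers over spatial dims, then the p-th root, per channel.
    return (x ** p).mean(axis=(1, 2)) ** (1.0 / p)
```

With p > 1, spatially peaked activations (e.g. a distinctive landmark detail) contribute more to the global descriptor than uniformly low responses.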
  22. 22.

    Modeling | Ensemble Pub/Priv=30.95/33.01 Ensemble Concat + L2N (3072d) FishNet-150

    ArcFace, Soft (512d) Pub/Priv=28.66/30.76 FishNet-150 CosFace, Soft (512d) Pub/Priv=29.04/31.56 FishNet-150 ArcFace, Hard (512d) Pub/Priv=29.17/31.26 ResNet-101 ArcFace, Hard (512d) Pub/Priv=28.57/31.07 SE-ResNeXt-101 ArcFace, Hard (512d) Pub/Priv=29.60/31.52 SE-ResNeXt-101 ArcFace, Hard (512d) Pub/Priv=29.42/31.80 Pub/Priv: Public/PrivateLB score L2N: L2-Normalization
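The "Concat + L2N" ensemble above combines the six 512-d model descriptors into one 3072-d descriptor. A minimal sketch (normalizing each model's descriptor before concatenation is an assumption; the slide only specifies concatenation followed by L2 normalization):

```python
import numpy as np

def ensemble_descriptor(per_model_feats):
    """Concatenate per-model descriptors and L2-normalize the result.

    per_model_feats: list of 1-D descriptors, e.g. six 512-d vectors -> 3072-d.
    """
    # L2-normalize each model's descriptor so no single model dominates.
    normed = [f / np.linalg.norm(f) for f in per_model_feats]
    cat = np.concatenate(normed)
    # Final L2 normalization, so Euclidean search matches cosine ranking.
    return cat / np.linalg.norm(cat)
```

Since the concatenated vector is normalized, the same Euclidean search used for single models applies unchanged to the ensemble descriptor.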