(Retrieval) Large-scale Landmark Retrieval/Recognition
under a Noisy and Diverse Dataset

@smly
June 16, 2019

Transcript

  1. Team smlyaka: Kohei Ozaki* (Recruit Technologies), Shuhei Yokoo* (University of Tsukuba). *Equal contribution.

     Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset (arXiv:1906.04087). (1st place solution, retrieval track.)
  2. Final Results. Our pipeline is based on the standard method: CNN-based global descriptor + Euclidean search + re-ranking. Single model (0.318) + ensemble (0.330) + re-ranking.
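The Euclidean-search step of the pipeline can be sketched as a brute-force nearest-neighbor search over L2-normalized global descriptors (a minimal illustration, not the deck's actual implementation; for normalized vectors, ranking by Euclidean distance matches ranking by cosine similarity):

```python
import numpy as np

def euclidean_search(query, index, topk=3):
    """Brute-force Euclidean nearest-neighbor search over global descriptors."""
    # Euclidean distances between one query and all index vectors.
    dists = np.linalg.norm(index - query[None, :], axis=1)
    order = np.argsort(dists)[:topk]
    return order, dists[order]

# Toy example with 512-d descriptors (the per-model dimension in this deck).
rng = np.random.default_rng(0)
index = rng.normal(size=(100, 512))
index /= np.linalg.norm(index, axis=1, keepdims=True)
query = index[42] + 0.01 * rng.normal(size=512)  # near-duplicate of image 42
query /= np.linalg.norm(query)
ids, _ = euclidean_search(query, index)
```

At competition scale this search is done with an ANN library rather than a dense matrix, but the ranking semantics are the same.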
  3. Two important things to improve landmark retrieval in 2019:

     1. Cosine-based softmax loss with a "cleaned subset".
        Related topic: (Arandjelović & Zisserman, CVPR'12) "Three things everyone should know to improve object retrieval".
        Related topic: (Wang+, ECCV'18) "The Devil of Face Recognition is in the Noise".
     2. Rediscovering the idea of the "Discriminative QE" technique.
  5. Cleaning the Google-Landmarks-v2 (landmark_id=140690). Google-Landmarks-v2 is a quite noisy and diverse dataset. Metric learning methods are usually sensitive to noise, so it is essential to clean the dataset before training on it.
  6. Cleaning the Google-Landmarks-v2 (landmark_id=140690). To address the noise issue, we developed an automated data cleaning system and applied it to Google-Landmarks-v2.
  7. 7.

    2. Select up to the 100-NN assigned to the same

    label to . xi Automated Data Cleaning With local feature matching & spatial verification (inlier-count). For each
 train image , 1. kNN (k=1000) from the train set.
 (image representation is learned from the Google-Landmarks-v1) 3. Spatial Verification (\w DELFv2) is performed on up to the 100-NN xi ✔ 4. Add into our clean train set
 when the count of verified images is greater than the threshold (=2). xi
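The four cleaning steps above can be sketched as follows. This is a minimal illustration: `count_inliers` stands in for DELF local-feature matching plus spatial verification, and `min_inliers=30` is an assumed verification cutoff, not a number from the slides.

```python
import numpy as np

def clean_train_set(feats, labels, count_inliers, k=1000, max_nn=100,
                    min_inliers=30, thresh=2):
    """Sketch of the automated data cleaning loop.

    feats: L2-normalized global descriptors, one row per train image.
    labels: noisy landmark_id per image.
    count_inliers(i, j): stand-in for local matching + spatial verification.
    """
    clean = []
    for i in range(len(feats)):
        # 1. kNN (k=1000) in descriptor space (cosine, since rows are normalized).
        sims = feats @ feats[i]
        nn = np.argsort(-sims)[1:k + 1]              # skip the image itself
        # 2. Keep up to 100 neighbors that share the same label.
        same = [j for j in nn if labels[j] == labels[i]][:max_nn]
        # 3. Spatially verify those neighbors.
        verified = sum(1 for j in same if count_inliers(i, j) >= min_inliers)
        # 4. Accept xi when verified count exceeds the threshold (=2).
        if verified > thresh:
            clean.append(i)
    return clean
```

Images whose label cannot be geometrically confirmed by at least a few same-label neighbors are simply dropped from the training subset.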
  12. Discriminative Re-ranking. Predict a landmark_id for each sample in the test set and the index set, using the recognition pipeline.
  14. Discriminative Re-ranking. Append positive samples from the entire index set that were not retrieved by the similarity search, and move positive samples ahead of the negative samples in the ranking.
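The re-ranking rule above can be sketched as a list manipulation (a minimal illustration with hypothetical names; the ordering assumption, retrieved positives first, then appended positives, then negatives, follows the slide's description of moving positives ahead of negatives):

```python
def discriminative_rerank(ranked, predicted_label, index_labels, extra_pool):
    """Sketch of discriminative re-ranking for one query.

    ranked: index-image ids retrieved by similarity search, best first.
    predicted_label: landmark_id predicted for the query.
    index_labels: landmark_id predicted for each index image.
    extra_pool: remaining index images not retrieved for this query.
    """
    # Split the retrieved list by agreement with the query's predicted label.
    positives = [i for i in ranked if index_labels[i] == predicted_label]
    negatives = [i for i in ranked if index_labels[i] != predicted_label]
    # Append same-label images that similarity search missed.
    missed = [i for i in extra_pool
              if index_labels[i] == predicted_label and i not in ranked]
    return positives + missed + negatives
```

Because mAP only cares about positions of positives relative to negatives, pushing label-consistent images forward can improve the metric even without touching the descriptors.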
  17. Key takeaways: two important things to improve landmark retrieval in 2019:

      1. Cosine-based softmax loss with a "cleaned subset".
         Related topic: (Arandjelović & Zisserman, CVPR'12) "Three things everyone should know to improve object retrieval".
         Related topic: (Wang+, ECCV'18) "The Devil of Face Recognition is in the Noise".
      2. Rediscovering the idea of the "Discriminative QE" technique.
  18. 18.
  19. 19.

    Soft-voting with spatial verification Similarity term Inlier-count term Confidence scoring:

    0.85 1.00 0.75 1.00 0.60 0.50 Query Euclidean
 search TOP k (k=3) nearest neighbors in the train set Similarity term Inlier-count term a set of q's neighbors (top3) and its members are assigned to l. Inlier-count The New Town Hall
 in Hanover Hamburg City Hall Our recognition method is based on accumulating top-K nearest neighbors in the train set. ˆ y = argmax = sl Hamburg City Hall l =
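The confidence scoring can be sketched as below. The exact form of the inlier-count term is not given on the slide, so normalizing the count with a cap (`max_inliers=70`) is an assumption made for illustration:

```python
def soft_vote(neighbor_labels, sims, inliers, max_inliers=70):
    """Soft-voting sketch: accumulate similarity x inlier-count terms per label.

    neighbor_labels: landmark_id of each top-k neighbor.
    sims: cosine similarity of each neighbor to the query.
    inliers: spatial-verification inlier count of each neighbor.
    """
    scores = {}
    for lbl, sim, inl in zip(neighbor_labels, sims, inliers):
        # Similarity term times a capped, normalized inlier-count term.
        scores[lbl] = scores.get(lbl, 0.0) + sim * min(inl, max_inliers) / max_inliers
    best = max(scores, key=scores.get)
    return best, scores
```

A geometrically well-verified neighbor thus outweighs a visually similar but unverified one, which is the point of mixing the two terms.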
  20. 20.

    Cosine-based Softmax Loss • Employ ArcFace and CosFace for learning

    metric in our solution. • Successful methods in face recognition. • Also in landmark retrieval/recognition, we found out cosine-based softmax losses are very effective. • Hyperparameter: m=0.3 and s=30 were used in both. • There are many winning solutions using cosine-based softmax losses: • Whale Humpbuck - 1st place • Protain Classification - 1st place [1] J. Deng, J. Guo, and S. Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. arXiv:1801.07698, 2018. [2] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu. Cosface: Large margin cosine loss for deep face recognition. In CVPR, pages 5265–5274, 2018.
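A minimal numpy sketch of the CosFace-style margin with the slide's hyperparameters (m=0.3, s=30). This is an illustration of the loss's logit computation, not the deck's training code; ArcFace differs only in putting the margin on the angle, cos(θ + m), instead of subtracting it from the cosine:

```python
import numpy as np

def cosface_logits(embeddings, weights, labels, m=0.3, s=30.0):
    """CosFace-style margin logits: s * (cos(theta) - m) on the true class.

    embeddings: (N, D) feature vectors; weights: (C, D) class centers;
    labels: (N,) true class index per sample.
    """
    # Normalize both sides so the dot product is a cosine similarity.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = e @ w.T
    # Subtract the margin m only at each sample's true class, then scale by s.
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m
    return s * (cos - margin)
```

The margined logits are then fed to an ordinary softmax cross-entropy, which forces each class to win by a cosine gap of at least m.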
  21. 21.

    Modeling | Overview • Backbones: • FishNet-150 • ResNet-101 •

    SE-ResNeXt-101 • Data augmentation, “soft” and “hard” strategy. • “Soft”: 5 epochs with random cropping and scaling. • “Hard”: 7 epochs with random brightness shift, random sheer translation, random cropping, and scaling. • Combine various techniques: • Aspect preserving of input images. • Cosine annealing LR scheduler. • GeM-pooling (generalized mean pooling). • Fine-tuning at full resolution on the last epoch with freezing BN.
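GeM pooling, mentioned above, generalizes average and max pooling over the spatial dimensions of a CNN feature map. A minimal sketch (the slides do not state the exponent; p=3 is a commonly used default and an assumption here):

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized mean (GeM) pooling over a (C, H, W) feature map.

    p=1 recovers average pooling; p -> infinity approaches max pooling.
    """
    x = np.clip(feature_map, eps, None)           # GeM needs positive inputs
    # Mean of p-th powers over spatial dims, then the p-th root, per channel.
    return (x ** p).mean(axis=(1, 2)) ** (1.0 / p)
```

With p > 1, spatially peaked activations (e.g. a distinctive landmark detail) contribute more to the global descriptor than uniformly low responses.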
  22. 22.

    Modeling | Ensemble Pub/Priv=30.95/33.01 Ensemble Concat + L2N (3072d) FishNet-150

    ArcFace, Soft (512d) Pub/Priv=28.66/30.76 FishNet-150 CosFace, Soft (512d) Pub/Priv=29.04/31.56 FishNet-150 ArcFace, Hard (512d) Pub/Priv=29.17/31.26 ResNet-101 ArcFace, Hard (512d) Pub/Priv=28.57/31.07 SE-ResNeXt-101 ArcFace, Hard (512d) Pub/Priv=29.60/31.52 SE-ResNeXt-101 ArcFace, Hard (512d) Pub/Priv=29.42/31.80 Pub/Priv: Public/PrivateLB score L2N: L2-Normalization
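The "Concat + L2N" ensemble above combines the six 512-d model descriptors into one 3072-d descriptor. A minimal sketch (normalizing each model's descriptor before concatenation is an assumption; the slide only specifies concatenation followed by L2 normalization):

```python
import numpy as np

def ensemble_descriptor(per_model_feats):
    """Concatenate per-model descriptors and L2-normalize the result.

    per_model_feats: list of 1-D descriptors, e.g. six 512-d vectors -> 3072-d.
    """
    # L2-normalize each model's descriptor so no single model dominates.
    normed = [f / np.linalg.norm(f) for f in per_model_feats]
    cat = np.concatenate(normed)
    # Final L2 normalization, so Euclidean search matches cosine ranking.
    return cat / np.linalg.norm(cat)
```

Since the concatenated vector is normalized, the same Euclidean search used for single models applies unchanged to the ensemble descriptor.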