(Recognition) Large-scale Landmark Retrieval/Recognition
under a Noisy and Diverse Dataset

@smly
June 16, 2019

Transcript

  1. Team smlyaka: Kohei Ozaki* (Recruit Technologies), Shuhei Yokoo* (University of Tsukuba)
    * Equal contribution.
    Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset (arXiv:1906.04087)
    (3rd place solution, recognition track)
  2. Solution summary
    Our solution is a combination of basic techniques: soft-voting, the inlier-count method, ensembling, and post-processing. We present (⭐1) a "soft-voting" step and (⭐2) a "post-processing" step.

    Private LB score by stage (Single, Ensemble, +Inlier-count term, Post-processing):
    - Single: best single model (512d) + soft-voting: PrivLB=0.2079
    - +Inlier-count term: RANSAC+DELF (pre-trained, v2): PrivLB=0.3373
    - Ensemble: 6 models (dim=3072): PrivLB=0.3513
    - Post-processing: remove TopFreq (>=30): PrivLB=0.3630

    (※ Note: Our dataset cleaning method for image representation learning also plays an important role in our solution. We will present it later in the retrieval session.)
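The ensemble's dimension (3072 = 6 × 512) suggests that the six 512-d descriptors are concatenated. The sketch below assumes exactly that: each model's descriptor is L2-normalized, the six are concatenated, and the result is normalized again. The function names are hypothetical, and the slides do not state the exact normalization scheme used.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def ensemble_descriptor(per_model_descriptors):
    """Concatenate per-model descriptors into one ensemble descriptor.

    Each model's descriptor is L2-normalized first so that every model gets
    equal weight; the concatenation is then normalized again so that
    Euclidean nearest-neighbor search ranks like cosine similarity.
    """
    concat = []
    for desc in per_model_descriptors:
        concat.extend(l2_normalize(desc))
    return l2_normalize(concat)


# Hypothetical example: six 512-d descriptors -> one 3072-d descriptor.
descs = [[float(m + 1)] * 512 for m in range(6)]
ensemble = ensemble_descriptor(descs)
print(len(ensemble))  # 3072
```

Normalizing each block before concatenating keeps a model with larger raw activations from dominating the Euclidean distance: after the final normalization, every 512-d block has squared norm 1/6.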
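  3. Soft-voting with spatial verification
    Our recognition method is based on accumulating the top-k nearest neighbors of the query in the train set.
    [Figure: the query q is matched by Euclidean search to its top-k (k=3) nearest neighbors in the train set (here, images of The New Town Hall in Hanover and Hamburg City Hall); each neighbor contributes a similarity term and an inlier-count term, and the query is predicted as Hamburg City Hall.]
    Confidence scoring: N_k(q) is the set of q's top-k (k=3) neighbors, and the members of N_k(q) that are labeled l vote for l:
    $$\hat{y} = \operatorname*{argmax}_{l} s_l, \qquad s_l = \sum_{x_i \in N_k(q),\; y_i = l} \Big( \underbrace{\mathrm{sim}(q, x_i)}_{\text{similarity term}} + \underbrace{\mathrm{inl}(q, x_i)}_{\text{inlier-count term}} \Big)$$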
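A minimal sketch of the soft-voting rule above, assuming L2-normalized descriptors (so the dot product serves as the similarity term) and RANSAC inlier counts between the query and each train image that were already computed elsewhere, e.g. by DELF matching. The capped and rescaled form of the inlier-count term (`inlier_cap`) is a hypothetical choice for illustration; the slides do not give its exact form.

```python
import math
from collections import defaultdict


def top_k_neighbors(query, train_feats, k=3):
    """Indices of the k train descriptors closest to the query (Euclidean)."""
    ranked = sorted(range(len(train_feats)),
                    key=lambda i: math.dist(query, train_feats[i]))
    return ranked[:k]


def soft_vote(query, train_feats, train_labels, inlier_counts, k=3, inlier_cap=100):
    """Predict (label, confidence) for one query by soft-voting over its top-k neighbors.

    inlier_counts[i] is assumed to be the number of RANSAC inliers between the
    query and train image i. The capped inlier term is an illustrative choice.
    """
    scores = defaultdict(float)
    for i in top_k_neighbors(query, train_feats, k):
        # Similarity term: dot product (cosine similarity for unit-norm vectors).
        sim = sum(a * b for a, b in zip(query, train_feats[i]))
        # Inlier-count term: inliers capped at inlier_cap and scaled to [0, 1].
        inlier_term = min(inlier_counts[i], inlier_cap) / inlier_cap
        scores[train_labels[i]] += sim + inlier_term
    best = max(scores, key=scores.get)
    return best, scores[best]


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


# Hypothetical toy data: 2-d descriptors standing in for real embeddings.
train_feats = [unit(v) for v in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0])]
train_labels = ["New Town Hall, Hanover", "Hamburg City Hall", "Landmark C", "Landmark D"]
query = [1.0, 0.0]

print(soft_vote(query, train_feats, train_labels, [0, 0, 0, 0])[0])   # New Town Hall, Hanover
print(soft_vote(query, train_feats, train_labels, [2, 80, 0, 0])[0])  # Hamburg City Hall
```

With similarity alone, the nearest neighbor wins. Once many geometrically verified matches support the second neighbor, its inlier term outweighs the small similarity gap and flips the prediction. This is the kind of correction spatial verification is meant to add.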
  5. Post-processing for distractors
    We treat categories that appear 30 or more times (freq >= 30) among the labels predicted for the test set as non-landmark categories, e.g. landmark_id=129232 (freq=91) and landmark_id=179959 (freq=1144). This idea is related to "stop words" in natural language processing. Using Open Images might have a similar effect for removing distractors.
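A minimal sketch of the "Remove TopFreq (>=30)" rule, assuming predictions are held in a dict from test image id to `(landmark_id, confidence)`. The slides do not say whether such predictions are dropped or merely demoted. Here we assume they are blanked (`None`, i.e. no landmark is submitted for that image). The function name is hypothetical.

```python
from collections import Counter


def remove_top_frequent(predictions, min_freq=30):
    """Blank out predictions whose label is predicted for >= min_freq test images.

    predictions: dict mapping test image id -> (landmark_id, confidence).
    Returns a new dict in which these "stop-word" labels are replaced by None.
    """
    freq = Counter(label for label, _ in predictions.values())
    stop_labels = {label for label, n in freq.items() if n >= min_freq}
    return {
        test_id: (None if pred[0] in stop_labels else pred)
        for test_id, pred in predictions.items()
    }


# Hypothetical predictions: one label dominates the test set.
preds = {f"img{i}": (179959, 0.5) for i in range(40)}
preds["img_a"] = (12345, 0.9)
preds["img_b"] = (67890, 0.8)
cleaned = remove_top_frequent(preds)
print(sum(p is None for p in cleaned.values()))  # 40
```

Like a stop-word list, the rule relies only on corpus-level frequency. A label predicted for many unrelated test images is more likely a catch-all for distractors than a real landmark.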
  6. Takeaways (Summary)
    Our solution is a combination of basic techniques: soft-voting, the inlier-count method, ensembling, and post-processing. The inlier-count term significantly improves the GAP score.
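For reference, GAP (Global Average Precision, as used in the Kaggle landmark recognition challenges) ranks all predictions, at most one per test image, by confidence and computes GAP = (1/M) Σ_i P(i)·rel(i). Here P(i) is the precision at rank i, rel(i) is 1 if prediction i is correct, and M is the number of test images that contain a landmark. A minimal sketch, with a hypothetical function name and toy data:

```python
def gap(predictions, ground_truth):
    """Global Average Precision (micro-AP) for landmark recognition.

    predictions: list of (test_id, label, confidence), at most one per image.
    ground_truth: dict test_id -> true landmark label, or None if the image
    contains no landmark.
    """
    m = sum(1 for label in ground_truth.values() if label is not None)
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    n_correct, total = 0, 0.0
    for rank, (test_id, label, _) in enumerate(ranked, start=1):
        if label is not None and ground_truth.get(test_id) == label:
            n_correct += 1
            total += n_correct / rank  # precision at this rank
    return total / m if m else 0.0


truth = {"q1": 1, "q2": 2, "q3": None}  # q3 is a non-landmark distractor
with_distractor = [("q1", 1, 0.9), ("q3", 5, 0.8), ("q2", 2, 0.7)]
without_distractor = [("q1", 1, 0.9), ("q2", 2, 0.7)]
print(round(gap(with_distractor, truth), 4))     # 0.8333
print(round(gap(without_distractor, truth), 4))  # 1.0
```

Because GAP depends on the confidence ordering across all test images, both confident wrong answers on distractors and poorly calibrated confidences on correct answers lower it. This is consistent with a better confidence signal (the inlier-count term) and the removal of distractor predictions both helping the score.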