(Recognition) Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset

Team smlyaka: Kohei Ozaki * (Recruit Technologies), Shuhei Yokoo * (University of Tsukuba)
* Equal contribution.
Paper: Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset (arXiv:1906.04087) (3rd place solution, recognition)

Solution summary

Our solution is a combination of basic techniques: soft-voting, an inlier-count method, ensembling, and post-processing.

Single: best single model (512d) + soft-voting, PrivLB=0.2079
+Inlier-count term: RANSAC+DELF (pre-trained, v2), PrivLB=0.3373
Ensemble: 6 models (dim=3072), PrivLB=0.3513
Post-processing: remove TopFreq (>=30), PrivLB=0.3630

We present (⭐1) the "soft-voting" step and (⭐2) the "post-processing" step.

(※ Note: Our dataset cleaning method for image representation learning also plays an important role in our solution. We will present it in the retrieval session shortly.)

Soft-voting with spatial verification

Our recognition method is based on accumulating the top-K nearest neighbors in the train set.

[Figure: a query image, a Euclidean search, and its top-k (k=3) nearest neighbors in the train set. Each neighbor is annotated with a similarity term and an inlier-count term (values shown: 0.85, 1.00, 0.75, 1.00, 0.60, 0.50). Neighbor labels: The New Town Hall in Hanover, Hamburg City Hall. Predicted label: Hamburg City Hall.]

Confidence scoring:

  \hat{y} = \arg\max_l s_l, \qquad s_l = \sum_{i \in N_l(q)} \left( \text{similarity term}_i + \text{inlier-count term}_i \right)

where N_l(q) is the set of q's neighbors (top 3) whose members are assigned to l.
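The accumulation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's implementation: the function name `soft_vote` is hypothetical, the two terms are taken as already computed per neighbor, and the pairing of the example values with the two labels is assumed, since the slide does not state which neighbor carries which value.

```python
from collections import defaultdict


def soft_vote(neighbors):
    """Accumulate per-label scores over the top-k neighbors.

    neighbors: list of (label, similarity_term, inlier_count_term) tuples,
    one per retrieved train image. Returns (best_label, scores_by_label).
    """
    scores = defaultdict(float)
    for label, similarity_term, inlier_count_term in neighbors:
        # Each neighbor votes for its own label with a soft weight.
        scores[label] += similarity_term + inlier_count_term
    best_label = max(scores, key=scores.get)
    return best_label, dict(scores)


# Assumed pairing of the values from the slide with the two labels (k=3).
neighbors = [
    ("The New Town Hall in Hanover", 0.85, 1.00),
    ("Hamburg City Hall", 0.75, 1.00),
    ("Hamburg City Hall", 0.60, 0.50),
]
label, scores = soft_vote(neighbors)
print(label, scores)
```

Under this assumed pairing, the two Hamburg City Hall neighbors together outscore the single closest neighbor, which is the behavior that distinguishes soft-voting from taking the top-1 label.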


Post-processing for distractors

We treat categories that appear more than 30 times in the test set as non-landmark categories.

Examples: landmark_id=129232, freq=91; landmark_id=179959, freq=1144.

This idea is related to "stop words" in natural language processing. Using Open Images to remove distractors might have a similar effect.
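A minimal sketch of this filtering step, assuming predictions are held as a mapping from test image ID to predicted landmark ID. The function name `remove_frequent_labels`, the data layout, and the toy IDs are hypothetical. The strict `>` comparison follows the wording "more than 30 times"; the summary slide writes the threshold as ">=30", so the exact boundary is an assumption here.

```python
from collections import Counter


def remove_frequent_labels(predictions, max_freq=30):
    """Drop predictions whose label is predicted too often in the test set.

    predictions: dict mapping test image id -> predicted landmark id
    (None means no prediction). Labels predicted more than max_freq times
    are treated as non-landmark categories and replaced with None.
    """
    freq = Counter(l for l in predictions.values() if l is not None)
    distractors = {l for l, count in freq.items() if count > max_freq}
    return {
        image_id: (None if l in distractors else l)
        for image_id, l in predictions.items()
    }


# Toy data: label 1 is predicted 31 times, label 2 only twice.
predictions = {f"img{i}": 1 for i in range(31)}
predictions.update({"img_a": 2, "img_b": 2})
cleaned = remove_frequent_labels(predictions)
```

In this toy input, every prediction of label 1 is cleared while the two predictions of label 2 are kept.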

Takeaways (Summary)

Our solution is a combination of basic techniques: soft-voting, an inlier-count method, ensembling, and post-processing. The inlier-count term significantly improves the GAP score.
