
(Retrieval) Large-scale Landmark Retrieval/Recognition
under a Noisy and Diverse Dataset

@smly
June 16, 2019


  1. Team smlyaka:
    Kohei Ozaki * (Recruit Technologies)
    Shuhei Yokoo * (University of Tsukuba)

    * Equal contribution.
    Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset (arXiv:1906.04087)
    (1st place solution, retrieval track)


  2. Final Results
    Our pipeline is based on the standard approach:
    CNN-based global descriptor + Euclidean search + re-ranking.
    Single model (0.318)
    + Ensemble (0.330)
    + Re-ranking
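    A minimal sketch of this search step, assuming L2-normalized global
    descriptors so that Euclidean search and inner-product ranking coincide
    (function and variable names are illustrative, not from the deck):

        import numpy as np

        def retrieve(query_desc, index_desc, topk=100):
            # L2-normalize both sides; for unit vectors, Euclidean distance
            # and cosine similarity produce the same ranking
            q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
            x = index_desc / np.linalg.norm(index_desc, axis=1, keepdims=True)
            sims = q @ x.T                               # cosine similarities
            return np.argsort(-sims, axis=1)[:, :topk]   # ranked neighbor ids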


  3. Two important things to improve landmark retrieval in 2019
    1. Cosine-based softmax loss with a “cleaned subset”
       Related topic: “The Devil of Face Recognition is in the Noise” (Wang+, ECCV’18)
    2. Rediscover the idea of the “Discriminative QE” technique
       Related topic: “Three things everyone should know to improve object retrieval” (Arandjelović & Zisserman, CVPR’12)


  4. Cleaning the Google-Landmarks-v2
    [Example of a noisy class: landmark_id=140690]
    The Google-Landmarks-v2 is a quite noisy and diverse dataset.
    Metric learning methods are usually sensitive to noise,
    so it is essential to clean the dataset before training on it.


  5. Cleaning the Google-Landmarks-v2
    [Example: mislabeled images in landmark_id=140690 marked ✗]
    To address the noise issue, we developed an automated data cleaning
    system and applied it to the Google-Landmarks-v2.


  6. Automated Data Cleaning
    With local feature matching & spatial verification (inlier count).
    For each train image x_i:
    1. kNN (k=1000) of x_i from the train set.
       (The image representation is learned from the Google-Landmarks-v1.)
    2. Select up to the 100-NN assigned to the same label as x_i.
    3. Spatial verification (w/ DELFv2) is performed on the up-to-100-NN.
    4. Add x_i into our clean train set when the count of verified images
       is greater than the threshold (=2).
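    A minimal sketch of the four steps above, assuming a faiss-like
    nearest-neighbor index and a spatially_verified(i, j) callback that
    wraps DELFv2 matching plus RANSAC verification; all names here are
    illustrative, not from the deck:

        def clean_train_set(feats, labels, nn_index, spatially_verified,
                            k=1000, m=100, thresh=2):
            # feats: (N, D) descriptors learned on Google-Landmarks-v1
            # nn_index: nearest-neighbor index built over feats
            clean = []
            _, nn_ids = nn_index.search(feats, k + 1)     # step 1: kNN (k=1000)
            for i in range(len(feats)):
                same_label = [j for j in nn_ids[i]
                              if j != i and labels[j] == labels[i]][:m]  # step 2
                n_verified = sum(spatially_verified(i, j)
                                 for j in same_label)     # step 3: DELFv2 + RANSAC
                if n_verified > thresh:                   # step 4: count > 2
                    clean.append(i)
            return clean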


  7. Discriminative Reranking
    Predict a landmark_id for each sample in the test set and the index set
    with the recognition pipeline.
    [Figure: test and index images grouped by predicted landmark_id]


  8. Discriminative Reranking
    Positive samples (index images whose predicted landmark_id matches the
    query's) are moved to the left of the negative samples in the ranking.
    Positive samples from the entire index set that are not retrieved by
    the similarity search are appended.
    [Figure: rankings before and after reranking, positives ahead of negatives]

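    A minimal sketch of this reranking; that the missed positives are
    appended after the retrieved positives but before the negatives is our
    reading of the slide, and all names are illustrative:

        def discriminative_rerank(ranked, query_pred, index_preds):
            # ranked: index ids ordered by descriptor similarity to the query
            # query_pred, index_preds: landmark_ids from the recognition pipeline
            retrieved = set(ranked)
            positives = [i for i in ranked if index_preds[i] == query_pred]
            negatives = [i for i in ranked if index_preds[i] != query_pred]
            # positives from the entire index set missed by the similarity search
            missed = [i for i, p in enumerate(index_preds)
                      if p == query_pred and i not in retrieved]
            return positives + missed + negatives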

  9. Key takeaways:
    Two important things to improve landmark retrieval in 2019
    1. Cosine-based softmax loss with a “cleaned subset”
       Related topic: “The Devil of Face Recognition is in the Noise” (Wang+, ECCV’18)
    2. Rediscover the idea of the “Discriminative QE” technique
       Related topic: “Three things everyone should know to improve object retrieval” (Arandjelović & Zisserman, CVPR’12)


  10. Appendix


  11. Soft-voting with spatial verification
    Our recognition method is based on accumulating the top-k (k=3) nearest
    neighbors of the query in the train set (Euclidean search).
    Confidence scoring combines a similarity term and an inlier-count term:
        y_hat = argmax_l s_l,
        s_l = sum over x in N_l(q) of [ sim(q, x) + inlier-count(q, x) ]
    where N_l(q) is the set of q's top-k neighbors whose members are
    assigned to label l.
    [Figure: a query image and its top-3 nearest neighbors, labeled Hamburg
    City Hall and The New Town Hall in Hanover, with per-neighbor similarity
    and inlier-count scores; soft-voting predicts Hamburg City Hall.]
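    A minimal sketch of this scoring, assuming a faiss-like index and an
    inliers() callback whose output is already scaled into a range
    comparable to the similarities (the deck does not give the exact
    normalization of the inlier-count term):

        from collections import defaultdict

        def predict_landmark(q_feat, q_image, nn_index, train_labels, inliers, k=3):
            # q_feat: (1, D) query descriptor; nn_index: index over the train set
            sims, ids = nn_index.search(q_feat, k)   # top-k nearest neighbors
            scores = defaultdict(float)              # s_l, accumulated per label l
            for sim, j in zip(sims[0], ids[0]):
                # similarity term + inlier-count term
                scores[train_labels[j]] += sim + inliers(q_image, j)
            return max(scores, key=scores.get)       # y_hat = argmax_l s_l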


  12. Cosine-based Softmax Loss
    • We employ ArcFace [1] and CosFace [2] for metric learning in our solution.
    • Both are successful methods in face recognition.
    • In landmark retrieval/recognition as well, we found that cosine-based
      softmax losses are very effective.
    • Hyperparameters: m=0.3 and s=30 were used for both.
    • Many winning solutions use cosine-based softmax losses:
      • Humpback Whale Identification - 1st place
      • Protein Classification - 1st place
    [1] J. Deng, J. Guo, and S. Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. arXiv:1801.07698, 2018.
    [2] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu. CosFace: Large margin cosine loss for deep face recognition. In CVPR, pages 5265–5274, 2018.
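    A minimal PyTorch sketch of an ArcFace head with the slide's
    hyperparameters (m=0.3, s=30); this follows the published ArcFace
    formulation [1], not necessarily the team's exact implementation:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ArcFaceHead(nn.Module):
            def __init__(self, in_features, num_classes, s=30.0, m=0.3):
                super().__init__()
                self.weight = nn.Parameter(torch.empty(num_classes, in_features))
                nn.init.xavier_uniform_(self.weight)
                self.s, self.m = s, m

            def forward(self, emb, labels):
                # cosine between L2-normalized embeddings and class centers
                cos = F.linear(F.normalize(emb), F.normalize(self.weight))
                theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
                onehot = F.one_hot(labels, self.weight.size(0)).bool()
                # add the angular margin m to the target-class angle only
                logits = torch.where(onehot, torch.cos(theta + self.m), cos)
                return self.s * logits   # feed into nn.CrossEntropyLoss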


  13. Modeling | Overview
    • Backbones:
      • FishNet-150
      • ResNet-101
      • SE-ResNeXt-101
    • Data augmentation: “soft” and “hard” strategies.
      • “Soft”: 5 epochs with random cropping and scaling.
      • “Hard”: 7 epochs with random brightness shift, random shear
        transformation, random cropping, and scaling.
    • Combined various techniques:
      • Aspect-preserving resizing of input images.
      • Cosine annealing LR scheduler.
      • GeM pooling (generalized mean pooling; a sketch follows this slide).
      • Fine-tuning at full resolution in the last epoch with frozen BN.
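    A minimal PyTorch sketch of GeM pooling as listed above; initializing
    the learnable exponent p to 3 is a common choice, not something stated
    on the slide:

        import torch
        import torch.nn as nn

        class GeM(nn.Module):
            def __init__(self, p=3.0, eps=1e-6):
                super().__init__()
                self.p = nn.Parameter(torch.tensor(p))  # learnable exponent
                self.eps = eps

            def forward(self, x):                       # x: (B, C, H, W)
                # generalized mean over the spatial dimensions
                x = x.clamp(min=self.eps).pow(self.p)
                return x.mean(dim=(-2, -1)).pow(1.0 / self.p)  # (B, C)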


  14. Modeling | Ensemble
    Model (all 512d descriptors)            Pub/Priv LB
    FishNet-150, ArcFace, Soft              28.66 / 30.76
    FishNet-150, CosFace, Soft              29.04 / 31.56
    FishNet-150, ArcFace, Hard              29.17 / 31.26
    ResNet-101, ArcFace, Hard               28.57 / 31.07
    SE-ResNeXt-101, ArcFace, Hard           29.60 / 31.52
    SE-ResNeXt-101, ArcFace, Hard           29.42 / 31.80
    Ensemble: Concat + L2N (3072d)          30.95 / 33.01
    Pub/Priv: Public/Private LB score. L2N: L2-normalization.
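    A minimal sketch of the Concat + L2N step; L2-normalizing each model's
    descriptors before concatenation is our assumption, since the slide
    only shows L2N applied to the 3072-d result:

        import numpy as np

        def l2n(x):
            return x / np.linalg.norm(x, axis=1, keepdims=True)

        def ensemble(descs):
            # descs: list of six (N, 512) per-model descriptor matrices
            return l2n(np.concatenate([l2n(d) for d in descs], axis=1))  # (N, 3072)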


  15. Appendix: Another case
    [Example: landmark_id=29]
