
Image matching presentation during cubonacci meetup

Marijn
February 27, 2020


Presented during: https://www.meetup.com/Applied-Machine-Learning-by-Cubonacci/events/267873547/

Marijn Lems (iam.io) will take you through the process of training the deep learning model that they use for image matching.

We discuss some fun and interesting challenges during data collection, hand labeling, cleaning, preparation, modelling and operationalization. Then we dig deeper into which CNN architectures proved to work for image matching and how we optimized hyperparameters. We will look into deep representation learning and cover sampling procedures that boosted image matching performance.

ImageLink (a product of iam.io) replaces QR codes using image matching technology. Instead of scanning a QR code, you scan images. A new and exciting way of activating your (printed) assets.


Transcript

  1. IAM • E-commerce startup founded in 2017 • I joined in 2019 • Online platform that offers micro-retailers their own shopping venue • Launch April 2020
  2. ImageLink Focus • Create an app that can detect predefined planar objects like paintings, images, billboards, logos • QR-code scanner experience • Publisher uploads content, end-user scans it like a QR code
  3. Make labeling easier • Object detection algorithm to detect the planar object(s) in this query • VOC data format
  4. Examine what we got • References: 20K • Queries: ~5K • Domains: Artworks, Magazines, Billboards • [charts: query dimensions, planar dimensions]
  5. Data labeling • After eyeballing the data, it seemed trivial to associate queries and references • [screenshot: a query (top left) and 5 possible references]
  6. Simple model • Mean RGB pixel difference (sketched below) • WHAT? Only 9 matches • [table: mean pixel value vs. top-K distance — 4, 10, 11, 55, 221]
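A minimal sketch of a mean-RGB baseline like this (the helper names and the NumPy/Pillow usage are assumptions, not from the deck): compute one mean-RGB vector per image and rank references by distance to the query.

```python
import numpy as np
from PIL import Image

def mean_rgb(path):
    """Mean R, G, B value of an image as a 3-vector."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    return img.reshape(-1, 3).mean(axis=0)

def top_k_matches(query_path, reference_paths, k=5):
    """Rank references by Euclidean distance between mean-RGB vectors."""
    q = mean_rgb(query_path)
    refs = np.stack([mean_rgb(p) for p in reference_paths])
    dists = np.linalg.norm(refs - q, axis=1)
    order = np.argsort(dists)[:k]
    return [(reference_paths[i], float(dists[i])) for i in order]
```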
  7. Experimentation so far • Local feature descriptors (ORB; sketched below) • Pre-processing • Starting with a simple model • Augmentation • Transformer networks • Fine-tuning a pretrained network • Multimodal networks • Text • MPEG7 feature descriptors • [images: MPEG7 descriptors, augmentations, ORB with RANSAC]
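The ORB experiment might look roughly like this OpenCV sketch: match binary ORB descriptors between query and reference, then count the inliers of a RANSAC-fitted homography (the function name and thresholds are illustrative, not the deck's actual code).

```python
import cv2
import numpy as np

def orb_ransac_inliers(query_path, reference_path, n_features=1000):
    """Match ORB descriptors and count RANSAC homography inliers."""
    orb = cv2.ORB_create(nfeatures=n_features)
    img1 = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    if des1 is None or des2 is None:
        return 0
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    if len(matches) < 4:  # a homography needs at least 4 correspondences
        return 0
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return 0 if mask is None else int(mask.sum())
```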
  8. Siamese neural network • Triplet loss (code sketch below): $\mathcal{L}(q, p, n) = \max\big(\lVert f(q) - f(p)\rVert^2 - \lVert f(q) - f(n)\rVert^2 + \alpha,\ 0\big)$ • $\alpha$ is a margin: the threshold that determines the preferred distance between images • $f(\cdot)$ produces n-dimensional embeddings, one for each input: query $q$, positive $p$, negative $n$
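The loss translates almost directly into code; a sketch assuming TensorFlow (the deck does not name its framework):

```python
import tensorflow as tf

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(q,p) - d(q,n) + margin, 0), with squared Euclidean distances.

    anchor, positive, negative: (batch, n_dims) embedding tensors;
    margin is the threshold that sets the preferred gap between the
    positive and negative distances.
    """
    d_pos = tf.reduce_sum(tf.square(anchor - positive), axis=1)
    d_neg = tf.reduce_sum(tf.square(anchor - negative), axis=1)
    return tf.reduce_mean(tf.maximum(d_pos - d_neg + margin, 0.0))
```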
  9. Embedding space • Where does my embedding go? • The embedding space should not have too many dimensions
  10. Baseline experiment • Pretrained VGG16 on ImageNet (https://arxiv.org/pdf/1409.1556.pdf; sketch below) • Baseline performance on validation set: .75 for free • [picture from the ImageNet dataset]
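A sketch of such a baseline, assuming Keras: take the ImageNet-pretrained VGG16 without its classifier head and use the pooled convolutional features as embeddings.

```python
import numpy as np
import tensorflow as tf

# Pretrained VGG16 (ImageNet weights) as a frozen feature extractor:
# global-average-pooled convolutional features serve as embeddings.
base = tf.keras.applications.VGG16(weights="imagenet",
                                   include_top=False, pooling="avg")

def embed(images):
    """images: float array of shape (batch, 224, 224, 3), RGB."""
    x = tf.keras.applications.vgg16.preprocess_input(np.array(images))
    return base.predict(x)  # (batch, 512) embeddings
```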
  11. Finetuning my baseline • Freeze the first n blocks, tune the weights of the last n layers (sketched below) • epoch@100, best hyperparams, 4-fold • Here we jump 23% to .74 • [diagram: VGG16 — freeze these layers / train these layers]
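Freezing the first blocks while training the rest could look like this in Keras (training from block 5 onward is an illustrative choice, not necessarily the deck's setting):

```python
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet",
                                   include_top=False, pooling="avg")

# Freeze everything up to block 4; train from block 5 onward.
trainable = False
for layer in base.layers:
    if layer.name.startswith("block5"):
        trainable = True
    layer.trainable = trainable
```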
  12. Triplet sampling • How you select your triplets affects performance and convergence • 5K queries and 20K references make around 4e+16 possible triplets • Uniform sampling (sketched below) • [diagram: anchor, positive, negative]
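Uniform sampling can be sketched as: draw a random (query, positive) pair and a random non-matching reference as the negative (names are hypothetical).

```python
import random

def sample_uniform_triplets(pairs, references, n_triplets):
    """pairs: (query, matching_reference) tuples; references: all references."""
    triplets = []
    for _ in range(n_triplets):
        query, positive = random.choice(pairs)
        negative = random.choice(references)
        while negative == positive:  # resample until it is a true negative
            negative = random.choice(references)
        triplets.append((query, positive, negative))
    return triplets
```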
  13. Challenges with batch construction • As it turns out, it's easy to sample easy negatives • It's expensive to sample "useful" negatives because you need pairwise similarities • https://arxiv.org/abs/1706.07567 • https://omoindrot.github.io/triplet-loss • [diagram: A, P]
  14. Sampling experiment • Pretrained VGG • Randomly sampled hyperparameters (sketched below): {batchsize} • {margin} • {trainable_blocks} • … • Evaluate at epoch 100 • 30 trials • [chart: recall@1]
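The random search might be sketched as follows; only the searched parameters come from the slide, the value ranges are assumptions.

```python
import random

def sample_hyperparams():
    """Draw one random configuration from a hypothetical search space."""
    return {
        "batchsize": random.choice([16, 32, 64, 128]),
        "margin": random.uniform(0.05, 1.0),
        "trainable_blocks": random.randint(0, 5),
    }

trials = [sample_hyperparams() for _ in range(30)]
# For each trial: train to epoch 100 and record recall@1 on the validation set.
```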
  15. Visually our negatives weren't so hard • Sorry, no example • Collect a sample of difficult negatives from Azure similar-image search • Hand labeling • https://omoindrot.github.io/triplet-loss • [figure: reference/negative pairs — .78]
  16. TODO: Online triplet mining • Uniform vs semi-hard triplet mining • 60 sec vs 6 min per epoch • Trade-off between computation time and memory footprint vs. the quality of the solution • Online sampling [during batch construction]: per (query, positive) pair, sample n references, calculate the triplet loss, keep the semi-hard references (see the sketch below)
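A sketch of the semi-hard selection step described above, assuming squared Euclidean distances in embedding space: a negative is semi-hard when it is farther from the anchor than the positive, but still within the margin.

```python
import numpy as np

def semi_hard_negatives(f_q, f_p, candidate_embs, margin=0.2, keep=1):
    """Pick semi-hard negatives for one (query, positive) pair.

    Semi-hard: d(q,p) < d(q,n) < d(q,p) + margin.
    f_q, f_p: (n_dims,) embeddings; candidate_embs: (n, n_dims).
    """
    d_pos = np.sum((f_q - f_p) ** 2)
    d_neg = np.sum((candidate_embs - f_q) ** 2, axis=1)
    semi_hard = np.where((d_neg > d_pos) & (d_neg < d_pos + margin))[0]
    if len(semi_hard) == 0:
        return [int(np.argmin(d_neg))]  # fall back to the hardest negative
    return list(semi_hard[np.argsort(d_neg[semi_hard])][:keep])
```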
  17. FAILED: Augmentation • Stop collecting new data • Train on augmented references as queries (augmentation sketched below) • Validate on real queries • https://imgaug.readthedocs.io • [pipeline: 1. reference → 2. random perspective transform → 3. inverse homography → 4. biggest crop from 2. = TRAINING INPUT] • [chart: recall@1]
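Step 2 of the pipeline, the random perspective transform, might look like this with imgaug (the scale range is an assumption; the inverse-homography and cropping steps are omitted here):

```python
import imgaug.augmenters as iaa

# Random perspective transform that warps a reference into a fake "scan".
aug = iaa.PerspectiveTransform(scale=(0.05, 0.15))

def make_training_query(reference_img):
    """reference_img: uint8 HxWx3 array; returns the augmented image."""
    return aug(image=reference_img)
```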
  18. How we use it • [architecture diagram: publisher → content → content embedding storage; end-user → query → index embedding or query storage → FAISS → serving] • Never-seen query and content • It's also a matter of choice: one-shot vs zero-shot • FAISS retrieval sketched below
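A minimal FAISS sketch of that flow: index the publisher-side content embeddings once, then search them with the end-user's query embedding at serving time (dimensions and data below are placeholders).

```python
import faiss
import numpy as np

n_dims = 512  # embedding size; matches the VGG16 sketch above

# Publisher side: index all content embeddings once.
index = faiss.IndexFlatL2(n_dims)
content_embeddings = np.random.rand(20000, n_dims).astype("float32")  # placeholder
index.add(content_embeddings)

# Serving side: look up the nearest content for an end-user query.
query_embedding = np.random.rand(1, n_dims).astype("float32")  # placeholder
distances, content_ids = index.search(query_embedding, 5)
```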