Moodstocks - Spatial Transformer Networks

albanD
September 30, 2015

An overview of the Spatial Transformer Networks paper by DeepMind, with a use case on traffic sign recognition, presented at the Paris Deep Learning Meetup #4.
For the animations in slides 17 and 18, please see the blog post in the first link below.

* http://torch.ch/blog/2015/09/07/spatial_transformers.html
* https://github.com/Moodstocks/gtsrb.torch
* https://moodstocks.com/

Transcript

  1. Spatial Transformer Networks

     [1] Spatial Transformer Networks, Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu. http://arxiv.org/abs/1506.02025v1
  2. Benefits

     • Improve geometric invariance of CNNs
     • Localize the object of interest
     • End-to-end learning
     • No extra supervision
     • Same loss
  3. Classic pipeline:             input image → CNN → Softmax → Prediction
     Spatial Transformer pipeline: input image → ST → transformed image → CNN → Softmax → Prediction
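     A minimal sketch of the ST pipeline above, written in PyTorch purely for illustration (the original gtsrb.torch project is in Lua/Torch, and the layer sizes and localization network here are made-up assumptions, not the network from the talk):

     ```python
     import torch
     import torch.nn as nn
     import torch.nn.functional as F

     class SpatialTransformer(nn.Module):
         """Warps the input image with a learned affine transform."""
         def __init__(self):
             super().__init__()
             # Localization network: regresses the 6 parameters of an affine
             # transform from the image (sizes are made up for a 3x48x48 input).
             self.loc = nn.Sequential(
                 nn.Conv2d(3, 16, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),   # 48 -> 44 -> 22
                 nn.Conv2d(16, 32, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),  # 22 -> 18 -> 9
                 nn.Flatten(),
                 nn.Linear(32 * 9 * 9, 64), nn.ReLU(),
                 nn.Linear(64, 6),
             )
             # Start from the identity transform so the ST initially passes the image through.
             self.loc[-1].weight.data.zero_()
             self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

         def forward(self, x):
             theta = self.loc(x).view(-1, 2, 3)                          # one 2x3 affine matrix per image
             grid = F.affine_grid(theta, x.size(), align_corners=False)  # grid generator
             return F.grid_sample(x, grid, align_corners=False)          # bilinear sampler

     # Classic pipeline: input image -> CNN -> Softmax -> prediction
     # ST pipeline:      input image -> ST -> transformed image -> CNN -> Softmax -> prediction
     st = SpatialTransformer()
     transformed = st(torch.randn(1, 3, 48, 48))   # same size as the input, resampled
     ```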
  4. Grid generator: for each point k in the output plane, compute the corresponding coordinates in the input plane from its coordinates in the output plane.
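     A minimal numpy sketch of this grid-generation step, assuming the affine parameterization from the paper: the localization network predicts a 2x3 matrix theta, and for each output point k the source coordinates are theta applied to the homogeneous target coordinates (x_k, y_k, 1), with coordinates normalized to [-1, 1]. Function and variable names are illustrative.

     ```python
     import numpy as np

     def affine_grid(theta, out_h, out_w):
         """theta: 2x3 affine matrix predicted by the localization network."""
         # Normalized target coordinates in [-1, 1] for every point of the output plane.
         ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                              np.linspace(-1, 1, out_w),
                              indexing="ij")
         # Homogeneous target coordinates (x_t, y_t, 1), one column per output point.
         target = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])  # 3 x N
         source = theta @ target                                              # 2 x N
         # Source x/y coordinates in the input plane, one pair per output pixel.
         return source.reshape(2, out_h, out_w)

     # The identity matrix leaves the sampling grid unchanged; other values
     # translate, rotate, scale, or shear where the sampler reads from.
     identity = np.array([[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]])
     grid = affine_grid(identity, 48, 48)
     ```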
  5. Dataset

     • 43 classes
     • 39,209 training images
     • 12,630 test images
     • rescaled to 48x48 images

     [1] The German Traffic Sign Recognition Benchmark. http://benchmark.ini.rub.de/?section=gtsrb
  6. State of the art: 99.46%

     • Data augmentation
     • Jittering at training time
     • Averaging over 25 models
     • 90M weights

     [1] Multi-Column Deep Neural Network for Traffic Sign Classification, Dan Cireşan, Ueli Meier, Jonathan Masci and Jürgen Schmidhuber. http://people.idsia.ch/~juergen/nn2012traffic.pdf
  7. Spatial Transformer: 99.61%

     • No data augmentation
     • No jittering
     • Single network + 2 STs
     • 20M weights
  8. At training time

     • Localizes the sign
     • Zooms in on the sign
     • Removes the background

     Animation: Input vs. Output over a fraction of the first epoch (see the blog post linked above)
  9. At query time

     • Finds the interesting object
     • Removes geometric noise
     • Provides better input for the next layers

     Animation: Input vs. Output on a sequence of video frames (see the blog post linked above)
  10. Demo project: gtsrb.torch

     • Modules:
       • Data loader
       • Network builder
       • Trainer for classification
     • Tools:
       • ST visualization tool
       • Benchmarking utility
  11. ST becoming de-facto layer?

     • MNIST classification
     • SVHN classification
     • CUB-200 bird classification
     • MNIST addition
     • MNIST co-localization
     • GTSRB classification
     • Shoe fine-grained recognition