Moodstocks - Spatial Transformer Networks

albanD
September 30, 2015

An overview of DeepMind's Spatial Transformer Networks paper, with a traffic sign recognition use case, for the Paris Deep Learning Meetup #4.
For the animations in slides 17 and 18, please see the blog post in the first link below.





  1. The power of Spatial Transformer Networks
    Deep Learning Meetup #4, Moodstocks
    Cédric (@deltheil), Alban (@albanD)
  2. Spatial Transformer Networks [1]
    [1] Spatial Transformer Networks, Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu
  3. Benefits
    • Improve geometric invariance of CNNs
    • Localize the object of interest
    • End-to-end learning
    • No extra supervision
    • Same loss
  4. Classic pipeline: input image → CNN → Softmax → Prediction
    Spatial Transformer pipeline: input image → ST → transformed image → CNN → Softmax → Prediction
  5. Inside the ST layer (diagram labels: input image, Locnet, Grid Generator, Sampler, transformed image)
  6. Inside the ST layer (diagram labels: input image, Locnet, Grid Generator, Sampler, transformed image)
  7. Localization network: predict the transformation parameters
    input image → CNN → Regression Layer → transformation parameters
  8. Inside the ST layer (diagram labels: input image, Locnet, Grid Generator, Sampler, transformed image)
  9. Grid generator: compute the transformed coordinates
    For each point k in the output plane: coordinates in the output plane → coordinates in the input plane
  10. Inside the ST layer (diagram labels: input image, Locnet, Grid Generator, Sampler, transformed image)
  11. Sampler: create the transformed image
    The output is a weighted sum of the 4 closest input points
  12. Use case: traffic sign recognition

  13. Dataset [1]
    • 43 classes
    • 39,209 training images
    • 12,630 test images
    • rescaled to 48x48 images
    [1] The German Traffic Sign Recognition Benchmark
  14. State of the art: 99.46% [1]
    • Data augmentation
    • Jittering at training time
    • Averaging on 25 models
    • 90M weights
    [1] Multi-Column Deep Neural Network for Traffic Sign Classification, Dan Cireşan, Ueli Meier, Jonathan Masci and Jürgen Schmidhuber
  15. Spatial Transformer: 99.61%
    • No data augmentation
    • No jittering
    • Single network + 2 ST
    • 20M weights
  16. Interpretation

  17. At training time
    • Localizes the sign
    • Zooms in on the sign
    • Removes the background
    (animation labels: Input, Output, fraction of first epoch)
  18. At query time
    • Finds the interesting object
    • Removes geometric noise
    • Provides better input for next layers
    (animation labels: Input, Output, sequence of video frames)
  19. ST with Torch

  20. Vanilla training

  21. Vanilla training with ST

  22. SpatialTransformerLayer (diagram labels: Transpose, ConcatTable, Sampler, Locnet, Transfo Restriction, Grid Generator, Transpose)

  23. Demo project: gtsrb.torch
    • Modules:
      • Data loader
      • Network builder
      • Trainer for classification
    • Tools:
      • ST visualization tool
      • Benchmarking utility
  24. Useful links
    reddit:
    Questions? Comments? alban AT moodstocks DOT com | @albanD
  25. Extra materials

  26. Results comparison (diagram labels: FC, FC, conv, ST, conv, conv, ST)

  27. Transformation restriction: Rotation, Scale, Translation (figure labels: Input, Output, Gradient, Forward, Backward)

  28. ST becoming de-facto layer?
    MNIST classification, SVHN classification, CUB-200 bird classification, MNIST addition, MNIST co-localization, GTSRB classification, shoe fine-grained recognition
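
The grid generator (slide 9), the sampler (slide 11) and the transformation restriction (slide 27) can be illustrated with a short sketch. This is a minimal pure-Python illustration, not the Torch code shown in the talk. The function names (`restricted_theta`, `affine_grid`, `bilinear_sample`) are hypothetical, coordinates are assumed to be normalized to [-1, 1], points sampled outside the input are assumed to contribute zero, and the rotation/scale/translation parameterization is only one plausible way to restrict an affine transformation.

```python
import math


def restricted_theta(angle, scale, tx, ty):
    """Hypothetical helper: 2x3 affine matrix limited to rotation, scale, translation."""
    c, s = math.cos(angle), math.sin(angle)
    return [[scale * c, -scale * s, tx],
            [scale * s, scale * c, ty]]


def affine_grid(theta, out_h, out_w):
    """Grid generator: for each output point, compute its source (x, y) in the input plane."""
    def normalized(index, size):
        # Map a pixel index to [-1, 1]; a single-pixel axis maps to the center.
        return -1.0 + 2.0 * index / (size - 1) if size > 1 else 0.0

    grid = []
    for i in range(out_h):
        y_t = normalized(i, out_h)
        row = []
        for j in range(out_w):
            x_t = normalized(j, out_w)
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sampler: each output value is a weighted sum of the 4 closest input points."""
    h, w = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # Convert normalized source coordinates back to pixel coordinates.
            x = (x_s + 1.0) * (w - 1) / 2.0
            y = (y_s + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            value = 0.0
            for yy in (y0, y0 + 1):
                for xx in (x0, x0 + 1):
                    # Assumption: neighbors outside the image contribute zero.
                    if 0 <= xx < w and 0 <= yy < h:
                        weight = max(0.0, 1 - abs(x - xx)) * max(0.0, 1 - abs(y - yy))
                        value += weight * image[yy][xx]
            out_row.append(value)
        out.append(out_row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
identity = restricted_theta(0.0, 1.0, 0.0, 0.0)
zoom = restricted_theta(0.0, 0.5, 0.0, 0.0)
same = bilinear_sample(image, affine_grid(identity, 3, 3))
zoomed = bilinear_sample(image, affine_grid(zoom, 3, 3))
```

With the identity parameters the output reproduces the input. With a scale of 0.5 the grid covers only the central half of the input along each axis, which corresponds to the "zooms in on the sign" behavior described on slide 17. In a real ST layer the parameters would come from the localization network rather than being set by hand.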