Moodstocks - Spatial Transformer Networks

The power of Spatial Transformer Networks
Cédric (@deltheil) and Alban (@albanD), Moodstocks
Deep Learning Meetup #4

Spatial Transformer Networks [1]

[1] Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu. Spatial Transformer Networks. http://arxiv.org/abs/1506.02025v1

Benefits
• Improve the geometric invariance of CNNs
• Localize the object of interest
• End-to-end learning
• No extra supervision
• Same loss (no additional loss term)

Classic pipeline: input image → CNN → Softmax → Prediction
Spatial Transformer pipeline: input image → ST → transformed image → CNN → Softmax → Prediction

Inside the ST layer
Three components turn the input image into the transformed image: Locnet → Grid Generator → Sampler

Localization network: predict the transformation parameters
input image → CNN → Regression layer → transformation parameters
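A minimal pure-Python sketch of the last step of a localization network, assuming an affine transformation: a linear regression layer maps the CNN feature vector to the 6 entries of a 2x3 matrix A_θ. The class name `AffineRegressionLayer` is hypothetical, and the zero-weight / identity-bias initialization is a common choice (it makes the ST start as an identity transform), not necessarily the one used in the talk.

```python
from typing import List


class AffineRegressionLayer:
    """Hypothetical regression layer ending a localization network.

    Maps a feature vector (the CNN output) to the 6 parameters of a
    2x3 affine matrix A_theta.
    """

    def __init__(self, num_features: int) -> None:
        # Zero weights + identity bias: whatever the input, the layer
        # initially regresses the identity transform (the ST is a no-op).
        self.weights = [[0.0] * num_features for _ in range(6)]
        self.bias = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]

    def forward(self, features: List[float]) -> List[List[float]]:
        # theta_i = sum_j W[i][j] * f[j] + b[i]
        theta = [
            sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        # Reshape the 6 regressed values into the 2x3 matrix A_theta.
        return [theta[0:3], theta[3:6]]


layer = AffineRegressionLayer(num_features=4)
print(layer.forward([0.3, -1.2, 0.5, 2.0]))  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

During training, the loss gradient moves the weights away from zero, so the regressed A_θ starts to depend on the image content.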


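Grid generator: compute the transformed coordinates
For each point k in the output plane:
$$\begin{pmatrix} x_k^s \\ y_k^s \end{pmatrix} = A_\theta \begin{pmatrix} x_k^t \\ y_k^t \\ 1 \end{pmatrix}$$
where (x_k^t, y_k^t) are coordinates in the output plane and (x_k^s, y_k^s) are coordinates in the input plane.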
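The grid generator step can be sketched in a few lines of pure Python, assuming an affine transformation given as a 2x3 matrix A_θ (the notation of [1]) and coordinates normalized to [-1, 1] in both planes, as in the paper. The function name `affine_grid` is hypothetical; it returns, for every output point (x_t, y_t), the source point (x_s, y_s) where the sampler should read.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # 2x3 affine matrix A_theta


def affine_grid(theta: Matrix, height: int, width: int) -> List[List[Tuple[float, float]]]:
    """For each output point (x_t, y_t), return its source point (x_s, y_s).

    Both planes use normalized coordinates in [-1, 1] (an assumption of
    this sketch, following the convention of [1]).
    """

    def linspace(n: int) -> List[float]:
        # n evenly spaced values from -1 to 1 (a single point sits at 0).
        return [0.0] if n == 1 else [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

    grid = []
    for y_t in linspace(height):
        row = []
        for x_t in linspace(width):
            # (x_s, y_s)^T = A_theta (x_t, y_t, 1)^T
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


# Zoom in 2x on the center: output corners map to (+/-0.5, +/-0.5).
zoom = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
print(affine_grid(zoom, 2, 2))
# [[(-0.5, -0.5), (0.5, -0.5)], [(-0.5, 0.5), (0.5, 0.5)]]
```

Normalized coordinates keep A_θ independent of the image resolution, so the same parameters can drive an output grid of any size.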


Sampler: create the transformed image
Each output value is a weighted sum of the 4 closest input points (bilinear interpolation).
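A minimal sketch of the sampler for one output point, assuming a single-channel image stored as a list of rows and a source point already expressed in pixel coordinates. The name `bilinear_sample` is hypothetical. Each of the 4 neighbors is weighted by the bilinear kernel max(0, 1 - |x - m|) · max(0, 1 - |y - n|) from [1], and neighbors that fall outside the image contribute 0 (an assumption of this sketch).

```python
import math
from typing import List


def bilinear_sample(image: List[List[float]], x: float, y: float) -> float:
    """Value at (x, y), in pixel coordinates, as a weighted sum of the
    4 closest input pixels. Pixels outside the image count as 0."""
    height, width = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for n in (y0, y0 + 1):          # row index of a neighbor
        for m in (x0, x0 + 1):      # column index of a neighbor
            if 0 <= n < height and 0 <= m < width:
                # Bilinear kernel: weight decreases linearly with distance.
                weight = max(0.0, 1 - abs(x - m)) * max(0.0, 1 - abs(y - n))
                value += weight * image[n][m]
    return value


image = [[0.0, 10.0],
         [20.0, 30.0]]
print(bilinear_sample(image, 0.5, 0.5))  # 15.0
```

Because the weights are piecewise linear in x and y, the output is differentiable with respect to the sampling coordinates almost everywhere, which is what lets the loss gradient flow back through the grid generator to the localization network.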

Use case: traffic sign recognition

Dataset [1]
• 43 classes
• 39,209 training images
• 12,630 test images
• rescaled to 48x48 images

[1] The German Traffic Sign Recognition Benchmark. http://benchmark.ini.rub.de/?section=gtsrb

State of the art [1]: 99.46%
• Data augmentation
• Jittering at training time
• Averaging over 25 models
• 90M weights

[1] Dan Cireşan, Ueli Meier, Jonathan Masci, Jürgen Schmidhuber. Multi-Column Deep Neural Network for Traffic Sign Classification. http://people.idsia.ch/~juergen/nn2012traffic.pdf

Spatial Transformer: 99.61%
• No data augmentation
• No jittering
• Single network + 2 ST layers
• 20M weights

Interpretation

At training time (input vs. output over a fraction of the first epoch)
• Localizes the sign
• Zooms in on the sign
• Removes the background

At query time (input vs. output on a sequence of video frames)
• Finds the object of interest
• Removes geometric noise
• Provides better input for the next layers

ST with Torch

Vanilla training

Vanilla training with ST

SpatialTransformerLayer
• ConcatTable
  • branch 1: Transpose (input image)
  • branch 2: Locnet → Transfo Restriction → Grid Generator
• Sampler
• Transpose
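To make the data flow of the layer concrete, here is a framework-free Python sketch of its forward pass; it is not the Torch code from gtsrb.torch. The function name `spatial_transformer` is hypothetical, the matrix `theta` stands in for the (restricted) locnet output, and the sketch assumes a single-channel image, normalized [-1, 1] coordinates and zero values outside the image. The Transpose modules only change the tensor layout, so they have no equivalent here.

```python
import math
from typing import List

Image = List[List[float]]


def spatial_transformer(image: Image, theta: List[List[float]], out_h: int, out_w: int) -> Image:
    """Grid generator + sampler: warp `image` with the 2x3 matrix `theta`.

    `theta` stands in for the locnet output; coordinates are normalized
    to [-1, 1] and out-of-image samples read as 0.
    """
    in_h, in_w = len(image), len(image[0])

    def norm(i: int, n: int) -> float:
        # Index i of n evenly spaced points mapped to [-1, 1].
        return 0.0 if n == 1 else -1.0 + 2.0 * i / (n - 1)

    def sample(x: float, y: float) -> float:
        # Bilinear interpolation over the 4 closest input pixels.
        x0, y0 = math.floor(x), math.floor(y)
        total = 0.0
        for n in (y0, y0 + 1):
            for m in (x0, x0 + 1):
                if 0 <= n < in_h and 0 <= m < in_w:
                    w = max(0.0, 1 - abs(x - m)) * max(0.0, 1 - abs(y - n))
                    total += w * image[n][m]
        return total

    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            x_t, y_t = norm(j, out_w), norm(i, out_h)
            # Grid generator: output point -> normalized source point.
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            # Back to pixel coordinates of the input image, then sample.
            row.append(sample((x_s + 1) * (in_w - 1) / 2, (y_s + 1) * (in_h - 1) / 2))
        out.append(row)
    return out


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(spatial_transformer(image, identity, 3, 3) == image)  # True

# Reading one pixel to the right shifts the content left; the right column falls outside.
shift = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(spatial_transformer(image, shift, 3, 3))
# [[2.0, 3.0, 0.0], [5.0, 6.0, 0.0], [8.0, 9.0, 0.0]]
```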

Demo project: gtsrb.torch
• Modules:
  • Data loader
  • Network builder
  • Trainer for classification
• Tools:
  • ST visualization tool
  • Benchmarking utility

Useful links
• github.com/Moodstocks/gtsrb.torch
• torch.ch/blog/2015/09/07/spatial_transformers
• reddit: bit.ly/1GgOZgI

Questions? Comments? alban AT moodstocks DOT com | @albanD

Extra materials

Results comparison (figure: FC and conv networks, with and without ST)

Transformation restriction: rotation, scale, translation
(figure: input and output images; forward pass and backward pass of the gradient)
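One way to restrict the transformation is to let the locnet regress only an angle, a scale and a translation, then build A_θ from them. The sketch below shows a plausible forward and backward pass for such a restriction; the function names are hypothetical and the parameterization (isotropic scale, counter-clockwise rotation) is an assumption that may differ from the convention used in gtsrb.torch.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # 2x3 affine matrix A_theta


def restricted_affine(angle: float = 0.0, scale: float = 1.0,
                      tx: float = 0.0, ty: float = 0.0) -> Matrix:
    """Forward: build A_theta from rotation, isotropic scale and translation.

    Shear and anisotropic scaling cannot be expressed, which is the
    point of restricting the transformation.
    """
    c, s = math.cos(angle), math.sin(angle)
    return [[scale * c, -scale * s, tx],
            [scale * s, scale * c, ty]]


def restricted_affine_backward(grad_a: Matrix, angle: float,
                               scale: float) -> Tuple[float, float, float, float]:
    """Backward: map dL/dA_theta (2x3) to dL/d(angle, scale, tx, ty)."""
    c, s = math.cos(angle), math.sin(angle)
    # dA/dangle = [[-scale*s, -scale*c, 0], [scale*c, -scale*s, 0]]
    d_angle = (grad_a[0][0] * -scale * s + grad_a[0][1] * -scale * c
               + grad_a[1][0] * scale * c + grad_a[1][1] * -scale * s)
    # dA/dscale = [[c, -s, 0], [s, c, 0]]
    d_scale = (grad_a[0][0] * c - grad_a[0][1] * s
               + grad_a[1][0] * s + grad_a[1][1] * c)
    # The translations appear linearly in the third column.
    return d_angle, d_scale, grad_a[0][2], grad_a[1][2]


# A 30-degree rotation combined with a 2x zoom in (scale 0.5).
print(restricted_affine(math.pi / 6, 0.5))
```

The locnet then only has to regress 4 values instead of 6, and the backward pass simply chains the gradient of A_θ through these closed-form derivatives.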

ST becoming a de facto layer?
• MNIST classification
• SVHN classification
• CUB-200 bird classification
• MNIST addition
• MNIST co-localization
• GTSRB classification
• Shoe fine-grained recognition
