Slide 1


The evolution of CNN
Yohei KIKUTA
[email protected]
https://www.linkedin.com/in/yohei-kikuta-983b29117/
2017-06-29
Talk about all things Machine Learning

Slide 2


Deep Learning & CNN

Number of arXiv papers in Computer Science with "deep learning" ("CNN") in titles or abstracts (source: https://arxiv.org/ on 2017-06-26):

Year            2010  2011  2012  2013  2014  2015  2016   2017
deep learning      1     1     0    13    74   293   653    476
CNN                6     7    22    89   188   651  1,304  1,147

Slide 3


The evolution of CNN (*this is a very limited chronology)

1980 Neocognitron: http://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf
1998 LeNet-5: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
2012 AlexNet: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
2013 Network in Network: https://arxiv.org/abs/1312.4400
2014 VGG: https://arxiv.org/abs/1409.1556
2015 Inception (V3): https://arxiv.org/abs/1512.00567
2015 ResNet: https://arxiv.org/abs/1512.03385
2016 SqueezeNet: https://arxiv.org/abs/1602.07360
2016 ENet: https://arxiv.org/abs/1606.02147
2017 Deep Complex Networks: https://arxiv.org/abs/1705.09792

Slide 4


Neocognitron
source: www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf
• Prototype of the CNN
• Hierarchical structure
• S-cells (convolution): feature extraction
• C-cells (avg. pooling): robustness to positional deviation
• Self-organizing training, NOT backpropagation
(see the S-cell/C-cell sketch below)

Slide 5


LeNet-5
source: yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
• Convolution: feature extraction
• Subsampling (avg. pooling): positional invariance, size reduction
• Non-linearity: sigmoid, tanh
(figure: 2×2 average pooling applied to a 4×4 feature map)
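As a concrete example of the subsampling step, a small NumPy sketch of 2×2 average pooling with stride 2 on an arbitrary 4×4 input:

import numpy as np

x = np.array([[ 2., 8., -2., 3.],
              [ 1., 7.,  2., 1.],
              [-1., 2.,  9., 2.],
              [-2., 0.,  3., 3.]])

# each output cell is the mean of one non-overlapping 2x2 block
pooled = x.reshape(2, 2, 2, 2).mean(axis=(1, 3))
print(pooled)  # [[ 4.5   1.  ]
               #  [-0.25  4.25]]

Note that LeNet-5's subsampling is slightly richer than plain averaging: the pooled sum is multiplied by a trainable coefficient and shifted by a trainable bias before the non-linearity.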

Slide 6


AlexNet
source: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
• ReLU: keeps the gradient alive
• Max pooling: works better than average pooling
• Dropout: generalization
• GPU computation: accelerates training
(figure: 2×2 max pooling example on a 4×4 input; reproduced in the sketch below)
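Using the same 4×4 input as on the previous slide makes the contrast with average pooling clear; a minimal NumPy sketch of ReLU and 2×2 max pooling:

import numpy as np

x = np.array([[ 2., 8., -2., 3.],
              [ 1., 7.,  2., 1.],
              [-1., 2.,  9., 2.],
              [-2., 0.,  3., 3.]])

# ReLU clips negatives to zero but is the identity for positive inputs,
# so gradients pass through unattenuated (unlike saturating sigmoid/tanh)
relu = np.maximum(x, 0)

# max pooling keeps only the strongest response in each 2x2 block
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[8. 3.]
               #  [2. 9.]]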

Slide 7


Network In Network
source: https://arxiv.org/abs/1312.4400
• MLP after convolution: efficient non-linear combinations of feature maps
• Global Average Pooling: one feature map for each class, no Fully Connected layers
• Small model size: 29 MB for ImageNet
(figure: mlpconv blocks feeding Global Average Pooling and softmax; see the sketch below)
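A minimal PyTorch sketch of the idea (channel sizes are illustrative, not the paper's exact configuration): 1×1 convolutions act as a small MLP applied at every spatial position, and global average pooling reduces one feature map per class directly to a logit, removing the need for Fully Connected layers.

import torch
import torch.nn as nn

# mlpconv: a spatial convolution followed by 1x1 convolutions, i.e. a tiny MLP
# shared across all spatial positions of the feature maps
mlpconv = nn.Sequential(
    nn.Conv2d(3, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv2d(192, 160, kernel_size=1), nn.ReLU(),
    nn.Conv2d(160, 10, kernel_size=1), nn.ReLU(),  # one feature map per class (10 here)
)

x = torch.randn(1, 3, 32, 32)
maps = mlpconv(x)               # (1, 10, 32, 32)
logits = maps.mean(dim=(2, 3))  # Global Average Pooling instead of FC layers
probs = logits.softmax(dim=1)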

Slide 8


VGG
source: https://arxiv.org/abs/1409.1556
• Deep model with basic building blocks:
  convolution, max pooling, activation (ReLU), Fully Connected, softmax
• Sequences of small convolutions: 3×3 spatial convolutions (see the comparison below)
• Relatively many parameters: large channel counts at the early stages, many Fully Connected layers
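Why stacks of small convolutions? Two stacked 3×3 layers cover the same 5×5 receptive field as a single 5×5 layer, but with fewer weights and an extra non-linearity in between. A quick back-of-the-envelope check in Python (the channel count C is illustrative):

C = 256                             # illustrative number of input/output channels
params_5x5 = 5 * 5 * C * C          # one 5x5 layer: 1,638,400 weights
params_3x3 = 2 * (3 * 3 * C * C)    # two stacked 3x3 layers: 1,179,648 weights
print(params_3x3 / params_5x5)      # 0.72, i.e. a 28% saving plus one extra ReLU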

Slide 9


InceptionV3
source: https://arxiv.org/abs/1512.00567
• Inception module (sketched below):
  • parallel operations and concatenation: capture different features efficiently
  • mainly 3×3 convolutions: coming from the VGG architecture
  • 1×1 convolutions: reduce the number of channels
• Global Average Pooling and Fully Connected: balance accuracy and model size
• Good performance!
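A minimal PyTorch sketch of an Inception-style module (branch widths are illustrative; the paper's modules vary across the network): parallel branches are concatenated along the channel dimension, with 1×1 convolutions keeping the channel counts, and hence the cost, down.

import torch
import torch.nn as nn

class InceptionSketch(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)       # 1x1: cheap channel mixing
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, 48, kernel_size=1),            # 1x1 reduces channels first,
            nn.Conv2d(48, 64, kernel_size=3, padding=1),    # then 3x3 extracts features
        )
        self.b3 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # concatenate the parallel branches along the channel dimension
        return torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)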

Slide 10

Slide 10 text

ResNet
source: https://arxiv.org/abs/1512.03385
• Residual structure (sketched below):
  • shortcut (by-pass) connections: keep the gradient alive
  • 1×1 convolutions: reduce the number of channels
• Very deep models: 152 layers in total for ImageNet, with experiments beyond 1000 layers
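A minimal PyTorch sketch of a bottleneck residual block (batch normalization omitted for brevity; the actual blocks include it): the 1×1 convolutions shrink and restore the channel count around the 3×3 convolution, and the identity shortcut gives the gradient a direct path through the addition.

import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, ch, mid):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, mid, kernel_size=1), nn.ReLU(),             # squeeze channels
            nn.Conv2d(mid, mid, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(mid, ch, kernel_size=1),                        # restore channels
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # the shortcut adds the input unchanged, so gradients flow straight through
        return self.relu(x + self.body(x))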

Slide 11


SqueezeNet
source: https://arxiv.org/abs/1602.07360
• Fire module: squeeze channels to reduce computational costs
  (figure: 55×55×96 input → 1×1 squeeze → 55×55×16 → parallel 1×1 and 3×3 expand, 55×55×64 each → concatenated 55×55×128; see the sketch below)
• Deep compression: lightens the model size via sparse weights, weight quantization, and Huffman coding
• Small model: with 6-bit quantization, the model size is 0.47 MB!
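A minimal PyTorch sketch of the Fire module with the channel sizes from the figure (96 in, squeeze to 16, expand to 64 + 64 = 128):

import torch
import torch.nn as nn

class Fire(nn.Module):
    def __init__(self, in_ch=96, squeeze=16, expand=64):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze, kernel_size=1)   # channel bottleneck
        self.e1 = nn.Conv2d(squeeze, expand, kernel_size=1)
        self.e3 = nn.Conv2d(squeeze, expand, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):                # e.g. (N, 96, 55, 55)
        s = self.relu(self.squeeze(x))   # (N, 16, 55, 55): few channels feed the 3x3
        return torch.cat([self.relu(self.e1(s)),
                          self.relu(self.e3(s))], dim=1)  # (N, 128, 55, 55)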

Slide 12


ENet
source: https://arxiv.org/abs/1606.02147
• Real-time segmentation model (encoder-decoder, input 3 × 512 × 512):
  • downsampling at the early stages (sketched below)
  • asymmetric encoder-decoder structure
  • PReLU
  • small model: ~1 MB
• The encoder alone can be used as a classification CNN, with Global Max Pooling
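For the early downsampling, ENet's initial block runs a strided convolution and a max pooling in parallel and concatenates the results (3 + 13 = 16 output channels); a PyTorch sketch following the paper's description:

import torch
import torch.nn as nn

class ENetInitial(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 13, kernel_size=3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.prelu = nn.PReLU()

    def forward(self, x):                                   # (N, 3, 512, 512)
        # both branches halve the resolution; concatenation yields 16 channels
        y = torch.cat([self.conv(x), self.pool(x)], dim=1)  # (N, 16, 256, 256)
        return self.prelu(y)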

Slide 13


Deep Complex Networks
source: https://arxiv.org/abs/1705.09792
• Complex-valued structure:
  • convolution: with kernel W = A + iB and input h = x + iy,
    W ∗ h = (A ∗ x − B ∗ y) + i(B ∗ x + A ∗ y)
  • batch normalization
• Advantages of complex values:
  • biological & signal-processing aspects: can express firing rate & relative timing, giving a more detailed description of objects
  • parameter efficiency: 2^(depth) times more efficient than real-valued networks
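Since a complex convolution decomposes into four real ones, it is easy to build from standard layers; a minimal PyTorch sketch of the formula above:

import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        # A and B hold the real and imaginary parts of the complex kernel W = A + iB
        self.A = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.B = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)

    def forward(self, x, y):
        # input h = x + iy; W * h = (A*x - B*y) + i(B*x + A*y)
        real = self.A(x) - self.B(y)
        imag = self.B(x) + self.A(y)
        return real, imag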

Slide 14


Comparison by accuracy vs. operations (G-Ops)
source: https://arxiv.org/pdf/1605.07678.pdf
(figure: accuracy plotted against the number of operations for the architectures above)

Slide 15


Comparison by accuracy / M-Params
source: https://medium.com/towards-data-science/neural-network-architectures-156e5bad51ba
(figure: accuracy per million parameters for the architectures above)

Slide 16


References

Slide 17


[Review Papers]
• On the Origin of Deep Learning: https://arxiv.org/abs/1702.07800
• Recent Advances in Convolutional Neural Networks: https://arxiv.org/abs/1512.07108
• Understanding Convolutional Neural Networks: http://davidstutz.de/wordpress/wp-content/uploads/2014/07/seminar.pdf
• An Analysis of Deep Neural Network Models for Practical Applications: https://arxiv.org/abs/1605.07678

[Slides & Web Pages]
• Recent Progress on CNNs for Object Detection & Image Compression: https://berkeley-deep-learning.github.io/cs294-131-s17/slides/sukthankar-UC-Berkeley-InvitedTalk-2017-02.pdf
• CS231n: Convolutional Neural Networks for Visual Recognition: http://cs231n.github.io/convolutional-networks/

[Blog Posts]
• Training ENet on ImageNet: https://culurciello.github.io/tech/2016/06/20/training-enet.html
• Neural Network Architectures: https://medium.com/towards-data-science/neural-network-architectures-156e5bad51ba