Slide 1


Neural Networks

Slide 2


Outline
1. Why deep NNs are hard to train
2. Two techniques that address this:
   • Residual Network
   • Batch Normalization

Slide 3


Plain NNs
Pros: stacking layers gives strong models for many tasks (ex. CNN, RNN, ...)
Cons: the deeper the stack, the harder it becomes to train

Slide 4


RNN

A simple RNN [1]: the input sequence is fed in step by step, and the hidden state is updated as

\[ h_t = W_{rec}\,\sigma(h_{t-1}) + W_{in}\,x_t \]

x_t: input, h_t: hidden state, W_rec: recurrent weight matrix, W_in: input weight matrix.

[1] Pascanu, Razvan, et al. "On the difficulty of training recurrent neural networks." International Conference on Machine Learning, 2013.
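As a concrete illustration, a minimal NumPy sketch of this recurrence; the tanh nonlinearity and the toy sizes are assumptions for illustration, the slide only fixes the general form:

import numpy as np

def rnn_step(h_prev, x_t, W_rec, W_in):
    """One step of h_t = W_rec @ sigma(h_{t-1}) + W_in @ x_t (sigma = tanh assumed)."""
    return W_rec @ np.tanh(h_prev) + W_in @ x_t

hidden, inputs, T = 4, 3, 5   # toy dimensions, assumed
rng = np.random.default_rng(0)
W_rec = rng.normal(scale=0.5, size=(hidden, hidden))
W_in = rng.normal(scale=0.5, size=(hidden, inputs))

h = np.zeros(hidden)
for t in range(T):
    x_t = rng.normal(size=inputs)
    h = rnn_step(h, x_t, W_rec, W_in)
print(h)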

Slide 5


Backpropagation through an RNN unrolled for 3 steps, with per-step loss \(L_t = f(h_t)\):

\[ \frac{\partial L}{\partial \theta} = \frac{\partial L_1}{\partial \theta} + \frac{\partial L_2}{\partial \theta} + \frac{\partial L_3}{\partial \theta} \]

\[ \frac{\partial L_3}{\partial \theta} = \sum_{1 \le k \le 3} \frac{\partial L_3}{\partial h_3}\,\frac{\partial h_3}{\partial h_k}\,\frac{\partial^{+} h_k}{\partial \theta} \]

\[ \frac{\partial h_3}{\partial h_1} = \frac{\partial h_3}{\partial h_2}\,\frac{\partial h_2}{\partial h_1} = W_{rec}^{\top}\,\mathrm{diag}\big(\sigma'(h_2)\big) \cdot W_{rec}^{\top}\,\mathrm{diag}\big(\sigma'(h_1)\big) \]

Here \(\partial^{+} h_k / \partial \theta\) is the immediate partial derivative: \(h_1, \dots, h_{k-1}\) are held fixed and only the explicit dependence of \(h_k\) on \(\theta\) is differentiated.
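The factored Jacobian can be checked numerically; a small NumPy sketch, where the sizes, the tanh nonlinearity, and the transpose convention of [1] are assumptions:

import numpy as np

rng = np.random.default_rng(1)
n = 4
W_rec = rng.normal(scale=0.5, size=(n, n))
h1, h2 = rng.normal(size=n), rng.normal(size=n)

def dtanh(h):
    # Derivative of tanh, as an example sigma'.
    return 1.0 - np.tanh(h) ** 2

# One-step Jacobians dh3/dh2 and dh2/dh1, following the convention of [1].
J32 = W_rec.T @ np.diag(dtanh(h2))
J21 = W_rec.T @ np.diag(dtanh(h1))
J31 = J32 @ J21                  # dh3/dh1 as the product of one-step Jacobians
print(np.linalg.norm(J31, 2))   # spectral norm of the two-step Jacobian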

Slide 6


RNN Vanishing/Exploding Gradient

[Figure: the RNN unrolled over time, with the gradient of the loss flowing backward through W_rec at every step.]

Slide 7


Unrolling the recurrence, the gradient between distant steps is a product of one-step Jacobians:

\[ \frac{\partial h_t}{\partial h_k} = \prod_{k < i \le t} W_{rec}^{\top}\,\mathrm{diag}\big(\sigma'(h_{i-1})\big) \]

This product contains t − k copies of W_rec (times bounded derivative terms), so as t − k grows it tends to 0 when the largest singular value of W_rec is small, and blows up when it is large. An early input or hidden state then has either almost no influence or an outsized influence on the loss: the Vanishing/Exploding Gradient problem of RNNs.
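A quick numerical illustration of this behavior; the symmetric toy matrix and the target norms are assumptions, and the bounded \(\sigma'\) factors are dropped (which only shrinks the product further):

import numpy as np

rng = np.random.default_rng(2)
n = 16
for target, label in [(0.9, "vanishing"), (1.1, "exploding")]:
    A = rng.normal(size=(n, n))
    W = (A + A.T) / 2                    # symmetric, so spectral norm = spectral radius
    W *= target / np.linalg.norm(W, 2)   # set the largest singular value to `target`
    J = np.eye(n)
    for t in range(1, 41):
        J = W.T @ J                      # one more one-step Jacobian factor
        if t in (10, 20, 40):
            print(label, t, np.linalg.norm(J, 2))

With norm 0.9 the product decays like 0.9^t; with norm 1.1 it grows like 1.1^t.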

Slide 8


The problem is not specific to RNNs

In any deep NN, the gradient of the loss function can vanish or explode as it is propagated through many layers, so the loss function becomes hard to optimize as depth grows.

→ Residual Connection, Batch Normalization

Slide 9


Solution 1: Residual Connection

Instead of having a block of layers learn a mapping F(x) directly, let the block output F(x) + x. The added skip path is an Identity Mapping: the layers only have to learn the residual, and x (and its gradient) passes through the skip unchanged [1].

[1] He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer, Cham, 2016.
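A minimal sketch of a block with this skip, assuming a two-layer ReLU residual function F (the slide does not fix F's form):

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Output F(x) + x: the weights learn only the residual F."""
    Fx = W2 @ relu(W1 @ x)
    return Fx + x

rng = np.random.default_rng(3)
n = 8
W1 = rng.normal(scale=0.1, size=(n, n))
W2 = rng.normal(scale=0.1, size=(n, n))
x = rng.normal(size=n)
print(residual_block(x, W1, W2))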

Slide 10


Residual Connection – why it helps

Forward: the input is carried through each block unchanged along the skip path, so even deep layers still see the input signal.
Backward: gradients flow straight through the identity path to earlier layers without being attenuated (written out below).
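Written out, a standard derivation consistent with the slide (not shown on it explicitly): with \(y = x + F(x)\) and loss \(\mathcal{L}\),

\[ \frac{\partial \mathcal{L}}{\partial x} = \frac{\partial \mathcal{L}}{\partial y}\left(I + \frac{\partial F}{\partial x}\right) = \frac{\partial \mathcal{L}}{\partial y} + \frac{\partial \mathcal{L}}{\partial y}\,\frac{\partial F}{\partial x} \]

The first term reaches x through the identity path unattenuated, regardless of what the Jacobian \(\partial F / \partial x\) does to the second term.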

Slide 11


Residual Connection – results (Kaiming He's ICML 2016 tutorial)
https://icml.cc/2016/tutorials/icml2016_tutorial_deep_residual_networks_kaiminghe.pdf

Slide 12


ResNet and Batch Normalization

The ResNet Residual Block:
• In the implementation, Batch Normalization is inserted inside the block in addition to the skip connection (a sketch of this structure follows below)
• Batch Normalization is itself a general technique for training deep NNs

http://torch.ch/blog/2016/02/04/resnets.html

[Figure: Residual block vs. Plain block.]
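A sketch of such a block in PyTorch; the conv→BN→ReLU ordering and the channel count are assumptions following the common ResNet recipe, not necessarily the exact code at the link:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """conv -> BN -> ReLU -> conv -> BN, plus identity skip, then ReLU."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # skip connection

block = ResidualBlock(16)
print(block(torch.randn(1, 16, 8, 8)).shape)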

Slide 13


Batch Normalization – Revisit Gaussian –

[Figure: the distribution of activations at layers 1, 2, ..., n.]

Slide 14


Batch Normalization – Input Data distribution –

Training converges faster when the input distribution is normalized, which is why inputs to an NN are usually standardized. Inside the network, each layer's input is the previous layer's output, so the same argument applies layer by layer.

Slide 15


Batch Normalization – normalizing a distribution –

Standardize the input x so that it has zero mean and unit variance:

\[ x \leftarrow \frac{x - \mu}{\sigma}, \qquad x \sim \mathcal{N}(\mu, \sigma^2) \]
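In code, per feature over a mini-batch; the epsilon for numerical stability is an addition the slide's formula omits:

import numpy as np

def standardize(x, eps=1e-5):
    """x <- (x - mu) / sigma, per feature over the batch axis."""
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    return (x - mu) / (sigma + eps)

batch = np.random.default_rng(4).normal(loc=3.0, scale=2.0, size=(32, 5))
z = standardize(batch)
print(z.mean(axis=0).round(6))  # ~0
print(z.std(axis=0).round(6))   # ~1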

Slide 16


Batch Normalization – effect on the data distribution

• The distribution of each layer's input is kept fixed: its (μ, σ) no longer drift during training
• Maintaining this normalization throughout the network is exactly what Batch Normalization does, computed over each mini-batch

Slide 17


Batch Normalization – the algorithm [2]

Normalize each activation with the mini-batch mean and variance, then scale and shift with learned parameters γ, β:

\[ \hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}, \qquad y = \gamma\,\hat{x} + \beta \]

The learned γ and β restore the representational power that plain normalization alone would remove.

[2] Ioffe, Sergey, and Christian Szegedy. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." (2015).
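A direct NumPy transcription of the training-time transform above; variable names follow [2]:

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-time BN: normalize with mini-batch statistics, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(5)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))
gamma, beta = np.ones(4), np.zeros(4)   # learned parameters, initialized trivially
y = batch_norm(x, gamma, beta)
print(y.mean(axis=0).round(6), y.var(axis=0).round(6))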

Slide 18


Summary

• Deep NNs are hard to train: gradients vanish or explode as they pass through many layers
• Residual Connection: an Identity skip path lets signal and gradient flow through unchanged
• Batch Normalization: normalize each layer's input, then scale and shift with learned parameters
• With these two techniques it becomes practical to implement and train very deep networks