handout of RNN camp #2

RNN camp #2 浅川伸一 Shin Asakawa <[email protected]>

Acknowledgments: 佐藤傑 (KUNO), 新村拓也 (C8 lab), 佐藤一憲 (Google)

Today's menu • Review of the previous session (BPTT) • LSTM • GRU • BiRNN • Black magic demystified • Summary and outlook for next time

Note • TensorFlow 0.10 moved the recurrent network operations from tf.models.rnn into the tf.nn package, where they now live alongside the other neural network operations. Cells can now be found in tf.nn.rnn_cell, so use the files under tf.nn.rnn_cell.

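References – 「リカレントニューラルネットワーク」 ("Recurrent Neural Networks") – 「リカレントニューラルネットワークによる文法学習」 ("Grammar Learning with Recurrent Neural Networks") – 人工知能事典 (Encyclopedia of Artificial Intelligence), Kyoritsu Shuppan (in press)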

(Somewhat dated) links
• Olah's blog. For a while in 2015, people all over the world were discussing the explanation of how RNNs work given there. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• WildML's Python/Theano tutorial is also well regarded. http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
• Cho's article, posted on the NVIDIA blog. As the originator of the GRU and a leading figure in NLP and NMT, Cho is persuasive. https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-with-gpus/
• Danijar Hafner's tutorial, which focuses on TensorFlow. https://danijar.com/introduction-to-recurrent-networks-in-tensorflow/
• awesome RNN on GitHub, maintained by Jiwon Kim and Myungsub Choi. Truly awesome! https://github.com/kjw0612/awesome-rnn
• Karpathy's blog

Simple recurrent network (SRN): Mikolov (2010), Fig. 1

Simple recurrent network (SRN): Bodén (2001), Fig. 5

• Convolution and pooling improved performance: LeNet5, LeCun (1998)
Definition of convolution (from Wikipedia). In practice the operation runs over pixels, so it is a sum rather than an integral.
Meanwhile, the hidden-layer state of an RNN is [equation not preserved in this transcript]. They look alike...
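The formulas on these slides did not survive extraction. As a sketch of the comparison being made, the standard definitions are written out below. The kernel symbols f and g and the activation \phi are assumptions; W_x and W_h are the weight matrices introduced later in this handout, with biases omitted.

```latex
% Continuous convolution (the integral form)
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau

% Discrete convolution (the sum that is actually computed over pixels)
(f * g)[n] = \sum_{m} f[m]\, g[n - m]

% Hidden state of a simple recurrent network
h_t = \phi\left( W_x x_t + W_h h_{t-1} \right)
```

One way to read the resemblance: a convolution reuses the same kernel at every position, and an RNN reuses the same weights at every time step.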

Notation and basic graph: h y x. y: output-layer neurons; h: hidden-layer neurons; x: input-layer neurons

h y x Recurrent connections

h y x Wx Wy Wh. Wy: weight matrix (hidden to output); Wh: weight matrix (recurrent connections); Wx: weight matrix (input to hidden)

h y x Wx+bx Wy+by Wh+bh. by: bias (hidden to output); bh: bias (recurrent connections); bx: bias (input to hidden). Bias terms are omitted from here on.

h0 y0 x0 h1 y1 x1 Subscripts denote time, t = 0, 1, ... Some authors put the time in parentheses instead (e.g. x(t)).
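The notation above can be made concrete with a minimal NumPy sketch of one forward pass. The function name `srn_forward`, the tanh activation, and the linear read-out are assumptions made for illustration; the slides only fix the roles of Wx, Wh, and Wy, and biases are omitted as stated.

```python
import numpy as np

def srn_forward(xs, Wx, Wh, Wy, h0=None):
    """Run a simple recurrent network over a sequence.

    xs: (T, n_in); Wx: (n_hid, n_in); Wh: (n_hid, n_hid); Wy: (n_out, n_hid).
    Bias terms are omitted.
    """
    h = np.zeros(Wh.shape[0]) if h0 is None else h0
    hs, ys = [], []
    for x in xs:                      # the subscript t = 0, 1, ...
        h = np.tanh(Wx @ x + Wh @ h)  # hidden state (tanh is an assumed choice)
        ys.append(Wy @ h)             # linear read-out (also an assumption)
        hs.append(h)
    return np.array(hs), np.array(ys)

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 3, 4, 2, 5
Wx = rng.normal(scale=0.5, size=(n_hid, n_in))
Wh = rng.normal(scale=0.5, size=(n_hid, n_hid))
Wy = rng.normal(scale=0.5, size=(n_out, n_hid))
xs = rng.normal(size=(T, n_in))
hs, ys = srn_forward(xs, Wx, Wh, Wy)
print(hs.shape, ys.shape)  # (5, 4) (5, 2)
```

The same three matrices are used at every time step; only h changes as the sequence is read.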

Digression: we consider only discrete time and synchronous updating. https://deepmind.com/blog#decoupled-neural-interfaces-using-synthetic-gradients Sorry, but we won't cover decoupled neural interfaces: Decoupled Neural Interfaces using Synthetic Gradients (arXiv:1608.05343v1 [cs.LG])

Why the digression?

Cartesian Theater [Figure: the Cartesian theater, from Wikipedia] Consciousness Explained, D. Dennett (1991); Japanese edition 「解明される意識」 (1997), translated by Yamaguchi. An essential resolution of the homunculus problem.

End of digression

[Figure: the network unrolled in time, with nodes h0..h5, y0..y5, x0..x5] https://github.com/ShinAsakawa/rnncamp2

[Figure: the unrolled network (h0..h5, y0..y5, x0..x5) with the teacher signal, the error Loss(t, y), and the recurrent weights Wh repeated between time steps]

Full BPTT [Figure: the unrolled network h0..h5 with the teacher signal, the error Loss(t, y), and Wh at every step]

Truncated BPTT (window width = 5) [Figure: the unrolled network from ht to ht+5 with the teacher signal, the error Loss(t, y), and Wh at every step]
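The difference between full and truncated BPTT can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the author's code: the loss is a squared error on the last output only, the activation is tanh, biases are omitted, and the names `forward`, `loss`, `bptt_grad_Wh`, and `window` are hypothetical.

```python
import numpy as np

def forward(xs, Wx, Wh, h0):
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(Wx @ x + Wh @ hs[-1]))
    return hs  # hs[t + 1] is the state after reading xs[t]

def loss(xs, target, Wx, Wh, Wy, h0):
    h_last = forward(xs, Wx, Wh, h0)[-1]
    return 0.5 * np.sum((Wy @ h_last - target) ** 2)

def bptt_grad_Wh(xs, target, Wx, Wh, Wy, h0, window=None):
    """Gradient of the final-step loss with respect to Wh.

    window=None walks back through every step (full BPTT);
    window=k stops after k steps (truncated BPTT).
    """
    hs = forward(xs, Wx, Wh, h0)
    T = len(xs)
    steps = T if window is None else min(window, T)
    dh = Wy.T @ (Wy @ hs[-1] - target)   # dLoss/dh at the last step
    grad = np.zeros_like(Wh)
    for t in range(T, T - steps, -1):
        da = dh * (1.0 - hs[t] ** 2)     # back through tanh
        grad += np.outer(da, hs[t - 1])  # contribution of step t to Wh
        dh = Wh.T @ da                   # pass the error one step back
    return grad

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 2, 3, 2, 8
Wx = rng.normal(scale=0.5, size=(n_hid, n_in))
Wh = rng.normal(scale=0.5, size=(n_hid, n_hid))
Wy = rng.normal(scale=0.5, size=(n_out, n_hid))
xs = rng.normal(size=(T, n_in))
target = rng.normal(size=n_out)
h0 = np.zeros(n_hid)

full = bptt_grad_Wh(xs, target, Wx, Wh, Wy, h0)
trunc = bptt_grad_Wh(xs, target, Wx, Wh, Wy, h0, window=5)
print(np.abs(full - trunc).max())  # size of what truncation leaves out
```

Because Wh is shared across time, every step adds its own term to the same gradient; truncation simply drops the terms older than the window.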

Can we improve on this?

Introducing gates to control the hidden state [Figure: steps t-1 and t (h, y, x) with a gate] But why gates?

Introducing the forget gate [Figure: steps t-1 and t (h, y, x) with a gate] Who can control the gate, and how? ("Who can tell me how can I control myself?")

Introducing the forget gate [Figure: steps t and t+1 (h, y, x) with a gate] Who can control the gate, and how? Three candidates: 1. ht ("It's me") 2. yt ("Me, too") 3. xt+1 ("I can, too")

Gate control by 1. ht 2. yt 3. xt+1: $h_{t+1} = h_t \, \sigma(x)$ • $\sigma(x) = (1 + e^{-x})^{-1}$ • $x = W_f (y_t + h_t + x_{t+1})$
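The gate equation above can be tried out directly. The sketch below follows the slide literally, which requires yt, ht, and xt+1 to share one dimension so they can be summed; that shared size, the function name `forget_gate_step`, and the toy values are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forget_gate_step(h_t, y_t, x_next, Wf):
    """h_{t+1} = h_t * sigmoid(Wf (y_t + h_t + x_{t+1})), as on the slide."""
    gate = sigmoid(Wf @ (y_t + h_t + x_next))  # every entry lies in (0, 1)
    return h_t * gate, gate

n = 3
Wf = np.eye(n)
h_t = np.array([1.0, -2.0, 0.5])
y_t = np.zeros(n)

# Large positive drive: the gate is nearly 1 and the state is kept.
kept, g_open = forget_gate_step(h_t, y_t, np.full(n, 10.0), Wf)
# Large negative drive: the gate is nearly 0 and the state is forgotten.
forgotten, g_closed = forget_gate_step(h_t, y_t, np.full(n, -10.0), Wf)
print(kept, forgotten)
```

Since the gate is a sigmoid, it scales each component of the state by a factor between 0 and 1 rather than switching it fully on or off.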

Gates make it possible to resolve the long-term dependency (LTD) problem

Can we improve more?

Introducing the input gate [Figure: steps t and t+1 (h, y, x) with two gates] $h_{t+1} = h_t \, \sigma(w(h_t + x_{t+1}))$ • $\sigma(x) = (1 + e^{-x})^{-1}$ • $x = y_t + h_t + x_{t+1}$

Even more? You need more?

Introducing the output gate [Figure: steps t and t+1 (h, y, x) with three gates] $h_{t+1} = h_t \, \sigma(w(h_t + x_{t+1} + y_{t+1}))$ • $\sigma(x) = (1 + e^{-x})^{-1}$ • $x = y_t + h_t + x_{t+1}$

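LSTM [Figure: LSTM block. Labels: forget gate f, input gate i, block input g, cell c with a self-connection of weight 1.0, output gate o, peepholes, block output y, activations g and h. The input and the recurrent input feed each gate and the block input; the output returns as recurrent input.]
• Input gate $i = \sigma(\cdot)$ • Forget gate $f = \sigma(\cdot)$ • Output gate $o = \sigma(\cdot)$ • Total input $g = \phi(\cdot)$ • Cell $c = f \odot c + i \odot g$ • Output $y = o \odot \phi(\cdot)$ (the arguments are not preserved in this transcript) https://github.com/ShinAsakawa/rnncamp2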
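To make the cell update concrete, here is a minimal NumPy sketch of one LSTM step. The arguments of the gates are missing from the slide, so the sketch assumes the common form in which each gate and the block input see the current input and the previous output; peepholes and biases are left out, tanh stands in for the activation, and the output is taken as o times tanh(c). The function name `lstm_step` and the dictionary layout of the weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, y_prev, c_prev, W, U):
    """One LSTM step without peepholes or biases.

    W and U are dicts keyed by 'i', 'f', 'o', 'g' holding the input and
    recurrent weight matrices (a layout chosen for this sketch).
    """
    i = sigmoid(W['i'] @ x + U['i'] @ y_prev)  # input gate
    f = sigmoid(W['f'] @ x + U['f'] @ y_prev)  # forget gate
    o = sigmoid(W['o'] @ x + U['o'] @ y_prev)  # output gate
    g = np.tanh(W['g'] @ x + U['g'] @ y_prev)  # block input
    c = f * c_prev + i * g                     # cell: keep old value, add new
    y = o * np.tanh(c)                         # block output
    return y, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(scale=0.5, size=(n_hid, n_in)) for k in 'ifog'}
U = {k: rng.normal(scale=0.5, size=(n_hid, n_hid)) for k in 'ifog'}
y, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    y, c = lstm_step(x, y, c, W, U)
print(y.shape, c.shape)  # (4,) (4,)
```

The cell line is the key design point: when f is near 1 and i is near 0, c is copied forward unchanged, which is what lets the error signal survive many time steps.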


Physiological counterpart of the LSTM http://kybele.psych.cornell.edu/~edelman/Psych-2140/week-2-2.html

How does LSTM work? 1. LSTM replaces logistic or tanh hidden units with "memory cells" that can store an analog value. 2. Each memory cell has its own input and output gates that control when inputs may be added to the stored value and when that value may influence the output. 3. There is a forget gate that controls the rate at which the analog value stored in the memory cell decays. 4. For periods when the input and output gates are off and the forget gate is not causing decay, a memory cell simply holds its value over time. Le, Jaitly, & Hinton (2015)

Another model: GRU (an alternative to the LSTM) [Figure: GRU unit with input x, output y, states h and h̃, reset gate r, and update gate u]
$u_t = \sigma(W_u x_t + U_u h_{t-1})$,
$h_t = \phi(W x_t + U_h (u_t \odot h_{t-1}))$,
$r_t = \sigma(W_r x_t + U_r h_{t-1})$,
$\tilde{h}_t = (1 - r_t)\, h_t + r_t\, \tilde{h}_{t-1}$,
$y_t = W_y \tilde{h}_t$
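The GRU equations above translate into a short NumPy sketch. Two assumptions are made here: the state fed to the gates is taken to be the carried state (the tilde-h of the previous step), and tanh stands in for the activation. The symbols u and r follow the slide's equations as written; in the naming of Cho et al. (2014), the gate applied inside the candidate is usually called the reset gate and the interpolating gate the update gate. The function name `gru_step` and the parameter dictionary are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, s_prev, p):
    """One GRU step in the slide's notation; s_prev is the carried state."""
    u = sigmoid(p['Wu'] @ x + p['Uu'] @ s_prev)       # u_t
    h = np.tanh(p['W'] @ x + p['Uh'] @ (u * s_prev))  # candidate h_t
    r = sigmoid(p['Wr'] @ x + p['Ur'] @ s_prev)       # r_t
    s = (1.0 - r) * h + r * s_prev                    # tilde h_t
    y = p['Wy'] @ s                                   # y_t
    return y, s

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 4, 2
shapes = {'Wu': (n_hid, n_in), 'Uu': (n_hid, n_hid),
          'W': (n_hid, n_in), 'Uh': (n_hid, n_hid),
          'Wr': (n_hid, n_in), 'Ur': (n_hid, n_hid),
          'Wy': (n_out, n_hid)}
p = {name: rng.normal(scale=0.5, size=shape) for name, shape in shapes.items()}
s = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    y, s = gru_step(x, s, p)
print(y.shape, s.shape)  # (2,) (4,)
```

Compared with the LSTM there is no separate cell: the new state is a per-component blend of the candidate and the previous state, so two gates suffice.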

Bidirectional RNN (BiRNN) [Figure: forward states and backward states between the inputs xt-1, xt, xt+1 and the outputs yt-1, yt, yt+1]
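A bidirectional RNN can be sketched as two simple RNNs, one reading the sequence forward and one reading it backward, whose states are combined at each time step. In the NumPy sketch below, combining by concatenation, the tanh activation, the linear read-out, and the names `run_rnn` and `birnn_forward` are assumptions for illustration.

```python
import numpy as np

def run_rnn(xs, Wx, Wh):
    h = np.zeros(Wh.shape[0])
    out = []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)
        out.append(h)
    return np.array(out)

def birnn_forward(xs, fwd, bwd, Wy):
    """fwd and bwd are (Wx, Wh) pairs; Wy reads out the concatenated states."""
    h_f = run_rnn(xs, *fwd)                 # forward states: past to future
    h_b = run_rnn(xs[::-1], *bwd)[::-1]     # backward states, re-aligned in time
    h = np.concatenate([h_f, h_b], axis=1)  # shape (T, 2 * n_hid)
    return h @ Wy.T

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 3, 4, 2, 6
fwd = (rng.normal(scale=0.5, size=(n_hid, n_in)),
       rng.normal(scale=0.5, size=(n_hid, n_hid)))
bwd = (rng.normal(scale=0.5, size=(n_hid, n_in)),
       rng.normal(scale=0.5, size=(n_hid, n_hid)))
Wy = rng.normal(scale=0.5, size=(n_out, 2 * n_hid))
xs = rng.normal(size=(T, n_in))
ys = birnn_forward(xs, fwd, bwd, Wy)
print(ys.shape)  # (6, 2)
```

Each output therefore depends on the whole sequence: the forward state summarizes everything up to t, and the backward state everything from t onward.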

Generative LSTM of Graves (2013) [Figure labels: output, hidden layers, input]

Deep LSTM (Depth Gated LSTM) [Figure 4.31: LSTM variants, each drawn over the nodes ht-1, ht, zt, xt: (a) previous step, (b) generation, (c) recurrence, (d) inference, (e) full involvement]

From Pascanu (2014) [Figure 4.27: adapted from Figure 2 of Pascanu et al. (ref. 108); panels (a) to (e) drawn over the nodes y(t), h(t), h(t-1), x(t), and z(t)]

From Pascanu (2014) [...] cells are connected to one another and arranged in two-dimensional and three-dimensional grids, as proposed there. Figure 4.33 shows the clockwork LSTM of J. Koutník (ref. 116). [Figure 4.32: Grid LSTM. Panels: standard LSTM block, 1-D grid LSTM block, 2-D grid LSTM block, 3-D grid LSTM block]

From Pascanu (2014) [Figure 4.33: Clockwork LSTM. Labels: output layer, input layer, hidden layer, T1, T2, Tg]
