
# Deep Learning Book Chapter 10, Part 1 / deep learning book 10 vol1

Slides on Chapter 10 of the Deep Learning book (Sequence Modeling: Recurrent and Recursive Nets).

## himkt

December 05, 2017

## Transcript

2. ### What are recurrent neural networks (RNNs)?
• Neural networks that can handle sequence data
• They scale much more gracefully with sequence length than ordinary neural networks
• RNNs can handle variable-length sequence data (!!)
3. ### • If the model had separate parameters at each time step:
• it could not handle sequence lengths that never appear in the training data
• it could not share information across sequences of different lengths
• When is parameter sharing effective?
• When a specific piece of information can occur at different positions in the sequence

• e.g. “I went to Nepal in 2009” and “In 2009, I went to Nepal”: the year 2009 should be recognized whichever position it appears in

RNNs and parameter sharing

[Figure: folded and unfolded state-transition graphs s(0) → s(1) → … → s(5)]
5. ### 10.1 Unfolding computational graphs
[Figure: an RNN cell (left) and its unfolding over time steps t−1, t, t+1 (right)]
• Consider an RNN without outputs, as in the figure
• Unfolding: the reverse of what we just did
• Expand the folded cell out, one copy per time step, and…
6. ### 10.1 Unfolding computational graphs
[Figure: the same unfolded RNN over time steps t−1, t, t+1]
Definition:
$$h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta)$$
Unfolding at t = 3:
$$h^{(3)} = f(h^{(2)}, x^{(3)}; \theta) = f(f(h^{(1)}, x^{(2)}; \theta), x^{(3)}; \theta)$$
(the parameters θ are shared across the applications of f)
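To make the unfolding concrete, here is a minimal NumPy sketch (function and variable names are mine, not from the slides): the same f, with the same shared parameters θ = (W, U, b), is applied at every time step.

```python
import numpy as np

def f(h_prev, x_t, theta):
    """One RNN cell step: the same function and parameters at every time step."""
    W, U, b = theta
    return np.tanh(b + W @ h_prev + U @ x_t)

def unfold(h0, xs, theta):
    """Unfold the recurrence h(t) = f(h(t-1), x(t); theta) over a whole sequence."""
    h, states = h0, []
    for x_t in xs:              # one application of f per time step
        h = f(h, x_t, theta)
        states.append(h)
    return states               # h(1), ..., h(T)
```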
7. ### 10.1 Representing the unfolded RNN
• The RNN cell at time t can be written with a function $g^{(t)}$ as below
• It can be decomposed into a single function $f$ that does not depend on the time step and is applied at every step, giving the form on the right
• $f$ is a function with shared parameters that accepts variable-length input
$$h^{(t)} = g^{(t)}(x^{(t)}, x^{(t-1)}, \ldots, x^{(2)}, x^{(1)}) = f(h^{(t-1)}, x^{(t)}; \theta)$$
8. ### 10.2 Recurrent Neural Networks
• Given a recursive definition and parameter sharing, a wide variety of RNNs can be designed

[Figure: computational graphs of several RNN variants and their unfolded forms (hidden-to-hidden recurrence; output-to-hidden recurrence at train/test time; a single output at the final step). http://www.deeplearningbook.org/lecture_slides.html]
9. ### The standard RNN
"Armed with the graph unrolling and parameter sharing ideas of section 10.1, we can design a wide variety of recurrent neural networks."

[Figure 10.3: the unfolded computational graph of an RNN with hidden-to-hidden recurrence. http://www.deeplearningbook.org/lecture_slides.html]
10. ### An RNN whose previous output feeds the hidden layer
[Figure: an RNN with recurrence from the output o(t−1) into the hidden layer h(t), and its unfolded form. http://www.deeplearningbook.org/lecture_slides.html]
11. ### An RNN that emits output only at the final time step
[Figure: an unfolded RNN that reads the whole sequence x(1), …, x(τ) and produces a single output o(τ) and loss L(τ) at the end. http://www.deeplearningbook.org/lecture_slides.html]
12. ### 10.2 How do we train it?
• How are the gradients computed?
• Back-Propagation Through Time (BPTT): linear in the sequence length
• Run the forward computation => then compute the errors backward from the end
http://www.phontron.com/slides/nlp-programming-ja-08-rnn.pdf
13. ### 10.2.1 Teacher forcing and recurrence through the output
[Figure: output recurrence at train time vs. test time: during training the ground-truth y(t) is fed to the next step; at test time the model's own output o(t) is fed back. http://www.deeplearningbook.org/lecture_slides.html]
14. ### 10.2.1 Teacher forcing and recurrence through the output
• Because the dependence on past hidden layers is cut, training can be parallelized
• However, only the (class-label) output is passed forward, so the model loses a lot of expressive power
• Wait, there is no teacher at inference time, is there??
• At training time: the ground-truth targets
• At inference time (open loop): the model's own output
• Teacher forcing often gives unstable performance at inference time
• This is reportedly mitigated by mixing in free-running training (something n-best-like?); see the sketch below
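A hedged sketch of the two regimes for a cell with output recurrence (a toy model with illustrative names, not the slides' code): at train time the ground truth is fed back; in the open loop the model consumes a sample of its own output.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(h, x, params):
    """Hypothetical RNN cell with output recurrence (names are illustrative)."""
    W, U, V, b, c = params
    h = np.tanh(b + W @ h + U @ x)
    o = c + V @ h
    p = np.exp(o - o.max())
    return h, p / p.sum()

def run(h, params, n_steps, ys_true=None):
    """Teacher forcing if ys_true is given, otherwise the open loop (test time)."""
    x, outputs = np.zeros(len(params[4])), []
    for t in range(n_steps):
        h, p = step(h, x, params)
        outputs.append(p)
        if ys_true is not None:
            x = ys_true[t]                               # feed the ground truth
        else:
            x = np.eye(len(p))[rng.choice(len(p), p=p)]  # feed the model's own sample
    return outputs
```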
15. ### 10.2.2 Computing gradients in an RNN
• Obtain the gradients with BPTT
• Differentiate at each node of the graph => then differentiate with respect to the parameters

Forward pass (see the sketch below):
$$a^{(t)} = b + W h^{(t-1)} + U x^{(t)}, \qquad h^{(t)} = \tanh(a^{(t)})$$
$$o^{(t)} = c + V h^{(t)}, \qquad \hat{y}^{(t)} = \mathrm{softmax}(o^{(t)})$$

[Figure 10.3: "The computational graph to compute the training loss of a recurrent network that maps an input sequence of x values to a corresponding sequence of output o values." http://www.deeplearningbook.org/lecture_slides.html]
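A minimal NumPy sketch of this forward pass, following the slide's four equations (variable names are mine; the max-subtraction in the softmax is only the usual numerical stabilization):

```python
import numpy as np

def forward(xs, h0, W, U, V, b, c):
    """Forward pass of the slide's RNN:
    a(t) = b + W h(t-1) + U x(t),  h(t) = tanh(a(t)),
    o(t) = c + V h(t),             yhat(t) = softmax(o(t))."""
    h, hs, yhats = h0, [], []
    for x in xs:
        a = b + W @ h + U @ x
        h = np.tanh(a)
        o = c + V @ h
        e = np.exp(o - o.max())        # stabilized softmax
        hs.append(h)
        yhats.append(e / e.sum())
    return hs, yhats
```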
16. ### ϩεΛग़ྗ૚Ͱඍ෼  with the graph unrolling and parameter sharing ideas

of section 10.1, we gn a wide variety of recurrent neural networks. U U V V W W o (t 1) o (t 1) h h o o y y L L x x o (t) o (t) o (t+1) o (t+1) L(t 1) L(t 1) L(t) L(t) L (t+1) L (t+1) y (t 1) y (t 1) y (t) y (t) y (t+1) y (t+1) h (t 1) h (t 1) h (t) h (t) h (t+1) h (t+1) x(t 1) x(t 1) x(t) x(t) x(t+1) x(t+1) W W W W W W W W h (... ) h (... ) h (... ) h (... ) V V V V V V U U U U U U Unfold http://www.deeplearningbook.org/lecture_slides.html
17. ### ϩεΛग़ྗ૚Ͱඍ෼  ΫϥεͰల։ L = X t L(t) ) @L

@L(t) = 1 where L(t) = X c y(t) c log ˆ y(t) c @L @o(t) i = @L @L(t) · @L(t) @o(t) i = @L @L(t) · X c @L(t) @ˆ y(t) c · @ˆ y(t) c @o(t) i
18. ### ϩεΛग़ྗ૚Ͱඍ෼  @exp (o(t) i ) @o(t) k = (

exp (o(t) i ) (i = k) 0 (otherwise) ˆ y(t) i = softmax(o(t))i = exp (o(t) i ) P c exp (o(t) c ) X c @L(t) @ˆ y(t) c · @ˆ y(t) c @o(t) i = @L(t) @ˆ y(t) i · @ˆ y(t) i @o(t) i + X c6=i @L(t) @ˆ y(t) c · @ˆ y(t) c @o(t) i ͳͷͰɼiʹ͍ͭͯ৔߹෼͚͢Δͱ… ͜͜Ͱɼ JD J㱠D
19. ### ϩεΛग़ྗ૚Ͱඍ෼  @L(t) @ˆ y(t) i · @ˆ y(t) i

@o(t) i = y(t) i ˆ y(t) i · exp (o(t) i ) P c exp (o(t) c ) exp (o(t) i ) exp (o(t) i ) P c exp (o(t) c )2 = y(t) i ˆ y(t) i · exp (o(t) i )( P c exp (o(t) c ) exp (o(t) i )) P c exp (o(t) c )2 = y(t) i ˆ y(t) i · ˆ y(t) i (1 ˆ y(t) i ) = y(t) i (1 ˆ y(t) i ) = y(t) i ˆ y(t) i y(t) i @L(t) @ˆ y(t) c · @ˆ y(t) c @o(t) i = y(t) c ˆ y(t) c · exp (o(t) i ) exp (o(t) c ) P c exp (o(t) c )2 = y(t) c ˆ y(t) c · ˆ y(t) i ˆ y(t) c = y(t) c · ˆ y(t) i f g 0 f0g fg0 g2
20. ### ϩεΛग़ྗ૚Ͱඍ෼  ·ͱΊΔͱ… X c @L(t) @ˆ y(t) c ·

@ˆ y(t) c @o(t) i = @L(t) @ˆ y(t) i · @ˆ y(t) i @o(t) i + X c6=i @L(t) @ˆ y(t) c · @ˆ y(t) c @o(t) i = y(t) i ˆ y(t) i y(t) i + X c6=i y(t) c · ˆ y(t) i = y(t) i + X c y(t) c · ˆ y(t) i = ˆ y(t) i y(t) i (* X c y(t) c = 1) ) @L @o(t) i = @L @L(t) · @L(t) @o(t) i = @L @L(t) · X c @L(t) @ˆ y(t) c · @ˆ y(t) c @o(t) i = ˆ y(t) i y(t) i ϕΫτϧͰදݱ͢Δͱʁ JOEJDBUPSGVODUJPOΛ࢖͏  ڭՊॻͱҰக͢Δ
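As a sanity check, the closed form ŷ − y can be compared against a central-difference numerical gradient; this is a small illustrative script, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
o = rng.normal(size=5)                      # logits o(t)
y = np.eye(5)[2]                            # one-hot target y(t)

def loss(o):
    p = np.exp(o - o.max()); p /= p.sum()
    return -np.sum(y * np.log(p))           # L(t) = -sum_c y_c log yhat_c

p = np.exp(o - o.max()); p /= p.sum()
analytic = p - y                            # the slide's result: yhat - y

eps = 1e-6
numeric = np.array([(loss(o + eps * np.eye(5)[i]) - loss(o - eps * np.eye(5)[i]))
                    / (2 * eps) for i in range(5)])
print(np.allclose(analytic, numeric, atol=1e-6))   # True
```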
21. ### ϩεΛதؒ૚Ͱඍ෼  with the graph unrolling and parameter sharing ideas

of section 10.1, we gn a wide variety of recurrent neural networks. U U V V W W o (t 1) o (t 1) h h o o y y L L x x o (t) o (t) o (t+1) o (t+1) L(t 1) L(t 1) L(t) L(t) L (t+1) L (t+1) y (t 1) y (t 1) y (t) y (t) y (t+1) y (t+1) h (t 1) h (t 1) h (t) h (t) h (t+1) h (t+1) x(t 1) x(t 1) x(t) x(t) x(t+1) x(t+1) W W W W W W W W h (... ) h (... ) h (... ) h (... ) V V V V V V U U U U U U Unfold http://www.deeplearningbook.org/lecture_slides.html
22. ### Differentiating the loss at the hidden layer
• The gradient at $h^{(t)}$ is the sum of the gradient coming through the output layer at the same time step and the gradient coming through the hidden layer one time step ahead:
$$\frac{\partial L}{\partial h_i^{(t)}} = \sum_u \frac{\partial L}{\partial h_u^{(t+1)}} \frac{\partial h_u^{(t+1)}}{\partial h_i^{(t)}} + \sum_c \frac{\partial L}{\partial o_c^{(t)}} \frac{\partial o_c^{(t)}}{\partial h_i^{(t)}}$$
23. ### ϩεΛதؒ૚Ͱඍ෼  h(t) i = tanh(a(t) i ) @h(t+1) i

@h(t) i = @ tanh(a(t+1) i ) @h(t) i = ⇣ 1 tanh2(a(t+1) i ) ⌘ · @ (bi + Wh(t) i + Ux(t+1) i ) @h(t) i = ⇣ 1 tanh2(a(t+1) i ) ⌘ · X u Wiu ) @L(t) @h(t+1) i @h(t+1) i @h(t) i = ⇣ 1 tanh2(a(t+1) i ) ⌘ · (rh(t+1)L )i · X u Wiu )rh(t) L = WT · (rh(t+1) L) · diag ⇣ 1 (h(t+1)))2 ⌘ @o(t) i @h(t) i = @ ci + V h(t) i @h(t) i = X u Viu ) @L @h(t) i = ro(t) L i X u Viu )rh(t) L = V T ro(t) L
24. ### ϩεΛதؒ૚Ͱඍ෼  ੔ཧ͢Δͱ… @L @h(t) i = @L @L(t) ·

⇣X u @L(t) @h(t+1) u @h(t+1) u @h(t) i + X c @L(t) @o(t) c @o(t) c @h(t) i ⌘ = @L(t) @h(t+1) i @h(t+1) i @h(t) i + @L(t) @o(t) i @o(t) i @h(t) i ⇣ * @h(t+1) u @h(t) i = 0 (u 6= i) & @o(t) c @h(t) i = 0 (c 6= i) ⌘ )rh(t) L = WT · (rh(t+1) L) · diag ⇣ 1 (h(t+1)))2 ⌘ + V T ro(t) L
25. ### Differentiating the loss with respect to the parameters
• We can now compute the gradient at every node of the computational graph => use these to compute the gradients with respect to the parameters
• The parameters are shared across time steps => so their gradients must be summed over time
$$a^{(t)} = b + W h^{(t-1)} + U x^{(t)}, \qquad h^{(t)} = \tanh(a^{(t)})$$
$$o^{(t)} = c + V h^{(t)}, \qquad \hat{y}^{(t)} = \mathrm{softmax}(o^{(t)})$$
26. ### Differentiating the loss with respect to c and b
[Figure: the unfolded RNN computational graph again; where do c and b enter it? http://www.deeplearningbook.org/lecture_slides.html]
27. ### Differentiating the loss with respect to c and b
$$\frac{\partial L}{\partial c_i} = \sum_t \sum_k \frac{\partial L^{(t)}}{\partial o_k^{(t)}} \cdot \frac{\partial o_k^{(t)}}{\partial c_i} = \sum_t \frac{\partial L^{(t)}}{\partial o_i^{(t)}} \quad \left(\because\; k \neq i \Rightarrow \frac{\partial o_k^{(t)}}{\partial c_i} = 0\right) \;\Rightarrow\; \nabla_c L = \sum_t \nabla_{o^{(t)}} L^{(t)}$$
$$\frac{\partial L}{\partial b_i} = \sum_t \sum_c \frac{\partial L}{\partial h_c^{(t)}} \sum_k \frac{\partial h_c^{(t)}}{\partial a_k^{(t)}} \cdot \frac{\partial a_k^{(t)}}{\partial b_i} = \sum_t \left(\nabla_{h^{(t)}} L\right)_i \left(1 - (h_i^{(t)})^2\right) \quad \left(\because\; k \neq i \Rightarrow \frac{\partial a_k^{(t)}}{\partial b_i} = 0\right)$$
$$\Rightarrow \nabla_b L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right) \nabla_{h^{(t)}} L$$
28. ### ϩεΛύϥϝʔλVͰඍ෼  with the graph unrolling and parameter sharing ideas

of section 10.1, we gn a wide variety of recurrent neural networks. U U V V W W o (t 1) o (t 1) h h o o y y L L x x o (t) o (t) o (t+1) o (t+1) L(t 1) L(t 1) L(t) L(t) L (t+1) L (t+1) y (t 1) y (t 1) y (t) y (t) y (t+1) y (t+1) h (t 1) h (t 1) h (t) h (t) h (t+1) h (t+1) x(t 1) x(t 1) x(t) x(t) x(t+1) x(t+1) W W W W W W W W h (... ) h (... ) h (... ) h (... ) V V V V V V U U U U U U Unfold http://www.deeplearningbook.org/lecture_slides.html
29. ### ϩεΛύϥϝʔλVͰඍ෼  ( @L @V )ij = X t X

c @L @L(t) · @L(t) @o(t) c · @o(t) c @Vij = X t @L(t) @o(t) i · h(t) j = X t @L @o(t) i · h(t) j * @o(t) c @Vij = @(ci + Vh(t) i ) @Vij = h(t) j (* Vh(t) i = X k Vikh(t) k ) ) rV L = X t (ro(t) L)h(t)T
30. ### ϩεΛύϥϝʔλWͰඍ෼  with the graph unrolling and parameter sharing ideas

of section 10.1, we gn a wide variety of recurrent neural networks. U U V V W W o (t 1) o (t 1) h h o o y y L L x x o (t) o (t) o (t+1) o (t+1) L(t 1) L(t 1) L(t) L(t) L (t+1) L (t+1) y (t 1) y (t 1) y (t) y (t) y (t+1) y (t+1) h (t 1) h (t 1) h (t) h (t) h (t+1) h (t+1) x(t 1) x(t 1) x(t) x(t) x(t+1) x(t+1) W W W W W W W W h (... ) h (... ) h (... ) h (... ) V V V V V V U U U U U U Unfold http://www.deeplearningbook.org/lecture_slides.html
31. ### ϩεΛύϥϝʔλWͰඍ෼  ࿈࠯཯ ( @L @W )ij = X t

X u @L @h(t) u · @h(t) u @Wij = X t @L @h(t) i · @h(t) i @Wij (* h(t) i = tanh(a(t) i ) ) @h(t) i @Wij = X c @h(t) i @a(t) c · @a(t) c @Wij = @h(t) i @a(t) i · @a(t) i @Wij ) = 1 tanh2(a(t) i ) · @a(t) i @Wij = (1 h(t)2 )h(t 1) j (* (Wh(t))i = X k Wikh(t) k ) @(Wh(t))i @Wij = h(t) j )
32. ### ϩεΛύϥϝʔλUͰඍ෼  with the graph unrolling and parameter sharing ideas

of section 10.1, we gn a wide variety of recurrent neural networks. U U V V W W o (t 1) o (t 1) h h o o y y L L x x o (t) o (t) o (t+1) o (t+1) L(t 1) L(t 1) L(t) L(t) L (t+1) L (t+1) y (t 1) y (t 1) y (t) y (t) y (t+1) y (t+1) h (t 1) h (t 1) h (t) h (t) h (t+1) h (t+1) x(t 1) x(t 1) x(t) x(t) x(t+1) x(t+1) W W W W W W W W h (... ) h (... ) h (... ) h (... ) V V V V V V U U U U U U Unfold http://www.deeplearningbook.org/lecture_slides.html
33. ### ϩεΛύϥϝʔλUͰඍ෼  ( @L @U )ij = X t X

u @L @h(t) u · @h(t) u @Uij = X t @L @h(t) i · @h(t) i @Uij = X t @L @h(t) i · X c @h(t) i @a(t) c · @a(t) c @Uij = X t @L @h(t) i · @h(t) i @a(t) i · @a(t) i @Uij = X t @L @h(t) i · (1 tanh2(h(t) i )) · x(t) j (* Ux(t) i = X k Uikx(t) k ) @Ux(t) i @Uij = x(t) j ) ) rU L = X t diag ⇣ 1 (h(t))2 ⌘ (rh(t) L)x(t)T
34. ### Summary of the gradients
$$\nabla_c L = \sum_t \nabla_{o^{(t)}} L^{(t)} \qquad \nabla_b L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right) \nabla_{h^{(t)}} L$$
$$\nabla_V L = \sum_t \left(\nabla_{o^{(t)}} L\right) h^{(t)\top} \qquad \nabla_W L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right) \left(\nabla_{h^{(t)}} L\right) h^{(t-1)\top}$$
$$\nabla_U L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right) \left(\nabla_{h^{(t)}} L\right) x^{(t)\top}$$
Now the network can be trained!
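A compact NumPy sketch of BPTT that implements exactly these summary formulas, assuming one-hot targets and the tanh/softmax model above (all names are mine):

```python
import numpy as np

def bptt(xs, ys, h0, W, U, V, b, c):
    """BPTT for the slides' RNN. xs: input vectors; ys: one-hot target vectors."""
    # forward pass, caching hidden states and predictions
    h, hs, yhats = h0, [h0], []
    for x in xs:
        h = np.tanh(b + W @ h + U @ x)
        o = c + V @ h
        e = np.exp(o - o.max())
        hs.append(h)
        yhats.append(e / e.sum())

    grads = {name: np.zeros_like(p) for name, p in
             [('W', W), ('U', U), ('V', V), ('b', b), ('c', c)]}
    da_next = np.zeros_like(h0)   # diag(1 - h(t+1)^2) grad_{h(t+1)} L from the future
    for t in reversed(range(len(xs))):
        do = yhats[t] - ys[t]                     # grad_{o(t)} L = yhat(t) - y(t)
        grads['c'] += do                          # sum_t grad_{o(t)} L
        grads['V'] += np.outer(do, hs[t + 1])     # sum_t (grad_{o(t)} L) h(t)^T
        dh = V.T @ do + W.T @ da_next             # grad_{h(t)} L: output + future terms
        da = (1 - hs[t + 1] ** 2) * dh            # through tanh: diag(1 - h(t)^2)
        grads['b'] += da
        grads['W'] += np.outer(da, hs[t])         # ... h(t-1)^T
        grads['U'] += np.outer(da, xs[t])         # ... x(t)^T
        da_next = da
    return grads
```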

36. ### What if predictions are conditioned on past predicted values?
• Taking past predictions into account, the graphical model becomes the one in the figure
• Note, though, that the figure shows an RNN with no inputs
• and it ignores the RNN's hidden layer
• With inputs, prediction uses a conditional probability of the form
$$p(y^{(t)} \mid x^{(1)}, \ldots, x^{(t)}, y^{(1)}, \ldots, y^{(t-1)})$$
http://www.deeplearningbook.org/lecture_slides.html

38. ### A difficulty with RNNs: deciding the sequence length
• When input arrives continuously, where does the output sequence end? (what counts as one sequence?)
1. Train with a special symbol that marks the end of a sequence (sketched below)
2. Attach a Bernoulli output to the model that decides at each step whether the sequence ends
3. Learn and predict the sequence length τ itself inside the model (the model samples τ, then predicts the sequence):
$$p(x^{(1)}, \ldots, x^{(\tau)}) = P(\tau)\, P(x^{(1)}, \ldots, x^{(\tau)} \mid \tau)$$
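A small sketch of strategy 1, assuming a hypothetical trained step function `step_fn(h, token) -> (h, probs)`; the length cap guards against the EOS symbol never being sampled:

```python
import numpy as np

rng = np.random.default_rng(0)
EOS = 0                                  # reserve index 0 as the end-of-sequence symbol

def sample_sequence(step_fn, h0, max_len=100):
    """Sample tokens from the model until it emits the EOS symbol."""
    h, token, out = h0, EOS, []
    for _ in range(max_len):             # hard cap in case EOS is never sampled
        h, probs = step_fn(h, token)
        token = rng.choice(len(probs), p=probs)
        if token == EOS:
            break
        out.append(token)
    return out
```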
39. ### 10.2.4 Modeling sequences conditioned on context with RNNs
• Does the RNN cell receive only the input value $x^{(t)}$ at each time step? => Not necessarily; it may receive the entire input
• Feels similar to the discussion for conditional random fields (CRFs)?
$$P(y^{(1)}, \ldots, y^{(\tau)} \mid x^{(1)}, \ldots, x^{(\tau)}) = \prod_t P(y^{(t)} \mid x^{(1)}, \ldots, x^{(\tau)})$$
• The probability of the sequence can be viewed as factored in this way
http://www.deeplearningbook.org/lecture_slides.html

41. ### 10.3 Bidirectional RNNs
• Combine a forward RNN and a backward RNN (see the sketch below)
• In speech recognition and the like, it can help to consider not only the sound up to now but also the sound that follows (past + future)
• Of course, the whole sequence must be observed
• In most such tasks the whole sequence is observed
Bidirectional models basically perform better (subjective opinion)
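A minimal sketch of the combination, assuming hypothetical per-direction cell functions: each position's representation concatenates a forward state (the past) and a backward state (the future).

```python
import numpy as np

def birnn(xs, fwd_step, bwd_step, h0_f, h0_b):
    """Run one RNN left-to-right and another right-to-left, then join the
    per-step states; fwd_step/bwd_step are hypothetical cell functions."""
    hf, hs_f = h0_f, []
    for x in xs:                         # forward pass over the sequence
        hf = fwd_step(hf, x)
        hs_f.append(hf)
    hb, hs_b = h0_b, []
    for x in reversed(xs):               # backward pass over the sequence
        hb = bwd_step(hb, x)
        hs_b.append(hb)
    hs_b.reverse()
    # each position sees both the past (forward state) and the future (backward state)
    return [np.concatenate([f, b]) for f, b in zip(hs_f, hs_b)]
```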
42. ### 10.4 Encoder-Decoder Sequence-to-Sequence
"We have seen in figure 10.9 how an RNN can map a fixed-size vector to a sequence. We have seen in figures 10.3, 10.4, 10.10 and 10.11 how an RNN can map an input sequence to an output sequence of the same length."

[Figure: an encoder RNN reads x(1), x(2), …, x(n_x); its final state becomes the context C; a decoder RNN emits y(1), y(2), …, y(n_y). http://www.deeplearningbook.org/lecture_slides.html]
43. ### 10.4 Encoder-Decoder Sequence-to-Sequence
• The encoder encodes the input into a context vector
• The decoder decodes the context vector to produce the output
• An encoder-decoder allows variable-length input and output (seq2seq); a sketch follows below
• This is convenient when the input and output lengths differ, as in translation
• An encoder-decoder model built from RNNs is called an RNN sequence-to-sequence model
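A minimal sketch of the encode/decode split, with hypothetical trained components (`enc_step`, `dec_step`, `readout`) and greedy decoding for brevity; note that the input and output lengths are decoupled:

```python
import numpy as np

def encode(xs, h0, enc_step):
    """Run the encoder RNN; the final state serves as the context vector C."""
    h = h0
    for x in xs:
        h = enc_step(h, x)
    return h                              # C: a fixed-size summary of the input

def decode(C, dec_step, readout, max_len=50, eos=0):
    """Condition the decoder on C and emit tokens until EOS (greedy, for brevity)."""
    h, token, ys = C, eos, []
    for _ in range(max_len):
        h = dec_step(h, token)
        token = int(np.argmax(readout(h)))
        if token == eos:
            break
        ys.append(token)
    return ys
```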
44. ### 10.4 About the context C
• C is most often a fixed-size vector
• In the neural machine translation paper [Bahdanau+, 2015]…
• C may also be variable-length (a sequence of vectors)
• An attention mechanism is introduced to make better use of the context
https://arxiv.org/pdf/1409.0473.pdf
45. ### 10.5 Deep Recurrent Neural Network
[Figure: three ways to deepen an RNN, panels (a), (b), (c), with nodes x, h, z, y. http://www.deeplearningbook.org/lecture_slides.html]
46. ### 10.5 Deep Recurrent Neural Network
• There are several ways to stack layers
• Simply add another RNN layer on top of an RNN layer (a); a minimal sketch follows below
• Feed the RNN layer's output into an MLP and use the result as the RNN's next input (b)
• (c) introduces skip connections, which keep the shortest path between nodes in the network from growing too long
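A minimal sketch of variant (a), stacking recurrent layers so each layer's state is the input to the layer above (illustrative names, tanh cells assumed):

```python
import numpy as np

def deep_rnn_step(hs, x, layers):
    """One time step of a stacked ("deep") RNN, variant (a):
    each layer's hidden state feeds the layer above it."""
    new_hs, inp = [], x
    for (W, U, b), h in zip(layers, hs):
        h = np.tanh(b + W @ h + U @ inp)   # the same cell equation, per layer
        new_hs.append(h)
        inp = h                            # this layer's output is the next layer's input
    return new_hs
```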
47. ### 10.6 Recursive Neural Network
[Figure 10.14: "A recursive network has a computational graph that generalizes that of the recurrent network from a chain to a tree. A variable-size sequence x(1), x(2), …, x(t) can be mapped to a fixed-size representation (the output o), with a fixed set of parameters." http://www.deeplearningbook.org/lecture_slides.html]
48. ### 10.6 Recursive Neural Network
• RNN (recurrent neural network): for linear sequence data
• RNN (recursive neural network): for tree-structured data
http://www.deeplearningbook.org/lecture_slides.html

50. ### 10.7 Long-term dependencies
[Figure 10.15: "When composing many nonlinear functions (like the linear-tanh layers here)…"; x-axis: input coordinate, y-axis: projection of output, legend: 0-5 compositions. http://www.deeplearningbook.org/lecture_slides.html]
51. ### 10.7 Long-term dependencies
• Vanishing and exploding gradients are a major problem for RNNs
• Gradients usually vanish
• Occasionally they explode, which harms optimization
• Even if good parameters are given, the influence of an input decays exponentially with distance
52. ### 10.7 Long-term dependencies
• Consider the RNN hidden layer (without inputs or a nonlinearity)
• Suppose the parameter matrix admits an eigendecomposition
$$h^{(t)} = W h^{(t-1)} = W (W h^{(t-2)}) = W W (W h^{(t-3)}) = \cdots = W^t h^{(0)}$$
$$h^{(t)} = (Q \Lambda Q^{-1})(Q \Lambda Q^{-1}) \cdots (Q \Lambda Q^{-1})\, h^{(0)} = Q \Lambda^t Q^{-1} h^{(0)} \qquad (\text{where } W = Q \Lambda Q^{-1})$$
53. ### 10.7 Long-term dependencies
• Eigenvalues larger than 1 -> explosion
• Eigenvalues smaller than 1 -> vanishing
• Components of $h^{(0)}$ not aligned with the largest eigenvector are eventually discarded (?)