Slide 1

Slide 1 text

Chapter 10: Sequence Modeling: Recurrent and Recursive Nets
Deep Learning Book reading group @ Kasuga Area
himkt (University of Tsukuba)

Slide 2

Slide 2 text

What are recurrent neural networks (RNNs)?
• Neural networks that can handle sequential data
• They scale far better with sequence length than ordinary neural networks
• RNNs can handle variable-length sequences (!!)

Slide 3

Slide 3 text

RNNs and parameter sharing
• If the model had separate parameters for every time step:
  • it could not handle sequence lengths that never appear in the training data
  • it could not share information across sequences of different lengths
• When is parameter sharing effective?
  • When the same piece of information can appear at different positions
  • e.g. "I went to Nepal in 2009" vs. "In 2009, I went to Nepal"
[Figure: chains of states s(0), s(1), ..., s(5) of different lengths]

Slide 4

Slide 4 text

Folding
• What if we regard the cells (the circles), which share parameters, as a single cell…?
[Figure: the chain of states s(0), ..., s(5) folded into one recurrent cell]

Slide 5

Slide 5 text

10.1 Unfolding computational graphs
• Consider an RNN with no outputs, like the one below
• Unfolding: the reverse of the folding we just did
• Expanding the folded cell time step by time step gives…
[Figure: a folded cell with input x and state h, unfolded into states h(t-1), h(t), h(t+1) with inputs x(t-1), x(t), x(t+1)]

Slide 6

Slide 6 text

10.1 Unfolding computational graphs
Definition: $h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta)$
Unfolding at $t = 3$: $h^{(3)} = f(h^{(2)}, x^{(3)}; \theta) = f(f(h^{(1)}, x^{(2)}; \theta), x^{(3)}; \theta)$, with the parameters $\theta$ shared across steps.
[Figure: the same folded/unfolded computational graph as the previous slide]

Slide 7

Slide 7 text

10.1 Representing the unfolded RNN
• The RNN cell at time $t$ can be written with a function $g^{(t)}$ as below
• It can be decomposed into a function $f$ that does not depend on the time step and is applied at every step, giving the right-hand form
• $f$ is a function with shared parameters that accepts variable-length input (a code sketch follows below)
$$h^{(t)} = g^{(t)}(x^{(t)}, x^{(t-1)}, x^{(t-2)}, \ldots, x^{(2)}, x^{(1)}) = f(h^{(t-1)}, x^{(t)}; \theta)$$
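The recurrence above is easy to state in code. The sketch below is a minimal illustration (the names `cell` and `unroll`, and the numpy-only setup, are assumptions, not from the slides): a single parameter set θ = (W, U, b) is reused at every time step, so the same function handles a sequence of any length.

```python
import numpy as np

def cell(h_prev, x_t, theta):
    """One shared cell: h(t) = f(h(t-1), x(t); theta)."""
    W, U, b = theta                      # the same parameters at every time step
    return np.tanh(b + W @ h_prev + U @ x_t)

def unroll(xs, h0, theta):
    """Apply the same cell over a sequence of any length (the unfolded graph)."""
    h = h0
    for x_t in xs:                       # works for 3 steps or 3,000 steps
        h = cell(h, x_t, theta)
    return h                             # h(t) = g(t)(x(t), ..., x(1))

# Usage with assumed sizes: hidden size 4, input size 3, sequence length 5
rng = np.random.default_rng(0)
theta = (rng.standard_normal((4, 4)), rng.standard_normal((4, 3)), np.zeros(4))
print(unroll(rng.standard_normal((5, 3)), np.zeros(4), theta))
```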

Slide 8

Slide 8 text

10.2 Recurrent Neural Networks
• Given the recursive definition and parameter sharing, we can design a wide variety of RNNs
[Figures: the standard unfolded RNN with losses L(t) and targets y(t); an RNN with recurrence through the output (train time vs. test time); an RNN with a single output at the final time step]
http://www.deeplearningbook.org/lecture_slides.html

Slide 9

Slide 9 text

A general RNN
[Figure: the unfolded computational graph of an RNN mapping an input sequence x to outputs o, with hidden states h, targets y, losses L, and weight matrices U, V, W]
http://www.deeplearningbook.org/lecture_slides.html

Slide 10

Slide 10 text

An RNN in which the previous time step's output feeds into the hidden layer
[Figure: recurrence through o(t-1) → h(t) instead of h(t-1) → h(t)]
http://www.deeplearningbook.org/lecture_slides.html

Slide 11

Slide 11 text

An RNN that produces its output only at the final time step
[Figure: hidden-to-hidden recurrence with a single output o, target y, and loss L at the last step]
http://www.deeplearningbook.org/lecture_slides.html

Slide 12

Slide 12 text

10.2 How do we train it?
• How do we compute the gradients?
• Back-Propagation Through Time (BPTT): linear in the sequence length
• Run the forward pass => then compute the errors from the back
http://www.phontron.com/slides/nlp-programming-ja-08-rnn.pdf

Slide 13

Slide 13 text

10.2.1 Teacher forcing and output recurrence
[Figure: the output-recurrent RNN at train time (fed the gold target y(t-1)) and at test time (fed its own output o(t-1))]
http://www.deeplearningbook.org/lecture_slides.html

Slide 14

Slide 14 text

10.2.1 Teacher forcing and output recurrence
• Because the dependency on past hidden layers is cut, training can be parallelized
• However, only the class-label information is passed along, so the model's expressive power drops considerably
• Wait, there is no teacher at inference time, is there??
  • Train time: gold (teacher) data is fed back
  • Inference time (open loop): the model's own outputs are fed back
• Teacher forcing often leads to unstable performance at inference time
• This is reportedly mitigated by mixing in free-running training (something n-best-like?) — see the sketch below
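A minimal sketch of the train/test asymmetry, assuming hypothetical helpers `step` and `run` and a softmax output layer: with `teacher_forcing=True` the gold label is fed back (train time), with `False` the model's own prediction is fed back (open-loop inference).

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def step(h_prev, y_prev, params):
    """Output-recurrent cell: the previous *output* (not the hidden state) feeds back in."""
    W, U, b, V, c = params
    h = np.tanh(b + W @ h_prev + U @ y_prev)
    return h, softmax(c + V @ h)

def run(targets, h0, y0, params, teacher_forcing=True):
    h, y_prev, outputs = h0, y0, []
    for y_true in targets:
        h, y_hat = step(h, y_prev, params)
        outputs.append(y_hat)
        # train time: feed the gold label; test time (open loop): feed the model's own output
        y_prev = y_true if teacher_forcing else y_hat
    return outputs
```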

Slide 15

Slide 15 text

10.2.2 Computing the gradient in an RNN
• Compute the gradients with BPTT
• Differentiate at each node => then differentiate with respect to the parameters
$$a^{(t)} = b + W h^{(t-1)} + U x^{(t)}, \quad h^{(t)} = \tanh(a^{(t)}), \quad o^{(t)} = c + V h^{(t)}, \quad \hat{y}^{(t)} = \mathrm{softmax}(o^{(t)})$$
[Figure 10.3: the computational graph to compute the training loss of a recurrent network that maps an input sequence of x values to a corresponding sequence of output o values]
http://www.deeplearningbook.org/lecture_slides.html
(A code sketch of these forward equations follows below.)
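As a sketch, the four equations above translate directly into a numpy forward pass (the names `forward` and `softmax` and the parameter tuple are assumptions for illustration):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def forward(xs, h0, params):
    """Forward pass of the equations above; hidden states are cached for BPTT."""
    U, V, W, b, c = params
    hs, y_hats = [h0], []
    for x in xs:
        a = b + W @ hs[-1] + U @ x       # a(t) = b + W h(t-1) + U x(t)
        h = np.tanh(a)                   # h(t) = tanh(a(t))
        o = c + V @ h                    # o(t) = c + V h(t)
        hs.append(h)
        y_hats.append(softmax(o))        # y_hat(t) = softmax(o(t))
    return hs, y_hats
```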

Slide 16

Slide 16 text

Differentiating the loss at the output layer
[Figure 10.3 again, highlighting the output nodes o(t)]
http://www.deeplearningbook.org/lecture_slides.html

Slide 17

Slide 17 text

Differentiating the loss at the output layer (expanded over classes)
$$L = \sum_t L^{(t)} \;\Rightarrow\; \frac{\partial L}{\partial L^{(t)}} = 1, \qquad \text{where } L^{(t)} = -\sum_c y^{(t)}_c \log \hat{y}^{(t)}_c$$
$$\frac{\partial L}{\partial o^{(t)}_i} = \frac{\partial L}{\partial L^{(t)}} \cdot \frac{\partial L^{(t)}}{\partial o^{(t)}_i} = \frac{\partial L}{\partial L^{(t)}} \cdot \sum_c \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c} \cdot \frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i}$$

Slide 18

Slide 18 text

Differentiating the loss at the output layer
Here,
$$\hat{y}^{(t)}_i = \mathrm{softmax}(o^{(t)})_i = \frac{\exp(o^{(t)}_i)}{\sum_c \exp(o^{(t)}_c)}, \qquad \frac{\partial \exp(o^{(t)}_i)}{\partial o^{(t)}_k} = \begin{cases} \exp(o^{(t)}_i) & (i = k) \\ 0 & (\text{otherwise}) \end{cases}$$
so, splitting into the cases $i = c$ and $i \neq c$:
$$\sum_c \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c} \cdot \frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i} = \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_i} \cdot \frac{\partial \hat{y}^{(t)}_i}{\partial o^{(t)}_i} + \sum_{c \neq i} \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c} \cdot \frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i}$$

Slide 19

Slide 19 text

Differentiating the loss at the output layer (using the quotient rule $\left(\tfrac{f}{g}\right)' = \tfrac{f'g - fg'}{g^2}$)
$$\frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_i} \cdot \frac{\partial \hat{y}^{(t)}_i}{\partial o^{(t)}_i} = -\frac{y^{(t)}_i}{\hat{y}^{(t)}_i} \cdot \frac{\exp(o^{(t)}_i)\left(\sum_c \exp(o^{(t)}_c) - \exp(o^{(t)}_i)\right)}{\left(\sum_c \exp(o^{(t)}_c)\right)^2} = -\frac{y^{(t)}_i}{\hat{y}^{(t)}_i} \cdot \hat{y}^{(t)}_i (1 - \hat{y}^{(t)}_i) = -y^{(t)}_i (1 - \hat{y}^{(t)}_i) = \hat{y}^{(t)}_i y^{(t)}_i - y^{(t)}_i$$
$$\frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c} \cdot \frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i} = -\frac{y^{(t)}_c}{\hat{y}^{(t)}_c} \cdot \frac{-\exp(o^{(t)}_i)\exp(o^{(t)}_c)}{\left(\sum_c \exp(o^{(t)}_c)\right)^2} = -\frac{y^{(t)}_c}{\hat{y}^{(t)}_c} \cdot \left(-\hat{y}^{(t)}_i \hat{y}^{(t)}_c\right) = y^{(t)}_c \cdot \hat{y}^{(t)}_i \qquad (c \neq i)$$

Slide 20

Slide 20 text

Differentiating the loss at the output layer. Putting it all together:
$$\sum_c \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c} \cdot \frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i} = \hat{y}^{(t)}_i y^{(t)}_i - y^{(t)}_i + \sum_{c \neq i} y^{(t)}_c \cdot \hat{y}^{(t)}_i = -y^{(t)}_i + \sum_c y^{(t)}_c \cdot \hat{y}^{(t)}_i = \hat{y}^{(t)}_i - y^{(t)}_i \qquad \left(\because \sum_c y^{(t)}_c = 1\right)$$
$$\Rightarrow\; \frac{\partial L}{\partial o^{(t)}_i} = \frac{\partial L}{\partial L^{(t)}} \cdot \sum_c \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c} \cdot \frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i} = \hat{y}^{(t)}_i - y^{(t)}_i$$
In vector form? Writing it with an indicator function matches the textbook. (A numerical check follows below.)
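The result $\partial L / \partial o^{(t)} = \hat{y}^{(t)} - y^{(t)}$ is easy to sanity-check numerically. A small sketch (assumed names, numpy only) compares the analytic gradient with central finite differences:

```python
import numpy as np

def cross_entropy(o, y):
    y_hat = np.exp(o - o.max()); y_hat /= y_hat.sum()   # softmax
    return -np.sum(y * np.log(y_hat))                   # L(t) = -sum_c y_c log y_hat_c

rng = np.random.default_rng(0)
o = rng.standard_normal(5)
y = np.eye(5)[2]                                        # one-hot target
y_hat = np.exp(o - o.max()); y_hat /= y_hat.sum()

analytic = y_hat - y                                    # the result derived above
numeric = np.zeros_like(o)
eps = 1e-6
for i in range(len(o)):                                 # central finite differences
    d = np.zeros_like(o); d[i] = eps
    numeric[i] = (cross_entropy(o + d, y) - cross_entropy(o - d, y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))        # expected: True
```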

Slide 21

Slide 21 text

Differentiating the loss at the hidden layer
[Figure 10.3 again, highlighting the hidden nodes h(t)]
http://www.deeplearningbook.org/lecture_slides.html

Slide 22

Slide 22 text

Differentiating the loss at the hidden layer
• The sum of the gradient through the output layer at the same time step and the gradient through the hidden layer one time step ahead
$$\frac{\partial L}{\partial h^{(t)}_i} = \frac{\partial L}{\partial L^{(t)}} \cdot \left(\sum_u \frac{\partial L^{(t)}}{\partial h^{(t+1)}_u}\frac{\partial h^{(t+1)}_u}{\partial h^{(t)}_i} + \sum_c \frac{\partial L^{(t)}}{\partial o^{(t)}_c}\frac{\partial o^{(t)}_c}{\partial h^{(t)}_i}\right) = \frac{\partial L^{(t)}}{\partial h^{(t+1)}_i}\frac{\partial h^{(t+1)}_i}{\partial h^{(t)}_i} + \frac{\partial L^{(t)}}{\partial o^{(t)}_i}\frac{\partial o^{(t)}_i}{\partial h^{(t)}_i}$$
$$\left(\because \frac{\partial h^{(t+1)}_u}{\partial h^{(t)}_i} = 0 \;(u \neq i) \;\;\text{and}\;\; \frac{\partial o^{(t)}_c}{\partial h^{(t)}_i} = 0 \;(c \neq i)\right)$$

Slide 23

Slide 23 text

Differentiating the loss at the hidden layer
$$h^{(t)}_i = \tanh(a^{(t)}_i)$$
$$\frac{\partial h^{(t+1)}_i}{\partial h^{(t)}_i} = \frac{\partial \tanh(a^{(t+1)}_i)}{\partial h^{(t)}_i} = \left(1 - \tanh^2(a^{(t+1)}_i)\right) \cdot \frac{\partial\left(b_i + (W h^{(t)})_i + (U x^{(t+1)})_i\right)}{\partial h^{(t)}_i} = \left(1 - \tanh^2(a^{(t+1)}_i)\right) \cdot \sum_u W_{iu}$$
$$\Rightarrow\; \frac{\partial L^{(t)}}{\partial h^{(t+1)}_i}\frac{\partial h^{(t+1)}_i}{\partial h^{(t)}_i} = \left(1 - \tanh^2(a^{(t+1)}_i)\right) \cdot (\nabla_{h^{(t+1)}} L)_i \cdot \sum_u W_{iu} \;\Rightarrow\; \nabla_{h^{(t)}} L = W^\top \mathrm{diag}\!\left(1 - (h^{(t+1)})^2\right)(\nabla_{h^{(t+1)}} L)$$
$$\frac{\partial o^{(t)}_i}{\partial h^{(t)}_i} = \frac{\partial\left(c_i + (V h^{(t)})_i\right)}{\partial h^{(t)}_i} = \sum_u V_{iu} \;\Rightarrow\; \frac{\partial L}{\partial h^{(t)}_i} = (\nabla_{o^{(t)}} L)_i \sum_u V_{iu} \;\Rightarrow\; \nabla_{h^{(t)}} L = V^\top \nabla_{o^{(t)}} L$$

Slide 24

Slide 24 text

Differentiating the loss at the hidden layer. Rearranging:
$$\frac{\partial L}{\partial h^{(t)}_i} = \frac{\partial L^{(t)}}{\partial h^{(t+1)}_i}\frac{\partial h^{(t+1)}_i}{\partial h^{(t)}_i} + \frac{\partial L^{(t)}}{\partial o^{(t)}_i}\frac{\partial o^{(t)}_i}{\partial h^{(t)}_i}$$
$$\Rightarrow\; \nabla_{h^{(t)}} L = W^\top \mathrm{diag}\!\left(1 - (h^{(t+1)})^2\right)(\nabla_{h^{(t+1)}} L) + V^\top \nabla_{o^{(t)}} L$$

Slide 25

Slide 25 text

Differentiating the loss with respect to the parameters
• We can now compute the gradient at every node of the computational graph
  => use these to compute the gradients with respect to the parameters as well
• The parameters are shared across time steps
  => when taking their gradients, we must sum over time
$$a^{(t)} = b + W h^{(t-1)} + U x^{(t)}, \quad h^{(t)} = \tanh(a^{(t)}), \quad o^{(t)} = c + V h^{(t)}, \quad \hat{y}^{(t)} = \mathrm{softmax}(o^{(t)})$$

Slide 26

Slide 26 text

Differentiating the loss with respect to the parameters c and b
[Figure 10.3 again, showing where c and b enter the graph]
http://www.deeplearningbook.org/lecture_slides.html

Slide 27

Slide 27 text

Differentiating the loss with respect to the parameters c and b
$$\frac{\partial L}{\partial c_i} = \sum_t \sum_k \frac{\partial L^{(t)}}{\partial o^{(t)}_k} \cdot \frac{\partial o^{(t)}_k}{\partial c_i} = \sum_t \frac{\partial L^{(t)}}{\partial o^{(t)}_i} \quad \left(\because k \neq i \Rightarrow \frac{\partial o^{(t)}_k}{\partial c_i} = 0\right) \;\Rightarrow\; \nabla_c L = \sum_t \nabla_{o^{(t)}} L^{(t)}$$
$$\frac{\partial L}{\partial b_i} = \sum_t \sum_c \frac{\partial L^{(t)}}{\partial h^{(t)}_c} \cdot \sum_k \frac{\partial h^{(t)}_c}{\partial a^{(t)}_k} \cdot \frac{\partial a^{(t)}_k}{\partial b_i} = \sum_t (\nabla_{h^{(t)}} L^{(t)})_i \cdot \left(1 - \tanh^2(a^{(t)}_i)\right) \quad \left(\because k \neq i \Rightarrow \frac{\partial a^{(t)}_k}{\partial b_i} = 0\right)$$
$$\Rightarrow\; \nabla_b L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right) \nabla_{h^{(t)}} L$$

Slide 28

Slide 28 text

Differentiating the loss with respect to the parameter V
[Figure 10.3 again, highlighting the V edges]
http://www.deeplearningbook.org/lecture_slides.html

Slide 29

Slide 29 text

Differentiating the loss with respect to the parameter V
$$\left(\frac{\partial L}{\partial V}\right)_{ij} = \sum_t \sum_c \frac{\partial L}{\partial L^{(t)}} \cdot \frac{\partial L^{(t)}}{\partial o^{(t)}_c} \cdot \frac{\partial o^{(t)}_c}{\partial V_{ij}} = \sum_t \frac{\partial L^{(t)}}{\partial o^{(t)}_i} \cdot h^{(t)}_j = \sum_t \frac{\partial L}{\partial o^{(t)}_i} \cdot h^{(t)}_j$$
$$\because \frac{\partial o^{(t)}_c}{\partial V_{ij}} = \frac{\partial\left(c_i + (V h^{(t)})_i\right)}{\partial V_{ij}} = h^{(t)}_j \quad \left(\because (V h^{(t)})_i = \sum_k V_{ik} h^{(t)}_k\right)$$
$$\Rightarrow\; \nabla_V L = \sum_t (\nabla_{o^{(t)}} L)\, h^{(t)\top}$$

Slide 30

Slide 30 text

Differentiating the loss with respect to the parameter W
[Figure 10.3 again, highlighting the W edges]
http://www.deeplearningbook.org/lecture_slides.html

Slide 31

Slide 31 text

Differentiating the loss with respect to the parameter W (chain rule)
$$\left(\frac{\partial L}{\partial W}\right)_{ij} = \sum_t \sum_u \frac{\partial L}{\partial h^{(t)}_u} \cdot \frac{\partial h^{(t)}_u}{\partial W_{ij}} = \sum_t \frac{\partial L}{\partial h^{(t)}_i} \cdot \frac{\partial h^{(t)}_i}{\partial W_{ij}}$$
$$\left(\because h^{(t)}_i = \tanh(a^{(t)}_i) \;\Rightarrow\; \frac{\partial h^{(t)}_i}{\partial W_{ij}} = \sum_c \frac{\partial h^{(t)}_i}{\partial a^{(t)}_c} \cdot \frac{\partial a^{(t)}_c}{\partial W_{ij}} = \frac{\partial h^{(t)}_i}{\partial a^{(t)}_i} \cdot \frac{\partial a^{(t)}_i}{\partial W_{ij}}\right)$$
$$\frac{\partial h^{(t)}_i}{\partial W_{ij}} = \left(1 - \tanh^2(a^{(t)}_i)\right) \cdot \frac{\partial a^{(t)}_i}{\partial W_{ij}} = \left(1 - (h^{(t)}_i)^2\right) h^{(t-1)}_j \quad \left(\because (W h^{(t-1)})_i = \sum_k W_{ik} h^{(t-1)}_k \Rightarrow \frac{\partial (W h^{(t-1)})_i}{\partial W_{ij}} = h^{(t-1)}_j\right)$$

Slide 32

Slide 32 text

Differentiating the loss with respect to the parameter U
[Figure 10.3 again, highlighting the U edges]
http://www.deeplearningbook.org/lecture_slides.html

Slide 33

Slide 33 text

Differentiating the loss with respect to the parameter U
$$\left(\frac{\partial L}{\partial U}\right)_{ij} = \sum_t \sum_u \frac{\partial L}{\partial h^{(t)}_u} \cdot \frac{\partial h^{(t)}_u}{\partial U_{ij}} = \sum_t \frac{\partial L}{\partial h^{(t)}_i} \cdot \frac{\partial h^{(t)}_i}{\partial U_{ij}} = \sum_t \frac{\partial L}{\partial h^{(t)}_i} \cdot \sum_c \frac{\partial h^{(t)}_i}{\partial a^{(t)}_c} \cdot \frac{\partial a^{(t)}_c}{\partial U_{ij}} = \sum_t \frac{\partial L}{\partial h^{(t)}_i} \cdot \frac{\partial h^{(t)}_i}{\partial a^{(t)}_i} \cdot \frac{\partial a^{(t)}_i}{\partial U_{ij}}$$
$$= \sum_t \frac{\partial L}{\partial h^{(t)}_i} \cdot \left(1 - \tanh^2(a^{(t)}_i)\right) \cdot x^{(t)}_j \quad \left(\because (U x^{(t)})_i = \sum_k U_{ik} x^{(t)}_k \Rightarrow \frac{\partial (U x^{(t)})_i}{\partial U_{ij}} = x^{(t)}_j\right)$$
$$\Rightarrow\; \nabla_U L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right)(\nabla_{h^{(t)}} L)\, x^{(t)\top}$$

Slide 34

Slide 34 text

Summary of the gradients
$$\nabla_c L = \sum_t \nabla_{o^{(t)}} L^{(t)} \qquad \nabla_b L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right) \nabla_{h^{(t)}} L \qquad \nabla_V L = \sum_t (\nabla_{o^{(t)}} L)\, h^{(t)\top}$$
$$\nabla_W L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right)(\nabla_{h^{(t)}} L)\, h^{(t-1)\top} \qquad \nabla_U L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right)(\nabla_{h^{(t)}} L)\, x^{(t)\top}$$
With these, we can train the network! (A code sketch follows below.)
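Putting the five summed gradients into code gives a compact BPTT sketch. This is a minimal illustration with assumed names and shapes, not a reference implementation: run the forward pass while caching h(t) and ŷ(t), then walk backwards through time accumulating each gradient.

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def bptt(xs, ys, h0, params):
    """Forward pass, then the five summed gradients from the summary above."""
    U, V, W, b, c = params
    hs, y_hats = [h0], []
    for x in xs:                                      # forward, caching h(t) and y_hat(t)
        h = np.tanh(b + W @ hs[-1] + U @ x)
        hs.append(h)
        y_hats.append(softmax(c + V @ h))
    gU, gV, gW = np.zeros_like(U), np.zeros_like(V), np.zeros_like(W)
    gb, gc = np.zeros_like(b), np.zeros_like(c)
    da_next = np.zeros_like(h0)                       # diag(1 - h(t+1)^2) grad_{h(t+1)} L
    for t in reversed(range(len(xs))):                # backward through time
        do = y_hats[t] - ys[t]                        # grad_{o(t)} L = y_hat(t) - y(t)
        dh = V.T @ do + W.T @ da_next                 # grad_{h(t)} L
        da = (1 - hs[t + 1] ** 2) * dh                # diag(1 - h(t)^2) grad_{h(t)} L
        gc += do
        gV += np.outer(do, hs[t + 1])                 # sum_t (grad_{o(t)} L) h(t)^T
        gb += da
        gW += np.outer(da, hs[t])                     # ... h(t-1)^T
        gU += np.outer(da, xs[t])                     # ... x(t)^T
        da_next = da
    return gU, gV, gW, gb, gc
```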

Slide 35

Slide 35 text

10.2.3 RNNs as directed graphical models
• The dependencies in an RNN can be expressed as nodes and edges
• In the RNN below, the output at each time step is produced independently: $p(y^{(t)} \mid x^{(1)}, \ldots, x^{(t)})$
http://www.deeplearningbook.org/lecture_slides.html

Slide 36

Slide 36 text

What if predictions take past predicted values into account?
• Conditioning on past predictions, the graphical model becomes the one shown
• Note: the figure is an example of an RNN with no inputs
• and it ignores the RNN's hidden layer
• With inputs, the prediction uses a conditional probability of the form $p(y^{(t)} \mid x^{(1)}, \ldots, x^{(t)}, y^{(1)}, \ldots, y^{(t-1)})$
http://www.deeplearningbook.org/lecture_slides.html

Slide 37

Slide 37 text

What if we bring back the hidden variables we were ignoring?
• The direct connections between past and future predictions are cut, and prediction can go through the intermediate (hidden) representation rather than through earlier prediction results
• If the necessary variables are observed and the hidden layer has an appropriate dimensionality, past information can be carried appropriately in the parameters
http://www.deeplearningbook.org/lecture_slides.html

Slide 38

Slide 38 text

A difficulty with RNNs: deciding the sequence length
• When inputs keep arriving, where does the output sequence end? (Where does one sequence start and end?)
1. Add a symbol that marks the end of the sequence and train with it
2. Attach a Bernoulli output to the model that decides, at each step, whether the sequence ends (see the sketch after this list)
3. Let the model also learn and predict the sequence length $\tau$ (the model samples $\tau$ and then predicts the sequence):
$$p(x^{(1)}, \ldots, x^{(\tau)}) = P(\tau)\, P(x^{(1)}, \ldots, x^{(\tau)} \mid \tau)$$
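A tiny sketch of options 1 and 2; the symbol id `EOS_ID` and the extra output head `end_probability` are assumptions for illustration, not part of the slides.

```python
import numpy as np

def end_probability(h, w_end, b_end):
    """Option 2: a Bernoulli head on h(t) giving P(the sequence ends at step t)."""
    return 1.0 / (1.0 + np.exp(-(w_end @ h + b_end)))   # sigmoid

# Option 1 is a data convention: reserve one extra symbol id (assumed here) and
# append it to every training sequence so the model learns to emit it at the end.
EOS_ID = 0
train_sequence = [5, 2, 7] + [EOS_ID]
```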

Slide 39

Slide 39 text

10.2.4 Modeling sequences conditioned on context with RNNs
• Does the RNN cell receive only the input value $x^{(t)}$ at each time step?
  => Not necessarily; it may receive the entire input
• Feels similar to the discussion around conditional random fields (CRFs)?
• This can be seen as factorizing the probability of the sequence as
$$P(y^{(1)}, \ldots, y^{(\tau)} \mid x^{(1)}, \ldots, x^{(\tau)}) = \prod_t P(y^{(t)} \mid x^{(1)}, \ldots, x^{(\tau)})$$
http://www.deeplearningbook.org/lecture_slides.html

Slide 40

Slide 40 text

10.3 Bidirectional RNNs
• Combine a forward-direction RNN and a backward-direction RNN
http://www.deeplearningbook.org/lecture_slides.html

Slide 41

Slide 41 text

10.3 Bidirectional RNNs
• Combine a forward-direction RNN and a backward-direction RNN (a minimal sketch follows below)
• In speech recognition and similar tasks, it can help to consider not only the sound heard so far but also the sound that comes after it (past + future)
• Of course, the whole sequence then needs to be observed
• In most such tasks the whole sequence is indeed observed
• Bidirectional RNNs basically perform better (subjective opinion)
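A minimal sketch of the combination, assuming two already-defined cell functions (`cell_fwd`, `cell_bwd`): run one pass left-to-right and one right-to-left, then concatenate the two hidden states at each position.

```python
import numpy as np

def run_rnn(xs, h0, cell):
    """Run one directional RNN and return the hidden state at every step."""
    hs, h = [], h0
    for x in xs:
        h = cell(h, x)
        hs.append(h)
    return hs

def bidirectional(xs, h0_fwd, h0_bwd, cell_fwd, cell_bwd):
    """Concatenate a left-to-right pass and a right-to-left pass over the same input."""
    fwd = run_rnn(xs, h0_fwd, cell_fwd)
    bwd = run_rnn(xs[::-1], h0_bwd, cell_bwd)[::-1]   # reverse back to input order
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```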

Slide 42

Slide 42 text

10.4 Encoder-Decoder Sequence-to-Sequence
[Figure: an encoder RNN reads x(1), ..., x(n_x) into a context C; a decoder RNN generates y(1), ..., y(n_y) from C]
http://www.deeplearningbook.org/lecture_slides.html

Slide 43

Slide 43 text

10.4 Encoder-Decoder Sequence-to-Sequence
• The encoder encodes the input into a context vector
• The decoder decodes the context vector to produce the output
• An encoder-decoder can take variable-length input and produce variable-length output (seq2seq)
• This is useful when input and output lengths do not match, as in translation
• An encoder-decoder model built from RNNs is called an RNN sequence-to-sequence model (see the sketch below)
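A minimal sketch of the encoder-decoder loop under assumed helpers (`enc_cell`, `dec_cell`, `readout`, and an `<eos>` id): the encoder compresses the whole input into C, and the decoder generates until it emits the end symbol, so input and output lengths are decoupled.

```python
def encode(xs, h0, enc_cell):
    """Read the whole input and return the final hidden state as the context C."""
    h = h0
    for x in xs:
        h = enc_cell(h, x)
    return h                               # fixed-size C, whatever len(xs) was

def decode(C, y_start, dec_cell, readout, eos_id, max_len=50):
    """Generate symbols until an assumed <eos> id (or max_len); output length is free."""
    h, y, outputs = C, y_start, []
    for _ in range(max_len):
        h = dec_cell(h, y)                 # the previous output is fed back in
        y = readout(h)                     # assumed to return the next symbol id
        outputs.append(y)
        if y == eos_id:
            break
    return outputs
```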

Slide 44

Slide 44 text

10.4 About the context C
• C is usually a fixed-length vector
• In the neural machine translation paper [Bahdanau+, 2015]…
  • C may also be variable-length
  • an attention mechanism is introduced to make better use of the context
https://arxiv.org/pdf/1409.0473.pdf

Slide 45

Slide 45 text

10.5 Deep Recurrent Neural Networks
[Figure: three ways to add depth to an RNN, labeled (a), (b), and (c)]
http://www.deeplearningbook.org/lecture_slides.html

Slide 46

Slide 46 text

10.5 Deep Recurrent Neural Networks
• There are several ways to stack layers (a code sketch of variant (a) follows below):
• Simply add another RNN layer on top of an RNN layer (a)
• Feed the RNN layer's output into an MLP, and use the MLP's output as the next input to the RNN (b)
• (c) introduces skip connections to keep the shortest path between nodes in the network from becoming too long
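A minimal sketch of variant (a), stacking layers so that each layer consumes the full hidden-state sequence of the layer below; the cell functions are assumed to be defined elsewhere.

```python
def deep_rnn(xs, h0s, cells):
    """Variant (a): stack RNN layers; layer k reads layer k-1's hidden-state sequence."""
    seq = xs
    for h0, cell in zip(h0s, cells):       # one (initial state, cell) pair per layer
        h, outputs = h0, []
        for x in seq:
            h = cell(h, x)
            outputs.append(h)
        seq = outputs                      # the whole hidden sequence feeds the next layer
    return seq
```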

Slide 47

Slide 47 text

10.6 Recursive Neural Networks
[Figure 10.14: a recursive network has a computational graph that generalizes that of the recurrent network from a chain to a tree]
http://www.deeplearningbook.org/lecture_slides.html

Slide 48

Slide 48 text

10.6 Recursive Neural Networks
• RNN (recurrent neural network): linear, chain-structured sequence data
• RNN (recursive neural network): tree-structured data
[Figure 10.3: the chain-structured computational graph of a recurrent network; Figure 10.14: the tree-structured computational graph of a recursive network, which maps a variable-size sequence x(1), x(2), ..., x(t) to a fixed-size representation (the output o) with a fixed set of parameters]
http://www.deeplearningbook.org/lecture_slides.html

Slide 49

Slide 49 text

10.6 Recursive Neural Network http://nlp.stanford.edu:8080/sentiment/rntnDemo.html

Slide 50

Slide 50 text

10.7 Long-range memory
[Figure 10.15: when composing many nonlinear functions (like the linear-tanh layers shown), the projection of the output as a function of the input coordinate becomes flat over most of its range]
http://www.deeplearningbook.org/lecture_slides.html

Slide 51

Slide 51 text

10.7 Long-range memory
• Vanishing and exploding gradients are a major problem for RNNs
• Gradients usually vanish
• Occasionally they explode, which harms optimization
• Even if good parameters are given, the influence of an input decays exponentially with distance

Slide 52

Slide 52 text

10.7 Long-range memory
• The RNN hidden layer (ignoring inputs and the nonlinearity):
$$h_t = W h_{t-1} = W(W h_{t-2}) = WW(W h_{t-3}) = \cdots = W^t h_0$$
• If the parameter matrix admits an eigendecomposition $W = Q \Lambda Q^{-1}$, then
$$h_t = (Q \Lambda Q^{-1})(Q \Lambda Q^{-1}) \cdots (Q \Lambda Q^{-1})\, h_0 = Q \Lambda^t Q^{-1} h_0$$

Slide 53

Slide 53 text

10.7 Long-range memory
• Eigenvalues larger than 1 -> explosion
• Eigenvalues smaller than 1 -> vanishing
• The components of $h_0$ not aligned with the largest eigenvector are eventually discarded (?) — a numerical sketch follows below
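A small numerical sketch of this behaviour (assumed setup, numpy only): build $W = Q \Lambda Q^{-1}$ with a controlled largest eigenvalue, iterate $h_t = W h_{t-1}$, and observe that the norm vanishes or explodes while $h$ ends up aligned with the top eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))        # orthogonal eigenvectors, Q^{-1} = Q^T
for lam_max in (0.9, 1.1):                               # largest eigenvalue below vs. above 1
    W = Q @ np.diag([lam_max, 0.5, 0.3, 0.1]) @ Q.T      # W = Q Lambda Q^{-1}
    h = rng.standard_normal(4)
    for _ in range(100):                                 # h_t = W^t h_0 (linear recurrence)
        h = W @ h
    alignment = abs(Q[:, 0] @ h) / np.linalg.norm(h)     # fraction of h along the top eigenvector
    print(lam_max, np.linalg.norm(h), alignment)         # norm vanishes/explodes; alignment -> 1
```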