Slide 1

Slide 1 text

Visualizing and Understanding Neural Machine Translation
Yanzhuo Ding, Yang Liu, Huanbo Luan, Maosong Sun
ACL 2017
* Figures and tables in these slides are quoted from the paper.
Mamoru Komachi
The 9th Cutting-Edge NLP Study Group (最先端NLP勉強会) @ Recruit MTL Cafe
2017/09/15

Slide 2

Slide 2 text

Deep learning has markedly improved the quality of machine translation in recent years
https://research.googleblog.com/2016/09/a-neural-network-for-machine.html

Slide 3

Slide 3 text

Neural machine translation has several characteristic problems
- Handling of unknown words → "the largest UNK in the world"
- Under-translation and over-translation → "in the history of the history of the history of the …"
- Output of completely unrelated words
Sato et al., Japanese-English Machine Translation of Recipe Texts. WAT 2016.

Slide 4

Slide 4 text

Error analysis of NMT-specific errors is not easy
Source: 小型 甲殻 類 で は , アミ 類 の アカイソアミ , ワレカラ 類 の ニッポンワレカラ と ツガルワレカラ は 茨城 県 で 初めて 確認 さ れ た 。
NMT: in small crustaceans , and of and were confirmed for the first time in Ibaraki Prefecture .
Reference: among the small-type Crustacea , Paracanthomysis hispida of Mysidae , and Caprella japonica and C. tsugarensis of Caprellidae were confirmed for the first time in Ibaraki Prefecture .
Matsumura and Komachi. Tokyo Metropolitan University Neural Machine Translation System for WAT 2017. WAT 2017.

Slide 5

Slide 5 text

Attention is useful, but it is not sufficient for debugging NMT
Matsumura et al., English-Japanese Neural Machine Translation with Encoder-Decoder-Reconstructor. arXiv 2017.

Slide 6

Slide 6 text

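Three-line summary of this work
- Proposes a method for visualizing and interpreting NMT using layer-wise relevance propagation (LRP; Bach et al., 2015)
- Adapts LRP to the attention-based encoder-decoder framework (Bahdanau et al., 2015)
- Runs a case study on Chinese-English translation and analyzes NMT translation errors (→ easier to interpret and debug than with attention alone)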

Slide 7

Slide 7 text

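Standard NMT as of 2017
$P(\mathbf{y} \mid \mathbf{x}; \boldsymbol{\theta}) = \prod_{j=1}^{J} P(y_j \mid \mathbf{x}, \mathbf{y}_{<j}; \boldsymbol{\theta})$
- $P(y_j \mid \mathbf{x}, \mathbf{y}_{<j}; \boldsymbol{\theta}) = \rho(y_{j-1}, s_j, c_j)$
- $s_j = g(s_{j-1}, y_j, c_j)$
- $c_j = \sum_{i=1}^{I+1} \alpha_{j,i} h_i$
  - $h_i = [\overrightarrow{h}_i; \overleftarrow{h}_i]$
  - $\overrightarrow{h}_i = f(\overrightarrow{h}_{i-1}, x_i)$
  - $\overleftarrow{h}_i = f(\overleftarrow{h}_{i+1}, x_i)$
x: input (I words); y: output (J words); f, g, ρ: nonlinear functions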
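As a rough illustration of the context-vector step above, the following sketch computes $h_i$ by concatenating forward and backward states and then forms $c_j = \sum_i \alpha_{j,i} h_i$. The toy state values, the helper names `softmax` and `context_vector`, and the use of a softmax over hypothetical alignment scores to obtain $\alpha_{j,i}$ are assumptions for illustration; the slide itself only gives the weighted sum.

```python
import math


def softmax(scores):
    """Turn raw alignment scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(alpha_j, hidden_states):
    """c_j = sum_i alpha_{j,i} * h_i, computed dimension by dimension."""
    dim = len(hidden_states[0])
    return [
        sum(a * h[d] for a, h in zip(alpha_j, hidden_states))
        for d in range(dim)
    ]


# Toy forward and backward encoder states for three source positions
# (hypothetical values, not taken from the paper).
forward = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
backward = [[0.5], [0.5], [0.0]]

# h_i = [forward_i; backward_i]: concatenate the two directions.
hidden_states = [f + b for f, b in zip(forward, backward)]

# Equal scores give uniform attention weights over the three positions.
alpha = softmax([0.0, 0.0, 0.0])
c = context_vector(alpha, hidden_states)
```

With uniform weights the context vector is simply the average of the source hidden states; a peaked `alpha` would instead pull `c` toward one source position.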

Slide 8

Slide 8 text

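Problem setting for visualizing and interpreting neural networks
- Compute how much each input-layer unit contributes to the final output layer (Bach et al., 2015; Li et al., 2016)
- For the attention-based encoder-decoder, what we want to know is how much the source-language and target-language words contribute to each of the following states:
1. $\overrightarrow{h}_i = f(\overrightarrow{h}_{i-1}, x_i)$: forward hidden state of the source language
2. $\overleftarrow{h}_i = f(\overleftarrow{h}_{i+1}, x_i)$: backward hidden state of the source language
3. $h_i = [\overrightarrow{h}_i; \overleftarrow{h}_i]$: hidden state of the source language
4. $c_j = \sum_{i=1}^{I+1} \alpha_{j,i} h_i$: context vector of the source language
5. $s_j = g(s_{j-1}, y_j, c_j)$: hidden state of the target language
6. $y_j$: word embedding of the target language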

Slide 9

Slide 9 text

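Relevance of the input and output vectors to the vector for “New York”
- When “York” is generated, it is related to both the input and the output (shown as darker cells)
- How should the hidden states and context vectors be computed and visualized?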

Slide 10

Slide 10 text

Computing vector-level relevance from neuron-level relevance
W is computed by Eq. (15)-(17) of the paper
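The aggregation step can be sketched as follows, assuming that the vector-level relevance between a contextual word vector and a hidden state is the sum of the neuron-level relevance over every pair of neurons in the two vectors. That assumption, the function names, and the toy numbers are illustrative and are not quoted from the slide.

```python
def vector_relevance(neuron_relevance):
    """Sum neuron-level relevance r[u][v] over all neuron pairs (u, v).

    neuron_relevance[u][v] is the relevance of neuron u in a contextual
    word vector to neuron v in the hidden state being explained.
    """
    return sum(sum(row) for row in neuron_relevance)


def relevance_vector(per_word_neuron_relevance):
    """One vector-level relevance score per contextual word."""
    return [vector_relevance(r) for r in per_word_neuron_relevance]


# Hypothetical neuron-level relevance for two contextual words, each with
# two neurons, with respect to a two-neuron hidden state.
r_word_1 = [[0.1, 0.2], [0.3, 0.0]]
r_word_2 = [[0.2, 0.2], [0.0, 0.0]]

scores = relevance_vector([r_word_1, r_word_2])
```

The resulting list has one entry per contextual word, which is the kind of quantity a relevance heat map would display.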

Slide 11

Slide 11 text

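Computing relevance in a simple feed-forward network
- How W is computed depends on the operation (matrix multiplication, element-wise multiplication, maximum, etc.) (Bach et al., 2015)
- Computable in $O(|G| \times |V| \times O_{\max})$
  - $O_{\max}$ is the maximum degree of a neuron in the network
  - Heavier than computing attention, because relevance is computed for the neurons of the whole network, but it can be sped up with parallel computation and caching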
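For the matrix-multiplication case, a minimal sketch of LRP-style redistribution looks like the code below. It assumes the common proportional rule, in which each input neuron receives a share of an output neuron's relevance equal to its share of that output's pre-activation. The function name `lrp_matmul`, the `eps` stabilizer, and the toy numbers are assumptions for illustration; this is not the paper's exact Eq. (15)-(17).

```python
def lrp_matmul(x, W, relevance_out, eps=1e-9):
    """Redistribute output relevance to inputs for a layer y = W x.

    x: input activations (length n_in)
    W: weights as a list of n_out rows, each of length n_in
    relevance_out: relevance assigned to each output neuron (length n_out)
    eps: small stabilizer that avoids division by zero (an assumption)
    """
    n_in = len(x)
    relevance_in = [0.0] * n_in
    for j, row in enumerate(W):
        # Contribution of each input neuron i to output neuron j.
        contributions = [row[i] * x[i] for i in range(n_in)]
        total = sum(contributions)
        denom = total + (eps if total >= 0 else -eps)
        for i in range(n_in):
            # Input i receives a share proportional to its contribution.
            relevance_in[i] += contributions[i] / denom * relevance_out[j]
    return relevance_in


# Toy layer with two inputs and two outputs (hypothetical values).
x = [1.0, 2.0]
W = [[1.0, 1.0], [2.0, 0.0]]
relevance_out = [3.0, 2.0]

relevance_in = lrp_matmul(x, W, relevance_out)
# Total relevance is approximately conserved from outputs to inputs.
```

Applying such a rule layer by layer, from the state being explained back to the word vectors, is what makes the cost grow with the number of neurons and their degree, in line with the complexity stated on the slide.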

Slide 12

Slide 12 text

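Visualization experiments on Chinese-English translation
- Data
  - Training: parallel corpus of 1.25 million sentences
  - Development: NIST 2003 (model selection)
  - Test: NIST 2004 (visualization)
- Tool
  - GroundHog (Bahdanau et al., 2015) → BLEU score of 32.73 on the development data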

Slide 13

Slide 13 text

Visualizing source-language vectors: context from both directions is taken into account

Slide 14

Slide 14 text

Visualizing target-language vectors: there are also differences from attention

Slide 15

Slide 15 text

Visualizing unknown-word vectors: the model can be seen looking at the surrounding context

Slide 16

Slide 16 text

From here on: error analysis of the problems characteristic of neural machine translation (shown again)
- Handling of unknown words → "the largest UNK in the world"
- Under-translation and over-translation → "in the history of the history of the history of the …"
- Output of completely unrelated words
Sato et al., Japanese-English Machine Translation of Recipe Texts. WAT 2016.

Slide 17

Slide 17 text

Error analysis of untranslated words: the end-of-sentence symbol was output too early

Slide 18

Slide 18 text

Error analysis of repetition: the target-language context has a large influence

Slide 19

Slide 19 text

Error analysis of substitution errors: the model is looking at other words in the input and output

Slide 20

Slide 20 text

Error analysis of negation reversal: the target-language context has a strong influence

Slide 21

Slide 21 text

Error analysis of extra words: the model is (for some reason) looking at the end-of-sentence symbol

Slide 22

Slide 22 text

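Summary of NMT decoding and error analysis
- Attention is useful for analyzing the relation between source and target languages, but attention alone is not sufficient for understanding how target-language words are generated
- The relevance of contextual words differs considerably from one hidden layer to another (and also differs from attention)
- Target-language context is very important for word generation
- The end-of-sentence symbol may be the cause of various problems (untranslated words, generation of unrelated words, etc.)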

Slide 23

Slide 23 text

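Research on visualizing deep learning has only just begun
- In image recognition, there is active research on computing how much input-layer information is related to the output layer (Bach et al., 2015; Li et al., 2016; …)
  - The difference from Bach et al. (2015) is that the input to NMT is a word vector rather than a single pixel → relevance and weights are computed at the vector level
  - The difference from Li et al. (2016) is that relevance is used instead of partial derivatives → the activation function does not need to be differentiable or smooth
- Attention can only show the relation between source and target words, whereas relevance can be used to compute the degree of relation between arbitrary hidden layers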

Slide 24

Slide 24 text

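Summary and future work: visualization and interpretation of NMT
- Proposes a method for visualizing and interpreting NMT using layer-wise relevance propagation
  - Relevance between arbitrary hidden layers and the context can be computed
  - Enables deeper analysis than the attention mechanism
- Future work
  - Effectiveness on other NMT models and other language pairs
  - Models that explicitly take translation relevance into account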

Slide 25

Slide 25 text

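Impressions
- I had thought that looking at attention was about the only easy way to analyze NMT, so this looks useful.
  - Being able to get a rough idea of the causes of NMT-specific errors is a big advantage
  - The computation itself looks fairly heavy (the paper says caching helps, but perhaps it only runs because this is a simple one-layer NMT?)
- Target-language context is clearly important, but is the end-of-sentence symbol really the cause of translation errors? → Could other problems simply be surfacing at the end-of-sentence symbol?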

Slide 26

Slide 26 text

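Q&A (1)
- Q: I understand that visualization is possible, but does the paper propose a method that can actually debug NMT?
  A: It does not go as far as proposing a debugging method; it points out the importance of considering a context gate as in Tu et al. (2017). In the future, I would like to use this to get an idea of the causes of errors and improve the models.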

Slide 27

Slide 27 text

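Q&A (2)
- Q: I understand that there are examples where this kind of analysis works, but quantitatively, in how many cases can it actually be done?
  A: We examined the distribution of NMT and PBSMT errors in, for example, Sato et al. (2016), but I do not know how many of the errors in each category can be cleanly visualized with this method. I would like to implement it and find out.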

Slide 28

Slide 28 text

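Q&A (3)
- Q: I understand that target-language context is important, but is the other information really important?
  A: That is doubtful. In fact, I consider the discussion about the end-of-sentence symbol to be quite questionable. My guess is that translation errors arising from other causes show up as attention on the end-of-sentence symbol, or as a concentration of relevance on it.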

Slide 29

Slide 29 text

References
- Ding et al. Visualizing and Understanding Neural Machine Translation. ACL 2017.
- Bach et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 2015.
- Li et al. Visualizing and understanding neural models in NLP. NAACL 2016.
- Tu et al. Context gates for neural machine translations. ACL 2017.

Slide 30

Slide 30 text

Related work (NMT at Tokyo Metropolitan University)
- Matsumura et al. English-Japanese Neural Machine Translation with Encoder-Decoder-Reconstructor. arXiv 2017.
- Sato et al. Japanese-English Machine Translation of Recipe Texts. WAT 2016.
- Yamagishi et al. Improving Japanese-to-English Neural Machine Translation by Voice Prediction. IJCNLP 2017.
- Matsumura and Komachi. Tokyo Metropolitan University Neural Machine Translation System for WAT 2017. WAT 2017.