Slide 1

Slide 1 text

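Sequence-to-Dependency Neural Machine Translation
Shuangzhi Wu, Dongdong Zhang, Nan Yang, Mu Li, Ming Zhou
ACL 2017

Note: the figures and tables in these slides are taken from the paper.

Mamoru Komachi
ACL 2017 Reading Group @ Tokyo Institute of Technology, Suzukakedai Campus
2017/09/19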

Slide 2

Slide 2 text

Deep learning has markedly improved machine translation quality in recent years
https://research.googleblog.com/2016/09/a-neural-network-for-machine.html

Slide 3

Slide 3 text

Dependency structure helps word prediction

Slide 4

Slide 4 text

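Incorporating target-language dependency structure into NMT is hard
- How should syntactic structure such as a dependency tree be modeled inside an RNN?
- How can a single neural network efficiently both generate target words and build the syntactic structure?
- How can context that reflects the target syntactic structure be used efficiently for word generation?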

Slide 5

Slide 5 text

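Sequence-to-dependency NMT exploits dependency structure
- The encoder is an ordinary bidirectional RNN with attention
- The decoder generates the translation words and parses the dependency tree at the same time
  - One RNN generates words (this part is standard)
  - A second RNN performs dependency parsing with the arc-standard shift-reduce algorithm
- Statistically significant improvements over NMT/SMT baselines on Chinese-English and Japanese-English translation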

Slide 6

Slide 6 text

Standard NMT as of 2017 (Sutskever et al., 2014; Bahdanau et al., 2015)
- Input: $X = x_1, \ldots, x_m$
- Output: $Y = y_1, \ldots, y_n$
- Hidden vectors of the input: $H = h_1, \ldots, h_m$
- Output hidden state: $s_j$
- Context: $c_j$
  - Eqs. (3)-(5) compute the attention
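The attention step listed above can be sketched as follows. This is a minimal illustration, not the paper's Eqs. (3)-(5): it scores each encoder state with a plain dot product, whereas Bahdanau et al. (2015) use a small feed-forward scorer. The function name `attention_context` and the toy vectors are assumptions made for the example.

```python
import math

def attention_context(s_prev, H):
    """Return (weights, context) for a decoder state s_prev over encoder states H."""
    # Score each encoder state against the decoder state.
    # Assumption: dot-product scoring, chosen only to keep the sketch short.
    scores = [sum(a * b for a, b in zip(s_prev, h)) for h in H]
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the attention-weighted sum of the encoder states.
    dim = len(H[0])
    context = [sum(w * h[k] for w, h in zip(weights, H)) for k in range(dim)]
    return weights, context

# Toy encoder states h_1..h_3 and a toy decoder state.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_context([0.5, 0.5], H)
```

The weights sum to one, and the third encoder state receives the largest weight because it has the highest score against the decoder state.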

Slide 7

Slide 7 text

Standard arc-standard dependency parsing (Nivre, 2004)
- The two topmost words on the stack: $w_0, w_1$
- The new word from the input: $\bar{w}$
- Three actions
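A minimal sketch of the three arc-standard actions on a stack and an input buffer. The action names SH, LR and RR follow the abbreviations used later in these slides; the direction convention (LR makes the topmost word the head of the word below it, RR makes the word below the head of the topmost word) is the usual arc-standard one and is an assumption here, as are the function name `apply_action` and the example labels.

```python
def apply_action(stack, buffer, arcs, action, label=None):
    """Apply one arc-standard action in place; arcs collects (head, label, dependent)."""
    if action == "SH":
        # Shift: move the next input word onto the stack.
        stack.append(buffer.pop(0))
    elif action == "LR":
        # Left-reduce: the topmost word becomes the head of the second topmost word.
        dependent = stack.pop(-2)
        arcs.append((stack[-1], label, dependent))
    elif action == "RR":
        # Right-reduce: the second topmost word becomes the head of the topmost word.
        dependent = stack.pop()
        arcs.append((stack[-1], label, dependent))
    else:
        raise ValueError(f"unknown action: {action}")

stack, buffer, arcs = [], ["She", "reads", "books"], []
for action, label in [("SH", None), ("SH", None), ("LR", "nsubj"),
                      ("SH", None), ("RR", "obj")]:
    apply_action(stack, buffer, arcs, action, label)
```

After the five actions the stack holds only the root word `reads`, and `arcs` contains the two labeled dependencies.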

Slide 8

Slide 8 text

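Seq2dep NMT represents the dependency tree $T$ as a shift-reduce action sequence
- $A = a_1, \ldots, a_l$, where $l = 2n$ ($n$ is the length of $Y$). Note: this $a$ is unrelated to the attention $a$.
- Each action in $A$ is one of $\{\mathrm{SH}, \mathrm{RR}(d), \mathrm{LR}(d)\}$. Note: the dependencies here are labeled.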

Slide 9

Slide 9 text

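The word RNN and the dependency-parsing RNN are run together
- A word is generated by the word RNN only when the action is shift
- Decoding ends when the word RNN has output EOS and all words on the stack have been reduced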
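The control flow described above can be sketched with the two RNNs replaced by scripted stand-ins. Everything named here (`decode`, `next_action`, `next_word`, the `<eos>` token) is hypothetical, and the choice not to push EOS onto the stack is an assumption of this sketch rather than something the slides state.

```python
def decode(next_action, next_word, max_steps=50):
    """Interleave action prediction and word generation until the parse is complete."""
    stack, words, arcs = [], [], []
    seen_eos = False
    for _ in range(max_steps):
        # Terminate once EOS has been generated and the stack is fully reduced.
        if seen_eos and len(stack) <= 1:
            break
        action = next_action(stack, words)
        if action == "SH":
            # Only a shift asks the word model for a new target word.
            word = next_word(words)
            if word == "<eos>":
                seen_eos = True  # assumption: EOS is not pushed onto the stack
            else:
                words.append(word)
                stack.append(word)
        elif action == "LR":
            # Topmost word becomes the head of the word below it.
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "RR":
            # Word below the top becomes the head of the topmost word.
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
    return words, arcs

# Scripted stand-ins for the action RNN and the word RNN.
scripted_actions = iter(["SH", "SH", "LR", "SH", "RR", "SH"])
scripted_words = iter(["She", "reads", "books", "<eos>"])
words, arcs = decode(lambda stack, words: next(scripted_actions),
                     lambda words: next(scripted_words))
```

In a real model both callables would condition on the encoder states and on each other's hidden state; the scripted iterators only make the termination rule visible.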

Slide 10

Slide 10 text

The SD-NMT model (dotted lines are copies of the previous state; gray boxes denote concatenation)

Slide 11

Slide 11 text

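Dependency context is taken into account when generating target words
- Words in left and right dependency relations are used as bigram features
- When they are unavailable, zero vectors are used as padding
- K is not used to update the hidden state s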
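One plausible reading of the bigram context with zero padding, as a sketch. It is not the paper's exact definition of K: here each slot simply concatenates the vector of a head word with the vector of its leftmost or rightmost dependent, and the function name `dependency_context`, the index-based arc format and the toy vectors are all assumptions.

```python
def dependency_context(head, arcs, vectors, dim):
    """Concatenate (head, leftmost dependent) and (head, rightmost dependent) slots."""
    dependents = [d for h, d in arcs if h == head]
    left = min((d for d in dependents if d < head), default=None)
    right = max((d for d in dependents if d > head), default=None)
    zero = [0.0] * dim
    context = []
    for dep in (left, right):
        if dep is None:
            # No dependent on this side yet: pad the whole slot with zeros.
            context += zero + zero
        else:
            context += vectors[head] + vectors[dep]
    return context

# Positions 0..2 stand for "She reads books"; only the arc reads -> She exists so far.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = dependency_context(1, [(1, 0)], vectors, dim=2)
```

The left slot holds the (reads, She) pair, while the right slot is all zeros because `reads` has no right dependent yet.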

Slide 12

Slide 12 text

SD-NMT optimizes the word-generation RNN and the dependency-parsing RNN jointly
- Objective function
- Score used at decoding time
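The formulas on this slide were images and are not reproduced in the text. As a rough illustration of what a joint score over two sequences looks like, the sketch below assumes the simplest form: the unweighted sum of the log-probabilities of the generated words and of the parsing actions. The paper's exact objective and decoding score may differ, and `joint_log_score` is a hypothetical name.

```python
import math

def joint_log_score(word_probs, action_probs):
    """Sum of word and action log-probabilities for one hypothesis."""
    word_term = sum(math.log(p) for p in word_probs)
    action_term = sum(math.log(p) for p in action_probs)
    # Assumption: both terms are weighted equally.
    return word_term + action_term

# Two hypotheses with the same words but differently confident parses.
confident = joint_log_score([0.5, 0.5], [0.9, 0.9, 0.9])
uncertain = joint_log_score([0.5, 0.5], [0.4, 0.4, 0.4])
```

Under such a score, a hypothesis whose tree the parser finds implausible is penalized even when its word probabilities are identical.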

Slide 13

Slide 13 text

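Experiments on Chinese-English and Japanese-English translation
- Data
  - Chinese-English: 2 million sentence pairs from LDC for training, NIST2003 for development, and NIST2005, NIST2006, NIST2008, NIST2012 for testing
  - Japanese-English: 1 million sentence pairs from ASPEC for training, 1,790 sentence pairs for development, and 1,812 sentence pairs for testing
- Tools
  - Dependency structures come from an arc-eager dependency parser (Zhang and Nivre, 2011)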

Slide 14

Slide 14 text

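Experimental settings for SD-NMT
- Vocabulary size: 30,000 words (both sides)
- Unknown-word handling: unk replacement and post-processing (Luong et al., 2015)
- Word embeddings and action embeddings: 512 dimensions
- RNN hidden state: 1024 dimensions
- Initialization: normal distribution (Glorot and Bengio, 2010)
- Optimization: SGD (learning rate = 1.0) and Adadelta
- Batch size: 96
- Beam size: 12 (for both word prediction and action prediction)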

Slide 15

Slide 15 text

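Both baselines are models implemented by the authors
- SMT
  - Hierarchical phrase-based model (Chiang, 2005)
  - 4-gram language model (Kneser-Ney smoothing) trained on English Gigaword and the target side of the corpus
- NMT
  - RNNsearch (Bahdanau et al., 2015)
  - Same parameters as SD-NMT
- Evaluation
  - Chinese-English: BLEU-4 (significance tested with bootstrap resampling)
  - Japanese-English: BLEU + RIBES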
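The significance test mentioned under Evaluation can be sketched as paired bootstrap resampling. This is a simplified illustration: it resamples per-sentence scores and compares their sums, whereas a real BLEU test recomputes corpus-level BLEU on each resampled test set. The function name `paired_bootstrap` and the toy scores are assumptions.

```python
import random

def paired_bootstrap(scores_a, scores_b, samples=1000, seed=0):
    """Fraction of resampled test sets on which system A fails to beat system B."""
    rng = random.Random(seed)
    n = len(scores_a)
    failures = 0
    for _ in range(samples):
        # Draw n sentence indices with replacement, shared by both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) <= sum(scores_b[i] for i in idx):
            failures += 1
    return failures / samples

# Toy per-sentence scores; system A is better on every sentence.
system_a = [0.4, 0.5, 0.6, 0.7]
system_b = [0.3, 0.4, 0.5, 0.6]
p_value = paired_bootstrap(system_a, system_b)
```

A returned fraction below 0.05 corresponds to the p<0.05 threshold used for the bold entries in the results table.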

Slide 16

Slide 16 text

SD-NMT outperforms both the SMT and NMT baselines
- SD-NMT\K is the model that does not use the target bigram dependencies
- Bold entries are statistically significant relative to the baselines (p<0.05)

Slide 17

Slide 17 text

SD-NMT also ends up with better language-model performance (though it starts out worse)

Slide 18

Slide 18 text

SD-NMT also beats the baselines on Japanese-English translation
- The Cromieres (2016) figures are copied from that prior work
- SD-NMT is a single model without ensembling

Slide 19

Slide 19 text

Increasing the beam size for actions improves translation quality

Slide 20

Slide 20 text

RNNsearch struggles with translations that involve long-distance dependencies

Slide 21

Slide 21 text

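Target-side syntactic structure is hard to handle in machine translation
- In SMT, string-to-tree models (Liu et al., 2006) and language models that use dependency structure in the target language (Shen et al., 2008) had been proposed
  → In SMT, tree-to-string was generally the stronger approach, though...
- In NMT, the tree-to-sequence attentional NMT model (Eriguchi et al., 2016) had been proposed
  → Syntactic information is harder to incorporate on the target side than on the source side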

Slide 22

Slide 22 text

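Summary and future work

Summary
- A string-to-dependency NMT model that generates words and performs arc-standard dependency parsing at the same time

Future work
- Integrate other kinds of prior knowledge (semantics) into NMT
- Apply the approach to other seq2seq tasks (document summarization)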

Slide 23

Slide 23 text

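Impressions
- The method is simple, but it is a nice string-to-dependency NMT model that takes dependency parsing into account.
- Do the word-generation RNN and the dependency-parsing RNN really cooperate properly?
  - Whether the two can be integrated may depend on the characteristics of the dependency parser.
  - How to ensemble such models is actually not obvious.
  - The output side does not seem very robust.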

Slide 24

Slide 24 text

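Q&A (1)
- Q: What do the bigram embeddings amount to?
  A: Despite the name, these are not bigrams in the sense of ordinary word N-grams; they are bigrams taken from the dependency parse. They amount to looking at word co-occurrence.
  In the experimental results, the model already improves over RNNsearch without bigram embeddings, but the bigram embeddings contribute a larger gain than SD-NMT itself does.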

Slide 25

Slide 25 text

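Q&A (2)
- Q: A model that takes dependency structure into account seems likely to help on long sentences. Does the paper discuss this?
  A: Not in particular. I think such a comparison should be made.
  However, as noted under "Impressions", I did get the sense that performance would vary with the characteristics of the dependency parser used to annotate the parallel corpus automatically.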

Slide 26

Slide 26 text

References (1)
- Wu et al. Sequence-to-Dependency Neural Machine Translation. ACL 2017.
- Eriguchi et al. Tree-to-Sequence Attentional Neural Machine Translation. ACL 2016.
- Joakim Nivre. Incrementality in Deterministic Dependency Parsing. Workshop on Incremental Parsing: Bringing Engineering and Cognition Together. 2004.
- Shen et al. A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model. ACL 2008.

Slide 27

Slide 27 text

References (2)
- Luong et al. Addressing the Rare Word Problem in Neural Machine Translation. ACL 2015.
- Zhang and Nivre. Transition-based Dependency Parsing with Rich Non-local Features. ACL 2011.
- David Chiang. A Hierarchical Phrase-based Model for Statistical Machine Translation. ACL 2005.
- Fabien Cromieres. Kyoto-NMT: A Neural Machine Translation Implementation in Chainer. COLING 2016.