
最先端NLP2018.pdf

Shimadu
July 30, 2018


Transcript

  1. Unsupervised Neural Machine Translation with Weight Sharing (ACL 2018)
     Zhen Yang, Wei Chen, Feng Wang, Bo Xu
     最先端NLP 2018/08/03  Okazaki Lab: B4 S. Shimadu
  2. Summary
     • Objective
       • Machine translation that trains without any labeled (parallel) data
       • Improve unsupervised NMT, which is weak at keeping the unique and internal characteristics of each language
     • Background
       • Monolingual corpora are easy to collect
       • Shared-latent-space assumption: a pair of sentences from two different languages can be mapped to the same latent representation
  3. Summary
     • Related research
       1. Source language → pivot language → target language [Saha et al. 2016; Cheng et al. 2017]
       2. Single encoder and a single decoder for both languages [Lample et al. 2017]
       3. Single encoder and two independent decoders [Artetxe et al. 2017b]
     • Approaches 2 and 3 both use a single shared encoder to guarantee the shared latent space.
  4. Summary
     • Proposed ideas
       • The weight-sharing constraint
       • The embedding-reinforced encoders
       • Two different GANs (local and global)
       • Transformer for the encoder and decoder
     • Experimental results
       • Compared with several baseline systems
       • Achieve significant improvements
       • Reveal that the order information within self-attention deserves further investigation
  5. Model Architecture
     • Based on the AE (auto-encoder) and GAN
     • Local discriminator: a multi-layer perceptron
     • Global discriminator: based on a CNN
  6. Model Architecture
     • Weight-sharing constraint
       • Based on the shared-latent-space assumption
       • Share the weights of the last few layers of the two encoders, which extract high-level representations of the input sentences
       • Share the first few layers of the two decoders, which decode the high-level representations (see the sketch below)
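
A minimal PyTorch-style sketch of the weight-sharing idea. Module names, layer counts, and dimensions are illustrative assumptions, not the authors' code: the two language-specific encoders own their lower layers but reuse the same module object for the top layers, so those weights are literally shared.

```python
import torch.nn as nn

D_MODEL, N_HEAD = 512, 8  # hypothetical sizes for illustration

def make_layers(n):
    """n Transformer encoder layers (the paper uses Transformer encoders/decoders)."""
    return nn.ModuleList(
        [nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=N_HEAD, batch_first=True)
         for _ in range(n)]
    )

class LanguageEncoder(nn.Module):
    """Encoder whose lower layers are private and upper layers are shared."""
    def __init__(self, private_layers, shared_layers):
        super().__init__()
        self.private = private_layers  # language-specific lower layers
        self.shared = shared_layers    # same module object for both languages

    def forward(self, x):
        for layer in self.private:
            x = layer(x)
        for layer in self.shared:      # weights shared across the two languages
            x = layer(x)
        return x

shared_top = make_layers(2)                            # last few layers, shared
enc_src = LanguageEncoder(make_layers(2), shared_top)
enc_tgt = LanguageEncoder(make_layers(2), shared_top)  # reuses shared_top's weights
```

The same pattern can be mirrored on the decoder side, with the first few decoder layers shared instead of the last encoder layers.
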
  7. Model Architecture
     • Embedding-reinforced encoder
       • Pre-trained cross-lingual embeddings that are kept fixed during training
       • The final output sequence of the encoder is computed as
         $\bar{H} = g \odot H + (1 - g) \odot E$
         E: input sequence embedding vectors
         H: initial output sequence of the encoder stack
         g: gate unit, computed as $g = \sigma(W_1 E + W_2 H + b)$
       • $W_1$, $W_2$ and $b$ are trainable parameters shared by the two encoders.
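
A minimal PyTorch sketch of this gated combination. Tensor shapes and the toy usage are assumptions made only to illustrate the formula above, not the authors' implementation.

```python
import torch

def embedding_reinforced_output(H, E, W1, W2, b):
    """Gated mix of encoder output H and fixed cross-lingual embeddings E.

    Sketch of H_bar = g * H + (1 - g) * E with g = sigmoid(W1 E + W2 H + b);
    shapes and parameter handling are illustrative assumptions.
    """
    g = torch.sigmoid(E @ W1 + H @ W2 + b)   # gate values in (0, 1)
    return g * H + (1 - g) * E               # element-wise (Hadamard) mix

# Toy usage with hypothetical sizes: batch 2, length 5, model dim 8.
d = 8
H = torch.randn(2, 5, d)                     # encoder stack output
E = torch.randn(2, 5, d)                     # fixed pre-trained embeddings
W1, W2 = torch.randn(d, d), torch.randn(d, d)
b = torch.zeros(d)
H_bar = embedding_reinforced_output(H, E, W1, W2, b)
```
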
  8. Methodology
     • Back-translation
       • Used for cross-language training
       • How to obtain a pseudo-parallel corpus: translate source/target sentences into target/source sentences with the current model
       • Use the pseudo-parallel corpus to train the model to reconstruct the original sentence from its translation (see the sketch below)
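
A minimal sketch of one back-translation round. `model.translate` and `model.train_step` are hypothetical helpers standing in for real inference and supervised-update code; this illustrates the idea, not the authors' implementation.

```python
def back_translation_step(model, src_batch, tgt_batch):
    """One illustrative round of back-translation training."""
    # Build pseudo-parallel data: translate monolingual sentences with the
    # current model (the translation itself is treated as fixed input).
    pseudo_tgt = model.translate(src_batch, direction="src->tgt")
    pseudo_src = model.translate(tgt_batch, direction="tgt->src")

    # Train the model to reconstruct the original sentence from its translation.
    loss_src = model.train_step(inputs=pseudo_tgt, targets=src_batch,
                                direction="tgt->src")
    loss_tgt = model.train_step(inputs=pseudo_src, targets=tgt_batch,
                                direction="src->tgt")
    return loss_src + loss_tgt
```
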
  9. Methodology
     • Local GAN
       • To further enforce the shared latent space, train a discriminative neural network
       • It takes the output of the encoder and produces a binary prediction of the input sentence's language
       • The local discriminator is trained to predict the exact language
       • The encoders are trained to fool the local discriminator (see the sketch below)
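
A sketch of the adversarial objective on encoder outputs, assuming an MLP discriminator (as on slide 5). The pooling, label convention, and sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

D_MODEL = 512  # hypothetical encoder output size

# Local discriminator: an MLP that predicts which language a pooled
# encoder representation came from.
local_disc = nn.Sequential(
    nn.Linear(D_MODEL, 256), nn.ReLU(),
    nn.Linear(256, 1),        # logit: 1 = source language, 0 = target language
)
bce = nn.BCEWithLogitsLoss()

def local_gan_losses(z_src, z_tgt):
    """Adversarial losses on mean-pooled encoder outputs of shape (batch, len, D_MODEL)."""
    h_src, h_tgt = z_src.mean(dim=1), z_tgt.mean(dim=1)
    ones = torch.ones(h_src.size(0), 1)
    zeros = torch.zeros(h_tgt.size(0), 1)

    # Discriminator: predict the true language of each representation.
    d_loss = (bce(local_disc(h_src.detach()), ones)
              + bce(local_disc(h_tgt.detach()), zeros))

    # Encoders: fool the discriminator so the two languages become indistinguishable.
    g_loss = bce(local_disc(h_src), zeros) + bce(local_disc(h_tgt), ones)
    return d_loss, g_loss
```
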
  10. Methodology
      • Global GAN
        • Fine-tunes the whole model
        • Used to update all parameters of the proposed model
  11. Methodology
      • Training procedure (see the sketch below)
        1. Train with the AE, back-translation, and the local GANs
        2. Stop when no further improvement is achieved on the development set
        3. Fine-tune the proposed model with the global GANs
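
A sketch of this two-phase schedule. The patience value is an assumption, and `model.step_ae_bt_local_gan`, `model.step_global_gan`, and `evaluate` are hypothetical helpers standing in for the real update and evaluation code.

```python
def train(model, data, dev_set, evaluate, patience=3):
    """Illustrative two-phase training schedule."""
    best_bleu, bad_epochs = 0.0, 0

    # Phase 1: auto-encoding + back-translation + local GANs,
    # until the development set stops improving.
    while bad_epochs < patience:
        for batch in data:
            model.step_ae_bt_local_gan(batch)
        bleu = evaluate(model, dev_set)
        if bleu > best_bleu:
            best_bleu, bad_epochs = bleu, 0
        else:
            bad_epochs += 1

    # Phase 2: fine-tune all parameters with the global GANs.
    for batch in data:
        model.step_global_gan(batch)
```
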
  12. Evaluation
      • Evaluated by computing the BLEU score
      • Two-step (round-trip) translation process
        • Translate the source sentences into the target language
        • Translate the resulting sentences back into the source language
      • Performance is finally averaged over the two directions (see the sketch below)
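
A small sketch of this round-trip evaluation, assuming a hypothetical `translate(sentences, direction)` helper that returns a list of strings, and the sacrebleu package for scoring.

```python
import sacrebleu

def round_trip_bleu(translate, src_sentences, tgt_sentences):
    """Average BLEU over the two round-trip directions."""
    # src -> tgt -> src, scored against the original source sentences.
    back_src = translate(translate(src_sentences, "src->tgt"), "tgt->src")
    bleu_src = sacrebleu.corpus_bleu(back_src, [src_sentences]).score

    # tgt -> src -> tgt, scored against the original target sentences.
    back_tgt = translate(translate(tgt_sentences, "tgt->src"), "src->tgt")
    bleu_tgt = sacrebleu.corpus_bleu(back_tgt, [tgt_sentences]).score

    return (bleu_src + bleu_tgt) / 2
```
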
  13. Experiment Result (1)
      • Vary the number of weight-sharing layers in the AEs
      • Verifies that the shared encoder is detrimental to performance, especially for distant language pairs
  14. Experiment Result (2)
      • Even though it is trained only with monolingual data, the model effectively learns to use the context information and the internal structure of each language
  15. Experiment Result (3)
      • The most critical component is the weight-sharing constraint
      • The embedding-reinforced encoder brings some improvement on all of the translation tasks
  16. Experiment Result (4)
      • Removing the directional self-attention → −0.3 BLEU
      • The temporal order information deserves more effort to investigate
      • The GANs significantly improve the performance
  17. Conclusion
      • They proposed
        • The weight-sharing constraint in unsupervised NMT
        • The embedding-reinforced encoders
        • The local GAN and the global GAN
      • The model achieves significant improvements
      • Reveals that the shared encoder is really a bottleneck
      • Future work
        • Investigate how to utilize the monolingual data more effectively
        • Explore how to reinforce the temporal order information