
最先端NLP2018.pdf

Shimadu
July 30, 2018


Transcript

  1. Unsupervised Neural Machine Translation with Weight Sharing (ACL 2018)
     Zhen Yang, Wei Chen, Feng Wang, Bo Xu
     最先端NLP 2018/08/03  Okazaki Lab: B4 S. Shimadu
  2. Summary
     • Objective
       • Machine translation that trains without any labeled (parallel) data
       • Improve unsupervised NMT, which is weak at keeping the unique and internal characteristics of each language
     • Background
       • Monolingual corpora are easy to collect
       • Shared-latent-space assumption: a pair of sentences from two different languages can be mapped to the same latent representation
  3. Summary
     • Related research
       1. Source language → pivot language → target language [Saha et al. 2016; Cheng et al. 2017]
       2. Single encoder and a single decoder for both languages [Lample et al. 2017]
       3. Single encoder and two independent decoders [Artetxe et al. 2017b]
     • Approaches 2 and 3 both use a single shared encoder to guarantee the shared latent space.
  4. Summary
     • Proposed ideas
       • The weight-sharing constraint
       • The embedding-reinforced encoders
       • Two different GANs (local and global)
       • Transformer for the encoder and decoder
     • Experimental results
       • Compared with several baseline systems
       • Achieve significant improvements
       • Reveal that the order information within self-attention deserves further investigation
  5. Model Architecture
     • Based on the AE (auto-encoder) and GAN
     • Local discriminator: a multi-layer perceptron
     • Global discriminator: based on a CNN
  6. Model Architecture
     • Weight-sharing constraint
       • Based on the shared-latent-space assumption
       • Share the weights of the last few layers of the two encoders, which extract high-level representations of the input sentences
       • Share the first few layers of the two decoders, which decode the high-level representations (see the sketch below)
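
A minimal PyTorch-style sketch of the weight-sharing idea. Module names, layer counts, and dimensions are illustrative assumptions, not the authors' code: the two language-specific encoders own their lower layers but reuse the same module object for the top layers, so those weights are literally shared.

```python
import torch.nn as nn

D_MODEL, N_HEAD = 512, 8  # hypothetical sizes for illustration

def make_layers(n):
    """n Transformer encoder layers (the paper uses Transformer encoders/decoders)."""
    return nn.ModuleList(
        [nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=N_HEAD, batch_first=True)
         for _ in range(n)]
    )

class LanguageEncoder(nn.Module):
    """Encoder whose lower layers are private and upper layers are shared."""
    def __init__(self, private_layers, shared_layers):
        super().__init__()
        self.private = private_layers  # language-specific lower layers
        self.shared = shared_layers    # same module object for both languages

    def forward(self, x):
        for layer in self.private:
            x = layer(x)
        for layer in self.shared:      # weights shared across the two languages
            x = layer(x)
        return x

shared_top = make_layers(2)                            # last few layers, shared
enc_src = LanguageEncoder(make_layers(2), shared_top)
enc_tgt = LanguageEncoder(make_layers(2), shared_top)  # reuses shared_top's weights
```

The same pattern can be mirrored on the decoder side, with the first few decoder layers shared instead of the last encoder layers.
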
  7. Model Architecture
     • Embedding-reinforced encoder
       • Pre-trained cross-lingual embeddings that are kept fixed during training
       • The final output sequence of the encoder is computed as
         $\bar{H} = g \odot H + (1 - g) \odot E$
         E: input sequence embedding vectors
         H: initial output sequence of the encoder stack
         g: gate unit, computed as $g = \sigma(W_1 E + W_2 H + b)$
       • $W_1$, $W_2$ and $b$ are trainable parameters shared by the two encoders.
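
A minimal PyTorch sketch of this gated combination. Tensor shapes and the toy usage are assumptions made only to illustrate the formula above, not the authors' implementation.

```python
import torch

def embedding_reinforced_output(H, E, W1, W2, b):
    """Gated mix of encoder output H and fixed cross-lingual embeddings E.

    Sketch of H_bar = g * H + (1 - g) * E with g = sigmoid(W1 E + W2 H + b);
    shapes and parameter handling are illustrative assumptions.
    """
    g = torch.sigmoid(E @ W1 + H @ W2 + b)   # gate values in (0, 1)
    return g * H + (1 - g) * E               # element-wise (Hadamard) mix

# Toy usage with hypothetical sizes: batch 2, length 5, model dim 8.
d = 8
H = torch.randn(2, 5, d)                     # encoder stack output
E = torch.randn(2, 5, d)                     # fixed pre-trained embeddings
W1, W2 = torch.randn(d, d), torch.randn(d, d)
b = torch.zeros(d)
H_bar = embedding_reinforced_output(H, E, W1, W2, b)
```
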
  8. Methodology
     • Back-translation
       • Used for cross-language training
       • How to obtain a pseudo-parallel corpus: translate source/target sentences into target/source sentences with the current model
       • Use the pseudo-parallel corpus to train the model to reconstruct the original sentence from its translation (see the sketch below)
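
A minimal sketch of one back-translation round. `model.translate` and `model.train_step` are hypothetical helpers standing in for real inference and supervised-update code; this illustrates the idea, not the authors' implementation.

```python
def back_translation_step(model, src_batch, tgt_batch):
    """One illustrative round of back-translation training."""
    # Build pseudo-parallel data: translate monolingual sentences with the
    # current model (the translation itself is treated as fixed input).
    pseudo_tgt = model.translate(src_batch, direction="src->tgt")
    pseudo_src = model.translate(tgt_batch, direction="tgt->src")

    # Train the model to reconstruct the original sentence from its translation.
    loss_src = model.train_step(inputs=pseudo_tgt, targets=src_batch,
                                direction="tgt->src")
    loss_tgt = model.train_step(inputs=pseudo_src, targets=tgt_batch,
                                direction="src->tgt")
    return loss_src + loss_tgt
```
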
  9. Methodology
     • Local GAN
       • To further enforce the shared latent space, train a discriminative neural network
       • It takes the output of the encoder and produces a binary prediction of the input sentence's language
       • The local discriminator is trained to predict the exact language
       • The encoders are trained to fool the local discriminator (see the sketch below)
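
A sketch of the adversarial objective on encoder outputs, assuming an MLP discriminator (as on slide 5). The pooling, label convention, and sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

D_MODEL = 512  # hypothetical encoder output size

# Local discriminator: an MLP that predicts which language a pooled
# encoder representation came from.
local_disc = nn.Sequential(
    nn.Linear(D_MODEL, 256), nn.ReLU(),
    nn.Linear(256, 1),        # logit: 1 = source language, 0 = target language
)
bce = nn.BCEWithLogitsLoss()

def local_gan_losses(z_src, z_tgt):
    """Adversarial losses on mean-pooled encoder outputs of shape (batch, len, D_MODEL)."""
    h_src, h_tgt = z_src.mean(dim=1), z_tgt.mean(dim=1)
    ones = torch.ones(h_src.size(0), 1)
    zeros = torch.zeros(h_tgt.size(0), 1)

    # Discriminator: predict the true language of each representation.
    d_loss = (bce(local_disc(h_src.detach()), ones)
              + bce(local_disc(h_tgt.detach()), zeros))

    # Encoders: fool the discriminator so the two languages become indistinguishable.
    g_loss = bce(local_disc(h_src), zeros) + bce(local_disc(h_tgt), ones)
    return d_loss, g_loss
```
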
  10. Methodology
      • Global GAN
        • Fine-tunes the whole model
        • Used to update all parameters of the proposed model
  11. Methodology
      • Training procedure (see the sketch below)
        1. Train with the AE, back-translation, and the local GANs
        2. Stop when no further improvement is achieved on the development set
        3. Fine-tune the proposed model with the global GANs
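
A sketch of this two-phase schedule. The patience value is an assumption, and `model.step_ae_bt_local_gan`, `model.step_global_gan`, and `evaluate` are hypothetical helpers standing in for the real update and evaluation code.

```python
def train(model, data, dev_set, evaluate, patience=3):
    """Illustrative two-phase training schedule."""
    best_bleu, bad_epochs = 0.0, 0

    # Phase 1: auto-encoding + back-translation + local GANs,
    # until the development set stops improving.
    while bad_epochs < patience:
        for batch in data:
            model.step_ae_bt_local_gan(batch)
        bleu = evaluate(model, dev_set)
        if bleu > best_bleu:
            best_bleu, bad_epochs = bleu, 0
        else:
            bad_epochs += 1

    # Phase 2: fine-tune all parameters with the global GANs.
    for batch in data:
        model.step_global_gan(batch)
```
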
  12. Evaluation
      • Evaluated by computing the BLEU score
      • Two-step (round-trip) translation process
        • Translate the source sentences into the target language
        • Translate the resulting sentences back into the source language
      • Performance is finally averaged over the two directions (see the sketch below)
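
A small sketch of this round-trip evaluation, assuming a hypothetical `translate(sentences, direction)` helper that returns a list of strings, and the sacrebleu package for scoring.

```python
import sacrebleu

def round_trip_bleu(translate, src_sentences, tgt_sentences):
    """Average BLEU over the two round-trip directions."""
    # src -> tgt -> src, scored against the original source sentences.
    back_src = translate(translate(src_sentences, "src->tgt"), "tgt->src")
    bleu_src = sacrebleu.corpus_bleu(back_src, [src_sentences]).score

    # tgt -> src -> tgt, scored against the original target sentences.
    back_tgt = translate(translate(tgt_sentences, "tgt->src"), "src->tgt")
    bleu_tgt = sacrebleu.corpus_bleu(back_tgt, [tgt_sentences]).score

    return (bleu_src + bleu_tgt) / 2
```
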
  13. Experiment Result (1)
      • Vary the number of weight-sharing layers in the AEs
      • Verifies that the shared encoder is detrimental to performance, especially for distant language pairs
  14. Experiment Result (2)
      • Even though it is trained only with monolingual data, the model effectively learns to use the context information and the internal structure of each language
  15. Experiment Result (3)
      • The most critical component is the weight-sharing constraint
      • The embedding-reinforced encoder brings some improvement on all of the translation tasks
  16. Experiment Result (4)
      • Removing the directional self-attention → −0.3 BLEU
      • The temporal order information deserves more effort to investigate
      • The GANs significantly improve the performance
  17. Conclusion
      • They proposed
        • The weight-sharing constraint in unsupervised NMT
        • The embedding-reinforced encoders
        • The local GAN and the global GAN
      • The model achieves significant improvements
      • Reveals that the shared encoder is really a bottleneck
      • Future work
        • Investigate how to utilize the monolingual data more effectively
        • Explore how to reinforce the temporal order information