最先端NLP2018.pdf

Shimadu
July 30, 2018


Transcript

  1. Unsupervised Neural Machine Translation with Weight Sharing (ACL 2018). Zhen Yang, Wei Chen, Feng Wang, Bo Xu. 最先端NLP 2018/08/03, Okazaki Lab: B4 S. Shimadu
  2. Summary
     • Objective
       • Machine translation trained without any labeled data
       • Improving unsupervised NMT, which is weak at preserving the unique internal characteristics of each language
     • Background
       • Monolingual corpora are easy to collect
       • Shared-latent-space assumption: a pair of sentences from two different languages can be mapped to the same latent representation
  3. Summary
     • Related research
       1. Source language → pivot language → target language [Saha et al., 2016; Cheng et al., 2017]
       2. A single encoder and a single decoder for both languages [Lample et al., 2017]
       3. A single encoder and two independent decoders [Artetxe et al., 2017b]
     • Approaches 2 and 3 both use a single shared encoder to guarantee the shared latent space.
  4. Summary
     • Proposed idea
       • The weight-sharing constraint
       • The embedding-reinforced encoders
       • Two different GANs (local and global)
       • Transformer for the encoder and decoder
     • Experimental results
       • Compared with several baseline systems, the proposed model achieves significant improvements
       • Reveal that the order information within self-attention deserves further investigation
  5. Model Architecture
     • Based on autoencoders (AEs) and GANs
     • Local discriminator: a multi-layer perceptron (see the sketch after this slide)
     • Global discriminator: based on a CNN
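The slide names the two discriminators only by architecture. A minimal PyTorch-style sketch of what such modules could look like; the layer counts, hidden sizes, and pooling choices below are assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn

class LocalDiscriminator(nn.Module):
    """MLP that classifies which language an encoded sentence came from."""
    def __init__(self, d_model=512, d_hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.LeakyReLU(0.2),
            nn.Linear(d_hidden, d_hidden), nn.LeakyReLU(0.2),
            nn.Linear(d_hidden, 1),            # single logit: language A vs. language B
        )

    def forward(self, enc_out):                # enc_out: (batch, seq, d_model)
        return self.net(enc_out.mean(dim=1))   # pool over time, then classify

class GlobalDiscriminator(nn.Module):
    """CNN over a full output representation (one filter width shown for brevity)."""
    def __init__(self, d_model=512, n_filters=64, width=3):
        super().__init__()
        self.conv = nn.Conv1d(d_model, n_filters, kernel_size=width, padding=1)
        self.out = nn.Linear(n_filters, 1)

    def forward(self, dec_out):                     # dec_out: (batch, seq, d_model)
        h = self.conv(dec_out.transpose(1, 2))      # (batch, n_filters, seq)
        h = torch.relu(h).max(dim=2).values         # max-pool over time
        return self.out(h)                          # real / generated logit
```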
  6. Model Architecture
     • Weight-sharing constraint (sketch below)
       • Based on the shared-latent-space assumption
       • Share the weights of the last few layers of the two encoders, which extract high-level representations of the input sentences
       • Share the first few layers of the two decoders, which decode those high-level representations
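One way to realize this constraint is to let both language-specific encoders hold references to the same top layers. A minimal PyTorch sketch under that assumption; the class names, layer counts, and dimensions are hypothetical, not the authors' configuration:

```python
import torch.nn as nn

def make_layer(d_model=512, n_heads=8):
    # One Transformer encoder layer; batch_first expects (batch, seq, d_model) inputs.
    return nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

n_private, n_shared = 3, 1                       # e.g. 4 layers total, last 1 shared
shared_top = nn.ModuleList(make_layer() for _ in range(n_shared))

class SharedTopEncoder(nn.Module):
    """Language-specific bottom layers plus shared top layers (same module objects)."""
    def __init__(self, shared_layers):
        super().__init__()
        self.private = nn.ModuleList(make_layer() for _ in range(n_private))
        self.shared = shared_layers              # identical objects in both encoders

    def forward(self, x):                        # x: (batch, seq, d_model)
        for layer in list(self.private) + list(self.shared):
            x = layer(x)
        return x

enc_src = SharedTopEncoder(shared_top)           # source-language encoder
enc_tgt = SharedTopEncoder(shared_top)           # target-language encoder, sharing the top layer
```

The decoders would be built analogously, sharing their first few layers instead of their last.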
  7. Model Architecture
     • Embedding-reinforced encoder (sketch below)
       • Pre-trained cross-lingual embeddings that are kept fixed during training
       • The final output sequence of the encoder is computed as H_r = g ⊙ H + (1 − g) ⊙ E
         • E: embedding vectors of the input sequence
         • H: initial output sequence of the encoder stack
         • g: gate unit, computed as g = σ(W₁E + W₂H + b)
       • W₁, W₂ and b are trainable parameters and are shared by the two encoders.
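The gated combination above translates almost directly into code. A minimal sketch, with an assumed module name and dimension; sharing the gate parameters between the two encoders corresponds to reusing one instance of this module for both:

```python
import torch
import torch.nn as nn

class EmbeddingReinforcedOutput(nn.Module):
    """Gated mix of the encoder output H and the fixed cross-lingual embeddings E:
       H_r = g * H + (1 - g) * E,  with  g = sigmoid(W1 E + W2 H + b)."""
    def __init__(self, d_model=512):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_model, bias=False)   # W1
        self.w2 = nn.Linear(d_model, d_model, bias=True)    # W2 and b

    def forward(self, E, H):                  # both (batch, seq, d_model)
        g = torch.sigmoid(self.w1(E) + self.w2(H))
        return g * H + (1 - g) * E
```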
  8. Methodology
     • Back-translation (sketch below)
       • Used for cross-language training
       • How to get the pseudo-parallel corpus: translate source / target sentences into target / source sentences
       • Use the pseudo-parallel corpus to train the model to reconstruct the original sentence from its translation
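A minimal sketch of how a pseudo-parallel batch could be assembled by back-translation; `model_s2t`, `model_t2s` and their `translate()` method are hypothetical wrappers around the two translation directions, not the authors' code:

```python
import torch

def make_pseudo_parallel(model_s2t, model_t2s, mono_src_batch, mono_tgt_batch):
    """Build pseudo-parallel pairs from two monolingual batches via back-translation."""
    with torch.no_grad():                                   # translations are treated as fixed data
        pseudo_tgt = model_s2t.translate(mono_src_batch)    # source sentence -> pseudo target
        pseudo_src = model_t2s.translate(mono_tgt_batch)    # target sentence -> pseudo source
    # The model is then trained to reconstruct the original sentence from its translation:
    #   (pseudo_tgt -> mono_src_batch) for the target-to-source direction,
    #   (pseudo_src -> mono_tgt_batch) for the source-to-target direction.
    return [(pseudo_tgt, mono_src_batch), (pseudo_src, mono_tgt_batch)]
```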
  9. Methodology
     • Local GAN (loss sketch below)
       • To further enforce the shared latent space, train a discriminative neural network
       • It takes the output of the encoder and produces a binary prediction about the language of the input sentence
       • The local discriminator is trained to predict the exact language
       • The encoders are trained to fool the local discriminator
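A minimal sketch of the two adversarial objectives described above, reusing the LocalDiscriminator sketch from slide 5; the label convention (1 = source language, 0 = target language) is an assumption:

```python
import torch.nn.functional as F

def local_gan_losses(disc, enc_out_src, enc_out_tgt):
    """Adversarial losses of the local GAN (sketch); disc is a LocalDiscriminator."""
    ones = enc_out_src.new_ones(enc_out_src.size(0), 1)
    zeros = enc_out_tgt.new_zeros(enc_out_tgt.size(0), 1)
    # Discriminator step: predict which language each encoded sentence came from
    # (encoder outputs are detached so only the discriminator is updated here).
    d_loss = (F.binary_cross_entropy_with_logits(disc(enc_out_src.detach()), ones) +
              F.binary_cross_entropy_with_logits(disc(enc_out_tgt.detach()), zeros))
    # Encoder step: fool the discriminator so both languages map into one latent space.
    g_loss = (F.binary_cross_entropy_with_logits(disc(enc_out_src), zeros) +
              F.binary_cross_entropy_with_logits(disc(enc_out_tgt), ones))
    return d_loss, g_loss
```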
  10. Methodology
     • Global GAN
       • Fine-tunes the whole model
       • Used to update all parameters of the proposed model
  11. Methodology
     • Training (schedule sketch below)
       1. Train with the AEs, back-translation and the local GANs
       2. Continue until no improvement is achieved on the development set
       3. Fine-tune the proposed model with the global GANs
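A minimal sketch of this two-phase schedule; the step callables, the patience value, and the fine-tuning step count are hypothetical stand-ins for details the slide does not give:

```python
def train_model(model, dev_set, phase1_step, global_gan_step, eval_bleu,
                patience=4, finetune_steps=10000):
    """Two-phase training schedule (sketch). phase1_step, global_gan_step and
    eval_bleu are caller-supplied callables."""
    best_bleu, stale = float("-inf"), 0
    # Phase 1: AEs + back-translation + local GANs, until dev BLEU stops improving.
    while stale < patience:
        phase1_step(model)
        bleu = eval_bleu(model, dev_set)
        best_bleu, stale = (bleu, 0) if bleu > best_bleu else (best_bleu, stale + 1)
    # Phase 2: fine-tune all parameters of the model with the global GANs.
    for _ in range(finetune_steps):
        global_gan_step(model)
```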
  12. Evaluation
     • Evaluated by computing the BLEU score (sketch below)
     • Two-step translation process
       • Translate the source sentences into the target language
       • Translate the resulting sentences back into the source language
     • Performance is finally averaged over the two directions
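A minimal sketch of the two-step (round-trip) evaluation described above, assuming the sacrebleu package for BLEU scoring and the same hypothetical `translate()` wrappers as in the back-translation sketch:

```python
import sacrebleu   # assumed BLEU implementation, not specified on the slide

def round_trip_bleu(model_s2t, model_t2s, src_sents, tgt_sents):
    """Translate each test side through the other language and back, score against the originals."""
    # source -> target -> source
    back_src = model_t2s.translate(model_s2t.translate(src_sents))
    bleu_src = sacrebleu.corpus_bleu(back_src, [src_sents]).score
    # target -> source -> target
    back_tgt = model_s2t.translate(model_t2s.translate(tgt_sents))
    bleu_tgt = sacrebleu.corpus_bleu(back_tgt, [tgt_sents]).score
    return (bleu_src + bleu_tgt) / 2             # average over the two directions
```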
  13. Experiment Result (1)
     • Vary the number of weight-sharing layers in the AEs
     • Verifies that the shared encoder is detrimental to performance, especially for distant language pairs
  14. Experiment Result (2)
     • Even when trained only with monolingual data, the model effectively learns to use the context information and the internal structure of each language
  15. Experiment Result (3)
     • The most critical component is the weight-sharing constraint
     • The embedding-reinforced encoder brings some improvement on all of the translation tasks
  16. Experiment Result (4)
     • Removing the directional self-attention → −0.3 BLEU
     • The temporal order information deserves further investigation
     • The GANs significantly improve the performance
  17. Conclusion
     • They proposed
       • The weight-sharing constraint in unsupervised NMT
       • The embedding-reinforced encoders
       • The local GAN and the global GAN
     • The model achieves significant improvements and reveals that the shared encoder is really a bottleneck
     • Future work
       • Investigate how to utilize monolingual data more effectively
       • Explore how to reinforce the temporal order information