any labeled data
• Improvement of unsupervised NMT
• Weak in keeping the unique and internal characteristics of each language
• Background
• Monolingual corpora are easy to collect
• Shared-latent space assumption
• Assumption that a pair of sentences from two different languages can be mapped to the same latent representation
2018/8/2 NLP 2
→ target language [Saha et al., 2016; Cheng et al., 2017]
2. Single encoder and a single decoder for both languages [Lample et al., 2017]
3. Single encoder and two independent decoders [Artetxe et al., 2017b]
• Approaches 2 and 3 both use a single shared encoder to guarantee the shared latent space.
embedding-reinforced encoders
• Two different GANs
• Transformer for the encoder and decoder
• Experiment results
• Compared with several baseline systems
• Achieves significant improvements
• Shows that the order information within self-attention deserves further investigation
space assumption
• Share the weights of the last few layers of the encoders
• These layers extract high-level representations of the input sentences
• Share the weights of the first few layers of the decoders
• These layers decode the high-level representations
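The weight-sharing scheme on this slide can be illustrated with a minimal, dependency-free sketch. The `Layer` class, the layer counts, and the scalar "weight" are hypothetical stand-ins for real Transformer layers, chosen only to show that the top encoder layers are the same objects in both encoders while the lower layers stay private. The decoders would mirror this, with the first few layers shared instead of the last few.

```python
class Layer:
    """Hypothetical stand-in for one Transformer layer with a single scalar weight."""

    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x):
        return [self.weight * v for v in x]


def build_encoders(n_layers=4, n_shared=1):
    """Build two encoder stacks whose last n_shared layers are the same objects."""
    shared = [Layer(1.0) for _ in range(n_shared)]
    enc_src = [Layer(1.0) for _ in range(n_layers - n_shared)] + shared
    enc_tgt = [Layer(1.0) for _ in range(n_layers - n_shared)] + shared
    return enc_src, enc_tgt


def run(stack, x):
    """Apply the layers of a stack in order."""
    for layer in stack:
        x = layer(x)
    return x


enc_src, enc_tgt = build_encoders()
enc_src[-1].weight = 2.0  # an update to the shared top layer is seen by both encoders
enc_src[0].weight = 3.0   # an update to a private bottom layer stays language-specific

out_src = run(enc_src, [1.0])
out_tgt = run(enc_tgt, [1.0])
```

After these two updates the target encoder sees the change to the shared layer but not the change to the source encoder's private layer, which is the behaviour the slide describes.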
are kept fixed during training
• The final output sequence of the encoder is computed as $H_r = g \odot H + (1 - g) \odot E$
• $E$: input sequence embedding vectors
• $H$: initial output sequence of the encoder stack
• $g$: gate unit, computed as $g = \sigma(W_1 E + W_2 H + b)$
• $W_1$, $W_2$ and $b$ are trainable parameters, shared by the two encoders.
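As a worked illustration of the gated combination $H_r = g \odot H + (1 - g) \odot E$ with $g = \sigma(W_1 E + W_2 H + b)$, the sketch below applies it position by position in plain Python. The helper names (`sigmoid`, `matvec`, `reinforce`) and the toy values are assumptions for illustration, not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a d x d matrix (list of rows) by a d-dimensional vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def reinforce(E, H, W1, W2, b):
    """Return H_r = g * H + (1 - g) * E for each token position.

    E and H are lists of d-dimensional vectors, one per token.
    """
    out = []
    for e, h in zip(E, H):
        a1 = matvec(W1, e)
        a2 = matvec(W2, h)
        g = [sigmoid(x + y + z) for x, y, z in zip(a1, a2, b)]
        out.append([gi * hi + (1.0 - gi) * ei for gi, hi, ei in zip(g, h, e)])
    return out


# Toy check: zero weights and zero bias give g = 0.5 everywhere,
# so H_r is the element-wise average of E and H.
d = 2
zeros = [[0.0] * d for _ in range(d)]
E = [[1.0, 2.0], [3.0, 4.0]]
H = [[3.0, 0.0], [1.0, 0.0]]
H_r = reinforce(E, H, zeros, zeros, [0.0] * d)
```

With a large positive bias the gate saturates towards 1 and the output approaches $H$. With a large negative bias it approaches the embeddings $E$.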
• How to get the pseudo-parallel corpus
• Translate each source / target sentence → target / source sentence
• Utilize the pseudo-parallel corpus to reconstruct the original sentence from its translation
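The construction of the pseudo-parallel corpus can be sketched as follows. The word-by-word dictionary translator `toy_en_to_fr` is a hypothetical stand-in for the current translation model, and `make_pseudo_parallel` is an illustrative name. Each pair holds the machine translation as input and the original sentence as the reconstruction target.

```python
def make_pseudo_parallel(monolingual, translate):
    """Pair each machine translation with the original sentence it came from."""
    return [(translate(sentence), sentence) for sentence in monolingual]


# Hypothetical word-by-word stand-in for the current source -> target model.
EN_FR = {"the": "le", "cat": "chat", "sleeps": "dort"}


def toy_en_to_fr(sentence):
    return " ".join(EN_FR.get(word, word) for word in sentence.split())


mono_en = ["the cat sleeps", "the cat"]

# (translation, original) pairs: the reverse direction is trained to
# reconstruct the original sentence from its translation.
pairs = make_pseudo_parallel(mono_en, toy_en_to_fr)
```

The same function applied to target-language monolingual data with a target-to-source translator would give the pairs for the other direction.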
space, train a discriminative neural network
• Takes the output of the encoder and produces a binary prediction about the language of the input sentence
• The local discriminator is trained to predict the exact language
• The encoders are trained to fool the local discriminator
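The two opposing objectives on this slide can be sketched with a tiny logistic discriminator. Mean pooling over token vectors, the single linear layer, and the flipped-label encoder loss are simplifying assumptions made here for illustration. The slide only states that the discriminator predicts the language and that the encoders try to fool it.

```python
import math


def discriminator(enc_out, w, b):
    """Probability that the encoded sentence comes from the source language.

    enc_out is a list of token vectors; they are mean-pooled (an assumption)
    and passed through one linear layer followed by a sigmoid.
    """
    pooled = [sum(col) / len(enc_out) for col in zip(*enc_out)]
    z = sum(wi * pi for wi, pi in zip(w, pooled)) + b
    return 1.0 / (1.0 + math.exp(-z))


def bce(p, label):
    """Binary cross-entropy for one prediction p and a 0/1 label."""
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


enc_out = [[1.0, 0.0], [3.0, 0.0]]  # two token vectors from the source encoder
p = discriminator(enc_out, w=[1.0, 0.0], b=0.0)

d_loss = bce(p, 1)    # discriminator: predict the true language (label 1 = source)
enc_loss = bce(p, 0)  # encoder: push the prediction towards the wrong label
```

In this toy case the discriminator is already fairly confident, so its own loss is small while the encoder's loss is large, which is the pressure that drives the encoders towards language-independent outputs.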
translation process
• Translate the source sentences to the target language
• Translate the resulting sentences back to the source language
• Performance is finally averaged over the two directions
NMT
• The embedding-reinforced encoders
• Local GAN and global GAN
• Achieves significant improvement
• Reveals that the shared encoder is really a bottleneck
• Future work
• Investigate how to utilize the monolingual data more effectively
• Explore how to reinforce the temporal order information