Literature-review-01

MARUYAMA
March 26, 2018


Transcript

  1. Literature review: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
    A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 670–680, 2017.
    Takumi Maruyama, Nagaoka University of Technology
  2. Introduction
    ➢ This paper proposes universal sentence representations trained on the supervised data of the Stanford Natural Language Inference dataset
    ➢ The sentence embeddings trained on this supervised data were tested on 12 different transfer tasks
    ➢ A BiLSTM network with max pooling is the best of the current universal sentence encoding methods
  3. The Natural Language Inference (NLI) task
    ➢ Stanford Natural Language Inference (SNLI) dataset:
      • Consists of 570k human-generated English sentence pairs
      • Each pair is manually labeled with one of three categories: entailment, contradiction, or neutral
      https://nlp.stanford.edu/projects/snli
  4. The Natural Language Inference (NLI) task
    ➢ The model separates the encoding
    [Figure: model diagram, labeled "7 different architectures"]
  5. Sentence encoder architectures
    ➢ 7 different architectures
      • Long Short-Term Memory (LSTM)
      • Gated Recurrent Units (GRU)
      • Concatenation of last hidden states of forward and backward GRU
      • Bi-directional LSTMs with mean pooling
      • Bi-directional LSTMs with max pooling
      • Self-attentive network
      • Hierarchical convolutional networks
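
    The mean-pooling and max-pooling variants in the list above differ only in how the per-token BiLSTM hidden vectors are reduced to one fixed-size sentence vector. A minimal sketch in plain Python, assuming the hidden states have already been computed; the function names and the toy values are illustrative and do not come from the paper:

```python
def max_pool(hidden_states):
    """Element-wise maximum over time steps (one value per hidden dimension)."""
    return [max(dim) for dim in zip(*hidden_states)]


def mean_pool(hidden_states):
    """Element-wise average over time steps."""
    n = len(hidden_states)
    return [sum(dim) / n for dim in zip(*hidden_states)]


# Hypothetical hidden states for the 4 tokens of "The movie was great",
# 3 dimensions each (a real BiLSTM would produce far larger vectors).
hidden = [
    [0.1, -0.4, 0.3],
    [0.5, 0.2, -0.1],
    [-0.2, 0.0, 0.4],
    [0.9, -0.3, 0.2],
]

sentence_max = max_pool(hidden)    # [0.9, 0.2, 0.4]
sentence_mean = mean_pool(hidden)  # one averaged value per dimension
```

    Both functions return a vector whose length equals the hidden size, regardless of how many tokens the sentence has.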
  7. Sentence encoder architectures
    ➢ LSTM, GRU: the last hidden vector is used as the sentence representation
    [Figure: LSTM (GRU) cells unrolled over the tokens "The movie was great"; the last hidden vector is the sentence representation]
  8. Sentence encoder architectures
    ➢ Concatenation of last hidden states of forward and backward GRU
    [Figure: forward and backward GRU cells unrolled over the tokens "The movie was great"; their last hidden states are concatenated into the sentence representation]
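
    The two recurrent variants above (last hidden vector, and concatenated last hidden states of a forward and a backward pass) can be sketched as follows. This is a toy illustration only: `run_rnn` is a simplified element-wise tanh cell standing in for a real LSTM or GRU, and the weights and token vectors are made-up values, not anything from the paper.

```python
import math


def run_rnn(inputs, w_x=0.5, w_h=0.8):
    """Toy element-wise recurrent cell (a stand-in for an LSTM/GRU).

    Returns the hidden state after every time step.
    """
    h = [0.0] * len(inputs[0])
    states = []
    for x in inputs:
        h = [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h)]
        states.append(h)
    return states


def last_hidden(inputs):
    """Sentence representation = hidden state after the final token."""
    return run_rnn(inputs)[-1]


def bidirectional_last_concat(inputs):
    """Concatenate the last state of a forward and a backward pass."""
    forward_last = run_rnn(inputs)[-1]
    # The backward pass reads the tokens in reverse, with its own (assumed) weights.
    backward_last = run_rnn(inputs[::-1], w_x=0.3, w_h=0.6)[-1]
    return forward_last + backward_last  # list concatenation


# Hypothetical 2-dimensional embeddings for "The movie was great".
tokens = [[0.2, -0.1], [0.7, 0.3], [0.0, 0.1], [0.9, 0.8]]

uni = last_hidden(tokens)               # 2 values
bi = bidirectional_last_concat(tokens)  # 4 values: forward + backward
```

    The concatenated representation is twice the size of the unidirectional one, since it stacks one final state per direction.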
  12. Evaluation of sentence representations
    ➢ 12 tasks to evaluate sentence representations
      • Binary and multi-class classification
      • Entailment and semantic relatedness
      • Semantic Textual Similarity
      • Paraphrase detection
      • Caption-Image retrieval
  14. Empirical results
    ➢ Architecture impact
      • “macro” is the classical (unweighted) average of the dev accuracies
      • “micro” is the average of the dev accuracies weighted by the number of dev samples in each task
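
    The difference between the two aggregates defined above is easiest to see on a small example. A minimal sketch, assuming two hypothetical tasks; the accuracies and dev-set sizes are invented for illustration and are not results from the paper:

```python
def macro_average(accuracies):
    """Plain average: every task counts equally."""
    return sum(accuracies) / len(accuracies)


def micro_average(accuracies, num_samples):
    """Average weighted by the number of dev samples in each task."""
    total = sum(num_samples)
    return sum(a * n for a, n in zip(accuracies, num_samples)) / total


# Hypothetical dev accuracies (%) and dev-set sizes for two tasks.
dev_acc = [80.0, 90.0]
dev_size = [1000, 3000]

macro = macro_average(dev_acc)            # 85.0
micro = micro_average(dev_acc, dev_size)  # 87.5
```

    Here the micro score is pulled toward the larger task, while the macro score treats both tasks alike.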
  17. Conclusion
    ➢ This paper proposes universal sentence representations trained on supervised data
    ➢ The sentence embeddings trained on this supervised data were tested on 12 different transfer tasks
    ➢ A BiLSTM network with max pooling is the best of the current universal sentence encoding methods