Literature-review-01

MARUYAMA

March 26, 2018

Transcript

  1. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
     A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 670–680, 2017.
     Literature review: Takumi Maruyama, Nagaoka University of Technology
  2. Introduction
     Ø This paper proposes universal sentence representations trained on the supervised data of the Stanford Natural Language Inference (SNLI) dataset
     Ø Sentence embeddings trained with supervised data are evaluated on 12 different transfer tasks
     Ø A BiLSTM network with max pooling yields the best currently available universal sentence encoder
  3. The Natural Language Inference (NLI) task
     Ø Stanford Natural Language Inference (SNLI) dataset:
     • Consists of 570k human-generated English sentence pairs
     • Manually labeled with one of three categories: entailment, contradiction, and neutral
     https://nlp.stanford.edu/projects/snli
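An SNLI-style labeled pair can be sketched as below; the sentences and labels here are hypothetical stand-ins for illustration, not actual dataset entries:

```python
# Hypothetical SNLI-style examples: each premise/hypothesis pair
# carries exactly one of the three labels.
snli_examples = [
    {"premise": "A man is playing a guitar on stage.",
     "hypothesis": "A man is performing music.",
     "label": "entailment"},
    {"premise": "A man is playing a guitar on stage.",
     "hypothesis": "A man is sleeping at home.",
     "label": "contradiction"},
    {"premise": "A man is playing a guitar on stage.",
     "hypothesis": "The man is a famous musician.",
     "label": "neutral"},
]
```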
  4. The Natural Language Inference (NLI) task
     Ø The model encodes the premise and the hypothesis separately with a shared sentence encoder
  5. The Natural Language Inference (NLI) task
     Ø The model encodes the premise and the hypothesis separately with a shared sentence encoder; 7 different encoder architectures are compared
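The separate-encoding scheme can be sketched in plain Python. The feature combination below (concatenation, absolute difference, element-wise product) is a common choice for NLI classifiers and is an assumption here, since the slides do not spell it out:

```python
def combine_features(u, v):
    """Combine premise encoding u and hypothesis encoding v into one
    feature vector for the 3-way entailment classifier (assumed scheme:
    concatenation, |u - v|, and u * v)."""
    diff = [abs(a - b) for a, b in zip(u, v)]  # element-wise |u - v|
    prod = [a * b for a, b in zip(u, v)]       # element-wise u * v
    return u + v + diff + prod                 # concatenation

# hypothetical 2-d sentence encodings; the result has length 4 * len(u)
features = combine_features([0.5, -1.0], [0.25, 2.0])
```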
  6. Sentence encoder architectures
     Ø 7 different architectures
     • Long Short-Term Memory (LSTM)
     • Gated Recurrent Units (GRU)
     • Concatenation of last hidden states of forward and backward GRU
     • Bi-directional LSTMs with mean pooling
     • Bi-directional LSTMs with max pooling
     • Self-attentive network
     • Hierarchical convolutional networks
  8. Sentence encoder architectures
     Ø LSTM, GRU: the last hidden vector is used as the sentence representation
     [Figure: an LSTM (GRU) reads "The movie was great" word by word; its last hidden state is the sentence representation]
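A minimal sketch of this readout, with hypothetical 2-dimensional hidden states standing in for real LSTM/GRU outputs:

```python
def last_hidden_encoding(hidden_states):
    """LSTM/GRU readout: after the recurrent network has consumed the
    whole sentence, the final hidden state is the sentence vector."""
    return hidden_states[-1]

# hypothetical hidden states for "The movie was great"
states = [[0.1, 0.0], [0.3, -0.2], [0.2, 0.5], [0.9, 0.4]]
sentence_vec = last_hidden_encoding(states)  # -> [0.9, 0.4]
```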
  9. Sentence encoder architectures
     Ø Concatenation of the last hidden states of a forward and a backward GRU
     [Figure: a forward GRU and a backward GRU each read "The movie was great"; their last hidden states are concatenated to form the sentence representation]
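A sketch of the concatenation step, assuming the backward GRU has read the sentence in reverse order; the vectors below are hypothetical:

```python
def bigru_last_concat(fwd_states, bwd_states):
    """Concatenate the final hidden state of the forward GRU with the
    final hidden state of the backward GRU (which has read the
    sentence in reverse order)."""
    return fwd_states[-1] + bwd_states[-1]

fwd = [[1.0, 2.0], [3.0, 4.0]]   # hypothetical forward GRU states
bwd = [[5.0, 6.0], [7.0, 8.0]]   # hypothetical backward GRU states
bigru_last_concat(fwd, bwd)      # -> [3.0, 4.0, 7.0, 8.0]
```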
  11. Sentence encoder architectures Ø Bi-directional LSTMs with max pooling
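The max-pooling readout can be sketched in plain Python: for each dimension, keep the maximum value across all time steps of the (concatenated) BiLSTM hidden states. The states below are hypothetical:

```python
def max_pool_encoding(hidden_states):
    """Max-pooling readout: for each dimension, take the maximum value
    over all time steps of the BiLSTM hidden states."""
    return [max(dim_values) for dim_values in zip(*hidden_states)]

# hypothetical concatenated BiLSTM states for a 3-word sentence
states = [[0.1, 0.7], [0.9, -0.2], [0.4, 0.5]]
max_pool_encoding(states)  # -> [0.9, 0.7]
```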

  13. Sentence encoder architectures Ø Self-attentive network
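A sketch of the self-attentive readout, with the attention scores passed in as hypothetical values (in the actual network they come from a learned scoring layer):

```python
import math

def self_attentive_encoding(hidden_states, scores):
    """Self-attentive readout (sketch): softmax the per-step scores
    into weights, then return the weighted sum of the hidden states.
    The scores here are hypothetical inputs, not learned."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over time steps
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]

# equal scores reduce to a plain average of the hidden states
self_attentive_encoding([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0])  # -> [2.0, 1.0]
```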

  15. Sentence encoder architectures Ø Hierarchical convolutional networks
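A rough sketch of the hierarchical idea: stack convolution layers and collect a max-pooled feature from each level, so the final representation mixes shallow and deep views of the sentence. Scalar features and fixed kernels stand in for learned filter banks; this is an illustration, not the paper's exact architecture:

```python
def conv_layer(seq, kernel):
    """1-D convolution with valid padding over a sequence of scalars."""
    k = len(kernel)
    return [sum(kernel[j] * seq[i + j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def hierarchical_conv_encoding(seq, kernels):
    """Hierarchical ConvNet readout (sketch): apply conv layers in
    sequence and keep a max-pooled feature from every level."""
    features = []
    for kernel in kernels:
        seq = conv_layer(seq, kernel)
        features.append(max(seq))  # max-pool this level
    return features

# two stacked sum-kernels over a toy scalar sequence
hierarchical_conv_encoding([1, 2, 3, 4], [[1, 1], [1, 1]])  # -> [7, 12]
```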

  16. Evaluation of sentence representations
      Ø 12 tasks to evaluate sentence representations
      • Binary and multi-class classification
      • Entailment and semantic relatedness
      • Semantic Textual Similarity
      • Paraphrase detection
      • Caption-Image retrieval
  18. Evaluation of sentence representations Ø 12 tasks to evaluate sentence representations
  19. Empirical results
      Ø Architecture impact
      "macro" is the classical average of the dev accuracies; "micro" is the sum of the dev accuracies, weighted by the number of dev samples
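The two aggregates can be sketched directly; the accuracies and dev-set sizes below are made up for illustration:

```python
def macro_micro(dev_accs, dev_sizes):
    """'macro' is the plain mean of the per-task dev accuracies;
    'micro' weights each accuracy by its task's dev-set size."""
    macro = sum(dev_accs) / len(dev_accs)
    micro = sum(a * n for a, n in zip(dev_accs, dev_sizes)) / sum(dev_sizes)
    return macro, micro

# hypothetical accuracies on two tasks with different dev-set sizes
macro_micro([0.5, 1.0], [100, 300])  # -> (0.75, 0.875)
```

Note that a large dev set pulls the micro average toward its task's accuracy, which is why the two rankings can disagree.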
  22. Empirical results
      Ø Task transfer
      [Table: transfer results grouped by task type: classification, paraphrase detection, semantic textual similarity, natural language inference]
  25. Conclusion
      Ø This paper proposed universal sentence representations trained using supervised data
      Ø Sentence embeddings trained with supervised data were tested on 12 different transfer tasks
      Ø A BiLSTM network with max pooling yields the best currently available universal sentence encoder