
How Transferable are Neural Networks in NLP Applications?

Paper introduction
Natural Language Processing Laboratory, Nagaoka University of Technology
勝田 哲弘 (katsutan)

December 10, 2018

Transcript

1. How Transferable are Neural Networks in NLP Applications? (paper review)
   Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 479–489, Austin, Texas, November 1-5, 2016.

2. Abstract
   • Conducts systematic case studies on the transferability of neural networks in NLP.
   • Motivation
     ◦ In some fields, such as image processing, many studies have shown the effectiveness of transfer learning.
     ◦ For neural NLP, the conclusions are inconsistent.

3. Introduction
   • When we do not have a large enough dataset for the task of interest:
     ◦ Transfer or adapt knowledge from other domains.
     ◦ Traditional transfer learning (or domain adaptation):
       ▪ e.g., instance weighting, structural correspondence learning
   • Transfer learning for neural models is promising.
     ◦ Reuse parameters learned on other tasks.
     ◦ Existing studies have already shown transferability in some fields.
       ▪ Its effectiveness appears to be less clear in NLP applications.

4. Contributions (findings)
   • Transferability depends largely on how semantically similar the source and target tasks are.
   • The output layer is mainly specific to the dataset.
     ◦ Word embeddings are likely to be transferable.
   • MULT and INIT appear to be generally comparable.
     ◦ Combining the two methods does not result in a further gain.

5. Datasets
   Six datasets in two groups:
   • Experiment I: sentence classification
     ◦ IMDB. A large dataset for binary sentiment classification (positive vs. negative).
     ◦ MR. A small dataset for binary sentiment classification.
     ◦ QC. A small dataset for 6-way question classification (e.g., location, time, and number).
   • Experiment II: sentence-pair classification
     ◦ SNLI. A large dataset for sentence entailment recognition.
       ▪ The classification objectives are entailment, contradiction, and neutral.
     ◦ SICK. A small dataset with exactly the same classification objective as SNLI.
     ◦ MSRP. A small dataset for paraphrase detection.
       ▪ The objective is binary classification: judging whether two sentences have the same meaning.

6. Transfer Methods
   Two main approaches to neural network-based transfer learning:
   • Parameter initialization (INIT)
     ◦ Related to unsupervised pre-training such as word embedding learning.
   • Multi-task learning (MULT)
     ◦ Simultaneously trains samples in both domains.
     ◦ The overall cost function is J = λ·J_T + (1 − λ)·J_S, where J_T and J_S are the costs of the target and source domains, and λ ∈ (0, 1) is a hyperparameter balancing the two.
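
   To make the two methods concrete, here is a minimal PyTorch sketch. This is not the authors' implementation: the model shape, sizes, λ = 0.7, and names such as Classifier and source_head are illustrative assumptions. INIT copies pretrained parameters except the dataset-specific output layer; MULT optimizes the interpolated objective over shared embedding and hidden layers.

   import torch
   import torch.nn as nn

   # Illustrative sentence classifier: embeddings -> hidden layer -> output layer.
   class Classifier(nn.Module):
       def __init__(self, vocab_size=10000, dim=100, n_classes=3):
           super().__init__()
           self.embed = nn.EmbeddingBag(vocab_size, dim)  # word embeddings (transferable)
           self.hidden = nn.Linear(dim, dim)              # hidden layer
           self.out = nn.Linear(dim, n_classes)           # output layer (dataset-specific)

       def features(self, tokens, offsets):
           return torch.relu(self.hidden(self.embed(tokens, offsets)))

       def forward(self, tokens, offsets):
           return self.out(self.features(tokens, offsets))

   # INIT: pretrain on the source task, then copy parameters into the
   # target model, skipping the dataset-specific output layer.
   source = Classifier()  # e.g., trained on SNLI beforehand
   target = Classifier()  # e.g., to be fine-tuned on SICK
   state = {k: v for k, v in source.state_dict().items() if not k.startswith("out.")}
   target.load_state_dict(state, strict=False)

   # MULT: share embeddings and hidden layer across tasks, keep a separate
   # output layer per task, and minimize J = lam*J_T + (1 - lam)*J_S.
   source_head = nn.Linear(100, 3)  # output layer for the source task
   criterion = nn.CrossEntropyLoss()
   lam = 0.7  # balancing hyperparameter, chosen arbitrarily here

   # Dummy mini-batches (two "sentences" each) standing in for real data.
   t_tok, t_off, y_t = torch.randint(0, 10000, (20,)), torch.tensor([0, 10]), torch.tensor([0, 1])
   s_tok, s_off, y_s = torch.randint(0, 10000, (20,)), torch.tensor([0, 10]), torch.tensor([2, 0])

   loss = lam * criterion(target(t_tok, t_off), y_t) \
        + (1 - lam) * criterion(source_head(target.features(s_tok, s_off)), y_s)
   loss.backward()

   The split between copied and task-specific parameters mirrors the paper's finding that word embeddings and hidden layers tend to transfer while the output layer stays dataset-specific.
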
7. Concluding Remarks
   • Addressed the problem of transfer learning in neural network-based NLP applications.
   • Conducted two series of experiments on six datasets.
     ◦ The results are mostly consistent.
     ◦ The conclusions can be generalized to similar scenarios.