
How Transferable are Neural Networks in NLP Applications?

katsutan
December 10, 2018

Paper introduction (文献紹介)

Nagaoka University of Technology, Natural Language Processing Laboratory
勝田 哲弘


Transcript

  1. How Transferable are Neural Networks in NLP Applications? Paper introduction (文献紹介), Natural Language Processing Laboratory, 勝田 哲弘.
     Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 479–489, Austin, Texas, November 1–5, 2016.
  2. Abstract • Systematically investigate the effectiveness of transfer learning and multi-task learning • Motivation ◦ In image processing, many studies have demonstrated its effectiveness ◦ What about NLP? ▪ Reported results are inconsistent
  3. Introduction • When the task of interest lacks a large enough dataset ◦ Obtain knowledge from another task or domain ◦ Transfer learning (or domain adaptation) ▪ e.g., instance weighting, structural correspondence learning • Transferring neural models is promising ◦ Reuse parameters learned on another task ◦ Several studies have already demonstrated transferability ▪ In NLP its effectiveness is much less clear
  4. Contributions Experimental settings ◦ Similar or equivalent tasks with different datasets ◦ A unified model across different tasks Training methods ◦ Transfer learning ◦ Multi-task learning

  5. Contributions Results • Transfer learning depends on the semantic similarity of the tasks • The output layer is task-specific ◦ Word embedding layers work regardless of task similarity • Multi-task learning and transfer learning are largely comparable ◦ Combining them brings no further improvement
  6. Datasets Six datasets
     • Experiment I: Sentence classification
       ◦ IMDB. A large dataset for binary sentiment classification (positive vs. negative).
       ◦ MR. A small dataset for binary sentiment classification.
       ◦ QC. A small dataset for 6-way question classification (e.g., location, time, and number).
     • Experiment II: Sentence-pair classification
       ◦ SNLI. A large dataset for sentence entailment recognition.
         ▪ The classification objectives are entailment, contradiction, and neutral.
       ◦ SICK. A small dataset with exactly the same classification objective as SNLI.
       ◦ MSRP. A small dataset for paraphrase detection.
         ▪ The objective is binary classification: judging whether two sentences have the same meaning.
  7. Datasets

  8. Neural Models

  9. Transfer Methods Two approaches to transfer learning • Parameter initialization (INIT). ◦ Related to unsupervised pre-training such as word embedding learning. • Multi-task learning (MULT). ◦ Trains on both tasks simultaneously. ◦ The overall cost function is reconstructed below.
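
The cost function itself appeared only as an image on the slide. The following is a reconstruction from the paper, with the subscript notation assumed here: MULT interpolates the two tasks' individual costs with a balancing hyperparameter λ.

```latex
% Overall MULT cost (reconstructed): J_1 and J_2 are the individual
% cost functions of the two tasks; the hyperparameter
% \lambda \in (0, 1) balances their contributions.
J = \lambda J_1 + (1 - \lambda)\, J_2
```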
  10. Results of Transferring by INIT

  11. Results of Transferring by INIT

  12. MULT, and its Combination with INIT

 13. Concluding Remarks • Investigated the effectiveness of transfer learning and multi-task learning • Experiments with six datasets and two models ◦ Obtained consistent results ◦ The conclusions can be regarded as general properties

 14. Contributions Results • Transfer learning depends on the semantic similarity of the tasks • The output layer is task-specific ◦ Word embedding layers work regardless of task similarity • Multi-task learning and transfer learning are largely comparable ◦ Combining them brings no further improvement
 15. How Transferable are Neural Networks in NLP Applications? review
     Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 479–489, Austin, Texas, November 1–5, 2016.
 16. Abstract • Conduct systematic case studies to shed light on the transferability of neural networks in NLP. • Motivation ◦ In some fields like image processing, many studies have shown its effectiveness. ◦ What about neural NLP? ▪ Conclusions are inconsistent.
 17. Introduction • When we do not have large enough datasets for the task of interest ◦ Transfer or adapt knowledge from other domains ◦ Transfer learning (domain adaptation) ▪ e.g., instance weighting, structural correspondence learning • Transferring neural networks is promising ◦ Share parameters learned on other tasks ◦ Existing studies have already shown transferability in some fields ▪ It appears to be less clear in NLP applications
  18. Contributions Transfer learning

 19. Contributions Results • Whether to transfer depends on how semantically similar the tasks are. • The output layer is mainly specific to the dataset. ◦ Word embeddings are likely to be transferable. • MULT and INIT appear to be generally comparable. ◦ Combining these two methods does not result in a further gain.
 20. Datasets
     • Experiment I: Sentence classification
       ◦ IMDB. A large dataset for binary sentiment classification (positive vs. negative).
       ◦ MR. A small dataset for binary sentiment classification.
       ◦ QC. A small dataset for 6-way question classification (e.g., location, time, and number).
     • Experiment II: Sentence-pair classification
       ◦ SNLI. A large dataset for sentence entailment recognition.
         ▪ The classification objectives are entailment, contradiction, and neutral.
       ◦ SICK. A small dataset with exactly the same classification objective as SNLI.
       ◦ MSRP. A small dataset for paraphrase detection.
         ▪ The objective is binary classification: judging whether two sentences have the same meaning.
  21. Datasets

  22. Neural Models

 23. Transfer Methods Two main approaches to neural network-based transfer learning • Parameter initialization (INIT). ◦ Related to unsupervised pre-training such as word embedding learning. • Multi-task learning (MULT). ◦ Simultaneously trains samples in both domains. ◦ The overall cost function is the λ-weighted combination shown on slide 9; a code sketch of both methods follows.
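
A minimal PyTorch sketch of the two methods. The toy averaged-embedding classifier, the λ = 0.5 default, and all names here are illustrative assumptions, not the paper's actual sentence models. INIT copies the source-trained parameters into the target model while keeping the output layer task-specific; MULT shares the lower layers between two task-specific heads and minimizes the interpolated cost J = λ·J1 + (1 − λ)·J2.

```python
import torch
import torch.nn as nn

# Toy classifier (hypothetical; not the paper's architecture):
# averaged word embeddings -> one hidden layer -> task-specific output.
class Classifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=50, hidden_dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)  # mean of word vectors
        self.hidden = nn.Linear(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, n_classes)          # task-specific layer

    def forward(self, token_ids):                            # token_ids: (batch, seq_len)
        return self.out(torch.relu(self.hidden(self.embed(token_ids))))

# INIT: after training `source` on the source task (loop omitted),
# copy everything except the output layer into the target-task model.
source, target = Classifier(), Classifier()
shared = {k: v for k, v in source.state_dict().items() if not k.startswith("out.")}
target.load_state_dict(shared, strict=False)

# MULT: two models share the embedding and hidden layers but keep
# separate output layers; each step minimizes J = lam*J1 + (1-lam)*J2.
other = Classifier(n_classes=3)
other.embed, other.hidden = target.embed, target.hidden      # parameter sharing
loss_fn = nn.CrossEntropyLoss()

def mult_loss(batch_a, batch_b, lam=0.5):
    (xa, ya), (xb, yb) = batch_a, batch_b
    return lam * loss_fn(target(xa), ya) + (1 - lam) * loss_fn(other(xb), yb)

# One joint step, with random data standing in for real batches
# (a set() of parameters avoids passing shared parameters twice):
opt = torch.optim.Adam(set(target.parameters()) | set(other.parameters()))
xa, ya = torch.randint(0, 1000, (8, 20)), torch.randint(0, 2, (8,))
xb, yb = torch.randint(0, 1000, (8, 20)), torch.randint(0, 3, (8,))
opt.zero_grad()
mult_loss((xa, ya), (xb, yb)).backward()
opt.step()
```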
  24. Results of Transferring by INIT

  25. Results of Transferring by INIT

  26. MULT, and its Combination with INIT

 27. Concluding Remarks • Addressed the problem of transfer learning in neural network-based NLP applications. • Conducted two series of experiments on six datasets. ◦ Results are mostly consistent. ◦ The conclusions can be generalized to similar scenarios.
 28. Contributions Results • Whether to transfer depends on how semantically similar the tasks are. • The output layer is mainly specific to the dataset. ◦ Word embeddings are likely to be transferable. • MULT and INIT appear to be generally comparable. ◦ Combining these two methods does not result in a further gain.