This paper studies the transferability of neural networks in NLP.
• Motivation
  ◦ In some fields, such as image processing, many studies have shown the effectiveness of transferring learned parameters to a new task.
  ◦ For neural NLP?
    ▪ Existing conclusions are inconsistent.
When labeled data are insufficient for the task of interest:
  ◦ Transfer or adapt knowledge from other domains.
• Transfer learning (domain adaptation)
  ◦ e.g., instance weighting, structural correspondence learning
• For neural networks, transfer learning is promising.
  ◦ Parameters learned on other tasks can be shared directly.
  ◦ Existing studies (e.g., in image processing) have already shown such transferability.
    ▪ However, it appears to be less clear in NLP applications.
• Experiment I: Sentence classification
  ◦ IMDB. A large dataset for binary sentiment classification (positive vs. negative).
  ◦ MR. A small dataset for binary sentiment classification.
  ◦ QC. A small dataset for 6-way question classification (e.g., location, time, and number).
• Experiment II: Sentence-pair classification
  ◦ SNLI. A large dataset for sentence entailment recognition.
    ▪ The classification objectives are entailment, contradiction, and neutral.
  ◦ SICK. A small dataset with exactly the same classification objective as SNLI.
  ◦ MSRP. A small dataset for paraphrase detection.
    ▪ The objective is binary classification: judging whether two sentences have the same meaning.
• Parameter initialization (INIT)
  ◦ Trains the network on the source task first, then uses the tuned parameters to initialize the network for the target task.
  ◦ Related to unsupervised pre-training such as word embedding learning.
• Multi-task learning (MULT)
  ◦ Simultaneously trains samples in both domains.
  ◦ The overall cost function is given by a weighted combination of the two domains' costs:
    ▪ J = λ J₁ + (1 − λ) J₂, where J₁ and J₂ are the cost functions of the two domains and λ ∈ (0, 1) is a balancing hyperparameter.
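A minimal code sketch may make the two methods concrete. The snippet below is a hypothetical PyTorch illustration, not the authors' implementation: the SentenceClassifier class, the layer names and sizes, and the value of lam are all assumptions; INIT is shown as copying source-trained parameters into the target network, and MULT as optimizing the weighted-sum objective noted above.

```python
import torch
import torch.nn as nn

# Hypothetical sentence classifier: embedding -> LSTM encoder -> output layer.
# All names and sizes are illustrative, not taken from the paper.
class SentenceClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, n_classes)

    def forward(self, tokens):
        hidden, _ = self.encoder(self.embedding(tokens))
        return self.output(hidden[:, -1])  # logits from the last time step

# --- INIT: train on the source task, then reuse the parameters ---
source_model = SentenceClassifier(vocab_size=20000, emb_dim=100,
                                  hidden_dim=200, n_classes=2)
# ... train source_model on the large source dataset here ...
target_model = SentenceClassifier(vocab_size=20000, emb_dim=100,
                                  hidden_dim=200, n_classes=6)
target_model.embedding.load_state_dict(source_model.embedding.state_dict())
target_model.encoder.load_state_dict(source_model.encoder.state_dict())
# The target output layer stays randomly initialized: its label set differs.

# --- MULT: jointly optimize a weighted sum of the two domains' costs ---
shared = SentenceClassifier(vocab_size=20000, emb_dim=100,
                            hidden_dim=200, n_classes=2)  # source head built in
target_output = nn.Linear(200, 6)   # separate output layer for the target task
lam = 0.5                           # balancing hyperparameter (assumed value)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(list(shared.parameters()) +
                             list(target_output.parameters()))

def mult_step(src_tokens, src_labels, tgt_tokens, tgt_labels):
    """One update on J = lam * J_source + (1 - lam) * J_target."""
    optimizer.zero_grad()
    j_src = criterion(shared(src_tokens), src_labels)
    hidden, _ = shared.encoder(shared.embedding(tgt_tokens))
    j_tgt = criterion(target_output(hidden[:, -1]), tgt_labels)
    loss = lam * j_src + (1 - lam) * j_tgt
    loss.backward()
    optimizer.step()
    return loss.item()
```

In both sketches only the embedding and encoder are shared across tasks, while each task keeps its own output layer; this matches the observation below that the output layer is mainly specific to the dataset.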
The paper systematically studied transfer learning for neural network-based NLP applications.
• Conducted two series of experiments on six datasets.
  ◦ Results are mostly consistent across the experiments.
  ◦ The conclusions can be generalized to similar scenarios.
• The output layer is mainly specific to the dataset.
  ◦ Word embeddings are likely to be transferable.
• MULT and INIT appear to be generally comparable.
  ◦ Combining these two methods does not result in a further gain.