Slide 1

Slide 1 text

How Transferable are Neural Networks in NLP Applications?
Literature review, Natural Language Processing Laboratory, 勝田 哲弘
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 479–489, Austin, Texas, November 1-5, 2016.

Slide 2

Slide 2 text

Abstract
● Systematically investigates the effectiveness of transfer learning and multi-task learning.
● Motivation
  ○ In image processing, many studies have shown its effectiveness.
  ○ What about NLP?
    ■ Results are inconsistent.

Slide 3

Slide 3 text

Introduction
● When we do not have a large enough dataset for the task of interest
  ○ Obtain knowledge from another task or domain
  ○ Transfer learning (or domain adaptation)
    ■ e.g., instance weighting, structural correspondence learning
● Transferring neural models is promising
  ○ Reuse parameters learned on another task
  ○ Several studies have shown transferability
    ■ In NLP, the effectiveness is much less clear

Slide 4

Slide 4 text

Contributions
Experimental settings
● Different datasets for similar or identical tasks
● A unified model across different tasks
Training methods
● Transfer learning
● Multi-task learning

Slide 5

Slide 5 text

Contributions
Results
● Transfer learning depends on how semantically similar the tasks are.
● The output layer is task-specific.
  ○ Word embedding layers transfer regardless of task similarity.
● Multi-task learning and transfer learning are largely comparable.
  ○ Combining them yields no further improvement.

Slide 6

Slide 6 text

Datasets
Six datasets
● Experiment I: Sentence classification
  ○ IMDB. A large dataset for binary sentiment classification (positive vs. negative).
  ○ MR. A small dataset for binary sentiment classification.
  ○ QC. A small dataset for 6-way question classification (e.g., location, time, and number).
● Experiment II: Sentence-pair classification
  ○ SNLI. A large dataset for sentence entailment recognition.
    ■ The classification objectives are entailment, contradiction, and neutral.
  ○ SICK. A small dataset with exactly the same classification objective as SNLI.
  ○ MSRP. A small dataset for paraphrase detection.
    ■ The objective is binary classification: judging whether two sentences have the same meaning.
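For concreteness, here are a couple of hypothetical toy instances in the two input formats; the sentences and labels below are invented for illustration and are not taken from the datasets.

# Toy examples of the two input formats used in the experiments.
# Experiment I: single-sentence classification (IMDB/MR sentiment, QC question type).
sentence_examples = [
    {"sentence": "A wonderful, heartfelt film.", "label": "positive"},   # IMDB / MR style
    {"sentence": "Where is the Eiffel Tower?",   "label": "location"},   # QC style (6-way)
]

# Experiment II: sentence-pair classification (SNLI/SICK entailment, MSRP paraphrase).
pair_examples = [
    {"premise": "A man is playing a guitar.",
     "hypothesis": "A person is playing an instrument.",
     "label": "entailment"},                                             # SNLI / SICK style
    {"sentence1": "The company posted higher profits.",
     "sentence2": "Profits at the company rose.",
     "label": 1},                                                        # MSRP style (binary)
]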

Slide 7

Slide 7 text

Datasets

Slide 8

Slide 8 text

Neural Models

Slide 9

Slide 9 text

Transfer Methods
Two approaches to transfer learning
● Parameter initialization (INIT).
  ○ Related to word embedding pre-training and the like.
● Multi-task learning (MULT).
  ○ Trains both tasks simultaneously.
  ○ The overall cost function (see the sketch below):
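The formula itself appears only in the slide image; as a minimal sketch, assuming the standard interpolated form with per-task costs J_1, J_2 and a balancing hyperparameter λ, it reads:

    % overall MULT cost: convex combination of the two per-task costs
    J = \lambda J_1 + (1 - \lambda) J_2 , \qquad \lambda \in (0, 1)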

Slide 10

Slide 10 text

Results of Transferring by INIT

Slide 11

Slide 11 text

Results of Transferring by INIT

Slide 12

Slide 12 text

MULT, and its Combination with INIT

Slide 13

Slide 13 text

Concluding Remarks
● Investigated the effectiveness of transfer learning and multi-task learning.
● Ran experiments on six datasets with two models.
  ○ The results were consistent.
  ○ They can be regarded as general properties.

Slide 14

Slide 14 text

Contributions
Results
● Transfer learning depends on how semantically similar the tasks are.
● The output layer is task-specific.
  ○ Word embedding layers transfer regardless of task similarity.
● Multi-task learning and transfer learning are largely comparable.
  ○ Combining them yields no further improvement.

Slide 15

Slide 15 text

How Transferable are Neural Networks in NLP Applications?
Review
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 479–489, Austin, Texas, November 1-5, 2016.

Slide 16

Slide 16 text

Abstract
● Conducts systematic case studies to shed light on the transferability of neural networks in NLP.
● Motivation
  ○ In some fields like image processing, many studies have shown its effectiveness.
  ○ What about neural NLP?
    ■ Conclusions are inconsistent.

Slide 17

Slide 17 text

Introduction
● When we do not have large enough datasets for the task of interest
  ○ Transfer or adapt knowledge from other domains
  ○ Transfer learning (or domain adaptation)
    ■ e.g., instance weighting, structural correspondence learning
● Transferring neural networks is promising
  ○ Share parameters learned on other tasks.
  ○ Existing studies have already shown transferability in some fields.
    ■ It appears to be less clear in NLP applications.

Slide 18

Slide 18 text

Contributions Transfer learning

Slide 19

Slide 19 text

Contributions
Results
● Whether a neural network is transferable depends largely on how semantically similar the tasks are.
● The output layer is mainly specific to the dataset.
  ○ Word embeddings are likely to be transferable.
● MULT and INIT appear to be generally comparable.
  ○ Combining these two methods does not result in a further gain.

Slide 20

Slide 20 text

Datasets
● Experiment I: Sentence classification
  ○ IMDB. A large dataset for binary sentiment classification (positive vs. negative).
  ○ MR. A small dataset for binary sentiment classification.
  ○ QC. A small dataset for 6-way question classification (e.g., location, time, and number).
● Experiment II: Sentence-pair classification
  ○ SNLI. A large dataset for sentence entailment recognition.
    ■ The classification objectives are entailment, contradiction, and neutral.
  ○ SICK. A small dataset with exactly the same classification objective as SNLI.
  ○ MSRP. A small dataset for paraphrase detection.
    ■ The objective is binary classification: judging whether two sentences have the same meaning.

Slide 21

Slide 21 text

Datasets

Slide 22

Slide 22 text

Neural Models

Slide 23

Slide 23 text

Transfer Methods
Two main approaches to neural network-based transfer learning
● Parameter initialization (INIT).
  ○ Related to unsupervised pre-training such as word embedding learning.
● Multi-task learning (MULT).
  ○ Simultaneously trains samples in both domains.
  ○ The overall cost function interpolates the two per-task costs (see the sketch below).
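Below is a minimal sketch of what INIT and MULT could look like in PyTorch. It is not the authors' implementation: the encoder architecture, dimensions, dataset pairing, and the balancing weight lam are illustrative assumptions, and the interpolated loss is optimized directly per batch rather than by the paper's exact training schedule.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Transferable layers: word embedding + LSTM sentence encoder."""
    def __init__(self, vocab_size=10_000, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        _, (h_n, _) = self.lstm(self.embedding(token_ids))
        return h_n[-1]                      # last hidden state as the sentence vector

# --- INIT: train on the source task, then reuse its parameters on the target task.
src_encoder, src_head = Encoder(), nn.Linear(128, 2)    # e.g., a 2-class source task
# ... train src_encoder / src_head on the (large) source dataset here ...
tgt_encoder, tgt_head = Encoder(), nn.Linear(128, 6)    # e.g., a 6-class target task
tgt_encoder.load_state_dict(src_encoder.state_dict())   # task-specific heads are not copied
# ... fine-tune tgt_encoder / tgt_head on the (small) target dataset ...

# --- MULT: one shared encoder, two task-specific output layers, interpolated cost.
encoder = Encoder()
head_a, head_b = nn.Linear(128, 2), nn.Linear(128, 6)
lam = 0.1                                                # assumed balancing weight
criterion = nn.CrossEntropyLoss()
params = list(encoder.parameters()) + list(head_a.parameters()) + list(head_b.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

# Toy mini-batches standing in for one batch from each dataset (token ids, labels).
xa, ya = torch.randint(0, 10_000, (8, 20)), torch.randint(0, 2, (8,))
xb, yb = torch.randint(0, 10_000, (8, 20)), torch.randint(0, 6, (8,))

optimizer.zero_grad()
loss = (lam * criterion(head_a(encoder(xa)), ya)
        + (1 - lam) * criterion(head_b(encoder(xb)), yb))   # J = lam*J1 + (1-lam)*J2
loss.backward()
optimizer.step()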

Slide 24

Slide 24 text

Results of Transferring by INIT

Slide 25

Slide 25 text

Results of Transferring by INIT

Slide 26

Slide 26 text

MULT, and its Combination with INIT

Slide 27

Slide 27 text

Concluding Remarks
● Addressed the problem of transfer learning in neural network-based NLP applications.
● Conducted two series of experiments on six datasets.
  ○ Results are mostly consistent.
  ○ The conclusions can be generalized to similar scenarios.

Slide 28

Slide 28 text

Contributions
Results
● Whether a neural network is transferable depends largely on how semantically similar the tasks are.
● The output layer is mainly specific to the dataset.
  ○ Word embeddings are likely to be transferable.
● MULT and INIT appear to be generally comparable.
  ○ Combining these two methods does not result in a further gain.