Literature Review

Nagaoka University of Technology, Natural Language Processing Laboratory, 勝田哲弘
Literature review: Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging. Nils Reimers and Iryna Gurevych, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 338–348.

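Abstract
• In recent years, two NER models have been regarded as state of the art.
  ◦ Is it really enough to compare the single scores reported by individual studies?
  ↓
  ◦ Let's test whether the difference is statistically significant.
    ▪ 50,000 runs across 5 tasks, hyperparameters included
    ▪ Reimers and Gurevych, 2017
• A summary of that work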

Introduction
• In recent years, deep neural networks have become state of the art across a wide range of NLP tasks
  ◦ Sequence tagging (Ma and Hovy, 2016)
  ◦ Dependency parsing (Andor et al., 2016)
  ◦ Machine translation (Wu et al., 2016)
• Training depends on random factors: weight initialization, shuffling of the training data, and dropout
  ◦ For the NER model of Ma and Hovy (2016), F1 varies from 89.99% to 91.00% (p < 10^-4)
• This work investigates the impact of the random seed
  ◦ POS-tagging, Chunking, Named Entity Recognition, Entity Recognition, Event Detection
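
To make the seed dependence concrete, the sketch below runs the same configuration under several seeds and summarizes the resulting scores instead of reporting one number. `train_and_evaluate` is a hypothetical stand-in that only simulates a score, and the numbers it produces are made up; a real version would seed the initialization, shuffling, and dropout, then train and evaluate the tagger.

```python
import random
import statistics


def train_and_evaluate(seed: int) -> float:
    """Hypothetical stand-in for one training run; returns a test F1 score.

    The score is simulated (not a real result) so the sketch runs on its own.
    """
    rng = random.Random(seed)
    return 90.5 + rng.gauss(0.0, 0.25)


def score_distribution(seeds):
    """Run one configuration under several seeds and summarize the scores."""
    scores = sorted(train_and_evaluate(s) for s in seeds)
    return {
        "min": scores[0],
        "median": statistics.median(scores),
        "max": scores[-1],
        "stdev": statistics.stdev(scores),
    }


summary = score_distribution(range(20))
print(summary)
```

Reporting the minimum, median, maximum, and standard deviation shows how much of a gap between two systems could be explained by the seed alone.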

Impact of Randomness in the Evaluation of Neural Networks
• NER models
  ◦ Ma and Hovy (2016): 91.21%
    ▪ Bi-directional LSTM-CNNs-CRF
  ◦ Lample et al. (2016): 90.94%
    ▪ Bi-directional LSTM-LSTM-CRF
• Ma and Hovy is state of the art

Impact of Randomness in the Evaluation of Neural Networks
• Lample et al. (2016) turned out to be better
  ◦ Bi-directional LSTM-LSTM-CRF
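
One way to compare two systems by their score distributions rather than by single numbers is a significance test over the per-seed scores. The sketch below uses a two-sided permutation test on the difference of means. This test is chosen here only for illustration, since the slides do not say which test the paper applied, and the score lists are made-up placeholders, not results from either paper.

```python
import random


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of means.

    Returns an estimated p-value for the null hypothesis that both
    samples come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        # Small tolerance so float rounding does not hide exact ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Made-up F1 scores from repeated runs of two hypothetical systems.
system_a = [90.6, 90.9, 90.4, 90.8, 90.7, 90.5]
system_b = [90.7, 91.0, 90.6, 90.9, 90.8, 90.9]
p_value = permutation_test(system_a, system_b)
print(f"p = {p_value:.4f}")
```

A large p-value means the observed gap is within what seed noise alone could produce, so the ranking of the two systems should not be read off a single pair of scores.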

Impact of Randomness in the Evaluation of Neural Networks

Experimental Setup
• NLP tasks
  ◦ POS-tagging, Chunking, Named Entity Recognition, Entity Recognition, Event Detection
• Models
  ◦ Huang et al., 2015; Ma and Hovy, 2016; Lample et al., 2016
• Optimizer
  ◦ SGD, Adagrad, Adadelta, RMSProp, Adam, Nadam (an Adam variant that incorporates Nesterov momentum)
• Gradient Clipping and Normalization
• Tagging schemes
  ◦ BIO, IOBES
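
The two tagging schemes encode the same spans differently: IOBES additionally marks single-token spans (S-) and span ends (E-). A minimal sketch of the conversion, assuming well-formed BIO input; the function name `bio_to_iobes` is illustrative and not taken from the paper's code.

```python
def bio_to_iobes(tags):
    """Convert a well-formed BIO tag sequence to IOBES."""
    iobes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            iobes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The span continues only if the next tag is I- with the same label.
        continues = next_tag == "I-" + label
        if prefix == "B":
            iobes.append(("B-" if continues else "S-") + label)
        else:  # prefix == "I"
            iobes.append(("I-" if continues else "E-") + label)
    return iobes


example = ["B-PER", "I-PER", "O", "B-LOC", "B-ORG", "I-ORG", "I-ORG"]
print(bio_to_iobes(example))
```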

Experimental Setup
• Pre-trained Word Embeddings
  ◦ Google News embeddings (G. News)
  ◦ Levy and Goldberg bag-of-words embeddings (Le. BoW)
  ◦ Levy and Goldberg dependency-based embeddings (Le. Dep.)
  ◦ GloVe embeddings
    ▪ Wikipedia 2014 + Gigaword 5 (GloVe1 with 100 dimensions and GloVe2 with 300 dimensions)
    ▪ Common Crawl (GloVe3)
  ◦ Komninos and Manandhar (2016) embeddings (Komn.)
  ◦ Bojanowski et al. (2016) embeddings (FastText)

Experimental Setup
• Dropout
  ◦ 0.05, 0.1, 0.25, 0.5
• Classifier
  ◦ Softmax, CRF
• Number of LSTM layers
  ◦ 1 to 3
• Number of recurrent units
  ◦ 25, 50, 75, 100, 125
• Mini-batch sizes
  ◦ 1, 8, 16, 32, and 64
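
The values listed on this slide can be written down as a search space and sampled. The sketch below assumes uniform random sampling of configurations, which the slides do not state explicitly; the dictionary keys and the `sample_config` helper are hypothetical names.

```python
import random

# Values taken from the slide; the key names are illustrative.
SEARCH_SPACE = {
    "dropout": [0.05, 0.1, 0.25, 0.5],
    "classifier": ["softmax", "crf"],
    "lstm_layers": [1, 2, 3],
    "recurrent_units": [25, 50, 75, 100, 125],
    "batch_size": [1, 8, 16, 32, 64],
}


def sample_config(rng):
    """Draw one configuration uniformly at random from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


rng = random.Random(42)
configs = [sample_config(rng) for _ in range(5)]
for config in configs:
    print(config)
```

Each sampled configuration would then be trained under several seeds so that parameter settings are compared by their score distributions.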

Results - Classifier

Word Embeddings

Character Representation

Further Evaluated Parameters
• Optimizer: Nadam (NER)
• Gradient: gradient normalization
• Dropout: 83.5%
• Tagging scheme: BIO (Entities)
• Batch size: 1 to 16 (small), 8 to 32 (larger)
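
The slides do not define gradient clipping or gradient normalization. The sketch below assumes the common definitions: clipping bounds each component to [-tau, tau], while normalization rescales the whole gradient when its L2 norm exceeds tau. The function names and the threshold name `tau` are illustrative.

```python
import math


def clip_gradient(grad, tau):
    """Clip each gradient component to the interval [-tau, tau]."""
    return [max(-tau, min(tau, g)) for g in grad]


def normalize_gradient(grad, tau):
    """Rescale the gradient so its L2 norm is at most tau."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= tau:
        return list(grad)
    return [g * tau / norm for g in grad]


grad = [3.0, -4.0]
print(clip_gradient(grad, 1.0))       # direction changes
print(normalize_gradient(grad, 1.0))  # direction is preserved
```

Under these definitions, normalization keeps the direction of the update and only shrinks its length, whereas clipping can change the direction.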

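Conclusion
• Comparing score distributions should allow approaches to be evaluated more accurately
• Clarifying the dependencies between parameters should reduce the tuning needed when adapting to a new task or domain
• Future work will continue to investigate dependencies among hyperparameters and related questions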
