
Paper Introduction (Master's Year 1, Session 1)
Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging

katsutan

February 23, 2018

Transcript

  1. Paper Introduction
     Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging
     Nils Reimers and Iryna Gurevych, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 338–348.
     Nagaoka University of Technology, Natural Language Processing Laboratory: 勝田 哲弘 (katsutan)
  2. Introduction
     • Deep neural networks have recently achieved state-of-the-art results across a wide range of NLP tasks:
       ◦ Sequence tagging (Ma and Hovy, 2016)
       ◦ Dependency parsing (Andor et al., 2016)
       ◦ Machine translation (Wu et al., 2016)
     • Training depends on random factors: weight initialization, shuffling of the training data, and dropout.
       ◦ For the Ma and Hovy (2016) NER model, F1 varies from 89.99% to 91.00% across runs (p < 10⁻⁴).
     • This paper investigates the impact of the random seed on POS-tagging, Chunking, Named Entity Recognition, Entity Recognition, and Event Detection (a sketch of the recommended reporting practice follows this slide).
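The reporting practice the title advocates can be made concrete with a short sketch. Here train_and_evaluate is a hypothetical callable (not from the paper) that trains one tagger and returns its test-set F1; only the random seed changes between runs, and the distribution of scores is reported rather than a single number.

    import random
    import statistics

    import numpy as np

    def report_score_distribution(train_and_evaluate, seeds=range(10)):
        """Re-run one fixed configuration under several random seeds.

        The seed controls weight initialization, shuffling of the
        training data, and dropout masks, i.e. every random factor
        listed on the slide above.
        """
        scores = []
        for seed in seeds:
            random.seed(seed)
            np.random.seed(seed)
            scores.append(train_and_evaluate())
        return {
            "min": min(scores),
            "max": max(scores),
            "mean": statistics.mean(scores),
            "stdev": statistics.stdev(scores),
        }

Applied to the Ma and Hovy (2016) NER model, such a summary would surface exactly the 89.99% to 91.00% spread quoted above.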
  3. Impact of Randomness in the Evaluation of Neural Networks
     • NER models and their reported F1 scores:
       ◦ Ma and Hovy (2016): 91.21%
         ▪ Bi-directional LSTM-CNNs-CRF
       ◦ Lample et al. (2016): 90.94%
         ▪ Bi-directional LSTM-LSTM-CRF
     • By these single reported scores, Ma and Hovy is state of the art.
  4. Impact of Randomness in the Evaluation of Neural Networks
     • Re-run under multiple random seeds, however, the Bi-directional LSTM-LSTM-CRF of Lample et al. (2016) performed better (a sketch of such a distribution-level comparison follows this slide).
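One way to back up such a reversal is to compare the two models' score distributions with a significance test instead of two single best scores. A minimal sketch using SciPy's Welch's t-test; the F1 values are invented placeholders, not numbers from the paper.

    from scipy import stats

    # Hypothetical F1 scores from re-running each architecture with
    # several different random seeds.
    ma_hovy_f1 = [90.3, 90.6, 90.1, 90.8, 90.4]  # BiLSTM-CNNs-CRF
    lample_f1 = [90.7, 90.9, 90.5, 91.0, 90.8]   # BiLSTM-LSTM-CRF

    # Welch's t-test compares the means without assuming equal variances.
    t_stat, p_value = stats.ttest_ind(ma_hovy_f1, lample_f1, equal_var=False)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")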
  5. Experimental Setup
     • NLP tasks
       ◦ POS-tagging, Chunking, Named Entity Recognition, Entity Recognition, Event Detection
     • Models
       ◦ Huang et al., 2015; Ma and Hovy, 2016; Lample et al., 2016
     • Optimizers
       ◦ SGD, Adagrad, Adadelta, RMSProp, Adam, and Nadam (an Adam variant that incorporates Nesterov momentum)
     • Gradient clipping and normalization
     • Tagging schemes (a conversion sketch follows this slide)
       ◦ BIO, IOBES
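On the last point: BIO marks only where a chunk begins, while IOBES additionally marks where a chunk ends (E-) and single-token chunks (S-). A minimal conversion sketch, my own illustration rather than code from the paper:

    def bio_to_iobes(tags):
        """Convert a BIO tag sequence to the richer IOBES scheme."""
        iobes = []
        for i, tag in enumerate(tags):
            nxt = tags[i + 1] if i + 1 < len(tags) else "O"
            if tag.startswith("B-"):
                # A chunk continues only if the next tag is I- of the same type.
                iobes.append(tag if nxt == "I-" + tag[2:] else "S-" + tag[2:])
            elif tag.startswith("I-"):
                iobes.append(tag if nxt == "I-" + tag[2:] else "E-" + tag[2:])
            else:
                iobes.append("O")
        return iobes

    print(bio_to_iobes(["B-PER", "I-PER", "O", "B-LOC"]))
    # ['B-PER', 'E-PER', 'O', 'S-LOC']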
  6. Experimental Setup
     • Pre-trained word embeddings (a loading sketch follows this slide)
       ◦ Google News embeddings (G. News)
       ◦ the Bag of Words embeddings (Le. BoW)
       ◦ the dependency-based embeddings (Le. Dep.)
       ◦ GloVe embeddings
         ▪ Wikipedia 2014 + Gigaword 5 (GloVe1 with 100 dimensions and GloVe2 with 300 dimensions)
         ▪ Common Crawl (GloVe3)
       ◦ the Komninos and Manandhar (2016) embeddings (Komn.)
       ◦ the Bojanowski et al. (2016) embeddings (FastText)
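For context, GloVe-style vectors ship as plain text files with one token followed by its floating-point components per line, and models consume them through a lookup table. A minimal loading sketch; the file name is the public 100-dimensional Wikipedia 2014 + Gigaword 5 release (GloVe1 on the slide).

    import numpy as np

    def load_glove(path):
        """Read a GloVe-style text file into a word -> vector dict."""
        embeddings = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                token, *values = line.rstrip().split(" ")
                embeddings[token] = np.asarray(values, dtype=np.float32)
        return embeddings

    vectors = load_glove("glove.6B.100d.txt")
    print(vectors["the"].shape)  # (100,)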
  7. Experimental Setup
     • Dropout
       ◦ 0.05, 0.1, 0.25, 0.5
     • Classifier
       ◦ Softmax, CRF
     • Number of LSTM layers
       ◦ 1, 2, or 3
     • Number of recurrent units
       ◦ 25, 50, 75, 100, 125
     • Mini-batch sizes
       ◦ 1, 8, 16, 32, and 64
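Taken together, slides 5 through 7 span a large hyperparameter space. The slides do not say whether the full cross-product was evaluated, but enumerating it shows the scale of the study; the key names here are my own, chosen to mirror the slides.

    from itertools import product

    grid = {
        "optimizer": ["SGD", "Adagrad", "Adadelta", "RMSProp", "Adam", "Nadam"],
        "tagging_scheme": ["BIO", "IOBES"],
        "dropout": [0.05, 0.1, 0.25, 0.5],
        "classifier": ["Softmax", "CRF"],
        "lstm_layers": [1, 2, 3],
        "recurrent_units": [25, 50, 75, 100, 125],
        "batch_size": [1, 8, 16, 32, 64],
    }

    configs = [dict(zip(grid, combo)) for combo in product(*grid.values())]
    print(len(configs))  # 6 * 2 * 4 * 2 * 3 * 5 * 5 = 7200 configurations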