
Paper Introduction

katsutan
February 23, 2018


Master's Year 1, Session 1
Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging


  1. Paper Introduction (Nagaoka University of Technology, Natural Language Processing Lab, 勝田 哲弘)
     Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging
     Nils Reimers and Iryna Gurevych, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 338–348.
  2. Introduction
     • Deep neural networks have recently become state of the art for a wide range of NLP tasks
       ◦ Sequence tagging (Ma and Hovy, 2016)
       ◦ Dependency parsing (Andor et al., 2016)
       ◦ Machine translation (Wu et al., 2016)
     • Training depends on random factors: weight initialization, shuffling of the training data, and dropout
       ◦ The NER model of Ma and Hovy (2016) varies between 89.99% and 91.00% F1 (p < 10⁻⁴) depending on the seed
     • This work investigates the impact of the random seed on
       ◦ POS tagging, Chunking, Named Entity Recognition, Entity Recognition, and Event Detection
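The seed dependence described above can be made concrete with a sketch: run the same training procedure under several seeds and report the resulting score distribution rather than a single number. Here `train_and_eval` is a hypothetical stand-in for a full training run, drawing toy scores from the range observed for Ma and Hovy's model.

```python
import random
import statistics

def train_and_eval(seed):
    # Hypothetical stand-in for one full training run: a real version would
    # seed weight initialization, data shuffling and dropout, train the
    # tagger, and return its test-set F1.  Here we just draw a toy score
    # from the range Ma and Hovy's model was observed to span.
    rng = random.Random(seed)
    return 89.99 + rng.random() * (91.00 - 89.99)

scores = [train_and_eval(seed) for seed in range(10)]

# Report the distribution, not a single lucky run.
print(f"mean F1 {statistics.mean(scores):.2f}, "
      f"stdev {statistics.stdev(scores):.2f}, "
      f"range [{min(scores):.2f}, {max(scores):.2f}]")
```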
  3. Impact of Randomness in the Evaluation of Neural Networks
     • Reported NER results
       ◦ Ma and Hovy (2016): 91.21% (Bi-directional LSTM-CNNs-CRF)
       ◦ Lample et al. (2016): 90.94% (Bi-directional LSTM-LSTM-CRF)
     • Ma and Hovy appear to be state of the art
  4. Impact of Randomness in the Evaluation of Neural Networks
     • When the models are re-run, Lample et al. (2016)'s Bi-directional LSTM-LSTM-CRF actually performed better
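The flip on this slide can be sketched: given per-seed scores for the two models (illustrative numbers only, not the paper's measurements), a comparison based on one seed can contradict the comparison of the distribution means.

```python
import statistics

# Hypothetical per-seed test F1 scores for the two NER architectures
# (illustrative numbers only, not the paper's measurements).
ma_hovy = [90.2, 90.6, 90.9, 91.2, 90.4, 90.8]  # Bi-directional LSTM-CNNs-CRF
lample  = [90.5, 90.9, 91.0, 90.7, 91.1, 90.6]  # Bi-directional LSTM-LSTM-CRF

# A single lucky or unlucky seed can decide the comparison:
single_run_winner = "Ma and Hovy" if ma_hovy[3] > lample[3] else "Lample"

# Comparing the full distributions tells a different story:
mean_winner = "Ma and Hovy" if statistics.mean(ma_hovy) > statistics.mean(lample) else "Lample"

print(single_run_winner, "vs", mean_winner)
```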
  5. Experimental Setup
     • NLP tasks
       ◦ POS tagging, Chunking, Named Entity Recognition, Entity Recognition, Event Detection
     • Models
       ◦ Huang et al., 2015; Ma and Hovy, 2016; Lample et al., 2016
     • Optimizers
       ◦ SGD, Adagrad, Adadelta, RMSProp, Adam, and Nadam (an Adam variant that incorporates Nesterov momentum)
     • Gradient clipping and normalization
     • Tagging schemes
       ◦ BIO, IOBES
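The two tagging schemes above encode the same spans differently; a minimal sketch of the standard BIO-to-IOBES re-encoding (assuming well-formed BIO input):

```python
def bio_to_iobes(tags):
    """Re-encode a well-formed BIO tag sequence as IOBES:
    single-token spans become S-, span-final tokens become E-."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        prefix, label = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        span_continues = nxt == "I-" + label
        if prefix == "B":
            out.append(("B-" if span_continues else "S-") + label)
        else:  # "I": inside a span started earlier
            out.append(("I-" if span_continues else "E-") + label)
    return out

print(bio_to_iobes(["B-PER", "I-PER", "O", "B-LOC"]))
# ['B-PER', 'E-PER', 'O', 'S-LOC']
```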
  6. Experimental Setup
     • Pre-trained word embeddings
       ◦ Google News embeddings (G. News)
       ◦ The bag-of-words embeddings (Le. BoW)
       ◦ The dependency-based embeddings (Le. Dep.)
       ◦ GloVe embeddings
         ▪ Wikipedia 2014 + Gigaword 5 (GloVe1 with 100 dimensions, GloVe2 with 300 dimensions)
         ▪ Common Crawl (GloVe3)
       ◦ The Komninos and Manandhar (2016) embeddings (Komn.)
       ◦ Bojanowski et al. (2016) embeddings (FastText)
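All of these resources are distributed as plain-text word-vector files; a minimal sketch of loading such a file into a lookup table, with a small random vector as the usual out-of-vocabulary fallback (`load_embeddings` and its file-format assumptions are illustrative, not the paper's code):

```python
import random

def load_embeddings(path, vocab, dim):
    """Build an embedding table for `vocab` from a GloVe-style text file
    (one token followed by `dim` floats per line).  Words absent from the
    file keep a small random vector, a common OOV fallback."""
    rng = random.Random(0)
    table = {w: [rng.uniform(-0.25, 0.25) for _ in range(dim)] for w in vocab}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if parts[0] in table:
                table[parts[0]] = [float(x) for x in parts[1:]]
    return table
```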
  7. Experimental Setup
     • Dropout rates
       ◦ 0.05, 0.1, 0.25, 0.5
     • Classifier
       ◦ Softmax, CRF
     • Number of LSTM layers
       ◦ 1–3
     • Number of recurrent units
       ◦ 25, 50, 75, 100, 125
     • Mini-batch sizes
       ◦ 1, 8, 16, 32, 64
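Taken together, the lists on this slide define a sizable search space; enumerating it as a Cartesian product shows how quickly the configuration count grows (a sketch over just this slide's parameters):

```python
from itertools import product

# Hyperparameter grid from this slide (a sketch; the study also varies
# optimizers, embeddings, and tagging schemes).
grid = {
    "dropout": [0.05, 0.1, 0.25, 0.5],
    "classifier": ["softmax", "crf"],
    "lstm_layers": [1, 2, 3],
    "recurrent_units": [25, 50, 75, 100, 125],
    "batch_size": [1, 8, 16, 32, 64],
}

# One dict per configuration, pairing each parameter name with one value.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(configs))  # 4 * 2 * 3 * 5 * 5 = 600 configurations
```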