
Literature Review: Why Self-Attention? A Targeted Evaluation...

Yumeto Inaoka
December 12, 2018

Literature Review: Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

Presented at the literature review session on 2018/12/12.


Transcript

  1. Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

    Literature review, 2018/12/12. Natural Language Processing Laboratory, Nagaoka University of Technology. Yumeto Inaoka
  2. Literature

    • Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures.
    • Gongbo Tang, Mathias Müller, Annette Rios, Rico Sennrich.
    • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4263-4272, 2018.
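  3. Abstract

    • Convolution and self-attention outperform RNNs ← supported only by theory; the causes have not been tested
    • Properties are examined on two tasks: subject-verb agreement (capturing long-range dependencies) and word sense disambiguation, WSD (extracting semantic features)
    • Confirmed that RNNs are better at capturing long-range dependencies, while self-attention is far better at WSD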
  4. Transformer-based NMT

    • Uses self-attention → each token is connected directly to the other tokens
    • Has multiple attention heads
    • The path length in the network between any two tokens is 1
    • Introduces positional embeddings, as in the CNN
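    The "path length 1" property can be illustrated with a minimal sketch of single-head scaled dot-product self-attention. This is not the model from the paper: it assumes identity projections (queries, keys and values are the input vectors themselves), omits multiple heads and positional embeddings, and the function names `softmax` and `self_attention` are chosen here for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head scaled dot-product self-attention.

    x: list of n token vectors, each a list of d floats.
    Returns (outputs, weights); weights[i][j] is how much token i attends to token j.
    Assumption: no learned projections, so query = key = value = input vector.
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Score the query against every token in the sentence, in one step.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted average of all token vectors.
        outputs.append([sum(w_j * v[c] for w_j, v in zip(w, x)) for c in range(d)])
    return outputs, weights

# Toy input: three tokens with two-dimensional vectors (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

    Every row of `weights` assigns a non-zero weight to every token, so information from any position reaches any other position within a single layer, regardless of distance.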
  5. Datasets

    • WMT17 shared task
      • 5.9 million sentence pairs in the training set
      • newstest2013 as the validation set
      • newstest2014 & newstest2017 as the test sets
    • Lingeval97
      • 97,000 English→German contrastive translation pairs
      • Using 35,105 instances which include subject-verb agreement
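    Contrastive translation pairs are usually scored by checking whether the model assigns a higher score to the reference translation than to each contrastive variant. The sketch below shows that accuracy computation under stated assumptions: `contrastive_accuracy` and `toy_score` are hypothetical names, the scores are made-up stand-ins for an NMT model's log-probabilities, and the sentences are invented examples, not items from Lingeval97.

```python
def contrastive_accuracy(instances, score):
    """Share of instances where the reference outscores every contrastive variant.

    instances: list of (source, reference, contrastive_translations) tuples.
    score: callable (source, translation) -> float, higher means more probable.
           Hypothetical interface; a real setup would query an NMT model.
    """
    correct = 0
    for source, reference, contrastives in instances:
        ref_score = score(source, reference)
        if all(ref_score > score(source, c) for c in contrastives):
            correct += 1
    return correct / len(instances)

# Made-up scores standing in for model log-probabilities.
toy_scores = {
    "Die Hunde bellen .": -1.0,
    "Die Hunde bellt .": -3.0,
    "Der Hund bellt .": -2.5,
    "Der Hund bellen .": -2.0,
}

def toy_score(source, translation):
    # Ignores the source; a real scorer would condition on it.
    return toy_scores[translation]

# Invented subject-verb agreement examples (reference first, then contrastive variants).
instances = [
    ("The dogs bark .", "Die Hunde bellen .", ["Die Hunde bellt ."]),
    ("The dog barks .", "Der Hund bellt .", ["Der Hund bellen ."]),
]
accuracy = contrastive_accuracy(instances, toy_score)
```

    With these toy scores the first instance is judged correctly and the second is not, so `accuracy` is 0.5. The same function covers instances that come with several contrastive translations.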
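  6. RNN vs. Transformer

    • [Tran et al. 2018] show that the Transformer performs worse than LSTMs on the subject-verb agreement task
      → Experiment with the amount of training data reduced to a matching small size
      → Experiment with matching parameters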
  7. Datasets

    • ContraWSD
    • German→English
      • 84 different German word senses
      • 7,200 lexical ambiguities
      • 3.5 contrastive translations on average
    • German→French
      • 71 different German word senses
      • 6,700 lexical ambiguities
      • 2.2 contrastive translations on average
  8. Results

    • The Transformer performs best → it acts as a strong semantic feature extractor
    • TransRNN has lower accuracy than the Transformer → WSD takes place not only in the encoder but also in the decoder

    *TransRNN is a hybrid model with a Transformer encoder and an RNN decoder.
    *uedin-wmt17 is a model that achieved the best result in DE→EN