Paper introduction: Improving Back-Translation with Uncertainty-based Confidence Estimation

Improving Back-Translation with Uncertainty-based Confidence Estimation
Shuo Wang, Yang Liu, Chao Wang, Huanbo Luan, Maosong Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 791-802, Hong Kong, 2019.

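Abstract
- Proposes a method that incorporates “model uncertainty” into training of the forward model in Back-Translation (trained on pseudo data plus original data)
- Confirms improved performance over the baselines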

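Introduction
- Back-Translation is effective for low-resource machine translation
- Pseudo data generated by a backward NMT model trained on limited data inevitably contains noise
- The paper uses “model uncertainty” to mitigate this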

Method
- Defines “model uncertainty” at two levels
  - Word-level
  - Sentence-level

Method
- Word-level
  - Applied to the attention weights
  - Adjusts attention so that erroneous words receive less weight
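As a rough illustration of the word-level idea, the sketch below scales attention weights by a per-token confidence and renormalizes each row. The helper name `confidence_weighted_attention`, the multiplicative scaling, and the renormalization are assumptions made for illustration; the slide only states that confidence is applied to the attention weights so that erroneous words receive less attention.

```python
import numpy as np

def confidence_weighted_attention(attn, confidence, eps=1e-9):
    """Scale attention over source tokens by per-token confidence.

    attn:       (target_len, source_len) attention weights, rows sum to 1
    confidence: (source_len,) values in [0, 1]; low = likely erroneous token
    Hypothetical helper: multiplicative scaling plus renormalization is an
    assumption, not necessarily the paper's exact formulation.
    """
    attn = np.asarray(attn, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    scaled = attn * confidence[None, :]          # down-weight uncertain tokens
    return scaled / (scaled.sum(axis=1, keepdims=True) + eps)

attn = np.array([[0.5, 0.5], [0.2, 0.8]])
confidence = np.array([1.0, 0.25])               # second token is suspicious
adjusted = confidence_weighted_attention(attn, confidence)
```

In this toy case the second source token loses attention mass in both rows, and the mass moves to the more trusted first token.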

Method
- Sentence-level
  - Used in the loss computation (m: original data, n: pseudo data)
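One way to read the sentence-level idea is as a weighted training loss, sketched below. It takes m and n to be the numbers of original and pseudo sentence pairs, gives original pairs weight 1, and weights each pseudo pair by its sentence-level confidence. The function name `confidence_weighted_loss` and this exact weighting scheme are assumptions; the loss formula itself is not reproduced in the slide text.

```python
import numpy as np

def confidence_weighted_loss(nll_authentic, nll_synthetic, sent_confidence):
    """Total loss over m original and n pseudo sentence pairs.

    nll_authentic:   (m,) negative log-likelihoods of original pairs
    nll_synthetic:   (n,) negative log-likelihoods of pseudo pairs
    sent_confidence: (n,) sentence-level confidence in [0, 1]
    Assumption: original pairs have weight 1, pseudo pairs are weighted
    by their confidence.
    """
    nll_authentic = np.asarray(nll_authentic, dtype=float)
    nll_synthetic = np.asarray(nll_synthetic, dtype=float)
    sent_confidence = np.asarray(sent_confidence, dtype=float)
    return nll_authentic.sum() + (sent_confidence * nll_synthetic).sum()

# Two original pairs, two pseudo pairs; the first pseudo pair is half-trusted.
loss = confidence_weighted_loss([2.0, 1.0], [4.0, 3.0], [0.5, 1.0])
```

With this weighting, a noisy pseudo sentence contributes less gradient than a clean one, while original data is left untouched.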

Method
- Monte Carlo Dropout: generate (sample) K times with different weights
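The sampling step can be sketched with a toy network. Monte Carlo Dropout keeps dropout active at inference time, so each of the K forward passes uses a different random mask, which acts like a different set of effective weights. The two-layer network, its sizes, and the name `mc_dropout_probs` are hypothetical stand-ins for the backward NMT model.

```python
import numpy as np

def mc_dropout_probs(x, w_hidden, w_out, target, k=10, p_drop=0.1, seed=0):
    """Return K probabilities of `target` from K stochastic forward passes."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(k):
        hidden = np.tanh(x @ w_hidden)
        # Dropout stays ON at inference time: a fresh mask per pass
        # corresponds to sampling a different set of effective weights.
        mask = rng.random(hidden.shape) >= p_drop
        hidden = hidden * mask / (1.0 - p_drop)
        logits = hidden @ w_out
        exp = np.exp(logits - logits.max())      # numerically stable softmax
        probs.append(exp[target] / exp.sum())
    return np.array(probs)

# Toy input and weights (random, for illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=4)
w_hidden = rng.normal(size=(4, 8))
w_out = rng.normal(size=(8, 5))
samples = mc_dropout_probs(x, w_hidden, w_out, target=2, k=10)
```

The resulting K values form an empirical distribution over the translation probability, from which statistics such as the mean and variance can be computed.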

Method
- How to turn the probability distribution into an uncertainty measure?
  - Predicted translation probability (PTP)
  - Expected translation probability (EXP)
  - Variance of translation probability (VAR)
  - Combination of expectation and variance (CEV)

Method
- Predicted translation probability (PTP)
  - Uses the probability as is
- Expected translation probability (EXP)
  - The expectation
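The difference between PTP and EXP can be shown with a small numeric sketch. All numbers below are made up for illustration: `ptp` stands for the probability from a single deterministic forward pass, and `sampled_probs` for K = 5 probabilities of the same word obtained with dropout active.

```python
import numpy as np

# Hypothetical values, not taken from the paper.
ptp = 0.90                                            # single deterministic pass
sampled_probs = np.array([0.85, 0.40, 0.90, 0.35, 0.80])  # K = 5 dropout passes

# EXP: Monte Carlo estimate of the expected translation probability.
exp_conf = sampled_probs.mean()
```

Here the single-pass probability looks confident, while the average over dropout samples is noticeably lower, which is the kind of disagreement the expectation is meant to expose.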

Method
- Variance of translation probability (VAR)
  - The variance
  - Uses the value subtracted from 1 (α = 2)
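A minimal sketch of the variance-based measure follows, using the same kind of made-up dropout samples. Because a high variance means low confidence, the value is subtracted from 1. How the hyperparameter α enters the formula is not shown in the slide text, so it is deliberately left out here rather than guessed.

```python
import numpy as np

# Hypothetical K = 5 probabilities of one word under dropout sampling.
sampled_probs = np.array([0.85, 0.40, 0.90, 0.35, 0.80])

variance = sampled_probs.var()     # population variance over the K samples
var_conf = 1.0 - variance          # higher variance -> lower confidence
```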

Method
- Combination of expectation and variance (CEV)
  - Combines the variance and the expectation (β = 2)

Experiment
1. Comparison of uncertainty measures (PTP vs EXP vs VAR vs CEV)
2. Comparison of uncertainty across granularities (word vs sentence)
3. Comparison with baselines

Experiment
- Model: Transformer
- Evaluation: BLEU
- Data

                  train          dev         test               back-translation
Chinese-English   LDC (1.25M)    NIST06      NIST02-05, 08      WMT17 (English, 10M)
English-German    WMT14 (4.47M)  newstest13  newstest12, 14-15  NewsCrawl12 (German, 4.5M)

Result 1. Comparison of uncertainty measures (PTP vs EXP vs VAR vs CEV)
- CEV (the combination of expectation and variance) is the most effective, so CEV is used from here on

Result 2. Comparison of uncertainty across granularities (word vs sentence)
- Using both word-level and sentence-level uncertainty gives the best result

Result 3. Comparison with baselines (Chinese-English)
- None: no back-translation
- U: uses uncertainty (CEV, word + sentence)
- Search: back-translation with beam search
- Sample: back-translation with sampling

Result 3. Comparison with baselines (English-German)
- N: uses Quality Estimation results from OpenKiwi

Conclusion
- Proposes a method that incorporates “model uncertainty” into training
  - Word-level
  - Sentence-level
- Improves the performance of NMT with Back-Translation

Reference
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (ICML 2016) https://arxiv.org/abs/1506.02142
- ベイジアン・ディープラーニングによる安全なAIの実現 (Realizing Safe AI with Bayesian Deep Learning) https://qiita.com/takaaki5564/items/5ed89541d8d2a4725baa#

Taichi Aida
