Slide 1

Slide 1 text

Confidence Modeling for Neural Machine Translation
Taichi Aida, Kazuhide Yamamoto
Nagaoka University of Technology
IALP2019

Slide 2

Slide 2 text

Introduction
- Neural Machine Translation (NMT) systems are widely used
- NMT outputs vary widely in quality
- Some outputs may be mistranslations
➢ Can we estimate the quality of a sentence during generation?

Slide 3

Slide 3 text

Introduction
- Our goal is to output only high-quality translations
- Low-quality translations are NOT output
  ➔ number of input sentences > number of output sentences
- The translations that are output are reliable
- This helps translators

Slide 4

Slide 4 text

Methods
Figure 1. Overview of proposed method

Slide 5

Slide 5 text

Methods
Figure 1. Overview of proposed method

Slide 6

Slide 6 text

Methods
Figure 1. Overview of proposed method

Slide 7

Slide 7 text

Methods
- Indices
  - Sentence log-likelihood
  - Average variance

Slide 8

Slide 8 text

Sentence log-likelihood

Slide 9

Slide 9 text

Sentence log-likelihood
- The sum of the log-probabilities of all output words in a sentence
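A minimal sketch of this index, assuming the decoder exposes the probability it assigned to each output word. The function name and the `token_probs` input are hypothetical illustrations, not part of fairseq or of the authors' code.

```python
import math


def sentence_log_likelihood(token_probs):
    """Sum the log-probabilities of every output word in one sentence.

    token_probs: hypothetical list of model probabilities, one per output word.
    """
    if not token_probs:
        raise ValueError("sentence has no output words")
    return sum(math.log(p) for p in token_probs)


# Hypothetical probabilities for a three-word output sentence.
example_probs = [0.9, 0.8, 0.5]
score = sentence_log_likelihood(example_probs)  # equals log(0.9 * 0.8 * 0.5)
```

The value is always zero or negative, and values closer to zero mean the model was more confident in the words it chose.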

Slide 10

Slide 10 text

Average variance

Slide 11

Slide 11 text

Average variance
- At each position in the sentence, the top-5 candidates are used to calculate the variance from the top candidate
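The slide does not give the exact formula, so the following is only one possible reading. It assumes "variance from top" means the mean squared difference between the top candidate's probability and the other top-k candidates at each position, averaged over all positions. The function name and input layout are hypothetical. A plain variance around the mean of the top-k probabilities would be another reasonable reading.

```python
def average_variance(candidate_probs_per_position, k=5):
    """Average, over output positions, of the spread below the top candidate.

    ASSUMPTION: the spread at one position is the mean squared difference
    between the best candidate's probability and the other top-k candidates.
    candidate_probs_per_position: hypothetical list with one list of
    candidate probabilities per output position.
    """
    if not candidate_probs_per_position:
        raise ValueError("sentence has no output positions")
    per_position = []
    for probs in candidate_probs_per_position:
        top = sorted(probs, reverse=True)[:k]
        if len(top) < 2:
            raise ValueError("need at least two candidates per position")
        best, rest = top[0], top[1:]
        per_position.append(sum((best - p) ** 2 for p in rest) / len(rest))
    return sum(per_position) / len(per_position)


# Hypothetical top-5 probabilities: one peaked position, one flat position.
peaked = [0.9, 0.05, 0.02, 0.02, 0.01]
flat = [0.3, 0.25, 0.2, 0.15, 0.1]
confident = average_variance([peaked])
uncertain = average_variance([flat])
```

Under this reading, a peaked distribution (one candidate far ahead of the rest) gives a larger value than a flat one, so a larger value indicates higher confidence.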

Slide 12

Slide 12 text

Experiments
1. Appropriateness of indices
   - Correlation with BLEU (the Pearson correlation coefficient)
2. Using a threshold
   - While changing the threshold, we measure:
     - the number of output sentences
     - the average BLEU of the output sentences
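For the first experiment, the Pearson correlation coefficient between an index and BLEU can be sketched as follows. The input lists are made-up values for illustration only, and the use of one BLEU score per sentence is an assumption about how the pairing is done.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up index values and per-sentence BLEU scores (not experimental data).
index_values = [-1.0, -2.0, -3.0, -4.0]
bleu_scores = [40.0, 30.0, 20.0, 10.0]
rho = pearson(index_values, bleu_scores)
```

A coefficient near 1 would mean the index ranks sentences almost exactly as BLEU does, while a value near 0 would mean the index says little about translation quality.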

Slide 13

Slide 13 text

Experiments
- Model: Transformer (fairseq)
- Data: ASPEC-JE (translating scientific papers)

Split        Sentences
Train        1,000,000
Validation   1,790
Test         1,812

Slide 14

Slide 14 text

Results
1. Appropriateness of indices: correlation with BLEU

Indices                    ρ(index, BLEU)
Sentence log-likelihood    0.308
Average variance           0.268

Slide 15

Slide 15 text

Results
2. Using threshold
- Threshold = 0.0: 1812 sentences, BLEU 22.11
- Threshold = 0.9: 13 sentences, BLEU 43.45
Figure 5(a) Sentence log-likelihood

Slide 16

Slide 16 text

Results
2. Using threshold
- Threshold = 0.0: 1812 sentences, BLEU 22.11
- Threshold = 0.2: 98 sentences, BLEU 33.12
Figure 5(b) Average variance
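The thresholding step behind these results can be sketched as a simple filter. This assumes each translation already has a confidence value scaled so that higher means more reliable and 0.0 keeps everything. The slides do not state how the indices are normalized, so the scale, the `>=` comparison, and the function name are assumptions.

```python
def filter_by_confidence(translations, confidences, threshold):
    """Keep only the translations whose confidence reaches the threshold.

    ASSUMPTION: confidences are already normalized so that a higher value
    means a more reliable translation.
    """
    if len(translations) != len(confidences):
        raise ValueError("need one confidence value per translation")
    return [t for t, c in zip(translations, confidences) if c >= threshold]


# Made-up translations and confidence values (not experimental data).
outputs = ["translation A", "translation B", "translation C"]
scores = [0.95, 0.40, 0.10]
kept_all = filter_by_confidence(outputs, scores, 0.0)
kept_strict = filter_by_confidence(outputs, scores, 0.9)
```

Raising the threshold trades coverage for reliability: fewer sentences are output, and the reported average BLEU of the remaining sentences rises.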

Slide 17

Slide 17 text

Conclusion
➢ We proposed calculating translation confidence from NMT features
  ○ sentence log-likelihood
  ○ average variance
➢ This confidence can be used to suppress low-quality translations during generation

Slide 18

Slide 18 text

Thank you for listening!