gained widespread attention because of its high translation accuracy. n It shows poor performance in the translation of long sentences, which is a major issue in low- resource languages. n It is assumed that this issue is caused by insufficient number of long sentences in the training data. Ø We proposes a simple data augmentation method to handle long sentences. Ø We combined this method with back-translation. Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation Seiichiro Kondo, Kengo Hotate, Tosho Hirasawa, Masahiro Kaneko, and Mamoru Komachi Tokyo Metropolitan University
[email protected] Method Figure 1. Over view new train data … He lived a hard life. <sep> I can't swim at all. I can't swim at all. <sep> She is my friend. 𝑥! <sep> 𝑥" 𝑥!#" <sep> 𝑥! 彼は⾟い⼈⽣を送った。<sep> 私は全く泳げない。 私は全く泳げない。 <sep> 彼⼥は私の親友だ。 𝑦!#" <sep> 𝑦! 𝑦! <sep> 𝑦" 𝑥": He lived a hard life. 𝑥$: I can't swim at all. 𝑥! 𝑦! … original data 𝑦$: 私は全く泳げない。 𝑥%: She is my friend. 𝑦%: 彼⼥は私の親友だ。 𝑦": 彼は⾟い⼈⽣を送った。 𝑥′" <sep> 𝑥′$ 𝑥′" 𝑦" 𝑥′! 𝑦! 𝑥′% 𝑦% 𝑥′$ 𝑦$ … 𝑥′!#" <sep> 𝑥′! 𝑦!#" <sep> 𝑦! 𝑥′! <sep> 𝑥′" 𝑦! <sep> 𝑦" … 𝑥′$ <sep> 𝑥′% 𝑦$ <sep> 𝑦% 𝑦" <sep> 𝑦$ concatenation back translation concatenation Experiments Setting n We used Transformer model (Vaswani et al., 2017). n Dataset *400,000 sentences were randomly extracted from 2,000,000 sentences. n We also conducted oversampling as baseline. n A single sentence is fed as the input during the test process. n We computed the average of the BLEU scores of three runs with different seeds. n Human evaluation was also conducted. corpus train sent. dev sent. test sent. ASPEC 400,000* 1,790 1,812 The number of samples Sentence length in train data vanilla: original data OS: over sampling concat: concatenation (proposed method) BT: back translation Results n The BLEU score of “vanilla + concat” is more stable when applied for translation with sentence lengths of longer than 51 words. n In particular, the score of the sentence lengths of longer than 41 is significantly improved, which indicates that the proposed method is more effective for long sentence translation. n It is conceivable that back-translation and concatenation has independent factors that improve the accuracy of the translation. n We observed that the outputof the proposed method improved or were compara-ble under almost all conditions. n The translation quality degrades when there are more than 1,000,000 sentences in the training data. length all 1〜10 11〜20 21〜30 31〜40 41〜50 51〜60 61〜70 71〜 sent. 1,812 73 529 600 341 164 74 18 13 vanilla (400K) 26.5 22.9 23.0 26.2 27.1 29.6 28.5 28.8 23.6 + OS (+400K) 25.0 21.8 23.0 24.9 25.7 27.0 27.0 22.5 18.1 + concat (+400K) 26.6 21.4 23.3 25.7 27.5 28.7 29.5 28.2 29.0 + BT (+400K) 28.8 24.3 25.5 28.3 29.5 30.6 31.6 28.7 28.7 +BT +concat (+1.2M) 29.4 25.4 25.6 28.6 30.1 31.5 33.1 29.9 30.1 Figure 2. Distribution of each data set. Example src Results of the analysis shows high accuracy properties, such as the reproducibility of relative standard deviation 0.3~0.9% varified by repetitive analyses of ten times, the clibration curves with correlation coefficient of 1 verified by tests of standard materials in using six kinds of acetonitrile dilute solutions, and the formaldehyde detection limit of 0.0018μg/mL. tgt 結果は,相対標準偏差0.3〜0.9%の再現性(10回の繰返し分析),相関係数1の検量線(6種類のアセトニトリル希釈溶液による標 準資料の検定),0.0018μg/mLのホルムアルデヒド検出限界,など⾼い精度を得た。 vanilla + BT 6種のアセトニトリル希薄溶液を⽤いた標準物質の試験及びホルムアルデヒド検出限界は0.0018μg/mLであった。 vanilla + BT + concat 分析の結果は10回の繰り返し解析で相対標準偏差0.3〜0.9%の再現性,6種のアセトニトリル希薄溶液を⽤いた標準物質の試験によ り検証された1の相関係数を持つクライテリア曲線,及び0.0018μg/mLのホルムアルデヒド検出限界など⾼い精度を⽰した。 adequacy fluency length win tie lose win tie lose 1 ‒ 10 4 5 3 3 7 2 11 ‒ 20 20 39 29 21 47 20 21 ‒ 30 34 35 31 33 42 25 31 ‒ 40 23 21 13 17 24 16 41 ‒ 50 10 6 11 6 10 11 51 ‒ 6 5 6 7 6 4 overall 97 111 93 87 136 78 Table 1. BLEU score Table 2. Human evaluation Figure 3. Effectiveness of the proposed method for each data size. all 51‒60 61‒70 71‒ Length Difference of BLEU score 200K 400K 600K 800K 1M 2M (orig)