
WAT2022_TMU NMT System with Automatic Post-Editing by Multi-Source Levenshtein Transformer for the Restricted Translation Task of WAT 2022

maskcott
October 18, 2022

Transcript

  1. TMU NMT System with Automatic Post-Editing by Multi-Source Levenshtein Transformer

    for the Restricted Translation Task of WAT 2022 Seiichiro Kondo, Mamoru Komachi Tokyo Metropolitan University WAT2022, Restricted Translation task 1
  2. Restricted translation task Setting • An input sentence and restricted

    target vocabularies (RTVs) are given ◦ Each RTV consists of one or more words ◦ The given RTVs are in random order Input この回路は ,入力信号位相の変化により共振周波数がシフトする帰還回路であり,2基のコイルの中央にある物体 の磁気特性の変化を,高い感度と分解能で検出することができる。 RTVs magnetic features, resonance frequency, feedback circuit, resolution, input signal phase Output This is a feedback circuit shifting resonance frequency by change of input signal phase, which can detect change of magnetic features of an object present at a center of two coils on high sensitivity and resolution. 2 Purpose: Generating a sentence that contains all RTVs
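The purpose above amounts to a simple containment test; a minimal sketch in Python (the function name is ours, and RTVs are matched as literal substrings of the detokenized output):

```python
def satisfies_constraints(output: str, rtvs: list[str]) -> bool:
    """Return True only if every restricted target vocabulary (RTV)
    appears verbatim in the translation output."""
    return all(rtv in output for rtv in rtvs)
```

This is the same check that later decides whether LeCA's output needs post-editing.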
  3. Our approach to this task Translating an input sentence with

    consideration of RTVs • Lexical-Constraint-Aware NMT (LeCA) [Chen+, 2020] ◦ Constraints can be satisfied using grid beam search, but it is computationally time-consuming 3 Post-processing for translations that do not include all RTVs 1. Sorting RTVs using fasttext [Bojanowski+, 2017] 2. Automatic post-editing of the translation using a multi-source Levenshtein transformer (MSLevT) [Wan+, 2020]
  4. Overview of our approach 4

    [Figure: the input sentence and RTVs are concatenated and fed to LeCA. If LeCA's output does not satisfy the constraints, the RTVs are sorted and used as the decoder's initial tokens, while the input sentence and LeCA's output each feed one of MSLevT's two encoders; the decoder then produces the output sentence.]
  5. Lexical-Constraint-Aware NMT (LeCA) 5

    Information of RTVs ◦ Using positional embeddings after the maximum length (1024); positions restart at 1024 for each constraint ◦ Adding embeddings to encourage copying from the input text; token, segment, and positional embeddings are summed by the Transformer encoder, where source tokens get segment id 0 and the k-th constraint, prefixed with the special token <sep>, gets segment id k [Figure: example encoder input ▁画像 ▁符 号 ▁器 ▁の ▁構成 ▁例 ▁を ▁示 し ▁た <sep> ▁configuration ▁example <sep> ▁image ▁en co der, with its token, segment, and positional embedding rows being added]
  6. Sorting RTVs using fasttext Sorting steps 1. Get embeddings of

    each word of LeCA's output and of the RTVs 2. Compute the cosine similarity between each RTV and each word of LeCA's output 3. Assign each RTV to the word of LeCA's output with the highest similarity 4. Define the order of the RTVs based on their assigned positions 6 Two irregular cases • An RTV consists of two or more words • Some RTVs are assigned to the same word of LeCA's output
  7. Sorting RTVs using fasttext When an RTV consists of two or

    more words • Summarize LeCA's output for each n-gram by taking the average of the word embeddings • The first word of the n-gram (word block) serves as the representative word 7 [Figure: for the bi-gram case, LeCA's output "DERS can calculate new a dose rate corresponding to these changes ." is summarized per bi-gram; the RTV "dose rate" is compared against the averaged embedding (dose + rate) / 2, with the first word of each bi-gram as its representative]
  8. When some RTVs are assigned to the same word of

    LeCA's output • The RTV with the higher cosine similarity is assigned first • The discarded RTV is then assigned to its next highest-ranking word [Figure: LeCA's output また , 流速 や 体積 の 変化 も 検出 できる こと を 確認 した 。 (And, it was confirmed to enable also to detect change of flow rate and volume.) with RTVs 流量 (volume), 流速 (flow rate), 検出 (detect); each annotation gives the ranking of the cosine similarity of the RTV with respect to LeCA's output / the cosine similarity, e.g. 1 / 1.0, 1 / 0.54, 2 / 0.51] Sorting RTVs using fasttext 8 (Assuming LeCA is able to translate to some extent.)
  9. Overview of MSLevT Architecture • Two encoders ◦ Input sentence

    ◦ LeCA’s output 9 [Figure: the input sentence and LeCA’s output each feed one encoder; a single decoder produces the output sentence] • One decoder ◦ LevT, with RTVs as initial tokens
  10. Details of MSLevT’s Decoder 10

    • Repeating 3 operations ◦ Delete / Insert / Replace → Modify sentences in 3 steps • Initial tokens: RTVs • Forbidden operations ◦ Deleting RTVs ◦ Inserting words into an RTV → forces the RTVs to appear in the output text [Figure: starting from the RTV canvas <s>▁流 速 ▁流 量 ▁検出</s>, the Deletion Classifier, Placeholder Classifier (inserting <PLH> tokens, e.g. <s><PLH> <PLH> ▁流 速 <PLH> ▁流 量 <PLH> <PLH> <PLH> ▁検出 <PLH> …</s>), and Token Classifier produce <s>▁また ▁, ▁流 速 ▁や ▁流 量 ▁の ▁変化 ▁も ▁検出 ▁できる ▁こと ▁を ▁確認 ▁し ▁た ▁。</s>]
  11. Experimental setting Original data • ASPEC 11 ◦ train:

    2,000,000 sents ◦ valid: 1,790 sents ◦ test: 1,812 sents Distilled data • When training MSLevT, we also use LeCA’s outputs instead of the original target sentences
  12. Experimental setting Subword • En: mosestokenizer → SentencePiece • Ja:

    mecab-ipadic-NEologd → SentencePiece Implementation • LeCA [Chen+, 2020] (Transformer big) • fasttext [Bojanowski+, 2017] • MSLevT [Wan+, 2020] 12
  13. Evaluation • BLEU score • Consistency score (CS) ◦ Percentage of

    translations that contain all RTVs in the test data • Final score (FS) ◦ Combining the BLEU and consistency scores ◦ Translations not satisfying the constraints are replaced with an empty string before calculating the BLEU score 13
  14. Result 14

                                     En → Ja               Ja → En
                                     BLEU   CS     FS      BLEU   CS     FS
    LeCA                             52.0   0.805  36.0    39.0   0.719  19.6
    MSLevT                           35.8   1.000  35.8    32.6   1.000  32.6
    MSLevT (distil + orig)           44.4   1.000  44.4    39.4   1.000  39.4
    LeCA + MSLevT                    50.1   1.000  50.1    39.3   1.000  39.3
    LeCA + MSLevT (distil + orig)    50.5   1.000  50.5    39.3   1.000  39.3
  15. Inference time 15

                     Beam size   En → Ja                   Ja → En
                                 sec/sent  ratio   BLEU    sec/sent  ratio   BLEU
    LeCA                 5       0.094     ×1.00   52.0    0.099     ×1.00   39.0
    LeCA                30       0.221     ×2.35   52.1    0.228     ×2.30   38.9
    LeCA + MSLevT        5       0.115     ×1.22   50.1    0.126     ×1.27   39.3
  16. Conclusions We introduced an automatic post-editing approach for the restricted

    translation task of WAT 2022 • We succeeded in generating sentences that include all RTVs while keeping LeCA’s BLEU score • Our proposed method also generates translations faster than LeCA with a larger beam size 16