
WAT2022_TMU NMT System with Automatic Post-Editing by Multi-Source Levenshtein Transformer for the Restricted Translation Task of WAT 2022

maskcott

October 18, 2022

Transcript

  1. TMU NMT System with Automatic Post-Editing by Multi-Source Levenshtein Transformer

    for the Restricted Translation Task of WAT 2022. Seiichiro Kondo, Mamoru Komachi, Tokyo Metropolitan University. WAT2022, Restricted Translation task
  2. Restricted translation task Setting • An input sentence and restricted

    target vocabularies (RTVs) are given ◦ Each RTV consists of one or more words ◦ The given RTVs are in random order
    Input: この回路は,入力信号位相の変化により共振周波数がシフトする帰還回路であり,2基のコイルの中央にある物体の磁気特性の変化を,高い感度と分解能で検出することができる。
    RTVs: magnetic features, resonance frequency, feedback circuit, resolution, input signal phase
    Output: This is a feedback circuit shifting resonance frequency by change of input signal phase, which can detect change of magnetic features of an object present at a center of two coils on high sensitivity and resolution.
    Purpose: Generating a sentence that contains all RTVs
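
    As a concrete reading of the task objective, a minimal constraint check might look like the sketch below (plain substring matching over the detokenized output; the official scoring may instead operate on tokenized text).

        def satisfies_constraints(hypothesis: str, rtvs: list[str]) -> bool:
            # A translation is acceptable only if every RTV appears in it.
            return all(rtv in hypothesis for rtv in rtvs)

        rtvs = ["magnetic features", "resonance frequency", "feedback circuit",
                "resolution", "input signal phase"]
        output = ("This is a feedback circuit shifting resonance frequency by change of "
                  "input signal phase, which can detect change of magnetic features of an "
                  "object present at a center of two coils on high sensitivity and resolution.")
        assert satisfies_constraints(output, rtvs)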
  3. Our approach to this task Translating an input sentence with

    consideration of RTVs • Lexical-Constraint-Aware NMT (LeCA) [Chen+, 2020] ◦ Constraints can be satisfied using grid beam search, but it is computationally time-consuming Post-processing for translations that do not include all RTVs: 1. Sorting RTVs using fasttext [Bojanowski+, 2017] 2. Automatic post-editing of the translation using a multi-source Levenshtein transformer (MSLevT) [Wan+, 2020]
  4. Overview of our approach

    [Pipeline diagram] The input sentence and the RTVs are concatenated and translated by LeCA. If LeCA's output does not satisfy the constraints, the RTVs are sorted and used as the initial tokens for MSLevT; its two encoders read the input sentence and LeCA's output, and its decoder generates the final output sentence.
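
    The flow above can be summarized in a few lines of pseudocode; leca_translate, sort_rtvs, and mslevt_post_edit are hypothetical wrappers around the trained models, not actual function names from the system.

        def translate(src: str, rtvs: list[str]) -> str:
            hyp = leca_translate(src, rtvs)        # constrained NMT draft
            if satisfies_constraints(hyp, rtvs):
                return hyp                         # all RTVs already present
            ordered = sort_rtvs(hyp, rtvs)         # order RTVs by their position in the draft
            # MSLevT reads both the source and LeCA's draft and starts decoding
            # from the sorted RTVs, so the missing constraints end up in the output.
            return mslevt_post_edit(src, hyp, initial_tokens=ordered)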
  5. Lexical-Constraint-Aware NMT (LeCA)

    [Embedding diagram] The encoder input is the source sentence followed by the RTVs, each introduced by a special <sep> token (e.g. the source subwords, then <sep> ▁configuration ▁example, then <sep> ▁image ▁en co der). Token, segment, and positional embeddings are summed: source tokens get segment id 0 and ordinary positions, while each constraint block gets its own segment id (1, 2, ...) and positional embeddings after the maximum length (1024, 1025, 1026, ...). Adding these embeddings carries the information of the RTVs and encourages copying from the input text.
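
    A rough sketch of how the encoder input ids could be assembled, assuming (as the diagram suggests) that each constraint block restarts its positions at the maximum length of 1024 and receives its own segment id; the exact indexing in the released implementation may differ.

        MAX_LEN = 1024  # constraint tokens use positions after the usual maximum length

        def build_leca_input(src_tokens, rtv_subwords):
            """src_tokens: subwords of the source sentence.
            rtv_subwords: one list of subwords per RTV."""
            tokens, segments, positions = [], [], []
            for i, tok in enumerate(src_tokens, start=1):
                tokens.append(tok); segments.append(0); positions.append(i)
            for k, rtv in enumerate(rtv_subwords, start=1):
                tokens.append("<sep>"); segments.append(k); positions.append(MAX_LEN)
                for j, tok in enumerate(rtv, start=1):
                    tokens.append(tok); segments.append(k); positions.append(MAX_LEN + j)
            # Token, segment, and positional embeddings looked up from these ids
            # are summed to form the encoder input.
            return tokens, segments, positions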
  6. Sorting RTVs using fasttext Sorting steps 1. Getting embeddings of

    each word of LeCA's output and of each RTV 2. Calculating the cosine similarity between each RTV and each word of LeCA's output 3. Assigning each RTV to the word of LeCA's output with the highest similarity 4. Defining the order of RTVs based on the assigned positions Two irregular cases • An RTV consists of more than one word • Some RTVs are assigned to the same word of LeCA's output
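
    A minimal sketch of steps 1-4, assuming `embed` wraps a fastText model (e.g. its get_word_vector); the two irregular cases are handled as described on the next two slides.

        import numpy as np

        def cosine(u, v):
            return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

        def sort_rtvs(output_words, rtvs, embed):
            # Assign each RTV to the output word with the highest cosine similarity,
            # then order the RTVs by the position of their assigned word.
            assigned = []
            for rtv in rtvs:
                sims = [cosine(embed(rtv), embed(w)) for w in output_words]
                assigned.append((int(np.argmax(sims)), rtv))
            return [rtv for _, rtv in sorted(assigned)]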
  7. Sorting RTVs using fasttext When an RTV consists of more

    than one word • LeCA's output is summarized into n-grams by taking the average of the word embeddings • The first word in the n-gram (word block) is used as the representative word
    [Example] LeCA's output: "DERS can calculate new a dose rate corresponding to these changes ." For the bi-gram RTV "dose rate", the output is split into bi-gram blocks, each represented by its first word, and each block's embedding is the average of its two word embeddings (e.g. ("dose" + "rate") / 2).
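
    For a multi-word RTV, one way to realize the averaging described above is to compare the mean embedding of the RTV's words against the mean embedding of every n-gram of LeCA's output, indexing each block by its first word (reusing `cosine` and numpy from the previous sketch); this is an illustration, not the authors' exact code.

        def mean_vec(words, embed):
            return sum(embed(w) for w in words) / len(words)

        def best_block(output_words, rtv_words, embed):
            # Slide an n-gram window (n = number of words in the RTV) over the
            # output; each block's representative position is its first word.
            n = len(rtv_words)
            rtv_vec = mean_vec(rtv_words, embed)
            sims = [cosine(rtv_vec, mean_vec(output_words[i:i + n], embed))
                    for i in range(len(output_words) - n + 1)]
            return int(np.argmax(sims))   # index of the representative (first) word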
  8. When some RTVs are assigned to the same words of

    LeCA's output • The RTV with the higher cosine similarity is given priority • The discarded RTV is then assigned to the word with the next highest similarity
    [Example] LeCA's output: また,流速や体積の変化も検出できることを確認した。 (It was confirmed that changes in flow velocity and volume can also be detected.) RTVs: 流速 (flow velocity), 流量 (flow rate), 検出 (detect). Each word of the output is labeled with the rank and cosine similarity of its best-matching RTV: both 流速 and 流量 best match the output word 流速 (similarities 1.0 and 0.54), so 流速 wins and 流量 falls back to its second-ranked output word (similarity 0.51). (Assuming LeCA is able to translate to some extent.)
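
    The priority rule can be sketched as a small greedy loop; `candidates` holds, for every RTV, its output positions ranked by similarity (an illustration of the rule, not the authors' implementation; it assumes LeCA's output has more words than there are RTVs).

        def resolve_collisions(candidates):
            """candidates[rtv]: list of (similarity, output_index), best first."""
            assignment = {}                              # output_index -> (sim, rtv, rank)
            pending = [(rtv, 0) for rtv in candidates]   # (RTV, rank it is currently trying)
            while pending:
                rtv, rank = pending.pop()
                sim, idx = candidates[rtv][rank]
                if idx not in assignment:
                    assignment[idx] = (sim, rtv, rank)
                elif sim > assignment[idx][0]:
                    _, loser, loser_rank = assignment[idx]
                    assignment[idx] = (sim, rtv, rank)       # higher similarity wins
                    pending.append((loser, loser_rank + 1))  # loser retries its next-best word
                else:
                    pending.append((rtv, rank + 1))
            return {rtv: idx for idx, (_, rtv, _) in assignment.items()}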
  9. Overview of MSLevT Architecture • Two encoders ◦ Input sentence

    ◦ LeCA's output • One decoder ◦ LevT, with RTVs as initial tokens
    [Architecture diagram] Input sentence → Encoder; LeCA's output → Encoder; both encoders feed the Decoder, which generates the output sentence.
  10. Details of MSLevT's Decoder • Repeating 3 operations ◦

    Delete / Insert / Replace → the sentence is modified through these 3 steps • Initial tokens: RTVs • Forbidden operations ◦ Deleting RTVs ◦ Inserting words inside an RTV → the RTVs are forced to appear in the output text
    [Decoder diagram] Starting from the sorted RTVs <s> ▁流 速 ▁流 量 ▁検出 </s>, the deletion classifier, placeholder classifier, and token classifier are applied repeatedly: placeholders are inserted around (but never inside) the RTVs, e.g. <s> <PLH> <PLH> ▁流 速 <PLH> ▁流 量 <PLH> <PLH> <PLH> ▁検出 <PLH> ... </s>, and then filled to give <s> ▁また ▁, ▁流 速 ▁や ▁流 量 ▁の ▁変化 ▁も ▁検出 ▁できる ▁こと ▁を ▁確認 ▁し ▁た ▁。 </s>
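
    The constrained refinement loop can be described with illustrative pseudocode; delete_step, placeholder_step, and token_step stand in for MSLevT's deletion, placeholder, and token classifiers and are not real function names.

        def mslevt_decode(src_enc, draft_enc, rtv_tokens, max_iters=10):
            tokens = ["<s>", *rtv_tokens, "</s>"]          # sorted RTVs as the initial state
            is_rtv = [False] + [True] * len(rtv_tokens) + [False]
            for _ in range(max_iters):
                # Delete: positions flagged as RTV tokens may never be deleted.
                tokens, is_rtv = delete_step(tokens, is_rtv, src_enc, draft_enc)
                # Insert placeholders: never between two tokens of the same RTV,
                # so multi-token RTVs stay contiguous.
                tokens, is_rtv = placeholder_step(tokens, is_rtv, src_enc, draft_enc)
                # Replace: fill every placeholder with a predicted token.
                tokens = token_step(tokens, src_enc, draft_enc)
            return tokens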
  11. Experimental setting Original data • ASPEC

    train: 2,000,000 sentences / valid: 1,790 sentences / test: 1,812 sentences
    Distilled data • When training MSLevT, we also use LeCA's outputs instead of the original target sentences
  12. Experimental setting Subword • En: mosestokenizer → SentencePiece • Ja:

    mecab-ipadic-NEologd → SentencePiece Implementation • LeCA [Chen+, 2020] (Transformer big) • fasttext [Bojanowski+, 2017] • MSLevT [Wan+, 2020]
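
    A sketch of this preprocessing pipeline, assuming standard mosestokenizer / MeCab / SentencePiece usage; the SentencePiece model files and the NEologd dictionary path are placeholders, not the ones used for the system.

        import MeCab
        import sentencepiece as spm
        from mosestokenizer import MosesTokenizer

        sp_en = spm.SentencePieceProcessor(model_file="spm_en.model")   # placeholder model
        sp_ja = spm.SentencePieceProcessor(model_file="spm_ja.model")   # placeholder model

        # English: Moses tokenization, then SentencePiece subwords.
        with MosesTokenizer("en") as tok_en:
            en_pieces = sp_en.encode(" ".join(tok_en("This is a feedback circuit.")), out_type=str)

        # Japanese: MeCab with the ipadic-NEologd dictionary, then SentencePiece subwords.
        mecab = MeCab.Tagger("-Owakati -d /path/to/mecab-ipadic-neologd")   # placeholder path
        ja_pieces = sp_ja.encode(mecab.parse("この回路は帰還回路である。").strip(), out_type=str)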
  13. Evaluation • BLEU score • Consistency score (CS) ◦ Percentage of

    translations that contain all RTVs in the test data • Final score (FS) ◦ Combining the BLEU and consistency scores ◦ Translations not satisfying the constraints are replaced with an empty string before calculating the BLEU score
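
    The two task-specific metrics can be computed roughly as follows; sacrebleu is used here only as an illustrative BLEU implementation and may differ from the official scorer.

        import sacrebleu

        def consistency_score(hyps, rtvs_per_sent):
            # Fraction of translations that contain all of their RTVs.
            ok = [all(rtv in hyp for rtv in rtvs) for hyp, rtvs in zip(hyps, rtvs_per_sent)]
            return sum(ok) / len(ok)

        def final_score(hyps, refs, rtvs_per_sent):
            # Translations violating any constraint become empty strings before BLEU.
            filtered = [hyp if all(rtv in hyp for rtv in rtvs) else ""
                        for hyp, rtvs in zip(hyps, rtvs_per_sent)]
            return sacrebleu.corpus_bleu(filtered, [refs]).score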
  14. Result

                                      En → Ja                  Ja → En
                                   BLEU    CS     FS        BLEU    CS     FS
    LeCA                           52.0   0.805   36.0      39.0   0.719   19.6
    MSLevT                         35.8   1.000   35.8      32.6   1.000   32.6
    MSLevT (distil + orig)         44.4   1.000   44.4      39.4   1.000   39.4
    LeCA + MSLevT                  50.1   1.000   50.1      39.3   1.000   39.3
    LeCA + MSLevT (distil + orig)  50.5   1.000   50.5      39.3   1.000   39.3
  15. Inference time

                      Beam size        En → Ja                    Ja → En
                                 sec/sent  ratio  BLEU      sec/sent  ratio  BLEU
    LeCA                   5      0.094   ×1.00   52.0       0.099   ×1.00   39.0
    LeCA                  30      0.221   ×2.35   52.1       0.228   ×2.30   38.9
    LeCA + MSLevT          5      0.115   ×1.22   50.1       0.126   ×1.27   39.3
  16. Conclusions We introduced an automatic post-editing approach for the restricted

    translation task of WAT 2022 • We succeeded in generating sentences that include all RTVs while largely preserving LeCA's BLEU score • Our proposed method also generates translations faster than LeCA with a larger beam size