to contain all the pre-specified restricted target vocabularies (RTVs).
• We are given a source sentence and a set of RTVs, and must generate an output sentence that contains every RTV in the set.
sentences are ordered by sentence alignment scores.
→ The sentences with lower scores are considered relatively noisy data.
• Following Morishita et al. (2019), we used forward-translation to refine the latter half of the ASPEC training data.
[Figure: forward-translation with a Ja-En model. Original training data: (x_1, y_1), ..., (x_{n/2}, y_{n/2}), (x_{n/2+1}, y_{n/2+1}), ..., (x_n, y_n). Refined training data: (x_1, y_1), ..., (x_{n/2}, y_{n/2}), (x_{n/2+1}, y'_{n/2+1}), ..., (x_n, y'_n).]
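The refinement step above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes the sentence pairs are already sorted from the highest to the lowest alignment score, and `translate` is a hypothetical callable standing in for the trained Ja-En model.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def refine_latter_half(pairs: List[Pair],
                       translate: Callable[[str], str]) -> List[Pair]:
    """Replace the targets of the latter (noisier) half with forward translations.

    Assumption: `pairs` is sorted by alignment score, best first.
    `translate` is a hypothetical stand-in for the Ja-En model.
    """
    half = len(pairs) // 2
    clean = pairs[:half]  # keep (x_i, y_i) unchanged
    # Keep each source x_i but swap the target for the model output y'_i.
    refined = [(src, translate(src)) for src, _ in pairs[half:]]
    return clean + refined
```

With a toy `translate` such as `str.upper`, only the targets of the second half change; the sources and the first half are left as they were.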
RTV at the beginning of an arbitrary segment.
• Once the RTV has been generated, the model predicts the remainder of the segment in a semi-autoregressive manner.
the order in which they are inserted.
→ The order of inserting RTVs is important for accurate translation.
• We used GIZA++ to align each RTV with a word in the input sentence and sorted the RTVs in the order of their corresponding input words.
[Figure: the RTV list (A, B, C) is aligned with the source sentence "~b~a~c~." using GIZA++, which gives the sorted RTV list (B, A, C); the sorted list is passed to RecoverSAT.]
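The sorting step can be illustrated with a short sketch. It assumes the word alignment has already been computed (for example with GIZA++) and reduced to a mapping from each RTV to the index of its aligned source word; the function name, the mapping format, and the choice to place unaligned RTVs last are assumptions made for this example.

```python
from typing import Dict, List


def sort_rtvs_by_source_position(rtvs: List[str],
                                 aligned_index: Dict[str, int]) -> List[str]:
    """Order RTVs by the position of their aligned source word.

    Assumption: RTVs without an alignment are placed last. The sort is
    stable, so ties keep their original relative order.
    """
    return sorted(rtvs, key=lambda rtv: aligned_index.get(rtv, float("inf")))


# Source order is "b a c", so B aligns to position 0, A to 1, C to 2.
print(sort_rtvs_by_source_position(["A", "B", "C"], {"A": 1, "B": 0, "C": 2}))
```

For the toy alignment shown, the list (A, B, C) is reordered to (B, A, C), following the order of the corresponding source words.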
data, where the vocabulary size was set to 4,000.
• When determining the insertion order of RTVs using GIZA++, we used MeCab with IPADIC to tokenize Japanese sentences.
Model
• Transformer (base) model as in Vaswani et al.
• RecoverSAT as in Ran et al. We examined four models with different numbers of segments: 10, 14, 21, and 29.

Data set    train        validation    test
ASPEC       3,000,000    1,790         1,812
of translations that satisfy the exact match of all the given constraints over the entire test corpus.
⬇
Final score
→ The BLEU score computed using only the translations that exactly matched their RTVs.
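The two quantities described here can be sketched as follows. This is an illustration under stated assumptions, not the official scoring script: an "exact match" is taken to mean that each RTV occurs as a substring of the translation, and the function names are hypothetical. The BLEU computation itself is left to whichever BLEU implementation is used; the sketch only selects the sentence pairs that would be scored.

```python
from typing import List, Sequence, Tuple


def satisfies_all(hypothesis: str, rtvs: Sequence[str]) -> bool:
    """True if every RTV appears in the hypothesis (substring match, an assumption)."""
    return all(rtv in hypothesis for rtv in rtvs)


def consistency_score(hyps: Sequence[str],
                      rtv_sets: Sequence[Sequence[str]]) -> float:
    """Ratio of translations that contain all of their RTVs."""
    matched = sum(satisfies_all(h, r) for h, r in zip(hyps, rtv_sets))
    return matched / len(hyps)


def matched_only(hyps: Sequence[str], refs: Sequence[str],
                 rtv_sets: Sequence[Sequence[str]]) -> Tuple[List[str], List[str]]:
    """Keep only the (hypothesis, reference) pairs whose hypothesis satisfies its RTVs."""
    kept = [(h, ref) for h, ref, r in zip(hyps, refs, rtv_sets)
            if satisfies_all(h, r)]
    return [h for h, _ in kept], [ref for _, ref in kept]
```

Passing the two lists returned by `matched_only` to a BLEU scorer gives a score restricted to the translations that matched their RTVs.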
+ Append RTVs                              25.57    1.000    26.75
RecoverSAT                                 25.76    0.197     0.16
+ Forced translation with random order     26.93    0.962    26.98
+ Forced translation with sorted order     27.16    0.961    27.10
+ Forced translation with oracle order     31.14    0.966    31.02

“Append RTVs”: we insert RTVs at the tail of the output sentence without sorting.
“random order”: we insert RTVs without sorting.
“sorted order”: we insert RTVs in the order of the corresponding source words.
“oracle order”: we insert RTVs in the same order as that in the reference.
of the KdV equation nonlinear Schroedinger equation breather solution envelope soliton were described.

reference: The soliton solution of the KdV equation was explained, and next, sine‐Gordon equation and breather solution of the nonlinear Schroedinger equation and 2 kink solution and envelope soliton were described.
without sorting (BLEU 60.19, τ 0.3): sine‐Gordon equation and 2 kink solution and soliton solution of the KdV equation were explained , and nonlinear Schroedinger equation were described , and next , the breather solution and envelope soliton were described.
with sorting (BLEU 76.56, τ 0.6): soliton solution of the KdV equation was explained , and next , sine‐Gordon equation and 2 kink solution and nonlinear Schroedinger equation breather solution , and envelope soliton were described.
oracle (BLEU 89.44, τ 1): soliton solution of the KdV equation was explained , and next , sine‐Gordon equation and breather solution of the nonlinear Schroedinger equation , 2 kink solution and envelope soliton were described.

τ: Kendall rank correlation coefficient of the RTVs.
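As a rough illustration of how τ measures agreement between two RTV orders, the sketch below computes Kendall's rank correlation between the order in which RTVs appear in an output and their order in the reference. It assumes the RTVs are distinct (no ties) and that both sequences contain the same items; the function name is hypothetical, and the sketch is not meant to reproduce the exact figures reported above.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(order: Sequence[str], reference_order: Sequence[str]) -> float:
    """Kendall's tau between two orderings of the same distinct items."""
    rank = {item: i for i, item in enumerate(reference_order)}
    ranks = [rank[item] for item in order]
    pairs = list(combinations(range(len(ranks)), 2))
    if not pairs:
        raise ValueError("need at least two items to compare orders")
    # A pair is concordant when both orderings agree on which item comes first.
    concordant = sum(ranks[i] < ranks[j] for i, j in pairs)
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)
```

An identical order yields 1, a fully reversed order yields -1, and swapping a single adjacent pair among three items yields 1/3.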
restricted translation task.
• RecoverSAT could output almost all the RTVs.
• The importance of the order of the RTVs was confirmed.
• In future work, we need to investigate how to determine the best order in which to insert RTVs.