al., 2017] model on the original training texts to augment the training texts
  ➢ Generate noisy training sentences (both English and Japanese) with the trained Transformer model
  ➢ The 𝛽 values we used for the random noising method are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, and 20
• Combine the noisy texts generated by different textual data augmentation methods for pretraining
  ➢ If we combine the texts generated by each method with those from the random noising method with 𝛽 = 1 and 𝛽 = 2, respectively, the mixed data comprise 119,124 sentences.
4/12/2020 WAT 2020 7
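The 𝛽 sweep and the combination step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slide does not define the random noising method, so `random_noise` is a hypothetical stand-in that assumes 𝛽 bounds how far a token may drift from its position (so 𝛽 = 0 leaves a sentence unchanged). The `combine` helper, the toy sentences, and the whitespace tokenization are likewise assumptions.

```python
import random
from typing import Dict, List

# Beta values listed on the slide.
BETAS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]


def random_noise(sentence: str, beta: float, rng: random.Random) -> str:
    """Hypothetical stand-in for the random noising method.

    Assumption: token i gets the sort key i + U(0, beta) and tokens are
    re-sorted by that key, so beta = 0 returns the sentence unchanged.
    """
    tokens = sentence.split()  # assumes pre-tokenized, space-separated text
    keys = [i + rng.uniform(0, beta) for i in range(len(tokens))]
    order = sorted(range(len(tokens)), key=keys.__getitem__)
    return " ".join(tokens[i] for i in order)


def combine(*corpora: List[str]) -> List[str]:
    """Concatenate corpora from several augmentation methods (no deduplication)."""
    mixed: List[str] = []
    for corpus in corpora:
        mixed.extend(corpus)
    return mixed


rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = [
    "this is a small example sentence",
    "data augmentation adds noisy text",
]

# One noisy copy of the corpus per beta value.
noisy_by_beta: Dict[int, List[str]] = {
    beta: [random_noise(s, beta, rng) for s in original] for beta in BETAS
}

# Placeholder for texts produced by another augmentation method
# (for example, sentences generated by the trained Transformer model).
other_method = list(original)

# Mix one method's output with the beta = 1 and beta = 2 noised texts.
mixed = combine(other_method, noisy_by_beta[1], noisy_by_beta[2])
print(len(mixed), "sentences in the mixed pretraining data")
```

With real data, `other_method` would hold the output of one augmentation method, and the size of `mixed` would be the sum of the sizes of the combined corpora.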