

Zhou et al. - arXiv - Improving Grammatical Error Correction with Machine Translation Pairs

wkwkgg

November 27, 2019

Transcript

  1. Improving Grammatical Error Correction with Machine Translation Pairs
     Wangchunshu Zhou, Tao Ge, Chang Mu, Ke Xu, Furu Wei, Ming Zhou
     arXiv:1911.02825v1
     2019/11/27 Paper reading group, Presenter: B4 高橋 悠進
  2. In short
     • Generate source and target sentences with machine translation models to improve the GEC task
       • The source and target of each pair are the SMT and NMT outputs, respectively
     • Performance can be improved by manually decreasing the language model weight in the SMT system
     • Shown to be more effective than synthetic data generated by random corruption
  3. Introduction
     • Synthetic error-corrected data is helpful for improving GEC models
     • Issues with existing data synthesis approaches
       • Pre-defined rule sets: limited error types
       • Back-translation: limited by the seed error-corrected training data
     • Proposed method
       • Employs two MT models of different quality (SMT and NMT)
       • Both models translate the same bridge-language sentence into English
       • The two outputs are paired as a pseudo error-corrected sentence pair
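The pairing step of the proposed method can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `build_pseudo_pairs` and the two toy lookup tables are hypothetical names, and the lookup tables merely stand in for real SMT and NMT systems.

```python
from typing import Callable, Iterable, List, Tuple


def build_pseudo_pairs(
    bridge_sentences: Iterable[str],
    beginner_translate: Callable[[str], str],
    advanced_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Translate each bridge-language sentence twice and pair the outputs.

    The weaker system's output plays the erroneous source; the stronger
    system's output plays the corrected target.
    """
    pairs = []
    for sentence in bridge_sentences:
        source = beginner_translate(sentence)  # stands in for the SMT output
        target = advanced_translate(sentence)  # stands in for the NMT output
        pairs.append((source, target))
    return pairs


# Hypothetical lookup tables standing in for real SMT and NMT systems.
TOY_SMT = {"zh-1": "He go to school yesterday .", "zh-2": "I very like this book ."}
TOY_NMT = {"zh-1": "He went to school yesterday .", "zh-2": "I like this book very much ."}

pseudo_pairs = build_pseudo_pairs(
    ["zh-1", "zh-2"], TOY_SMT.__getitem__, TOY_NMT.__getitem__
)
```

Each element of `pseudo_pairs` is an (erroneous, corrected) tuple that could be fed to a sequence-to-sequence GEC model as pre-training data.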
  4. Related Work - 1
     Rule-based Monolingual Corpora Corruption [Zhao et al., 2019, NAACL]
     • Corrupts monolingual corpora with pre-defined rules
     • Pros: very simple and efficient way to generate parallel data
     • Cons: limited; covers only a small portion of grammatical error types
     Back-translation based Error Generation [Ge et al., 2018, ACL]
     • Trains an error generation model on error-corrected corpora in the opposite direction
     • Pros: able to cover more diverse error types
     • Cons: requires a large amount of annotated error-corrected data
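Rule-based corruption of a monolingual sentence can be sketched as below. This is an illustrative simplification that only deletes and repeats tokens; the function name `corrupt` and the probabilities `p_delete` and `p_repeat` are assumptions, not the rule set or settings of Zhao et al.

```python
import random
from typing import List


def corrupt(
    tokens: List[str],
    rng: random.Random,
    p_delete: float = 0.1,
    p_repeat: float = 0.1,
) -> List[str]:
    """Randomly delete or repeat tokens to create an artificial erroneous source."""
    corrupted = []
    for token in tokens:
        draw = rng.random()  # uniform in [0, 1)
        if draw < p_delete:
            continue  # deletion: drop the token
        corrupted.append(token)
        if draw < p_delete + p_repeat:
            corrupted.append(token)  # repetition: emit the token twice
    return corrupted


clean = "He went to school yesterday .".split()
noisy = corrupt(clean, random.Random(0))
training_pair = (" ".join(noisy), " ".join(clean))  # (erroneous source, clean target)
```

The sketch makes the limitation on the slide concrete: every error it can produce is a dropped or doubled token, which covers only a narrow slice of real learner errors.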
  5. Related Work - 2
     Data Generation from Round-trip Translations [Lichtarge et al., 2019, arXiv:1904.05780]
     • Uses two MT models: one from English to a bridge language, the other from the bridge language back to English
     • Pros: easy to generate error-corrected pairs?
     • Cons:
       • Good MT model: output is quite clean, so coverage of error types is limited
       • Poor MT model: output is more paraphrase-like or loses information
     Data Generation from Wikipedia Revision Histories
     • Extracts revision histories from Wikipedia
     • Pros: resembles real error-corrected data
     • Cons:
       • The majority of extracted revisions are not grammatical error corrections
       • The domain of the revision history is limited and differs from the target GEC domain
  6. Method - 1 : Beginner Translator (SMT)
     • Meaning-preserving with respect to the input sentences
     • Low fluency; output contains many grammatical errors
     • Translation output resembles text written by non-native speakers
     • Previous study [Qiu and Park, 2019]: more effective when source sentences are of lower fluency
     • Manually reduce the weight of the language model in the tuned SMT system
     Example (Google Translate): Anyway, I am very satisfied with everyone’s performance.
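Why lowering the language model weight produces less fluent output can be illustrated with the log-linear scoring that phrase-based SMT systems use, where each candidate is scored by a weighted sum of feature values. The sketch below is a toy under stated assumptions: the two candidates, the feature names `tm` and `lm`, and all numbers are made up for illustration and are not taken from Moses or from the paper.

```python
from typing import Dict, List

Features = Dict[str, float]


def loglinear_score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of feature values (log-probabilities in this toy)."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates: List[dict], weights: Dict[str, float]) -> str:
    """Return the text of the highest-scoring candidate translation."""
    best = max(candidates, key=lambda c: loglinear_score(c["features"], weights))
    return best["text"]


# Hypothetical candidates: "tm" is a translation-model log-probability,
# "lm" is a language-model log-probability.
CANDIDATES = [
    {"text": "I am very satisfied with everyone 's performance .",
     "features": {"tm": -4.0, "lm": -2.0}},   # fluent, slightly less literal
    {"text": "I very satisfied to everyone performance .",
     "features": {"tm": -3.0, "lm": -6.0}},   # literal, ungrammatical
]

TUNED_WEIGHTS = {"tm": 1.0, "lm": 1.0}   # assumed tuned weights
LOW_LM_WEIGHTS = {"tm": 1.0, "lm": 0.1}  # language model weight manually reduced

tuned_output = best_candidate(CANDIDATES, TUNED_WEIGHTS)
low_lm_output = best_candidate(CANDIDATES, LOW_LM_WEIGHTS)
```

With the tuned weights the fluent candidate wins; with the reduced language model weight the literal, ungrammatical candidate wins, which is the behaviour the beginner translator is meant to exhibit.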
  7. Method - 2 : Advanced Translator (NMT)
     • Produces a “valid translation” (meaning-preserving, fluent and grammatically correct)
     • Parallel corpora for MT are available (generally large and cheaper)
     • Parallel corpora can easily be converted into GEC training data
  8. Evaluation
     Evaluation data
     • BEA 19 shared task on GEC
     • CoNLL-2014 test set
     Settings
     • Primary goal: explore and analyze the effect of pre-training with synthetic parallel data in the proposed approach
     • Without extensive tricks:
       • iterative decoding
       • model ensembling
       • edit-weighted MLE objective
       • right-to-left reranking
       • external spell checker, etc.
  9. Models
     Beginner translator model (SMT)
     • Moses [Koehn et al., 2007]
     • Word alignment: MGIZA++
     • Language model: KenLM
     • Weight tuning: MERT, optimizing the system’s BLEU
     • Two replicas of the tuned model are created by manually increasing or decreasing the language model weight (total: 3 models, SMT_high, SMT_tuned, SMT_low)
     Advanced translator model (NMT)
     • Transformer-based NMT model (transformer big)
     • Chinese sentences: segmented at the word level
     • English word tokens: split into subwords (BPE)
  10. Dataset
      Chinese-English parallel data (for training translation models)
      • UN Corpus [Ziemski et al., 2016]
      • 15M parallel sentence pairs with around 400M tokens
      Monolingual Chinese corpora (for synthesizing GEC data)
      • news2016zh [Xu, 2019]
      • News corpus containing 2.5M Chinese news articles
      In experiments
      • 10M pseudo-parallel sentence pairs (SMT-NMT)
      • 10M sentence pairs (SMT-gold)
      • For comparison: NewsCrawl dataset + random corruption (40M sentences) [Zhao et al., 2019]
      • Generated corpora are filtered by fluency [Ge et al., 2018]
      • Discarded: pairs whose target sentence is less fluent than the source sentence
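The fluency filter can be sketched as below, assuming the fluency score of Ge et al. (2018), f(x) = 1 / (1 + H(x)), where H(x) is the average negative log-probability per token under a language model. The toy unigram table and the names `fluency`, `filter_pairs` and `toy_logprobs` are hypothetical; a real setup would query a trained language model instead.

```python
import math
from typing import Callable, List, Sequence, Tuple

LogProbFn = Callable[[Sequence[str]], List[float]]


def fluency(tokens: Sequence[str], token_logprobs: LogProbFn) -> float:
    """f(x) = 1 / (1 + H(x)), with H(x) the per-token cross-entropy."""
    logps = token_logprobs(tokens)
    if not logps:
        return 0.0  # treat an empty sentence as maximally disfluent
    cross_entropy = -sum(logps) / len(logps)
    return 1.0 / (1.0 + cross_entropy)


def filter_pairs(
    pairs: List[Tuple[str, str]], token_logprobs: LogProbFn
) -> List[Tuple[str, str]]:
    """Discard pairs whose target is less fluent than its source."""
    kept = []
    for source, target in pairs:
        source_fluency = fluency(source.split(), token_logprobs)
        target_fluency = fluency(target.split(), token_logprobs)
        if target_fluency >= source_fluency:
            kept.append((source, target))
    return kept


# Hypothetical unigram probabilities standing in for a real language model.
TOY_UNIGRAM = {"he": 0.2, "goes": 0.1, "go": 0.02, "to": 0.2, "school": 0.1, ".": 0.2}


def toy_logprobs(tokens: Sequence[str]) -> List[float]:
    return [math.log(TOY_UNIGRAM.get(token, 1e-4)) for token in tokens]


candidate_pairs = [
    ("he go to school .", "he goes to school ."),   # target more fluent: kept
    ("he goes to school .", "he go to school ."),   # target less fluent: discarded
]
filtered_pairs = filter_pairs(candidate_pairs, toy_logprobs)
```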
  11. Performance of translation model
      BLEU score of SMT and NMT
      • newstest17 Chinese-English translation test set
      • NMT is much better than all SMT models
      • Manually decreasing the language model weight in the SMT system results in a worse BLEU score
      • SMT_low: may indicate more grammatical errors in the translated sentences?
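For readers unfamiliar with the metric, the sketch below computes a simplified corpus-level BLEU: the geometric mean of modified 1- to 4-gram precisions multiplied by a brevity penalty. It assumes a single reference per sentence, pre-tokenized input and no smoothing, so its numbers are not comparable to scores from standard tools or to those reported on the slide; `corpus_bleu` here is a hypothetical helper, not a library call.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[List[str]], refs: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed, single-reference corpus BLEU in the range [0, 1]."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip each n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = "the cat sat on a mat .".split()
hypothesis = "the cat sat on the mat .".split()
score = corpus_bleu([hypothesis], [reference])
```

A translation with more word-choice and word-order errors shares fewer n-grams with the reference, which is why a lower score for SMT_low is read as a sign of noisier output.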
  12. Results on unsupervised GEC training
      Unsupervised GEC training
      • Ours: proposed method
      • Corruption: random corruption with 20M/40M sentence pairs
      • The proposed method outperforms both rule-based corruption settings
      • It may contain more realistic errors than those produced by pre-defined rules
      Influence of the LM weight
      • Decreasing the LM weight: better results
      [Katsumata and Komachi, 2019, BEA]
  13. Fine-tuning Results
      Dataset
      • Lang-8 + NUCLE
      Result
      • Combining both synthetic data sources yields consistent improvements
      • Decreasing the LM weight in SMT may help training GEC models
  14. Qualitative Analysis
      Advanced translator (NMT)
      • Translations are generally of good quality and very similar to the ground-truth translation
        → Target sentences are generally grammatically correct
      Comparing erroneous sentences
      • Random corruption: limited artificial errors (repetition and deletion of tokens)
      • The proposed method can introduce much more realistic errors, resembling those made by ESL learners
      Comparing LM weights
      • SMT_high: tends to be more fluent, with fewer grammatical errors
      • SMT_low: meaning-preserving but contains massive grammatical errors
        → Decreasing the LM weight yields better performance
  15. Ablation Study
      Analyze each component of this method
      • Both kinds of pairs contribute to the GEC model
      • SMT-gold sentence pairs are slightly more effective than SMT-NMT pairs
      • MT parallel corpora are limited in both size and domain
        → SMT-NMT: general and flexible
  16. Summary
      • MT pairs used as source and target sentences are effective for improving performance on the GEC task
      • Performance can be improved by manually decreasing the LM weight in the SMT system
      • This approach may contain more realistic errors than those produced by pre-defined rules