
Neural Sequence-Labelling Models for Grammatical Error Correction

Nagaoka University of Technology
Natural Language Processing Laboratory
Paper reading (2018-04-19)

youichiro

April 18, 2018




  1. Neural Sequence-Labelling Models for Grammatical Error Correction
     Helen Yannakoudakis, Marek Rei, Øistein E. Andersen and Zheng Yuan
     Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2795–2806, 2017
     Paper reading (2018/04/19), Nagaoka University of Technology, Natural Language Processing Laboratory, Yoichiro Ogawa
  2. Abstract
     - This paper proposes N-best list re-ranking using neural sequence-labelling
       models, which calculate the probability of each token being correct or incorrect.
     - The results achieve state-of-the-art performance in GEC.
  3. Grammatical Error Correction (GEC)
     - GEC attempts to automatically detect and correct errors in non-native text.
     - Given an ungrammatical input sentence, the task is formulated as "translating"
       it into its grammatical counterpart (e.g., "He go to school." → "He goes to school.").
  4. Grammatical Error Correction (GEC)
     - The SMT framework has been used successfully, but its best-scoring output is
       not always the best correction; this motivates N-best list re-ranking
       (Yuan et al., 2016).
  5. Components
     Pipeline: input text → SMT → N-best candidate list → Re-ranking → output text
     Features used for re-ranking (computed with an error detection model based on
     neural sequence-labelling, plus the SMT system itself):
     - Sentence probability
     - Levenshtein distance
     - True and false positives
     - SMT system's output score
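To make the pipeline concrete, here is a minimal sketch of the re-ranking step. The candidate format and `score_fn` are assumptions for illustration, not the paper's actual interface:

```python
def rerank(candidates, score_fn):
    """Pick the best correction from an SMT N-best list.

    candidates: list of (hypothesis, features) pairs produced by the SMT
                system for one input sentence (format assumed here)
    score_fn:   combines the re-ranking features into a single score
    """
    best_hypothesis, _ = max(candidates, key=lambda c: score_fn(c[1]))
    return best_hypothesis

# Usage: the hypothesis whose features score highest becomes the output.
# rerank([("He goes to school.", feats1), ("He go to school.", feats2)], score_fn)
```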
  6. Neural sequence-labelling
     - Error detection is cast as a sequence-labelling task.
     - The network predicts, for each token, the probability that it is correct
       or incorrect.
     - Each token is represented by combining a regular token embedding with a
       character-based token representation.
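A hedged PyTorch sketch of this kind of detector follows. The layer sizes and the simple concatenation of the two token representations are assumptions; the paper's actual architecture differs in its details:

```python
import torch
import torch.nn as nn

class TokenDetector(nn.Module):
    """Minimal sketch of a sequence-labelling error detector.

    Each token is a word embedding concatenated with a character-level
    BiLSTM representation; a word-level BiLSTM then predicts the
    probability that each token is incorrect.
    """
    def __init__(self, n_words, n_chars, w_dim=300, c_dim=50, hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, w_dim)
        self.char_emb = nn.Embedding(n_chars, c_dim)
        self.char_lstm = nn.LSTM(c_dim, c_dim, bidirectional=True, batch_first=True)
        self.word_lstm = nn.LSTM(w_dim + 2 * c_dim, hidden,
                                 bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, word_ids, char_ids):
        # word_ids: (n_tokens,); char_ids: (n_tokens, max_chars)
        _, (h, _) = self.char_lstm(self.char_emb(char_ids))
        char_repr = torch.cat([h[0], h[1]], dim=-1)         # (n_tokens, 2*c_dim)
        tokens = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
        states, _ = self.word_lstm(tokens.unsqueeze(0))     # batch of one sentence
        return torch.sigmoid(self.out(states)).squeeze(-1)  # P(incorrect) per token
```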
  7. Neural sequence-labelling
     - Error detection is cast as a sequence-labelling task.
     - A multi-task loss function combines the detection objective with two
       language-modelling objectives.
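As a sketch, the combined objective might look like the following, with a weight `gamma` down-weighting the auxiliary forward and backward language-modelling losses; the value of `gamma` here is an illustrative assumption, not the paper's tuned setting:

```python
def multitask_loss(detection_loss, fwd_lm_loss, bwd_lm_loss, gamma=0.1):
    """Combine the error-detection loss with the two LM losses."""
    return detection_loss + gamma * (fwd_lm_loss + bwd_lm_loss)
```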
  8. Error detection performance
     - Baseline LSTM_FCE: token-level embeddings only
     - LSTM_FCE: the proposed model (same training data and evaluation)
     - LSTM: trained on a larger training set
  9. Components
     Pipeline: input text → SMT → N-best candidate list → Re-ranking → output text
     Features used for re-ranking (computed with an error detection model based on
     neural sequence-labelling, plus the SMT system itself):
     - Sentence probability
     - Levenshtein distance
     - True and false positives
     - SMT system's output score
  10. N-best list re-ranking
      The following features are used to assign a score to each candidate:
      - Sentence probability: the overall sentence probability from the error
        detection model's outputs (∑ p_i over the per-token probabilities)
      - Levenshtein distance (LD): a candidate with a smaller LD is preferred (1/LD)
      - True and false positives: how many times the candidate hypothesis agrees or
        disagrees with the detection model on the tokens identified as incorrect (TP/FP)
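A minimal sketch of how these features could be combined into a single candidate score. The log-linear combination and the default weights are assumptions for illustration; the paper tunes its feature weights rather than fixing them:

```python
import math

def candidate_score(token_probs, lev_dist, tp, fp, smt_score,
                    weights=(1.0, 1.0, 1.0, 1.0)):
    """Combine the four re-ranking features for one candidate.

    token_probs: per-token correctness probabilities from the detection model
    lev_dist:    Levenshtein distance between candidate and source
    tp, fp:      (dis)agreements with the detection model on incorrect tokens
    smt_score:   the SMT system's own output score
    """
    sentence_prob = sum(math.log(max(p, 1e-12)) for p in token_probs)
    ld_feature = 1.0 / (1.0 + lev_dist)   # smaller edit distance scores higher
    agreement = tp / max(fp, 1)           # reward agreement with the detector
    w1, w2, w3, w4 = weights
    return w1 * sentence_prob + w2 * ld_feature + w3 * agreement + w4 * smt_score
```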
  11. Conclusion
      - This paper proposed N-best list re-ranking using a neural sequence-labelling
        model that calculates the probability of each token in a sentence being
        correct or incorrect in context.
      - The results achieve state-of-the-art performance on GEC.
      - This approach can be applied to any GEC system that produces multiple
        alternative hypotheses.
  12. References
      - Zheng Yuan, Ted Briscoe, and Mariano Felice. 2016. Candidate re-ranking for
        SMT-based grammatical error correction. In Proceedings of the 11th Workshop
        on Innovative Use of NLP for Building Educational Applications, pages 256–266.
  13. Other tables
      The error types are interpreted as follows: Missing error; Replacement error;
      Unnecessary error.