Neural Sequence-Labelling Models for Grammatical Error Correction
Helen Yannakoudakis, Marek Rei, Øistein E. Andersen and Zheng Yuan
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2795–2806, 2017
Paper introduction (2018/04/19), Natural Language Processing Lab, Nagaoka University of Technology, Yoichiro Ogawa
Abstract
Ø This paper proposes N-best list re-ranking using neural sequence-labelling models.
• The models calculate the probability of each token being correct or incorrect.
Ø The results achieve state-of-the-art performance in GEC.
Grammatical Error Correction (GEC)
l GEC aims to automatically detect and correct errors in non-native text.
l Given an ungrammatical input sentence, the task is formulated as "translating" it into its grammatical counterpart.
Components
[Pipeline diagram] input text → SMT → N-best candidate list → Re-ranking → output text
Re-ranking uses an error detection model based on neural sequence-labelling, with the following features:
・Sentence probability
・Levenshtein distance
・True and false positives
・SMT system's output score
(A pipeline sketch follows below.)
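As a minimal sketch of how these components fit together (the interfaces smt.decode and score_candidate are hypothetical names, not from the paper's implementation):

```python
def correct(source, smt, detector, score_candidate, n=10):
    """Sketch of the pipeline above; all interfaces are hypothetical."""
    # 1) SMT system: propose an N-best list of (hypothesis, smt_score) pairs.
    candidates = smt.decode(source, n_best=n)
    # 2) Re-ranking: score each hypothesis with features derived from the
    #    neural error detection model, plus the SMT system's output score.
    # 3) Output text: return the highest-scoring candidate.
    best_hyp, _ = max(candidates, key=lambda c: score_candidate(source, c[0], c[1], detector))
    return best_hyp
```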
Neural sequence-labelling
ü Error detection ⇔ sequence-labelling task
l The network predicts, for each token, the probability of it being correct or incorrect.
l Each token is represented by combining a regular token embedding with a character-based token representation (see the sketch below).
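A minimal PyTorch sketch of such a labeller (layer sizes are illustrative, and the word and character representations are simply concatenated here, whereas the paper combines them with a learned mechanism):

```python
import torch
import torch.nn as nn

class ErrorDetector(nn.Module):
    """Illustrative bi-LSTM sequence labeller; details differ from the paper."""

    def __init__(self, vocab_size, char_vocab_size, word_dim=300, char_dim=50, hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # Character-based token representation: a small bi-LSTM over each
        # token's characters.
        self.char_emb = nn.Embedding(char_vocab_size, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_dim, bidirectional=True, batch_first=True)
        # Word-level bi-LSTM over the combined token representations.
        self.lstm = nn.LSTM(word_dim + 2 * char_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, 2)  # two labels: correct / incorrect

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_chars)
        b, t, c = char_ids.shape
        char_out, _ = self.char_lstm(self.char_emb(char_ids.view(b * t, c)))
        char_repr = char_out[:, -1, :].view(b, t, -1)  # final state per token
        tokens = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
        hidden_states, _ = self.lstm(tokens)
        return torch.softmax(self.out(hidden_states), dim=-1)  # per-token probabilities
```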
Neural sequence-labelling
ü Error detection ⇔ sequence-labelling task
l Training uses a multi-task loss function that combines the detection objective with two language-modelling objectives (see the formulation below).
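In notation (mine; the formulation follows the Rei (2017)-style multi-task setup this model builds on), the combined objective is the detection loss plus two auxiliary language-modelling losses weighted by a hyperparameter γ:

```latex
% E_detect : cross-entropy over the correct/incorrect token labels
% E_fw, E_bw : auxiliary losses where the forward / backward LSTM also
%              predicts the next / previous token (language modelling)
% gamma : weight of the auxiliary objectives
E = E_{\mathrm{detect}} + \gamma \left( E_{\mathrm{fw}} + E_{\mathrm{bw}} \right)
```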
Error detection performance l Baseline LSTMFCE : token level embedding l LSTMFCE : proposed model (same data and evaluate) l LSTM: larger training set 9
N-best list re-ranking
l The following features are used to assign a score to each candidate, following the candidate re-ranking of Yuan et al. (2016); a feature-computation sketch follows below.
n Sentence probability: the overall sentence probability derived from the error detection model's per-token outputs (∑_t log p(c_t))
n Levenshtein distance (LD): candidates with the smallest LD are preferred (1/LD)
n True and false positives: how many times the candidate hypothesis agrees or disagrees with the detection model on the tokens identified as incorrect (TP/FP)
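A minimal sketch of the feature computation (raw, unweighted features, assuming token-aligned source and hypothesis for the TP/FP comparison, with +1 smoothing to avoid division by zero; the paper's exact definitions and feature weighting may differ):

```python
import math

def levenshtein(a, b):
    """Standard dynamic-programming edit distance between token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def rerank_features(source_tokens, hyp_tokens, smt_score, p_correct, flagged):
    """p_correct: per-token P(correct) for the hypothesis, from the detector.
    flagged: positions the detection model marked as incorrect in the source."""
    sent_prob = sum(math.log(p) for p in p_correct)                # sentence probability
    inv_ld = 1.0 / (levenshtein(source_tokens, hyp_tokens) + 1.0)  # favour small edits
    changed = {i for i, (s, h) in enumerate(zip(source_tokens, hyp_tokens)) if s != h}
    tp = len(changed & flagged)      # hypothesis edits a token the detector flagged
    fp = len(changed - flagged)      # hypothesis edits an unflagged token
    return [sent_prob, inv_ld, tp / (fp + 1.0), smt_score]
```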
Conclusion
l This paper proposed N-best list re-ranking using a neural sequence-labelling model that calculates the probability of each token in a sentence being correct or incorrect in context.
l The results achieved state-of-the-art performance on GEC.
l The approach can be applied to any GEC system that produces multiple alternative hypotheses.
References
l Zheng Yuan, Ted Briscoe, and Mariano Felice. 2016. Candidate re-ranking for SMT-based grammatical error correction. In Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, pages 256–266.