I will introduce five papers from ACL 2012 and three papers from EMNLP 2012, mainly related to educational applications of natural language processing.
Slides were presented at the Educational NLP research group regular meeting (NAIST, Japan).
• Not a complete list, so please take a look at the accepted papers yourself!
• There are more papers in related areas such as spelling correction and text normalization (especially for microblogs like Twitter)
• Disclaimer: I haven't read any of the papers yet. I will talk about my impressions from the presentations (oral, poster, demo) of the work. Please refer to the papers themselves if you are interested :)
• A Corpus of Textual Revisions in Second Language Writing
• A Meta Learning Approach to Grammatical Error Correction
• FLOW: A First-Language-Oriented Writing Assistant System
• Grammar Error Correction Using Pseudo-Error Sentences and Domain Adaptation
Swanson and Eugene Charniak (Brown University, USA)
Problem: Although syntactic features are known to be useful for native language detection, CFG rules cannot capture long-range dependencies
Idea: Use Tree Substitution Grammar to extract tree fragments for native language identification
• Use tree fragments as features for a MaxEnt classifier
• Tested on ICLE and outperformed the baselines (CFG rules and frequency-based tree mining)
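To make the contrast between CFG rules and larger tree fragments on the slide above concrete, here is a minimal sketch. It is not the TSG induction from the paper: it simply enumerates depth-1 fragments (CFG rules) and depth-2 fragments from a parse tree written as nested tuples. The function names and the toy tree are hypothetical.

```python
from collections import Counter

def label(node):
    """Return the label of a node (a leaf is a plain string)."""
    return node if isinstance(node, str) else node[0]

def cfg_rule(node):
    """Depth-1 fragment: a parent label and the labels of its children."""
    return "(%s %s)" % (node[0], " ".join(label(c) for c in node[1:]))

def depth2_fragment(node):
    """Depth-2 fragment: each non-leaf child is expanded by one more level."""
    parts = [c if isinstance(c, str) else cfg_rule(c) for c in node[1:]]
    return "(%s %s)" % (node[0], " ".join(parts))

def fragment_features(tree):
    """Count depth-1 and depth-2 fragments; the counts can serve as classifier features."""
    feats = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        feats[cfg_rule(node)] += 1
        # A depth-2 fragment only differs from the CFG rule if some child is a subtree.
        if any(not isinstance(c, str) for c in node[1:]):
            feats[depth2_fragment(node)] += 1
        stack.extend(node[1:])
    return feats

# Toy parse tree (hypothetical example, not from ICLE).
tree = ("S",
        ("NP", ("PRP", "I")),
        ("VP", ("VBP", "like"), ("NP", ("NNS", "apples"))))
features = fragment_features(tree)
```

The depth-2 fragment rooted at S spans the subject and the verb phrase in a single feature, which a single CFG rule cannot do.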
Lee and Jonathan Webster (City University of Hong Kong)
Problem: There is no ESL corpus containing sentence-aligned revision logs
Idea: Collected a corpus with (possibly multiple) revision logs of ESL learners
• Errors are identified by language teachers (not necessarily the same person for each revision)
• Email the authors to get a copy for research purposes
Hongsuck Seo, Jonghoon Lee, Seokhwan Kim, Kyusong Lee, Sechun Kang, and Gary Geunbae Lee (POSTECH, Korea)
Problem: There are many ESL corpora, each with different characteristics
Idea: Train several classifiers on different corpora and combine them with a meta-classifier
• Base classifiers use ASO (alternating structure optimization; Ando and Zhang, 2005) to train a model from both a native corpus and an error-tagged corpus
• The meta-learner improves precision and F1 on the article error correction task
Hung-Ting Hsieh, Ting-Hui Kao, and Jason S. Chang (National Tsing Hua University, Taiwan)
http://www.youtube.com/watch?v=uhH55fEPiqI
Problem: Previous ESL writing assistance tools do not take context and native language into account
Idea: Developed a browser-based ESL writing assistance system for Chinese speakers
• Accepts Chinese input given English context, and shows predictive text based on N-grams
• Suggests paraphrases by round-trip translation (En->Ch->En)
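As a rough illustration of the N-gram predictive text mentioned above, the sketch below ranks next-word suggestions with bigram counts. It covers only the English prediction step, not the Chinese input handling or the paraphrasing. The corpus and the function names are made up for this example and are not taken from FLOW.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, context, k=3):
    """Suggest up to k next words given the last word of the context."""
    tokens = context.lower().split()
    if not tokens:
        return []
    candidates = counts.get(tokens[-1], Counter())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]

# Toy training corpus (hypothetical).
corpus = [
    "we propose a method",
    "we propose a new method",
    "we propose an approach",
    "we present a method",
]
counts = train_bigrams(corpus)
suggestions = predict_next(counts, "In this paper we")
```

A real system would presumably use higher-order N-grams with smoothing; bigrams keep the sketch short.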
Paper) Kenji Imamura, Kuniko Saito, Kugatsu Sadamitsu, and Hitoshi Nishikawa (NTT)
Problem: Error-tagged corpora of language learners are hard to obtain
Idea: Automatically generate error-tagged corpora using a confusion set (derived from a manually tagged corpus)
• Applied "frustratingly easy" domain adaptation
• Domain adaptation gives a stable improvement
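The two ingredients on the slide above can be sketched as follows: injecting pseudo-errors from a confusion set, and the feature augmentation used in "frustratingly easy" domain adaptation (each feature is copied into a shared version and a domain-specific version). This is a minimal sketch under my own assumptions: the English confusion set, the feature names, and the choice to treat pseudo-error data as the source domain and real learner data as the target domain are illustrative, not the authors' exact setup.

```python
import random

def make_pseudo_error(tokens, confusion_set, rng, error_rate=0.5):
    """Replace confusable words with an alternative and record the gold correction."""
    corrupted, tags = [], []
    for tok in tokens:
        alternatives = confusion_set.get(tok)
        if alternatives and rng.random() < error_rate:
            corrupted.append(rng.choice(alternatives))
            tags.append(tok)       # the original word is the correction
        else:
            corrupted.append(tok)
            tags.append(None)      # no error injected at this position
    return corrupted, tags

def augment(features, domain):
    """Feature augmentation: one shared copy plus one domain-specific copy per feature."""
    augmented = {}
    for name, value in features.items():
        augmented["general:" + name] = value
        augmented[domain + ":" + name] = value
    return augmented

# Hypothetical confusion set; in the paper it is derived from a manually tagged corpus.
confusion_set = {"in": ["on", "at"], "on": ["in", "at"]}
rng = random.Random(0)
corrupted, tags = make_pseudo_error("I live in Tokyo".split(), confusion_set, rng,
                                    error_rate=1.0)

pseudo_feats = augment({"prev=live": 1, "next=Tokyo": 1}, "source")  # pseudo-error data
real_feats = augment({"prev=live": 1}, "target")                     # real learner data
```

Because both domains share the `general:` copies, a linear classifier can learn what transfers between pseudo and real errors while still keeping domain-specific weights.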
Hwee Tou Ng (NUS, Singapore)
Problem: The traditional approach uses multi-class pointwise prediction, which does not correct a sentence as a whole
Idea: Build a beam search decoder that combines the classification approach and SMT
• Pipeline: proposers generate candidates and experts rank the generated candidates
• Tested on spelling, article, preposition, punctuation insertion, and noun number tasks and achieved state-of-the-art results
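The proposer/expert pipeline above can be illustrated with a toy beam search over whole-sentence hypotheses. This is only a sketch of the general idea: the single article proposer, the bigram "expert", the length penalty, and the unweighted sum of expert scores are all hypothetical stand-ins, not the components or the scoring model of the actual system.

```python
def article_proposer(tokens):
    """Propose deleting an existing article, or inserting 'a'/'the' before other words."""
    articles = {"a", "an", "the"}
    for i in range(len(tokens)):
        if tokens[i] in articles:
            yield tokens[:i] + tokens[i + 1:]
        else:
            for art in ("a", "the"):
                yield tokens[:i] + [art] + tokens[i:]

def make_bigram_expert(good_bigrams):
    """Expert that rewards every bigram found in a (toy) list of fluent bigrams."""
    def expert(tokens):
        padded = ["<s>"] + tokens
        return sum(1.0 for pair in zip(padded, padded[1:]) if pair in good_bigrams)
    return expert

def length_penalty_expert(tokens):
    """Expert that mildly discourages inserting words that bring no benefit."""
    return -0.1 * len(tokens)

def beam_search(tokens, proposers, experts, beam_size=3, max_iters=5):
    """Repeatedly expand the beam with proposers and keep the best-scoring hypotheses."""
    def score(hyp):
        return sum(expert(hyp) for expert in experts)

    best = (score(tokens), tokens)
    beam = [best]
    for _ in range(max_iters):
        candidates = {}
        for _, hyp in beam:
            for proposer in proposers:
                for cand in proposer(hyp):
                    candidates[tuple(cand)] = score(cand)
        if not candidates:
            break
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
        beam = [(s, list(h)) for h, s in ranked[:beam_size]]
        if beam[0][0] <= best[0]:
            break  # no hypothesis improved on the best one so far
        best = beam[0]
    return best[1]

# Toy "language model": bigrams of one fluent sentence (hypothetical).
good_bigrams = {("<s>", "I"), ("I", "saw"), ("saw", "a"), ("a", "cat"),
                ("cat", "in"), ("in", "the"), ("the", "park")}
experts = [make_bigram_expert(good_bigrams), length_penalty_expert]
corrected = beam_search("I saw cat in park".split(), [article_proposer], experts)
```

Unlike pointwise classification, the search scores each candidate as a complete sentence, so the two article insertions are chosen jointly across iterations.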
Su-Youn Yoon and Suma Bhat (UIUC, USA)
Problem: Previous studies focus on the length of the output, such as the mean length of clauses
Idea: Focus on morpho-syntactic features for measuring English proficiency
• Constructed a POS-based vector space model for each proficiency level
• POS tag sequences are robust and correlate highly with human evaluation
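One simple way to realize a POS-based vector space model per proficiency level is sketched below: each level is represented by its POS bigram counts, and a new response is assigned to the level with the highest cosine similarity. The use of raw bigram counts, the function names, and the toy tag sequences are assumptions for illustration and may differ from the features and weighting used in the paper.

```python
import math
from collections import Counter

def pos_bigrams(tags):
    """Represent a response as a bag of POS tag bigrams."""
    return Counter(zip(tags, tags[1:]))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_level_vectors(tagged_by_level):
    """Sum the POS bigram counts of all responses at each proficiency level."""
    vectors = {}
    for level, responses in tagged_by_level.items():
        total = Counter()
        for tags in responses:
            total.update(pos_bigrams(tags))
        vectors[level] = total
    return vectors

def most_similar_level(tags, vectors):
    """Return the level whose vector is closest to the response."""
    query = pos_bigrams(tags)
    return max(sorted(vectors), key=lambda level: cosine(query, vectors[level]))

# Toy POS-tagged responses per level (hypothetical data).
training = {
    "low": [["PRP", "VBP", "NN"], ["PRP", "VBP", "NN", "NN"]],
    "high": [["PRP", "VBP", "DT", "JJ", "NN"], ["DT", "NN", "WDT", "VBZ", "JJ"]],
}
vectors = build_level_vectors(training)
predicted = most_similar_level(["PRP", "VBP", "DT", "NN"], vectors)
```

The similarity to each level could also be used as a feature for a scoring model instead of picking the single closest level.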
Mark Dras, and Mark Johnson (Macquarie University, Australia)
Problem: {word,character,POS} N-gram features for native language identification do not consider long-range contextual information
Idea: Use Adaptor Grammars (a non-parametric extension to PCFGs) to capture long n-grams
• Built a MaxEnt classifier to combine the syntactic language model and n-gram collocations
• Experimental results are not stable, but show better accuracy overall
• For M2/D students: eNLP can exploit sophisticated methods explored in sequence labeling, parsing, and SMT (e.g. string-to-tree, tree substitution grammar, etc.)
• For M1 students: Find a good problem and think hard to solve it!