Slide 1

Slide 1 text

ACL/EMNLP 2012 review (eNLP version)
Mamoru Komachi
2012/07/17
Educational NLP research group, Computational Linguistics Lab
Nara Institute of Science and Technology, Japan

Slide 2

Slide 2 text

Today’s agenda
• Introduce several papers presented at the ACL/EMNLP conferences
• This is not a complete list, so please take a look at the accepted papers yourself!
• There are more papers in related areas such as spelling correction and text normalization (especially for microblogs like Twitter)
• Disclaimer: I haven’t read any of the papers yet. I will talk about my impressions from the presentations (oral, poster, demo) of the work. Please refer to the papers themselves if you are interested :)

Slide 3

Slide 3 text

ACL
• Native Language Detection with Tree Substitution Grammars
• A Corpus of Textual Revisions in Second Language Writing
• A Meta Learning Approach to Grammatical Error Correction
• FLOW: A First-Language-Oriented Writing Assistant System
• Grammar Error Correction Using Pseudo-Error Sentences and Domain Adaptation

Slide 4

Slide 4 text

Native Language Detection with Tree Substitution Grammars (Short Paper)
Ben Swanson and Eugene Charniak (Brown University, USA)
Problem: Syntactic features are known to be useful for native language detection, but CFG rules cannot capture long-range dependencies
Idea: Use a Tree Substitution Grammar (TSG) to extract tree fragments for native language identification
• Use the tree fragments as features for a MaxEnt classifier
• Tested on ICLE and outperformed the baselines (CFG rules and frequency-based tree mining)

Slide 5

Slide 5 text

A Corpus of Textual Revisions in Second Language Writing
John Lee and Jonathan Webster (City University of Hong Kong)
Problem: There is no ESL corpus containing sentence-aligned revision logs
Idea: Collected a corpus with (possibly multiple) revision logs of ESL learners
• Errors are identified by language teachers (not necessarily the same person for each revision)
• Email the authors to get a copy for research purposes

Slide 6

Slide 6 text

A Meta Learning Approach to Grammatical Error Correction (Short Paper)
Hongsuck Seo, Jonghoon Lee, Seokhwan Kim, Kyusong Lee, Sechun Kang, and Gary Geunbae Lee (POSTECH, Korea)
Problem: There are many ESL corpora, each with different characteristics
Idea: Train several classifiers on different corpora, and combine them with a meta-classifier
• Base classifiers use ASO (alternating structure optimization; Ando and Zhang, 2005) to train a model from both a native corpus and an error-tagged corpus
• The meta-learner improves precision and F1 on the article error correction task

Slide 7

Slide 7 text

FLOW: A First-Language-Oriented Writing Assistant System
Mei-Hua Chen, Shih-Ting Huang, Hung-Ting Hsieh, Ting-Hui Kao, and Jason S. Chang (National Tsing Hua University, Taiwan)
http://www.youtube.com/watch?v=uhH55fEPiqI
Problem: Previous ESL assistance tools do not take context and the writer's native language into account
Idea: Developed a browser-based ESL writing assistance system for Chinese speakers
• Accepts Chinese input given an English context, and shows predictive text using N-grams
• Suggests paraphrases by round-trip translation (En->Ch->En)
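The N-gram predictive text idea can be illustrated with a minimal sketch. This is not the FLOW implementation: it assumes a plain bigram count model, and the function names (`train_bigrams`, `suggest`) and the toy corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count which word follows each word in a list of tokenized sentences."""
    followers = defaultdict(Counter)
    for tokens in sentences:
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev][nxt] += 1
    return followers


def suggest(followers, context, k=3):
    """Return up to k most frequent continuations of the last context word."""
    if not context:
        return []
    counts = followers.get(context[-1], Counter())
    return [word for word, _ in counts.most_common(k)]


# Toy corpus, invented for this example.
corpus = [
    "i want to play a role".split(),
    "i want to play a game".split(),
    "i want to play a role".split(),
]
model = train_bigrams(corpus)
print(suggest(model, ["play", "a"]))  # most frequent continuation comes first
```

A real system would use a much larger corpus and longer N-grams; the bigram model is only the smallest case of the same lookup.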

Slide 8

Slide 8 text

Grammar Error Correction Using Pseudo-Error Sentences and Domain Adaptation (Short Paper)
Kenji Imamura, Kuniko Saito, Kugatsu Sadamitsu, and Hitoshi Nishikawa (NTT)
Problem: Error-tagged corpora of language learners are hard to obtain
Idea: Automatically generate error-tagged corpora using a confusion set (derived from a manually tagged corpus)
• Applied "frustratingly easy" domain adaptation
• Domain adaptation gives a stable improvement
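Both ingredients of this slide can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the authors' system: the confusion set, the error rate, the feature names and the domain labels (`pseudo`, `real`) are all hypothetical. The `augment` function follows the general feature-augmentation recipe of frustratingly easy domain adaptation, where every feature gets one shared copy and one domain-specific copy.

```python
import random

# Hypothetical confusion set: correct word -> words a learner might write instead.
CONFUSION = {"in": ["on", "at"], "at": ["in", "on"], "a": ["the"]}


def make_pseudo_error(tokens, rng, rate=0.5):
    """Corrupt confusable tokens with probability `rate`.

    Returns (noisy_tokens, tags), where tags[i] is the original (gold) word
    if position i was corrupted and None otherwise.
    """
    noisy, tags = [], []
    for tok in tokens:
        if tok in CONFUSION and rng.random() < rate:
            noisy.append(rng.choice(CONFUSION[tok]))
            tags.append(tok)
        else:
            noisy.append(tok)
            tags.append(None)
    return noisy, tags


def augment(features, domain):
    """Feature augmentation: a shared copy plus a domain-specific copy of each feature."""
    out = {}
    for name, value in features.items():
        out["shared:" + name] = value
        out[domain + ":" + name] = value
    return out


rng = random.Random(0)
noisy, tags = make_pseudo_error("i live in a city".split(), rng, rate=1.0)
print(noisy, tags)
print(augment({"prev=live": 1}, "pseudo"))
```

A classifier trained on the augmented features can then learn weights that are shared across pseudo-error and real learner data, plus weights specific to each domain.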

Slide 9

Slide 9 text

EMNLP
• A Beam-Search Decoder for Grammatical Error Correction
• Assessment of ESL Learners’ Syntactic Competence Based on Similarity Measures
• Exploring Adaptor Grammars for Native Language Identification

Slide 10

Slide 10 text

A Beam-Search Decoder for Grammatical Error Correction
Daniel Dahlmeier and Hwee Tou Ng (NUS, Singapore)
Problem: The traditional approach uses multi-class pointwise prediction, which does not correct a sentence as a whole
Idea: Build a beam-search decoder that combines the classification approach and SMT
• Pipeline: proposers generate candidates, and experts rank the generated candidates
• Tested on spelling, article, preposition, punctuation insertion and noun number tasks, and achieved state-of-the-art results
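The proposer/expert pipeline can be sketched as a small beam search. This is only a toy under stated assumptions, not the decoder from the paper: the single article proposer, the hand-made bigram expert, the unweighted sum of expert scores and the stopping rule are all invented for illustration.

```python
def article_proposer(tokens):
    """Toy proposer: yield hypotheses that swap one article for another."""
    articles = ["a", "an", "the"]
    for i, tok in enumerate(tokens):
        if tok in articles:
            for alt in articles:
                if alt != tok:
                    yield tokens[:i] + [alt] + tokens[i + 1:]


def bigram_expert(tokens):
    """Toy expert: one point for each bigram found in a small hand-made list."""
    good = {("an", "apple"), ("the", "table"), ("on", "the")}
    return sum(1.0 for bigram in zip(tokens, tokens[1:]) if bigram in good)


def beam_search(tokens, proposers, experts, beam_size=3, max_iters=5):
    """Repeatedly expand the beam with proposers and keep the best-scored hypotheses."""
    def score(hyp):
        return sum(expert(hyp) for expert in experts)

    beam = [tokens]
    best = tokens
    for _ in range(max_iters):
        # Keep current hypotheses as candidates and add every proposed edit.
        candidates = {tuple(h) for h in beam}
        for hyp in beam:
            for propose in proposers:
                candidates.update(tuple(c) for c in propose(hyp))
        # Sort by score (descending), breaking ties deterministically.
        ranked = sorted(candidates, key=lambda h: (-score(list(h)), h))
        beam = [list(h) for h in ranked[:beam_size]]
        if score(beam[0]) <= score(best):
            break  # no improvement in this iteration
        best = beam[0]
    return best


sentence = "i put a apple on a table".split()
print(" ".join(beam_search(sentence, [article_proposer], [bigram_expert])))
```

Because each iteration applies one more edit to the surviving hypotheses, the search can fix several errors in the same sentence, which is the point of decoding the sentence as a whole rather than predicting each position independently.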

Slide 11

Slide 11 text

Assessment of ESL Learners’ Syntactic Competence Based on Similarity Measures
Su-Youn Yoon and Suma Bhat (UIUC, USA)
Problem: Previous studies focus on the length of the output, such as the mean length of clauses
Idea: Focus on morpho-syntactic features for measuring English proficiency
• Constructed a POS-based vector space model for each proficiency level
• POS tag sequences are robust and correlate highly with human evaluation
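A POS-based vector space model per proficiency level can be sketched as follows. The details are assumptions made for illustration: the sketch uses raw POS-bigram counts and cosine similarity, and the level names and tag sequences are invented toy data, so it should not be read as the authors' exact feature set or weighting.

```python
import math
from collections import Counter


def pos_bigram_vector(tag_sequences):
    """Build a sparse count vector of POS bigrams from a list of tag sequences."""
    counts = Counter()
    for tags in tag_sequences:
        counts.update(zip(tags, tags[1:]))
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy tag sequences per proficiency level, invented for this example.
level_models = {
    "low": pos_bigram_vector([["PRP", "VBP", "NN"], ["PRP", "VBP", "NN"]]),
    "high": pos_bigram_vector([
        ["PRP", "VBD", "IN", "DT", "NN"],
        ["DT", "NN", "VBD", "IN", "DT", "NN"],
    ]),
}


def closest_level(tags):
    """Return the proficiency level whose model is most similar to the response."""
    vec = pos_bigram_vector([tags])
    return max(level_models, key=lambda level: cosine(vec, level_models[level]))


print(closest_level(["PRP", "VBP", "NN"]))
print(closest_level(["DT", "NN", "VBD", "IN", "DT", "NN"]))
```

The similarity of a new response to each level's vector can be used directly as a feature, instead of (or alongside) length-based measures.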

Slide 12

Slide 12 text

Exploring Adaptor Grammars for Native Language Identification
Sze-Meng Jojo Wong, Mark Dras, and Mark Johnson (Macquarie University, Australia)
Problem: {word,character,POS} N-gram features for native language identification do not consider long-range contextual information
Idea: Use Adaptor Grammars (a non-parametric extension of PCFGs) to capture long n-grams
• Built a MaxEnt classifier to combine a syntactic language model and n-gram collocations
• Experimental results are not stable, but show better accuracy overall

Slide 13

Slide 13 text

Summary
• Introduced eNLP-related papers presented at ACL/EMNLP
• For M2/D students: eNLP can exploit sophisticated methods explored in sequence labeling, parsing and SMT (e.g. string-to-tree, tree substitution grammar, etc.)
• For M1 students: Find a good problem and think hard to solve it!