COLING 2012 review

COLING 2012 review (eNLP version)
Mamoru Komachi
2012/12/17
Educational NLP research group, Computational Linguistics Lab
Nara Institute of Science and Technology, Japan

Disclaimer
• This is not a complete list, so please take a look at the paper list yourself!
• I haven't read any of the papers yet. I will talk about my impressions from the presentations (oral, poster, demo) of the work. Please refer to the papers themselves if you are interested :)

COLING 2012 ORAL
• Joint English Spelling Error Correction and POS Tagging for Language Learners Writing
• Modeling ESL Word Choice Similarities By Representing Word Intensions and Extensions
• Problems in Evaluating Grammatical Error Detection Systems
• Mining Words in the Minds of Second Language Learners
• Native Tongues, Lost and Found: Resources and Empirical Evaluations in Native Language Identification
• Robust, Lexicalized Native Language Identification
• Native Language Identification using Recurring n-grams

Joint English Spelling Error Correction and POS Tagging for Language Learners Writing
Keisuke Sakaguchi, Tomoya Mizumoto, Mamoru Komachi and Yuji Matsumoto (NAIST, Japan)
Problem: Spelling errors and POS tagging errors often coincide, but each task has been solved separately
Idea: Jointly perform spelling correction and POS tagging with a variable-length CRF (to deal with split/merge errors)
• Joint model outperforms the pipeline model
• Shorter outputs due to removal of delimiters

Modeling ESL Word Choice Similarities By Representing Word Intensions and Extensions
Huichao Xue and Rebecca Hwa (University of Pittsburgh, USA)
Problem: Constructing confusion sets for grammatical error correction often relies on a manually corrected learner corpus
Idea: Use only a native corpus to create confusion sets by applying relevance component analysis
• Better confusion sets can be learned from a bilingual corpus and a native corpus
• The created confusion sets correlate well with real mistakes

Problems in Evaluating Grammatical Error Detection Systems
Martin Chodorow, Markus Dickinson, Ross Israel and Joel Tetreault (City University of New York, USA)
Problem: Many evaluation metrics have been used for grammatical error detection, but none of them addresses the issue of data skewness
Idea: Propose best practices
• Report raw frequencies (tp, fn, fp, tn)
  – Also report how you define true negatives
• Treat unit size (exact match/overlap) carefully
• Consider weighting the reliability of judgments
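To make the skewness issue concrete, here is a minimal sketch of why raw counts matter. The function name and all counts below are hypothetical and are not taken from the paper; the formulas are the standard definitions of accuracy, precision, recall and F1.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute standard metrics from raw counts (all inputs are token counts)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical skewed data: 1000 tokens, only 20 of them erroneous.
# System A flags nothing; System B finds half of the errors.
system_a = detection_metrics(tp=0, fp=0, fn=20, tn=980)
system_b = detection_metrics(tp=10, fp=10, fn=10, tn=970)

for name, m in (("A", system_a), ("B", system_b)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Both hypothetical systems reach the same accuracy because true negatives dominate, while precision, recall and F1 separate them. With the raw counts reported, a reader can recompute whichever metric they prefer.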

Mining Words in the Minds of Second Language Learners
Yo Ehara, Issei Sato, Hidekazu Oiwa and Hiroshi Nakagawa (University of Tokyo, Japan)
Problem: Though there are many studies on measuring the size of learners' vocabulary, few studies address what kind of words they know
Idea: Define a learner-specific word difficulty measure
• Theoretically sound and practically useful extension to previous models
• Able to obtain an interpretable weight vector

Native Tongues, Lost and Found: Resources and Empirical Evaluations in Native Language Identification
Joel Tetreault, Daniel Blanchard, Aoife Cahill and Martin Chodorow (ETS, USA)
Problem: Previous NLI work uses ICLE, but the corpus is highly skewed
Idea: Create a new balanced corpus (TOEFL11) and perform cross-corpus evaluation
• Many trends in previous work on ICLE generalize to other corpora
• Training on a large corpus and testing on a smaller one works well, but not vice versa
• Accuracy varies across proficiency levels

Robust, Lexicalized Native Language Identification
Julian Brooke and Graeme Hirst (University of Toronto, Canada)
Problem: Previous NLI research uses only small single corpora, which limits the use of lexical features
Idea: Extract an ESL corpus from Lang-8 to use lexical features and perform cross-corpus evaluation
• Shallow lexical features contribute much more than sophisticated syntactic features
• Domain adaptation gives improvement
• Evaluation on a single corpus may be questionable

Native Language Identification using Recurring n-grams
Serhiy Bykh and Detmar Meurers (Universitaet Tuebingen, Germany)
Problem: Since NLI is a new field, features for the task are not well studied
Idea: Explore surface/Open-Class-POS/POS n-gram features and perform cross-corpus evaluation
• The finer the features, the better the accuracy
• Features learned from ICLE generalized well to other corpora, unlike (Brooke and Hirst, 2011), which uses Lang-8 as a training corpus for NLI
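A rough sketch of what surface n-gram features of this kind might look like. It assumes that "recurring" means an n-gram occurring in at least two different training essays and that features are binary; both are assumptions of this sketch rather than details from the slide. All function names and the toy essays are hypothetical.

```python
from collections import Counter


def ngrams(tokens, max_n):
    """Return the set of all n-grams (as tuples) of length 1..max_n."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}


def recurring_ngrams(essays, max_n=3, min_essays=2):
    """Keep n-grams that appear in at least `min_essays` different essays."""
    essay_freq = Counter()
    for tokens in essays:
        essay_freq.update(ngrams(tokens, max_n))  # each essay counts once
    return {g for g, count in essay_freq.items() if count >= min_essays}


def binary_features(tokens, vocabulary, max_n=3):
    """Set of vocabulary n-grams present in one essay (binary features)."""
    return ngrams(tokens, max_n) & vocabulary


# Toy, made-up training essays (already tokenized and lowercased).
essays = [
    ["i", "am", "student"],
    ["i", "am", "happy"],
    ["she", "is", "happy"],
]
vocab = recurring_ngrams(essays)
features = binary_features(["i", "am", "tired"], vocab)
print(sorted(vocab))
print(sorted(features))
```

The same functions would apply to POS or open-class-POS sequences by passing tag sequences instead of word tokens. The resulting feature sets could then be fed to any standard classifier.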

COLING 2012 POSTERS
• The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings
• Defining Syntax for Learner Language Annotation

The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings
Tomoya Mizumoto, Yuta Hayashibe, Mamoru Komachi, Masaaki Nagata and Yuji Matsumoto (NAIST, Japan)
Problem: Until recently, no large-scale ESL corpus has been publicly available for grammatical error correction
Idea: Extract an ESL corpus from the web and see the effect of corpus size on grammatical error correction
• Phrase-based SMT trained on large-scale data is effective for preposition, article and lexical choice errors
• Syntax and discourse information is needed for tense, agreement and noun number errors

Defining Syntax for Learner Language Annotation
Marwa Ragheb and Markus Dickinson (Indiana University, USA)
Problem: Though POS annotation has been proposed for ESL language, annotating syntax for learner language is not well studied
Idea: Investigate multi-layered annotation (morphological dependencies, distributional dependencies, and subcategorization) for ESL texts
• Subcategorization seems preferable to the other two layers, since ESL texts are often hard to parse
• Open question: how can we generalize this framework to other non-canonical languages?

Summary
• Introduced eNLP-related papers presented at COLING 2012
• A lot of work has been done on native language identification (there will be a shared task on NLI at BEA-8, co-located with NAACL 2013)
• Cross-corpus scalability is important
• Future research should go beyond the surface and POS level (semantic, syntactic and discourse information should be investigated)
