at the paper list by yourself!
• I haven’t read any of the papers yet. I will talk about my impressions from the presentations (oral, poster, demo) of each work. Please refer to the papers themselves if you are interested.
POS Tagging for Language Learners Writing
• Modeling ESL Word Choice Similarities by Representing Word Intensions and Extensions
• Problems in Evaluating Grammatical Error Detection Systems
• Mining Words in the Minds of Second Language Learners
• Native Tongue, Lost and Found: Resources and Empirical Evaluations in Native Language Identification
• Robust, Lexicalized Native Language Identification
• Native Language Identification using Recurring N-grams
Learners Writing
Keisuke Sakaguchi, Tomoya Mizumoto, Mamoru Komachi and Yuji Matsumoto (NAIST, Japan)
Problem: Spelling errors and POS tags often coincide, but each task has been solved separately
Idea: Jointly perform spelling correction and POS tagging with a variable-length CRF (to deal with split/merge errors)
• The joint model outperforms the pipeline model
• Shorter outputs due to removal of delimiters
Extensions
Huichao Xue and Rebecca Hwa (University of Pittsburgh, USA)
Problem: Constructing a confusion set for grammatical error correction often relies on a manually corrected learner corpus
Idea: Use only a native corpus to create confusion sets by applying relevance component analysis
• Better confusion sets can be learned from a bilingual corpus and a native corpus
• The created confusion sets correlate well with real mistakes
Dickinson, Ross Israel and Joel Tetreault (City University of New York, USA)
Problem: Many evaluation metrics have been used for grammatical error detection, but none of them addresses the issue of data skewness
Idea: Propose best practices
• Report raw frequencies (tp, fn, fp, tn)
  – Also report how you define true negatives
• Treat unit size (exact match/overlap) carefully
• Consider weighting the reliability of judgments
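To see why raw counts matter under skewed data, here is a minimal sketch (not taken from the paper). The counts are made up for illustration, and the names `Counts`, `flagging_system` and `silent_system` are hypothetical. It compares a system that flags some errors with one that never flags anything, on data where only 5% of the units are errors.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Raw confusion-matrix frequencies for an error detection system."""
    tp: int  # errors correctly flagged
    fp: int  # correct text wrongly flagged
    fn: int  # errors missed
    tn: int  # correct text left alone (its definition should be reported)


def accuracy(c: Counts) -> float:
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: Counts) -> float:
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Counts) -> float:
    errors = c.tp + c.fn
    return c.tp / errors if errors else 0.0


def f1(c: Counts) -> float:
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative (invented) counts: 1000 units, 50 of which are real errors.
flagging_system = Counts(tp=20, fp=10, fn=30, tn=940)
silent_system = Counts(tp=0, fp=0, fn=50, tn=950)  # never flags anything

for name, c in [("flagging", flagging_system), ("silent", silent_system)]:
    print(f"{name}: acc={accuracy(c):.3f} P={precision(c):.3f} "
          f"R={recall(c):.3f} F1={f1(c):.3f}")
```

With these invented counts, the two systems are only one point apart in accuracy (0.96 vs. 0.95), while recall and F1 separate them clearly. Reporting tp, fp, fn and tn lets readers recompute whichever metric they need.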
Ehara, Issei Sato, Hidekazu Oiwa and Hiroshi Nakagawa (University of Tokyo, Japan)
Problem: Though there are many studies on measuring the size of learners’ vocabulary, few studies address what kinds of words they know
Idea: Define a learner-specific word difficulty measure
• Theoretically sound and practically useful extension to previous models
• Able to obtain an interpretable weight vector
Native Language Identification
Joel Tetreault, Daniel Blanchard, Aoife Cahill and Martin Chodorow (ETS, USA)
Problem: Previous NLI work uses ICLE, but the corpus is highly skewed
Idea: Create a new balanced corpus (TOEFL11) and evaluate across corpora
• Many trends in previous work on ICLE generalize to other corpora
• Training on a large corpus and testing on a smaller one works well, but not vice versa
• Accuracy varies across proficiency levels
(University of Toronto, Canada)
Problem: Previous NLI research uses only small single corpora, which limits the use of lexical features
Idea: Extract an ESL corpus from Lang-8 to use lexical features and perform cross-corpus evaluation
• Shallow lexical features contribute much more than sophisticated syntactic features
• Domain adaptation gives an improvement
• Evaluation on a single corpus may be questionable
Meurers (Universitaet Tuebingen, Germany)
Problem: Since NLI is a new field, features for the NLI task are not well studied
Idea: Explore surface/Open-Class-POS/POS n-gram features and evaluate across corpora
• The finer the features, the better the accuracy
• Features learned from ICLE generalized well to other corpora, unlike (Brooke and Hirst, 2011), which uses Lang-8 as a training corpus for NLI
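As a rough illustration of surface n-gram features, the sketch below collects word n-grams of every length up to `max_n` and keeps those that recur across documents. It is not the authors' implementation. Treating "recurring" as "appears in at least two training documents" is an assumption made here, and the function names and toy sentences are hypothetical. The Open-Class-POS and POS variants would need a tagger and are not shown.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[NGram]:
    """All contiguous n-grams of length n in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def all_ngrams(tokens: List[str], max_n: int) -> Set[NGram]:
    """Distinct n-grams of every length from 1 to max_n."""
    found: Set[NGram] = set()
    for n in range(1, max_n + 1):
        found.update(ngrams(tokens, n))
    return found


def recurring_ngrams(docs: Iterable[List[str]], max_n: int,
                     min_docs: int = 2) -> Set[NGram]:
    """N-grams that occur in at least min_docs documents.

    Assumption: min_docs=2 is this sketch's reading of "recurring".
    """
    doc_freq: Counter = Counter()
    for tokens in docs:
        doc_freq.update(all_ngrams(tokens, max_n))  # count each doc once
    return {g for g, df in doc_freq.items() if df >= min_docs}


def binary_features(tokens: List[str], vocab: Set[NGram],
                    max_n: int) -> Set[NGram]:
    """Binary feature set: which vocabulary n-grams occur in this text."""
    return all_ngrams(tokens, max_n) & vocab


# Toy training texts (invented for illustration).
train = [
    "i am agree with you".split(),
    "i am agree that".split(),
    "she is happy".split(),
]
vocab = recurring_ngrams(train, max_n=3)
print(sorted(vocab, key=lambda g: (len(g), g)))
print(sorted(binary_features("we am agree".split(), vocab, max_n=3)))
```

The resulting binary feature sets could then be fed to any standard classifier; the choice of classifier is left open here.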
of ESL Writings
Tomoya Mizumoto, Yuta Hayashibe, Mamoru Komachi, Masaaki Nagata and Yuji Matsumoto (NAIST, Japan)
Problem: Until recently, no large-scale ESL corpus has been publicly available for grammatical error correction
Idea: Extract an ESL corpus from the web and examine the effect of corpus size on grammatical error correction
• Phrase-based SMT trained on large-scale data is effective for preposition, article, and lexical choice errors
• Syntax and discourse information is needed for tense, agreement, and noun number errors
Dickinson (Indiana University, USA)
Problem: Though POS annotation has been proposed for ESL language, annotating syntax for learner language is not well studied
Idea: Investigate multi-layered annotation (morphological dependencies, distributional dependencies, and subcategorization) for ESL texts
• Subcategorization seems preferable to the other two layers, since ESL texts are often hard to parse
• Open question: how can we generalize this framework to other non-canonical languages?
A lot of work on native language identification has been done (there will be a shared task on NLI at BEA-8, co-located with NAACL 2013)
• Cross-corpus scalability is important
• Future research should go beyond the surface and POS levels (semantic, syntactic and discourse information should be investigated)