Gustavo Henrique Paetzold and Lucia Specia, Department of Computer Science, University of Sheffield, UK
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 717–727
Nagaoka University of Technology, Natural Language Processing Laboratory, 勝田 哲弘
2018/4/19
Native language, age, education level, and English proficiency level were surveyed according to the criteria
• CEFR (Common European Framework of Reference for Languages)
Select the difficult words in the sentences
For each sentence, mark all the words you do not understand, even if you understand the sentence as a whole. If you understand all of them, just select the “I understand all words!” option.
and number of syllables (Burns, 2013)
– Semantic:
  • Number of senses, synonyms, hypernyms and hyponyms (Fellbaum, 1998)
– Lexical:
  • N-gram language model log-probabilities from the following corpora:
    – SubIMDB (Paetzold and Specia, 2016)
    – Subtlex (Brysbaert and New, 2009)
    – Simple Wikipedia (Kauchak, 2013)
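As a rough illustration of the n-gram log-probability feature listed above, here is a minimal sketch of a bigram model with add-one smoothing. The toy corpus, the function names, and the choice of smoothing are assumptions made for illustration only; the slides do not say which language-model toolkit, n-gram order, or smoothing method was used, and the real corpora are far larger.

```python
import math
from collections import Counter

def train_bigram_counts(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_logprob(prev, word, unigrams, bigrams):
    """Add-one smoothed natural-log probability of `word` given `prev`."""
    vocab_size = len(unigrams)
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))

# Hypothetical toy corpus standing in for a large corpus such as subtitles.
toy_corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
unigrams, bigrams = train_bigram_counts(toy_corpus)

seen = bigram_logprob("the", "cat", unigrams, bigrams)       # frequent continuation
unseen = bigram_logprob("the", "feline", unigrams, bigrams)  # never observed
```

In this sketch a word that rarely follows its context gets a lower log-probability, which is the kind of signal such a feature provides about how familiar a word is in context.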
of complex words with respect to their grammaticality and meaning preservation. When judging, please ignore any grammatical errors that are not caused by the substitution.
– The substitution preserves the sentence’s grammaticality
– The substitution preserves the original sentence’s meaning
• 1,600 sentences were judged by 5 annotators each; 23,940 sentences by 1 annotator each
• Simplicity is not considered
2~4
• Candidates that at least 3 annotators judged to preserve both (grammaticality and meaning)
• Two candidates are presented, and the annotator chooses which one is easier to understand
For each of the following instances, select which candidate makes the sentence easier to understand. If the words are equally complex/simple, select the “The words are equally simple” option. Please overlook any grammatical or spelling errors.
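The candidate filter described in the bullets above (keep a candidate when at least three annotators judged that it preserves both criteria) can be sketched as follows. The `Judgment` class and `keep_candidate` function are hypothetical names, and the sketch assumes a vote counts only when a single annotator accepted both grammaticality and meaning; the slides do not rule out counting the two criteria separately.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Judgment:
    """One annotator's answers for one substitution candidate."""
    grammatical: bool        # substitution preserves grammaticality
    meaning_preserved: bool  # substitution preserves the original meaning

def keep_candidate(judgments: List[Judgment], min_votes: int = 3) -> bool:
    """Keep a candidate if at least `min_votes` annotators accepted both criteria."""
    votes = sum(1 for j in judgments if j.grammatical and j.meaning_preserved)
    return votes >= min_votes

# Hypothetical judgments from five annotators for two candidates.
accepted = [Judgment(True, True)] * 3 + [Judgment(True, False), Judgment(False, False)]
rejected = [Judgment(True, True)] * 2 + [Judgment(True, False)] * 3

kept = [keep_candidate(accepted), keep_candidate(rejected)]
```

Under these assumptions the first candidate passes with three full votes and the second is dropped with only two.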