Automated Essay Scoring with Discourse-Aware Neural Models

Paper introduction (2020-01-28)
Nagaoka University of Technology, Natural Language Processing Laboratory
Presenter: Yoichiro Ogawa (小川耀一朗)
Paper authors: Farah Nadeem, Huy Nguyen, Yang Liu, and Mari Ostendorf
BEA Workshop 2019, pages 484-493

Automated Essay Scoring
Prompt: More and more people use computers, but not everyone agrees that this benefits society. … Write a letter to your local newspaper in which you state your opinion on the effects computers have on people.
Essay: Dear local newspaper, I think effects computers have on people are great learning skills/affects because they give us time to chat with friends/new people, helps us learn about the globe(astronomy) and keeps us out of troble! … Thank you for listening.
Score: 4 (on a scale of 1 to 6)
Automated essay scoring: the task of automatically assigning a score to an essay.

Related Work
Progress so far:
• feature-based
◦ Features such as length, N-gram, word category, readability, syntactic, and semantic features
◦ Scores are predicted with linear regression
• LSTM-based
◦ Performance equal to or better than feature-based methods
• NN+features (Liu et al., 2019)
◦ The state-of-the-art model
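As a rough illustration of the feature-based line of work, the sketch below extracts a few surface features and fits a one-variable least-squares line from a feature to a score. The feature set, the function names `extract_features` and `fit_line`, and the single-feature simplification are all assumptions made for illustration; the systems cited on the slide use far richer features.

```python
def extract_features(essay):
    """Compute a few simple surface features (an illustrative subset only)."""
    words = essay.split()
    n = len(words)
    return {
        "length": n,
        "avg_word_len": sum(len(w) for w in words) / n if n else 0.0,
        "type_token_ratio": len({w.lower() for w in words}) / n if n else 0.0,
    }


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature x.

    Assumes xs contains at least two distinct values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b
```

In use, one would compute `extract_features(essay)["length"]` for each training essay, call `fit_line` on those lengths and the gold scores, and predict `a * length + b` for a new essay.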

Related Work (continued)
In this work:
• The architecture is extended to the document level
• Pre-training is performed so that the model learns to recognize discourse

Models
LSTM-based document-level models are used to capture document structure:
• Hierarchical RNN with attention (HAN)
• Bidirectional context with attention (BCA)

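Hierarchical RNN with attention (HAN) [Yang et al., 2016]
A hierarchical structure that stacks a word-level encoder and a sentence-level encoder.
Attention is applied at both the word level and the sentence level to pick out the important words and sentences.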
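The two-level attention can be sketched as follows. This is a simplified illustration, not the model from the paper: the RNN encoders and the tanh projection used in Yang et al. (2016) are omitted, and the names `attention_pool`, `encode_document`, `word_context`, and `sent_context` are hypothetical. In the real model the context vectors are learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, context):
    """Return the attention-weighted average of `states` and the weights.

    Scores are dot products between each state and the context vector.
    """
    scores = [sum(h * c for h, c in zip(state, context)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]
    return pooled, weights


def encode_document(doc, word_context, sent_context):
    """doc is a list of sentences; each sentence is a list of word vectors."""
    # Word-level attention turns each sentence into one vector.
    sent_vecs = [attention_pool(sentence, word_context)[0] for sentence in doc]
    # Sentence-level attention turns the sentence vectors into a document vector.
    return attention_pool(sent_vecs, sent_context)
```

The weights returned at each level indicate which words or sentences the model treats as important, which is the behaviour the slide describes.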

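Bidirectional context with attention (BCA) [Nadeem and Ostendorf, 2018]
Extends the hierarchical RNN to capture dependencies on the previous and following sentences.
In addition to the hidden representation hlt at time step t, attention is computed over the word sequences of the previous and following sentences, and the resulting weighted-average vectors are concatenated to it.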
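A minimal sketch of this idea: each word state in the current sentence attends over the word states of the neighbouring sentences, and the two context vectors are appended to it. Dot-product scoring, the zero vector used when a neighbour is missing, and the names `attend` and `bca_augment` are assumptions for illustration; the paper's exact scoring function may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Attention-weighted average of `keys`, scored by dot product with `query`.

    Returns a zero vector when there is no neighbouring sentence (assumption).
    """
    if not keys:
        return [0.0] * len(query)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    return [sum(w * key[d] for w, key in zip(weights, keys))
            for d in range(len(query))]


def bca_augment(prev_sent, cur_sent, next_sent):
    """Concatenate each word state with its previous and next sentence contexts."""
    return [h + attend(h, prev_sent) + attend(h, next_sent) for h in cur_sent]
```

Each output vector is three times the size of the input word state: the state itself, the context from the previous sentence, and the context from the following sentence.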

Pre-training Tasks
Pre-training on related tasks addresses the shortage of training data.
• Natural language inference (NLI)
◦ Predict the relation between two sentences (contradiction, entailment, neutral)
• Discourse marker prediction (DM)
◦ Predict the discourse marker that connects two sentences
◦ (however, in other words, meanwhile, etc.)
• BERT embeddings
◦ Used as contextualized word embeddings
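The slides do not say how the discourse marker training pairs are built. One common construction, shown here purely as an assumption, takes adjacent sentences where the second begins with a known marker, strips the marker, and uses it as the label. The marker list contains only the three examples named on the slide, and `make_dm_examples` is a hypothetical helper.

```python
MARKERS = ("however", "in other words", "meanwhile")


def make_dm_examples(sentences):
    """Build (first_sentence, second_sentence_without_marker, marker) triples."""
    examples = []
    for first, second in zip(sentences, sentences[1:]):
        lowered = second.lower()
        for marker in MARKERS:
            if not lowered.startswith(marker):
                continue
            tail = second[len(marker):]
            # Skip cases where the marker is only a prefix of a longer word.
            if tail and tail[0].isalpha():
                continue
            examples.append((first, tail.lstrip(" ,"), marker))
            break
    return examples
```

For instance, the sentence pair "It rained." / "However, we went out." would yield the two sentences with "However," removed from the second and the label `however`.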

Training Methods

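Training Methods
During pre-training on the NLI and DM tasks:
The model outputs a sentence vector for each of the two sentences, concatenates them, and predicts the label through a feedforward NN.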
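The pair classifier described here can be sketched as below. The single hidden layer, the ReLU activation, plain concatenation as the only way of combining the two vectors, and the names `predict_pair`, `w1`, `b1`, `w2`, `b2` are assumptions for illustration; the slide states only that the two sentence vectors are concatenated and passed through a feedforward NN.

```python
import math

NLI_LABELS = ("contradiction", "entailment", "neutral")


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def predict_pair(u, v, w1, b1, w2, b2, labels=NLI_LABELS):
    """Concatenate two sentence vectors and classify the pair."""
    pair = u + v  # list concatenation: [u; v]
    hidden = [max(0.0, z) for z in linear(pair, w1, b1)]  # ReLU (assumption)
    probs = softmax(linear(hidden, w2, b2))
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs
```

For the DM task the same function would be called with the discourse markers as `labels` and an output layer with one row per marker.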

Training Methods
When training on the scoring task:
The parameters from pre-training are shared, except for the attention and the feedforward NN.
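The selective sharing can be sketched with plain dictionaries of parameters. The parameter names and the prefix-based naming scheme are hypothetical; only the rule itself (carry over everything except the attention and feedforward parts) comes from the slide.

```python
def init_scoring_params(pretrained, fresh,
                        excluded_prefixes=("attention", "feedforward")):
    """Start from freshly initialised scoring parameters and overwrite those
    that also exist in the pre-trained model, skipping excluded components."""
    params = dict(fresh)
    for name, value in pretrained.items():
        if name.startswith(excluded_prefixes):
            continue  # attention and feedforward NN are not carried over
        if name in params:
            params[name] = value
    return params
```

With a pre-trained dictionary holding `word_encoder.W`, `attention.v`, and `feedforward.W`, only `word_encoder.W` would be copied into the scoring model; its attention parameters and scoring head keep their fresh initialisation.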

Dataset
Non-Native Written English from the Linguistic Data Consortium (LDC)
- TOEFL essays labeled [high, medium, low]
Automated Student Assessment Prize (ASAP)
- Dataset from a Kaggle competition

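Results on LDC TOEFL corpus
Model settings:
(1) Trained on LDC (essay data)
(2) Pre-trained on either NLI or DM, then (1)
(3) Pre-trained on both NLI and DM, then (1)
(4) BERT embeddings used as the input to (1)
• BCA, which captures relations between adjacent sentences, outperforms HAN
• The contribution of the NLI task is small
• BERT-BCA performs best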

Results on ASAP
TSLF (Liu et al., 2019): a method that adds hand-crafted features to a structure similar to BERT-HAN.
When training data is scarce, feature-based methods are strong.

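Conclusion
• For automated essay scoring, capturing relations between adjacent sentences and pre-training on discourse-understanding tasks improved performance on the TOEFL data
• BERT embeddings made a large contribution
• On the ASAP corpus, where training data is limited, methods that incorporate hand-crafted features remain strong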

References
• [Yang et al., 2016]
◦ Hierarchical Attention Networks for Document Classification
◦ Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, Eduard Hovy
◦ NAACL-HLT 2016, pages 1480-1489.
• [Nadeem and Ostendorf, 2018]
◦ Estimating Linguistic Complexity for Science Texts
◦ Farah Nadeem, Mari Ostendorf
◦ BEA Workshop 2018, pages 45-55.

Additions
QWK
- https://ktrw.hatenablog.com/entry/2019/05/03/005011
Automated Essay Scoring: A Survey of the State of the Art
- https://www.ijcai.org/Proceedings/2019/0879.pdf
- https://qiita.com/r-takahama/items/8f87aa1425cabb5d9a26
Kaggle
- https://www.kaggle.com/c/asap-aes
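QWK in the list above stands for quadratic weighted kappa, an agreement measure between two sets of integer ratings. The sketch below follows the standard definition; the function name and argument layout are chosen for illustration, and it assumes at least two rating levels and ratings that are not all identical (otherwise the denominator is zero).

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic weighted kappa between two equal-length lists of int ratings."""
    n = max_rating - min_rating + 1
    # Observed confusion matrix between the two raters.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1
    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if the two raters were independent.
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.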
