A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning

Slide 1

Slide 1 text

Nagaoka University of Technology, Natural Language Processing Laboratory
Yoichiro Ogawa (小川耀一朗)
Literature review (2019-10-21)
A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning

Slide 2

Slide 2 text

Paper

Slide 3

Slide 3 text

Previous Work
● Copy Mechanism
● Denoising Auto-encoder
[Figure: a clean sentence is turned into a noisy sentence by Replace (10%), Delete (10%), Insert (10%), and Shuffle (normal distribution).]
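The random noising shown in the figure can be sketched roughly as follows. This is a minimal illustration, not the previous work's code: the function name `random_noise`, the choice to make the three edits mutually exclusive per token, and the default standard deviation `shuffle_std=0.5` are assumptions; the slide only states 10% for each edit and "normal distribution" for the shuffle.

```python
import random


def random_noise(tokens, vocab, rng, p=0.1, shuffle_std=0.5):
    """Randomly delete, insert, or replace tokens, then lightly shuffle them.

    Assumption: each token receives at most one edit, each with probability p.
    """
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p:
            continue                          # delete the token
        elif r < 2 * p:
            noisy.append(rng.choice(vocab))   # insert a random token before it
            noisy.append(tok)
        elif r < 3 * p:
            noisy.append(rng.choice(vocab))   # replace with a random vocabulary token
        else:
            noisy.append(tok)                 # keep unchanged
    # Shuffle: add Gaussian noise to each position and re-sort by the noisy position.
    keys = [i + rng.gauss(0.0, shuffle_std) for i in range(len(noisy))]
    order = sorted(range(len(noisy)), key=lambda i: keys[i])
    return [noisy[i] for i in order]


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_noise("she has lived here for years".split(), ["the", "at", "go"], rng))
```

Because the replacement is drawn uniformly from the vocabulary, most substitutions produced this way look nothing like errors a learner would make.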

Slide 4

Slide 4 text

Previous Work
● Copy Mechanism
● Denoising Auto-encoder
[Figure: a clean sentence is turned into a noisy sentence by Replace (10%), Delete (10%), Insert (10%), and Shuffle (normal distribution).]
→ Generate errors realistically rather than randomly

Slide 5

Slide 5 text

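Motivation
In the previous work's method:
● Noise is generated by applying replace/delete/insert/shuffle at random
● The word substituted in by replace is chosen at random from the vocabulary
However:
● Word-order errors are rarer than other error types
● Choosing the replacement word at random from the vocabulary is not realistic
In the proposed method:
● No shuffle is performed
● Replacement candidates are prepared in advance, and one of them is selected for the replacement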

Slide 6

Slide 6 text

Realistic Noising Method
[Figure: token → token*, via token-based and type-based noising]

Slide 7

Slide 7 text

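Realistic Noising Method (token-based)
Prepare
● Collect [before correction → after correction] edit pairs from GEC corpora (EditDict)
● e.g. [of → at], [has → have]
Generate
● Use EditDict in reverse to replace correct tokens with errors
● If an input token is in EditDict:
  ○ replace it with 90% probability
  ○ select one replacement from the candidates according to their occurrence probabilities
[Figure: token → token*]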
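A minimal sketch of token-based noising, assuming the edit pairs are already available as (error token, corrected token) tuples. The names `build_edit_dict` and `token_based_noise` are hypothetical, and using raw edit counts as sampling weights is one plausible reading of "according to their occurrence probabilities".

```python
import random
from collections import Counter, defaultdict


def build_edit_dict(edit_pairs):
    """Map each corrected token to a Counter of the erroneous tokens observed for it."""
    edit_dict = defaultdict(Counter)
    for error_tok, correct_tok in edit_pairs:
        edit_dict[correct_tok][error_tok] += 1
    return edit_dict


def token_based_noise(tokens, edit_dict, rng, p_replace=0.9):
    """Replace tokens found in edit_dict with an observed error, with probability p_replace."""
    noisy = []
    for tok in tokens:
        candidates = edit_dict.get(tok)
        if candidates and rng.random() < p_replace:
            errors = list(candidates)
            weights = [candidates[e] for e in errors]   # counts act as unnormalised probabilities
            noisy.append(rng.choices(errors, weights=weights, k=1)[0])
        else:
            noisy.append(tok)
    return noisy


if __name__ == "__main__":
    # [of -> at] and [has -> have] are the example pairs from the slide.
    edit_dict = build_edit_dict([("of", "at"), ("has", "have")])
    rng = random.Random(0)
    print(token_based_noise("they have arrived at noon".split(), edit_dict, rng))
```

Reading the dictionary in the corrected-to-error direction is what makes the noise realistic: a correct "at" can only become an error that was actually observed for it, such as "of".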

Slide 8

Slide 8 text

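Realistic Noising Method (type-based)
Prepare
● Build sets of tokens that share the same part of speech, for prepositions, nouns, and verbs
Generate
● For tokens that were not replaced by token-based noising, apply the following depending on the part of speech:
  ○ preposition → replace with another preposition
  ○ noun → switch between singular and plural
  ○ verb → change the inflection
  (selected at random from the candidates)
[Figure: token → token*]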
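A rough sketch of type-based noising under simplifying assumptions: the input is already POS-tagged, the candidate sets are tiny hand-written tables (`PREPOSITIONS`, `NOUN_SETS`, `VERB_SETS` are illustrative, not from the paper), and the noising probability `p_noise` is a hypothetical parameter, since the slide does not state one.

```python
import random

# Illustrative candidate sets; a real system would derive these from a lexicon or tagger.
PREPOSITIONS = ["at", "in", "on", "of", "for", "to"]
NOUN_SETS = [["cat", "cats"], ["book", "books"]]
VERB_SETS = [["go", "goes", "went", "gone", "going"]]


def _index(groups):
    """Map every surface form to the full group of forms it belongs to."""
    return {form: group for group in groups for form in group}


CANDIDATES = {
    "PREP": {p: PREPOSITIONS for p in PREPOSITIONS},
    "NOUN": _index(NOUN_SETS),
    "VERB": _index(VERB_SETS),
}


def type_based_noise(tagged, already_noised, rng, p_noise=0.5):
    """Noise (token, pos) pairs whose index was not already changed by token-based noising."""
    out = []
    for i, (tok, pos) in enumerate(tagged):
        group = CANDIDATES.get(pos, {}).get(tok, [])
        alternatives = [c for c in group if c != tok]
        if i not in already_noised and alternatives and rng.random() < p_noise:
            out.append(rng.choice(alternatives))   # uniform choice among same-type candidates
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    tagged = [("cats", "NOUN"), ("went", "VERB"), ("to", "PREP"), ("school", "NOUN")]
    print(type_based_noise(tagged, already_noised=set(), rng=random.Random(1)))
```

Tokens outside the tables, such as "school" above, pass through unchanged.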

Slide 9

Slide 9 text

Realistic Noising Method
Three unannotated corpora are used as seed corpora for pseudo-error generation:
● Gutenberg
  ○ a clean corpus with few errors
● Tatoeba
  ○ colloquial, dictionary-style explanatory sentences
● WikiText-103
  ○ Wikipedia articles
Gutenberg × 1 + Tatoeba × 12 + WikiText-103 × 5 = 45M, used as the pseudo-error data
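One way to read the "× 1 / × 12 / × 5" mixture is that each seed corpus is passed through the noising function that many times, giving a different noisy version of every sentence on each pass. The sketch below follows that reading, which is an assumption; `build_pretraining_pairs` and `noise_fn` are hypothetical names.

```python
# Number of noising passes per seed corpus, as listed on the slide.
PASSES = {"Gutenberg": 1, "Tatoeba": 12, "WikiText-103": 5}


def build_pretraining_pairs(corpora, passes, noise_fn):
    """Return (noisy sentence, clean sentence) pairs, repeating each corpus passes[name] times."""
    pairs = []
    for name, sentences in corpora.items():
        for _ in range(passes[name]):
            for sent in sentences:
                pairs.append((noise_fn(sent), sent))
    return pairs


if __name__ == "__main__":
    toy = {"Gutenberg": ["a b"], "Tatoeba": ["c d"], "WikiText-103": ["e f"]}
    pairs = build_pretraining_pairs(toy, PASSES, noise_fn=str.upper)  # str.upper stands in for real noising
    print(len(pairs))
```

With one sentence per corpus the toy run yields 1 + 12 + 5 pairs, which mirrors how the smaller corpora are up-weighted relative to their raw size.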

Slide 10

Slide 10 text

Models
● large (the model used in the experiments)
  ○ vanilla Transformer
  ○ 6 blocks
  ○ 1024-4096 units
  ○ 16 attention heads
  ○ pre-attention layer normalization
● base
  ○ vanilla Transformer
  ○ 6 blocks
  ○ 512-2048 units
  ○ 8 attention heads
● copy
  ○ copy-augmented Transformer (Zhao et al., 2019)
  ○ 6 blocks
  ○ 512-4096 units
  ○ 8 attention heads
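For side-by-side comparison, the three configurations can be written down as plain data. This sketch assumes that "1024-4096 units" means a hidden size of 1024 and a feed-forward size of 4096; the class and field names are hypothetical and do not correspond to any particular toolkit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransformerConfig:
    architecture: str
    blocks: int
    d_model: int                         # assumed: first number in "A-B units"
    d_ff: int                            # assumed: second number in "A-B units"
    heads: int
    pre_layer_norm: Optional[bool] = None  # None = not stated on the slide


CONFIGS = {
    "large": TransformerConfig("vanilla", 6, 1024, 4096, 16, pre_layer_norm=True),
    "base": TransformerConfig("vanilla", 6, 512, 2048, 8),
    "copy": TransformerConfig("copy-augmented", 6, 512, 4096, 8),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        # Per-head dimension follows from d_model / heads.
        print(name, cfg.architecture, cfg.d_model // cfg.heads)
```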

Slide 11

Slide 11 text

Training
● Pre-training (DAE)
  ○ pre-train the model on the pseudo-error data (45M)
● Training
  ○ train on the learner corpora
● Fine-tuning
  ○ fine-tune (train once more) on a training corpus whose domain is close to the test set
  ○ domain adaptation
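The three stages form a sequential transfer pipeline in which each stage starts from the parameters left by the previous one. The sketch below shows only that control flow; `Stage`, `run_sequential_transfer`, and `train_fn` are hypothetical names, and the actual optimisation step is left to whatever training routine is plugged in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    data: str   # description of the data used in this stage


STAGES = [
    Stage("pre-training", "pseudo-error data (45M)"),
    Stage("training", "learner corpora"),
    Stage("fine-tuning", "training corpus close to the test-set domain"),
]


def run_sequential_transfer(init_params, stages, train_fn):
    """Run the stages in order, feeding each stage's output parameters into the next."""
    params = init_params
    for stage in stages:
        params = train_fn(params, stage)
    return params


if __name__ == "__main__":
    # Toy train_fn that only records which stages the "parameters" have been through.
    final = run_sequential_transfer([], STAGES, lambda params, stage: params + [stage.name])
    print(final)
```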

Slide 12

Slide 12 text

Datasets
Corpora used in the three experiments: the Restricted Track and Low-resource Track of BEA Workshop 2019, and CoNLL-2014

Slide 13

Slide 13 text

Results (Restricted)
Table 3: BEA Workshop Restricted Track results.
● The score is already high at the pre-training stage (54.82) → possibly the effect of realistic noising
● Ensemble of five models: base, large × 2, copy × 2
● 2nd place at BEA Workshop 2019

Slide 14

Slide 14 text

Results (CoNLL-2014)
● Comparable to state-of-the-art scores
● Does not reach the original copy-augmented Transformer
  ○ the paper gives no reason

Slide 15

Slide 15 text

Comparison of noising methods
● Realistic noising scores higher than random noising
● The gap gradually narrows

Slide 16

Slide 16 text

Conclusion
● The pseudo-error generation method of the previous work (copy-aug.) was changed from random to realistic and evaluated
● At BEA Workshop 2019: 2nd place in the Restricted Track and 2nd place in the Low-resource Track