Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

Yen-Chun Chen, Mohit Bansal. ACL 2018.
Presenter: Kazuki Matsumaru (M1, Okazaki Lab, Tokyo Institute of Technology). Saisentan NLP Benkyokai (Cutting-Edge NLP Study Group) 2018, 2018/08/04.
Unless otherwise noted, all figures and tables are taken from the original paper.

Document Summarization (1/3)

Document Summarization (2/3)
There are two approaches: extractive and abstractive.
◼ Extractive method
◼ Abstractive method
[Figure: for each method, document sentences (1, 2, …) and the resulting summary sentences: 1, 3, 4 for the extractive method; 1, 2, 3 for the abstractive method.]

Document Summarization (3/3)
Some studies combine the extractive and abstractive methods.
[Figure from: Get To The Point: Summarization with Pointer-Generator Networks (See et al. 2017). A soft switch chooses between extractive and abstractive behavior.]

Summary
◼ A novel model for document summarization
  - The model extracts sentences first, then rewrites them
  - RL is used to bridge the non-differentiable computation
◼ Reduces the repetition problem
  - At the sentence level, the problem does not occur because the model extracts original sentences first
  - At the summary level, it is reduced by Repetition-Avoiding Reranking
◼ Faster training and inference through parallel decoding
◼ State of the art on the CNN/Daily Mail dataset

Extract-Then-Rewrite
This paper proposes an extract-then-rewrite approach.
[Figure: document sentences (1, 2, …) → “Extractor” → extracted sentences (1, 3, 4) → “Abstractor” → summary sentences (1, 2, 3).]
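
The extract-then-rewrite flow can be sketched as a two-stage pipeline. This is only an illustration of the control flow, not the authors' implementation: `extract_then_rewrite`, `toy_extractor`, and `toy_abstractor` are hypothetical names, and the toy stand-ins replace the neural extractor and abstractor.

```python
from typing import Callable, List


def extract_then_rewrite(
    doc_sents: List[str],
    extractor: Callable[[List[str]], List[int]],
    abstractor: Callable[[str], str],
) -> List[str]:
    """Select sentence indices, then rewrite each selected sentence."""
    selected = extractor(doc_sents)
    # Each rewrite depends only on its own source sentence, so a real
    # system could batch these calls or run them in parallel.
    return [abstractor(doc_sents[j]) for j in selected]


# Hypothetical stand-ins for the trained models.
def toy_extractor(sents: List[str]) -> List[int]:
    return [0, 2]  # pretend the extractor picked sentences 0 and 2


def toy_abstractor(sent: str) -> str:
    return " ".join(sent.split()[:4])  # "rewrite" by keeping four words


doc = ["a b c d e", "f g", "h i j k l m"]
summary = extract_then_rewrite(doc, toy_extractor, toy_abstractor)
```

Because the abstractor sees one extracted sentence at a time, the rewrites are independent of each other, which is what makes parallel decoding possible.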

Model Architecture
[Figure: architecture of the whole model.]

Model Architecture - Extractor
1. The convolutional encoder computes $r_j$ for each sentence.
2. The RNN encoder (blue) computes $h_j$.
3. The RNN decoder (green) selects sentence $j_t$ at time step $t$.
[Figure: extractor network, with RNN encoder states $h_1$ to $h_4$.]

Model Architecture - Abstractor
The abstractor is given a sentence and rewrites it. Its architecture is almost the same as the model in the figure below.
[Figure from: Get To The Point: Summarization with Pointer-Generator Networks (See et al. 2017).]

Learning
Training the full model end to end from scratch is infeasible.
  - When randomly initialized, the extractor often selects irrelevant sentences, which makes it difficult for the abstractor to learn.
Hence, learning is conducted in two phases:
1. Train the extractor and the abstractor separately
2. Reinforce-Guided Extraction (train the full model)

Learning – Extractor
1. Train the extractor and the abstractor separately
  - Extractor
    ◼ Provide a ‘proxy’ target label for each ground-truth summary sentence $s_t$ by finding the most similar document sentence $d_i$:
      $$j_t = \operatorname*{argmax}_i \, \text{ROUGE-L}_{recall}(d_i, s_t)$$
    ◼ The extractor is then trained to minimize the cross-entropy loss.
[Figure: ground-truth summary sentences 1, 2, 3 are most similar to document sentences 1, 3, 4, which become the ‘proxy’ target labels.]
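
The proxy-labeling step can be sketched in a few lines. This is a minimal illustration under stated assumptions: sentences are tokenized by whitespace, ROUGE-L recall is computed as LCS length divided by reference length, and the names `lcs_len`, `rouge_l_recall`, and `proxy_labels` are hypothetical. A real setup would more likely use an established ROUGE implementation.

```python
from typing import List


def lcs_len(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_recall(candidate: str, reference: str) -> float:
    """LCS length divided by the reference length (whitespace tokens)."""
    cand, ref = candidate.split(), reference.split()
    return lcs_len(cand, ref) / len(ref) if ref else 0.0


def proxy_labels(doc_sents: List[str], summary_sents: List[str]) -> List[int]:
    """For each summary sentence, the index of the most similar document sentence."""
    return [
        max(range(len(doc_sents)), key=lambda i: rouge_l_recall(doc_sents[i], s))
        for s in summary_sents
    ]


doc = [
    "the cat sat on the mat",
    "dogs bark loudly at night",
    "stocks fell sharply on monday",
]
summary = ["stocks fell on monday", "the cat sat"]
labels = proxy_labels(doc, summary)  # → [2, 0]
```

Ties are broken in favor of the earliest document sentence here, which is an arbitrary choice made for the sketch.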

Learning – Abstractor
1. Train the extractor and the abstractor separately
  - Abstractor
    ◼ Create training pairs by pairing each summary sentence with the document sentence selected for it by the proxy-labeling method used for the extractor.
    ◼ The network is trained as a usual sequence-to-sequence model to minimize the cross-entropy loss.
[Figure: the ‘proxy’-labeled document sentences 1, 3, 4 are fed to the abstractor, with ground-truth summary sentences 1, 2, 3 as targets.]

Learning – Full Model
2. Reinforce-Guided Extraction
  - The extractor's sentence selection is non-differentiable because the extraction probabilities of already extracted sentences are forced to zero.
  - Use reinforcement learning, treating the extractor as an agent
    ◼ Adopt Advantage Actor-Critic (A2C)
  - State, action, and reward are defined as:
    State: $c_t = (D, d_{j_{t-1}})$ (the document and the sentence extracted at time step $t-1$)
    Action: $j_t \sim \pi_{\theta_a, \omega}(c_t, j) = P(j)$ (the extraction probability from the extractor)
    Reward: $r(t+1) = \text{ROUGE-L}_{F_1}(g(d_{j_t}), s_t)$ (similarity between the model output and the ground truth)
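
To make the reward and the actor-critic update concrete, here is a generic A2C-style sketch on plain Python floats. It is not the authors' training code: the discount factor `gamma`, the whitespace tokenization, and the names `rouge_l_f1`, `discounted_returns`, and `a2c_losses` are assumptions made for illustration, and a real implementation would compute these losses on autograd tensors.

```python
from typing import List, Tuple


def lcs_len(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(output: str, reference: str) -> float:
    """Per-step reward: harmonic mean of LCS-based precision and recall."""
    out, ref = output.split(), reference.split()
    lcs = lcs_len(out, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(out), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return R_t = r_t + gamma * R_{t+1} for every step (gamma is assumed)."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def a2c_losses(
    log_probs: List[float],  # log-probability of each sampled sentence
    rewards: List[float],    # e.g. rouge_l_f1(rewritten sentence, summary sentence)
    values: List[float],     # critic's baseline estimate for each state
    gamma: float = 1.0,
) -> Tuple[float, float]:
    returns = discounted_returns(rewards, gamma)
    # Advantage: how much better the sampled action did than the baseline.
    advantages = [ret - v for ret, v in zip(returns, values)]
    # With autograd, the advantage would be treated as a constant here.
    actor_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages))
    critic_loss = sum(adv ** 2 for adv in advantages) / len(advantages)
    return actor_loss, critic_loss
```

Subtracting the critic's baseline from the return leaves the policy gradient unbiased in expectation while reducing its variance, which is the usual motivation for the advantage form.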

Other Techniques
◼ Learning how many sentences to extract
  - In the RL training phase, add another set of trainable parameters $v_{EOE}$ (EOE stands for ‘End-Of-Extraction’)
◼ Repetition-Avoiding Reranking
  - At the sentence level, the repetition problem does not occur because the model extracts original sentences first
  - To remove the few remaining ‘across-sentence’ repetitions, apply the same beam-search tri-gram avoidance
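
One possible reading of the reranking step is sketched below: keep several beam candidates per summary sentence and choose the combination with the fewest repeated tri-grams. The exhaustive enumeration, the whitespace tokenization, and the names `repeated_ngrams` and `rerank` are assumptions for illustration and may differ from the procedure in the paper.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence


def repeated_ngrams(sentences: Sequence[str], n: int = 3) -> int:
    """Count n-gram occurrences beyond the first, across all sentences."""
    counts: Counter = Counter()
    for sent in sentences:
        toks = sent.split()
        counts.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return sum(c - 1 for c in counts.values())


def rerank(beams: List[List[str]], n: int = 3) -> List[str]:
    """Pick one candidate per sentence so that repeated n-grams are minimal.

    Exhaustive search over all combinations: fine for tiny beams only.
    """
    best = min(product(*beams), key=lambda combo: repeated_ngrams(combo, n))
    return list(best)


beams = [
    ["the cat sat on the mat", "a cat sat on a mat"],
    ["the cat sat on the sofa", "it then slept all day"],
]
best = rerank(beams)
```

With k candidates for each of n sentences, the search space has k^n combinations, so anything beyond small beams would need pruning or a smarter search.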

Experiment
[Tables: results on the non-anonymized CNN/Daily Mail dataset (top) and on the anonymized CNN/DM dataset (right).]

Experiment
◼ Human Evaluation
  - Annotators answered “A is better”, “B is better”, or “both are good/bad”
◼ Abstractiveness
  - Measured as the ratio of novel n-grams in the generated summary
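
The abstractiveness measure can be sketched as follows. This assumes whitespace tokenization and counts n-gram occurrences (not unique types) in the summary that never appear in the source document; the function names are hypothetical, and the exact counting convention used in the paper may differ.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novel_ngram_ratio(summary: str, document: str, n: int) -> float:
    """Fraction of summary n-grams that do not occur in the document."""
    doc_ngrams = set(ngrams(document.split(), n))
    summ_ngrams = ngrams(summary.split(), n)
    if not summ_ngrams:
        return 0.0
    novel = sum(1 for g in summ_ngrams if g not in doc_ngrams)
    return novel / len(summ_ngrams)


document = "the cat sat on the mat"
summary = "the cat slept on the mat"
unigram_ratio = novel_ngram_ratio(summary, document, 1)  # 1 of 6 tokens is new
bigram_ratio = novel_ngram_ratio(summary, document, 2)   # 2 of 5 bigrams are new
```

A purely extractive summary scores 0 for every n, while higher ratios indicate more rewriting.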

Experiment
◼ Speed Comparison
[Table: speed comparison; annotation: parallel decoding.]

Conclusion
◼ A novel sentence-level RL model for summarization
◼ State of the art on the CNN/Daily Mail dataset
◼ Faster training and inference
