
Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

These are the slides I used to present the above paper at the 10th SNLP Study Group (https://sites.google.com/view/snlp-jp/home/2018).

Paper: https://arxiv.org/abs/1805.11080

Kazuki Matsumaru

August 04, 2018


Transcript

  1. Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting
    Yen-Chun Chen, Mohit Bansal. ACL 2018. Presenter: Kazuki Matsumaru (Tokyo Institute of Technology, Okazaki Lab, M1). SNLP Study Group 2018, 2018/08/04. Unless otherwise noted, all figures and tables are taken from the original paper.
  2. Document Summarization (2/3)
    There are two approaches: Extractive and Abstractive. ◼ Extractive method: summary sentences are selected directly from the document. ◼ Abstractive method: summary sentences are newly generated from the document. [Figure: document sentences mapped to summary sentences under each approach]
  3. Document Summarization (3/3)
    Some studies combine the extractive and abstractive methods. [Figure from "Get To The Point: Summarization with Pointer-Generator Networks" (See et al. 2017): a soft switch chooses between extracting (copying) and abstracting (generating).]
  4. Summary
    ◼ A novel model for document summarization
     The model extracts sentences first, then rewrites them
     RL is used to bridge the non-differentiable computation
    ◼ Reduces the repetition problem
     At the sentence level, repetition does not occur because the model extracts original sentences first
     At the summary level, it is handled by Repetition-Avoiding Reranking
    ◼ Faster training and inference speed via parallel decoding
    ◼ SotA on the CNN/Daily Mail dataset
  6. Extract-Then-Rewrite
    In this paper, the extract-then-rewrite approach is proposed. [Figure: the "Extractor" selects sentences from the document, then the "Abstractor" rewrites each extracted sentence into a summary sentence.]
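As a minimal sketch of this two-stage pipeline (the `extractor` and `abstractor` callables here are hypothetical stand-ins, not the paper's models):

```python
from typing import Callable, List

def summarize(doc_sentences: List[str],
              extractor: Callable[[List[str]], List[int]],
              abstractor: Callable[[str], str]) -> List[str]:
    """Extract-then-rewrite: select salient sentences, then rewrite each one.

    `extractor` returns indices of selected document sentences;
    `abstractor` rewrites a single sentence into a summary sentence.
    Both are hypothetical stand-ins for the learned models.
    """
    selected = extractor(doc_sentences)               # e.g. [0, 2, 3]
    return [abstractor(doc_sentences[i]) for i in selected]

# Toy usage with trivial stand-in models:
doc = ["Sentence one .", "Sentence two .", "Sentence three ."]
print(summarize(doc, lambda sents: [0, 2], lambda s: s.lower()))
```

Because each extracted sentence is rewritten independently, the abstractor can decode all of them in parallel, which is where the faster training and inference claimed on the Summary slide comes from.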
  7. Model Architecture - Extractor
    1. A convolutional encoder computes a representation r_j for each sentence. 2. The RNN encoder (blue) computes context-aware representations h_j. 3. The RNN decoder (green) selects a sentence at each time step t. [Figure: hierarchical encoder producing h_1 ... h_4, with a pointer-style decoder over them]
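A rough PyTorch sketch of this hierarchical encoding; the dimensions, max-pooling, and the single linear scorer standing in for the full pointer-network decoder are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """Temporal ConvNet over word embeddings -> one vector r_j per sentence."""
    def __init__(self, emb_dim=128, hid_dim=256, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, hid_dim, kernel, padding=kernel // 2)

    def forward(self, word_embs):              # (n_sents, n_words, emb_dim)
        x = word_embs.transpose(1, 2)           # (n_sents, emb_dim, n_words)
        return torch.relu(self.conv(x)).max(dim=2).values   # (n_sents, hid_dim)

class Extractor(nn.Module):
    """r_j -> bi-LSTM context h_j -> a score for each candidate sentence."""
    def __init__(self, hid_dim=256):
        super().__init__()
        self.sent_enc = SentenceEncoder(hid_dim=hid_dim)
        self.doc_rnn = nn.LSTM(hid_dim, hid_dim, bidirectional=True, batch_first=True)
        self.score = nn.Linear(2 * hid_dim, 1)  # toy scorer instead of the full Ptr-Net decoder

    def forward(self, word_embs):
        r = self.sent_enc(word_embs)                   # (n_sents, hid_dim)
        h, _ = self.doc_rnn(r.unsqueeze(0))            # (1, n_sents, 2*hid_dim)
        return self.score(h).squeeze(-1).squeeze(0)    # one logit per sentence

# Toy usage: 4 sentences, 10 words each, 128-dim embeddings.
logits = Extractor()(torch.randn(4, 10, 128))
print(logits.shape)   # torch.Size([4])
```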
  8. Model Architecture - Abstractor
    The abstractor is given a single sentence and rewrites it. Its architecture is almost the same as the pointer-generator network below. From: Get To The Point: Summarization with Pointer-Generator Networks (See et al. 2017)
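For reference, a tiny sketch of the pointer-generator "soft switch" from See et al. (2017) that the abstractor reuses; the distributions are made up, and handling of out-of-vocabulary source words is omitted:

```python
def final_distribution(p_gen, p_vocab, attention, src_token_ids):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source copies of w)."""
    p = [p_gen * pv for pv in p_vocab]
    for attn, tok in zip(attention, src_token_ids):
        p[tok] += (1.0 - p_gen) * attn    # copy probability from each source position
    return p

# Toy example: vocab of 5 word types, source sentence = token ids [2, 4, 2].
p_vocab = [0.10, 0.20, 0.30, 0.25, 0.15]   # generator's softmax over the vocab
attention = [0.5, 0.3, 0.2]                # attention over the 3 source positions
print(final_distribution(0.7, p_vocab, attention, [2, 4, 2]))
```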
  9. Learning
    Training the full model end-to-end from scratch is infeasible:  when randomly initialized, the extractor would often select irrelevant sentences, so it would be difficult for the abstractor to learn. Hence, learning is conducted in two phases: 1. Train the extractor and the abstractor separately 2. Reinforce-Guided Extraction (train the full model)
  10. Learning – Extractor
    1. Train the extractor and the abstractor separately.  Extractor ◼ Provide a 'proxy' target label for each ground-truth summary sentence s_t by finding the most similar document sentence d_i: j_t = argmax_i ROUGE-L_recall(d_i, s_t) ◼ The extractor is then trained to minimize the cross-entropy loss against these labels. [Figure: each ground-truth summary sentence matched to its most similar document sentence, giving the 'proxy' target labels]
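A self-contained sketch of the proxy-label construction, using a plain longest-common-subsequence implementation of recall-oriented ROUGE-L (the paper uses standard ROUGE tooling; this is only illustrative):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_recall(candidate, reference):
    """LCS(candidate, reference) / |reference|: recall-oriented ROUGE-L."""
    return lcs_len(candidate, reference) / max(len(reference), 1)

def proxy_labels(doc_sents, summary_sents):
    """For each summary sentence s_t, pick j_t = argmax_i ROUGE-L_recall(d_i, s_t)."""
    return [max(range(len(doc_sents)),
                key=lambda i: rouge_l_recall(doc_sents[i], s))
            for s in summary_sents]

doc = [["the", "cat", "sat"], ["a", "dog", "barked"], ["it", "rained", "today"]]
summ = [["the", "dog", "barked"], ["the", "cat", "sat", "down"]]
print(proxy_labels(doc, summ))   # [1, 0]
```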
  11. Learning – Abstractor
    1. Train the extractor and the abstractor separately.  Abstractor ◼ Create training pairs by taking each ground-truth summary sentence and pairing it with the document sentence extracted for it by the previous method. ◼ The network is trained as a usual sequence-to-sequence model to minimize the cross-entropy loss. [Figure: proxy-matched document sentences fed to the abstractor, with the ground-truth summary sentences as targets]
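The training-pair construction can be sketched as below; the data and the labels [1, 0] mirror the toy example in the previous snippet, and the helper name is my own:

```python
def abstractor_pairs(doc_sents, summary_sents, labels):
    """Pair each ground-truth summary sentence s_t with its proxy-matched
    document sentence d_{j_t}; the abstractor is trained on these pairs as an
    ordinary seq2seq model with token-level cross-entropy."""
    return [(doc_sents[j], s) for j, s in zip(labels, summary_sents)]

# Toy example: proxy labels [1, 0] computed as in the previous sketch.
doc = [["the", "cat", "sat"], ["a", "dog", "barked"]]
summ = [["the", "dog", "barked"], ["the", "cat", "sat", "down"]]
for src, tgt in abstractor_pairs(doc, summ, [1, 0]):
    print(src, "->", tgt)
```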
  12. Learning – Full Model
    2. Reinforce-Guided Extraction  The extractor's sentence-selection behavior is non-differentiable because the extraction probability of already extracted sentences is forced to zero.  Use Reinforcement Learning, making the extractor an agent ◼ Adopt Advantage Actor-Critic (A2C)  State, action, and reward are defined as:
    State: c_t = (D, d_{j_{t-1}}), the document and the sentence extracted at time step t-1
    Action: j_t ~ π(c_t, j) = P(j), sampled from the extraction probability given by the extractor
    Reward: r(t+1) = ROUGE-L_F1(g(d_{j_t}), s_t), the similarity between the model's rewritten output and the ground-truth summary sentence
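A minimal PyTorch sketch of one advantage-actor-critic update for the extractor agent; the logits, critic value, and ROUGE reward below are dummy placeholders rather than outputs of the real model:

```python
import torch

# Dummy stand-ins: extraction logits over 5 document sentences, a critic value
# estimate, and a scalar ROUGE-L F1 reward for the rewritten sentence.
logits = torch.randn(5, requires_grad=True)
value = torch.tensor(0.3, requires_grad=True)
reward = 0.42                                        # ROUGE-L_F1(g(d_{j_t}), s_t)

dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()                               # j_t ~ pi(c_t, j)

advantage = reward - value.detach()                  # A = r - V(c_t)
policy_loss = -dist.log_prob(action) * advantage     # policy gradient with a learned baseline
critic_loss = (value - reward) ** 2                  # regress V(c_t) toward the return

(policy_loss + critic_loss).backward()
print(action.item(), policy_loss.item(), critic_loss.item())
```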
  13. Other Techniques
    ◼ Learning how many sentences to extract  In the RL training phase, add another set of trainable parameters v_EOE (EOE stands for 'End-Of-Extraction'), so the agent can learn when to stop extracting. ◼ Repetition-Avoiding Reranking  At the sentence level, the repetition problem does not occur because the model extracts original sentences first.  To remove the few remaining 'across-sentence' repetitions, apply the same beam-search trigram avoidance when reranking the candidate summaries.
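A small sketch of the reranking criterion: count repeated trigrams across a candidate summary and prefer candidates with fewer repeats (my reading of the slide, not the authors' code):

```python
from itertools import chain

def trigrams(tokens):
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

def repeated_trigram_count(summary_sentences):
    """Count how many trigram occurrences in the whole summary are repeats.

    Candidates with fewer repeated trigrams are preferred when reranking
    the beams produced for the extracted sentences.
    """
    seen, repeats = set(), 0
    for tri in chain.from_iterable(trigrams(s) for s in summary_sentences):
        if tri in seen:
            repeats += 1
        seen.add(tri)
    return repeats

cand = [["the", "cat", "sat", "on", "the", "mat"],
        ["the", "cat", "sat", "down"]]
print(repeated_trigram_count(cand))   # 1 ("the cat sat" appears twice)
```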
  14. Experiment
    ◼ Human Evaluation  Human judges answered "A is better, B is better, both are good/bad" ◼ Abstractiveness  the ratio of novel n-grams in the generated summary
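The abstractiveness measure can be sketched as the fraction of generated n-grams that never appear in the source document (tokenization and the use of unique n-grams are assumptions on my part):

```python
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novel_ngram_ratio(summary_tokens, document_tokens, n=2):
    """Fraction of the summary's unique n-grams that do not occur in the document."""
    summ = ngrams(summary_tokens, n)
    if not summ:
        return 0.0
    return len(summ - ngrams(document_tokens, n)) / len(summ)

doc_toks = "the cat sat on the mat".split()
summ_toks = "a cat sat quietly".split()
print(novel_ngram_ratio(summ_toks, doc_toks, n=2))   # 2 of 3 bigrams are novel
```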
  15. Conclusion
    ◼ A novel sentence-level RL model for summarization ◼ SotA on the CNN/Daily Mail dataset ◼ Faster training and inference speed