
Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting


These are the slides I used to present the above paper at the 10th SNLP (Advanced NLP) Study Group (https://sites.google.com/view/snlp-jp/home/2018).

Paper: https://arxiv.org/abs/1805.11080

Kazuki Matsumaru

August 04, 2018


Transcript

  1. Fast Abstractive Summarization with
    Reinforce-Selected Sentence Rewriting
    Yen-Chun Chen, Mohit Bansal
    ACL 2018
    Presenter: Kazuki Matsumaru (Tokyo Institute of Technology, Okazaki Lab, M1)
    Advanced NLP Study Group (SNLP) 2018
    2018/08/04
    Unless otherwise noted, all figures and tables are taken from the original paper.


  2. Document Summarization (1/3)
    2


  3. Document Summarization (2/3)
    There are two approaches: extractive and abstractive.
    ◼ Extractive method
    ◼ Abstractive method
    3
    [Figure: the extractive method copies document sentences directly into the summary;
    the abstractive method generates new summary sentences from the document.]


  4. Document Summarization (3/3)
    Some studies combine the extractive and abstractive methods.
    4
    From: Get To The Point: Summarization with Pointer-Generator Networks (See et al. 2017)
    A soft switch chooses between extractive (copying) and abstractive (generating) behavior.


  5. Summary
    ◼ A novel model for document summarization
     The model extracts sentences first, then rewrites them
     RL is used to bridge the non-differentiable computation
    ◼ Reduces the repetition problem
     At the sentence level, the problem doesn’t occur because the model
    extracts original sentences first
     At the summary level, it is reduced by Repetition-Avoiding Reranking
    ◼ Faster training and inference via parallel decoding
    ◼ SotA on the CNN/Daily Mail dataset
    5



  7. Extract-Then-Rewrite
    This paper proposes an extract-then-rewrite approach (a pipeline sketch follows below).
    7
    [Figure: Document sentences → “Extractor” selects sentences → “Abstractor” rewrites
    each extracted sentence into a summary sentence.]
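
    A minimal Python sketch of the two-stage pipeline; the extractor and abstractor
    callables below are hypothetical stand-ins for the two trained networks, not the
    paper's actual code:

        def summarize(document_sentences, extractor, abstractor):
            """Extract salient sentences, then rewrite each one into a summary sentence."""
            # 1) Extractor: indices of salient sentences, in extraction order
            selected = extractor(document_sentences)
            # 2) Abstractor: rewrite every extracted sentence independently
            #    (this per-sentence independence is what later enables parallel decoding)
            return [abstractor(document_sentences[j]) for j in selected]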


  8. Model Architecture
    8
    Architecture of Whole Model


  9. Model Architecture - Extractor
    1. The convolutional encoder computes a representation r_j for each sentence
    2. The RNN encoder (blue) computes context-aware sentence representations h_j
    3. The RNN decoder (green) selects a sentence j_t at each time step t
       (a structural sketch follows below)
    9
    [Figure: extractor architecture — sentence representations h_1 … h_4 from the
    encoder, with the decoder attending over them to select sentences.]
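
    A minimal PyTorch-style sketch of this structure; the layer sizes and the single
    pointer step are illustrative assumptions, not the paper's hyperparameters:

        import torch
        import torch.nn as nn

        class ExtractorSketch(nn.Module):
            def __init__(self, emb_dim=128, hidden=256):
                super().__init__()
                # 1) temporal convolution over word embeddings -> sentence representation r_j
                self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)
                # 2) bidirectional LSTM over sentence representations -> context-aware h_j
                self.sent_rnn = nn.LSTM(hidden, hidden, bidirectional=True, batch_first=True)
                # 3) decoder state + pointer-style attention -> scores over sentences
                self.dec = nn.LSTMCell(2 * hidden, 2 * hidden)
                self.ptr_q = nn.Linear(2 * hidden, 2 * hidden)

            def forward(self, sent_word_embs):
                # sent_word_embs: (num_sents, num_words, emb_dim) for one document
                r = self.conv(sent_word_embs.transpose(1, 2)).max(dim=2).values  # (num_sents, hidden)
                h, _ = self.sent_rnn(r.unsqueeze(0))                             # (1, num_sents, 2*hidden)
                h = h.squeeze(0)
                # a single pointer step: attention scores over sentences act as
                # the extraction probability distribution P(j)
                dec_h, _ = self.dec(h.mean(dim=0, keepdim=True))
                scores = h @ self.ptr_q(dec_h).squeeze(0)                        # (num_sents,)
                return torch.softmax(scores, dim=0)

    For example, ExtractorSketch()(torch.randn(5, 20, 128)) yields an extraction
    distribution over 5 sentences; the actual extractor runs the decoder for several
    steps and masks already-extracted sentences.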


  10. Model Architecture
    10
    Architecture of Whole Model


  11. Model Architecture - Abstractor
    The Abstractor is given a single sentence and rewrites it.
    Its architecture is almost the same as the pointer-generator network below.
    11
    From: Get To The Point: Summarization with Pointer-Generator Networks (See et al. 2017)


  12. Learning
    Training the full model end-to-end from scratch is infeasible.
     When randomly initialized, the extractor would often select sentences
    that are not relevant, so it would be difficult for the abstractor to learn.
    Hence, learning is conducted in two phases:
    1. Train the Extractor and the Abstractor separately
    2. Reinforce-Guided Extraction (train the full model)
    12


  13. Learning – Extractor
    1. Train the Extractor and the Abstractor separately
     Extractor
    ◼ Provide a ‘proxy’ target label by finding, for each ground-truth summary
    sentence s_t, the most similar document sentence d_{j_t}:

        j_t = argmax_i ROUGE-L_recall(d_i, s_t)

    ◼ The extractor is then trained to minimize the cross-entropy loss
    (a small sketch of this matching follows below).
    13
    [Figure: each ground-truth summary sentence is matched to its most similar
    document sentence, producing the ‘proxy’ target labels.]
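
    A minimal Python sketch of the proxy-label construction, using a simple
    token-level LCS-based ROUGE-L recall as an approximation of the actual scorer:

        def lcs_len(a, b):
            """Length of the longest common subsequence of two token lists."""
            dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
            for i, x in enumerate(a, 1):
                for j, y in enumerate(b, 1):
                    dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
            return dp[len(a)][len(b)]

        def rouge_l_recall(candidate, reference):
            return lcs_len(candidate, reference) / max(len(reference), 1)

        def proxy_labels(doc_sents, summary_sents):
            """For each summary sentence s_t, pick j_t = argmax_i ROUGE-L_recall(d_i, s_t)."""
            return [max(range(len(doc_sents)),
                        key=lambda i: rouge_l_recall(doc_sents[i], s_t))
                    for s_t in summary_sents]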


  14. Learning – Abstractor
    1. Train the Extractor and the Abstractor separately
     Abstractor
    ◼ Create training pairs by taking each summary sentence and pairing it with
    its matched document sentence from the proxy-label step above.
    ◼ The network is trained as a usual sequence-to-sequence model to
    minimize the cross-entropy loss.
    14
    [Figure: the matched (document sentence, ground-truth summary sentence) pairs
    from the ‘proxy’ target labels are fed to the Abstractor as training examples.]
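
    Assembling these pairs is straightforward given the proxy labels above
    (a hypothetical sketch):

        def abstractor_pairs(doc_sents, summary_sents, labels):
            """Pair each matched document sentence with the summary sentence it should become."""
            # input = matched document sentence, target = ground-truth summary sentence
            return [(doc_sents[j], s_t) for j, s_t in zip(labels, summary_sents)]

        # usage: pairs = abstractor_pairs(doc, summary, proxy_labels(doc, summary))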


  15. Model Architecture
    15
    Architecture of Whole Model


  16. Learning – Full Model
    2. Reinforce-Guided Extraction
     The Extractor’s sentence-selection behavior is non-differentiable,
    because the extraction probabilities of already-extracted sentences
    are forced to zero.
     Use reinforcement learning by making the extractor an agent
    ◼ Adopt Advantage Actor-Critic (A2C)
     The state, action, and reward are defined as follows (a sketch of the
    reward and the policy-gradient step follows below):
    16
    State:  c_t = (D, d_{j_{t-1}})
            — the document and the sentence extracted at time step t−1
    Action: j_t ~ π(c_t, j) = P(j)
            — the extraction probability from the Extractor
    Reward: r(t+1) = ROUGE-L_F1(g(d_{j_t}), s_t)
            — similarity between the Abstractor output g(d_{j_t}) and the ground-truth sentence s_t
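
    A minimal sketch of the per-step reward and an advantage-based policy-gradient
    term, reusing the lcs_len helper from the proxy-label sketch; the baseline stands
    in for the critic's value estimate, and all names are illustrative:

        import torch

        def rl_step(extract_probs, sampled_idx, rewritten_sent, gt_sent, baseline):
            """One extraction step: reward = ROUGE-L F1 of the rewritten sentence vs. the reference."""
            lcs = lcs_len(rewritten_sent, gt_sent)
            recall = lcs / max(len(gt_sent), 1)
            precision = lcs / max(len(rewritten_sent), 1)
            f1 = 0.0 if recall + precision == 0 else 2 * recall * precision / (recall + precision)
            advantage = f1 - baseline                    # critic value as baseline (A2C-style)
            log_prob = torch.log(extract_probs[sampled_idx])
            policy_loss = -advantage * log_prob          # minimizing this reinforces high-reward picks
            return policy_loss, f1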


  17. Other Techniques
    ◼ Learning how many sentences to extract
     In the RL training phase, add another set of trainable parameters that
    acts as a ‘stop’ action (EOE stands for ‘End-Of-Extraction’)
    ◼ Repetition-Avoiding Reranking
     At the sentence level, the repetition problem doesn’t occur because the
    model extracts original sentences first
     To remove the few remaining ‘across-sentence’ repetitions, apply
    beam-search tri-gram avoidance when reranking (sketched below)
    17
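
    A minimal sketch of tri-gram-based repetition avoidance at reranking time:
    candidates whose summaries repeat a tri-gram are down-ranked (the exact
    reranking score used in the paper may differ):

        def has_repeated_trigram(tokens):
            """True if any tri-gram occurs more than once in the token list."""
            seen = set()
            for i in range(len(tokens) - 2):
                tri = tuple(tokens[i:i + 3])
                if tri in seen:
                    return True
                seen.add(tri)
            return False

        def rerank(candidates):
            """Prefer candidate summaries (token lists) without repeated tri-grams."""
            return sorted(candidates, key=has_repeated_trigram)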


  18. Experiment
    18
    [Tables: ROUGE results on the non-anonymized CNN/Daily Mail dataset and on the
    anonymized CNN/DM dataset.]


  19. Experiment
    ◼ Human Evaluation
     Annotators answered “A is better”, “B is better”, or “both are good/bad”
    ◼ Abstractiveness
     The ratio of novel n-grams in the generated summary (sketched below)
    19
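
    A minimal sketch of this abstractiveness measure (token-level, illustrative only):

        def novel_ngram_ratio(summary_tokens, document_tokens, n=2):
            """Fraction of summary n-grams that never appear in the source document."""
            ngrams = lambda toks: {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
            summ, doc = ngrams(summary_tokens), ngrams(document_tokens)
            return len(summ - doc) / max(len(summ), 1)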


  20. Experiment
    ◼ Speed Comparison
    20
    Parallel decoding


  21. Experiment
    21


  22. Experiment
    22


  23. Conclusion
    ◼ A novel sentence-level RL model for summarization
    ◼ SotA on the CNN/Daily Mail dataset
    ◼ Faster training and inference via parallel decoding
    23
