Text summarization Phase 2 Evaluation 1


Presentation for Phase 2, Evaluation 1 of our final-year project on text summarization, carried out under Prof. U. A. Deshpande in collaboration with TCS.

Up to this evaluation we studied the sequence-to-sequence encoder-decoder architecture, the attention model, and a reinforcement learning model for abstractive summarization.

Phase 1 evaluation 2 presentation: https://speakerdeck.com/gautamabhishek46/text-summarization-phase-1-evaluation-2

Team:
Abhishek Gautam
Atharva Parwatkar
Sharvil Nagarkar

Professor in-charge: U. A. Deshpande
TCS Mentor: Dr. Sagar Sunkle

Abhishek Gautam

February 19, 2019

Transcript

  1. Text Summarization
    Abhishek Gautam (BT15CSE002)
    Atharva Parwatkar (BT15CSE015)
    Sharvil Nagarkar (BT15CSE052)
    Under Prof. U. A. Deshpande


  2. Recap
    ● We created an unsupervised model using sentence embeddings
    ● The model was extractive in nature

  3. Abstractive summarisation
    ● Limitations of extractive summarisation
    ● What an abstractive model does differently
    ● Our approaches

  4. Our Approaches
    ● A domain-specific abstractive model
    ● A neural network using reinforcement learning
    ● A Seq2Seq Neural Attention model

  5. Domain-specific abstractive model
    ● Data
    ○ CNN/Daily Mail dataset
    ○ Movie subtitles and their corresponding plots
    ■ Scraped movie subtitle data from https://yifysubtitles.org
    ■ Scraped Wikipedia articles to extract movie plots

  6. Domain-specific abstractive model
    ● An extractive model pulls out most of the relevant information
    ● Key-phrase extraction is then applied to the intermediate text to generate a
    domain-specific summary

  7. Key Phrase Extraction
    ● Automatic keyphrase extraction is typically a two-step process:
    ○ A set of words and phrases that could convey the topical content of a
    document is identified.
    ○ These candidates are then scored/ranked, and the “best” are selected as the
    document’s keyphrases.
    ● Keyphrase extraction is implemented using spaCy.

  8. 1. Candidate Identification
    ● Common heuristics include filtering for words with certain parts of speech or,
    for multi-word phrases, certain POS patterns; and using external knowledge
    bases like WordNet or Wikipedia as a reference source of good/bad keyphrases.
    ● Noun phrases matching the POS pattern {(<JJ>* <NN.*>+ <IN>)? <JJ>* <NN.*>+}
    (a regular expression written in a simplified format)
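
    A minimal sketch of this candidate-identification step using spaCy 3.x’s Matcher.
    The slide names spaCy; the simplified pattern here (optional adjectives followed by
    nouns) and the en_core_web_sm model are illustrative assumptions:

    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.load("en_core_web_sm")
    matcher = Matcher(nlp.vocab)
    # Roughly <JJ>* <NN.*>+ : any number of adjectives followed by one or more nouns
    matcher.add("CANDIDATE", [[{"POS": "ADJ", "OP": "*"}, {"POS": "NOUN", "OP": "+"}]])

    def candidate_phrases(text):
        doc = nlp(text)
        spans = [doc[start:end] for _, start, end in matcher(doc)]
        spans = spacy.util.filter_spans(spans)  # keep the longest non-overlapping matches
        return [span.text.lower() for span in spans]

    print(candidate_phrases("Automatic keyphrase extraction identifies topical noun phrases."))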

  9. 2. Keyphrase Selection
    ● A graph-based ranking method is used, in which the importance of a candidate is
    determined by its relatedness to other candidates, where “relatedness” may be
    measured by two terms’ frequency of co-occurrence or semantic relatedness.
    ● This method assumes that more important candidates are related to a greater
    number of other candidates, and that more of those related candidates are also
    considered important.
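
    A TextRank-style sketch of this graph-based ranking: candidates that co-occur within
    a sliding window are linked, and PageRank supplies the importance score. The window
    size and the use of networkx are assumptions for illustration, not the project’s
    exact scorer:

    import networkx as nx

    def rank_candidates(tokens, window=4, top_k=5):
        graph = nx.Graph()
        graph.add_nodes_from(set(tokens))
        # connect candidates that co-occur within the sliding window
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + window]:
                if word != other:
                    graph.add_edge(word, other)
        scores = nx.pagerank(graph)  # important candidates relate to other important ones
        return sorted(scores, key=scores.get, reverse=True)[:top_k]

    tokens = "graph based ranking scores candidates by their relatedness to other candidates".split()
    print(rank_candidates(tokens))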

  10. Model using reinforcement learning
    ● Sequence to sequence sentence generation
    ● Reinforcement learning

  11. Sequence to sequence sentence generation
    ● Used for various NLP tasks such as machine translation, Q&A, etc.
    ● An encoder-decoder architecture is used.
    ○ It consists of LSTMs or bidirectional LSTMs.
    ○ Word embeddings are fed to the encoder at each timestep.
    ○ The encoder creates a context vector and passes it to the decoder when it
    receives an EOS (end-of-sentence) symbol.
    ○ At each timestep the decoder predicts the next word from its previous hidden
    state and the previously predicted word, until it predicts an EOS symbol.
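
    A minimal PyTorch sketch of this encoder-decoder. The single-layer LSTMs and the
    vocabulary, embedding, and hidden sizes are illustrative assumptions, not the
    project’s configuration:

    import torch
    import torch.nn as nn

    VOCAB, EMB, HID = 10000, 128, 256

    class Encoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, EMB)
            # bidirectional LSTM over the source word embeddings
            self.lstm = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)

        def forward(self, src):                                # src: (batch, src_len)
            out, (h, c) = self.lstm(self.embed(src))
            # concatenate the forward/backward final states into one context vector
            h = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)   # (1, batch, 2*HID)
            c = torch.cat([c[0], c[1]], dim=-1).unsqueeze(0)
            return out, (h, c)

    class Decoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, EMB)
            self.lstm = nn.LSTM(EMB, 2 * HID, batch_first=True)
            self.proj = nn.Linear(2 * HID, VOCAB)

        def forward(self, prev_word, state):                   # one timestep at a time
            out, state = self.lstm(self.embed(prev_word), state)
            return self.proj(out), state                       # logits over the vocabulary

    enc, dec = Encoder(), Decoder()
    src = torch.randint(0, VOCAB, (2, 12))                     # toy batch of token ids
    _, state = enc(src)
    sos = torch.zeros(2, 1, dtype=torch.long)                  # <SOS> id 0 is an assumption
    logits, state = dec(sos, state)
    print(logits.shape)                                        # torch.Size([2, 1, 10000])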


  12. Sequence to sequence sentence generation


  13. Encoder using bidirectional LSTM (diagram)


  14. Reinforcement learning
    ● Extractor: a CNN-then-RNN network.
    ● The extractor generates representations of important words,
    phrases and sentences.
    ● Using the extractor’s output, important sentences are selected
    with a Pointer Network (not shown in the image).

  15. Reinforcement learning
    ● The abstractor network then compresses and rewrites the
    extracted sentences into concise summary sentences.
    ● A ROUGE score is calculated and the extractor is trained
    using it as the reinforcement learning reward.
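
    A simplified REINFORCE-style sketch of that loop: the extractor picks a sentence,
    the abstractor’s rewrite is scored with ROUGE, and that score is used as the
    reward. Every component here (the linear scorer, the toy ROUGE-1, the single
    sampled action) is a stand-in for illustration, not the architecture of the
    cited paper (arXiv:1805.11080):

    import torch
    import torch.nn as nn

    def rouge_1(candidate, reference):
        # toy unigram-recall ROUGE-1; a real system would use a ROUGE library
        cand, ref = set(candidate.split()), set(reference.split())
        return len(cand & ref) / max(len(ref), 1)

    class Extractor(nn.Module):
        # scores each document sentence; a stand-in for the CNN-then-RNN extractor
        def __init__(self, dim=64):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, sent_reprs):                 # (num_sentences, dim)
            return torch.softmax(self.score(sent_reprs).squeeze(-1), dim=0)

    extractor = Extractor()
    optimizer = torch.optim.Adam(extractor.parameters(), lr=1e-3)

    sent_reprs = torch.randn(5, 64)                    # toy sentence representations
    sentences = [f"sentence {i} about the topic" for i in range(5)]
    reference = "a short summary sentence about the topic"

    probs = extractor(sent_reprs)
    dist = torch.distributions.Categorical(probs)
    idx = dist.sample()                                # extractor action: pick a sentence
    abstracted = sentences[idx]                        # the abstractor rewrite would go here
    reward = rouge_1(abstracted, reference)

    loss = -dist.log_prob(idx) * reward                # REINFORCE: maximize expected ROUGE
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()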

  16. Reinforcement learning (Extractor)


  17. What is attention?
    ● What does the current model do?
    ● How does attention help?


  18. What the current model does
    (Diagram: the words “front”, “against”, “russian”, “terrorism”, “defence” fed
    through the encoder-decoder)


  19. What the current model does
    ● In the picture, the words “front”, “against” and “terrorism” are fed into an
    encoder, and after a special signal the decoder starts producing a translated
    (simplified) sentence.
    ● The decoder is supposed to generate a translation solely based on the last
    hidden state from the encoder.
    ● It seems unreasonable to assume that we can encode all the information about a
    potentially very long sentence into a single vector and then have the decoder
    produce a good translation based on only that.
    ● The attention model addresses this problem.


  20. Attention Model
    ● With an attention mechanism we no longer try to encode the full source
    sentence into a fixed-length vector.
    ● Rather, we allow the decoder to “attend” to different parts of the source
    sentence at each step of the output generation.
    ● We let the model learn what to attend to based on the input sentence and what
    it has produced so far.
    ● Each decoder output word depends on a weighted combination of all the input
    hidden states, not just the last one.
    ● The attention weights are updated during training.
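
    A minimal sketch of that weighted combination using plain dot-product scoring
    (published attention models typically use a learned scoring function; dot products
    and the sizes below are simplifications for illustration):

    import torch

    src_len, tgt_len, hid = 50, 50, 256
    encoder_states = torch.randn(src_len, hid)   # all encoder hidden states
    decoder_state = torch.randn(hid)             # current decoder hidden state

    scores = encoder_states @ decoder_state      # one score per input position
    weights = torch.softmax(scores, dim=0)       # attention weights, sum to 1
    context = weights @ encoder_states           # weighted combination of all states

    # one weight per (input word, output word) pair, hence the cost on the next slide
    print(src_len * tgt_len)                     # 2500 attention values for 50 x 50
    print(context.shape)                         # torch.Size([256])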


  21. Cost of using Attention Model
    ● We need to calculate an attention value for each
    combination of input and output word.
    ● If you have a 50-word input sequence and generate a 50-word output sequence,
    that would be 2500 attention values.


  22. An Example: visualization of the attention model (figure)


  23. Generating Summaries
    ● Beam Search


  24. Generating Summaries
    At each step, the decoder outputs a probability distribution over the target vocabulary.
    To get the output word at this step we can do the following:
    ● Greedy Sampling
    ○ Choose the word with the highest probability at each timestep.
    ○ This sometimes produces incorrect results.
    ● A better approach is to use Beam Search.


  25. Beam Search
    ● Ideally, all possible branches should be checked for the best result.
    ● But this is not feasible, as the number of possible hypotheses is exponential.
    ● Hence, Beam Search is a compromise between an exact search and the greedy
    approach.
    ● Essentially, Beam Search maintains the top k hypotheses for the summary.
    ● It uses pruning to retain the top k results at each step.
    ● This ensures that each target word gets a fair shot at appearing in the summary.
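
    A minimal beam-search sketch over a step function that returns next-token
    log-probabilities. The step function, the token ids (0 = SOS, 1 = EOS), and scoring
    by summed log-probability are illustrative assumptions, not the project’s actual
    decoder interface:

    import torch

    def beam_search(step, beam_size=3, max_len=20, sos=0, eos=1):
        beams = [([sos], 0.0)]                        # (token sequence, total log-prob)
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if seq[-1] == eos:                    # finished hypotheses are kept as-is
                    candidates.append((seq, score))
                    continue
                log_probs = step(seq)                 # (vocab,) log-probabilities
                top_lp, top_ix = log_probs.topk(beam_size)
                for lp, ix in zip(top_lp.tolist(), top_ix.tolist()):
                    candidates.append((seq + [ix], score + lp))
            # prune: keep only the k best hypotheses
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        return beams[0][0]

    # toy step function: a fixed random "decoder" over a 10-word vocabulary
    def step(seq):
        torch.manual_seed(len(seq))                   # deterministic toy distribution
        return torch.log_softmax(torch.randn(10), dim=0)

    print(beam_search(step))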


  26. Beam Search Example


  27. Future Approach
    ● Implementation of the Seq2Seq model, attention model, etc.
    ● Testing with the CNN/DM dataset and the scraped movies dataset.
    ● Tuning the model, under supervision, to gain deeper insights.


  28. References
    ● https://arxiv.org/pdf/1805.11080.pdf
    ● http://bdewilde.github.io/blog/2014/09/23/intro-to-automatic-keyphrase-extraction/
    ● http://home.iitk.ac.in/~soumye/cs498a/pres.pdf
    ● https://github.com/icoxfog417/awesome-text-summarization
    ● https://www.aclweb.org/anthology/D/D15/D15-1044.pdf
    ● https://www.cs.cmu.edu/~bhiksha/courses/deeplearning/Fall.2015/slides/lec14.neubig.seq_to_seq.pdf


  29. Thank you!
