Text summarization Phase 2 Evaluation 1


Presentation for Phase 2, Evaluation 1 of our final-year project on text summarization, carried out under Prof. U. A. Deshpande in collaboration with TCS.

Up to this evaluation we studied the sequence-to-sequence encoder-decoder architecture, the attention model, and a reinforcement learning model for abstractive summarization.

Phase 1 evaluation 2 presentation: https://speakerdeck.com/gautamabhishek46/text-summarization-phase-1-evaluation-2

Team:
Abhishek Gautam
Atharva Parwatkar
Sharvil Nagarkar

Professor in-charge: U. A. Deshpande
TCS Mentor: Dr. Sagar Sunkle

Abhishek Gautam

February 19, 2019

Transcript

  1. Text Summarization
    Abhishek Gautam (BT15CSE002)
    Atharva Parwatkar (BT15CSE015)
    Sharvil Nagarkar (BT15CSE052)
    Under Prof. U. A. Deshpande


  2. Recap
    ● We created an unsupervised model using sentence embeddings
    ● The model was extractive in nature

  3. Abstractive summarisation
    ● Limitations of extractive summarisation
    ● What an abstractive model does differently
    ● Our approaches

  4. Our Approaches
    ● A domain-specific abstractive model
    ● A neural network using reinforcement learning
    ● A Seq2Seq Neural Attention model

  5. Domain-specific abstractive model
    ● Data
    ○ CNN/Daily Mail dataset
    ○ Movie subtitles and their corresponding plots
    ■ Scraped movie subtitle data from https://yifysubtitles.org
    ■ Scraped Wikipedia articles to extract movie plots

  6. Domain-specific abstractive model
    ● An extractive model pulls out most of the relevant information
    ● Key-phrase extraction is then applied to the intermediate text to generate a
    domain-specific summary

  7. Key Phrase Extraction
    ● Automatic keyphrase extraction is typically a two-step process:
    ○ A set of words and phrases that could convey the topical content of a
    document is identified.
    ○ These candidates are then scored/ranked, and the “best” are selected as the
    document’s keyphrases.
    ● Keyphrase extraction is implemented using spaCy.

  8. 1. Candidate Identification
    ● Common heuristics include filtering for words with certain parts of speech or,
    for multi-word phrases, certain POS patterns; and using external knowledge
    bases like WordNet or Wikipedia as a reference source of good/bad keyphrases.
    ● Noun phrases matching the POS pattern {(<JJ>* <NN.*>+ <IN>)? <JJ>* <NN.*>+}
    (a regular expression written in a simplified format)
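
    A minimal sketch of this candidate-identification step using spaCy 3.x’s Matcher.
    The slide names spaCy; the simplified pattern here (optional adjectives followed by
    nouns) and the en_core_web_sm model are illustrative assumptions:

    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.load("en_core_web_sm")
    matcher = Matcher(nlp.vocab)
    # Roughly <JJ>* <NN.*>+ : any number of adjectives followed by one or more nouns
    matcher.add("CANDIDATE", [[{"POS": "ADJ", "OP": "*"}, {"POS": "NOUN", "OP": "+"}]])

    def candidate_phrases(text):
        doc = nlp(text)
        spans = [doc[start:end] for _, start, end in matcher(doc)]
        spans = spacy.util.filter_spans(spans)  # keep the longest non-overlapping matches
        return [span.text.lower() for span in spans]

    print(candidate_phrases("Automatic keyphrase extraction identifies topical noun phrases."))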

  9. 2. Keyphrase Selection
    ● A graph-based ranking method is used, in which the importance of a candidate is
    determined by its relatedness to other candidates, where “relatedness” may be
    measured by two terms’ frequency of co-occurrence or semantic relatedness.
    ● This method assumes that more important candidates are related to a greater
    number of other candidates, and that more of those related candidates are also
    considered important.
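
    A TextRank-style sketch of this graph-based ranking: candidates that co-occur within
    a sliding window are linked, and PageRank supplies the importance score. The window
    size and the use of networkx are assumptions for illustration, not the project’s
    exact scorer:

    import networkx as nx

    def rank_candidates(tokens, window=4, top_k=5):
        graph = nx.Graph()
        graph.add_nodes_from(set(tokens))
        # connect candidates that co-occur within the sliding window
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + window]:
                if word != other:
                    graph.add_edge(word, other)
        scores = nx.pagerank(graph)  # important candidates relate to other important ones
        return sorted(scores, key=scores.get, reverse=True)[:top_k]

    tokens = "graph based ranking scores candidates by their relatedness to other candidates".split()
    print(rank_candidates(tokens))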

  10. Model using reinforcement learning
    ● Sequence to sequence sentence generation
    ● Reinforcement learning

  11. Sequence to sequence sentence generation
    ● Used for various NLP tasks such as machine translation, Q&A, etc.
    ● An encoder-decoder architecture is used.
    ○ It consists of LSTMs or bidirectional LSTMs.
    ○ Word embeddings are fed to the encoder at each timestep.
    ○ The encoder creates a context vector and passes it to the decoder when it
    receives an EOS (end-of-sentence) symbol.
    ○ At each timestep the decoder predicts the next word from its previous hidden
    state and the previously predicted word, until it predicts an EOS symbol.
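
    A minimal PyTorch sketch of this encoder-decoder. The single-layer LSTMs and the
    vocabulary, embedding, and hidden sizes are illustrative assumptions, not the
    project’s configuration:

    import torch
    import torch.nn as nn

    VOCAB, EMB, HID = 10000, 128, 256

    class Encoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, EMB)
            # bidirectional LSTM over the source word embeddings
            self.lstm = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)

        def forward(self, src):                                # src: (batch, src_len)
            out, (h, c) = self.lstm(self.embed(src))
            # concatenate the forward/backward final states into one context vector
            h = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)   # (1, batch, 2*HID)
            c = torch.cat([c[0], c[1]], dim=-1).unsqueeze(0)
            return out, (h, c)

    class Decoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, EMB)
            self.lstm = nn.LSTM(EMB, 2 * HID, batch_first=True)
            self.proj = nn.Linear(2 * HID, VOCAB)

        def forward(self, prev_word, state):                   # one timestep at a time
            out, state = self.lstm(self.embed(prev_word), state)
            return self.proj(out), state                       # logits over the vocabulary

    enc, dec = Encoder(), Decoder()
    src = torch.randint(0, VOCAB, (2, 12))                     # toy batch of token ids
    _, state = enc(src)
    sos = torch.zeros(2, 1, dtype=torch.long)                  # <SOS> id 0 is an assumption
    logits, state = dec(sos, state)
    print(logits.shape)                                        # torch.Size([2, 1, 10000])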


  12. Sequence to sequence sentence generation


  13. Encoder using bidirectional LSTM (diagram)


  14. Reinforcement learning
    ● Extractor: a CNN-then-RNN network.
    ● The extractor generates representations of important words,
    phrases and sentences.
    ● Using the extractor’s output, important sentences are selected
    with a Pointer Network (not shown in the image).

  15. Reinforcement learning
    ● The abstractor network then compresses and rewrites the
    extracted sentences into concise summary sentences.
    ● A ROUGE score is calculated and the extractor is trained
    using it as the reinforcement learning reward.
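
    A simplified REINFORCE-style sketch of that loop: the extractor picks a sentence,
    the abstractor’s rewrite is scored with ROUGE, and that score is used as the
    reward. Every component here (the linear scorer, the toy ROUGE-1, the single
    sampled action) is a stand-in for illustration, not the architecture of the
    cited paper (arXiv:1805.11080):

    import torch
    import torch.nn as nn

    def rouge_1(candidate, reference):
        # toy unigram-recall ROUGE-1; a real system would use a ROUGE library
        cand, ref = set(candidate.split()), set(reference.split())
        return len(cand & ref) / max(len(ref), 1)

    class Extractor(nn.Module):
        # scores each document sentence; a stand-in for the CNN-then-RNN extractor
        def __init__(self, dim=64):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, sent_reprs):                 # (num_sentences, dim)
            return torch.softmax(self.score(sent_reprs).squeeze(-1), dim=0)

    extractor = Extractor()
    optimizer = torch.optim.Adam(extractor.parameters(), lr=1e-3)

    sent_reprs = torch.randn(5, 64)                    # toy sentence representations
    sentences = [f"sentence {i} about the topic" for i in range(5)]
    reference = "a short summary sentence about the topic"

    probs = extractor(sent_reprs)
    dist = torch.distributions.Categorical(probs)
    idx = dist.sample()                                # extractor action: pick a sentence
    abstracted = sentences[idx]                        # the abstractor rewrite would go here
    reward = rouge_1(abstracted, reference)

    loss = -dist.log_prob(idx) * reward                # REINFORCE: maximize expected ROUGE
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()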

  16. Reinforcement learning (Extractor)


  17. What is attention?
    ● What does the current model do?
    ● How does attention help?


  18. What the current model does
    (Diagram: the words “front”, “against”, “russian”, “terrorism”, “defence” fed
    through the encoder-decoder)


  19. What the current model does
    ● In the picture, the words “front”, “against” and “terrorism” are fed into an
    encoder, and after a special signal the decoder starts producing a translated
    (simplified) sentence.
    ● The decoder is supposed to generate a translation solely based on the last
    hidden state from the encoder.
    ● It seems unreasonable to assume that we can encode all the information about a
    potentially very long sentence into a single vector and then have the decoder
    produce a good translation based on only that.
    ● The attention model addresses this problem.


  20. Attention Model
    ● With an attention mechanism we no longer try to encode the full source
    sentence into a fixed-length vector.
    ● Rather, we allow the decoder to “attend” to different parts of the source
    sentence at each step of the output generation.
    ● We let the model learn what to attend to based on the input sentence and what
    it has produced so far.
    ● Each decoder output word depends on a weighted combination of all the input
    hidden states, not just the last one.
    ● The attention weights are updated during training.
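
    A minimal sketch of that weighted combination using plain dot-product scoring
    (published attention models typically use a learned scoring function; dot products
    and the sizes below are simplifications for illustration):

    import torch

    src_len, tgt_len, hid = 50, 50, 256
    encoder_states = torch.randn(src_len, hid)   # all encoder hidden states
    decoder_state = torch.randn(hid)             # current decoder hidden state

    scores = encoder_states @ decoder_state      # one score per input position
    weights = torch.softmax(scores, dim=0)       # attention weights, sum to 1
    context = weights @ encoder_states           # weighted combination of all states

    # one weight per (input word, output word) pair, hence the cost on the next slide
    print(src_len * tgt_len)                     # 2500 attention values for 50 x 50
    print(context.shape)                         # torch.Size([256])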


  21. Cost of using Attention Model
    ● We need to calculate an attention value for each
    combination of input and output word.
    ● If you have a 50-word input sequence and generate a 50-word output sequence,
    that would be 2500 attention values.


  22. An Example: visualization of the attention model (figure)


  23. Generating Summaries
    ● Beam Search


  24. Generating Summaries
    At each step, the decoder outputs a probability distribution over the target vocabulary.
    To get the output word at this step we can do the following:
    ● Greedy Sampling
    ○ Choose the word with the highest probability at each timestep.
    ○ This sometimes produces incorrect results.
    ● A better approach is to use Beam Search.


  25. Beam Search
    ● Ideally, all possible branches should be checked for the best result.
    ● But this is not feasible, as the number of possible hypotheses is exponential.
    ● Hence, Beam Search is a compromise between an exact search and the greedy
    approach.
    ● Essentially, Beam Search maintains the top k hypotheses for the summary.
    ● It uses pruning to retain the top k results at each step.
    ● This ensures that each target word gets a fair shot at appearing in the summary.
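
    A minimal beam-search sketch over a step function that returns next-token
    log-probabilities. The step function, the token ids (0 = SOS, 1 = EOS), and scoring
    by summed log-probability are illustrative assumptions, not the project’s actual
    decoder interface:

    import torch

    def beam_search(step, beam_size=3, max_len=20, sos=0, eos=1):
        beams = [([sos], 0.0)]                        # (token sequence, total log-prob)
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if seq[-1] == eos:                    # finished hypotheses are kept as-is
                    candidates.append((seq, score))
                    continue
                log_probs = step(seq)                 # (vocab,) log-probabilities
                top_lp, top_ix = log_probs.topk(beam_size)
                for lp, ix in zip(top_lp.tolist(), top_ix.tolist()):
                    candidates.append((seq + [ix], score + lp))
            # prune: keep only the k best hypotheses
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        return beams[0][0]

    # toy step function: a fixed random "decoder" over a 10-word vocabulary
    def step(seq):
        torch.manual_seed(len(seq))                   # deterministic toy distribution
        return torch.log_softmax(torch.randn(10), dim=0)

    print(beam_search(step))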


  26. Beam Search Example


  27. Future Approach
    ● Implementation of the Seq2Seq model, attention model, etc.
    ● Testing with the CNN/DM dataset and the scraped movies dataset.
    ● Tuning the model, under supervision, to gain deeper insights.


  28. References
    ● https://arxiv.org/pdf/1805.11080.pdf
    ● http://bdewilde.github.io/blog/2014/09/23/intro-to-automatic-keyphrase-extraction/
    ● http://home.iitk.ac.in/~soumye/cs498a/pres.pdf
    ● https://github.com/icoxfog417/awesome-text-summarization
    ● https://www.aclweb.org/anthology/D/D15/D15-1044.pdf
    ● https://www.cs.cmu.edu/~bhiksha/courses/deeplearning/Fall.2015/slides/lec14.neubig.seq_to_seq.pdf


  29. Thank you!
