Text summarization Phase 1 evaluation 2

Phase 2 evaluation presentation of the text summarization final-year project, carried out under Professor U. A. Deshpande in collaboration with TCS.

In Phase 2 we studied how word embeddings are computed to represent an arbitrary word as a fixed-length numerical vector (CBOW and Skip-gram), and read up on the skip-thought architecture and GRUs.

We also built a basic text summarization prototype using k-means clustering, which achieves roughly 60% to 75% accuracy when evaluated against the BBC News dataset (an example can be found in the slides).

Phase 1 presentation: https://speakerdeck.com/gautamabhishek46/text-summarization-phase-1-evaluation

Team:
Abhishek Gautam
Atharva Parwatkar
Sharvil Nagarkar

Professor in-charge: U. A. Deshpande
TCS Mentor: Dr. Sagar Sunkle

Abhishek Gautam

December 01, 2018

Transcript

  1. Text Summarization
    Abhishek Gautam (BT15CSE002)
    Atharva Parwatkar (BT15CSE015)
    Sharvil Nagarkar (BT15CSE052)
    Under Prof. U. A. Deshpande

  2. Problem
    ● Neural networks cannot work with raw text directly; they need numerical input.
    ● Sentences and documents are of variable length.
    ● How can words or sentences be represented as fixed-length vectors?

  3. Word Embeddings
    ● Word embeddings are representations of plain-text words as fixed-size
    numerical vectors.
    ● They can capture the context of a word in a document, semantic and
    syntactic similarity, relations with other words, etc.

  4. Types of word embeddings
    ● Frequency based
    ● Prediction based

  5. Frequency based word embeddings (Overview)

  6. Advantages
    ● Fast computation
    ● Preserves the semantic relationship between words.
    Disadvantages
    ● Huge memory requirement.
    ● The resulting embeddings are generally of lower quality than those from prediction-based methods.
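
    As a rough, hypothetical illustration of the frequency-based idea (not taken
    from the slides): each word can be described by its co-occurrence counts with
    every other vocabulary word, which is why memory grows with vocabulary size.
    A minimal sketch using scikit-learn, with a made-up toy corpus:

    from sklearn.feature_extraction.text import CountVectorizer
    import numpy as np

    corpus = ["the quick brown fox", "the lazy brown dog"]   # toy corpus
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(corpus)        # document-term count matrix
    cooc = (X.T @ X).toarray()                  # word-word co-occurrence counts
    np.fill_diagonal(cooc, 0)                   # ignore self co-occurrence
    print(vectorizer.get_feature_names_out())
    print(cooc)                                 # each row is a word's count-based "embedding"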

  7. Prediction based word embeddings

  8. Different prediction based word embeddings
    1. word2vec
    2. doc2vec
    3. FastText
    doc2vec and FastText are extensions of word2vec.

  9. One-Hot-Encoding
    [Diagram: representation of the word “sample”: the word is mapped to its
    vocabulary index, and the one-hot array contains a 1 at that index and 0 at
    every other index]
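
    A minimal sketch of one-hot encoding; the vocabulary and the word are made up
    for illustration:

    import numpy as np

    vocab = ["this", "is", "a", "sample", "sentence"]    # toy vocabulary
    word_to_index = {w: i for i, w in enumerate(vocab)}

    def one_hot(word):
        # Array of zeros with a single 1 at the word's vocabulary index.
        vec = np.zeros(len(vocab))
        vec[word_to_index[word]] = 1.0
        return vec

    print(one_hot("sample"))   # [0. 0. 0. 1. 0.]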

  10. word2vec
    word2vec uses one of the following self-supervised model architectures to
    produce word embeddings:
    1. CBOW (continuous bag of words)
    2. Skip-gram model

  11. CBOW
    ● This architecture predicts the probability of a word given a
    context window.
    ● It takes one-hot encoded vectors of the context words as input.
    Example: try to predict “Fox” given “Quick”, “Brown”, “Jump” and “Over”.

  12. CBOW architecture (for one word context)
    [Architecture diagram: one-hot encoded input vector → hidden layer →
    softmax probabilities for each vocabulary word]

  13. CBOW
    ● One hidden layer is used.
    ● No activation function is used in the hidden layer.
    ● A softmax activation function is used in the output layer.
    ● The error is calculated by subtracting the target one-hot encoded array
    from the softmax probabilities.
    ● The error is back-propagated using gradient descent.
    ● The size of the hidden layer equals the length of the fixed-size word
    embedding vector.
    (A small NumPy sketch of this network follows the slide.)

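    A minimal NumPy sketch of the single-word-context network described above,
    with toy sizes, random weights and one made-up (context, target) pair; a real
    implementation would loop over an entire corpus:

    import numpy as np

    V, N = 10, 4                             # vocabulary size, embedding (hidden layer) size
    W_in = np.random.randn(V, N) * 0.01      # input -> hidden weights (the word embeddings)
    W_out = np.random.randn(N, V) * 0.01     # hidden -> output weights
    lr = 0.05

    def one_hot(i, size):
        v = np.zeros(size)
        v[i] = 1.0
        return v

    context_idx, target_idx = 2, 7           # toy (context word, target word) pair
    x = one_hot(context_idx, V)
    t = one_hot(target_idx, V)

    # Forward pass: linear hidden layer (no activation), softmax output layer.
    h = W_in.T @ x                           # equals the embedding row W_in[context_idx]
    u = W_out.T @ h
    y = np.exp(u - u.max()); y /= y.sum()    # softmax probabilities over the vocabulary

    # Error: softmax probabilities minus the target one-hot array.
    e = y - t

    # Gradient-descent update (back-propagation through both weight matrices).
    W_out -= lr * np.outer(h, e)
    W_in[context_idx] -= lr * (W_out @ e)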

  14. CBOW architecture (for multi-word context)
    ● Calculate the one-hot encoded vector of each context word.
    ● Concatenate them in the same order as in the sentence.
    ● Pass this vector as the input to the CBOW network.

  15. CBOW Advantages
    ● Lower storage requirement than frequency-based word embeddings.
    ● Can perform analogy reasoning such as: King - Man + Woman ≈ Queen.
    CBOW Disadvantages
    ● Takes the average of a word's contexts, so different senses get mixed
    (e.g. “Apple” the company vs. “apple” the fruit).
    ● Long training time.

  16. Skip-gram model
    ● This architecture predicts the context of a given word.
    ● It takes the one-hot encoded vector of the word as input.
    Example: try to predict “Quick”, “Brown”, “Jump” and “Over” given “Fox”.

  17. Skip-gram model
    ● One hidden layer is used.
    ● No activation function is used in the hidden layer.
    ● A softmax activation function is used in the output layer.
    ● The error is calculated by subtracting the target one-hot encoded array
    from the softmax probabilities.
    ● The error is back-propagated using gradient descent.
    ● The size of the hidden layer equals the length of the fixed-size word
    embedding vector.
    (A Gensim usage sketch covering both architectures follows the slide.)

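    In practice both architectures are available through Gensim's Word2Vec class,
    selected by the sg flag; a minimal sketch with a toy, pre-tokenized corpus and
    illustrative parameter values (Gensim 4.x argument names):

    from gensim.models import Word2Vec

    sentences = [["quick", "brown", "fox", "jumps", "over", "lazy", "dog"],
                 ["quick", "brown", "dog"]]          # toy tokenized corpus

    # sg=0 selects the CBOW architecture, sg=1 selects skip-gram.
    cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
    skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

    print(skipgram.wv["fox"].shape)   # a fixed-length vector, here (50,)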

  18. Skip-gram model Advantages
    ● The skip-gram model can capture more than one semantics for a single word
    (e.g. separate representations for “Apple” the company and “apple” the fruit).
    Skip-gram model Disadvantages
    ● Fails to identify combined words, e.g. “New York”.

  19. Working Model
    ● Workflow Diagram
    ● Data Preprocessing
    ● Creation of Word Embeddings
    ● Clusterization
    ● Summarization

  20. Workflow Diagram
    [Flowchart: Document → Data Preprocessing → Building Word Embeddings →
    Sentence Embeddings → Clusterization → Summarization]

  21. Data Collection and Preprocessing
    ● This step involves collecting news articles from files.
    ● After collection, the following preprocessing steps are performed:
    ○ Tokenization
    ○ Normalization
    ■ Removal of non-ASCII characters, punctuation and stopwords
    ■ Lemmatization
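
    A minimal NLTK-based sketch of the preprocessing steps listed above; the exact
    pipeline used in the project may differ, and the sample sentence is made up:

    import string
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    # One-time downloads: nltk.download("punkt"), nltk.download("stopwords"),
    # nltk.download("wordnet")
    stop_words = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()

    def preprocess(sentence):
        # Keep ASCII characters only, then tokenize.
        sentence = sentence.encode("ascii", "ignore").decode()
        tokens = nltk.word_tokenize(sentence.lower())
        # Drop punctuation and stopwords, then lemmatize what remains.
        tokens = [t for t in tokens
                  if t not in string.punctuation and t not in stop_words]
        return [lemmatizer.lemmatize(t) for t in tokens]

    print(preprocess("The cats were sitting on the mats!"))   # e.g. ['cat', 'sitting', 'mat']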

  22. Example

  23. Building Word Embeddings
    ● Word embeddings are created using the Word2Vec class of the Gensim
    library.
    ● An object of this class is trained on the data (news articles) to build
    a vocabulary and a word embedding for each word.
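
    A minimal sketch of that step; the token lists below are toy stand-ins for the
    preprocessed news articles, and the parameter values are illustrative:

    from gensim.models import Word2Vec

    tokenized_articles = [
        ["market", "profit", "rise", "share"],
        ["market", "share", "fall", "profit"],
        ["election", "vote", "result"],
    ]

    model = Word2Vec(vector_size=50, window=3, min_count=1, workers=1)
    model.build_vocab(tokenized_articles)            # build the vocabulary
    model.train(tokenized_articles,
                total_examples=model.corpus_count,
                epochs=model.epochs)                 # learn one vector per vocabulary word

    print(model.wv["market"])                        # fixed-length embedding of "market"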

  24. Example

  25. Use of Word Embeddings
    ● Word embeddings can be used by deep learning models to represent
    words.
    ● One interesting use of word embeddings is finding the words most
    similar to a given word.
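
    With a trained Gensim model (such as the one sketched above), this lookup is a
    one-liner; the toy corpus and query word are made up for illustration:

    from gensim.models import Word2Vec

    sentences = [["market", "profit", "rise", "share"],
                 ["market", "share", "fall", "profit"],
                 ["election", "vote", "result"]]
    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1)

    # Vocabulary words whose embeddings are closest (by cosine similarity) to "market".
    print(model.wv.most_similar("market", topn=3))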

  26. Example

  27. Generation of Sentence Embeddings
    ● Sentence embeddings are created as a weighted average of the word
    embeddings of the words in the sentence.
    ● The notion here is that a frequent word in a sentence should carry less
    weight.
    ● Hence, the weight of a word embedding is inversely proportional to the
    word's frequency in the sentence.
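
    A minimal sketch of that weighting scheme; word_vectors is assumed to map words
    to NumPy vectors (for example a trained model's wv), and the toy vectors below
    are made up:

    from collections import Counter
    import numpy as np

    def sentence_embedding(tokens, word_vectors, dim):
        # Weighted average of word vectors; a word's weight is the inverse of its
        # frequency in the sentence, so repeated words contribute less.
        counts = Counter(tokens)
        total, weight_sum = np.zeros(dim), 0.0
        for word in tokens:
            if word in word_vectors:
                w = 1.0 / counts[word]
                total += w * word_vectors[word]
                weight_sum += w
        return total / weight_sum if weight_sum > 0 else total

    toy = {"stock": np.array([1.0, 0.0, 0.0]),
           "market": np.array([0.0, 1.0, 0.0])}
    print(sentence_embedding(["stock", "market", "market"], toy, dim=3))   # [0.5 0.5 0. ]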

  28. Example

  29. Clusterization and Summarization
    ● Clusters of the sentences in the input document are created using the
    K-means clustering algorithm.
    ● The sentences closest to the cluster centers are then selected for the
    summary.
    ● The average sentence index of each cluster is computed, and the selected
    sentences are ordered on that basis.
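
    A minimal scikit-learn sketch of this step; the sentences and their embeddings
    are random toy stand-ins, and the number of clusters is an illustrative choice:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import pairwise_distances_argmin_min

    sentences = ["Sentence one.", "Sentence two.", "Sentence three.", "Sentence four."]
    sentence_vectors = np.random.rand(len(sentences), 50)   # stand-in sentence embeddings

    n_clusters = 2                                           # summary length (illustrative)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(sentence_vectors)

    # For each cluster, pick the sentence closest to the cluster center.
    closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, sentence_vectors)

    # Order the chosen sentences by the average index of each cluster's members,
    # so the summary roughly follows the original document order.
    avg_index = [np.mean(np.where(kmeans.labels_ == c)[0]) for c in range(n_clusters)]
    ordering = sorted(range(n_clusters), key=lambda c: avg_index[c])
    print(" ".join(sentences[closest[c]] for c in ordering))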

  30. Example

  31. Example Continued

  32. Skip Thought Vectors
    ● The Skip-thought model was inspired by the skip-gram
    structure used in word2vec, which is based on the idea
    that a word’s meaning is embedded by the surrounding
    words.
    ● Similarly, in contiguous text, nearby sentences provide
    rich semantic and contextual information.

  33. Skip Thought Model
    ● The model is based on an encoder-decoder architecture.
    ● All variants of this architecture share a common goal: encoding source
    inputs into fixed-length vector representations, and then feeding such
    vectors through a “narrow passage” to decode into a target output.
    ● In the case of Neural Machine Translation, the input sentence is in a
    source language (English), and the target sentence is its translation in a
    target language (German).

  34. Skip Thought Vectors
    ● With the Skip-thought model, the encoding of a source sentence is
    mapped to two target sentences: one as the preceding sentence, the
    other as the subsequent sentence.
    ● Unlike the previous (weighted-average) method, skip-thought encoders
    take the sequence of words in the sentence into account.
    ● This avoids the undesirable loss of information that averaging incurs.

  35. Encoder Decoder Network

  36. Skip Thought Architecture
    ● An encoder built from recurrent neural network layers (GRUs) captures
    the patterns in the sequence of word vectors. The hidden state of the
    encoder is fed, as a representation of the input, into two separate
    decoders (one predicting the preceding sentence, the other the
    subsequent sentence).
    ● Intuitively speaking, the encoder generates a representation of the
    input sentence itself. Back-propagating costs from the decoders during
    training enables the encoder to capture the relationship of the input
    sentence to its surrounding sentences as well.
    (A simplified PyTorch sketch of this setup follows the slide.)
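
    A highly simplified PyTorch sketch of that encoder-decoder setup (toy sizes,
    random token ids, no training loop); the real skip-thought model uses
    conditioned GRU decoders with a vocabulary softmax and is trained on large
    ordered corpora:

    import torch
    import torch.nn as nn

    vocab_size, emb_dim, hid_dim = 100, 32, 64            # toy sizes

    embed = nn.Embedding(vocab_size, emb_dim)
    encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
    prev_decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)   # reconstructs previous sentence
    next_decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)   # reconstructs next sentence
    to_vocab = nn.Linear(hid_dim, vocab_size)

    # Toy token ids for the (previous, current, next) sentences, batch of 1, length 5.
    prev_ids, cur_ids, next_ids = (torch.randint(0, vocab_size, (1, 5)) for _ in range(3))

    # Encode the current sentence; its final hidden state is the sentence representation.
    _, h = encoder(embed(cur_ids))                        # h: (1, batch, hid_dim)

    # Each decoder starts from the encoder's hidden state and is trained (teacher
    # forcing, loss omitted here) to predict the neighbouring sentence.
    prev_logits = to_vocab(prev_decoder(embed(prev_ids), h)[0])
    next_logits = to_vocab(next_decoder(embed(next_ids), h)[0])
    print(h.squeeze(0).shape)                             # the skip-thought style sentence vector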

  37. Gated Recurrent Unit (GRU)

  38. GRUs
    ● GRUs are an improved version of the standard recurrent neural network.
    ● To solve the vanishing gradient problem of a standard RNN, a GRU uses an
    update gate and a reset gate.
    ● Update gate: the update gate helps the model determine how much of
    the past information (from previous time steps) needs to be passed
    along to the future.

  39. GRUs
    ● Reset gate: this gate is used by the model to decide how much of
    the past information to forget.
    ● Current memory content: a new memory content is introduced, which
    uses the reset gate to store the relevant information from the past.

  40. GRUs
    ● Final memory at the current time step: as a last step, the network
    computes h(t), the vector that holds the information for the current unit
    and passes it down the network.
    (A NumPy sketch of a single GRU step follows the slide.)
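
    A minimal NumPy sketch of a single GRU step, following one common formulation
    of the gate equations (toy sizes, random weights, bias terms omitted):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x_dim, h_dim = 3, 4                                  # toy input / hidden sizes
    rng = np.random.default_rng(0)
    W_z, W_r, W_h = (rng.standard_normal((h_dim, x_dim)) for _ in range(3))
    U_z, U_r, U_h = (rng.standard_normal((h_dim, h_dim)) for _ in range(3))

    def gru_step(x_t, h_prev):
        z = sigmoid(W_z @ x_t + U_z @ h_prev)            # update gate: how much past to keep
        r = sigmoid(W_r @ x_t + U_r @ h_prev)            # reset gate: how much past to forget
        h_tilde = np.tanh(W_h @ x_t + U_h @ (r * h_prev))   # current memory content
        return z * h_prev + (1.0 - z) * h_tilde          # final memory h(t) at this time step

    h = gru_step(rng.standard_normal(x_dim), np.zeros(h_dim))
    print(h)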



  42. Contextual Summarization
    ● Query-based summarization
    ● Extending the existing models

  43. Our Approach
    ● The model created above can be extended to generate a summary for a
    requested context (query) by leveraging the similarity, computed via
    sentence embeddings, between the query and the document.
    ● The sentences most similar to the supplied context are returned by
    the model.
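
    A minimal sketch of that extension: embed the query the same way as the
    sentences and rank sentences by cosine similarity. The sentences, embeddings
    and query vector below are random toy stand-ins:

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    sentences = ["Sentence about markets.", "Sentence about sport.", "Sentence about elections."]
    sentence_vectors = np.random.rand(len(sentences), 50)   # stand-in sentence embeddings
    query_vector = np.random.rand(1, 50)                    # stand-in embedding of the query

    # Rank sentences by cosine similarity to the query and keep the top few.
    scores = cosine_similarity(query_vector, sentence_vectors)[0]
    top = np.argsort(scores)[::-1][:2]
    print(" ".join(sentences[i] for i in sorted(top)))      # keep original document order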

  44. References
    ● https://arxiv.org/pdf/1411.2 - Word2vec Parameter Learning Explained
    ● https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/ -
    Word embeddings
    ● https://arxiv.org/pdf/1506.06726.pdf - Skip-Thought Vectors
    ● https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be -
    Understanding GRU Networks
    ● https://medium.com/jatana/unsupervised-text-summarization-using-sentence-embeddings-adb15ce83db1 -
    Unsupervised Text Summarization using Sentence Embeddings

  45. Thank you!
