
KMKLabs
June 16, 2016

AUTOMATIC TEXT SUMMARIZATION : Maximum Marginal Relevance (MMR) Technique

This tech talk dissects an automatic text summarization algorithm commonly discussed in the field of NLP (Natural Language Processing). A text summarization system condenses a document into the sentences that capture its essence or main topic. It lets us get the key information from a document quickly, without having to read the whole document manually. The summarization approach covered in this tech talk is Maximum Marginal Relevance (MMR), which belongs to the extractive category (selecting sentences already present in the document as the core sentences of its content). MMR works by assigning a weight to every sentence in the document. In this tech talk, the weighting uses a vector space model based on Term Frequency (TF), with Cosine Similarity as the similarity measure.


Transcript

  1. TECH TALK AUTOMATIC TEXT SUMMARIZATION: Maximum Marginal Relevance (MMR) Technique. Fajri Koto, Analytic Team. June 3rd 2016, Jakarta, Indonesia. PT Kreatif Media Karya, www.kmklabs.com, [email protected]
  2. Outline: 1. Why Do We Need a Summarization Engine? 2. Text Summarization Overview 3. Vector Space Model 4. MMR Algorithm 5. Overall Summarization Stages 6. Demo
  3. 1. Why do we need a Summarization Engine? Definition: the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document.
  4. 1. Why do we need a Summarization Engine? Our storage becomes cheaper and larger. The number of available documents grows larger and larger (image, video, text). We want to obtain information quickly. Which summary? A quality, informative summary.
  5. 2. Text Summarization Overview. Abstractive Summarization: building new sentences as a summary of the whole text. Extractive Summarization: selecting the most representative (informative) sentences from the text/document itself as the summary, by scoring.
  6. 2. Text Summarization Overview. Extractive Summarization (cont'd): Sentence 1 → score 1, Sentence 2 → score 2, Sentence 3 → score 3, Sentence 4 → score 4, Sentence 5 → score 5, ..., Sentence n → score n. Select the sentences with the highest scores.
  7. 3. Vector Space Model. We want to do scoring, so we have to turn the text representation into a number representation: the Term Frequency (TF) Vector Space Model. Sentence 1: Saya pergi ke pasar. Sentence 2: Ibu pergi ke rumah. Bag of unique words: saya, pergi, ke, pasar, ibu, rumah.
  8. 3. Vector Space Model. Term Frequency (TF) Vector Space Model. Sentence 1: Saya pergi ke pasar. Sentence 2: Ibu pergi ke rumah. Bag of unique words: saya, pergi, ke, pasar, ibu, rumah.
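The TF vectors for the two example sentences can be built directly from the bag of unique words. A minimal sketch in Python (the helper name `tf_vector` is mine, not from the talk):

```python
# Build Term Frequency (TF) vectors over the slide's bag of unique words.

def tf_vector(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    words = sentence.lower().split()
    return [words.count(term) for term in vocabulary]

vocabulary = ["saya", "pergi", "ke", "pasar", "ibu", "rumah"]

s1 = tf_vector("Saya pergi ke pasar", vocabulary)
s2 = tf_vector("Ibu pergi ke rumah", vocabulary)

print(s1)  # [1, 1, 1, 1, 0, 0]
print(s2)  # [0, 1, 1, 0, 1, 1]
```

Each sentence becomes a vector with one dimension per vocabulary word, which is what makes the numeric similarity scoring on the next slide possible.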
  9. 3. Vector Space Model. Now we can compute a similarity score between two vectors. Similarity score: Cosine Similarity, where t is an element of the TF vector of document D, and D1 and D2 are the document vectors (sentence vectors).
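The cosine similarity the slide refers to (its formula image did not survive in the transcript) is the standard dot product over the product of the vector norms. A sketch using the TF vectors of the two example sentences, with variable names of my own choosing:

```python
import math

def cosine_similarity(d1, d2):
    """sim(D1, D2) = sum(t1 * t2) / (|D1| * |D2|), over TF elements t."""
    dot = sum(a * b for a, b in zip(d1, d2))
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    return dot / (norm1 * norm2)

d1 = [1, 1, 1, 1, 0, 0]  # TF vector of "Saya pergi ke pasar"
d2 = [0, 1, 1, 0, 1, 1]  # TF vector of "Ibu pergi ke rumah"

print(cosine_similarity(d1, d2))  # 0.5
```

The two sentences share two of their four words ("pergi", "ke"), which is why the score comes out at 0.5.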
  10. 4. Maximum Marginal Relevance. Carbonell, J., & Goldstein, J. (1998, August). The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 335-336). ACM.
  11. 4. Maximum Marginal Relevance. MMR has been widely used in text summarization because of its simplicity and efficiency. MMR re-ranks the sentences according to their relevance scores. The formula looks at, and handles, redundant sentences.
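The formula shown on this slide was not preserved in the transcript; in the cited Carbonell & Goldstein (1998) paper, MMR is defined as:

```latex
\mathrm{MMR} = \arg\max_{D_i \in R \setminus S}
  \Big[\, \lambda \,\mathrm{Sim}_1(D_i, Q)
        - (1 - \lambda) \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j) \,\Big]
```

where \(R\) is the set of candidate sentences, \(S\) the sentences already selected into the summary, \(Q\) the query (here, the document itself), and \(\lambda \in [0, 1]\) trades off relevance against redundancy.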
  12. 4. Maximum Marginal Relevance. Vector space: TF. TF (Term Frequency) is the standard vector representation for summarization. Another vector space: TF-IDF, where TFIDF(t, D) = TF(t) * IDF(t, D).
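The TF-IDF weighting can be sketched as follows. The slide only gives TFIDF = TF(t) * IDF(t, D), so the IDF definition below (log of the number of sentences over a term's document frequency) is the standard textbook form, not necessarily the exact variant used in the talk:

```python
import math

def tfidf_vectors(sentences, vocabulary):
    """TF-IDF vectors, treating each sentence as a 'document'."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences does the term occur?
    df = {t: sum(1 for d in docs if t in d) for t in vocabulary}
    # Standard idf = log(N / df); terms absent everywhere get weight 0.
    idf = {t: math.log(n / df[t]) if df[t] else 0.0 for t in vocabulary}
    return [[d.count(t) * idf[t] for t in vocabulary] for d in docs]

vocabulary = ["saya", "pergi", "ke", "pasar", "ibu", "rumah"]
vectors = tfidf_vectors(["Saya pergi ke pasar", "Ibu pergi ke rumah"],
                        vocabulary)
print([round(v, 3) for v in vectors[0]])  # [0.693, 0.0, 0.0, 0.693, 0.0, 0.0]
```

With only two sentences, the words shared by both ("pergi", "ke") get an IDF of log(1) = 0 and drop out entirely; the rarer words carry the weight.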
  13. 4. Maximum Marginal Relevance. Similarity score: Cosine Similarity, where t is an element of the vector (TF or TF-IDF) of document D, and D1 and D2 are the document vectors (sentence vectors).
  14. 4. Maximum Marginal Relevance. MMR Process in Summarization. Document: Passage 1: 0.8, Passage 2: 0.7, Passage 3: 0.6, Passage 4: 0.5, Passage 5: 0.4. Choose the highest score (Passage 1: 0.8) into the Summary, then re-calculate the scores and re-rank.
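The pick-then-re-rank loop on this slide can be sketched as a small greedy function. The vectors, relevance scores, and lambda value below are illustrative, not from the talk:

```python
import math

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(vectors, relevance, k, lam=0.7):
    """Greedy MMR: repeatedly pick the candidate whose relevance,
    minus its similarity to what is already selected, is highest."""
    selected, candidates = [], list(range(len(vectors)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max((cosine(vectors[i], vectors[j])
                              for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Passages 0 and 1 are near-duplicates; MMR skips the redundant one.
vectors = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
relevance = [0.8, 0.7, 0.6]
print(mmr_select(vectors, relevance, k=2))  # [0, 2]
```

Plain top-k by relevance would pick passages 0 and 1; the redundancy penalty is what makes MMR prefer the less similar passage 2 instead.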
  15. 5. Overall Summarization Stages. Documents → Sentence 1, Sentence 2, Sentence 3, Sentence 4, Sentence 5, ..., Sentence n → Preprocessing → Building the Vector Space → Scoring → Selecting the top-x.
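The stages above can be tied together in a short end-to-end sketch. Everything here (the sentence splitting, the choice to score each sentence against the whole-document vector, the function names) is a simplified illustration, not the talk's actual implementation:

```python
import math
import re

def summarize(document, top_x=2):
    """Split into sentences, preprocess, build TF vectors, score each
    sentence against the whole document, and keep the top-x sentences."""
    sentences = [s.strip() for s in re.split(r"[.!?]", document) if s.strip()]
    tokens = [re.sub(r"[^a-z ]", "", s.lower()).split() for s in sentences]
    vocab = sorted({w for t in tokens for w in t})
    vectors = [[t.count(w) for w in vocab] for t in tokens]
    # Score relevance as similarity to the document-wide TF vector.
    doc_vector = [sum(v[i] for v in vectors) for i in range(len(vocab))]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    scores = [cosine(v, doc_vector) for v in vectors]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:top_x]]

doc = "Saya pergi ke pasar. Ibu pergi ke rumah. Kucing tidur."
print(summarize(doc, top_x=2))
```

A full implementation would plug the MMR re-ranking into the final selection step instead of taking the raw top-x, so that near-duplicate sentences are not picked together.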
  16. 6. DEMO. Preprocessing used in my implementation: removing special characters; converting all text to lower case; stemming, e.g. membawa → bawa, bernyanyi → nyanyi; removing stopwords, e.g. "saya, dia, ke, apa, apakah", etc.
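The preprocessing steps listed on this slide can be sketched as below. The stopword list is just the sample given on the slide, and the stemming step (e.g. membawa → bawa) is omitted, since it would need a full Indonesian stemmer:

```python
import re

# Sample stopwords from the slide; a real implementation would use a
# full Indonesian stopword list.
STOPWORDS = {"saya", "dia", "ke", "apa", "apakah"}

def preprocess(text):
    text = re.sub(r"[^a-zA-Z\s]", " ", text)  # remove special characters
    words = text.lower().split()              # lower-case and tokenize
    return [w for w in words if w not in STOPWORDS]

print(preprocess("Saya pergi ke pasar!"))  # ['pergi', 'pasar']
```

After this step each sentence is a clean list of content words, ready for the vector-space and scoring stages described earlier.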