Summarization by Analogy: An Example-based Approach for News Articles
Megumi Makino and Kazuhide Yamamoto. Summarization by Analogy: An Example-based Approach for News Articles. Proceedings of The Third International Joint Conference on Natural Language Processing (IJCNLP2008), pp.739-744 (2008.1)
Related works
– There are many sentence extraction and sentence compression methods.
– They compute the importance of each word or sentence
  • using term frequency, title, location, etc.
– and then extract sentences or compress a single sentence.
• However, when humans generate a summary, ...
How do humans generate a summary?
– By using the knowledge and experience in our minds.
  • We cannot measure the importance of each word or each sentence.
– By combining phrases from several sentences.
  • It has been difficult to generate a summary that combines phrases drawn from several sentences.
Goals
– To generate summaries as a human does.
  • The summaries are generated by selecting and combining phrases from the whole text.
– To generate summaries without using an importance measure.
Proposal
– Example-based approach
  • We use a collection of summaries as instances.
  • We generate a summary by imitating an instance and combining phrases from the whole input.
– The instance summaries were written by humans, so they embody human knowledge and experience.
Advantages
1. Easy improvement of the system
– We can improve the system only by adding or changing instances.
2. Use of similarity rather than importance
– We substitute the similarity between two phrases for importance.
3. High applicability to local context
– We use instances similar to the input, which increases the fitness to the input contents.
[Figure: System overview of example-based summarization. Step 1: compare the input text to each instance and retrieve a similar instance. Step 2: phrase alignment between the similar instance and the input. Step 3: combine the corresponding phrases along the highest-score path into one output sentence.]
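The three steps can be sketched as a pipeline. Below is a minimal, runnable skeleton (not the authors' implementation); the `similarity`, `align`, and `combine` callables are hypothetical stand-ins for Steps 1-3:

```python
# Minimal skeleton of the three-step pipeline. The callables `similarity`,
# `align`, and `combine` are hypothetical stand-ins for Steps 1-3.
def summarize(input_text, instances, similarity, align, combine):
    # Step 1: retrieve the instance most similar to the input
    similar = max(instances, key=lambda e: similarity(e, input_text))
    # Step 2: align phrases of the instance with phrases of the input
    pairs = align(similar, input_text)
    # Step 3: combine the corresponding phrases into one output sentence
    return combine(pairs)
```

For example, with toy word-overlap functions, `summarize("a b c", ["a x", "q r"], ...)` retrieves `"a x"` as the similar instance and emits only the aligned words.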
Step 1: Retrieval of a similar instance
• Goal: obtain an instance whose contents are similar to the input.
– Compute the similarity Sim(E, I) between the input I and each instance E in the instance collection.
– Select the instance with the highest Sim(E, I).
Sim(E, I) = Σ_{i=1}^{n} Score(i) · ( w · |T_p(E) ∩ T_pi(I)| + |T_c(E) ∩ T_ci(I)| )

– n : the number of sentences in the input
– Score(i) and w : weights for the main topic of the input
– T_pi(·) : the set of predicate words in the i-th sentence
– T_ci(·) : the set of content words in the i-th sentence
– |T_p(E) ∩ T_pi(I)| : the number of overlapping words
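As a hedged sketch of this score (the function name, data layout, and the concrete values Score(i) = 1/i and w = 2.0 are illustrative assumptions, not the paper's exact settings):

```python
# Sketch of Sim(E, I): instance E is one sentence; input I has n sentences.
# instance_p / instance_c: sets of predicate / content words of E.
# input_p / input_c: per-sentence predicate / content word sets of I.
def sim(instance_p, instance_c, input_p, input_c, w=2.0):
    total = 0.0
    for i, (t_pi, t_ci) in enumerate(zip(input_p, input_c), start=1):
        score_i = 1.0 / i  # assumed weight favoring early (lead) sentences
        total += score_i * (w * len(instance_p & t_pi) + len(instance_c & t_ci))
    return total
```

Retrieval is then simply an argmax of `sim` over the instance collection.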
Step 2: Phrase alignment
– One-to-many correspondence
  • We link one phrase in the similar instance to several similar phrases in the input.
– Four alignment measures
  • Agreement of grammatical case
  • Agreement of named-entity tag
  • Edit distance
  • Word similarity using mutual information
Phrase alignment using the four measures
(3) Edit distance
– To link abbreviated phrases, e.g. 日銀 / 日本銀行 (Bank of Japan).
– We correspond the three phrases with the smallest distance to an instance phrase.
(4) Similarity with mutual information [Lin98]
– To link syntactically similar phrases, e.g. 大会を開く (to hold a convention) / 会議を開く (to hold a meeting).
– We correspond the three most similar phrases to an instance phrase.
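A minimal sketch of the edit-distance measure alone, linking an instance phrase to its three closest input phrases; the helper names are assumptions for illustration:

```python
# Classic Levenshtein edit distance over characters.
def edit_distance(a, b):
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

# One-to-many correspondence: keep the top-3 closest input phrases.
def top3_by_edit_distance(instance_phrase, input_phrases):
    ranked = sorted(input_phrases, key=lambda p: edit_distance(instance_phrase, p))
    return ranked[:3]
```

The same top-3 selection applies to the mutual-information measure, with the distance key replaced by a similarity key.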
Step 3: Combining phrases
• The output consists of input phrases similar to those of the similar instance.
– The summary therefore has good readability.
• Node score N(w_i)
– The score indicates the reliability of the phrase similarity.

N(w_i) = max { 0.5  if the grammatical case or NE tag is matched;  1/rank  otherwise }

– rank : similarity rank order of the phrase, acquired by edit distance or similarity with MI
• Edge score E(w_{i-1}, w_i)
– The score indicates the adequacy of the phrase connection:

E(w_{i-1}, w_i) = 1 / ( loc(w_i) - loc(w_{i-1}) + 1 )

– loc(w_i) : location of the sentence in the input containing phrase w_i
• Search for the maximum-score path:

Score(W) = α Σ_{i=0}^{m} N(w_i) + (1 - α) Σ_{i=1}^{m} E(w_{i-1}, w_i)
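The path search can be done with dynamic programming over the candidate phrases. A hedged sketch follows; the candidate layout, the default α = 0.5, and the absolute value in the edge score are assumptions:

```python
# Each instance phrase ("slot") has candidate input phrases given as
# (phrase, node_score, loc) tuples; find the sequence maximizing
# Score(W) = alpha * sum N(w_i) + (1 - alpha) * sum E(w_{i-1}, w_i).
def edge(loc_prev, loc_cur):
    # E(w_{i-1}, w_i): phrases from nearby sentences connect better
    return 1.0 / (abs(loc_cur - loc_prev) + 1)

def best_path(slots, alpha=0.5):
    # state: (accumulated score, chosen phrases, location of last phrase)
    states = [(alpha * n, [p], loc) for p, n, loc in slots[0]]
    for slot in slots[1:]:
        new_states = []
        for p, n, loc in slot:
            # best predecessor for this candidate, including the edge score
            s0, path0, _ = max(
                ((s + (1 - alpha) * edge(l, loc), path, l) for s, path, l in states),
                key=lambda t: t[0],
            )
            new_states.append((s0 + alpha * n, path0 + [p], loc))
        states = new_states
    score, path, _ = max(states, key=lambda t: t[0])
    return score, path
```

Because the edge score depends only on the previous phrase's location, keeping one best state per previous candidate suffices, as in Viterbi decoding.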
Experiment
– Input: 134 news articles from the Nihon Keizai Shimbun (a Japanese newspaper).
– Training: 150 news articles and their summaries (we tuned the parameter α on the training set).
• Evaluation
– A human examinee evaluated each part of our system:
  • the retrieval process of similar instances
  • the phrase alignment and combination
• How similar are the input and the obtained similar instance?
  1. quite similar: 40
  2. slightly similar: 37
  3. not very similar: 29
  4. not similar: 28
  total: 134
77 / 134 tests are good (ratings 1-2); the accuracy is 57%.
– The current similarity only matches identical content words.
– Our plan: another measure that also covers similar words.
• For the 77 inputs whose retrieved instances were judged good in the evaluation of the retrieval process:
• How proper is the output summary?
  1. quite proper: 33
  2. slightly proper: 15
  3. not very proper: 22
  4. not proper: 7
  total: 77
48 / 77 tests are good (ratings 1-2); the accuracy is 62%.
Similar instance: …の論告求刑が開かれ、検察側は宅間被告に死刑を求刑した。
(The prosecution made Takuma's closing arguments on the 22nd in the trial at the Osaka District Court and asked for the death penalty.)
Output summary: 横浜地裁で二十一日、論告求刑が開かれ、検察側は川野被告に懲役一年六月を求刑した。
(The prosecution made Kawano's closing arguments on the 21st in the trial at the Yokohama District Court and demanded one and a half years in prison.)
– Phrases from across the whole input are picked and combined into one sentence.
Similar instance: 株式時価総額でキャノンが9日、ソニーを抜いて電気機器業界トップに。
(On the 9th, Canon beats Sony in total market value and takes the No. 1 position in the electrical equipment industry.)
Output summary: 株式時価総額でソフトバンク株が十四日、トヨタ自動車を抜いて第三位に。
(On the 14th, Softbank Corp. beats Toyota Motor in market value and takes the No. 3 position.)
– Imitating similar instances enables readable, compressed summaries.
Conclusions
• We have proposed an example-based summarization method that imitates a similar instance.
– We compare phrases in the input directly with phrases in its similar instance.
  • No need to compute the importance of each sentence or word.
  • High fitness to local contexts.
• We generate one sentence by picking and combining phrases from several sentences.
– We can make a summary that covers content from the whole text.
– Sentence extraction and sentence compression methods cannot generate summaries like our outputs.