Summarization by Analogy: An Example-based Approach for News Articles

Megumi Makino, Kazuhide Yamamoto (Nagaoka University of Technology, Japan)

Previous works compute the importance of a word or sentence
• Previous works
  – There are many sentence extraction and sentence compression methods.
  – They compute the importance of each word or sentence
    • using term frequency, title, location, etc.
  – They extract sentences or compress a single sentence.
• However, when we generate a summary, ...

Measuring the importance of a word or sentence is impossible
• We generate a summary
  – by using the knowledge and experience in our minds
    • We cannot measure the importance of each word or sentence.
  – by combining phrases from several sentences
    • It has been difficult to generate a summary that contains phrases from several sentences.

We summarize a text as a human does
• Goal
  – To generate summaries as a human does: the summaries are generated by selecting and combining phrases from the whole text.
  – To generate summaries without using an importance measure.

We summarize a text by imitating an instance
• Idea
  – Example-based approach
    • We use a summary collection as the instances.
    • We generate a summary by imitating an instance and combining phrases from the whole input.
  – The instance summaries were written by humans, so they include human knowledge and experience.

Advantages of the example-based approach
1. High modularity
  – We can improve the system just by adding or changing instances.
2. Use of similarity rather than importance
  – We substitute the similarity between two phrases for importance.
3. High applicability of local context
  – We use instances similar to the input, which increases the fit to the input contents.

System Overview of Example-based Summarization
[Figure: the input text is compared with an instance collection of news headlines; a similar instance "a b c d e." is retrieved; each instance phrase is linked to corresponding phrases in the input; the highest-score path through the corresponding phrases forms a one-sentence output.]
• Step 1: Compare the input to each instance and retrieve a similar instance.
• Step 2: Align phrases between the similar instance and the input.
• Step 3: Combine the corresponding phrases along the highest-score path into one sentence.
• Output: A / L / O / B / E.

Retrieval of Similar Instance
• Computing a similarity
  – Aim
    • To obtain a similar instance whose contents are similar to the input
  – Compute Sim(E, I) between the input I and each instance E in the instance collection.
  – Take the instance with the highest Sim(E, I) as the similar instance.

Retrieval of Similar Instance

$$Sim(E, I) = \sum_{i=1}^{n} Score(i) \cdot \left\{ w \cdot |T_{p1}(E) \cap T_{pi}(I)| + |T_{c1}(E) \cap T_{ci}(I)| \right\}$$

  – n: the number of sentences in the input
  – Score(i) and w: weights for the main topic of the input
  – T_{pi}(・): the set of predicate words in the i-th sentence
  – T_{ci}(・): the set of content words in the i-th sentence
  – |T_{p1}(E) ∩ T_{pi}(I)|: the number of overlaps
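The similarity above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not give the form of Score(i) or the value of w, so `lead_weight` (1/i) and `w=2.0` are hypothetical, and the word sets are English stand-ins for segmented Japanese words.

```python
from typing import Callable, List, Set


def sim(
    instance_pred: Set[str],
    instance_content: Set[str],
    input_pred: List[Set[str]],
    input_content: List[Set[str]],
    score: Callable[[int], float],
    w: float,
) -> float:
    """Sim(E, I): weighted word overlap between instance E and input I."""
    total = 0.0
    pairs = zip(input_pred, input_content)
    for i, (pred_i, content_i) in enumerate(pairs, start=1):
        pred_overlap = len(instance_pred & pred_i)           # |T_p1(E) ∩ T_pi(I)|
        content_overlap = len(instance_content & content_i)  # |T_c1(E) ∩ T_ci(I)|
        total += score(i) * (w * pred_overlap + content_overlap)
    return total


def lead_weight(i: int) -> float:
    """Hypothetical Score(i) that favours earlier sentences (an assumption)."""
    return 1.0 / i


# Toy instance (one headline) and a two-sentence input.
instance_pred = {"demand"}
instance_content = {"prosecution", "court", "penalty"}
input_pred = [{"hold"}, {"demand"}]
input_content = [{"court", "police"}, {"prosecution", "prison"}]

value = sim(instance_pred, instance_content, input_pred, input_content,
            lead_weight, w=2.0)
print(value)
```

Retrieval then amounts to evaluating `sim` for every instance in the collection and keeping the one with the largest value.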

Phrase Alignment: one-to-many correspondences
• Compare and align the phrases
  – One-to-many correspondences: we link one phrase in the similar instance to several similar phrases in the input.
  – 4 alignment measures
    • Agreement of grammatical case
    • Agreement of named entity tag
    • Edit distance
    • Word similarity using mutual information

Phrase Alignment Using 4 Measures
(1) Agreement of Grammatical Case
  – 私が (I, subj) ↔ 彼が (he, subj)
  – 計画を (plan, obj) ↔ 予定を (schedule, obj)
  [subj, obj: subject or object case marker]
(2) Agreement of Named Entity
  – Panasonic ↔ SONY (ORGANIZATION)
  – 24日 (the 24th) ↔ 15日 (the 15th) (DATE)

Phrase Alignment Using 4 Measures (continued)
(3) Enhanced Edit Distance [Yamamoto et al. 03]
  – To link abbreviated phrases: 日銀 ↔ 日本銀行 (Bank of Japan)
  – We link the 3 input phrases with the smallest distance to each instance phrase.
(4) Similarity with Mutual Information [Lin98]
  – To link syntactically similar phrases: 大会を開く (to hold a convention) ↔ 会議を開く (to hold a meeting)
  – We link the 3 most similar input phrases to each instance phrase.
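The "top 3 by distance" selection can be illustrated as follows. Note the swap: this sketch uses plain character-level Levenshtein distance, not the enhanced edit distance of [Yamamoto et al. 03], whose details are not given on the slides. The function names and the candidate list are hypothetical.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Plain character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def top_k_closest(instance_phrase: str, input_phrases: List[str], k: int = 3) -> List[str]:
    """Return the k input phrases with the smallest distance to the instance phrase."""
    return sorted(input_phrases, key=lambda p: levenshtein(instance_phrase, p))[:k]


# Hypothetical input phrases; 日銀 is the instance phrase from the slide.
candidates = ["東京株式市場", "日本銀行", "トヨタ自動車", "横浜地裁"]
closest = top_k_closest("日銀", candidates)
print(closest)
```

Keeping several candidates per instance phrase, rather than only the closest one, is what gives the later path search alternatives to choose from.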

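Combine the Phrases Using Dynamic Programming
[Figure: a lattice from <s> to </s> built over the phrases a, b, c, d, e of the similar instance. The nodes under each instance phrase are the input phrases that correspond to it (for example, the nodes under “a” are the input phrases corresponding to “a”). Node labels shown: A, L, D, O, E.]
• Search for the best path
  – Best path: <s> A L O D E </s>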

Combine the Phrases: node score
• Aim
  – The summary consists of phrases similar to those of the similar instance.
  – The summary has good readability.
• Node score N(w_i)
  – The score indicates the reliability of the phrase similarity.

$$N(w_i) = \max \begin{cases} 0.5 & \text{if grammatical case or NE tag is matched} \\ 1/rank & \text{otherwise} \end{cases}$$

  – rank: the similarity rank order of the phrase, obtained from edit distance or similarity with MI
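One possible reading of the node score, written as code. The slide combines a max with two cases; this sketch assumes that 0.5 is a candidate value when the case or NE tag matches, that 1/rank is a candidate when the phrase was ranked by edit distance or MI similarity, and that the node score is the larger of the available candidates. The function name, the `rank=None` convention and the fallback of 0.0 are assumptions, not part of the slides.

```python
from typing import Optional


def node_score(case_or_ne_match: bool, rank: Optional[int]) -> float:
    """N(w_i): reliability of an aligned phrase (assumed interpretation)."""
    candidates = []
    if case_or_ne_match:
        candidates.append(0.5)         # grammatical case or NE tag matched
    if rank is not None:
        candidates.append(1.0 / rank)  # rank 1 is the most similar phrase
    # Assumption: a phrase with no evidence at all scores 0.0.
    return max(candidates, default=0.0)


print(node_score(True, None))   # matched by case or NE tag only
print(node_score(False, 1))     # top-ranked by edit distance or MI
print(node_score(True, 3))      # both kinds of evidence available
```

Under this reading a top-ranked phrase (1/1) outscores a case or NE match (0.5), while ranks 2 and 3 score 0.5 or less.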

Combine the Phrases: edge score
• Edge score E(w_{i-1}, w_i)
  – The score indicates the adequacy of the phrase connection.

$$E(w_{i-1}, w_i) = \frac{1}{|loc(w_i) - loc(w_{i-1})| + 1}$$

  – loc(w_i): location of the sentence in the input containing phrase w_i
• Search for the maximum-score path

$$Score_p(W_p) = \alpha \sum_{i=0}^{m} N(w_i) + (1 - \alpha) \sum_{i=1}^{m} E(w_{i-1}, w_i)$$
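Because the path score is a sum of node and edge terms, the best path can be found with a Viterbi-style dynamic program. The following is a minimal sketch, not the authors' implementation: the `Phrase` class, the toy lattice and α = 0.5 are hypothetical, node scores are assumed to be precomputed, the edge score uses the absolute sentence distance (an assumption), every slot is assumed to be non-empty, and the <s> and </s> nodes are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Phrase:
    text: str
    node: float  # N(w_i), assumed to be precomputed
    loc: int     # index of the input sentence containing the phrase


def edge_score(prev: Phrase, curr: Phrase) -> float:
    """E(w_{i-1}, w_i): phrases from nearby sentences connect better."""
    return 1.0 / (abs(curr.loc - prev.loc) + 1)


def best_path(slots: List[List[Phrase]], alpha: float) -> Tuple[float, List[Phrase]]:
    """Pick one phrase per slot so that
    alpha * sum(N) + (1 - alpha) * sum(E) is maximal."""
    # best holds, for each candidate of the current slot,
    # the score and phrases of the best path ending there.
    best = [(alpha * p.node, [p]) for p in slots[0]]
    for slot in slots[1:]:
        new_best = []
        for p in slot:
            score, path = max(
                ((s + (1 - alpha) * edge_score(q[-1], p), q) for s, q in best),
                key=lambda t: t[0],
            )
            new_best.append((score + alpha * p.node, path + [p]))
        best = new_best
    return max(best, key=lambda t: t[0])


# Toy lattice: one slot per instance phrase, candidates taken from the input.
slots = [
    [Phrase("A", 0.5, 0), Phrase("J", 1.0, 2)],
    [Phrase("L", 1.0, 0), Phrase("N", 0.4, 2)],
    [Phrase("O", 0.5, 1)],
]
score, path = best_path(slots, alpha=0.5)
print(score, [p.text for p in path])
```

In this toy lattice the path A, L, O wins even though J has the higher node score, because A and L come from the same sentence and so get the maximum edge score.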

Sectional Evaluation
• Test corpus
  – Instances: 26,764 news headlines
  – Input: 134 news articles from Nihon Keizai Shimbun (a Japanese newspaper)
  – Training: 150 news articles and their summaries (we tuned the parameter α on the training set)
• Evaluation
  – A human judge evaluated each part of our system:
    • the retrieval of the similar instance
    • phrase alignment and combination

Result of the Retrieval of Similar Instances: 57% accuracy
• How similar are the input and the retrieved similar instance?
  1. quite similar       40
  2. slightly similar    37
  3. not very similar    29
  4. not similar         28
  total                 134
• 77 / 134 tests (ratings 1 and 2) are good; the accuracy is 57%.
• Similarity is currently measured by matching content words.
  – Our plan: another measure that focuses on similar words

Result of Output Summary: 62% accuracy
• Evaluated on the 77 tests judged to have good similar instances in the evaluation of the retrieval process
• How proper is the output summary?
  1. quite proper        33
  2. slightly proper     15
  3. not very proper     22
  4. not proper           7
  total                  77
• 48 / 77 tests (ratings 1 and 2) are good; the accuracy is 62%.
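The two accuracy figures follow directly from the rating counts; a short check, with the top two ratings counted as "good" as on the slides (the helper name `accuracy` is illustrative):

```python
retrieval = {"quite similar": 40, "slightly similar": 37,
             "not very similar": 29, "not similar": 28}
summary = {"quite proper": 33, "slightly proper": 15,
           "not very proper": 22, "not proper": 7}


def accuracy(counts, good_labels):
    """Return (good, total, good / total) for a rating table."""
    good = sum(counts[label] for label in good_labels)
    total = sum(counts.values())
    return good, total, good / total


r_good, r_total, r_acc = accuracy(retrieval, ["quite similar", "slightly similar"])
s_good, s_total, s_acc = accuracy(summary, ["quite proper", "slightly proper"])

print(f"retrieval: {r_good}/{r_total} = {r_acc:.0%}")
print(f"summary:   {s_good}/{s_total} = {s_acc:.0%}")
```

Only the 77 inputs with a good similar instance were passed to the second evaluation, so the 62% figure is conditional on successful retrieval.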

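Output Example 1
Input: 神奈川県警の一連の不祥事のうち、厚木署集団警ら隊の集団暴行事件で起訴された元巡査部長、川野優被告の論告求刑公判が二十一日、横浜地裁で開かれた。検察側はひまを持て余して部下に短銃を突き付けるなど、組織における地位の高さに乗じた悪質な行為などと理不尽な暴力を指弾し、川野被告に懲役一年六月を求刑した。判決は一月十一日に言い渡される。
(Among the series of scandals at the Kanagawa Prefectural Police, the closing-argument hearing for the defendant Kawano, a former police sergeant indicted over the group assault case in the Atsugi Police Station's patrol unit, was held on the 21st at the Yokohama District Court. The prosecution denounced the unreasonable violence, such as pointing a handgun at subordinates out of boredom, as malicious conduct that exploited his high position in the organization, and demanded one year and six months in prison for Kawano. The verdict will be handed down on January 11.)
Similar instance: 大阪地裁で22日、８人が犠牲となった池田小児童殺傷事件の論告求刑が開かれ、検察側は宅間被告に死刑を求刑した。
(On the 22nd at the Osaka District Court, closing arguments were held in the Ikeda Elementary School child killing case, in which eight people died, and the prosecution demanded the death penalty for the defendant Takuma.)
Output summary: 横浜地裁で二十一日、論告求刑が開かれ、検察側は川野被告に懲役一年六月を求刑した。
(On the 21st at the Yokohama District Court, closing arguments were held, and the prosecution demanded one and a half years in prison for the defendant Kawano.)
• Phrases from across the whole text are picked and combined into one sentence.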

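Output Example 2
Input: 十四日の東京株式市場でソフトバンク株が急伸し、株式時価総額でトヨタ自動車を抜いて第三位に浮上した。インターネット関連の中核銘柄として、国内外の機関投資家や個人投資家の買いが集まった結果だ。日本を代表するめかであるトヨタの時価総額を抜いたことに付いて、市場では日本の産業構造の変化を象徴しているとの声も出ている。(The rest is omitted.)
(Softbank shares surged on the Tokyo stock market on the 14th, overtaking Toyota Motor in total market value and rising to third place. This was the result of buying by domestic and foreign institutional investors and individual investors, who treat it as a core Internet-related stock. Some in the market say that overtaking the market value of Toyota, a manufacturer that represents Japan, symbolizes the change in Japan's industrial structure.)
Similar instance: 株式時価総額でキャノンが９日、ソニーを抜いて電気機器業界トップに。
(On the 9th, Canon overtakes Sony in total market value and takes the top position in the electrical equipment industry.)
Output summary: 株式時価総額でソフトバンク株が十四日、トヨタ自動車を抜いて第三位に。
(On the 14th, Softbank overtakes Toyota Motor in total market value and takes third place.)
• Imitating similar instances enables readable and compressed summaries.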

Conclusion
• Our method generates a summary by imitating a similar instance.
  – We directly compare phrases in the input with those in its similar instance.
    • There is no need to compute the importance of a sentence or word.
    • The output fits the local context well.

Conclusion (continued)
• Our method can summarize a long text into one sentence by picking and combining phrases from several sentences.
  – We can make a summary that covers contents from the whole text.
  – Sentence extraction and sentence compression methods cannot generate summaries like our outputs.
