made as an alternative to the original;
• information is preserved as much as possible.
Indicative summary
• made for judging whether we should read the original;
• less important information is dropped.
The summary to generate depends on which kind we need.
words, concepts, etc.
• paraphrasing
• selection of important parts out of the input (particularly, sentence selection)
Most of the proposed summarization methods work by selecting sentences.
history since 1958
• criterion: words that appear frequently are important
• The sum of TF*IDF over the words of a sentence is taken as the importance of the sentence (1996)
– similar to human selection results
– better than head selection in newspaper articles.
the target word w in all documents
IDF(w): inverse document frequency
• = log(N/DF(w)) + 1
• N: number of all documents,
• DF(w): number of documents that include w.
TF*IDF measures how concentrated the target word is in a particular document. In order to compute TF*IDF, we need to define "document" in advance.
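The scoring described above can be sketched as follows. This is a minimal illustration, not the implementation of the cited work. It assumes that TF(w) is the number of occurrences of w in the target document (the TF definition is cut off in the text above), that the document collection includes the target document itself, and that words are lowercase whitespace tokens. The names `tokenize` and `score_sentences` are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokens with surrounding punctuation removed.
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t]

def score_sentences(sentences, other_documents):
    """Score each sentence of the target document by the sum of TF*IDF of its words."""
    sentence_tokens = [tokenize(s) for s in sentences]
    target_tokens = [t for tokens in sentence_tokens for t in tokens]
    tf = Counter(target_tokens)  # assumed TF: count within the target document

    # "Document" must be defined in advance: here, the target plus each other text.
    doc_vocabularies = [set(target_tokens)] + [set(tokenize(d)) for d in other_documents]
    n = len(doc_vocabularies)

    def tf_idf(word):
        df = sum(1 for vocab in doc_vocabularies if word in vocab)  # df >= 1
        return tf[word] * (math.log(n / df) + 1)  # IDF(w) = log(N/DF(w)) + 1

    return [sum(tf_idf(w) for w in tokens) for tokens in sentence_tokens]

sentences = ["The earthquake hit the city.", "The weather was fine."]
others = ["The weather is fine today.", "The city council met."]
scores = score_sentences(sentences, others)
best = max(range(len(sentences)), key=lambda i: scores[i])
print(sentences[best])
```

Because the score is a plain sum over words, longer sentences tend to score higher; dividing by sentence length would be one possible adjustment, but it is not part of the method as stated here.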
are used as a key to select the sentence, or not to select the sentence.
• "for example" – may be followed by examples that are considered not important
• "therefore", "in summary", "consequently" – may be followed by conclusions that are important.
also be a key for sentence extraction when the given documents are newspaper articles or editorials.
• newspaper: the head is important.
• editorials: not only the head but also the end is important, since the conclusion may be written at the end.
We give each sentence an importance weight according to its position in the document.
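Cue phrases and sentence position can be combined into one score, as in the sketch below. The numeric weights, the reciprocal-position formula, and the names `position_weight`, `cue_weight`, and `rank_sentences` are all assumptions made for illustration; the text only states that such weights are assigned, not how.

```python
# Hypothetical cue-phrase weights: positive for conclusion markers,
# negative for markers that introduce examples.
CUE_WEIGHTS = {
    "therefore": 1.0,
    "in summary": 1.0,
    "consequently": 1.0,
    "for example": -1.0,
}

def position_weight(index, total, genre):
    # Assumed formula: weight decays with distance from the head,
    # and for editorials also with distance from the end.
    head = 1.0 / (index + 1)
    if genre == "newspaper":
        return head
    if genre == "editorial":
        tail = 1.0 / (total - index)
        return max(head, tail)
    raise ValueError(f"unknown genre: {genre}")

def cue_weight(sentence):
    lowered = sentence.lower()
    return sum(w for phrase, w in CUE_WEIGHTS.items() if phrase in lowered)

def rank_sentences(sentences, genre):
    """Return sentence indices ordered from most to least important."""
    total = len(sentences)
    scored = [
        (position_weight(i, total, genre) + cue_weight(s), i)
        for i, s in enumerate(sentences)
    ]
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]

editorial = [
    "The council met on Monday.",
    "For example, parking fees were discussed.",
    "Members disagreed on the budget.",
    "Therefore, the vote was postponed.",
]
print(rank_sentences(editorial, "editorial"))
```

In practice these weights would be added to a word-based score such as TF*IDF rather than used alone.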
coherence throughout the text
• no reference resolution is conducted.
• the longer, the better.
– since scores are essentially summed, longer sentences tend to score higher.
• chunk: is the sentence the best chunk for selection?
• duplication: suppose that sentence A is important; then a sentence A' that is close to A is also scored as important (and may also be selected in the summary).
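To make the duplication problem concrete, the sketch below shows one simple way to avoid it: greedy selection that skips any sentence whose word overlap with an already selected sentence is too high. This technique (Jaccard overlap with a threshold) is not described in the text above; it is an assumption added for illustration, and the names `jaccard` and `select_without_duplicates` and the threshold of 0.6 are hypothetical.

```python
def jaccard(a, b):
    # Word-set overlap between two sentences (assumed similarity measure).
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)

def select_without_duplicates(scored_sentences, k, threshold=0.6):
    """Pick up to k sentences by descending score, skipping near-duplicates."""
    selected = []
    for sentence, _score in sorted(scored_sentences, key=lambda pair: -pair[1]):
        if all(jaccard(sentence, chosen) < threshold for chosen in selected):
            selected.append(sentence)
        if len(selected) == k:
            break
    return selected

scored = [
    ("the earthquake struck the northern coast", 5.0),
    ("the earthquake struck the northern coast today", 4.8),
    ("rescue teams arrived within hours", 3.0),
]
print(select_without_duplicates(scored, k=2))
```

Without the overlap check, the two highest-scoring sentences here would both be selected even though they say almost the same thing.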
– I bought an apple, an orange, and a grape.
– I bought some fruit.
• Use of dictionary
– I gave him a good reason and caused him to stop hiking.
– I persuaded him to stop hiking.
Both attempts are still experimental. The difficulty lies in constructing language resources such as thesauri and dictionaries.
multi-document summarization has been attempted in order to meet the following demands:
• One may want to get an overview of an accident or an event, such as an earthquake.
• One may want to pick out the core description from among many articles.
• One may want to read about the same event from a different point of view.
• One may want to remove duplication.
summaries?
• (automatic) comparison with a human-written summary
• ranking by humans
• task-based evaluation: reading comprehension, etc.
Evaluation criteria
• readability: how natural the summary is.
• degree to which important words are included.
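Two of the automatic measures above can be sketched in a few lines: the share of important words that a summary contains, and word overlap with a human-written summary. Both functions are simplified illustrations; the names `important_word_coverage` and `unigram_recall`, the whitespace tokenization, and the choice of recall over unique words are assumptions, not the definitions used in any particular evaluation campaign.

```python
def word_set(text):
    # Assumption: lowercase whitespace tokens with surrounding punctuation removed.
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return {t for t in tokens if t}

def important_word_coverage(summary, important_words):
    """Fraction of the given important words that appear in the summary."""
    important = {w.lower() for w in important_words}
    if not important:
        raise ValueError("important_words must not be empty")
    return len(important & word_set(summary)) / len(important)

def unigram_recall(candidate, reference):
    """Fraction of unique words in the human-written reference found in the candidate."""
    reference_words = word_set(reference)
    if not reference_words:
        raise ValueError("reference must not be empty")
    return len(reference_words & word_set(candidate)) / len(reference_words)

print(important_word_coverage(
    "The prime minister will resign next month.",
    ["minister", "resign", "election", "scandal"],
))
print(unigram_recall(
    "prime minister resigns",
    "the prime minister resigns over scandal",
))
```

Neither measure says anything about readability, which still requires human judgment.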
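sentence, not a document.
• It is applied to automatic narration generation (of one's speech).
• It can also be used for newswire services for mobile phones.
「首相がスキャンダルの責任を取って辞意。来月中に解散総選挙へ」 ("Prime minister intends to resign, taking responsibility for the scandal. Dissolution and general election by next month.")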
和英, 池田 諭史, 大橋 一輝. 「新幹線要約」のための文末の整形. 自然言語処理, Vol.12, No.6, pp.85-111, 言語処理学会 (2005.11)
Satoshi Ikeda and Kazuhide Yamamoto. Transforming a Sentence End into News Headline Style. Proceedings of The Third International Workshop on Paraphrasing (IWP2005), pp.41-48 (2005.10)
seen on the Shinkansen.
• Written in 60 characters for each article.
• The same messages were obtained through an e-mail service.
– 3 times a day, 5 days a week.
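simple.
• Most of them are one- or two-sentence summaries
• Omission of the expression at the end.
– 「... 実質3.2%減と想定」 ("... assumed to be a 3.2% decrease in real terms"; the final verb ending is omitted)
• Particle (having a special meaning) at the end of the sentence
– 「... 四半期ごとに開示するよう要請へ」 ("... to request disclosure every quarter"; ends with the particle 「へ」)
• Many Chinese-derived words
– 「決める」→「決定」 ("decide"), 「選ぶ」→「選出」 ("select, elect")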
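Expressions at the end of the sentence
– assertive 「だ」, polite 「です」/「ます」, 「...てしまう」
• Particles
– 「遺体を発見」→「遺体発見」 ("body found"; the particle 「を」 is dropped)
• Functional words
– 「協議する意向を示す」 ("indicates an intention to hold talks")
– 「開催することで合意」 ("agreed to hold ...")
• Paraphrasing into shorter words
– 「...が見つかった」→「...を発見」 ("... was found")
– 「...を調べている」→「...を調査中」 ("is investigating ..." → "... under investigation")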
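The sentence-end rewrites listed above lend themselves to simple pattern rules. The sketch below encodes only the three examples given in the list; it is an illustration of the idea, not the rule set or the method of the cited papers. The names `END_RULES`, `shorten_sentence_end`, and `drop_object_particle` are hypothetical.

```python
import re

# Rewrite rules taken from the examples above:
# (pattern anchored at the sentence end, shorter replacement).
END_RULES = [
    (re.compile(r"が見つかった$"), "を発見"),
    (re.compile(r"を調べている$"), "を調査中"),
]

def shorten_sentence_end(sentence):
    """Apply the first matching sentence-end rule; drop the final full stop."""
    body = sentence.rstrip("。")
    for pattern, replacement in END_RULES:
        shortened, count = pattern.subn(replacement, body)
        if count:
            return shortened
    return body

def drop_object_particle(sentence, verbal_nouns=("発見",)):
    """Delete the particle 「を」 directly before a sentence-final verbal noun."""
    for noun in verbal_nouns:
        if sentence.endswith("を" + noun):
            return sentence[: -len(noun) - 1] + noun
    return sentence

print(shorten_sentence_end("遺体が見つかった。"))
print(shorten_sentence_end("警察が原因を調べている。"))
print(drop_object_particle("遺体を発見"))
```

The two functions are kept separate because the list above presents particle deletion and paraphrasing as distinct operations; whether they should be chained is left open here.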
山本 和英, 牧野 恵. 要約事例を用例として模倣利用したニュース記事要約. 自然言語処理, Vol.15, No.3, pp.115-158, 言語処理学会 (2008.7)
Megumi Makino and Kazuhide Yamamoto. Summarization by Analogy: An Example-based Approach for News Articles. Proceedings of The Third International Joint Conference on Natural Language Processing (IJCNLP2008), pp.739-744 (2008.1)