Transforming a Sentence End into News Headline Style

Satoshi Ikeda and Kazuhide Yamamoto. Transforming a Sentence End into News Headline Style. Proceedings of The Third International Workshop on Paraphrasing (IWP2005), pp.41-48 (2005.10)

自然言語処理研究室

October 31, 2005

Transcript

  1. Transforming a sentence end into news headline style
    Satoshi Ikeda, Kazuhide Yamamoto
    Nagaoka University of Technology
  2. News headlines
    - News headlines are summaries of newspaper articles.
    - They are very short.
    - We can see them on bulletin boards on the street and in trains.
    - Headlines have a style of their own in English, and the same is true in Japanese.
  3. Information on the collected e-mail

    | Item      | Count  |
    |-----------|--------|
    | e-mails   | 3,365  |
    | articles  | 21,127 |
    | sentences | 40,374 |

    - Japanese news headlines can be collected from an e-mail service.
    - We collected them from Dec. 1999 to Aug. 2004.
  4. Characteristics of headlines (1/2)
    - Headlines are not necessarily ordinary Japanese.
    - Verbal nouns and case particles are often used at the end of a headline.
    - Headlines are shorter than ordinary sentences, but their meaning can still be understood.
    - We counted the POS at the sentence end.
  5. POS occurrence at sentence end

    Occurrence [%]

    | POS             | newspaper | headline |
    |-----------------|-----------|----------|
    | noun            | 23.70     | 55.92    |
    | (verbal noun)   | (5.00)    | (39.90)  |
    | verb            | 28.66     | 15.91    |
    | adjective       | 1.80      | 0.19     |
    | adverb          | 0.20      | 0.22     |
    | particle        | 1.56      | 8.83     |
    | (case particle) | (0.34)    | (6.41)   |
    | auxiliary verb  | 38.59     | 18.52    |
    | symbol          | 5.42      | 0.40     |
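A count like the one in this table could be produced along the following lines. This is only a minimal sketch: it assumes each sentence has already been split into `(surface, pos)` pairs by some morphological analyzer, and the function name `end_pos_distribution` and the POS labels are hypothetical, not taken from the slides.

```python
from collections import Counter

def end_pos_distribution(sentences):
    """Return the percentage of each POS tag found on the last token.

    `sentences` is a list of token lists; each token is a (surface, pos)
    pair. The tokenization itself is assumed to be done elsewhere.
    """
    counts = Counter(tokens[-1][1] for tokens in sentences if tokens)
    total = sum(counts.values())
    return {pos: 100.0 * n / total for pos, n in counts.items()}

# Toy data for illustration only (not from the corpus in the slides).
sample = [
    [("方針", "noun"), ("を", "particle"), ("決定", "verbal_noun")],
    [("原因", "noun"), ("を", "particle"), ("調査", "verbal_noun")],
    [("雨", "noun"), ("が", "particle"), ("降る", "verb")],
    [("東京", "noun"), ("へ", "case_particle")],
]
print(end_pos_distribution(sample))
```

Only the final token of each sentence is inspected, which matches the slide's focus on the sentence end.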
  6. Characteristics of headlines (2/2)
    - Headlines use shorter expressions.
    - Words of Chinese origin are used more often than words of Japanese origin.
    - We counted the ratios of Chinese-origin and Japanese-origin words.
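  7. Ratios of Japanese and Chinese origin verbs

    Occurrence [%]

    | Japanese origin | Chinese origin        | Newspaper (a) | Headline (b) | a/b  |
    |-----------------|-----------------------|---------------|--------------|------|
    | 決める          | 決定 (to decide)      | 0.62          | 2.18         | 0.26 |
    | 選ぶ            | 選出 (to elect)       | 0.21          | 2.64         | 0.08 |
    | 分かる          | 判明 (to find out)    | 0.18          | 2.88         | 0.06 |
    | 命じる          | 命令 (to order)       | 1.13          | 3.84         | 0.3  |
    | 調べる          | 調査 (to investigate) | 6.28          | 53.33        | 0.12 |
    | total           |                       | 2.71          | 7.27         | 0.37 |

    - Shorter expressions are often used in news headlines.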
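The a/b column can be rechecked directly from the two occurrence columns. The sketch below simply divides the printed values; the names `ROWS` and `ratio` are illustrative. Recomputing gives about 0.28 for 決める rather than the printed 0.26, and 0.29 for 命じる (printed as 0.3); the remaining rows match the slide to two decimals. The difference may come from rounding or a transcription slip on the slide.

```python
# (Japanese-origin verb, newspaper %, headline %) as printed on the slide.
ROWS = [
    ("決める", 0.62, 2.18),
    ("選ぶ", 0.21, 2.64),
    ("分かる", 0.18, 2.88),
    ("命じる", 1.13, 3.84),
    ("調べる", 6.28, 53.33),
    ("total", 2.71, 7.27),
]

def ratio(a, b):
    """Newspaper occurrence divided by headline occurrence, two decimals."""
    return round(a / b, 2)

for word, a, b in ROWS:
    print(word, ratio(a, b))
```

A ratio well below 1 means the paired expression is relatively more frequent in headlines than in newspaper text.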
  8. Focus of our method
    - We focus on the sentence end of news headlines.
    - The aim is to transform a sentence end into headline style.
    - This work can be applied BEFORE other summarization methods.
  9. Related work
    - Wakao et al. (1997) compared the titles and the voice of news and investigated summarization into titles.
    - Satoh et al. (2004) extracted rules relating news written for PCs to news written for mobile phones.
  10. Proposed method
    - Deletion of target words at the sentence end
    - Deletion with minor transformation after the target words
    - Transformation of the sentence end
  11. Deletion of target words at sentence end
    - Cut off words that carry little meaning at the sentence end.
    - Cut off dictum and honorific phrases.
    - Cut off 「を示す」 (wo shimesu: to show).
    - Cut off 「てしまう」 (teshimau: an expression of negative feeling).
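The deletion step above can be pictured as plain suffix stripping. The following is a minimal sketch under that assumption, not the authors' implementation: the function name `delete_target_ending` is hypothetical, and the default list holds only 「を示す」, because the slides do not spell out how conjugated endings such as 「てしまう」 are handled.

```python
# Endings treated as carrying little meaning. Only 「を示す」 is taken from
# the slide; a real list would be larger and is an assumption here.
TARGET_ENDINGS = ("を示す",)

def delete_target_ending(sentence, endings=TARGET_ENDINGS):
    """Drop a listed ending (and a trailing 。) from the sentence end."""
    body = sentence.rstrip("。")
    for ending in endings:
        if body.endswith(ending):
            return body[: -len(ending)]
    return sentence  # no target ending found: leave the sentence as is

print(delete_target_ending("景気は回復傾向を示す。"))
```

Sentences without a listed ending are returned untouched, so the step can be chained with the later transformations.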
  12. Deletion with minor transformation after the target words
    - Cut off words that carry little meaning at the sentence end, then apply a minor transformation to the resulting sentence end.
    - Change to a verbal noun.
    - Cut off 「なる」 (naru: to become).
    - Cut off the part after 「明らかに」 (akirakani: obviously).
    - Change words of Japanese origin.
  13. Change the verbal noun
    - A verb of this type becomes a verbal noun when 「する」 (suru: to do) is cut off.
    - Before 「する」 is cut off, the verbal noun works as a verb.
    - We cut off 「する」 together with the part that follows it.
    - If the resulting sentence cannot be interpreted correctly, the particle is changed to a correct one.
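The cut described above can be sketched with a small lexicon and a regular expression. This is an illustration under stated assumptions: the toy set `VERBAL_NOUNS`, the list of する forms in `SURU_TAIL`, and the function name `cut_suru` are all hypothetical, and the particle correction mentioned in the last bullet is not attempted.

```python
import re

# Hypothetical toy lexicon; a real system would use a morphological analyzer.
VERBAL_NOUNS = {"発表", "決定", "調査"}

# A few conjugated forms of する at the sentence end (illustrative, incomplete).
SURU_TAIL = re.compile(r"(する|した|している|しました|します)。?$")

def cut_suru(sentence, verbal_nouns=VERBAL_NOUNS):
    """Cut a sentence-final する form if it directly follows a verbal noun."""
    match = SURU_TAIL.search(sentence)
    if match is None:
        return sentence
    stem = sentence[: match.start()]
    if any(stem.endswith(noun) for noun in verbal_nouns):
        return stem
    return sentence  # する follows something else, e.g. a particle

print(cut_suru("首相が辞任を発表した。"))
```

Restricting the cut to a known list of verbal nouns is one way to avoid treating a common noun as a verbal noun, at the cost of missing words outside the list.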
  14. Change words of Japanese origin
    - We change a Japanese-origin word at the sentence end to the corresponding Chinese-origin word.
    - If the sentence still cannot be interpreted correctly after the change, the particle is modified into a correct one.
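One way to picture this substitution is a lookup table applied only at the sentence end. In the sketch below, the five dictionary-form pairs come from the verb table earlier in the deck; the past-tense forms in `PAST_FORMS` and the function name `to_chinese_origin_ending` are assumptions added for illustration, and the particle repair step is left out.

```python
# Japanese-origin verb -> Chinese-origin verbal noun (pairs from the deck).
JAPANESE_TO_CHINESE = {
    "決める": "決定",
    "選ぶ": "選出",
    "分かる": "判明",
    "命じる": "命令",
    "調べる": "調査",
}

# Assumed past-tense surface forms mapped back to their dictionary forms.
PAST_FORMS = {
    "決めた": "決める",
    "選んだ": "選ぶ",
    "分かった": "分かる",
    "命じた": "命じる",
    "調べた": "調べる",
}

def to_chinese_origin_ending(sentence):
    """Replace a sentence-final Japanese-origin verb with its Chinese-origin pair."""
    body = sentence.rstrip("。")
    surfaces = sorted(list(JAPANESE_TO_CHINESE) + list(PAST_FORMS), key=len, reverse=True)
    for surface in surfaces:
        if body.endswith(surface):
            base = PAST_FORMS.get(surface, surface)
            return body[: -len(surface)] + JAPANESE_TO_CHINESE[base]
    return sentence

print(to_chinese_origin_ending("委員会が方針を決めた。"))
```

A fuller version would take conjugation from a morphological analyzer instead of listing surface forms by hand.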
  15. Transformation of the sentence end
    - After the other transformations are finished, the sentence end is changed into a compound noun.
    - If the sentence end is noun + particle + verbal noun, the particle is deleted to form a compound noun.
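The compound-noun rule can be sketched over POS-tagged tokens. The example assumes the sentence is already tokenized into `(surface, pos)` pairs; the tag names and the function name `to_compound_ending` are hypothetical.

```python
def to_compound_ending(tokens):
    """Delete the particle in a final noun + particle + verbal noun sequence.

    `tokens` is a list of (surface, pos) pairs; the result is the joined
    surface string.
    """
    if len(tokens) >= 3:
        tail_pos = tuple(pos for _, pos in tokens[-3:])
        if tail_pos == ("noun", "particle", "verbal_noun"):
            # Keep everything up to the noun, skip the particle,
            # and keep the final verbal noun.
            tokens = tokens[:-2] + tokens[-1:]
    return "".join(surface for surface, _ in tokens)

tokens = [
    ("委員会", "noun"), ("が", "particle"),
    ("方針", "noun"), ("を", "particle"), ("決定", "verbal_noun"),
]
print(to_compound_ending(tokens))
```

Only the last three tokens are examined, so particles earlier in the sentence (が in the example) are kept.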
  16. Experiments and evaluations
    - The input is 232,083 newspaper sentences.
    - The system summarized 73,512 of them.
    - Evaluation
      - Correctness of each procedure
      - Comparison with human summarization
  17. Correctness of each procedure
    [Bar chart: correctness ratio (axis 0 to 1) for three items: total; deletion with minor transformation after the target words; deletion of target words at sentence end]
  18. Correctness change by personal difference

    |             | >=1  | >=2  | =3   |
    |-------------|------|------|------|
    | correctness | 0.98 | 0.95 | 0.91 |

    - Personal differences in correctness judgments are small.
  19. Discussion of the errors
    - System errors are observed at common nouns that are wrongly treated as verbal nouns.
    - Japanese verbal nouns are special in that they are used as nouns as well as verbs.
    - Common nouns, however, never work as verbs.
  20. Comparison to the human results

    |                    | Machine | Human |
    |--------------------|---------|-------|
    | Sentences          | 72727   | 100   |
    | Summarization ratio| 0.94    | 0.92  |
    | Reduced characters | 2.45    | 3.87  |

    - The summarization ratios do not differ much.
    - The numbers of reduced characters differ.
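The slides do not define the two measures in this table. A plausible reading, stated here as an assumption, is that the summarization ratio is output length divided by input length in characters and that reduced characters is the difference in length, both averaged per sentence. Under that assumption they could be computed as follows; `summarization_stats` is a hypothetical name.

```python
def summarization_stats(pairs):
    """Return (mean length ratio, mean number of removed characters).

    `pairs` is a non-empty list of (source, summary) strings. The
    definitions are assumed, not taken from the slides.
    """
    ratios = [len(summary) / len(source) for source, summary in pairs]
    reduced = [len(source) - len(summary) for source, summary in pairs]
    n = len(pairs)
    return sum(ratios) / n, sum(reduced) / n

# Toy pair for illustration: 11 characters reduced to 9.
mean_ratio, mean_reduced = summarization_stats(
    [("委員会が方針を決定した", "委員会が方針を決定")]
)
print(mean_ratio, mean_reduced)
```

Lengths are counted in characters with `len`, which suits Japanese text where word boundaries are not marked.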
  21. Discussion of human results
    - When humans summarize a sentence end, they try to change many parts of it while considering the meaning of the sentence.
    - The machine, however, cannot summarize with such consideration.
    - This difference appears in the number of reduced characters in the experiment.
  22. Conclusion
    - We presented a method for transforming Japanese sentence-end expressions.
    - The accuracy is 95%.
    - Future work
      - Add more dictum and honorific phrases.
      - When summarizing sentences, we need to distinguish whether a noun is a common noun or a verbal noun.