Natural Language Processing (9) Language generation

自然言語処理研究室 (Natural Language Processing Laboratory)

November 15, 2013

Transcript

  1. Natural Language Processing (9) Language generation. Kazuhide Yamamoto, Dept. of Electrical Engineering, Nagaoka University of Technology
  2. Introduction. The language generation process • works in the opposite direction of the language analysis process • generates a language expression given one's intention. What is the intention?
  3. Case study: machine translation (Japanese to English) • Input: a Japanese sentence • Process: morphological analysis, parsing, semantic analysis (and discourse analysis) • The output of this analysis process is the input of the generation process for the target language (English).
  4. Case study: question answering. We ask a question. • What time is the next train to Tokyo? The system analyzes the question. • Q-type: time, destination: Tokyo, condition: the earliest The system searches for an answer. • 1430 hrs The system generates a sentence. • The next train is at 1430 hrs.
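The flow on this slide can be pictured with a small sketch. This is only an illustration under assumed names (Query, TIMETABLE, search, generate are all hypothetical), not the system described in the lecture.

    # A minimal sketch of the question-answering flow above: the analysis
    # result is a small record, the search step is a toy lookup, and
    # generation fills a sentence pattern. All names here are hypothetical.
    from typing import TypedDict

    class Query(TypedDict):
        q_type: str        # e.g. "time"
        destination: str   # e.g. "Tokyo"
        condition: str     # e.g. "earliest"

    # Toy timetable used only for illustration.
    TIMETABLE = {"Tokyo": ["1430", "1500", "1530"]}

    def search(query: Query) -> str:
        departures = sorted(TIMETABLE[query["destination"]])
        return departures[0] if query["condition"] == "earliest" else departures[-1]

    def generate(query: Query, answer: str) -> str:
        if query["q_type"] == "time":
            return f"The next train is at {answer} hrs."
        return answer

    q: Query = {"q_type": "time", "destination": "Tokyo", "condition": "earliest"}
    print(generate(q, search(q)))   # The next train is at 1430 hrs.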
  5. Case study: summarizer • The text is given. • The system analyzes all the sentences and generates a semantic representation. • It then selects the important parts of the semantic representation. • It finally generates sentences. However, current summarizers do not actually realize the process above.
  6. What is the input for generation? • Case frames, semantic networks, and other semantic representations. – These are sometimes used in machine translation. • Non-linguistic data – stock prices – baseball scores – weather forecasts
  7. Example: stock data. Input: all prices in a stock market. Output: daily report. 「今日の東京株式は反落。朝から鉄鋼や造船の大型株を中心に値下がりし、電機や商業株も安い。午後になって若干値を戻すが、引けにかけて一段安。日経平均株価は...円、東証株価指数TOPIXは...」 ("Tokyo stocks fell back today. Prices declined from the morning, led by large-cap steel and shipbuilding shares, and electrical and commercial stocks were also weak. Prices recovered slightly in the afternoon but fell further toward the close. The Nikkei Stock Average was ... yen, and the TOPIX index was ...")
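A data-to-text generator of this kind maps numbers to wording choices. Below is a toy sketch of that idea as an English one-liner with invented figures; it is not the generator behind the Japanese report above.

    # Toy data-to-text sketch: choose the verb from the sign of the price
    # change and fill a fixed report template. Figures are invented.
    def stock_report(close_price: float, prev_close: float) -> str:
        change = close_price - prev_close
        movement = "rose" if change > 0 else "fell back" if change < 0 else "was flat"
        report = f"Tokyo stocks {movement} today. The index closed at {close_price:,.2f}"
        if change != 0:
            direction = "up" if change > 0 else "down"
            report += f", {abs(change):,.2f} points {direction} from the previous close"
        return report + "."

    print(stock_report(close_price=38650.50, prev_close=38820.00))
    # Tokyo stocks fell back today. The index closed at 38,650.50, 169.50 points down ...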
  8. Example: baseball news. Input: baseball scoresheet. Output: news article. 「楽天は守りのミスがたたって10日、エース岩隈で黒星。3回、ラロッカの飛球を関川が見失って2点二塁打にしてしまうと、逆転した7回は、藤井が生還を許す三塁悪送球などで再び試合をひっくり返された。対広島4連敗で、借金は30。」 ("Fielding mistakes cost Rakuten a loss with ace Iwakuma on the 10th. In the third inning Sekikawa lost sight of LaRocca's fly ball, turning it into a two-run double, and in the seventh, after Rakuten had taken the lead, a wild throw to third by Fujii let a run score and the game was turned around again. It was their fourth straight loss to Hiroshima, leaving them 30 games below .500.")
  9. Semantic network (diagram): nodes 読む (read), 母 (mother), 本 (book), 料理 (cooking), 机 (desk), 黒い (black), 木製 (wooden), 居間 (living room), 昨日 (yesterday), linked by relations 動作主 (agent), 対象 (object), 内容 (content), 色 (color), 材質 (material), 場所 (location), and 時間 (time).
  10. Generated text • 昨日母が居間の黒い木製の机で料理の本を読んだ。 (Yesterday Mother read a cooking book at the black wooden desk in the living room.) • 母は居間で昨日本を読んだ。本は料理の本で、母は黒い机で読んでいた。 (Mother read a book in the living room yesterday. The book was a cooking book, and she was reading it at a black desk.) • 母が料理の本を読んだのは居間である。居間には黒い木製の机があり、そこで読んでいた。 (It was in the living room that Mother read the cooking book. There was a black wooden desk in the living room, and she was reading it there.) .....
  11. Semantic network. 「居間には木製の黒い机があった。...」 ("There was a wooden black desk in the living room. ...") (same semantic network diagram as on slide 9)
  12. Generation from a semantic network. In general, many sentences can be generated from the same semantic network. • How do we select one out of the many? • What is the difference between them? • What affects the generation result?
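One way to see this ambiguity is to encode the network as plain data and let a naive realizer vary the order in which the roles are expressed. The sketch below is my own illustration; the dictionary keys and the realize function are assumptions, not part of the lecture.

    # Toy sketch: one semantic network, several surface sentences,
    # depending only on the order in which the roles are realized.
    network = {
        "event": "read",
        "agent": "mother",
        "object": {"head": "book", "topic": "cooking"},
        "location": {"head": "desk", "color": "black", "material": "wooden",
                     "place": "living room"},
        "time": "yesterday",
    }

    def realize(net: dict, order: list) -> str:
        phrases = {
            "agent": net["agent"],
            "object": f"a {net['object']['topic']} {net['object']['head']}",
            "location": (f"at the {net['location']['color']} "
                         f"{net['location']['material']} {net['location']['head']} "
                         f"in the {net['location']['place']}"),
            "time": net["time"],
        }
        # Naive linearization: agent, verb, object, then the remaining roles
        # in the requested order.
        rest = " ".join(phrases[r] for r in order if r not in ("agent", "object"))
        return f"{phrases['agent'].capitalize()} {net['event']} {phrases['object']} {rest}."

    print(realize(network, ["agent", "object", "location", "time"]))
    print(realize(network, ["agent", "object", "time", "location"]))
    # Mother read a cooking book at the black wooden desk in the living room yesterday.
    # Mother read a cooking book yesterday at the black wooden desk in the living room.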
  13. Generation method. Sentences are generated according to the syntax rules shown below. • S -> NP VP • NP -> N P • VP -> V aux • N -> 花 (flower), 学校 (school), トマト (tomato) • P -> が, を, に, から (case particles) • V -> 咲く (bloom), 行く (go), 食べる (eat) • aux -> ない (negation), たい (want to), らしい (seem)
  14. Generation (1): random generation • Many years ago an experiment was conducted that generated sentences at random in order to verify hand-made syntax rules. • But there is no remedy when a generated sentence turns out to be ungrammatical or unnatural.
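A random generator over the rules of slide 13 can be written in a few lines. The sketch below restates those rules as a Python dictionary (my own encoding) and expands them recursively; its output follows the rules but is often unnatural, which is exactly the problem the slide points out.

    # Random generation from the hand-made rules on slide 13.
    import random

    GRAMMAR = {
        "S":   [["NP", "VP"]],
        "NP":  [["N", "P"]],
        "VP":  [["V", "aux"]],
        "N":   [["花"], ["学校"], ["トマト"]],
        "P":   [["が"], ["を"], ["に"], ["から"]],
        "V":   [["咲く"], ["行く"], ["食べる"]],
        "aux": [["ない"], ["たい"], ["らしい"]],
    }

    def generate(symbol: str = "S") -> str:
        if symbol not in GRAMMAR:                   # terminal: emit as-is
            return symbol
        expansion = random.choice(GRAMMAR[symbol])  # pick one right-hand side
        return "".join(generate(s) for s in expansion)

    for _ in range(3):
        print(generate())   # e.g. 花が咲くらしい, トマトから行くたい, ...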
  15. Generation (2): template-based. Another method is to provide a template and fill in its slots to produce the output. This is called template generation, and it is widely used for fixed expressions. 「N時N分発、(のぞみ|ひかり|こだま)N号はN番線から発車します。自由席は、N号車からN号車まで、...」 ("The (Nozomi|Hikari|Kodama) No. N departing at N:N leaves from track N. Non-reserved seats are in cars N through N, ...") An error can still occur: the sentence changes depending on whether N is 1 or not!
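Template generation itself is simple string filling; the trap mentioned on this slide (the sentence changes when N is 1) shows up in English as singular/plural agreement. The sketch below uses a hypothetical English announcement template, not the Japanese one above.

    # Toy template generation with an explicit check for the N = 1 case.
    def announce(train: str, number: int, track: int,
                 first_car: int, last_car: int) -> str:
        car_span = (f"car {first_car}" if first_car == last_car
                    else f"cars {first_car} through {last_car}")
        return (f"{train} No. {number} departs from track {track}. "
                f"Non-reserved seats are in {car_span}.")

    print(announce("Nozomi", 7, 14, 1, 3))    # ... cars 1 through 3.
    print(announce("Hikari", 501, 15, 1, 1))  # ... car 1.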
  16. Generating referential expressions. We need to use different referential expressions depending on the situation of the entity. 「.....本を貸してください」 (".....please lend me the book") • when there is only one book in front of us • when there are a thin book and a thick book • when the thick book is a cooking book • when there is only one book on the table
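One common way to choose among such expressions is to add properties one by one until only the intended object remains. The sketch below illustrates that idea under assumed attribute names; it is not presented as the lecture's method.

    # Rough sketch: build "the <properties> book" by adding attributes until
    # all other candidate objects are ruled out.
    def refer(target: dict, others: list, attrs: list) -> str:
        words, candidates = [], list(others)
        for attr in attrs:
            value = target.get(attr)
            still_confusable = [o for o in candidates if o.get(attr) == value]
            if len(still_confusable) < len(candidates):   # attribute rules something out
                words.append(value)
                candidates = still_confusable
            if not candidates:
                break
        return "the " + " ".join(words + [target["type"]])

    thick_cookbook = {"type": "book", "thickness": "thick", "topic": "cooking"}
    thin_novel     = {"type": "book", "thickness": "thin",  "topic": "fiction"}
    print(refer(thick_cookbook, [thin_novel], ["thickness", "topic"]))  # the thick book
    print(refer(thick_cookbook, [], ["thickness", "topic"]))            # the book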
  17. Issue of anaphoric expressions. Repetition of the same noun sounds redundant, so some occurrences are replaced by pronouns or the definite article, and some are deleted (as zero anaphora). "Taro says Taro and Taro's friend went back to Taro's home by Taro's car." – absolutely unnatural; instead of repeating Taro, we use he (his), the, or nothing (= ellipsis).
  18. Effect of context. Semantically these expressions mean the same thing, but they feel different. It is not clear (and thus a big problem for researchers) what makes them feel different. 「太郎は花を買った。その花はバラである」 (Taro bought a flower. That flower is a rose.) 「太郎は花を買った。花はバラである」 (Taro bought a flower. The flower is a rose.) 「太郎は花を買った。それはバラである」 (Taro bought a flower. It is a rose.) 「太郎は花を買った。バラである」 (Taro bought a flower. [It] is a rose.)
  19. Word selection. Word selection is also a problem. Look at the examples below, choose the appropriate one, and think about why you chose it. I was bitten by a dog of the Suzuki family. • I was bitten by an animal. • I was bitten by a mammal. • I was bitten by a beast. • I was bitten by a puppy. • I was bitten by a West Highland White Terrier.
  20. Paraphrasing by synonym. A synonym cannot always be used to replace an expression. The result may be logically strange, or may feel odd in terms of style or formality. Last night I went to bed early. / The night of yesterday I went to bed early. • It depends on the situation and the environment. • Paraphrasing is not always possible. – "Last night" means "the night of yesterday". – 「お誕生日に食事(ごはん|めし)に招待された」 ("I was invited to a meal on the birthday"; 食事, ごはん, and めし all mean "meal" but differ in formality)
  21. Causal relation. The subordinate/main relationship changes when we put the subordinate clause first for emphasis. The causal relation must be preserved in either case. e.g. S1: He caught a cold. S2: He is absent. S1+S2: He caught a cold, so he is absent. / He is absent because he caught a cold.
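The choice between the two orderings can be made explicit in an aggregation step. The sketch below is my own illustration; it simply picks the connective from which clause is foregrounded.

    # Toy aggregation of a cause-effect pair: the connective depends on
    # which clause is put first.
    def combine(cause: str, effect: str, cause_first: bool = True) -> str:
        if cause_first:
            return f"{cause.rstrip('.')}, so {effect[0].lower()}{effect[1:]}"
        return f"{effect.rstrip('.')} because {cause[0].lower()}{cause[1:]}"

    s1 = "He caught a cold."
    s2 = "He is absent."
    print(combine(s1, s2, cause_first=True))   # He caught a cold, so he is absent.
    print(combine(s1, s2, cause_first=False))  # He is absent because he caught a cold.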
  22. Generation differs by viewpoint. We also need to take the viewpoint into account when generating an expression. e.g. football, Japan vs. Korea: • Japan won last night thanks to Honda. • Last night Korea was beaten by Honda. • It remains to be seen whether Japan can keep up such good performance. • Japan was lucky, since Korea just happened to be in bad condition.
  23. Why is generation difficult? • There is no strong constraint: – in language analysis the answer should be unique, while in language generation many answers are possible. • It depends on the application or the domain. – A "general" language generation application is difficult to conceive. • It is difficult to evaluate: – it is hard to define good generation in engineering terms.
  24. Summary: today's key words • diversity of language generation • methods: syntax-based and template-based • difficulties