
Syntactic Piece: Idea, Purpose and Application to Sentiment Analysis

Kazuki Takigawa and Kazuhide Yamamoto. Syntactic Piece: Idea, Purpose and Application to Sentiment Analysis. Proceedings of the 7th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE 2011), pp. 401-404 (2011.11)

Natural Language Processing Laboratory (自然言語処理研究室)

November 30, 2011

Transcript

  1. Kazuki TAKIGAWA and Kazuhide YAMAMOTO

    Department of Electrical Engineering, Nagaoka University of Technology, JAPAN
    {takigawa,yamamoto}@jnlp.org
  2. Background

    The main processing units have problems when applied to Japanese.
    • bag-of-words
      o It is difficult to tell which sense of an expression is meant. ex.) “かける [kakeru]” has several meanings: “do up”, “put on”, “take out” and so on.
    • word n-gram
      o It often creates unnecessary elements. ex.) “で-ある-こと [de-aru-koto]” (3-gram)
    A processing unit that preserves the meaning of an expression is needed.
  3. Background

    Main processing units in NLP:
    • bag-of-words
      o It is difficult to tell which sense of an expression is meant. ex.) the word “かける”
    • word n-gram
      o It often creates unnecessary elements. ex.) “が, かける” (2-gram), “で, ある, こと” (3-gram)
    A processing unit that preserves the meaning of an expression is needed.
    We propose the “syntactic piece”.
  4. Syntactic Piece: What’s a Syntactic Piece?

    • A syntactic piece is a minimum unit of syntactic structure.
    • It consists of a pair of modifier and modificand, derived from the syntactic structure.
    • This pair is expressed as: modifier → modificand
    ex.) Recently, the surrounding noise is very big. (最近まわりの騒音がとても大きい)
    • recently big: 最近 → 大きい
    • very big: とても → 大きい
    • surrounding noise: まわりの → 騒音
    • noise is big: 騒音が → 大きい
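The extraction step can be sketched as follows. This is a minimal illustration that assumes the sentence has already been split into phrases by a dependency parser; the `Chunk` type, its field names, and the hand-written parse are hypothetical and not taken from the slides.

```python
from typing import NamedTuple, Optional


class Chunk(NamedTuple):
    surface: str         # phrase as written, e.g. "騒音が"
    content: str         # content word without the trailing particle, e.g. "騒音"
    head: Optional[int]  # index of the phrase this one modifies; None for the root


def syntactic_pieces(chunks):
    """Return one (modifier, modificand) pair per dependency edge."""
    pieces = []
    for chunk in chunks:
        if chunk.head is None:
            continue  # the root phrase modifies nothing
        pieces.append((chunk.surface, chunks[chunk.head].content))
    return pieces


# Hand-written parse of 最近まわりの騒音がとても大きい (an assumption, not parser output)
sentence = [
    Chunk("最近", "最近", 4),
    Chunk("まわりの", "まわり", 2),
    Chunk("騒音が", "騒音", 4),
    Chunk("とても", "とても", 4),
    Chunk("大きい", "大きい", None),
]

for modifier, modificand in syntactic_pieces(sentence):
    print(f"{modifier} → {modificand}")
```

Each piece is simply one edge of the dependency structure, which is why the unit is about as easy to collect as an n-gram.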
  5. Advantages of Syntactic Piece

    • Very simple
      o It is easy to use, just like an n-gram.
    • It has syntactic structure
      o It contains more information than an n-gram.
    • Similar to a phrasal idiom
      o It can deal with a chunk of meaning.
  6. Advantages of Syntactic Piece

    But the syntactic piece has some problems.
  7. Problem of Syntactic Piece

    1) A syntactic piece tends to be long because it is a pair of phrases. So if we use syntactic pieces, we get many unique expressions.
    2) The phrase pairs generated by the current method include some pairs that have no meaning.
    We suggest solutions to these problems.
  8. Method(1) - Generalization of Same Class Expressions -

    We generalize “same class expressions” to decrease the number of unique expressions. “Same class expressions” are a set of expressions that have similar meanings even though their surface forms differ.
    1. cake is delicious (ケーキ-が → おいしい)
    2. delicious cake (おいしい → ケーキ)
    The surface structures of these two expressions differ, but their meanings are very similar. We call such expressions “same class expressions”.
  9. Method(1) - Generalization of Same Class Expressions -

    We generalize same class expressions. Same class expressions are identified by two criteria.
  10. Method(1) - Generalization of Same Class Expressions -

    (1) Syntactic pieces constructed from an adjective and a noun with the same content words.
    • noun(-particle) → adjective: 騒音-が → 大きい (noise is big)
    • adjective → noun: 大きい → 騒音 (big noise)
  11. Method(1) - Generalization of Same Class Expressions -

    (2) Syntactic pieces constructed from a verb and a noun with the same content words.
    • noun(-particle) → verb: 子供-が → 楽しむ (a child rejoices)
    • verb → noun: 楽しむ → 子供 (rejoicing child)
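The two criteria can be sketched as a normalization step that maps both orderings of a noun and a predicate to one key. This is only an illustration: the `Word` type, the part-of-speech labels, and the choice of (noun, predicate) as the canonical order are assumptions, since the slides do not say how the generalized form is represented.

```python
from typing import NamedTuple


class Word(NamedTuple):
    lemma: str  # content word with any particle stripped
    pos: str    # "noun", "adjective" or "verb" (hypothetical tag set)


def same_class_key(modifier, modificand):
    """Map both orderings of a noun/predicate pair to one canonical key."""
    predicates = {"adjective", "verb"}
    if modifier.pos == "noun" and modificand.pos in predicates:
        # noun(-particle) → adjective, or noun(-particle) → verb
        return (modifier.lemma, modificand.lemma)
    if modifier.pos in predicates and modificand.pos == "noun":
        # adjective → noun, or verb → noun: flip into (noun, predicate) order
        return (modificand.lemma, modifier.lemma)
    return None  # neither criterion applies, so the piece is left as it is


noise = Word("騒音", "noun")
big = Word("大きい", "adjective")
print(same_class_key(noise, big) == same_class_key(big, noise))
```

With such a key, a score learned for "noise is big" can be looked up for "big noise" as well.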
  12. Method(2) - Coping with form word -

    Some phrase pairs have no meaning.
    ex.) I can be satisfied. (満足することができる)
    • be satisfied: 満足する → こと (manzoku-suru koto)
    • I can be: こと-が → できる (koto-ga dekiru)
  13. Method(2) - Coping with form word -

    These phrase pairs have no modification relation.
  14. Method(2) - Coping with form word -

    These phrase pairs have no modification relation and carry no meaning.
  15. Method(2) - Coping with form word -

    This problem arises because “こと [koto]” is a “form word”. A form word is a type of content word that has lost its original meaning and is used formally in Japanese. This is similar to relative pronouns such as “which”, “who”, “when” etc. in English.
  16. Method(2) - Coping with form word -

    • We collected form words manually.
    • We treat a phrase containing a form word as a function word attached to the preceding content word.
    ex.) I can be satisfied very much. (とても満足することができる)
    Conventional syntactic piece:
    • satisfied very much: とても → 満足する
    • be satisfied: 満足する → こと
    • I can be: こと-が → できる
  17. Method(2) - Coping with form word -

    ex.) I can be satisfied very much. (とても満足することができる)
    Coping with form word, the pieces 満足する → こと (be satisfied) and こと-が → できる (I can be) are replaced by a single piece:
    • I can be satisfied: 満足すること-が → できる
    • satisfied very much: とても → 満足する
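One way to picture the form-word handling is to fold the form-word phrase into the phrase that modifies it before listing the pieces. The sketch below rests on several assumptions: `FORM_WORDS` holds only the one example from the slides (the authors' hand-collected list is not given), the `Chunk` type and the parse are hypothetical, and only the simple case of a form word directly preceded by its modifier is covered.

```python
from typing import NamedTuple, Optional

FORM_WORDS = {"こと"}  # illustrative only; the real list was collected by hand


class Chunk(NamedTuple):
    surface: str         # phrase as written, e.g. "ことが"
    content: str         # content word without the trailing particle
    head: Optional[int]  # index of the phrase this one modifies; None for the root


def pieces_with_form_words_merged(chunks):
    """Attach each form-word phrase to the preceding phrase, then list the pieces."""
    merged = list(chunks)
    for i, chunk in enumerate(chunks):
        attaches = i > 0 and merged[i - 1] is not None and chunks[i - 1].head == i
        if chunk.content in FORM_WORDS and attaches:
            prev = merged[i - 1]
            # The preceding phrase absorbs the form word and inherits its head.
            merged[i - 1] = Chunk(prev.surface + chunk.surface, prev.content, chunk.head)
            merged[i] = None
    pieces = []
    for chunk in merged:
        if chunk is None or chunk.head is None:
            continue
        pieces.append((chunk.surface, chunks[chunk.head].content))
    return pieces


# Hand-written parse of とても満足することができる (an assumption, not parser output)
sentence = [
    Chunk("とても", "とても", 1),
    Chunk("満足する", "満足する", 2),
    Chunk("ことが", "こと", 3),
    Chunk("できる", "できる", None),
]
print(pieces_with_form_words_merged(sentence))
```

The merged phrase keeps 満足する as its content word, so the piece とても → 満足する is unaffected.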
  18. Application to Sentiment Analysis

    • We apply the improved syntactic piece to sentiment analysis to verify its effectiveness.
    • The target of sentiment analysis is a sentence, and each sentence is classified as positive, negative, or other.
    1. Pairs of an evaluative expression and its semantic orientation score (SO-score) are registered in a dictionary. Here, evaluative expression = syntactic piece.
    2. Each expression in the input sentence is given an SO-score from the dictionary.
    3. The sentence is classified by the sum of its SO-scores.
  19. Sentence Classification

    input: noise of fan is big. (ファンの騒音が大きい。)
    obtained syntactic pieces: noise of fan (ファン-の → 騒音), noise is big (騒音-が → 大きい)
    dictionary (SO of syntactic piece): noise is big (騒音-が → 大きい): negative
    matching: noise is big (騒音が大きい): negative
    input: negative
  20. Sentence Classification

    Syntactic pieces are obtained from the input.
  21. Sentence Classification

    The obtained syntactic pieces are matched against the entries of the dictionary.
  22. Sentence Classification

    We can determine that “noise is big” is negative.
  23. Sentence Classification

    The SO of the input is negative.
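The classification walkthrough above can be sketched as a dictionary lookup followed by a sum. The numeric score and the decision thresholds at zero are assumptions made for illustration; the slides only state that a sentence is classified by the summation of SO-scores.

```python
def classify(pieces, dictionary):
    """Sum the SO-scores of the pieces found in the dictionary and label the sentence."""
    total = sum(dictionary.get(piece, 0.0) for piece in pieces)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "other"  # no piece matched, or the scores cancelled out


# Made-up score: the slides only say that this piece is negative.
dictionary = {("騒音が", "大きい"): -1.0}

# Pieces obtained from ファンの騒音が大きい
pieces = [("ファンの", "騒音"), ("騒音が", "大きい")]
print(classify(pieces, dictionary))  # negative
```

The piece ファン-の → 騒音 is not in the dictionary, so it contributes nothing and the sentence label comes from 騒音-が → 大きい alone.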
  24. Reason for Applying Sentiment Analysis

    • This method uses a dictionary, so if we have the SO-score of an expression such as “noise is big”, we can also give an SO-score to “big noise” through same class expressions.
    • The dictionary should not contain an expression that has no meaning, such as “I can be” registered as “positive”. Coping with form word prevents this.
  25. Preparation for Sentence Classification - Making of Seed Dictionary -

    training data (positive sentences, negative sentences) → seed dictionary

    syntactic piece    positive  negative
    size is big        5         1
    slow to respond    0         8
    softly-colored     3         0
    ・                 ・
  26. Preparation for Sentence Classification - Making of Seed Dictionary -

    We prepare positive and negative sentences as training data.
  27. Preparation for Sentence Classification - Making of Seed Dictionary -

    Syntactic pieces are obtained from the training data, and their frequencies are calculated.
  28. Preparation for Sentence Classification - Making of Seed Dictionary -

    Each syntactic piece is given an SO-score, and we treat the result as the seed dictionary.
  29. Preparation for Sentence Classification - Making of Seed Dictionary -

    The SO-score is calculated from the probability of occurrence (Fujimura et al. [04]).
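Building the seed dictionary can be sketched as counting each piece in the positive and negative sentences and turning the counts into a score. The scoring function below, (positive - negative) / (positive + negative), is a stand-in chosen for illustration; it is not the formula of Fujimura et al., which the slides do not reproduce. Pieces are written as English glosses for brevity.

```python
from collections import Counter


def so_score(positive_count, negative_count):
    """Assumed stand-in score in [-1, 1]; not the formula of Fujimura et al."""
    return (positive_count - negative_count) / (positive_count + negative_count)


def build_seed_dictionary(positive_sentences, negative_sentences):
    """Each sentence is a list of syntactic pieces; count them per class and score them."""
    pos = Counter(piece for sentence in positive_sentences for piece in sentence)
    neg = Counter(piece for sentence in negative_sentences for piece in sentence)
    return {piece: so_score(pos[piece], neg[piece]) for piece in pos.keys() | neg.keys()}


# Toy training data that reproduces the counts shown in the seed dictionary table
positive_sentences = [["size is big"]] * 5 + [["softly-colored"]] * 3
negative_sentences = [["size is big"]] + [["slow to respond"]] * 8

seed = build_seed_dictionary(positive_sentences, negative_sentences)
for piece in sorted(seed):
    print(piece, round(seed[piece], 2))
```

Under this stand-in, a piece seen only in negative sentences scores -1, one seen only in positive sentences scores 1, and mixed pieces fall in between.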
  30. Preparation for Sentence Classification - Expansion of Dictionary -

    The more evaluative expressions, the better. This requires a huge amount of training data, which is costly to prepare manually. We want to obtain training data automatically, so we build an expanded dictionary.
  31. Preparation for Sentence Classification - Expansion of Dictionary -

    large scale corpus → (seed dictionary) → new training data (positive, negative) → (we obtain syntactic pieces) → expanded dictionary

    syntactic piece          positive  negative
    continuing is difficult  0         5
    good design              8         0
    to be gift               5         1
    ・                       ・
  32. Preparation for Sentence Classification - Expansion of Dictionary -

    Sentences from the corpus are classified as positive or negative using the seed dictionary. We treat the result as new training data.
  33. Preparation for Sentence Classification - Expansion of Dictionary -

    Syntactic pieces are obtained from the new training data, and their frequencies are calculated in the same way as when making the seed dictionary.
  34. Preparation for Sentence Classification - Expansion of Dictionary -

    Semantic orientation scores are also assigned, and we treat the result as the expanded dictionary.
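The expansion step can be sketched as one round of bootstrapping: label the corpus with the seed dictionary, then recount pieces in the automatically labeled sentences. The score formula and the zero threshold are the same illustrative assumptions as elsewhere, and whether the seed entries are merged into the result is not stated in the slides, so this sketch returns only the recounted dictionary.

```python
from collections import Counter


def classify(pieces, dictionary):
    """Label a sentence by the sign of its summed SO-scores (assumed thresholds)."""
    total = sum(dictionary.get(piece, 0.0) for piece in pieces)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "other"


def expand_dictionary(seed, corpus):
    """Classify corpus sentences with the seed dictionary, then rescore all their pieces."""
    pos, neg = Counter(), Counter()
    for pieces in corpus:
        label = classify(pieces, seed)
        if label == "positive":
            pos.update(pieces)
        elif label == "negative":
            neg.update(pieces)
        # sentences labeled "other" are not used as new training data
    return {
        piece: (pos[piece] - neg[piece]) / (pos[piece] + neg[piece])  # assumed score
        for piece in pos.keys() | neg.keys()
    }


seed = {"good design": 1.0, "continuing is difficult": -1.0}
corpus = [
    ["good design", "to be gift"],
    ["continuing is difficult", "slow to respond"],
    ["unrelated piece"],
]
expanded = expand_dictionary(seed, corpus)
print(sorted(expanded))
```

Pieces that co-occur with seed entries, such as "to be gift" here, pick up a score even though they were never labeled by hand.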
  35. Experiment

    • We manually prepared:
      • approximately 2,000 positive sentences
      • approximately 1,000 negative sentences
      • approximately 210,000 sentences as a large scale corpus for expansion
    • We analyzed sentiment using the following methods to examine the efficacy of each of our methods.
      (1) Using only generalization of same class expressions
      (2) Using only coping with form word
      (3) Combination of (1) and (2)
      (4) Using conventional syntactic piece (for baseline)
  36. Result

    language processing units                            precision(%)  recall(%)
    (1) only generalization of same class expressions    77.1          49.8
    (2) only coping with form word                       77.3          44.6
    (3) (1)+(2)                                          78.7          47.7
    (4) Baseline                                         75.5          47.1

    ・Precision improves over the baseline with all methods.
    ・Recall also improves with generalization of same class expressions.
  37. Discussion - Generalization of Same Class Expression -

    • Recall turned out higher than the baseline. We could give a semantic orientation score to more sentences, and the scale of the expanded dictionary increased. We obtained approximately 14,000 more sentences as new training data (an increase of approximately 5.7%) than with the conventional syntactic piece.
  38. Discussion - Coping with form word -

    We tried to solve the problem of extracting phrase pairs that have no meaning. As a result, we found that some sentences had accidentally been answered correctly using the conventional syntactic piece.
    In the dictionary using the conventional syntactic piece:
    • “think that” (なる-と → 思う [naru-to → omou]) is given a positive score. This expression has no semantic orientation.
  39. Discussion - Coping with form word -

    Our method can treat the semantic orientation of each expression.
    In the dictionary using our method:
    • “think to be a nuisance” (邪魔になる-と → 思う [jama ni naru-to → omou]) is given a negative score.
    • “think to become a present” (プレゼントになる-と → 思う [present ni naru-to → omou]) is given a positive score.
  40. Discussion - Comparison with other language processing unit -

    language processing units       precision(%)  recall(%)
    Using same class expressions    77.1          49.8
    word 3-gram                     78.0          75.3
    word 2-gram                     79.9          78.8

    Recall is lower than that of word 2-gram and word 3-gram.
  41. Conclusion

    • We suggested two methods for improving the syntactic piece.
    • We applied them to sentiment analysis to verify the effectiveness of the improved syntactic piece.
    • As a result, the recall and precision of the improved syntactic piece increased over the conventional one.
    • It is inferior to word 2-gram and 3-gram.
    • In future work we intend to improve recall.