Syntactic Piece: Idea, Purpose and Application to Sentiment Analysis
Kazuki Takigawa and Kazuhide Yamamoto. Syntactic Piece: Idea, Purpose and Application to Sentiment Analysis. Proceedings of the 7th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE 2011), pp. 401-404 (2011.11)
Background
The main processing units in NLP have some problems in Japanese.
• word
 o A single word often cannot keep the meaning of an expression.
   ex.) "かける[kakeru]" has several meanings: "do up", "put on", "take out", and so on.
• word n-gram
 o It often creates unnecessary elements.
   ex.) 「が,かける」(2-gram), 「で,ある,こと」[de-aru-koto] (3-gram)
A processing unit that can keep the meaning of an expression is needed, so we propose "syntactic piece".
What's Syntactic Piece?
• It consists of a pair of modifier and modificand, derived from syntactic structure.
• This pair is written as: modifier → modificand
ex.) Recently, the surrounding noise is very big. (最近まわりの騒音がとても大きい)
  recently big: 最近 → 大きい
  very big: とても → 大きい
  surrounding noise: まわり-の → 騒音
  noise is big: 騒音-が → 大きい
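The extraction of modifier → modificand pairs can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Chunk` type and `syntactic_pieces` function are hypothetical, and the dependency structure of the example sentence is written by hand in place of real parser output (the slides do not say which parser was used).

```python
from typing import List, NamedTuple, Tuple


class Chunk(NamedTuple):
    content: str  # content word(s) of the phrase
    func: str     # attached particle / function word, "" if none
    head: int     # index of the modificand chunk, -1 for the root


def syntactic_pieces(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Return one (modifier, modificand) pair per dependency edge."""
    pieces = []
    for chunk in chunks:
        if chunk.head < 0:  # the root phrase modifies nothing
            continue
        modifier = f"{chunk.content}-{chunk.func}" if chunk.func else chunk.content
        pieces.append((modifier, chunks[chunk.head].content))
    return pieces


# 最近まわりの騒音がとても大きい, with a hand-written (hypothetical) dependency structure
sentence = [
    Chunk("最近", "", 4),
    Chunk("まわり", "の", 2),
    Chunk("騒音", "が", 4),
    Chunk("とても", "", 4),
    Chunk("大きい", "", -1),
]

for modifier, modificand in syntactic_pieces(sentence):
    print(f"{modifier} → {modificand}")
```

Because each piece is just a pair of strings, pieces can be counted and looked up like n-grams, while still reflecting the syntactic structure.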
• It is easy to use, just like n-gram.
• It has syntactic structure, so it contains more information than n-gram.
• Similar to a phrasal idiom, it can deal with a chunk of meaning.
But syntactic piece has some problems.
Problem of Syntactic Piece
1) A syntactic piece is a pair of phrases, so using syntactic pieces yields many unique expressions.
2) Some phrase pairs generated by the current method have no meaning.
We propose solutions to these problems.
We introduce "same class expressions" to decrease the number of unique expressions. "Same class expressions" are a set of expressions that have similar meaning even if their surfaces differ.
ex.)
 1. cake is delicious (ケーキ-が → おいしい)
 2. delicious cake (おいしい → ケーキ)
The surface structures of these two expressions differ, but their meanings are very similar. We call such expressions "same class expressions".
We generalize same class expressions using two criteria.
Criterion 1: syntactic pieces constructed from an adjective and a noun with the same content words
  noun(-particle) → adjective: 騒音-が → 大きい (noise is big)
  adjective → noun: 大きい → 騒音 (big noise)
Criterion 2: syntactic pieces constructed from a verb and a noun with the same content words
  noun(-particle) → verb: 子供-が → 楽しむ (a child enjoys)
  verb → noun: 楽しむ → 子供 (a child who enjoys)
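One way to picture the generalization is as a normalization step that maps both directions of a noun/predicate piece to the same key. The sketch below is hypothetical: the `Piece` fields, the POS labels `"noun"`, `"adj"`, `"verb"`, and the choice to ignore the particle in the key are assumptions, not details given in the slides.

```python
from typing import NamedTuple, Optional, Tuple


class Piece(NamedTuple):
    modifier: str        # content word of the modifier
    modifier_pos: str    # part of speech (hypothetical labels): "noun", "adj", "verb", ...
    particle: str        # particle on the modifier, "" if none
    modificand: str      # content word of the modificand
    modificand_pos: str


PREDICATE_POS = {"adj", "verb"}


def class_key(p: Piece) -> Optional[Tuple[str, str, str]]:
    """Return (noun, predicate, predicate POS) for noun/predicate pieces, else None.

    The particle is deliberately left out of the key (an assumption), so
    "騒音-が → 大きい" and "大きい → 騒音" fall into the same class.
    """
    if p.modifier_pos == "noun" and p.modificand_pos in PREDICATE_POS:
        return (p.modifier, p.modificand, p.modificand_pos)  # noun(-particle) → predicate
    if p.modifier_pos in PREDICATE_POS and p.modificand_pos == "noun":
        return (p.modificand, p.modifier, p.modifier_pos)    # predicate → noun
    return None  # outside both criteria: the piece stays as it is


noise_is_big = Piece("騒音", "noun", "が", "大きい", "adj")
big_noise = Piece("大きい", "adj", "", "騒音", "noun")
child_enjoys = Piece("子供", "noun", "が", "楽しむ", "verb")
enjoying_child = Piece("楽しむ", "verb", "", "子供", "noun")

print(class_key(noise_is_big) == class_key(big_noise))
print(class_key(child_enjoys) == class_key(enjoying_child))
```

A dictionary keyed by `class_key` would let an SO-score learned for "noise is big" also be found for "big noise".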
ex.) I can be satisfied. (満足することができる)
  be satisfied: 満足する → こと (manzoku-suru koto)
  I can be: こと-が → できる (koto-ga dekiru)
These pieces express no real modification relation and carry no meaning by themselves. Some phrase pairs have no meaning.
The cause of this problem is that "こと[koto]" is a so-called "form word" (形式名詞). A form word is a type of content word, but it has lost its original meaning and is used formally in Japanese. This is similar to relative pronouns such as "which", "who", "when", etc. in English.
Method(2) - Coping with form word -
• We collected form words manually.
• We treat a form word as a function word attached to the preceding content word.
ex.) I can be satisfied very much. (とても満足することができる)
conventional syntactic piece:
  be satisfied: 満足する → こと
  I can be: こと-が → できる
  satisfied very much: とても → 満足する
coping with form word:
  I can be satisfied: 満足すること-が → できる
  satisfied very much: とても → 満足する
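The form-word treatment can be sketched as a merge over the dependency structure: a phrase headed by a form word is folded into the phrase that modifies it, and the form word plus its particle become that phrase's function words. This is a minimal sketch under assumptions: `FORM_WORDS` is a hypothetical sample (the slides only say the list was collected manually), and `Chunk`, `merge_form_words`, and `render_pieces` are illustrative names.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Hypothetical sample; the actual manually collected list is not given.
FORM_WORDS = {"こと", "もの", "ところ"}


@dataclass
class Chunk:
    content: str     # content word(s) of the phrase
    func: str = ""   # particle attached to the phrase, e.g. "が"
    head: int = -1   # index of the modificand chunk, -1 for the root
    form: str = ""   # form word absorbed as a function word


def merge_form_words(chunks: List[Chunk]) -> List[Optional[Chunk]]:
    """Fold each form-word phrase into the preceding phrase that modifies it.

    Absorbed chunks are replaced by None so that head indices stay valid.
    """
    out: List[Optional[Chunk]] = [replace(c) for c in chunks]  # copy, keep input intact
    for i, c in enumerate(out):
        if c is None or i == 0 or c.content not in FORM_WORDS:
            continue
        prev = out[i - 1]
        if prev is None or prev.head != i:
            continue
        # The form word and its particle become function words of prev.
        prev.form, prev.func, prev.head = c.content, c.func, c.head
        out[i] = None
        for other in out:  # redirect remaining dependents of the form word
            if other is not None and other.head == i:
                other.head = i - 1
    return out


def render_pieces(chunks: List[Optional[Chunk]]) -> List[str]:
    """Render 'modifier → modificand' strings, skipping absorbed chunks."""
    result = []
    for c in chunks:
        if c is None or c.head < 0:
            continue
        modifier = c.content + c.form + (f"-{c.func}" if c.func else "")
        result.append(f"{modifier} → {chunks[c.head].content}")
    return result


# とても満足することができる, with a hand-written dependency structure
sentence = [Chunk("とても", head=1), Chunk("満足する", head=2),
            Chunk("こと", "が", head=3), Chunk("できる")]
print(render_pieces(sentence))                    # conventional pieces
print(render_pieces(merge_form_words(sentence)))  # pieces after coping with form word
```

The meaningless pieces "満足する → こと" and "こと-が → できる" disappear, and "満足すること-が → できる" takes their place.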
• We apply the improved syntactic piece to sentiment analysis to verify its effectiveness.
• The target of sentiment analysis is a sentence, and each sentence is classified as positive, negative, or other.
 1. Pairs of an evaluative expression and its semantic orientation score (SO-score) are registered in a dictionary. Here, the evaluative expression is a syntactic piece.
 2. Each expression in an input sentence is given its SO-score from the dictionary.
 3. The sentence is classified by the sum of its SO-scores.
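Steps 2 and 3 amount to a dictionary lookup followed by a sum. The sketch below is hypothetical: it assumes that pieces missing from the dictionary contribute 0, that the sign of the sum decides the label (with a zero sum meaning "other"), and that the SO-score value -1.0 is illustrative; the slides only say that "noise is big" is negative.

```python
from typing import Dict, Iterable


def classify(pieces: Iterable[str], so_dict: Dict[str, float]) -> str:
    """Sum the SO-scores of a sentence's syntactic pieces and label by sign."""
    # Pieces missing from the dictionary contribute 0 (an assumption).
    total = sum(so_dict.get(piece, 0.0) for piece in pieces)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "other"


# Hypothetical dictionary entry: only the label "negative" comes from the slides.
so_dict = {"騒音-が → 大きい": -1.0}
print(classify(["ファン-の → 騒音", "騒音-が → 大きい"], so_dict))  # negative
```

Here "ファン-の → 騒音" has no entry, so the sentence label is decided by "騒音-が → 大きい" alone.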
Sentence Classification (example)
input: syntactic pieces noise of fan (ファン-の → 騒音) and noise is big (騒音-が → 大きい)
dictionary: noise is big (騒音-が → 大きい): negative
 1. Syntactic pieces are obtained from the input.
 2. The obtained syntactic pieces are matched against the dictionary entries.
 3. From the match, we can treat "noise is big" as negative.
 4. The SO of the input is therefore negative.
• Same class expressions reduce the number of unique expressions needed in the dictionary: if we have the SO-score of "noise is big", we can also give that SO-score to "big noise".
• Coping with form word keeps meaningless expressions out of the dictionary, such as "I can be" being registered as "positive".
Preparation for Sentence Classification - Seed Dictionary -
 1. We prepare positive and negative sentences as training data.
 2. Syntactic pieces are obtained from the training data, and their frequencies are counted.
 3. Each syntactic piece is given an SO-score, calculated from its probability of occurrence (Fujimura et al. [04]). We treat the result as the seed dictionary.
seed dictionary (frequencies):
 syntactic piece | positive | negative
 size is big | 5 | 1
 slow to respond | 0 | 8
 softly-colored | 3 | 0
 ・・・
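Building the seed dictionary means counting how often each piece occurs in positive and negative sentences, then turning the counts into a score. The slides cite Fujimura et al. for the SO-score but do not give the formula, so `so_score` below is a HYPOTHETICAL stand-in, P(positive | piece) - P(negative | piece), and not their method. The training data is made up to reproduce the example frequencies (size is big 5/1, slow to respond 0/8, softly-colored 3/0).

```python
from collections import Counter
from typing import Dict, List, Tuple

LABELS = ("positive", "negative")


def count_frequencies(training: List[Tuple[List[str], str]]) -> Dict[str, Counter]:
    """Count how many times each piece occurs in positive / negative sentences."""
    freq: Dict[str, Counter] = {}
    for pieces, label in training:
        if label not in LABELS:
            continue
        for piece in pieces:
            freq.setdefault(piece, Counter())[label] += 1
    return freq


def so_score(pos: int, neg: int) -> float:
    """HYPOTHETICAL stand-in, not Fujimura et al.'s formula.

    P(positive | piece) - P(negative | piece), a value in [-1, 1].
    """
    return (pos - neg) / (pos + neg)


def build_seed_dictionary(training: List[Tuple[List[str], str]]) -> Dict[str, float]:
    freq = count_frequencies(training)
    return {p: so_score(c["positive"], c["negative"]) for p, c in freq.items()}


# Made-up training sentences (already reduced to their pieces)
training = (
    [(["size is big"], "positive")] * 5
    + [(["size is big"], "negative")]
    + [(["slow to respond"], "negative")] * 8
    + [(["softly-colored"], "positive")] * 3
)
seed = build_seed_dictionary(training)
print(seed)
```

Every piece in the dictionary occurred in at least one labeled sentence, so `pos + neg` is never zero.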
Preparation for Sentence Classification - Expansion of Dictionary -
• A huge amount of training data is needed, but preparing it manually is costly.
• We want to obtain training data automatically, so we build an expanded dictionary.
 1. Sentences from a large-scale corpus are classified as positive or negative by the seed dictionary. We treat the result as new training data.
 2. Syntactic pieces are obtained from the new training data, and their frequencies are counted, as in making the seed dictionary.
 3. SO-scores are calculated in the same way, and we treat the result as the expanded dictionary.
expanded dictionary (frequencies):
 syntactic piece | positive | negative
 continuing is difficult | 0 | 5
 good design | 8 | 0
 to be a gift | 5 | 1
 ・・・
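The expansion loop is a form of self-training: the seed dictionary labels unlabeled corpus sentences, and the labeled sentences are recounted into a new dictionary. This sketch is hypothetical: it reuses the sign-of-sum classifier and the placeholder SO formula P(positive | piece) - P(negative | piece) (not the cited formula), drops sentences labeled "other", and uses made-up seed scores and corpus sentences.

```python
from collections import Counter
from typing import Dict, List, Tuple


def classify(pieces: List[str], so_dict: Dict[str, float]) -> str:
    """Label a sentence by the sign of the summed SO-scores of its pieces."""
    total = sum(so_dict.get(p, 0.0) for p in pieces)
    return "positive" if total > 0 else "negative" if total < 0 else "other"


def so_dictionary(training: List[Tuple[List[str], str]]) -> Dict[str, float]:
    """Count pieces per label and apply the placeholder SO formula."""
    freq: Dict[str, Counter] = {}
    for pieces, label in training:
        if label not in ("positive", "negative"):
            continue
        for p in pieces:
            freq.setdefault(p, Counter())[label] += 1
    return {p: (c["positive"] - c["negative"]) / (c["positive"] + c["negative"])
            for p, c in freq.items()}


def expand_dictionary(seed: Dict[str, float], corpus: List[List[str]]) -> Dict[str, float]:
    # 1. Label corpus sentences with the seed dictionary (new training data).
    new_training = [(pieces, classify(pieces, seed)) for pieces in corpus]
    # 2-3. Count pieces and compute SO-scores as for the seed dictionary.
    return so_dictionary(new_training)


seed = {"good design": 1.0, "continuing is difficult": -1.0}  # made-up scores
corpus = [["good design", "to be a gift"], ["continuing is difficult"], ["to be a gift"]]
print(expand_dictionary(seed, corpus))
```

"to be a gift" is absent from the seed dictionary but receives a score because it co-occurs with "good design" in a sentence the seed dictionary labels positive.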
• approximately 1,000 negative sentences
• approximately 210,000 sentences as the large-scale corpus for expansion
• To examine the efficacy of each of our methods, we analyzed sentiment with the following settings:
 (1) only generalization of same class expressions
 (2) only coping with form word
 (3) combination of (1) and (2)
 (4) conventional syntactic piece (baseline)
Results table (language processing units | precision(%) | recall(%)), partially recovered: rows (1) only generalization of same class expressions and (2) only coping with form word, with the values 77.1, 49.8 and 44.6.
• All methods improve precision over the baseline.
• Generalization of same class expressions also improves recall.
• Generalization of same class expressions turned out higher in recall than the baseline.
• We could give SO-scores to more sentences, so the scale of the expanded dictionary increased: we obtained approximately 14,000 more sentences (an increase of approximately 5.7%) as new training data than with the conventional syntactic piece.
Discussion - Coping with form word -
• Coping with form word was intended to solve the problem of extracting phrase pairs that have no meaning.
• In the results, some sentences were classified correctly only by accident when using the conventional syntactic piece.
In the dictionary using the conventional syntactic piece:
• "think that (なる-と → 思う[naru-to → omou])" is given a positive score, although this expression has no semantic orientation.
In the dictionary using our method:
• "think it will be a nuisance (邪魔になる-と → 思う[jama ni naru-to → omou])" is given a negative score.
• "think it will be a present (プレゼントになる-と → 思う[purezento ni naru-to → omou])" is given a positive score.
Discussion - Comparison with other language processing units -
• Comparing precision(%) and recall(%) across language processing units, syntactic piece using same class expressions has lower recall than word 2-gram and word 3-gram.
• We proposed an improved syntactic piece.
• We applied it to sentiment analysis to verify its effectiveness.
• As a result, both recall and precision of the improved syntactic piece were higher than those of the conventional one.
• However, it is still inferior to word 2-gram and 3-gram.
• In future work, we intend to improve recall.