Generation of Descriptive Elements for Text

Mutsugu Kuboki, Kazuhide Yamamoto
Nagaoka University of Technology, Japan

What do these texts describe about the query?
Query: "LPF"
We cannot tell immediately, and a text may not describe the query at all. We want to know the content of web search results, but this is difficult.

We try to generate Descriptive Elements (DEs).
Query: "LPF"
Structure, Type, … Background, Work, … …

Main works
1. Extracting candidate DEs
2. Assigning DEs to text
(This work has been tried on Japanese texts only.)

Extraction of DEs
• DEs differ from query to query.
Examples
Apple: Kind, Size, Area-of-Production, …
LPF: Role, Structure, Performance, …
• We try to obtain candidates in advance.

Extraction of DEs
Extract DEs from web search results. We use the following rules.
(1) Pattern
"noun or compound noun" of "query" (eng)
"query 'no' noun or compound noun" (jpn)
ex) enforcement of LawProtectingPersonalInformation (eng)
kojinjyouhouhogohou-no-shikou (jpn)
(2) A DE is one word in Japanese.
Note: the Japanese word "no" means "of".
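The pattern in rule (1) can be sketched as below. This is only an illustration, assuming the text has already been split into (surface, part-of-speech) tokens by some morphological analyzer. The token format, the "noun" tag name, and the function name `extract_de_candidates` are assumptions made for this example and do not come from the slides.

```python
from collections import Counter

def extract_de_candidates(tokens, query, particle="の"):
    """Count noun runs that directly follow `query` + `particle`.

    `tokens` is a list of (surface, pos) pairs. The "noun" tag is a
    hypothetical label standing in for a real analyzer's tag set.
    """
    counts = Counter()
    i = 0
    while i + 1 < len(tokens):
        if tokens[i][0] == query and tokens[i + 1][0] == particle:
            j = i + 2
            nouns = []
            # A compound noun is treated as a run of consecutive nouns.
            while j < len(tokens) and tokens[j][1] == "noun":
                nouns.append(tokens[j][0])
                j += 1
            if nouns:
                counts["".join(nouns)] += 1
            i = j
        else:
            i += 1
    return counts

# Toy input: "query no shikou ..." and "query no shikou-joukyou".
tokens = [
    ("個人情報保護法", "noun"), ("の", "particle"), ("施行", "noun"),
    ("が", "particle"), ("迫る", "verb"),
    ("個人情報保護法", "noun"), ("の", "particle"),
    ("施行", "noun"), ("状況", "noun"),
]
candidates = extract_de_candidates(tokens, "個人情報保護法")
```

Counting the candidates makes it easy to drop low-frequency ones afterwards.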

Candidates extraction
• Query: "law protecting personal information" ("kojinjouhouhogohou" in Japanese)
• Data: top 10,000 Google search results
• Evaluation: we evaluate the candidates manually.

Result of candidates extraction
Candidates: 366
Adequate DEs: 289 (79%)
Inadequate DEs: 77 (21%)
Adequate) infraction, operation, influence, …
Inadequate) learning, expert, …
79% of the candidates are useful.

Result of candidates extraction
The results above include many low-frequency DEs. These DEs are rejected from the candidates.
• The next experiment uses 54 DEs from the adequate candidates.

Method
We assume that texts with the same DE contain the same words.
(1) Extract text from the web.
Paragraph 1 = {w1, w2, w3, …}
Paragraph 2 = {w2, w3, w4, …}
…
(2) Extract cooccurrence words from the paragraphs of DE X: {w2, w3}
(3) Collect Triggers. {w2, w3} becomes a Trigger of DE X.
Triggers consist of 1, 2, or 3 morphemes.

Method
We assume that texts with the same DE contain the same words.
Triggers of DE X: {w2, w3}, {w4, w7, w9}, {w11}, …
Does the text include the Triggers?
YES: the text is assigned DE X.
NO: the text is not DE X.
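The YES/NO decision above can be sketched as follows. The sketch assumes that a Trigger fires when all of its words occur in the text. The slides do not spell out the matching condition, so this is one reading. The function name `assign_des` and the second DE "Y" are hypothetical.

```python
def assign_des(text_words, triggers_by_de):
    """Return the DEs that have at least one Trigger firing on the text.

    A Trigger is a frozenset of words; it fires when every word of the
    Trigger occurs in the text (an assumed matching condition).
    """
    words = set(text_words)
    assigned = []
    for de, triggers in triggers_by_de.items():
        if any(trigger <= words for trigger in triggers):
            assigned.append(de)
    return assigned

triggers_by_de = {
    "X": [frozenset({"w2", "w3"}), frozenset({"w4", "w7", "w9"}), frozenset({"w11"})],
    "Y": [frozenset({"w5", "w6"})],  # hypothetical second DE
}
result = assign_des(["w1", "w2", "w3", "w8"], triggers_by_de)
```

Here the text contains both w2 and w3, so only DE X is assigned.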

How to make Triggers (1)
1. Extract paragraphs that include "query-no-DE" (jpn) from the web.
2. Extract the content words from the paragraphs.
3. Extract cooccurrence words from the paragraphs of the same DE. These become the Triggers.

How to make Triggers (2)
Triggers that match either of the following rules are excluded.
• The appearance frequency is 10% or less of all paragraphs that include "query-no-DE".
• The Trigger is the same as the query words.
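The construction and the two exclusion rules can be sketched together as below. This is a simplified illustration, not the authors' implementation: it treats a Trigger as a combination of `size` content words occurring in the same paragraph, and it reads "same as the query words" as "contains a query word". The function name `make_triggers` and its parameters are assumptions.

```python
from collections import Counter
from itertools import combinations

def make_triggers(paragraphs, query_words, size, min_ratio=0.10):
    """Collect combinations of `size` words as Triggers for one DE.

    `paragraphs` holds the content words of each paragraph that contains
    "query-no-DE". A combination is dropped when it appears in
    `min_ratio` (10% by default) or less of the paragraphs; query words
    are removed before combinations are formed.
    """
    counts = Counter()
    for words in paragraphs:
        usable = sorted(set(words) - set(query_words))
        counts.update(combinations(usable, size))
    threshold = min_ratio * len(paragraphs)
    return {frozenset(c) for c, n in counts.items() if n > threshold}

# Toy data; the ratio is raised to 25% because there are only 4 paragraphs.
paragraphs = [
    ["law", "w2", "w3"],
    ["law", "w2", "w3", "w4"],
    ["w2", "w5"],
    ["w3", "w6"],
]
two_word = make_triggers(paragraphs, {"law"}, size=2, min_ratio=0.25)
one_word = make_triggers(paragraphs, {"law"}, size=1, min_ratio=0.25)
```

In the toy data only {w2, w3} survives as a two-word Trigger, because every other pair appears in just one paragraph.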

How to make Triggers (3)
We run a pilot test and use a combination of the following rules.
(1) [used Trigger] Triggers that were used in the pilot test. Effect: increases accuracy.
(2) [unused Trigger] Triggers that caused more than two assignment mistakes.
(3) [unused Trigger] Triggers that are used by more than 2 DEs. These lead to errors (they are not a decisive factor).
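One way to read restriction rules (1) to (3) as a filter is sketched below. The thresholds follow a literal reading of "over two" as "more than two", which is an assumption. The function name `restrict_triggers`, the input dictionaries, and the toy Trigger names are hypothetical.

```python
def restrict_triggers(triggers, fired, mistakes, de_counts, rules=(1, 2, 3)):
    """Filter Triggers with the chosen combination of restriction rules.

    fired     : set of Triggers that were used in the pilot test
    mistakes  : Trigger -> number of wrong assignments in the pilot test
    de_counts : Trigger -> number of DEs that share the Trigger
    """
    kept = []
    for trigger in triggers:
        if 1 in rules and trigger not in fired:
            continue  # rule (1): keep only Triggers used in the pilot test
        if 2 in rules and mistakes.get(trigger, 0) > 2:
            continue  # rule (2): drop Triggers with more than two mistakes
        if 3 in rules and de_counts.get(trigger, 0) > 2:
            continue  # rule (3): drop Triggers shared by more than 2 DEs
        kept.append(trigger)
    return kept

triggers = ["t1", "t2", "t3", "t4"]
fired = {"t1", "t2", "t3"}
mistakes = {"t2": 3}
de_counts = {"t3": 4}

rule_1 = restrict_triggers(triggers, fired, mistakes, de_counts, rules=(1,))
rule_12 = restrict_triggers(triggers, fired, mistakes, de_counts, rules=(1, 2))
rule_123 = restrict_triggers(triggers, fired, mistakes, de_counts, rules=(1, 2, 3))
```

Each added rule can only shrink the Trigger set, which is the behaviour the rule combinations (1), (1)(2), and (1)(2)(3) rely on.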

Result of cooccurrence Trigger
• "1 Trigger", "2 Trigger", and "3 Trigger" give the number of morphemes in a Trigger.
• (1), (2), (3) are the restriction rules.

Type                  Recall  Precision  F value  Average nominations
Data (100 texts)      0.72    0.06       0.10     54.0
1 Trigger (1)         0.70    0.07       0.13     41.4
2 Trigger (1)         0.70    0.08       0.14     36.5
3 Trigger (1)         0.62    0.09       0.16     27.3
1 Trigger (1)(2)      0.42    0.15       0.22      5.9
2 Trigger (1)(2)      0.54    0.10       0.17     20.9
3 Trigger (1)(2)      0.55    0.10       0.16     21.8
1 Trigger (1)(2)(3)   0.37    0.16       0.22      3.4
2 Trigger (1)(2)(3)   0.52    0.10       0.17     18.5
3 Trigger (1)(2)(3)   0.55    0.10       0.17     20.3
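The F values in the table are consistent with the usual harmonic mean of precision and recall, although the slides do not state the formula. The sketch below assumes that definition and recomputes two rows. Because the table shows rounded precision and recall, a recomputed value can differ from the reported one in the last digit.

```python
def f_measure(precision, recall):
    """Balanced F value: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) copied from two rows of the table above.
rows = {
    "1 Trigger (1)(2)": (0.15, 0.42),
    "3 Trigger (1)": (0.09, 0.62),
}
f_values = {name: round(f_measure(p, r), 2) for name, (p, r) in rows.items()}
```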

Result of cooccurrence Trigger
• Recall is high and precision is low.
• The restriction rules are effective (the number of nominations decreases).

Issues of proposed method
Precision is low. We want to know the factors that decide DEs.
• Let's try stronger rules: modification relation Triggers.
Note: the next experiment uses 19 DEs for simplicity (similar DEs are rejected from the candidates).

Modification relation Triggers
• Used patterns
1. noun and DE
2. noun and synonym of DE
3. noun and hyponym of DE
Synonyms and hyponyms are obtained from Japanese WordNet.
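The three patterns can be sketched as a lookup over modification (dependency) pairs. This is a hedged illustration: it assumes a dependency parser supplies (modifier noun, head word) pairs and that the noun modifies the DE, neither of which is stated in the slides. The small synonym and hyponym dictionaries are toy stand-ins for Japanese WordNet, and all names here are hypothetical.

```python
def build_targets(de, synonyms, hyponyms):
    """Map each target word to the pattern type it belongs to."""
    targets = {de: "DE"}
    for word in synonyms.get(de, []):
        targets.setdefault(word, "Synonym")
    for word in hyponyms.get(de, []):
        targets.setdefault(word, "Hyponym")
    return targets

def match_patterns(dependency_pairs, targets):
    """Return (noun, head, pattern type) for pairs whose head is a target."""
    return [(noun, head, targets[head])
            for noun, head in dependency_pairs if head in targets]

# Toy lexicon (not taken from Japanese WordNet).
synonyms = {"operation": ["management"]}
hyponyms = {"operation": ["maintenance"]}
targets = build_targets("operation", synonyms, hyponyms)

pairs = [("cabinet", "operation"), ("system", "management"), ("citizen", "law")]
matches = match_patterns(pairs, targets)
```

Keeping the pattern type with each match allows precision to be reported separately for DE, synonym, and hyponym Triggers.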

Result of modification relation Triggers
Counts are given as system/answer:
p/p = the system outputs the right answer
p/n = the system outputs a wrong answer
n/p = the system does not find the DE (the answer has the DE)
n/n = the system recognizes a non-DE text

Trigger      Prec.  p/p  p/n  n/p  n/n
All          0.31   11   24   181  1615
DE           0.67   6    3    -    -
Synonym      0.21   3    11   -    -
Hyponym      0.17   2    10   -    -
Answer data  -      192  -    -    1708
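The precision column follows from the p/p and p/n counts as p/p divided by (p/p + p/n). A short check using the counts reported above:

```python
def precision(pp, pn):
    """Share of system outputs (p/p + p/n) that are right (p/p)."""
    total = pp + pn
    return pp / total if total else 0.0

# (p/p, p/n) counts per Trigger type, copied from the results.
counts = {"All": (11, 24), "DE": (6, 3), "Synonym": (3, 11), "Hyponym": (2, 10)}
precisions = {name: round(precision(pp, pn), 2) for name, (pp, pn) in counts.items()}
```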

Result of modification relation Triggers
The results contain many mistakes. Are Triggers not effective for judging true or false?
• We check the results.

Result p/n (assigning errors) (1)
Almost all Triggers consist of words related to the DE.
• 22 of 24 results consist of related words.
• Examples
Operation (cabinet, citizen, month)
Enforcement-status (announcement, cabinet, year)
The Trigger words relate to the DE, but precision is still low.

Result p/n (assigning errors) (2)
• Conclusion
The error factor is not necessarily the Trigger words.
Judging only whether a text contains keywords does not assure precision.
What factor increases precision?

Result n/p (system can't find DE)
How do people decide DEs? We check the n/p results (181 pairs) manually.
Factor is clear
• 153 pairs (85%)
• These have a specific expression: a word, several words, or a phrase.
Factor is unclear
• 28 pairs (15%)
• In 11 pairs the DE is "description".
• The others occur with low frequency.

How to decide DEs by people
• Conclusion
People use only part of the text to decide almost all DEs (the part that explains the query).
• Point
Don't use the whole text.
• Example
…Law protecting personal information is established for fiscal year 2003…
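One possible reading of "use only part of the text" is to keep just the sentences that mention the query before any DE is assigned. The sketch below illustrates that reading only; the slides leave the actual selection method to future work. The function name `sentences_about_query` and the surrounding toy sentences are assumptions.

```python
import re

def sentences_about_query(text, query):
    """Keep only the sentences that mention the query (case-insensitive)."""
    # Split on Western and Japanese sentence-final punctuation.
    sentences = [s.strip() for s in re.findall(r"[^.!?。]+[.!?。]?", text)]
    return [s for s in sentences if s and query.lower() in s.lower()]

text = (
    "The seminar was held in March. "
    "The Law Protecting Personal Information is established for fiscal year 2003. "
    "Many managers attended."
)
selected = sentences_about_query(text, "law protecting personal information")
```

Only the middle sentence is kept, so Triggers would be matched against that sentence instead of the whole text.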

Conclusions
• Effect of Triggers
Triggers don't assure precision.
• How to increase precision
Don't use the whole text. The system has to use only the part of the text that explains the query.

Future work
• Assign DEs using part of the text.
• Look into…
  • the effect of using part of the text
  • other factors that decide DEs

Mistake in my paper… I'm sorry.
In "III. Experiments and Results, C. Assigning DE Using Restricted Triggers", please change (1) to (2).

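Example of p/p
• Definition (生存 "living", 識別 "recognize")
Even information about a deceased person must be handled as personal information under the Law Protecting Personal Information if its content can identify a living individual, such as a bereaved family member.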

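Example of p/n
• Effect (多い "many", 施行 "enforcement")
石戸谷和政, secretary general of the Obihiro branch of 道中小企業家同友会, which organized the event, speaks for the managers: "Frankly, many managers do not know where to start with the Law Protecting Personal Information. Enforcement is close at hand and they are feeling cornered."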

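Example of n/p
• Effect (影響)
Behind the emergence of an information-leakage offense is the recognition that employees leak personal information in many cases and that technical defenses have limits. Information security can never be perfect. Even without aiming for perfection, information security measures cost money, and since the enforcement of the Law Protecting Personal Information, companies have been struggling under a heavy cost burden.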
