
Generation of Descriptive Elements for Text


Mutsugu Kuboki and Kazuhide Yamamoto. Generation of Descriptive Elements for Text. Proceedings of the 7th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE 2011), pp. 56-59, November 2011.

Natural Language Processing Laboratory (自然言語処理研究室)

November 30, 2011



Transcript

  1. What do these texts describe about the query? The query is "LPF".
     We cannot tell immediately, and the texts may not describe the query at all.
     We want to know the content of web search results, but this is difficult.
  2. We try to generate Descriptive Elements (DEs). For the query "LPF":
     Structure, Type, … Background, Work, … …
  3. Main work: 1. Extracting candidates for DEs; 2. Assigning DEs to text.
     (This work is done on Japanese texts only.)
  4. Main work: 1. Extracting candidates for DEs; 2. Assigning DEs to text.
     (This work is done on Japanese texts only.)
  5. Extraction of DEs. DEs differ depending on the query. Examples:
     Apple: Kind, Size, Area-of-Production, …
     LPF: Role, Structure, Performance, …
     We try to obtain candidates in advance.
  6. Extraction of DEs. We extract DEs from web search results using the
     following rules (a code sketch follows this item):
     (1) Pattern: "<query> no <noun or compound noun>" (jpn), i.e.
         "<noun or compound noun> of <query>" (eng).
         Example: kojinjyouhouhogohou-no-shikou (jpn) =
         enforcement of the Law Protecting Personal Information (eng).
     (2) DEs are one word in Japanese.
     Note: the Japanese particle "no" means "of".
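A minimal sketch of pattern rule (1), assuming plain regular-expression matching. The noun pattern (a run of kanji or katakana) is a rough stand-in for noun/compound-noun detection, which would normally come from a Japanese morphological analyzer, and extract_de_candidates is a hypothetical helper name:

```python
import re

# Rough stand-in for "noun or compound noun": a run of kanji or katakana.
# Real noun detection would use a Japanese morphological analyzer.
NOUN = r"[\u4E00-\u9FFF\u30A0-\u30FF]+"

def extract_de_candidates(text: str, query: str) -> list[str]:
    """Collect candidate DEs matching the pattern '<query> no <noun>'."""
    pattern = re.compile(re.escape(query) + "の(" + NOUN + ")")
    return pattern.findall(text)

# "kojinjyouhouhogohou no shikou" -> candidate DE "shikou" (enforcement)
print(extract_de_candidates("個人情報保護法の施行について", "個人情報保護法"))
```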
  7. Candidate extraction.
     Query: "law protecting personal information" ("kojinjouhouhogohou" in Japanese).
     Data: top 10,000 Google search results.
     Evaluation: we evaluate the candidates manually.
  8. Result of candidate extraction.
     Candidates: 366; adequate DEs: 289 (79%); inadequate DEs: 77 (21%).
     Adequate: infraction, operation, influence, …
     Inadequate: learning, expert, …
     79% of the candidates are useful.
  9. Result of candidate extraction.
     Candidates: 366; adequate DEs: 289 (79%); inadequate DEs: 77 (21%).
     The above results include many low-frequency DEs, which are rejected
     from the candidates. The next experiment uses 54 DEs drawn from the
     adequate candidates.
  10. Main work: 1. Extracting candidates for DEs; 2. Assigning DEs to text.
      (This work is done on Japanese texts only.)
  11. Method. We assume that texts with the same DE include the same words.
      (1) Extract text from the web.
      (2) Extract co-occurring words.
      (3) Collect Triggers.
      Paragraph 1 = {w1, w2, w3, …}, Paragraph 2 = {w2, w3, w4, …}, …
      For DE X, {w2, w3} becomes a Trigger of DE X.
      Triggers consist of 1, 2, or 3 morphemes.
  12. Method. We assume that texts with the same DE include the same words.
      Triggers of DE X: {w2, w3}, {w4, w7, w9}, {w11}, …
      Does the text include a Trigger? Yes: assign DE X. No: this text is
      not DE X. (A matching sketch follows this item.)
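A minimal sketch of this decision, assuming the triggers for each DE are stored as sets of words; the trigger data below is the hypothetical example from the slide:

```python
def assign_des(text_words: set[str], triggers: dict[str, list[set[str]]]) -> list[str]:
    """Assign every DE for which at least one full trigger set appears in the text."""
    return [de for de, sets in triggers.items()
            if any(trig <= text_words for trig in sets)]

# Triggers of DE X from the slide: {w2, w3}, {w4, w7, w9}, {w11}
triggers = {"X": [{"w2", "w3"}, {"w4", "w7", "w9"}, {"w11"}]}
print(assign_des({"w1", "w2", "w3"}, triggers))  # ['X']  (trigger {w2, w3} fires)
print(assign_des({"w1", "w5"}, triggers))        # []     (this text is not DE X)
```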
  13. How to make Triggers (1):
      1. Extract paragraphs that include "query-no-DE" (jpn) from the web.
      2. Extract the content words from those paragraphs.
      3. Extract words that co-occur across paragraphs of the same DE.
      These co-occurring words become the Triggers (see the sketch below).
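A minimal sketch of steps 2-3, assuming the paragraphs are already tokenized into sets of content words; only single-word (1-morpheme) triggers are shown, though the paper also uses 2- and 3-morpheme triggers:

```python
from collections import Counter

def collect_trigger_counts(paragraphs: list[set[str]]) -> Counter:
    """Count in how many same-DE paragraphs each content word occurs."""
    counts: Counter = Counter()
    for words in paragraphs:
        counts.update(words)  # a paragraph counts each word at most once
    return counts

paragraphs = [{"w1", "w2", "w3"}, {"w2", "w3", "w4"}]
print(collect_trigger_counts(paragraphs).most_common(2))  # [('w2', 2), ('w3', 2)]
```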
  14. How to make Triggers (2). A Trigger is excluded if it matches either
      of the following rules (a filtering sketch follows this item):
      - Its appearance frequency is 10% or less of all paragraphs that
        include "query-no-DE".
      - It is identical to a query word.
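A minimal sketch of these two exclusion rules, reusing the counts produced by collect_trigger_counts() above:

```python
from collections import Counter

def filter_triggers(counts: Counter, n_paragraphs: int,
                    query_words: set[str]) -> set[str]:
    """Keep words above 10% paragraph frequency that are not query words."""
    return {word for word, count in counts.items()
            if count / n_paragraphs > 0.10 and word not in query_words}

# e.g. filter_triggers(collect_trigger_counts(paragraphs), len(paragraphs), {"LPF"})
```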
  15. How to make Triggers (3). We ran a pilot test and use a combination
      of the following rules:
      (1) [used Trigger] used in the pilot test; this increases accuracy.
      (2) [unused Trigger] misassigned more than twice.
      (3) [unused Trigger] used by more than two DEs; this leads to errors
          (it is not a decisive factor).
  16. Result of co-occurrence Triggers. "1/2/3 Trigger" gives the number of
      morphemes per Trigger; (1)(2)(3) are the restriction rules applied.

      Type                 Recall  Precision  F value  Avg. nominations
      Data (100 texts)     0.72    0.06       0.10     54.0
      1 Trigger (1)        0.70    0.07       0.13     41.4
      2 Trigger (1)        0.70    0.08       0.14     36.5
      3 Trigger (1)        0.62    0.09       0.16     27.3
      1 Trigger (1)(2)     0.42    0.15       0.22      5.9
      2 Trigger (1)(2)     0.54    0.10       0.17     20.9
      3 Trigger (1)(2)     0.55    0.10       0.16     21.8
      1 Trigger (1)(2)(3)  0.37    0.16       0.22      3.4
      2 Trigger (1)(2)(3)  0.52    0.10       0.17     18.5
      3 Trigger (1)(2)(3)  0.55    0.10       0.17     20.3
  17. Result of co-occurrence Triggers (same table as above): recall is
      high but precision is low. The restriction rules are effective, as
      the average number of nominations decreases.
  18. Issues with the proposed method: the precision is low. We want to
      know which factor decides DEs, so we try stronger rules: modification
      relation Triggers. Note: the next experiment uses 19 DEs for
      simplicity (similar DEs are removed from the candidates).
  19. Modification relation Triggers. Patterns used:
      1. noun and DE
      2. noun and synonym of DE
      3. noun and hyponym of DE
      Synonyms and hyponyms are obtained from Japanese WordNet (a lookup
      sketch follows this item).
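A minimal sketch of looking up synonyms and hyponyms of a DE word. The paper uses Japanese WordNet; one way to reach it is NLTK's Open Multilingual WordNet interface (after nltk.download('wordnet') and nltk.download('omw-1.4')). expand_de is a hypothetical helper name, not from the paper:

```python
from nltk.corpus import wordnet as wn

def expand_de(de_word: str) -> tuple[set[str], set[str]]:
    """Return (synonyms, hyponyms) of a Japanese DE word via WordNet."""
    synonyms: set[str] = set()
    hyponyms: set[str] = set()
    for synset in wn.synsets(de_word, lang="jpn"):
        synonyms.update(synset.lemma_names("jpn"))
        for hypo in synset.hyponyms():
            hyponyms.update(hypo.lemma_names("jpn"))
    synonyms.discard(de_word)  # a word is not its own synonym
    return synonyms, hyponyms
```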
  20. Result of modification relation Triggers.
      p/p = system output is a right answer
      p/n = system output is a mistaken answer
      n/p = system does not find the DE (but the answer has a DE)
      n/n = system correctly recognizes a non-DE text
      (A precision sanity check follows this item.)

      Trigger      Prec.  p/p  p/n  n/p  n/n
      All          0.31   11   24   181  1615
      DE           0.67   6    3    -    -
      Synonym      0.21   3    11   -    -
      Hyponym      0.17   2    10   -    -
      Answer data  -      192  -    -    1708
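As a sanity check on how the table's precision relates to these counts: under the standard definition, precision is p/p divided by everything the system output. A minimal sketch:

```python
def precision(pp: int, pn: int) -> float:
    """Right answers over everything the system output."""
    return pp / (pp + pn)

print(round(precision(11, 24), 2))  # 0.31, matching the "All" row (11 / 35)
print(round(precision(6, 3), 2))    # 0.67, matching the "DE" row
```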
  21. Result of modification relation Triggers (same table as above). The
      results contain many mistakes. Are Triggers not effective for judging
      true or false? We check the results.
  22. Result p/n (assigning errors) (1). Almost all Triggers are
      constructed from words related to the DE: 22 of the 24 p/n results
      are constructed from related words. Examples:
      Operation: (cabinet, citizen, month)
      Enforcement-status: (announcement, cabinet, year)
      The Trigger words relate to the DE, yet precision is low.
  23. Result p/n (assigning errors) (2). Conclusion: the error factor is
      not necessarily the Trigger words. Merely judging whether a text
      contains keywords does not assure precision. What factor does
      increase precision?
  24. Result n/p (system cannot find the DE): how people decide DEs. We
      check the 181 n/p pairs manually.
      Factor unclear: 28 pairs (15%); in 11 pairs the DE is "description",
      and the others are low-frequency DEs.
      Factor clear: 153 pairs (85%); these have a specific expression:
      a word, several words, or a phrase.
  25. How people decide DEs. Conclusion: we use only part of the text to
      decide most DEs, because the text only partly explains the query.
      Point: do not use the whole text. Example:
      "…The Law Protecting Personal Information was established in fiscal
      year 2003…"
      (A sketch of this restriction follows this item.)
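A minimal sketch of the proposed restriction: match Triggers only against the sentences that mention the query rather than the whole text. Splitting on "。" is a simplification of Japanese sentence segmentation:

```python
import re

def query_sentences(text: str, query: str) -> list[str]:
    """Keep only the sentences that mention the query."""
    sentences = re.split(r"(?<=。)", text)
    return [s for s in sentences if query in s]

# Trigger matching would then run on query_sentences(text, query)
# instead of on the whole text.
```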
  26. Conclusions. Effect of Triggers: Triggers alone do not assure
      precision. How to increase precision: do not use the whole text; the
      system has to use only the part of the text that explains the query.
  27. Future work: assigning DEs using only part of the text. We will look
      into the effect of using part of the text, and into other factors
      that decide DEs.

  29. Erratum in the paper (I'm sorry): in Section III, Experiments and
      Results, C, "Assigning DE Using Restricted Triggers", please change
      (1) to (2).