
Set Expansion with understanding

This poster was part of the Set Expansion project.


December 12, 2012



Set Expansion with Understanding

Dharmesh Kakadia, Avinash Ram, Ravishankar Shravanam
Search and Information Extraction Lab (SIEL)
International Institute of Information Technology, Hyderabad

Set Expansion
•  Set expansion aims to expand a given set into a larger set by discovering other entities that belong to the same conceptual set.
•  For example, given { dollar, rupee, dinar }, we want to identify that the set is about currency and find all the other currencies.

Motivation
•  Set expansion is at the core of many IR applications.
•  Most applications also require identifying the concept.
•  Both problems are known to be hard when approached individually.

Set Expansion
•  Web Search
   -  Uses the Bing API.
   -  Queries take the AND form and include the concept.
•  WebList Extraction
   -  Uses HTML formatting such as list and select tags to extract potential entities for expansion.
   -  Also extracts other information such as context text, title, heading and URL.
•  Candidate Identification
•  Expansion Validation
   -  Based on the overlap between the candidate and the current set.
   -  Uses selected information from web pages instead of full-document similarity.

Architecture
[Architecture diagram]

Concept Identification
•  Concept Prediction
   -  Based on the accumulated frequency of overlapping information.
   -  The URL, page title, context and heading of the WebList serve as potential indicators of the concept.
•  Concept Validation
   -  Ensures that the identified concept is general and not tied to any particular entity in the set.
   -  The overlap between the entities in the set and the concept is used as a distance, and its variation indicates the quality of the set and the concept.

The combination
   -  The intuition behind using the concept in set expansion is that hints of the concept are easy to identify for noisy lists.

Features
•  Language independent.
•  Domain independent.
•  No training required.
•  Scalable.

Applications
•  Semantic set instance generation.
•  User recommendations.
•  Computational advertisement.
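The WebList Extraction and Expansion Validation steps described above could be sketched roughly as follows. This is only an illustration built on Python's standard `html.parser`, not the project's actual code: the class `WebListParser`, the function `overlap`, and the choice of "fraction of seed entities found in the list" as the overlap measure are all assumptions made for the example.

```python
from html.parser import HTMLParser


class WebListParser(HTMLParser):
    """Hypothetical helper: collects item text from <ul>/<ol>/<select> blocks
    and keeps the page title as context."""

    LIST_TAGS = {"ul", "ol", "select"}
    ITEM_TAGS = {"li", "option"}

    def __init__(self):
        super().__init__()
        self.title = ""      # page title, one possible concept hint
        self.lists = []      # completed lists, each a list of item strings
        self._stack = []     # lists that are currently open (handles nesting)
        self._in_item = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.LIST_TAGS:
            self._stack.append([])
        elif tag in self.ITEM_TAGS and self._stack:
            self._stack[-1].append("")   # start a new item in the open list
            self._in_item = True
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.LIST_TAGS and self._stack:
            items = [t.strip() for t in self._stack.pop() if t.strip()]
            if items:
                self.lists.append(items)
            self._in_item = False
        elif tag in self.ITEM_TAGS:
            self._in_item = False
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_item and self._stack and self._stack[-1]:
            self._stack[-1][-1] += data  # append text to the current item


def overlap(candidate_list, seed_set):
    """Assumed overlap measure: fraction of seed entities present in the list
    (case-insensitive)."""
    cand = {c.lower() for c in candidate_list}
    seeds = {s.lower() for s in seed_set}
    return len(cand & seeds) / len(seeds) if seeds else 0.0
```

A caller would feed a fetched page to `WebListParser`, then keep only the extracted lists whose `overlap` with the current set is high enough; navigation menus and similar noise tend to score near zero.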
Evaluation

Query Concept      | Seed Set Size | Expanded Set Size | Concept Identified | False Entities
Countries          | 4             | 236               | countries          | 1
Car Manufacturers  | 4             | 79                | cars               | 0
Football players   | 3             | 68                | players            | 4

Future Work
•  Other approaches for WebList Extraction.
•  Other models for validation.
•  Influence-based framework for set quality.

References
•  R. Wang and W. Cohen. Iterative set expansion of named entities using the Web. In ICDM, 2008.
•  Y. He and D. Xin. SEISA: set expansion by iterative similarity aggregation. In WWW, pages 427–436, 2011.

Candidate selection formula:

\[ \text{candidate} = \max \sum_{i} W_i \cdot \frac{1}{\mathrm{Rank}(i)} \]
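One possible reading of the poster's candidate formula is a weighted sum of reciprocal ranks: each source i ranks the entities it proposes, contributes W_i / Rank(i) to an entity's score, and the entity with the highest total is chosen. The sketch below follows that reading. The poster does not define what i ranges over or how W_i is set, so treating i as an index over ranked lists, and the names `score_candidates` and `best_candidate`, are assumptions for illustration.

```python
from collections import defaultdict


def score_candidates(ranked_lists, weights):
    """Sum weight / rank for every entity across all ranked lists.

    ranked_lists: list of lists, best-ranked entity first (assumed input shape).
    weights: one weight W_i per ranked list.
    """
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, entity in enumerate(ranking, start=1):
            scores[entity] += weight / rank
    return dict(scores)


def best_candidate(ranked_lists, weights, exclude=()):
    """Return the highest-scoring entity that is not already in the set."""
    excluded = set(exclude)
    scores = {e: s for e, s in score_candidates(ranked_lists, weights).items()
              if e not in excluded}
    return max(scores, key=scores.get) if scores else None
```

For instance, with the rankings `["euro", "yen", "pound"]` (weight 1.0) and `["yen", "euro"]` (weight 0.5), "euro" collects 1.0/1 + 0.5/2 and "yen" collects 1.0/2 + 0.5/1, so "euro" would be selected first.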