Set Expansion with Understanding

Dharmesh Kakadia, Avinash Ram, Ravishankar Shravanam
Search and Information Extraction Lab (SIEL)
International Institute of Information Technology, Hyderabad

Set Expansion
•  Set expansion aims to expand a given set into a larger set by discovering other entities that belong to the same conceptual set.
•  For example, given { dollar, rupee, dinar }, we want to identify that this set is about currency and find all the other currencies.

Motivation
•  Set expansion is at the core of many IR applications.
•  Most applications also require identifying the concept.
•  Both problems are known to be hard when approached individually.

Set Expansion
•  Web Search
-  Uses the Bing API.
-  Queries are of the AND form, combined with the concept.
•  WebList Extraction
-  Uses HTML formatting such as list and select tags to extract the potential entities for expansion.
-  Also extracts other information, such as context text, title, heading and URL.
•  Candidate Identification
$$\text{candidate} = \max \sum_{i} W_i \cdot \frac{1}{\mathrm{Rank}(i)}$$
•  Expansion Validation
-  Based on the overlap between the candidate and the current set.
-  Uses selected information from web pages instead of full-document similarity.

Architecture
[Figure: system architecture diagram]

Concept Identification
•  Concept Prediction
-  Approach based on the accumulated frequency of overlapping information.
-  The URL, page title, context and heading of the WebList serve as potential indicators of the concept.
•  Concept Validation
-  Ensures that the identified concept is general and not tied to any particular entity in the set.
-  The overlap between the entities in the set and the concept is used as a distance; its variation indicates the quality of both the set and the concept.

The Combination
-  Hints of the concept are easy to identify even for noisy lists, which is the intuition behind using the concept in set expansion.

Features
•  Language independent.
•  Domain independent.
•  No training required.
•  Scalable.

Applications
•  Semantic set instance generation.
•  User recommendations.
•  Computational advertisement.

Evaluation

Query Concept       Seed Set Size   Expanded Set Size   Concept Identified   False Entities
Countries           4               236                 countries            1
Car Manufacturers   4               79                  cars                 0
Football players    3               68                  players              4

Future Work
•  Other approaches for WebList Extraction.
•  Other models for validation.
•  Influence-based framework for set quality.

References
•  R. Wang and W. Cohen. Iterative set expansion of named entity using the web. In ICDM, 2008.
•  Y. He and D. Xin. SEISA: set expansion by iterative similarity aggregation. In WWW, pages 427–436, 2011.
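
To make the WebList Extraction and Candidate Identification steps more concrete, here is a minimal Python sketch, not the authors' implementation. It collects the items of HTML list and select elements with the standard-library html.parser module, then scores each non-seed entity with a rank-discounted sum of the form W_i / Rank(i). The poster does not define W_i or Rank(i); the sketch assumes W_i is the fraction of seed entities found in list i and Rank(i) is the search-result position of the page containing that list. The names WebListParser and score_candidates are hypothetical, and the web search step is left out: the pages are passed in as HTML strings already in rank order.

```python
from collections import defaultdict
from html.parser import HTMLParser


class WebListParser(HTMLParser):
    """Collect the text items of each <ul>, <ol> and <select> on a page."""

    LIST_TAGS = {"ul", "ol", "select"}
    ITEM_TAGS = {"li", "option"}

    def __init__(self):
        super().__init__()
        self.lists = []    # finished lists, each a list of lower-cased item strings
        self._open = []    # stack of lists currently being filled (handles nesting)
        self._item = None  # text fragments of the item being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in self.LIST_TAGS:
            self._flush_item()
            self._open.append([])
        elif tag in self.ITEM_TAGS and self._open:
            # A new item also closes an unclosed previous one (e.g. <option>).
            self._flush_item()
            self._item = []

    def handle_endtag(self, tag):
        if tag in self.ITEM_TAGS:
            self._flush_item()
        elif tag in self.LIST_TAGS and self._open:
            self._flush_item()
            items = self._open.pop()
            if items:
                self.lists.append(items)

    def handle_data(self, data):
        if self._item is not None:
            self._item.append(data)

    def _flush_item(self):
        """Store the current item's normalized text in the innermost open list."""
        if self._item is not None and self._open:
            text = " ".join("".join(self._item).split())
            if text:
                self._open[-1].append(text.lower())
        self._item = None


def score_candidates(ranked_pages, seeds):
    """Score non-seed entities by summing W_i / Rank(i) over the lists they occur in.

    ranked_pages: HTML strings in search-result order (rank 1 first).
    Assumption (not from the poster): W_i is the fraction of seeds present in list i.
    """
    seed_set = {s.lower() for s in seeds}
    scores = defaultdict(float)
    for rank, html in enumerate(ranked_pages, start=1):
        parser = WebListParser()
        parser.feed(html)
        parser.close()
        for items in parser.lists:
            entities = set(items)
            weight = len(entities & seed_set) / len(seed_set)
            if weight == 0:
                continue  # lists sharing nothing with the seeds are ignored
            for entity in entities - seed_set:
                scores[entity] += weight / rank
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling score_candidates(pages, ["dollar", "rupee", "dinar"]) returns (entity, score) pairs with the strongest candidate first. Lists that share no entity with the seed set, such as navigation menus, contribute nothing, which is one simple way to use overlap with the current set as a filter.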