Shravanam
Search and Information Extraction Lab (SIEL)
International Institute of Information Technology, Hyderabad

Set Expansion
• Set expansion aims to grow a given set into a larger one by discovering other entities that belong to the same conceptual set.
• Example: given {dollar, rupee, dinar}, we want to identify that the set is about currencies and find all the other currencies.

Motivation
• Set expansion is at the core of many IR applications.
• Most applications also require identifying the concept.
• Both problems are known to be hard when approached individually.

Set Expansion
• Web Search
  - Uses the Bing API.
  - Queries combine the entities with AND and include the concept.
• WebList Extraction
  - Uses HTML formatting such as list and select tags to extract potential entities for expansion.
  - Also extracts other information such as context text, title, heading and URL.
• Candidate Identification
• Expansion Validation
  - Based on the overlap between the candidate and the current set.
  - Uses selected information from web pages instead of full-document similarity.

Architecture
[Figure: system architecture diagram]

Concept Identification
• Concept Prediction
  - Approach based on the accumulated frequency of overlapping information.
  - The URL, page title, context and heading of the WebList serve as potential indicators of the concept.
• Concept Validation
  - Ensures that the identified concept is general and not tied to any particular entity in the set.
  - The overlap between the entities in the set and the concept is used as a distance; its variation indicates the quality of the set and the concept.

The Combination
- The intuition behind using the concept in set expansion is that even noisy lists carry easily identifiable hints of the concept.

Features
• Language independent.
• Domain independent.
• No training required.
• Scalable.

Applications
• Semantic set instance generation.
• User recommendations.
• Computational advertising.

Evaluation

Future Work
• Other approaches for WebList Extraction.
• Other models for validation.
• Influence-based framework for set quality.

References
• R. Wang and W. Cohen. Iterative set expansion of named entities using the web. In ICDM, 2008.
• Y. He and D. Xin. SEISA: set expansion by iterative similarity aggregation. In WWW, pages 427–436, 2011.

Candidate score:
\[ \text{candidate} = \max \sum_{i} W_i \cdot \frac{1}{\mathrm{Rank}(i)} \]

Evaluation results:
Query Concept       Seed Set Size   Expanded Set Size   Concept Identified   False Entities
Countries           4               236                 countries            1
Car Manufacturers   4               79                  cars                 0
Football players    3               68                  players              4
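
The WebList Extraction step (pulling potential entities out of list and select tags) could be sketched as follows. This is a minimal illustration using Python's standard `html.parser`, not the poster's actual implementation; the names `WebListExtractor` and `extract_weblists` are hypothetical, and the sketch leaves out the context text, title, heading and URL that the system also collects.

```python
from html.parser import HTMLParser


class WebListExtractor(HTMLParser):
    """Collects the item texts of every <ul>, <ol> and <select> block.

    Hypothetical helper for illustration; the poster does not specify
    how its extractor is implemented.
    """

    LIST_TAGS = {"ul", "ol", "select"}
    ITEM_TAGS = {"li", "option"}

    def __init__(self):
        super().__init__()
        self.lists = []     # finished lists, each a list of item strings
        self._stack = []    # lists that are currently open (handles nesting)
        self._item = None   # text fragments of the item being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.LIST_TAGS:
            self._flush_item()          # text seen so far belongs to the parent item
            self._stack.append([])
        elif tag in self.ITEM_TAGS and self._stack:
            self._flush_item()          # tolerate items without a closing tag
            self._item = []

    def handle_endtag(self, tag):
        if tag in self.ITEM_TAGS:
            self._flush_item()
        elif tag in self.LIST_TAGS and self._stack:
            self._flush_item()
            items = self._stack.pop()
            if items:
                self.lists.append(items)

    def handle_data(self, data):
        if self._item is not None:
            self._item.append(data)

    def _flush_item(self):
        """Store the current item (whitespace-normalised) in the innermost list."""
        if self._item is not None and self._stack:
            text = " ".join("".join(self._item).split())
            if text:
                self._stack[-1].append(text)
        self._item = None


def extract_weblists(html):
    """Return one list of item strings per list/select block in the page."""
    parser = WebListExtractor()
    parser.feed(html)
    parser.close()
    return parser.lists


page = "<ul><li>dollar</li><li> rupee </li></ul><select><option>yen</option></select>"
print(extract_weblists(page))  # [['dollar', 'rupee'], ['yen']]
```

Nested lists are emitted innermost first, so each extracted list holds only its own direct items.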
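
The candidate score (a weight W_i multiplied by 1/Rank(i), summed over i, with the maximum taken as the candidate) can be illustrated with a short sketch. The poster does not define its symbols, so the reading used here is an assumption: i indexes the extracted web lists that contain a candidate, W_i is a weight assigned to list i, and Rank(i) is the search-result rank of the page the list came from. The function names are hypothetical.

```python
from collections import defaultdict


def score_candidates(weblists, seed_set):
    """Sum weight * (1 / rank) over every web list that contains a candidate.

    weblists: iterable of (rank, weight, items) tuples, where rank >= 1.
    ASSUMPTION: this interpretation of W_i and Rank(i) is not stated in
    the poster; it is one plausible reading of the formula.
    """
    seeds = {s.lower() for s in seed_set}
    scores = defaultdict(float)
    for rank, weight, items in weblists:
        # Count each entity at most once per list.
        for item in {x.lower() for x in items}:
            if item not in seeds:
                scores[item] += weight * (1.0 / rank)
    return dict(scores)


def best_candidate(weblists, seed_set):
    """Return the highest-scoring entity outside the seed set, or None."""
    scores = score_candidates(weblists, seed_set)
    return max(scores, key=scores.get) if scores else None


weblists = [
    (1, 1.0, ["dollar", "rupee", "euro"]),
    (2, 1.0, ["dollar", "euro", "yen"]),
    (4, 2.0, ["yen", "peso"]),
]
seeds = {"dollar", "rupee", "dinar"}
# euro: 1.0/1 + 1.0/2 = 1.5; yen: 1.0/2 + 2.0/4 = 1.0; peso: 2.0/4 = 0.5
print(best_candidate(weblists, seeds))  # euro
```

In an iterative setting, the winning candidate would be added to the set only after the Expansion Validation check, and the scores would then be recomputed.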
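
Concept Prediction, described above as accumulating the frequency of overlapping information from the URL, page title, context and heading of each WebList, might look roughly like the sketch below. The tokenisation, the choice to use only the URL path, and the name `predict_concept` are assumptions made for illustration; the poster does not give these details.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def predict_concept(weblist_metadata, stopwords=frozenset()):
    """Rank terms by the number of web lists whose metadata mentions them.

    weblist_metadata: iterable of dicts with optional "url", "title",
    "context" and "heading" keys (hypothetical structure).
    Returns (term, count) pairs, most frequent first.
    """
    counts = Counter()
    for meta in weblist_metadata:
        # ASSUMPTION: only the URL path is used, so host names such as
        # "www" or "com" do not dominate the counts.
        fields = [
            urlparse(meta.get("url", "")).path,
            meta.get("title", ""),
            meta.get("context", ""),
            meta.get("heading", ""),
        ]
        tokens = set()
        for text in fields:
            tokens.update(re.findall(r"[a-z]+", text.lower()))
        # Each term counts once per web list, so one verbose page cannot win alone.
        counts.update(tokens - stopwords)
    return counts.most_common()


metadata = [
    {"url": "http://example.com/world-currencies", "title": "List of currencies",
     "heading": "Currencies by country"},
    {"url": "http://example.org/forex", "title": "Currencies and exchange rates",
     "context": "major currencies traded"},
    {"url": "http://example.net/iso/currencies", "title": "Currency codes"},
]
print(predict_concept(metadata)[0])  # ('currencies', 3)
```

A Concept Validation step would then need to check that the top term is general rather than tied to a single entity in the set; that check is not sketched here.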