Set Expansion with Understanding
Dharmesh Kakadia, Avinash Ram, Ravishankar Shravanam
Search and Information Extraction Lab (SIEL)
International Institute of Information Technology, Hyderabad
Set Expansion
• Set expansion aims to expand the given set into a larger
set by discovering other entities belonging to the
conceptual set.
• For example, given { dollar, rupee, dinar }, we want to identify
that this set is about currency and then find all the other
currencies.
Motivation
• Set expansion is at the core of many IR applications.
• Most applications also require identification of the
concept.
• Both problems are known to be hard when approached
individually.
Set Expansion
• Web Search
- Uses the Bing API.
- Queries join the set entities with AND, along with the concept.
• WebList Extraction
- Uses HTML formatting, such as list and select tags, to
extract potential entities for expansion.
- Also extracts other information, such as context text, title,
heading, and URL.
• Candidate Identification
• Expansion Validation
- Based on the overlap between the candidate and the
current set.
- Uses selected information from web pages instead of
full-document similarity.
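
The steps on this slide could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `build_query`, `WebListParser`, `overlap`, and `expand`, the exact query syntax, and the 0.5 overlap threshold are all hypothetical, and the actual search call is omitted. The sketch uses Python's standard `html.parser` to collect items from list and select tags, then keeps only lists that overlap the current set.

```python
from html.parser import HTMLParser


def build_query(seed_set, concept=None):
    """AND-form query over the seed entities plus the concept (assumed syntax)."""
    terms = [f'"{s}"' for s in sorted(seed_set)]
    if concept:
        terms.append(f'"{concept}"')
    return " AND ".join(terms)


class WebListParser(HTMLParser):
    """Collects the text items of every <ul>, <ol>, and <select> element."""

    CONTAINERS = {"ul", "ol", "select"}
    ITEMS = {"li", "option"}

    def __init__(self):
        super().__init__()
        self.lists = []       # finished lists, each a list of lowercased items
        self._stack = []      # currently open containers (handles nesting)
        self._in_item = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTAINERS:
            self._flush_item()          # close any item text before nesting
            self._stack.append([])
        elif tag in self.ITEMS and self._stack:
            self._flush_item()          # </li> and </option> are optional in HTML
            self._in_item = True

    def handle_endtag(self, tag):
        if tag in self.ITEMS:
            self._flush_item()
        elif tag in self.CONTAINERS and self._stack:
            self._flush_item()
            items = self._stack.pop()
            if items:
                self.lists.append(items)

    def handle_data(self, data):
        if self._in_item:
            self._text.append(data)

    def _flush_item(self):
        text = " ".join("".join(self._text).split())
        if self._in_item and text and self._stack:
            self._stack[-1].append(text.lower())
        self._in_item = False
        self._text = []


def overlap(candidate_list, seed_set):
    """Fraction of the current set that appears in the candidate list."""
    seeds = {s.lower() for s in seed_set}
    if not seeds:
        return 0.0
    return len(seeds & set(candidate_list)) / len(seeds)


def expand(seed_set, html_pages, min_overlap=0.5):
    """Add the items of every extracted list that overlaps the set enough."""
    expanded = {s.lower() for s in seed_set}
    for html in html_pages:
        parser = WebListParser()
        parser.feed(html)
        for items in parser.lists:
            if overlap(items, seed_set) >= min_overlap:
                expanded.update(items)
    return expanded
```

With the currency example, a page containing the list "Dollar, Rupee, Euro, Yen" and a navigation menu "Home, Contact" would contribute only euro and yen, because the menu shares nothing with the seed set.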
Architecture
Concept Identification
• Concept prediction
- Based on the accumulated frequency of overlapping
information.
- The URL, page title, context, and heading of the WebList
serve as potential indicators of the concept.
• Concept Validation
- Ensures that the identified concept is general and not tied
to any particular entity in the set.
- The overlap between the entities in the set and the concept
is used as a distance; its variation indicates the quality of
the set and the concept.
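
One possible reading of concept prediction and validation is sketched below. The slide does not specify the details, so everything here is an assumption: the WebList record fields (`entities`, `url`, `title`, `context`, `heading`), the tokenizer, the small URL stop list, and the use of a standard deviation as the "variation" measure are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from statistics import pstdev

FIELDS = ("url", "title", "context", "heading")
URL_NOISE = {"http", "https", "www", "com", "org", "net"}  # assumed stop list


def tokens(text):
    """Lowercase alphabetic tokens of a string."""
    return re.findall(r"[a-z]+", text.lower())


def metadata_tokens(weblist):
    """All tokens found in the URL, title, context, and heading of a WebList."""
    found = set()
    for field in FIELDS:
        found.update(tokens(weblist.get(field, "")))
    return found


def predict_concept(weblists, seed_set):
    """Rank tokens by accumulated frequency over WebLists that overlap the set."""
    seeds = {s.lower() for s in seed_set}
    counts = Counter()
    for wl in weblists:
        entities = {e.lower() for e in wl["entities"]}
        if not seeds & entities:
            continue  # lists with no overlap contribute nothing
        for field in FIELDS:
            # count a token once per field so long texts do not dominate
            counts.update(set(tokens(wl.get(field, ""))) - seeds - URL_NOISE)
    return counts.most_common()


def concept_spread(concept, weblists, seed_set):
    """Per-entity support for the concept and its spread across the set.

    A low spread suggests the concept is general rather than tied to one entity.
    """
    support = []
    for seed in sorted(s.lower() for s in seed_set):
        hits = [wl for wl in weblists
                if seed in {e.lower() for e in wl["entities"]}]
        backed = [wl for wl in hits if concept in metadata_tokens(wl)]
        support.append(len(backed) / len(hits) if hits else 0.0)
    return support, pstdev(support)
```

In this reading, a concept supported equally by the pages of every entity gets a spread of zero, while a term that appears only alongside one entity produces a large spread and would be rejected.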
The combination
- The intuition behind using the concept in set expansion is
that hints of the concept are easy to identify, even for
noisy lists.
Features
• Language Independent.
• Domain Independent.
• No training required.
• Scalable.
Applications
• Semantic Set Instance generation.
• User recommendations.
• Computational advertising.
Evaluation
Future Work
• Other approaches for WebList Extraction.
• Other models for validation.
• Influence based framework for set quality.
References
• R. Wang and W. Cohen. Iterative set expansion of named
entity using the web. In ICDM, 2008.
• Y. He and D. Xin. SEISA: set expansion by iterative
similarity aggregation. In WWW, pages 427–436, 2011.
\[ \text{candidate} = \max \sum_{i} W_i \cdot \frac{1}{\mathrm{Rank}(i)} \]
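
The candidate scoring rule above can be read as a weighted reciprocal-rank sum. The slide does not define its symbols, so the sketch below assumes that i ranges over the WebLists containing a candidate, that W_i is the list's overlap with the current set, and that Rank(i) is the search-result rank of the page the list came from. The function names and the `rank` field are hypothetical.

```python
from collections import defaultdict


def score_candidates(weblists, seed_set):
    """Sum W_i * 1 / Rank(i) over every WebList that contains each candidate."""
    seeds = {s.lower() for s in seed_set}
    if not seeds:
        raise ValueError("seed_set must not be empty")
    scores = defaultdict(float)
    for wl in weblists:
        entities = {e.lower() for e in wl["entities"]}
        weight = len(seeds & entities) / len(seeds)   # W_i (assumed: seed overlap)
        for entity in entities - seeds:
            scores[entity] += weight / wl["rank"]     # W_i * 1 / Rank(i)
    return dict(scores)


def best_candidate(weblists, seed_set):
    """Candidate with the maximum accumulated score, or None if there is none."""
    scores = score_candidates(weblists, seed_set)
    return max(scores, key=scores.get) if scores else None
```

Under these assumptions, an entity that appears in several highly ranked lists with strong seed overlap accumulates the largest score, while entities from lists with no overlap score zero.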
Query Concept       Seed Set Size   Expanded Set Size   Concept Identified   False Entities
Countries           4               236                 countries            1
Car Manufacturers   4               79                  cars                 0
Football players    3               68                  players              4