Set Expansion with understanding

This poster was part of the Set Expansion project.

dharmeshkakadia

December 12, 2012

Transcript

    Set Expansion with Understanding
    Dharmesh Kakadia, Avinash Ram, Ravishankar Shravanam
    Search and Information Extraction Lab (SIEL)
    International Institute of Information Technology, Hyderabad
    Set Expansion
    •  Set expansion aims to expand a given set into a larger set by discovering other entities that belong to the same conceptual set.
    •  For example, given {dollar, rupee, dinar}, we want to identify that the set is about currency and find all the other currencies.
    Motivation
    •  Set expansion is at the core of many IR applications.
    •  Most applications also require identifying the underlying concept.
    •  Both problems are known to be hard when approached individually.
    Set Expansion
    •  Web Search
    -  Uses the Bing API.
    -  Queries take the AND form and include the concept.
    •  WebList Extraction
    -  Uses HTML formatting such as list and select tags to extract potential entities for expansion.
    -  Also extracts other information such as the context text, title, heading, and URL.
    •  Candidate Identification
    •  Expansion Validation
    -  Based on the overlap between the candidate and the current set.
    -  Uses selected information from web pages instead of full-document similarity.
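The WebList Extraction step can be sketched with Python's standard `html.parser`. This is a minimal illustration under stated assumptions, not the authors' extractor: the class name `WebListExtractor`, the output dictionary fields, and the choice of "most recent heading" and "last text seen outside a list" as the heading and context of each WebList are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical mapping: which item tag belongs to which list-like container.
LIST_ITEM_TAG = {"ul": "li", "ol": "li", "select": "option"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class WebListExtractor(HTMLParser):
    """Sketch: collect <ul>/<ol>/<select> items plus title, heading, context, and URL."""

    def __init__(self, url):
        super().__init__()
        self.url = url
        self.title = ""
        self.weblists = []   # one dict per extracted list
        self._heading = ""   # text of the most recent heading (assumed "heading")
        self._context = ""   # last text seen outside any list (assumed "context")
        self._mode = None    # "title" or "heading" while inside those tags
        self._stack = []     # open lists, so nested lists stay separate

    def _flush_item(self):
        # Store the buffered text of the current <li>/<option>, if any.
        top = self._stack[-1]
        if top["buf"]:
            top["items"].append(" ".join(top["buf"]))
        top["buf"] = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._mode = "title"
        elif tag in HEADING_TAGS:
            self._mode, self._heading = "heading", ""
        elif tag in LIST_ITEM_TAG:
            self._stack.append({"item_tag": LIST_ITEM_TAG[tag], "items": [], "buf": None,
                                "heading": self._heading, "context": self._context})
        elif self._stack and tag == self._stack[-1]["item_tag"]:
            self._flush_item()  # <li> and <option> are often left unclosed
            self._stack[-1]["buf"] = []

    def handle_endtag(self, tag):
        if tag == "title" or tag in HEADING_TAGS:
            self._mode = None
        elif self._stack and tag == self._stack[-1]["item_tag"]:
            self._flush_item()
        elif self._stack and tag in LIST_ITEM_TAG:
            self._flush_item()
            done = self._stack.pop()
            self.weblists.append({"items": done["items"], "heading": done["heading"],
                                  "context": done["context"], "title": self.title,
                                  "url": self.url})

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._mode == "title":
            self.title += text
        elif self._mode == "heading":
            self._heading += text
        elif self._stack and self._stack[-1]["buf"] is not None:
            self._stack[-1]["buf"].append(text)
        elif not self._stack:
            self._context = text
```

Feeding a page into `WebListExtractor(url).feed(html)` leaves one dictionary per list in `weblists`. Keeping the title, heading, context, and URL next to the items is what later lets the concept be inferred from list metadata.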
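Overlap-based expansion validation can be sketched as follows. This assumes the "selected information" is simply the items of the WebLists a candidate appears in, and that overlap is the fraction of the current set co-occurring with the candidate in those lists. The function names and the 0.5 acceptance threshold are hypothetical; the poster does not give the exact measure.

```python
def overlap_score(candidate, current_set, weblists):
    """Fraction of the current set that co-occurs with `candidate` in at least one WebList."""
    co_occurring = set()
    for items in weblists:
        members = set(items)
        if candidate in members:
            # Only list items are compared, not the full page text.
            co_occurring |= members & current_set
    others = current_set - {candidate}
    co_occurring &= others
    return len(co_occurring) / len(others) if others else 0.0


def validate(candidates, current_set, weblists, threshold=0.5):
    """Keep new candidates whose overlap reaches `threshold` (a hypothetical value)."""
    return [c for c in candidates
            if c not in current_set and overlap_score(c, current_set, weblists) >= threshold]
```

With the current set {dollar, rupee, dinar} and the lists [dollar, rupee, euro], [dinar, euro, yen], and [paris, dollar], "euro" co-occurs with all three seeds and is accepted. "yen" and "paris" each co-occur with only one seed and are rejected.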
    Architecture
    Concept Identification
    •  Concept Prediction
    -  Based on the accumulated frequency of overlapping information.
    -  The URL, page title, context, and heading of each WebList serve as potential indicators of the concept.
    •  Concept Validation
    -  Ensures that the identified concept is general and not tied to any particular entity in the set.
    -  The overlap between the entities in the set and the concept is used as a distance; its variation indicates the quality of the set and the concept.
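The two concept identification steps can be sketched under these assumptions, none of which come from the poster:

- Metadata is tokenized into lower-case words. The URL contributes only its path, since host names are mostly noise.
- Each token's frequency is accumulated with a weight equal to its WebList's overlap with the seed set.
- For validation, each seed entity's distance is 1 minus the share of its WebLists whose metadata mention the concept. The spread of those distances is reported as a population standard deviation.

The names `metadata_tokens`, `predict_concept`, `concept_spread`, and the stopword list are hypothetical.

```python
import re
import statistics
from collections import Counter
from urllib.parse import urlparse

# Hypothetical stopword list for generic metadata words.
STOPWORDS = frozenset({"list", "of", "the", "and", "html", "index"})


def metadata_tokens(weblist):
    """Lower-cased word tokens from a WebList's URL path, page title, context, and heading."""
    text = " ".join([urlparse(weblist["url"]).path, weblist["title"],
                     weblist["context"], weblist["heading"]])
    return set(re.findall(r"[a-z]+", text.lower()))


def predict_concept(seeds, weblists):
    """Return the metadata token with the highest overlap-weighted accumulated frequency."""
    counts = Counter()
    for wl in weblists:
        overlap = len(seeds & set(wl["items"]))  # how many seeds this list contains
        for tok in metadata_tokens(wl):
            counts[tok] += overlap
    for tok, count in counts.most_common():
        # Skip seed words so the concept is not just one of the entities.
        if count > 0 and tok not in STOPWORDS and tok not in seeds:
            return tok
    return None


def concept_spread(seeds, concept, weblists):
    """Population std. dev. of per-entity distances to the concept (0 = evenly shared)."""
    distances = []
    for entity in sorted(seeds):
        lists = [wl for wl in weblists if entity in wl["items"]]
        hits = sum(concept in metadata_tokens(wl) for wl in lists)
        # Distance: 1 - share of this entity's lists whose metadata mention the concept.
        distances.append(1.0 - hits / len(lists) if lists else 1.0)
    return statistics.pstdev(distances)
```

A low spread suggests the concept is shared evenly across the set. A concept word that only co-occurs with one seed, such as "dollar" for a currency set, yields a larger spread.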
    The combination
    -  The intuition behind using the concept in set expansion is that the concept provides easily identifiable hints, even for noisy lists.
    Features
    •  Language Independent.
    •  Domain Independent.
    •  No training required.
    •  Scalable.
    Applications
    •  Semantic set instance generation.
    •  User recommendations.
    •  Computational advertising.
    Evaluation
    Future Work
    •  Other approaches for WebList Extraction.
    •  Other models for validation.
    •  Influence-based framework for set quality.
    $$\text{candidate} = \max \; W_i \cdot \frac{1}{\mathrm{Rank}(i)}$$
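One possible reading of the candidate-scoring formula above, sketched in Python. The poster does not define $W_i$ or $\mathrm{Rank}(i)$. Here the following is assumed:

- $i$ indexes the search-result pages whose WebLists contain a candidate.
- $W_i$ is a per-page weight supplied by the caller.
- $\mathrm{Rank}(i)$ is the page's 1-based search rank.

Each candidate keeps its best (maximum) page score. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def score_candidates(pages):
    """Score each candidate as the max over pages i of W_i * 1/Rank(i).

    pages: list of (rank, weight, entities) tuples. rank is the assumed 1-based
    search rank of page i, weight is the assumed per-page weight W_i, and
    entities are the WebList items found on that page.
    """
    scores = defaultdict(float)
    for rank, weight, entities in pages:
        page_score = weight * (1.0 / rank)
        for entity in entities:
            # Keep only the best-scoring page for each candidate (the max).
            scores[entity] = max(scores[entity], page_score)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, `score_candidates([(1, 0.75, ["dollar", "euro", "yen"]), (2, 1.0, ["euro", "pound"])])` gives "euro" the score 0.75 from the first page rather than 0.5 from the second. Taking the maximum means a candidate is rewarded for its single best supporting page. The 1/Rank factor discounts pages lower in the search results.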

    Query concept        Seed set size   Expanded set size   Concept identified   False entities
    Countries            4               236                 countries            1
    Car manufacturers    4               79                  cars                 0
    Football players     3               68                  players              4