Seed Selection for Genre Specific Search
Search and Information Extraction Lab
P Nikhil Priyatam, Krish Perumal, Dharmesh Kakadia, Vasudeva Varma
International Institute of Information Technology, Hyderabad
AIM
• This work aims to get a set of diverse seed URLs for genre specific search using Twitter data.
SYSTEM ARCHITECTURE
PROPOSED ALGORITHM
WORKING OF ALGORITHM
EVALUATION ARCHITECTURE
EXPERIMENTAL RESULTS
MOTIVATION
• Coverage and diversity are crucial aspects of genre specific search engines. These depend largely on the initial set of seed URLs. There is no existing work that automates the process of seed URL selection with a focus on diversity.
CONCLUSION
• First work to automate the process of seed URL selection for genre specific search.
• Addressed the issue of crawl diversity, which was hitherto neglected.
DIVERSITY SCORES
SIMILARITY MEASURES FOR EDGES
• Content overlap
• URL n-gram overlap
• Timestamp similarity
• Follower-followee relations or retweets
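The poster does not say how each similarity is computed, so the following is only a minimal sketch of two of the listed measures. It assumes Jaccard overlap over character trigrams for URLs and over lowercased word sets for content. The function names, the choice of Jaccard, and the n-gram size are illustrative assumptions, not the authors' method.

```python
def char_ngrams(text: str, n: int = 3) -> set:
    """Return the set of character n-grams of text (empty if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap |a & b| / |a | b|, defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def url_ngram_overlap(url_a: str, url_b: str, n: int = 3) -> float:
    """Assumed URL n-gram overlap: Jaccard over character n-grams of the lowercased URLs."""
    return jaccard(char_ngrams(url_a.lower(), n), char_ngrams(url_b.lower(), n))


def content_overlap(text_a: str, text_b: str) -> float:
    """Assumed content overlap: Jaccard over the sets of lowercased whitespace tokens."""
    return jaccard(set(text_a.lower().split()), set(text_b.lower().split()))
```

Under these assumptions both scores lie in [0, 1] and are symmetric, so either could serve as an undirected edge weight between two tweets or two URLs. Timestamp similarity and follower-followee or retweet relations would need metadata that is not modeled here.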