Seed Selection for Genre Specific Search
P Nikhil Priyatam, Krish Perumal, Dharmesh Kakadia, Vasudeva Varma
Search and Information Extraction Lab
International Institute of Information Technology, Hyderabad
AIM
• This work aims to get a set of diverse seed URLs
for genre specific search using Twitter data.
SYSTEM ARCHITECTURE
PROPOSED ALGORITHM
WORKING OF ALGORITHM
EVALUATION ARCHITECTURE
EXPERIMENTAL RESULTS
MOTIVATION
• Coverage and diversity are crucial aspects of
genre specific search engines. These depend
largely on the initial set of seed URLs. There is
no existing work that automates the process of
seed URL selection with a focus on diversity.
CONCLUSION
• First work to automate the process of seed URL
selection for genre specific search
• Addressed the issue of crawl diversity, which was
hitherto neglected.
DIVERSITY SCORES
SIMILARITY MEASURES FOR EDGES
• Content overlap
• URL n-gram overlap
• Timestamp similarity
• Follower-followee relations or Retweets
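
The first two similarity measures listed above can be illustrated with a small sketch. The poster does not state how the overlaps are computed, so this is only one plausible reading: it assumes character trigrams for URLs, whitespace tokens for content, and the Jaccard coefficient as the overlap score. All function names here are hypothetical and chosen for illustration.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams in text (empty if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Jaccard coefficient of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def url_ngram_overlap(url_a, url_b, n=3):
    """Assumed edge weight: Jaccard overlap of lowercased character n-grams of two URLs."""
    return jaccard(char_ngrams(url_a.lower(), n), char_ngrams(url_b.lower(), n))


def content_overlap(text_a, text_b):
    """Assumed edge weight: Jaccard overlap of lowercased whitespace tokens of two texts."""
    return jaccard(set(text_a.lower().split()), set(text_b.lower().split()))
```

Under these assumptions, a higher score means two tweets (or the URLs they carry) are more alike, so an edge between them would carry a larger weight. A different n-gram size or overlap measure would fit the same structure.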