Lab

P Nikhil Priyatam, Krish Perumal, Dharmesh Kakadia, Vasudeva Varma
International Institute of Information Technology, Hyderabad

AIM
• This work aims to obtain a diverse set of seed URLs for genre-specific search using Twitter data.

[Figure panels: SYSTEM ARCHITECTURE, PROPOSED ALGORITHM, WORKING OF ALGORITHM, EVALUATION ARCHITECTURE, EXPERIMENTAL RESULTS]

MOTIVATION
• Coverage and diversity are crucial aspects of genre-specific search engines.
• Both depend largely on the initial set of seed URLs.
• No existing work automates seed URL selection with a focus on diversity.

CONCLUSION
• First work to automate seed URL selection for genre-specific search.
• Addresses crawl diversity, which had previously been neglected.

[Panel: DIVERSITY SCORES]

SIMILARITY MEASURES FOR EDGES
• Content overlap
• URL n-gram overlap
• Timestamp similarity
• Follower-followee relations or retweets
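
The poster names these edge similarity measures but gives no formulas for them. The following is a minimal sketch of how each one could be computed, not the authors' implementation. It assumes Jaccard overlap for tweet content and for character n-grams of URLs, an exponential decay for timestamps, and a binary indicator for follower-followee or retweet links. All function names, the n-gram size, and the `scale_seconds` parameter are hypothetical choices made for illustration.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def char_ngrams(text: str, n: int = 3) -> set:
    """Set of character n-grams; a shorter non-empty string is its own gram."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def content_overlap(tweet_a: str, tweet_b: str) -> float:
    """Assumed measure: Jaccard overlap of lowercased whitespace tokens."""
    return jaccard(set(tweet_a.lower().split()), set(tweet_b.lower().split()))


def url_ngram_overlap(url_a: str, url_b: str, n: int = 3) -> float:
    """Assumed measure: Jaccard overlap of character n-grams of the URLs."""
    return jaccard(char_ngrams(url_a.lower(), n), char_ngrams(url_b.lower(), n))


def timestamp_similarity(t_a: float, t_b: float,
                         scale_seconds: float = 3600.0) -> float:
    """Assumed measure: exponential decay in the absolute time gap (seconds)."""
    return math.exp(-abs(t_a - t_b) / scale_seconds)


def social_link(user_a: str, user_b: str, links: set) -> float:
    """Assumed measure: 1.0 if a follow or retweet pair exists in either direction."""
    return 1.0 if (user_a, user_b) in links or (user_b, user_a) in links else 0.0
```

Each function returns a value in [0, 1], so any one of them could serve directly as an edge weight between two tweets or two URLs. How the poster's algorithm selects or combines these measures is not stated in the text above.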