Seed Selection for Genre Specific Search

This poster was presented at the IIIT-H RnD showcase.

dharmeshkakadia

February 09, 2013

Transcript

  1. Seed Selection for Genre Specific Search

    Search and Information Extraction Lab
    P Nikhil Priyatam, Krish Perumal, Dharmesh Kakadia, Vasudeva Varma
    International Institute of Information Technology, Hyderabad

    AIM
    •  This work aims to get a set of diverse seed URLs for genre specific search using Twitter data.

    [Figure panels not captured in the transcript: SYSTEM ARCHITECTURE, PROPOSED ALGORITHM, WORKING OF ALGORITHM, EVALUATION ARCHITECTURE, EXPERIMENTAL RESULTS, DIVERSITY SCORES]

    MOTIVATION
    •  Coverage and diversity are crucial aspects of genre specific search engines. These depend largely on the initial set of seed URLs. There is no existing work that automates the process of seed URL selection with a focus on diversity.

    CONCLUSION
    •  First work to automate the process of seed URL selection for genre specific search
    •  Addressed the issue of crawl diversity, which was hitherto neglected.

    SIMILARITY MEASURES FOR EDGES
    •  Content overlap
    •  URL n-gram overlap
    •  Timestamp similarity
    •  Follower-followee relations or Retweets
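
The transcript names the edge similarity measures but does not say how they are computed. As a rough illustration only, the first two could be sketched as Jaccard overlaps: character n-grams for URLs and word sets for tweet content. The choice of Jaccard, the default n = 3, and all function names below are assumptions made for this sketch, not details taken from the poster.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased string."""
    text = text.lower()
    if len(text) < n:
        # Strings shorter than n contribute themselves as a single gram.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def url_ngram_overlap(url_a, url_b, n=3):
    """Hypothetical edge weight: overlap of character n-grams of two URLs."""
    return jaccard(char_ngrams(url_a, n), char_ngrams(url_b, n))


def content_overlap(text_a, text_b):
    """Hypothetical edge weight: overlap of whitespace-separated words."""
    return jaccard(set(text_a.lower().split()), set(text_b.lower().split()))
```

Under these assumptions, two URLs on the same host share most of their n-grams and score close to 1, while URLs from unrelated sites score close to 0, so such a weight could help keep near-duplicate seeds from dominating a diverse seed set.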