Introduction
Recommender systems model user preferences in order to suggest items to purchase or examine. In this project we address this problem by generating recommendations from latent information about the user's research interests, as contained in their publication list. The datasets can also be used for other purposes, such as classification, clustering and trend analysis.
What is Scholarec?
Scholarec is a recommender system for scientific documents. It classifies documents and uses personalization features to suggest similar ones.
Features
• Search across a large collection of articles, reports and other scholarly works
• Seamless extension to existing online repositories of scholarly articles
• Robust back-end search engine
• Interactive user interface
• Personalization through OpenID and OAuth integration
• Recommendations based on the user's interests
• Archive dump
• PDF to text
• Keyword extraction
• User feedback rating and content-based filtering
• Custom search algorithms
• Word similarity
• Representation of recommendations
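The slides name a keyword-extraction step after PDF-to-text conversion but do not say how keywords are ranked. The sketch below is one possible approach. It assumes the documents are already plain text and ranks each document's terms by TF-IDF weight, which is the weighting the slides list for search. The tokenizer, the tiny `STOP_WORDS` list and the `extract_keywords` helper are hypothetical simplifications, not the project's code.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def extract_keywords(docs: List[str], top_k: int = 3) -> List[List[str]]:
    """Return the top_k terms of each document ranked by TF-IDF weight."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    keywords = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        # TF (relative frequency in the document) times IDF (log inverse document frequency).
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically so the output is deterministic.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords.append(ranked[:top_k])
    return keywords


if __name__ == "__main__":
    papers = [
        "Matrix factorization for recommender systems",
        "Recommender systems for scientific papers using keyword extraction",
        "Neural networks for image classification",
    ]
    print(extract_keywords(papers, top_k=2))
    # prints [['factorization', 'matrix'], ['extraction', 'keyword'], ['classification', 'image']]
```

Terms that occur in every document get an IDF of zero, so they never surface as keywords. This is the usual reason for pairing TF with IDF rather than using raw counts.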
Content-based filtering
Recommendations are made by comparing items against user profiles. Each item's content is represented as a set of identifiers. Content-based filtering estimates a user's ratings for unseen items from that user's history. This is a generalization of the aggregation functions used for content-based filtering.
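The idea of estimating a rating from the user's own history can be illustrated with a small sketch. Each item is a set of identifiers, as on the slide. The rating of an unseen item is then predicted as the average of the user's past ratings, weighted by how similar each rated item is to the target. Two choices here are assumptions for illustration, not the method the slides describe: Jaccard similarity as the comparison measure and a similarity-weighted average as the aggregation function. The names `predict_rating` and `recommend` are hypothetical.

```python
from typing import Dict, List, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two identifier sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_rating(
    history: Dict[str, float],
    item_content: Dict[str, Set[str]],
    target: str,
) -> Optional[float]:
    """Estimate the user's rating of `target` from ratings of similar items in `history`.

    Uses a similarity-weighted average; returns None when no rated item shares
    any identifier with the target.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item, rating in history.items():
        sim = jaccard(item_content[item], item_content[target])
        weighted_sum += sim * rating
        weight_total += sim
    return weighted_sum / weight_total if weight_total > 0 else None


def recommend(
    history: Dict[str, float],
    item_content: Dict[str, Set[str]],
    top_n: int = 2,
) -> List[str]:
    """Rank items the user has not rated by predicted rating, highest first."""
    predictions: Dict[str, float] = {}
    for item in item_content:
        if item not in history:
            score = predict_rating(history, item_content, item)
            if score is not None:
                predictions[item] = score
    return sorted(predictions, key=lambda i: (-predictions[i], i))[:top_n]


if __name__ == "__main__":
    # Hypothetical papers described by keyword identifiers.
    item_content = {
        "p1": {"recommender", "collaborative", "filtering"},
        "p2": {"image", "classification", "cnn"},
        "p3": {"recommender", "content", "filtering"},
        "p4": {"image", "segmentation", "cnn"},
        "p5": {"graph", "theory"},
    }
    history = {"p1": 5.0, "p2": 2.0}  # the user's past ratings
    print(recommend(history, item_content))  # prints ['p3', 'p4']
```

In this example "p3" inherits the high rating of the similar "p1", and "p4" inherits the low rating of "p2". "p5" shares no identifiers with anything the user rated, so it cannot be scored. This is the classic cold-start limitation of purely content-based methods.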
Other Algorithms Used
• Item-based algorithm: the core of the recommendation engine
• TF-IDF algorithm: searching
• Matrix factorization: table generation and matrix operations
• Bag-of-words approach: field suggestion
• Regex-based algorithm: parsing through Lucene/Elasticsearch
• Word similarity / implicit algorithms: keyword suggestion
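The slides call the item-based algorithm the core of the recommender without giving its details. The sketch below shows one common form, item-based collaborative filtering. Each item is represented by the vector of ratings users gave it, and item-item similarity is the cosine between those vectors, treating missing ratings as zero. A user's rating of a target item is then predicted as a similarity-weighted average of their ratings on other items. The cosine measure, the data layout (`user -> {item: rating}`) and the function names are assumptions for illustration.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def item_vectors(ratings: Ratings) -> Dict[str, Dict[str, float]]:
    """Transpose user->item ratings into item->user rating vectors."""
    vectors: Dict[str, Dict[str, float]] = {}
    for user, items in ratings.items():
        for item, r in items.items():
            vectors.setdefault(item, {})[user] = r
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors keyed by user (missing entries count as 0)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict(ratings: Ratings, user: str, target: str) -> Optional[float]:
    """Predict `user`'s rating of `target` from the user's ratings of other, similar items."""
    vectors = item_vectors(ratings)
    target_vec = vectors.get(target, {})
    num = den = 0.0
    for item, r in ratings.get(user, {}).items():
        if item == target:
            continue
        sim = cosine(target_vec, vectors[item])
        num += sim * r
        den += abs(sim)
    return num / den if den else None


if __name__ == "__main__":
    # Hypothetical ratings of three papers by three users.
    ratings = {
        "alice": {"p1": 5, "p2": 3, "p3": 4},
        "bob": {"p1": 4, "p2": 2, "p3": 5},
        "carol": {"p1": 5, "p2": 3},
    }
    print(predict(ratings, "carol", "p3"))  # roughly 4.02
```

Item-item similarities depend only on the rating matrix, not on the user being served, so production systems usually precompute them offline. This sketch recomputes them on every call for brevity.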
Market Research
Existing products such as Google Scholar, Microsoft Virtual Academy and arXiv provide ways to search among articles and rate them, but they do not recommend them.
• dblp.uni-trier.de
  - More than 2.3 million articles on computer science (October 2013)
  - Developer: Alexander Weber
  - Alexa rank: 8,715 (April 2014)
• arxiv.org
  - 939,001 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics
  - Creator: Paul Ginsparg
  - Owner: Cornell University Library
  - Submission rate of more than 7,000 per month
• scholar.google.com
  - Bibliographic database
  - Owner: Google Inc.
  - Places high weight on citation counts
  - Top search results are often highly cited articles
  - Index includes most peer-reviewed online journals
Comparison