Scholarec: A Recommender System for Scholarly Articles
Recommender systems represent user preferences for the purpose of suggesting
items to purchase or examine.
In this project we address the problem of recommending scholarly articles by using
latent information about a user's research interests contained in their
publication list.
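One simple way to read research interests off a publication list is to count the terms that recur in the user's paper titles. The sketch below assumes titles are the only input and uses a tiny hypothetical stopword list; `build_interest_profile` is a hypothetical name for illustration, not a Scholarec function.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "using"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_interest_profile(publication_titles, top_n=5):
    """Return the top_n most frequent terms across a user's publication titles,
    each paired with its share of all counted terms (a crude interest weight)."""
    counts = Counter()
    for title in publication_titles:
        counts.update(tokenize(title))
    total = sum(counts.values())
    if total == 0:
        return []
    return [(term, count / total) for term, count in counts.most_common(top_n)]


if __name__ == "__main__":
    titles = [
        "Matrix factorization for recommender systems",
        "Scalable recommender systems with implicit feedback",
        "Topic models for scholarly search",
    ]
    for term, weight in build_interest_profile(titles, top_n=3):
        print(f"{term}: {weight:.2f}")
```

Raw term counts are the simplest choice; a real system would more likely weight terms with TF-IDF or use a topic model to surface the "latent" interests.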
The datasets can be used for other purposes such as classification, clustering,
What is Scholarec?
Scholarec is a recommender system for scientific documents.
It classifies documents and uses personalization features to recommend relevant documents.
Ability to search a huge collection of articles, reports and other scholarly documents
Seamless extension to current online repositories of scholarly articles
Robust back-end search engine
Interactive user interface
Personalization through OpenID and OAuth integration
Recommendations based on the user's interests
How Scholarec Works
PDF to text
User feedback ratings and content-based filtering
Flowchart of Scholarec
Content-based filtering
Recommendations are made by comparing items against user profiles. Each item's content is represented as a set of terms.
Content-based filtering estimates a user's ratings for new items from the user's history.
This is a generalization of the aggregation
functions used for content-based filtering.
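A minimal sketch of one common content-based scoring scheme, assuming items and the user's history are plain text: build a term-count profile from the items the user already liked, then rank candidates by cosine similarity to that profile. The function names and example data are hypothetical, and Scholarec's actual scoring function may differ.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for a piece of text (lowercased alphabetic tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_items(user_history, candidates):
    """Build a profile from the user's history (texts of items already liked),
    score each candidate (id -> text) against it, and sort best-first."""
    profile = Counter()
    for text in user_history:
        profile.update(term_vector(text))
    scored = [(item_id, cosine(profile, term_vector(text)))
              for item_id, text in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    history = ["collaborative filtering for recommendation",
               "matrix factorization recommendation"]
    candidates = {
        "paper-1": "deep learning for image recognition",
        "paper-2": "recommendation with matrix factorization",
    }
    for item_id, score in rank_items(history, candidates):
        print(item_id, round(score, 3))
```

Summing the history into one profile vector is the simplest aggregation; averaging per-item similarities or weighting recent items more heavily are common alternatives.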
Other algorithms used
Item-based algorithm: the core of the recommendation engine
TF-IDF algorithm: search
Matrix factorization: table generation / matrix operations
Bag-of-words approach: field suggestion
Regex-based algorithm: parsing through Lucene/Elasticsearch
Word similarity / implicit algorithms: keyword suggestion
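The item-based algorithm is listed as the core of the recommender. A minimal sketch of item-based collaborative filtering, assuming explicit numeric ratings: item-item similarity is plain cosine over the users who rated each item (missing ratings treated as 0), and a prediction is the similarity-weighted average of the user's other ratings. The `RATINGS` data and function names are hypothetical.

```python
import math

# Hypothetical user -> {item: rating} feedback data.
RATINGS = {
    "alice": {"p1": 5, "p2": 4, "p3": 1},
    "bob":   {"p1": 4, "p2": 5},
    "carol": {"p2": 2, "p3": 5},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    items = {}
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine(u, v):
    """Cosine similarity of two sparse rating vectors (missing entries count as 0)."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def predict(ratings, user, target):
    """Predict the user's rating for target from the ratings the user gave to
    other items, weighted by each item's similarity to target."""
    items = item_vectors(ratings)
    if target not in items:
        return None  # nobody has rated target, so there is nothing to compare
    num = den = 0.0
    for item, r in ratings[user].items():
        if item == target:
            continue
        sim = cosine(items[target], items[item])
        num += sim * r
        den += abs(sim)
    return num / den if den else None


if __name__ == "__main__":
    print(round(predict(RATINGS, "bob", "p3"), 2))
```

Item-item similarities change slowly compared with user profiles, which is why item-based methods can precompute them offline; adjusted cosine (subtracting each user's mean rating) is a common refinement.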
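TF-IDF is listed for search. A minimal sketch of TF-IDF ranking, assuming tf is the raw term count in a document and idf is log(N / df); the deck itself searches through Lucene/Elasticsearch, whose default scoring (BM25 in recent versions) differs from this plain formula. `tf_idf_search` and the example documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_search(query, documents):
    """Rank documents (id -> text) by the summed TF-IDF weight of the query terms,
    with tf = raw count in the document and idf = log(N / df)."""
    n = len(documents)
    doc_terms = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    df = Counter()
    for terms in doc_terms.values():
        df.update(set(terms))  # count each term at most once per document
    scores = {}
    for doc_id, terms in doc_terms.items():
        score = 0.0
        for term in set(tokenize(query)):
            if df[term]:
                score += terms[term] * math.log(n / df[term])
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "Recommender systems for scholarly articles",
        "d2": "Elasticsearch full text search",
        "d3": "Scholarly search engines",
    }
    for doc_id, score in tf_idf_search("scholarly recommender", docs):
        print(doc_id, round(score, 3))
```

Rare terms such as "recommender" (one document) outweigh common ones such as "scholarly" (two documents), which is the point of the idf factor.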
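Matrix factorization is also listed. A minimal sketch of one standard variant, assuming a sparse list of (user, item, rating) triples: learn k-dimensional user factors P and item factors Q by stochastic gradient descent with L2 regularization, so that the dot product P[u]·Q[i] approximates each observed rating. The deck does not say which variant Scholarec uses; all names and hyperparameters here are hypothetical.

```python
import random


def factorize(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fit P (n_users x k) and Q (n_items x k) to observed (u, i, r) triples
    with SGD on the squared error plus L2 regularization."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    P = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating of item i by user u."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    """Root-mean-square error of the factorization on the given triples."""
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


if __name__ == "__main__":
    data = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
    P, Q = factorize(data, n_users=3, n_items=3)
    print("training RMSE:", round(rmse(P, Q, data), 3))
    print("predicted rating of item 2 by user 0:", round(predict(P, Q, 0, 2), 2))
```

The value of the factorization is in the unobserved cells: once P and Q are learned, any (user, item) pair can be scored, which is what turns the sparse rating table into recommendations.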
A wide variety of free and open-source software tools and libraries:
Python programming & scripting language
Django web framework
HTML5, CSS3 & jQuery
Twitter Bootstrap (responsive UI)
Elasticsearch & MongoDB
Project timeline: December 2013 to 28 April 2014
Existing products such as Google Scholar, Microsoft Virtual Academy and arXiv let users
search for articles and rate them, but do not recommend them.
• More than 2.3 million articles on computer science in October 2013
• Developer: Alexander
• Alexa Rank: 8,715 (April
• 939,001 e-prints in
• Creator: Paul Ginsparg
• Owner: Cornell University Library
• Submission rate of more than 7,000 per month
• Bibliographic database
• Owner: Google Inc.
• High weight on
• First search results are often highly cited
• Google Scholar's index includes most peer-reviewed online journals
Division of Work
Code refactoring and GitHub page creation & market research
Data gathering & packaging & testing
Data structuring & Python & shell
Packaging & testing