ScholaRec

Recommender System for Scholarly Articles
Archit Sharma
[email protected]
http://work.arcolife.in

Introduction

• Recommender systems represent user preferences for the purpose of suggesting items to purchase or examine.
• In this project we address this problem by producing recommendations from the latent information about a user's research interests that exists in their publication list.
• The datasets can also be used for other purposes, such as classification, clustering, and trend analysis.

What is ScholaRec?

• ScholaRec is a recommender system for scientific documents.
• It classifies documents and uses personalization features to suggest similar ones.

Features

• Ability to search a huge collection of articles, reports, and other scholarly works
• Seamless extension to current online repositories of scholarly articles
• Robust back-end search engine
• Interactive user interface
• Personalization through OpenID and OAuth integration
• Recommendations based on the user's interests

How does ScholaRec work?

• Archive dump
• PDF to text
• Keyword extraction
• User feedback rating and content-based filtering
• Custom search algorithms
• Word similarity
• Representation of recommendation

Flowchart of ScholaRec

Algorithms

Content-based filtering

• Recommendations are made by comparing items against user profiles.
• Each item's content is represented as a set of identifiers.
• Content-based filtering tries to estimate ratings for the user based on the user's history.
• This is the generalization of the aggregation functions used for content-based filtering.
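As a rough illustration of comparing items against a user profile, here is a minimal Python sketch. It assumes each article is a set of keywords, builds the profile by summing the user's ratings per keyword, and scores candidates by cosine similarity. The function names, the sample titles, and the choice of cosine similarity are assumptions for illustration; the deck does not say which aggregation function ScholaRec uses.

```python
import math
from collections import Counter


def build_profile(rated_items):
    """Sum the user's ratings per keyword to form a weighted profile."""
    profile = Counter()
    for keywords, rating in rated_items:
        for keyword in keywords:
            profile[keyword] += rating
    return profile


def similarity(profile, keywords):
    """Cosine similarity between a weighted profile and a keyword set."""
    dot = sum(profile[k] for k in keywords if k in profile)
    norm_profile = math.sqrt(sum(w * w for w in profile.values()))
    norm_item = math.sqrt(len(keywords))
    if norm_profile == 0 or norm_item == 0:
        return 0.0
    return dot / (norm_profile * norm_item)


def recommend(profile, candidates, top_n=3):
    """Return up to top_n candidate titles with a non-zero score, best first."""
    scored = [(similarity(profile, kws), title) for title, kws in candidates.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored[:top_n] if score > 0]


# Hypothetical reading history: (keywords of an article, rating given by the user).
history = [
    ({"recommender", "filtering"}, 5),
    ({"quantum", "optics"}, 1),
]
# Hypothetical candidate articles keyed by title.
candidates = {
    "Matrix factorization for recommenders": {"recommender", "matrix"},
    "Quantum gravity notes": {"quantum", "gravity"},
    "Protein folding survey": {"protein", "folding"},
}

profile = build_profile(history)
print(recommend(profile, candidates))
```

Articles that share no keyword with the profile score zero and are dropped, so the user only sees items connected to their history.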

Other algorithms used

• Item-based algorithm: serves as the heart of recommendation
• TF-IDF algorithm: searching
• Matrix factorization: table generation / operations on matrices
• Bag-of-words approach: field suggestion
• Regex-based algorithm: parsing through Lucene/ElasticSearch
• Word similarity / implicit algorithms: keyword suggestion
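To make the TF-IDF entry concrete, the following sketch ranks a few documents against a query using plain term frequency times inverse document frequency. It is a simplified stand-in written with the standard library only; in ScholaRec the search is handled by ElasticSearch, whose scoring differs in detail. The helper names and sample documents are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_index(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    index = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        index.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return index


def search(query, index):
    """Return indices of documents with a positive score, best match first."""
    terms = tokenize(query)
    scored = [
        (sum(weights.get(term, 0.0) for term in terms), doc_id)
        for doc_id, weights in enumerate(index)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]


# Hypothetical mini-corpus of article titles.
docs = [
    "matrix factorization for recommender systems",
    "quantum optics experiments",
    "content based recommender systems",
]
index = tf_idf_index(docs)
print(search("matrix recommender", index))
```

Terms that appear in every document get a weight of zero under this formula, which is why common words contribute nothing to the ranking.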

Data Representation

Technology Stack

A wide variety of free and open source software tools and libraries:

• Python programming & scripting language
• Django web framework
• HTML5, CSS3 & jQuery
• D3.js (for visualizations)
• Twitter Bootstrap (responsive UI)
• arXiv API
• ElasticSearch & MongoDB
• GNU/Linux
• LaTeX
• Git
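For readers unfamiliar with the arXiv API listed above, this small sketch builds a query URL for its public query endpoint, using the `search_query`, `start`, and `max_results` parameters. The helper name `build_arxiv_query` is hypothetical, and the deck does not describe how ScholaRec itself calls the API; the sketch only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# Public arXiv API query endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query(terms, start=0, max_results=10):
    """Build a query URL that searches all fields for every given term."""
    search_query = " AND ".join(f"all:{term}" for term in terms)
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_query(["recommender", "systems"], max_results=5)
print(url)
```

The response to such a request is an Atom feed, which would still need to be parsed before the entries can be indexed.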

Project Timeline

• Task 1: Finding application area & deciding academic research
• Task 2: Data sources (DBLP, arXiv)
• Task 3: Understanding recommendation algorithms
• Task 4: Deciding right algorithms for task
• Dates for Tasks 1 to 4: Dec’13, 1st week Jan’14, 2nd and 3rd week Jan 2014

• Task 5: Deciding on technology stack (4th week Jan’14)
• Task 6: Data structuring, mining & analysis (1 Feb’14 to Mar’14)
• Task 7: Implementations & web development (2 Mar’14 to 19 April’14)
• Task 8: Bug testing / user feedback (to 28 April’14)

Market Research

Existing products in the market, such as Google Scholar, Microsoft Virtual Academy, and arXiv, provide a way to search among articles and rate them, but do not recommend them.

• dblp.uni-trier.de
  • More than 2.3 million articles on computer science in October 2013
  • Developer: Alexander Weber
  • Alexa Rank: 8,715 (April 2014)
• arXiv.org
  • 939,001 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics
  • Creator: Paul Ginsparg
  • Owner: Cornell Library
  • Submission rate is more than 7000 per month
• scholar.google.com
  • Bibliographic database
  • Owner: Google Inc
  • High weight on citation counts
  • First search results are often highly cited articles
  • Google Scholar index includes most peer-reviewed online journals

Comparison

Division of Work

Module A: Front end

• Browser interface creation
• Code refactoring and docstrings
• GitHub page creation
• Entire documentation & market research

Module B: Backend

• Data gathering & analysis
• Packaging & testing
• Data structuring & transformation
• Python & shell scripting

Demo

Home Page

Results

Packaging & Testing

GitHub Page

Thank you! Questions?
http://arcolife.github.io/scholarec
