ScholaRec

Recommendation Engine for Scholarly Articles.

Archit Sharma

May 06, 2014

Transcript

  1. Introduction
    • Recommender systems represent user preferences for the purpose of suggesting items to purchase or examine.
    • In this project we address this problem by generating recommendations from the latent information about a user's research interests contained in their publication list.
    • The datasets can also be used for other purposes such as classification, clustering, and trend analysis.
  2. What is ScholaRec?
    • ScholaRec is a recommender system for scientific documents.
    • It classifies documents and uses personalization features to recommend similar ones.
  3. Features
    • Search across a large collection of articles, reports, and other scholarly works
    • Seamless extension to existing online repositories of scholarly articles
    • Robust back-end search engine
    • Interactive user interface
    • Personalization through OpenID and OAuth integration
    • Recommendations based on the user's interests
  4.
    • Archive dump
    • PDF to text
    • Keyword extraction
    • User feedback rating and content-based filtering
    • Custom search algorithms
    • Word similarity
    • Representation of recommendation
  5. Content-based filtering
    • Recommendations are made by comparing items against user profiles.
    • Each item's content is represented as a set of identifiers.
    • Content-based filtering estimates a user's ratings from that user's history.
    • This is the generalization of the aggregation functions used for content-based filtering.
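The idea on this slide, items as sets of identifiers and ratings estimated from the user's history, can be sketched as follows. This is only an illustration under stated assumptions: the slides do not say which aggregation function ScholaRec uses, so Jaccard similarity and a similarity-weighted average are stand-ins, and the names `jaccard` and `estimate_rating` as well as the sample data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of identifiers (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def estimate_rating(candidate, history):
    """Estimate a rating for `candidate`, a set of identifiers.

    `history` is a list of (identifiers, rating) pairs for items the user
    has already rated. The estimate is a similarity-weighted average of
    those ratings; None is returned when nothing in the history overlaps.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for identifiers, rating in history:
        weight = jaccard(candidate, identifiers)
        weighted_sum += weight * rating
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


# Hypothetical user history: keyword sets with the ratings the user gave.
history = [
    ({"recommender", "collaborative", "filtering"}, 5.0),
    ({"quantum", "entanglement"}, 1.0),
]
candidate = {"recommender", "content", "filtering"}
print(estimate_rating(candidate, history))
```

Here the candidate shares identifiers only with the first rated item, so the estimate follows that item's rating. A real system would likely weight identifiers (for example by TF-IDF) rather than treat them all equally.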
  6. Other Algorithms used
    • Item-based algorithm: serves as the heart of recommendation
    • TF-IDF algorithm: searching
    • Matrix factorization: table generation / operations on matrices
    • Bag-of-words approach: field suggestion
    • Regex-based algorithm: parsing through Lucene/ElasticSearch
    • Word similarity / implicit algorithms: keyword suggestion
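As an illustration of the TF-IDF entry above, here is a minimal search sketch in plain Python. It is not the project's implementation (the deck indicates search runs through ElasticSearch); the function names `tf_idf_vectors` and `search`, the whitespace tokenization, the unsmoothed `log(N / df)` weighting, and the sample titles are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vectors.append({
            # term frequency times unsmoothed inverse document frequency
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def search(query, docs):
    """Rank document indices by the summed TF-IDF weight of the query terms."""
    vectors = tf_idf_vectors(docs)
    query_terms = set(query.lower().split())
    scores = [sum(vec.get(t, 0.0) for t in query_terms) for vec in vectors]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > 0]


# Hypothetical article titles standing in for an indexed collection.
titles = [
    "matrix factorization for recommender systems",
    "quantum entanglement in photonic systems",
    "keyword extraction from scholarly articles",
]
docs = [title.split() for title in titles]
print(search("recommender systems", docs))
```

With this weighting, a term that appears in every document gets weight zero, so rarer terms such as "recommender" dominate the ranking over common ones such as "systems".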
  7. A wide variety of free and open-source software tools and libraries
    • Python programming & scripting language
    • Django web framework
    • HTML5, CSS3 & jQuery
    • D3.js (for visualizations)
    • Twitter Bootstrap (responsive UI)
    • ArXiv API
    • ElasticSearch & MongoDB
    • GNU/Linux
    • LaTeX
    • Git
  8.
    • Task 4: Deciding the right algorithms for the task
    • Task 3: Understanding recommendation algorithms
    • Task 2: Data sources (DBLP, arXiv)
    • Task 1: Finding application area & deciding academic research
    • Timeline: Dec '13; 1st week Jan '14; 2nd and 3rd week Jan 2014
  9.
    • Task 8: Bug testing / user feedback
    • Task 7: Implementation & web development
    • Task 6: Data structuring, mining & analysis
    • Task 5: Deciding on technology stack
    • Timeline: - 28 April '14; 2 Mar '14 to 19 April '14; 1 Feb '14 to Mar '14; 4th week Jan '14
  10. Market Research
    • Existing products such as Google Scholar, Microsoft Virtual Academy, and arXiv provide a way to search articles and rate them, but not to recommend them.
  11. Comparison
    • dblp.uni-trier.de
        • More than 2.3 million articles on computer science in October 2013
        • Developer: Alexander Weber
        • Alexa Rank: 8,715 (April 2014)
    • Arxiv.org
        • 939,001 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics
        • Creator: Paul Ginsparg
        • Owner: Cornell University Library
        • Submission rate is more than 7,000 per month
    • scholar.google.com
        • Bibliographic database
        • Owner: Google Inc.
        • High weight on citation counts
        • First search results are often highly cited articles
        • Index includes most peer-reviewed online journals
  12. Module A: Front end
    • Browser interface creation
    • Code refactoring and docstrings
    • GitHub page creation
    • Entire documentation & market research
  13. Module B: Backend
    • Data gathering & analysis
    • Packaging & testing
    • Data structuring & transformation
    • Python & shell scripting