Multiple Ways of Building a Recommender System with Elasticsearch - Elastic Meetup Switzerland - Andrii Vozniuk

My talk at the Swiss Elastic Meetup #20: https://www.meetup.com/elasticsearch-switzerland/events/237184939/

Elasticsearch (ES) is commonly known as a search and analytics engine. At the same time, information retrieval techniques available in ES can be used to deliver additional value to the users by providing recommendations.

In my talk, I show how to employ ES to obtain various types of recommendations. We consider basic content-based techniques as well as hybrid ones involving automatic identification of user interests. Using our web app Graasp (http://graasp.net) as an example, I give ideas on how recommendations can be integrated into your product.

Andrii Vozniuk

March 29, 2017

Transcript

  1. MULTIPLE WAYS OF BUILDING A RECOMMENDER SYSTEM WITH ELASTICSEARCH

    ANDRII VOZNIUK, REACT-EPFL
    Elastic Meetup Lausanne, March 2017
    Talk description: https://www.meetup.com/elasticsearch-switzerland/events/237184939/
    The copyright of images belongs to their authors. Drop me a message at [email protected] to remove
  2. WHY RECOMMENDATIONS

    • Increase engagement
    • Address information overload
    • Improve information findability
    • Not aware of its existence
    • Do not know particular keywords
    • New content appearing
    • Facilitate discovery of relevant content
    • Not only search or tags
  3. TYPES OF RECOMMENDERS

    • Content-based
    • Collaborative filtering
    • Hybrid approaches
    [Diagrams with edges labeled "interacts", "similar", and "recommend"]
  4. GRAASP (graasp.net)

    • A COLLABORATIVE KNOWLEDGE SHARING ENVIRONMENT
    • A SOCIAL MEDIA PLATFORM
    • AN ADVANCED CONTENT MANAGEMENT SYSTEM
  5. GRAASP IS A MEAN WEB APP

    • M: MongoDB
    • E: Express.js
    • A: AngularJS
    • N: Node.js
    [Diagram labels: Front-end, mongoose, express, Server, Database]
  6. GOALS

    • Provide contextually relevant recommendations
    • Should work for individual items and for spaces (collections of items)
    • Will allow the user to discover contextually relevant content items or users
  7. BRINGING DATA TO ELASTICSEARCH

    [Diagram labels: Front-end, mongoose, express, Server, Database, mongoosastic]
    mongoosastic is a mongoose plugin that updates ES on mongoose events
  8. ELASTICSEARCH COMPUTING RELEVANCE

    STEP 0. Compute TF-IDF for each term in the vectors
    STEP 1. Represent each content item using the document vector model
    STEP 2. Use vector cosine similarity for scoring and ranking
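The three steps on this slide can be sketched in plain Python. This is a simplified textbook version and an assumption on my part, not Lucene's exact scoring formula: the `tokenize` helper, the smoothed IDF (`log(N / df) + 1`), and the sample texts are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(docs):
    """STEP 0 and STEP 1: weigh each term by TF-IDF and build sparse vectors."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumed IDF variant; Lucene uses a different formula.
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """STEP 2: cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical content items, not taken from the talk.
items = [
    "elasticsearch recommender system",
    "elasticsearch search engine",
    "cooking pasta recipes",
]
vectors, _ = tfidf_vectors(items)
scores = [cosine(vectors[0], v) for v in vectors]
```

Here `scores` holds the similarity of the first item to every item, so sorting by it gives a basic content-based ranking.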
  9. ELASTICSEARCH RELEVANCE, VISUALLY

    Source: https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
    Query: happy hippopotamus
    Three documents
    1. I am happy in summer.
    2. After Christmas I’m a hippopotamus.
    3. The happy hippopotamus helped Harry
    TF-IDF vectors
    1. Document 1: (happy, ____________) = [2,0]
    2. Document 2: ( ___ , hippopotamus) = [0,5]
    3. Document 3: (happy, hippopotamus) = [2,5]
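The slide's numbers can be checked with a few lines of Python. The sketch below assumes the query "happy hippopotamus" maps to the vector [2,5], using the same term weights as the documents (happy = 2, hippopotamus = 5); the function and variable names are mine.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumed query vector: both terms present with the slide's weights.
query = (2, 5)

# Document vectors as listed on the slide: (happy, hippopotamus).
documents = {
    "Document 1": (2, 0),
    "Document 2": (0, 5),
    "Document 3": (2, 5),
}

scores = {name: cosine(query, vec) for name, vec in documents.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Document 3 points in the same direction as the query, so it ranks first; Document 2 follows because the heavier term (hippopotamus) dominates the angle, and Document 1 comes last.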
  10. ELASTICSEARCH MORE LIKE THIS (MLT) QUERY

    Source: More Like This Query https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-mlt-query.html
    • Text-based
    • Document Id-based
    • Can be a combination of both
    “The MLT query simply extracts the text from the input document, analyzes it, usually using the same analyzer at the field, then selects the top K terms with highest tf-idf to form a disjunctive query of these terms.”
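As an illustration of the text-based, id-based, and combined forms, here is a minimal sketch that only builds the request body as a Python dict, following the shape of the 2.0 reference linked on the slide. The helper name `build_mlt_query`, the index name `items`, the type `item`, and the field names are hypothetical. The `_type` key reflects the 2.x API; newer Elasticsearch versions dropped mapping types, so treat that part as version-specific.

```python
def build_mlt_query(fields, like_text=None, like_docs=None,
                    min_term_freq=1, max_query_terms=25):
    """Build a more_like_this request body from free text and/or stored documents."""
    like = []
    if like_text:
        # Text-based input: the text is analyzed and its top terms are used.
        like.append(like_text)
    for doc in like_docs or []:
        # Id-based input: Elasticsearch fetches the text of the stored document.
        like.append({"_index": doc["index"], "_type": doc["type"], "_id": doc["id"]})
    if not like:
        raise ValueError("provide like_text, like_docs, or both")
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": like,
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


# Combination of both input kinds (all identifiers below are made up).
body = build_mlt_query(
    fields=["title", "description"],
    like_text="recommender systems with elasticsearch",
    like_docs=[{"index": "items", "type": "item", "id": "42"}],
)
```

The resulting `body` would be sent to the `_search` endpoint of the index being searched.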
  11. ELASTICSEARCH MORE LIKE THIS (MLT) LIMITATIONS

    Source: Lucene MoreLikeThis.java
    • Earlier (in 2016), when a document id was supplied, the text content was concatenated and the search was done over all specified fields
    • No way to boost individual fields, although matching on title can be more important than matching on content
    • Now, the query is done field by field. It cannot boost fields or match the desc field against the content field
    • We wanted to do cross-field matching with boosting
  12. USING SEARCH FOR RECOMMENDATIONS

    Decided to concatenate fields manually and use the match query
    + can boost fields
    + can do cross-field matching
    + can do cross-type matching
    - slower
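One way this approach could look is sketched below; it is not the talk's actual implementation. The idea assumed here: concatenate the text of the current item's fields, then run that text as a `match` query against each target field with its own boost inside a `bool`/`should`. The helper name, field names, and boost values are hypothetical.

```python
def build_recommendation_query(source_doc, source_fields, target_boosts,
                               exclude_id=None, size=10):
    """Concatenate source fields and match the text against boosted target fields."""
    # Manual concatenation: the text of every source field forms one query string,
    # so a term from the description can match a title (cross-field matching).
    text = " ".join(str(source_doc[f]) for f in source_fields if source_doc.get(f))
    should = [
        {"match": {field: {"query": text, "boost": boost}}}
        for field, boost in target_boosts.items()
    ]
    bool_query = {"should": should, "minimum_should_match": 1}
    if exclude_id is not None:
        # Do not recommend the item the user is currently viewing.
        bool_query["must_not"] = [{"ids": {"values": [exclude_id]}}]
    return {"size": size, "query": {"bool": bool_query}}


# Hypothetical item and boosts: title matches count three times as much as content.
current_item = {
    "title": "Recommender systems",
    "description": "Slides about Elasticsearch",
    "content": "",
}
body = build_recommendation_query(
    current_item,
    source_fields=["title", "description", "content"],
    target_boosts={"title": 3.0, "description": 1.5, "content": 1.0},
    exclude_id="42",
)
```

Because the body is an ordinary search request, it can be sent to several indices or types at once, which is where the cross-type matching comes from; the extra clauses per field are also a plausible source of the "slower" drawback.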
  13. GOALS

    • Recommendations matching the user interests rather than the context
    • The user should understand the recommender model (interpretability)
    • The user should be able to adjust the recommender (interactive)
    • In general, we wanted the user to understand and control the recommendations when needed
  14. PROPOSAL RECOMMENDATION MODEL

    [Diagram steps, in extraction order]
    • Provide Recommendations
    • Record User-Content Interactions
    • Extract Concepts from the Content
    • Build User Interests Profile
    [Diagram labels: Interpretable, Interactive]
  15. CONCEPT IDENTIFICATION PIPELINE

    [Diagram; grouping of labels reconstructed from the extracted text]
    • Content Items on platform: Plain Text File; Binary Text File (.pdf, .docx); Image with text (.png, .jpg, .tiff); Image; Audio; Video
    • Content Extraction: Optical Character Recognition (Leptonica, Tesseract); Speech-To-Text; Visual Image Recognition; Visual Video Recognition
    • Extracted Text
    • Content Analysis: Identified Concepts
    • Content and Concepts Indexing: Indexed Identified Concepts and Text Content
    • Recommender System
  16. PROPOSAL INTERESTS PROFILE

    [Diagram; grouping of labels reconstructed from the extracted text]
    • Content items: Pdf Report, Powerpoint Presentation, Image with Text, Youtube Video
    • Tracked Activities (UA): accessed, rated, commented, downloaded
    • Identified Concepts (DC): Education, Educational psychology, Knowledge, Learning, Knowledge Management, Human-Computer Interaction, Interdisciplinarity, Academia, Systems thinking, Scientific method, Educational technology, Virtual learning environment (each content item carries a subset of these)
    • Identified User Concepts (UC)
    • Formula: Σ w*UA*DC
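The formula on this slide is only partially recoverable, so the following is a sketch under explicit assumptions: the user concepts (UC) are taken to be a weighted sum over tracked activities (UA), where each activity contributes its weight w times the relevance of every concept identified in the touched item (DC). The weight values, item ids, and relevance scores are invented for illustration.

```python
from collections import defaultdict

# Hypothetical weights per activity type; the talk does not list actual values.
ACTIVITY_WEIGHTS = {"accessed": 1.0, "rated": 2.0, "commented": 3.0, "downloaded": 4.0}


def build_interest_profile(activities, item_concepts, weights=ACTIVITY_WEIGHTS):
    """Accumulate w * DC over all tracked activities (UA) into a concept profile (UC)."""
    profile = defaultdict(float)
    for item_id, activity in activities:
        w = weights[activity]
        # DC: concept -> relevance for the item the user interacted with.
        for concept, relevance in item_concepts.get(item_id, {}).items():
            profile[concept] += w * relevance
    return dict(profile)


# Invented sample data using concept names that appear on the slide.
item_concepts = {
    "pdf_report": {"Education": 1.0, "Learning": 0.5},
    "youtube_video": {"Learning": 1.0},
}
activities = [
    ("pdf_report", "accessed"),
    ("pdf_report", "downloaded"),
    ("youtube_video", "commented"),
]
profile = build_interest_profile(activities, item_concepts)
top_concepts = sorted(profile, key=profile.get, reverse=True)
```

A profile in this form is a plain concept-to-weight mapping, so it can be shown to the user and edited by hand, which fits the interpretability and interactivity goals stated earlier in the deck.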
  17. SUMMARY

    DEMONSTRATED HOW TO USE ELASTICSEARCH FOR
    • Contextual recommendations (relevant to the context)
    • Personalized recommendations (relevant to the user)
    • More Like This vs Common queries (e.g., match)
    POSSIBLE EXTENSIONS
    • Displaying highlights to explain the recommendations
    • Using the Percolator to notify the user about new relevant content as it gets uploaded
    • Alternative ways of constructing the user profile
    • Trying collaborative filtering, user-user similarity can be implemented with Elasticsearch