
Using Taxonomies for Product Recommendation


In this work we take advantage of the valuable information encoded in taxonomies to improve the quality of content-based recommender systems. We present three strategies that explore the use of taxonomies: (i) category descriptors, (ii) classification features, and (iii) category filters. We provide a real-case study in the book domain, in which the recommendation targets are 100 news pages from The New York Times and the items to be recommended are 1,499,792 books distributed over 1,621 category nodes of a taxonomy, both crawled from Amazon.com. In strategy (i), term descriptors of each category are combined with the text descriptions of the books assigned to that category, and terms that are representative of the category are added to the target page. In strategy (ii), categories that are strongly related to the target page are identified by a classifier that plays the role of a feature generator, and these features are then used in the recommendation process. In strategy (iii), the output of the two previous strategies is filtered so that only books from the same categories as the ones assigned to the target page are kept. We implement several methods that apply the three strategies individually and in combination. Experimental results indicate that our strategies can be successfully applied to improve traditional content-based recommender systems. In particular, when the target page is automatically assigned to a category, we obtain gains close to 13% in average precision. If instead the assignment is made a priori, e.g., by the author or by a content editor, the gains are close to 20% in average precision.
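Strategy (iii) can be illustrated with a minimal sketch. It assumes that an earlier step has already produced a ranked list of books and that each book and the target page carry a set of category labels; the function name `category_filter`, the book identifiers, and the category names are hypothetical and are not taken from the original work.

```python
from typing import Dict, List, Set, Tuple


def category_filter(
    ranked_books: List[Tuple[str, float]],
    book_categories: Dict[str, Set[str]],
    page_categories: Set[str],
) -> List[Tuple[str, float]]:
    """Keep only books sharing at least one category with the target page.

    The input order (the ranking from an earlier strategy) is preserved.
    """
    return [
        (book_id, score)
        for book_id, score in ranked_books
        if book_categories.get(book_id, set()) & page_categories
    ]


# Hypothetical ranking and category assignments, for illustration only.
ranked = [("b1", 0.92), ("b2", 0.85), ("b3", 0.40)]
categories = {
    "b1": {"Sports"},
    "b2": {"Travel"},
    "b3": {"Sports", "Biographies"},
}
filtered = category_filter(ranked, categories, {"Sports"})
print(filtered)
```

In this sketch a book with no known category is dropped, which is one possible design choice; the abstract does not say how such books are handled.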

Osvaldo Matos Júnior

October 18, 2012



  1. Using Taxonomies for Product Recommendation

    Osvaldo Matos-Junior¹, Nivio Ziviani¹, Fabiano Botelho¹, Anísio Lacerda¹, Altigran Silva² and Marco Cristo²
    1 - Universidade Federal de Minas Gerais (UFMG)
    2 - Universidade Federal do Amazonas (UFAM)
  2. Vector Representation

    d = {w1, w2, w3, ..., wn}
    d = representation of the document (book or news article)
    w = weight of the term in the document (TF-IDF)
    news article = title + body text
    book = title + authors + synopsis
  3. Simplistic approach: baseline

    Vector similarity
    - Rank the books by similarity to the news article.
    - The books at the top of the ranking form the recommendation list.
  4. Insufficient Vocabulary

    Discounted inns in Latin America
    Discounts in the United States of America
    America for Latinos after Bush
    Hotel catalog in Brazil
    Get to know Mexico
  5. User Context

    League final: Barcelona vs Real Madrid
    Barcelona nightlife guide
    The greatest painters of Madrid
    Real Madrid, the team of the century
    Puyol: the story of the Barcelona idol
  6. Category Descriptors

    Terms that stand out in the categories.
    - Kullback-Leibler divergence (KLD)
    - Pearson's Chi-Squared (CHI2)
    - Dice's coefficient (DICE)
    - Document Frequency (DF)
    - Combined Measures (ALL)
  7. Classification Features

    Gabrilovich (2005) and Anagnostopoulos (2007): combine the words with a new feature space
    sim = words + new features
    knowledge base = taxonomy
    concepts = categories
  8. Classification Features

    dcat = {c1, c2, c3, ..., cn}
    word vector + new features
    dcat = new representation of the document
    c = category in the book taxonomy
  9. Incomplete Judgments

    bpref x infAP (Yilmaz & Aslam, 2006)
    [Figure: two plots with both axes from 0 to 1. First plot: actual MAP against inferred MAP. Second plot: actual MAP against bpref-10.]
  10. [Figure: two precision-recall plots (axes: Recall, Precision). First plot: CLF-EC, CLF-SE and BOW. Second plot: BOW, CTF-1A, CTF-5A, CTF-10A and CTF-M.]
  11. Gains in infAP

    Category Descriptors:
    - manual: 13.5%
    - automatic: 7.5%
    Category Filter:
    - manual: 13.5%
    - automatic: 7.5%
    Classification Features:
    - automatic: 10.4%
  12. Impact of the Taxonomy

    [Figure: BOW compared with HYBRID-M across Q1, Q2, Q3 and Q4; both axes from 0 to 1.]
  13. References

    Gabrilovich, E. & Markovitch, S. (2005). Feature generation for text categorization using world knowledge. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, volume 19, pp. 1048–1053.
    Anagnostopoulos, A.; Broder, A. Z.; Gabrilovich, E.; Josifovski, V. & Riedel, L. (2007). Just-in-time Contextual Advertising. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pp. 331–340.
    Carpineto, C. & Romano, G. (1999). Towards more effective techniques for automatic query expansion. In Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries, pp. 126–141.
    Yilmaz, E. & Aslam, J. (2006). Estimating average precision with incomplete and imperfect judgments. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management, pp. 102–111.
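The vector representation of slide 2 and the baseline ranking of slide 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes documents are already tokenized, uses one common TF-IDF variant (raw term frequency times log(N / df)), and takes "vector similarity" to mean cosine similarity. The names `tfidf_vectors`, `cosine` and `recommend`, and the toy data, are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Build sparse TF-IDF vectors: weight = tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append(
            {t: term_freq[t] * math.log(n_docs / doc_freq[t]) for t in term_freq}
        )
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(
    page: List[str], books: Dict[str, List[str]], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Rank books by similarity to the page and return the top_k."""
    ids = list(books)
    vectors = tfidf_vectors([page] + [books[i] for i in ids])
    page_vec, book_vecs = vectors[0], vectors[1:]
    scored = [(i, cosine(page_vec, v)) for i, v in zip(ids, book_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy data, for illustration only.
page = ["barcelona", "real", "madrid", "final"]
books = {
    "nightlife": ["barcelona", "nightlife", "guide"],
    "painters": ["madrid", "painters"],
    "cooking": ["pasta", "recipes"],
}
top = recommend(page, books)
print(top)
```

On this toy data the two top-ranked books are a painters' guide and a nightlife guide for a page about a football final, which mirrors the user-context problem of slide 5: matching on shared words alone does not capture what the page is about.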