
Collaborative Topic Models for Users and Texts

Chong Wang, senior research scientist at Baidu, presents his research on probabilistic topic models.

Video here: https://www.hakkalabs.co/articles/collaborative-topic-models-for-users-and-texts

Hakka Labs

May 13, 2015

Transcript

  1. Collaborative Topic Models for Users and Texts. Chong Wang. Work done while at Princeton University with David Blei. Current affiliation: Baidu AI Lab. Some materials were adapted from David Blei's slides. Mar 26, 2015.
  2. Outline: Introduction to Topic Modeling; Collaborative Topic Models for Users and Texts; An Interactive Demonstration.
  3. Some of My Work Related to Topic Modeling.
  4. Text data: hierarchical organization (Wang & Blei, NIPS 2009; Paisley, Wang, Blei & Jordan, PAMI 2014).
  5. Image data: classification and annotation (Wang, Blei & Fei-Fei, CVPR 2009). [Example figure: true class "snowboarding" with annotations skier, ski, tree, water, boat, building, sky, residential area; predicted class "snowboarding" with predicted annotations athlete, sky, tree, water, plant, ski, skier. Graphical-model figure: nodes represent random variables, edges denote possible dependence structure.]
  6. Usage data: document and music recommendation. Document recommendation (Wang & Blei, KDD 2011); music recommendation (Weston, Wang, Weiss & Berenzweig, ICML 2012).
  7. Network data: community detection (Gopalan, Wang & Blei, NIPS 2013). [Figure: discovered community structure and node popularities in a giant component of the netscience collaboration network and in the political blog network; each link in the collaboration network denotes a collaboration between two authors.]
  8. Topic modeling: documents exhibit multiple topics (themes). [Figure: the intuitions behind latent Dirichlet allocation. Topics are distributions over words, e.g. "gene 0.04, dna 0.02, genetic 0.01, ...", "life 0.02, evolve 0.01, organism 0.01, ...", "brain 0.04, neuron 0.02, nerve 0.01, ...", "data 0.02, number 0.02, computer 0.01, ...". We assume that some number of topics exist for the whole collection. Each document is assumed to be generated as follows: first choose a distribution over the topics; then, for each word, choose a topic assignment and choose the word from the corresponding topic.]
  9. Latent Dirichlet allocation (LDA), Blei et al., 2003. [Graphical model, build 1: the K topics β_k, illustrated by example topics such as "input, output, system, ..." and "cortex, cortical, areas, ...", recovered by Bayesian model inference.]
  10. Latent Dirichlet allocation (LDA), Blei et al., 2003. [Build 2: adds the hyperparameter α and the per-document topic proportions θ, with plate D over documents.]
  11. Latent Dirichlet allocation (LDA), Blei et al., 2003. [Build 3: adds the per-word topic assignments z and the observed words w, with plate N over the words of each document.]
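To make the generative story on these slides concrete, here is a minimal sketch of the LDA generative process in Python/NumPy. The vocabulary size, topic count, document length, and Dirichlet hyperparameters are illustrative placeholders, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 1000                 # vocabulary size (illustrative)
K = 20                   # number of topics (illustrative)
D = 100                  # number of documents
N = 50                   # words per document (fixed here for simplicity)
alpha, eta = 0.1, 0.01   # Dirichlet hyperparameters (assumed values)

# Topics: K distributions over the vocabulary, beta_k ~ Dirichlet(eta).
beta = rng.dirichlet(np.full(V, eta), size=K)

docs = []
for d in range(D):
    theta = rng.dirichlet(np.full(K, alpha))              # topic proportions for document d
    z = rng.choice(K, size=N, p=theta)                    # topic assignment for each word
    w = np.array([rng.choice(V, p=beta[k]) for k in z])   # each word drawn from its topic
    docs.append(w)
```

Inference (the next slides) goes in the other direction: given only the observed words, it recovers the topics and topic proportions.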
  12. LDA model: inference. [Graphical model with topic assignments z, words w, topics β (plate K), topic proportions θ, hyperparameter α, and plates N and D.]
  13. LDA model: inference. [Same graphical model; Bayesian inference recovers the topics (e.g. "input, output, system, ...", "cortex, cortical, areas, ...") and the per-document topic proportions from the observed words.]
  14. Example: a 200-topic LDA model. Data: article titles and abstracts from CiteULike. 16,980 articles, 1.6M words, 8K unique terms.
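A model of this kind can be fit with standard tools. The sketch below uses gensim's LdaModel on a toy stand-in corpus; the documents, pass count, and topic count here are placeholders, not the CiteULike setup from the talk (which used 200 topics on ~17K abstracts).

```python
from gensim import corpora, models

# Toy stand-in for tokenized titles+abstracts.
texts = [
    ["gene", "expression", "tissue", "regulation"],
    ["wireless", "routing", "protocol", "sensor", "node"],
    ["probability", "sampling", "markov", "estimation"],
]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# The talk used num_topics=200; a toy corpus only supports a handful.
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=5, passes=10)

# Inspect learned topics and one document's topic proportions.
print(lda.show_topics(num_topics=5, num_words=5))
print(lda.get_document_topics(corpus[0]))
```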
  15. Learned topics (top words per topic):
    - gene, genes, expression, tissues, regulation, coexpression, tissuespecific, expressed, tissue, regulatory
    - nodes, wireless, protocol, routing, protocols, node, sensor, peertopeer, scalable, hoc
    - distribution, random, probability, distributions, sampling, stochastic, markov, density, estimation, statistics
    - learning, machine, training, vector, learn, machines, kernel, learned, classifiers, classifier
    (Short labels on the slide: wireless, gene, probability, classifier.)
  16. Learned topic proportions for one article. [Bar chart of topic proportions; prominent topics: "relative, importance, give, original, respect, obtain, ranking"; "large, small, numbers, larger, extremely, amounts, smaller"; "web, semantic, pages, page, metadata, standards, rdf, xml".]
  17. Learned topic proportions for another article: "Maximum Likelihood from Incomplete Data via the EM Algorithm" (Dempster, Laird & Rubin, 1977). [Screenshot of the paper's first page; bar chart of topic proportions with prominent topics: "estimate, estimates, likelihood, maximum, estimated, missing"; "distribution, random, probability, distributions, sampling"; "algorithm, signal, input, signals, output, exact, performs".]
  18. People read documents. These user-text data tell us how people read documents.
  19. People read documents. These user-text data tell us how people read documents. Given these data, we hope to: help people find documents that they are interested in; learn what the documents mean to the people who read them; learn about the people reading the documents.
  20. Collaborative topic models (Wang & Blei, KDD 2011). [Illustration: two groups of users, STATS and VISION, and "the EM paper" (Dempster, Laird & Rubin, 1977), shown as a screenshot of its first page.]
  21. Collaborative topic models (Wang & Blei, KDD 2011). [Build: the EM paper is recommended to STATS people.]
  22. Collaborative topic models (Wang & Blei, KDD 2011). [Same illustration: recommend the EM paper to STATS people.]
  23. Collaborative topic models (Wang & Blei, KDD 2011). [Build: still recommending the EM paper to STATS people.]
  24. Collaborative topic models (Wang & Blei, KDD 2011). [Build: recommend the EM paper to both STATS and VISION people.]
  25. Collaborative topic models. Topic proportions + corrections = article representation. User behavior changes the way we should look at the data.
  26. Collaborative topic models. [Graphical model combining topic modeling (topics β, topic proportions θ, topic assignments z, document words w; plates N and K) with matrix factorization (user preferences u, ratings r; plates I users and J documents), linked through the corrections ε that turn topic proportions into article representations.] C. Wang and D. Blei, KDD 2011; P. Gopalan, L. Charlin and D. Blei, NIPS 2014 (a better formulation).
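The key quantity in the collaborative topic model is the article representation: topic proportions plus a correction learned from who read the article (slide 25's "topic proportions + corrections = article representation"). Below is a minimal NumPy sketch of how predictions are formed from already-fitted parameters; all sizes and random values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

K, n_users, n_docs = 20, 50, 200                       # illustrative sizes

theta = rng.dirichlet(np.full(K, 0.1), size=n_docs)    # per-article topic proportions (from LDA)
epsilon = 0.1 * rng.standard_normal((n_docs, K))       # per-article corrections, learned from ratings
u = 0.1 * rng.standard_normal((n_users, K))            # per-user preference vectors

v = theta + epsilon        # article representation = topic proportions + corrections
scores = u @ v.T           # predicted affinity of every user for every article

top10_for_user0 = np.argsort(-scores[0])[:10]          # articles to recommend to user 0
```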
  27. The data. From citeulike.org: 5.5K users and 17K research articles with abstracts. From mendeley.com: 80K users and 261K research articles with abstracts.
  28. Two types of recommendations. [Illustration: a user-article matrix with articles such as "Maximum likelihood from incomplete data via the EM algorithm", "Conditional random fields", "Introduction to variational methods for graphical models", "The mathematics of statistical machine translation", and "your new article".] In-matrix prediction: recommend articles that already have ratings from some users. Out-of-matrix prediction: recommend a brand-new article that nobody has rated yet.
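One way to read the distinction, following the KDD 2011 formulation as I understand it: for an in-matrix article the correction ε has been informed by who already read it, while a brand-new, out-of-matrix article has no ratings, so no correction is available and the prediction falls back on content alone. A self-contained sketch (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 20
u_i = 0.1 * rng.standard_normal(K)          # one user's preference vector (placeholder)
theta_j = rng.dirichlet(np.full(K, 0.1))    # topic proportions of an existing article
eps_j = 0.1 * rng.standard_normal(K)        # its correction, learned from past readers

# In-matrix: the article has been read before, so its corrected representation is available.
in_matrix_score = u_i @ (theta_j + eps_j)

# Out-of-matrix: a brand-new article has no ratings and hence no learned correction;
# the prediction relies on the topic proportions of its abstract alone.
theta_new = rng.dirichlet(np.full(K, 0.1))
out_of_matrix_score = u_i @ theta_new
```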
  29. Recommendation performance on CiteULike. [Plot: recall vs. number of recommended articles (50 to 200), with in-matrix and out-of-matrix panels; recall roughly in the 0.2 to 0.8 range. Methods compared: CoTM, LDA, MF.]
  30. Recommendation performance on Mendeley. [Plot: recall vs. number of recommended articles (50 to 200), with in-matrix and out-of-matrix panels; recall roughly in the 0.00 to 0.25 range. Methods compared: CoTM, LDA, MF.]
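The plots on these two slides report recall as a function of the number of recommended articles M. Under the usual definition for this task, each held-out article in a user's library that appears in that user's top-M list counts as a hit; a sketch of the computation (variable names are mine):

```python
import numpy as np

def recall_at_m(scores, held_out, m):
    """scores: (n_users, n_docs) predicted affinities;
    held_out: list of sets, held_out[i] = ids of articles user i actually saved;
    m: number of recommended articles per user."""
    recalls = []
    for i, liked in enumerate(held_out):
        if not liked:
            continue
        top_m = set(np.argsort(-scores[i])[:m])
        recalls.append(len(top_m & liked) / len(liked))
    return float(np.mean(recalls))

# Tiny example: 2 users, 6 articles, recommend 2 articles each.
scores = np.array([[0.9, 0.1, 0.8, 0.3, 0.2, 0.7],
                   [0.2, 0.6, 0.1, 0.9, 0.5, 0.3]])
print(recall_at_m(scores, [{0, 2, 5}, {3}], m=2))   # ~0.83
```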
  31. More than recommendation. [Screenshot of the first page of "Maximum Likelihood from Incomplete Data via the EM Algorithm" (Dempster, Laird & Rubin, 1977).]
  32. Before users read it (CiteULike). [Plot: topic weight by topic index; dominant topics: "estimation, likelihood, maximum, parameters, methods, estimators" and "algorithm, algorithms, optimization, problem, efficient, problems".]
  33. After users read it (CiteULike). [Plot: topic weight by topic index; dominant topics now also include "image, images, segmentation, algorithm, registration, camera" and "bayesian, model, inference, models, probability, probabilistic", alongside the two topics from the previous slide.]
  34. Another article from CiteULike: "Phase-of-firing coding of natural visual stimuli in primary visual cortex." [Plot: topic weight by topic index; dominant topic: "neurons, responses, neuronal, spike, cortical, stimuli, stimulus".]
  35. More than recommendation. [Illustration: the user-article matrix with articles such as "Maximum likelihood from incomplete data via the EM algorithm", "Conditional Random Fields", "Introduction to Variational Methods for Graphical Models", and "The Mathematics of Statistical Machine Translation".] We can look at posterior estimates to find: widely read articles in a field; articles in a field that are widely read in other fields; articles from other fields that are widely read in a field. These are possible through interpretable latent topics.
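The three kinds of lists on this slide can be derived by comparing, for each topic, how much an article is about the topic (its topic proportion θ) with how much it is actually read under that topic (its corrected representation v). The sketch below is one plausible way to cut those lists, not necessarily the exact procedure behind the next two slides; the thresholds, list sizes, and data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
K, n_docs = 20, 500
theta = rng.dirichlet(np.full(K, 0.1), size=n_docs)     # content: what articles are about
v = theta + 0.1 * rng.standard_normal((n_docs, K))      # readership-corrected representation

k = 7                  # a topic of interest
about = theta[:, k]    # how much each article is about topic k
read = v[:, k]         # how much each article is read under topic k

# About this topic AND popular in this topic.
about_and_popular = np.argsort(-(about * read))[:3]
# About this topic but popular mostly in other topics (high content, lower in-topic readership).
about_popular_elsewhere = np.argsort(-(about - read))[:3]
# Not about this topic yet popular in this topic (low content, high in-topic readership).
not_about_but_popular = np.argsort(-(read - about))[:3]
```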
  36. Topic: Maximum Likelihood (top words: estimates, likelihood, maximum, parameters, method).
    - About this topic, popular in this topic: Maximum Likelihood Estimation of Population Parameters; Bootstrap Methods: Another Look at the Jackknife; R. A. Fisher and the Making of Maximum Likelihood.
    - About this topic, popular in other topics: Maximum Likelihood from Incomplete Data with the EM Algorithm; Bootstrap Methods: Another Look at the Jackknife; Tutorial on Maximum Likelihood Estimation.
    - NOT about this topic, popular in this topic: Random Forests; Identification of Causal Effects Using Instrumental Variables; Matrix Computations.
  37. Topic: Network science (top words: networks, topology, connected, nodes, links, degree).
    - About this topic, popular in this topic: Assortative Mixing in Networks; Characterizing the Dynamical Importance of Network Nodes and Links; Subgraph Centrality in Complex Networks.
    - About this topic, popular in other topics: Assortative Mixing in Networks; The Structure and Function of Complex Networks; Statistical Mechanics of Complex Networks.
    - NOT about this topic, popular in this topic: Power Law Distributions in Empirical Data; Graph Structure in the Web; The Origins of Bursts and Heavy Tails in Human Dynamics.
  38. The "corrections" phenomenon is not alone (Neiswanger, Wang, Ho & Xing, UAI 2014). [Figure: for the Simple English Wikipedia article "Sistine Chapel", initial topics (e.g. "built, side, large, design"; "italy, italian, china, russian"; "church, christ, jesus, god") are compared with topics after random offsets and with offsets learned from links; the link-informed offsets better match the citing documents (Raphael, Ten Commandments, Chapel, Apostolic Palace, St. Peter's Basilica) and the predicted links (Chapel, Christian, Italy).] A similar observation holds in citation networks.
  39. Summary. Collaborative topic models: blend content-based and rating-based recommendations; discover patterns in how people read and how documents are read; suggest new ways of doing document recommendation.
  40. Thank you! BTW: Baidu AI Lab is hiring research scientists and software engineers! [email protected]