Slide 1

Slide 1 text

Collaborative Topic Models for Users and Texts
Chong Wang
Work done while at Princeton University, with David Blei
Current affiliation: Baidu AI Lab
Mar 26, 2015
Some materials were adapted from David Blei’s slides.

Slide 2

Slide 2 text

Outline
Introduction to Topic Modeling
Collaborative Topic Models for Users and Texts
An Interactive Demonstration

Slide 3

Slide 3 text

Introduction to Topic Modeling

Slide 4

Slide 4 text

Lots of data

Slide 5

Slide 5 text

Topic modeling: ORGANIZE, SUMMARIZE, DISCOVER, VISUALIZE.

Slide 6

Slide 6 text

Some of My Work related to Topic Modeling

Slide 7

Slide 7 text

Text data—hierarchical organization (Wang & Blei, NIPS 2009) (Paisley, Wang, Blei & Jordan, PAMI 2014)

Slide 8

Slide 8 text

Image data—classification and annotation (Wang, Blei & Fei-Fei, CVPR 2009). [Example image: class: snowboarding; annotations: skier, ski, tree, water, boat, building, sky, residential area. Predicted class: snowboarding; predicted annotations: athlete, sky, tree, water, plant, ski, skier. Figure: the graphical model, in which nodes represent random variables and edges denote possible dependencies.]

Slide 9

Slide 9 text

Usage data—document/music recommendation. Document recommendation (Wang & Blei, KDD 2011); music recommendation (Weston, Wang, Weiss & Berenzweig, ICML 2012).

Slide 10

Slide 10 text

Network data—community detection (Gopalan, Wang & Blei, NIPS 2013). [Figure: discovered community structure and node popularities in a giant component of the netscience collaboration network, where each link denotes a collaboration between two authors, and in a political blog network; methods labeled AMP and MMSB.]

Slide 11

Slide 11 text

Topic Modeling Basics

Slide 12

Slide 12 text

Topic modeling: documents exhibit multiple topics (themes). [Figure: the intuitions behind latent Dirichlet allocation. Some number of "topics", which are distributions over words, exist for the whole collection, e.g. "gene 0.04, dna 0.02, genetic 0.01, ...", "life 0.02, evolve 0.01, organism 0.01, ...", "brain 0.04, neuron 0.02, nerve 0.01, ...", "data 0.02, number 0.02, computer 0.01, ...". Each document is assumed to be generated by first choosing a distribution over the topics, then, for each word, choosing a topic assignment and choosing the word from the corresponding topic.]

Slide 13

Slide 13 text

Latent Dirichlet allocation (LDA), Blei et al., 2003. [Plate diagram, first build: the K topics, each a distribution over words; example topics: "Bayesian, model, inference, ...", "input, output, system, ...", "cortex, cortical, areas, ...". Labels: topics, topic proportions.]

Slide 14

Slide 14 text

Latent Dirichlet allocation (LDA), Blei et al., 2003. [Plate diagram, second build: adds the Dirichlet prior and the per-document topic proportions, replicated over the D documents, alongside the K topics.]

Slide 15

Slide 15 text

Latent Dirichlet allocation (LDA), Blei et al., 2003. [Full plate diagram: for each of the D documents, topic proportions; for each of its N words, a topic assignment and the observed word, drawn from one of the K topics. Labels: topics, topic proportions, topic assignments.]
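To make the generative story concrete, here is a minimal sketch of the LDA generative process in Python; the priors `alpha` and `eta`, the sizes, and the variable names are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, n_topics, n_docs, doc_len = 1000, 5, 10, 50
alpha = np.full(n_topics, 0.1)   # Dirichlet prior on per-document topic proportions
eta = np.full(vocab_size, 0.01)  # Dirichlet prior on the topics themselves

# Topics: K distributions over the vocabulary (the topic nodes in the plate diagram).
beta = rng.dirichlet(eta, size=n_topics)

docs = []
for d in range(n_docs):
    theta = rng.dirichlet(alpha)                        # topic proportions for document d
    z = rng.choice(n_topics, size=doc_len, p=theta)     # topic assignment for each word
    w = [rng.choice(vocab_size, p=beta[k]) for k in z]  # draw each word from its topic
    docs.append(w)
```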

Slide 16

Slide 16 text

LDA model: inference. [Plate diagram: given only the observed words, infer the topics, topic proportions, and topic assignments.]

Slide 17

Slide 17 text

LDA model: inference. [Plate diagram, now with inferred example topics shown: "Bayesian, model, inference, ...", "input, output, system, ...", "cortex, cortical, areas, ...".]

Slide 18

Slide 18 text

Example: a 200-topic LDA model. Data: article titles and abstracts from CiteULike; 16,980 articles, 1.6M words, 8K unique terms.
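As a rough illustration of fitting a model of this size, the sketch below uses scikit-learn's online variational LDA; the talk's experiments used the authors' own implementation, and `abstracts` (a list of title-plus-abstract strings) and the preprocessing choices here are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# abstracts: assumed list of ~17K "title + abstract" strings.
vectorizer = CountVectorizer(max_features=8000, stop_words="english")
counts = vectorizer.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=200, learning_method="online", random_state=0)
theta = lda.fit_transform(counts)        # per-article topic proportions

# Top ten words per learned topic.
vocab = vectorizer.get_feature_names_out()
top_words = [[vocab[i] for i in topic.argsort()[-10:][::-1]] for topic in lda.components_]
```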

Slide 19

Slide 19 text

Learned topics (top words):
gene: gene, genes, expression, tissues, regulation, coexpression, tissuespecific, expressed, tissue, regulatory
wireless: nodes, wireless, protocol, routing, protocols, node, sensor, peertopeer, scalable, hoc
probability: distribution, random, probability, distributions, sampling, stochastic, markov, density, estimation, statistics
classifier: learning, machine, training, vector, learn, machines, kernel, learned, classifiers, classifier

Slide 20

Slide 20 text

Learned topic proportions for one article. [Bar chart of the article's topic proportions; top topics: "relative, importance, give, original, respect, obtain, ranking", "large, small, numbers, larger, extremely, amounts, smaller", "web, semantic, pages, page, metadata, standards, rdf, xml".]

Slide 21

Slide 21 text

Learned topic proportions for another article: "Maximum Likelihood from Incomplete Data via the EM Algorithm" (Dempster, Laird & Rubin, 1977). [Bar chart of topic proportions; top topics: "estimate, estimates, likelihood, maximum, estimated, missing", "distribution, random, probability, distributions, sampling", "algorithm, signal, input, signals, output, exact, performs". Image: the first page of the EM paper.]

Slide 22

Slide 22 text

Collaborative Topic Models for Users and Texts

Slide 23

Slide 23 text

People read documents. These user-text data tell us how people read documents.

Slide 24

Slide 24 text

People read documents. These user-text data tell us how people read documents. Given these data, we hope to:
help people find documents that they are interested in;
learn what the documents mean to the people who read them;
learn about the people reading the documents.

Slide 25

Slide 25 text

CiteULike

Slide 26

Slide 26 text

Mendeley

Slide 27

Slide 27 text

Collaborative topic models (Wang & Blei, KDD 2011). [Illustration: two groups of users, STATS and VISION, and the EM paper (first page shown).]

Slide 28

Slide 28 text

Collaborative topic models (Wang & Blei, KDD 2011). [Illustration, next build: the EM paper, annotated "Recommend to STATS people".]

Slide 29

Slide 29 text

Collaborative topic models (Wang & Blei, KDD 2011). [Illustration, repeated from the previous build: the EM paper, annotated "Recommend to STATS people".]

Slide 30

Slide 30 text

Collaborative topic models (Wang & Blei, KDD 2011). [Illustration, next build: both STATS and VISION readers now appear around the EM paper; annotation: "Recommend to STATS people".]

Slide 31

Slide 31 text

Collaborative topic models (Wang & Blei, KDD 2011). [Illustration, final build: annotations "Recommend to STATS people" and "Recommend to both STATS and VISION people".]

Slide 32

Slide 32 text

Collaborative topic models. Topic proportions + corrections = article representation. User behavior changes the way we should look at the data.

Slide 33

Slide 33 text

Collaborative topic models. [Plate diagram combining topic modeling and matrix factorization: topics, per-article topic proportions, corrections, user preferences, document words, and ratings; I users, J documents, K topics.] C. Wang and D. Blei, KDD 2011; P. Gopalan, L. Charlin and D. Blei, NIPS 2014 (a better formulation).
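A minimal sketch of how this combined model scores a user-article pair, following the formulation in the KDD 2011 paper (predicted rating = user preferences dotted with topic proportions plus corrections); the sizes and random values below are made up for illustration.

```python
import numpy as np

def predict_rating(u_i, theta_j, eps_j):
    """Score article j for user i: user preferences times (topic proportions + correction)."""
    v_j = theta_j + eps_j        # article representation = LDA topics + learned correction
    return float(u_i @ v_j)

K = 200
rng = np.random.default_rng(0)
u_i = rng.normal(scale=0.1, size=K)        # latent preferences of one user
theta_j = rng.dirichlet(np.full(K, 0.1))   # topic proportions of one article (from LDA)
eps_j = rng.normal(scale=0.1, size=K)      # correction learned from the article's readers
print(predict_rating(u_i, theta_j, eps_j))
```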

Slide 34

Slide 34 text

The data. From citeulike.org: 5.5K users and 17K research articles with abstracts. From mendeley.com: 80K users and 261K research articles with abstracts.

Slide 35

Slide 35 text

Two types of recommendations. [Illustration: the user-article matrix, with articles such as "Maximum likelihood from incomplete data via the EM algorithm", "Conditional random fields", "Introduction to variational methods for graphical models", "The mathematics of statistical machine translation", and "Your new article".] In-matrix prediction: recommending articles that some users have already rated. Out-of-matrix prediction: recommending a new article that no one has rated yet.
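The distinction between the two modes can be sketched as follows; this is my reading of the setup (a brand-new article has no ratings yet, so no correction can be learned for it and only its topic proportions are available), with illustrative names.

```python
import numpy as np

def score_in_matrix(u_i, theta_j, eps_j):
    # Article already has readers: use the corrected representation.
    return float(u_i @ (theta_j + eps_j))

def score_out_of_matrix(u_i, theta_new):
    # Brand-new article: no usage data, fall back to its text-only topic proportions.
    return float(u_i @ theta_new)
```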

Slide 36

Slide 36 text

Recommendation performance—CiteULike. [Plots of recall vs. number of recommended articles (50 to 200), shown separately for in-matrix and out-of-matrix prediction; methods compared: CoTM, LDA, MF.]
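For reference, the recall metric behind curves like these can be sketched as below; the function name and the held-out-set representation are assumptions, not code from the paper.

```python
import numpy as np

def recall_at_m(scores, held_out, m):
    """Fraction of a user's held-out liked articles that appear in their top-m list.

    scores: predicted scores for every candidate article (one user)
    held_out: set of article indices the user actually liked, hidden during training
    """
    top_m = set(np.argsort(-scores)[:m].tolist())
    return len(top_m & held_out) / len(held_out)

# Averaged over users and swept over m = 50, ..., 200 to trace out a recall curve.
```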

Slide 37

Slide 37 text

Recommendation performance—Mendeley. [Plots of recall vs. number of recommended articles (50 to 200), shown separately for in-matrix and out-of-matrix prediction; methods compared: CoTM, LDA, MF.]

Slide 38

Slide 38 text

More than recommendation. [Image: the first page of the EM paper (Dempster, Laird & Rubin, 1977).]

Slide 39

Slide 39 text

Before users read it (CiteULike). [Bar plot of topic weights; dominant topics: "estimation, likelihood, maximum, parameters, methods, estimators" and "algorithm, algorithms, optimization, problem, efficient, problems".]

Slide 40

Slide 40 text

After users read it (CiteULike). [Bar plot of topic weights; dominant topics now also include "image, images, segmentation, algorithm, registration, camera" and "bayesian, model, inference, models, probability, probabilistic", alongside "estimation, likelihood, maximum, parameters, methods, estimators" and "algorithm, algorithms, optimization, problem, efficient, problems".]

Slide 41

Slide 41 text

Another article from CiteULike: "Phase-of-firing coding of natural visual stimuli in primary visual cortex." [Bar plot of topic weights over 200 topics; dominant topic: "neurons, responses, neuronal, spike, cortical, stimuli, stimulus".]

Slide 42

Slide 42 text

More than recommendation. [Illustration: the user-article matrix with articles such as "Maximum likelihood from incomplete data via the EM algorithm", "Conditional Random Fields", "Introduction to Variational Methods for Graphical Models", and "The Mathematics of Statistical Machine Translation".] We can look at posterior estimates to find:
widely read articles in a field;
articles in a field that are widely read in other fields;
articles from other fields that are widely read in a field.
These analyses are possible because the latent topics are interpretable; one way to operationalize them is sketched below.
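A hedged sketch of one plausible way to build these three lists from the fitted topic proportions `theta` and corrections `eps`; this is my own operationalization of the idea, not necessarily the exact criterion used in the paper, and the threshold is illustrative.

```python
import numpy as np

def topic_article_lists(theta, eps, k, top_n=3, about_thresh=0.1):
    """Split articles by whether their text is about topic k and by their popularity there."""
    v = theta + eps                        # corrected article representations
    about = theta[:, k] > about_thresh     # articles whose text is about topic k
    by_popularity = np.argsort(-v[:, k])   # most popular with topic-k readers first

    about_and_popular = [j for j in by_popularity if about[j]][:top_n]
    not_about_but_popular = [j for j in by_popularity if not about[j]][:top_n]
    about_but_popular_elsewhere = [j for j in np.argsort(-theta[:, k])
                                   if about[j] and np.argmax(v[j]) != k][:top_n]
    return about_and_popular, about_but_popular_elsewhere, not_about_but_popular
```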

Slide 43

Slide 43 text

Topic: Maximum Likelihood (top words: estimates, likelihood, maximum, parameters, method).
About this topic, popular in this topic: Maximum Likelihood Estimation of Population Parameters; Bootstrap Methods: Another Look at the Jackknife; R. A. Fisher and the Making of Maximum Likelihood.
About this topic, popular in other topics: Maximum Likelihood from Incomplete Data via the EM Algorithm; Bootstrap Methods: Another Look at the Jackknife; Tutorial on Maximum Likelihood Estimation.
NOT about this topic, popular in this topic: Random Forests; Identification of Causal Effects Using Instrumental Variables; Matrix Computations.

Slide 44

Slide 44 text

Topic: Network science (top words: networks, topology, connected, nodes, links, degree).
About this topic, popular in this topic: Assortative Mixing in Networks; Characterizing the Dynamical Importance of Network Nodes and Links; Subgraph Centrality in Complex Networks.
About this topic, popular in other topics: Assortative Mixing in Networks; The Structure and Function of Complex Networks; Statistical Mechanics of Complex Networks.
NOT about this topic, popular in this topic: Power Law Distributions in Empirical Data; Graph Structure in the Web; The Origin of Bursts and Heavy Tails in Human Dynamics.

Slide 45

Slide 45 text

The “corrections” phenomenon is not unique to user data (Neiswanger, Wang, Ho & Xing, UAI 2014). [Figure: topic weights for the Simple English Wikipedia article "Sistine Chapel", whose text begins "The Sistine Chapel is a large chapel in the Vatican Palace, the place in Italy where the Pope lives..." Initial topics include "built, side, large, design", "italy, italian, china, russian", and "church, christ, jesus, god"; the panels compare topics after random offsets with offsets learned from links. In-links (citing documents): Raphael, Ten Commandments, Chapel, Apostolic Palace, St. Peter's Basilica. Predicted links: Chapel, Christian, Italy.] A similar observation holds in citation networks.

Slide 46

Slide 46 text

Summary. Collaborative topic models:
blend content-based and rating-based recommendation;
discover patterns in how people read and in how documents are read;
suggest new ways of doing document recommendation.

Slide 47

Slide 47 text

Demo

Slide 48

Slide 48 text

Thank you! BTW: Baidu AI Lab is hiring research scientists and software engineers! [email protected]