Slide 1

Learning User Preferences from Multi-Modal Data
Hady W. Lauw and Maksim Tkachenko

Slide 2


Slide 3


Slide 4


Slide 5


Slide 6

Singapore Management University
• 1 of 4 major research universities in Singapore
• City university in downtown Singapore
• Established in 2000
• 10 thousand students (20% postgraduates)

Slide 7

School of Information Systems

Slide 8

Web Mining Group
Designing algorithms for mining user-generated data of various modalities, for understanding the behaviors and preferences of users, individually and collectively, and applying the mined knowledge to develop user-centric applications.
Members: Hady, Maksim

Slide 9

LEARNING USER PREFERENCES FROM MULTI-MODAL DATA

Slide 10

So many choices…
• 133,520 results for "Men's Shoes"
• 220,721 results for "iPhone 7 case"
• 1,627,213 results for "Kindle eBooks"

Slide 11

… so little time (and space) https://www.statista.com/statistics/274774/forecast-of-mobile-phone-users-worldwide/

Slide 12

Personalized Recommendation

Slide 13

Netflix Prize https://bits.blogs.nytimes.com/2009/09/21/netflix-awards-1-million-prize-and-starts-a-new-contest/

Slide 14

Matrix Factorization
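To make the technique concrete, here is a minimal, illustrative sketch of matrix factorization trained with stochastic gradient descent. The function name, toy ratings, and hyperparameters below are invented for illustration and are not taken from any system discussed in the talk:

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=1000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ r_ui."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # gradient step on user factor
                Q[i][f] += lr * (err * pu - reg * qi)  # gradient step on item factor
    return P, Q

# Toy user-item ratings: (user, item, rating)
ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 1, 1), (2, 1, 5)]
P, Q = factorize(ratings, n_users=3, n_items=2)
# A missing cell (user 2, item 0) is predicted by the same dot product
pred = sum(P[2][f] * Q[0][f] for f in range(2))
```

The same dot-product form underlies the Gaussian and Poisson variants mentioned later in the deck; only the likelihood placed on the observed rating changes.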

Slide 15

Rating-based Preferences
• Models
  – Matrix Factorization (Gaussian, Poisson)
  – Probabilistic Latent Semantic Analysis
  – Restricted Boltzmann Machines
  – Neighborhood-based recommendation
• Sparsity
  – Most users have very few recorded interactions
  – Newly launched items have no history
• Over-reliance on pointwise observations
  – Model overfitting
  – "More of the same" problem
Strategy: going beyond ratings

Slide 16

Multi-Modal Preference Signals
Diagram: signals around a user and how each connects to preferences:
• Metadata (structured text)
• Review (unstructured text)
• Rating (numerical): collaborative filtering
• Photos/images (e.g., Instagram): similarity
• Social network: similarity

Slide 17

Preferred.AI: Preferences and Recommendations from Data & AI
• Data Infrastructure & Representation Learning: focused crawling framework; unified product catalogue; pre-trained features & resources
• Preference Learning Algorithms: multi-modal; multi-relational; multi-faceted
• Recommendation Retrieval Engine: real-time personalization; indexable representations; sessionization
• Apps: ThriftCity (global search engine for offers); FoodRecce (food recommendation for groups); end-to-end recommendation framework
5-year funding from the Singapore National Research Foundation (NRF) Fellowship

Slide 18

LEARNING USER PREFERENCES FROM MULTI-MODAL DATA
• Preference Signal from Review Images
• Preference Signal from Review Text
• Preference Signal from Social Networks
Reference:
• Quoc-Tuan Truong and Hady W. Lauw, "Visual Sentiment Analysis for Review Images with Item-Oriented and User-Oriented CNN", ACM Multimedia (ACM MM'17), Oct 2017

Slide 19

Review Images

Slide 20

Preference Signal from Sentiment
• Sentiment Analysis: "Akamaru Modern with Kakuni (braised pork belly) topping - Hands down THE best bowl of ramen I've had in my life!" is classified as Positive or Negative.
• Visual Sentiment Analysis: classify a review image as Positive or Negative, i.e., an image classification problem.

Slide 21

Neural Network
https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/

Slide 22

Deep Neural Network
http://neuralnetworksanddeeplearning.com/chap5.html

Slide 23

Convolutional Neural Network
https://www.mathworks.com/discovery/convolutional-neural-network.html

Slide 24

Convolution (3 x 3 filter)
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
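The sliding-filter operation on this slide can be written in a few lines of plain Python. This is a minimal sketch of a valid (no-padding) 2D convolution; the 5 x 5 binary input and 3 x 3 filter values are chosen purely for illustration:

```python
def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Element-wise multiply the window by the kernel and sum
            out[i][j] = sum(image[i + di][j + dj] * kernel[di][dj]
                            for di in range(kh) for dj in range(kw))
    return out

# 5x5 binary input, 3x3 filter -> 3x3 feature map
image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
feature_map = conv2d(image, kernel)  # -> [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

In a CNN such as VS-CNN, many such filters are learned per layer, each producing one feature map.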

Slide 25

Pooling
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
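Pooling downsamples a feature map by summarizing each local window. A minimal sketch of 2 x 2 max pooling with stride 2, on a made-up feature map:

```python
def max_pool(fmap, size=2, stride=2):
    """Max pooling: keep the maximum value in each size x size window."""
    out = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            row.append(max(fmap[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out

fmap = [[1, 1, 2, 4],
        [5, 6, 7, 8],
        [3, 2, 1, 0],
        [1, 2, 3, 4]]
pooled = max_pool(fmap)  # -> [[6, 8], [3, 4]]
```

This halves each spatial dimension while keeping the strongest filter responses, which makes the representation smaller and more translation-tolerant.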

Slide 26

VS-CNN Architecture
Visual Sentiment CNN (VS-CNN): input of 227 x 227 x 3; convolutional layers conv1 to conv5 (feature maps of 55 x 55, 27 x 27, and 13 x 13; 48 to 192 filters); fully-connected layers fc6 and fc7 (2048 units); output layer fc8 (2 sentiment classes).

Slide 27

Experiments on Yelp Dataset
• Size: 96 thousand images; 8 thousand businesses; 27 thousand users
• Coverage: Boston, Chicago, Houston, Los Angeles, New York, San Francisco, Seattle
• Sentiment classes: Negative (ratings 1 and 2); Positive (ratings 4 and 5)

                      Random   Naïve Bayes   VS-CNN
Pointwise Accuracy    0.500    0.539         0.544
Pairwise Accuracy     0.500    0.551         0.572

Slide 28

Positive Sentiment Images

Slide 29

Negative Sentiment Images

Slide 30

Item-oriented Parameters: Convolutional Layer

                      VS-CNN   Item-oriented VS-CNN
                               conv1    conv3    conv5
Pointwise Accuracy    0.544    0.563    0.610    0.612
Pairwise Accuracy     0.572    0.592    0.655    0.660

Slide 31

Item-oriented Parameters: Fully-Connected Layer

                      VS-CNN   Item-oriented VS-CNN
                               conv1    conv3    conv5    fc7
Pointwise Accuracy    0.544    0.563    0.610    0.612    0.620
Pairwise Accuracy     0.572    0.592    0.655    0.660    0.678

Slide 32

User-oriented Parameters

                      VS-CNN   User-oriented VS-CNN
                               conv1    conv3    conv5    fc7
Pointwise Accuracy    0.539    0.596    0.638    0.646    0.649
Pairwise Accuracy     0.556    0.639    0.686    0.706    0.743

Slide 33

LEARNING USER PREFERENCES FROM MULTI-MODAL DATA
• Preference Signal from Review Images
• Preference Signal from Review Text
• Preference Signal from Social Networks
Reference:
• Maksim Tkachenko and Hady W. Lauw, "Comparative Relation Generative Model", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2017

Slide 34

Preference Signal from Comparison
Which camera has better image quality? Canon EOS 7D vs. Nikon D300S

Slide 35

Turn to Review Text
Identify and interpret comparisons expressed in texts:
"Compared to the Canon 7D the Nikon D300s gives sharper pictures with less noise and great details over iso 400."

Slide 36

Comparison Mining ≠ Sentiment Mining

Slide 37

Questions
Given a set of comparative sentences:
• Each about two products (e.g., 7D vs. D300S)
• On a specific aspect (e.g., image quality)
1. How can we understand the comparative direction in each sentence?
2. Overall, taking into account all sentences, which entity is better?

Slide 38

Insight: Better Together
Assignment: complete the ranking if you do not know what "superior" means.
Corpus of Comparisons: (example sentences shown on slide)

Slide 39

Generative Model for Comparative Sentences
• Generation of comparison outcomes (which entity is better): related to competition models
• Generation of words describing the comparison: related to Naïve Bayes

Slide 40

Relation to Competition Model
Bradley-Terry-Luce (BTL):
• Player i has latent ability θ_i
• Probability that player i wins over player j in a match:
  P(i ≻ j) = σ(θ_i − θ_j),
  where σ is the logistic (sigmoid) function
• In our context:
  – Each comparative sentence simulates a match between two entities (players), with the outcome that one entity wins (is better).
  – The outcome itself is not given; it needs to be determined.
  – The outcome depends on the text of the comparative sentence.
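The BTL win probability can be sketched in a few lines. The latent abilities below are made-up numbers for illustration, not fitted values from the paper:

```python
import math

def btl_win_prob(theta_i, theta_j):
    """Bradley-Terry-Luce: P(i beats j) = sigmoid of the ability gap."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))

# Hypothetical latent abilities for three cameras
abilities = {"A": 1.2, "B": 0.4, "C": -0.5}
p = btl_win_prob(abilities["A"], abilities["B"])  # sigmoid(0.8), about 0.69
# Reading out a ranking: sort entities by latent ability
ranking = sorted(abilities, key=abilities.get, reverse=True)  # -> ['A', 'B', 'C']
```

Equal abilities give probability 0.5, and a larger ability gap pushes the probability toward 0 or 1, which is what lets many noisy per-sentence "matches" aggregate into a global ranking.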

Slide 41

Relation to Naïve Bayes
• The meaning of a sentence changes if:
  – Words are different ("better" vs. "worse")
  – Word order is different: "A is better than B" vs. "B is better than A"
• We distinguish whether a word appears before the first-mentioned entity (#1), in between, or after the second-mentioned entity (#2):
  – #1 is favored: "#1 .. better .. #2", "#1 .. sharper .. #2"
  – #2 is favored: "#1 .. #2 .. better", "#1 .. #2 .. sharper"

Slide 42

CompareGem (COMPArative RElation GEnerative Model)
Generation of comparative sentences.
• Latent parameters: entity ranks, comparison directions, feature distributions
• Observations: features (the words of each sentence)

Slide 43


Slide 44

Dataset
• Amazon reviews for 180 digital cameras
• Supervised settings: 50% training, 50% testing

Aspect          #sentences   #1 entity favored   #2 entity favored
Functionality   457          38.5%               61.5%
Form Factor     78           61.3%               38.7%
Image Quality   129          58.1%               41.9%
Price           165          52.1%               47.9%

Slide 45

Comparative Direction
• Binary classification of each sentence (#1 entity is better or worse)

Aspect          CompareGem   SVM      Naïve Bayes
Functionality   89.0%        76.6%    74.4%
Form Factor     71.5%        57.8%    62.8%
Image Quality   73.8%        65.4%    64.5%
Price           68.7%        52.8%    55.2%

Slide 46

Entity Ranking
• Pairwise ranking of entities, with majority votes as ground truth

Aspect          CompareGem   SVM + BTL   Naïve Bayes + BTL
Functionality   89.7%        88.6%       88.8%
Form Factor     82.7%        79.8%       82.7%
Image Quality   80.7%        78.7%       80.6%
Price           79.0%        75.8%       76.7%

Slide 47

LEARNING USER PREFERENCES FROM MULTI-MODAL DATA
• Preference Signal from Review Images
• Preference Signal from Review Text
• Preference Signal from Social Networks
Reference:
• Trong T. Nguyen and Hady W. Lauw, "Representation Learning for Homophilic Preferences", ACM Conference on Recommender Systems (RecSys'16), Sep 2016

Slide 48

Preference Signal from Social Links
(Lauw et al., Internet Computing 2010)
Diagram: a social network among users A, B, C, alongside their item adoptions.
"Birds of a feather flock together" (Russian proverb: "рыбак рыбака видит издалека", a fisherman spots a fisherman from afar)

Slide 49

Restricted Boltzmann Machines
Stochastic generative artificial neural networks.
• Let x be a binary vector of visible units
• Let h be a binary vector of hidden units
• a, b are biases; W are weights
• Energy function: E(x, h) = −aᵀx − bᵀh − xᵀWh
• Likelihood: P(x, h) = e^(−E(x, h)) / Z, where Z is the normalizing partition function
• Individual activation probabilities:
  P(h_j = 1 | x) = σ(b_j + Σ_i W_ij x_i)
  P(x_i = 1 | h) = σ(a_i + Σ_j W_ij h_j)
https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine
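As a concrete reading of the activation-probability formula, a tiny sketch that computes P(h_j = 1 | x) for one visible vector. The weights, biases, and dimensions below are made-up numbers for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_activations(x, W, b):
    """P(h_j = 1 | x) = sigmoid(b_j + sum_i W_ij * x_i)."""
    return [sigmoid(b[j] + sum(W[i][j] * x[i] for i in range(len(x))))
            for j in range(len(b))]

# 3 visible units (items adopted or not), 2 hidden units
x = [1, 0, 1]
W = [[0.5, -0.2],
     [0.3, 0.8],
     [-0.1, 0.4]]
b = [0.0, 0.1]
probs = hidden_activations(x, W, b)  # sigmoid(0.4), sigmoid(0.3)
```

In the collaborative-filtering use on the next slide, x is a user's adoption vector and these hidden activations serve as the latent user representation.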

Slide 50

RBM for Collaborative Filtering (Salakhutdinov et al., ICML 2007)
• Each item corresponds to a visible unit
• Value of visible units may be ratings (from 1 to 5)
  – softmax instead of sigmoid
  – for simplicity, subsequent discussion is on binary adoption
• Each user corresponds to an RBM instance
  – parameter sharing across users
• The hidden layer serves as the latent user representation

Slide 51

SocialRBM
Integrating the social network via hidden layers/representations in an RBM-based approach:
• No user-specific parameters for social-network constraints
• In the context of the item-adoption prediction task
• Explores both user-item (UI) and user-user (UU) connections

Slide 52

Model 1: SocialRBM-Wing
Social network as observation: social connections and adoptions both play the role of observations, encoded jointly through a shared hidden layer.
(Energy function and activation probabilities shown on slide.)

Slide 53

Model 2: SocialRBM-Deep
Social network as sharing of hidden units: the top layer h2 has U hidden units, corresponding to U users; each user is represented by a single hidden unit on the top layer, with weights shared with their friends.
(Energy function and activation probabilities shown on slide.)

Slide 54

Experiments: Datasets
• Two public datasets: Delicious and Last.FM

Slide 55

Model Comparison (Delicious)
• Task: item-adoption prediction
• Metric: Recall@[20…200]

Slide 56

Network Randomization (Delicious)
Social network vs. random network* comparison in the prediction task.
* Obtained by exchanging edges/links in the network while preserving node degrees.

Slide 57

Results for Last.FM

Slide 58

Conclusion
• Harnessing multi-modal preference signals
  – Images, text, social networks, in addition to ratings/adoptions
• Work in progress
  – Still far from full personalization of user experiences
• Future work
  – Additional modalities (e.g., metadata); joint modalities
  – End-to-end recommendation framework
• Opportunities to get involved
http://hadylauw.com
http://mtkachenko.info

Slide 59

THANK YOU
Contact us:
http://hadylauw.com
http://mtkachenko.info