Slide 1

Slide 1 text

Improving Recommendation Systems with User Personality Inferred from Product Reviews
Lu Xinyuan (Presenter)1,2, Kan Min-Yen2
IRS Workshop in WSDM’23, March 3
1 ISEP Program, NUS Graduate School
2 School of Computing, National University of Singapore

Slide 2

Slide 2 text

RecSys: Item recommendations to end users
[Diagram labels: Recommendation System (RecSys); movies, music, products…; Search; Recommendations]

Slide 3

Slide 3 text

RecSys are designed to stimulate users’ consumption behavior. Such behavior is largely influenced by the user’s profile.
[Diagram labels: Recommendation System (RecSys); movies, music, products…; Search; Recommendations; user profile: age, education, demographic information]

Slide 4

Slide 4 text

Traditional RecSys focus on a user’s static profile. A user’s psychology (e.g., personality, emotion) can help model the user’s dynamic profile.
[Diagram labels: Recommendation System (RecSys); user’s static profile: age, education, demographic information; user’s dynamic profile: personality, emotion]

Slide 5

Slide 5 text

Why we need personality in RecSys
Personality affects user preference
• Personality has been shown to be directly related to user preference.
• Example:
o Open people are more likely to watch comedy movies [Ivan et al. 2013]
o Open people favor energetic music genres [Mariappan et al. 2012]

Slide 6

Slide 6 text

Why we need emotion in RecSys
Emotional state can influence people’s decisions
• The recommendation should depend on the user’s current emotional state.
• Example:
o The same user is likely to watch comedy movies when happy and tragedy movies when sad.

Slide 7

Slide 7 text

Challenges
Privacy of Personality Information
• Personality information can be misused by malicious users to cause undesirable outcomes. [Hinds et al. 2020]
• A challenging balance: utilizing information vs. protecting privacy

Slide 8

Slide 8 text

Challenges
Lack of Large Datasets
• Ground-truth psychology information is expensive to collect from users.
• Existing works have built only small-scale datasets.
• A larger dataset, myPersonality, stopped being shared in 2018.
https://sites.google.com/michalkosinski.com/mypersonality

Slide 9

Slide 9 text

Challenges
Subjectivity of Personality Measurement
• The measurement of personality can be very subjective.
• The reference-group effect often occurs. [Wu et al. 2017]
• Inaccurate measurement of users’ personality traits is likely to introduce more noise.

Slide 10

Slide 10 text

Personality Model
OCEAN (Big 5)
• Openness to experience: conventional vs. creative thinking
• Conscientiousness: disorganized vs. organized
• Extraversion: engagement with the external world
• Agreeableness: need for social harmony
• Neuroticism: emotional instability

Slide 11

Slide 11 text

Personality Detection
Explicit Method: Questionnaire
• 10-item Big Five Inventory (BFI) test
• 5-level Likert scale (strongly agree, agree, neutral, disagree, strongly disagree)
• Example: “I am outgoing, sociable” [1 2 3 4 5] (Extraversion-related)
• Time-consuming

Slide 12

Slide 12 text

Personality Detection
Implicit Methods: Automatic personality detection
• Language use differs between individuals
• Personality can be inferred from texts such as social media posts
• APIs:
1) IBM Personality Insights: discontinued after 2021
2) Receptiviti: sentence level
3) SenticNet: lexicon-based approach

Slide 13

Slide 13 text

Receptiviti API
• Receptiviti API is a computational language psychology platform for understanding human behavior.
• Receptiviti was co-founded by Prof. James W. Pennebaker, former Chair of the Department of Psychology at the University of Texas at Austin and the inventor of LIWC, the gold-standard algorithm in the field of language psychology.
https://www.receptiviti.com/

Slide 14

Slide 14 text

Receptiviti API: Personality API package
• We use the Personality API package.
• Our budget: USD 250/month, which includes 500,000 words, on a 6-month subscription.
• Ongoing work: we will discuss our own methods to replicate a personality API.
https://www.receptiviti.com/personality

Slide 15

Slide 15 text

Receptiviti example
• Input: pieces of text. The more words in the text, the higher the accuracy.
• Receptiviti needs more than 300 words.
• Output: a score for each Big 5 personality category
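The 300-word requirement above can be enforced before any text is sent for scoring. The sketch below is only an illustration: the function name `build_user_documents`, the `(user_id, review_text)` input shape, and whitespace word counting are assumptions, and it does not call the Receptiviti API.

```python
from collections import defaultdict

MIN_WORDS = 300  # threshold quoted on the slide ("more than 300 words")


def build_user_documents(reviews, min_words=MIN_WORDS):
    """Concatenate each user's reviews and keep users with enough text.

    `reviews` is an iterable of (user_id, review_text) pairs (assumed shape).
    Returns {user_id: combined_text} for users above the threshold.
    """
    texts = defaultdict(list)
    for user_id, text in reviews:
        texts[user_id].append(text)

    documents = {}
    for user_id, parts in texts.items():
        combined = " ".join(parts)
        # Whitespace tokenisation is an assumption; a scoring service
        # may count words differently.
        if len(combined.split()) > min_words:
            documents[user_id] = combined
    return documents
```

Users whose combined reviews fall short are simply dropped here; whether to drop them or pad with other text is a design choice the slides do not settle.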

Slide 16

Slide 16 text

Current Datasets
• Serendipity 2018
o A version of the MovieLens dataset.
o It is used for serendipity in RecSys.
o There are 10 million ratings.
Drawback: It supports only offline evaluation of recommendation algorithms. It does not contain real-time (online) feedback.
• Personality 2018
o A version of the MovieLens dataset.
o Includes: personality information of the users + movie ratings
Drawback: It only contains the Big 5 scores of 1,834 users, along with the movie ratings given by these users.

Slide 17

Slide 17 text

Datasets
Taobao Serendipity
• A user survey on Mobile Taobao
• The users first received a recommended product, then completed a questionnaire that assessed immediate feedback.
• They then filled in two psychological quizzes:
1) 10-item Curiosity and Exploration Inventory-II (CEI-II)
2) Ten-Item Personality Inventory (TIPI)
• This dataset contains the feedback of 11,383 users in the user survey.
Drawback: Due to commercial privacy concerns, the Taobao item descriptions and item category information are not publicly available.

Slide 18

Slide 18 text

My Work
How can we acquire personality data for RecSys?
How can we explore the impact of personality on RecSys?

Slide 20

Slide 20 text

Amazon Review dataset (updated version in 2018)
2014
• This is a large crawl of product reviews from Amazon.
• This dataset contains 82.83 million unique reviews, from around 20 million users.
• Metadata:
○ reviews and ratings
○ item-to-item relationships (e.g. "people who bought X also bought Y")
○ timestamps
○ helpfulness votes
○ product image (and CNN features)
○ price
○ product descriptions
○ category
○ sales rank
2018
• More reviews: the total number of reviews is 233.1 million (142.8 million in 2014).
• Newer reviews: current data includes reviews in the range May 1996 - Oct 2018.
Download: https://nijianmo.github.io/amazon/index.html
Info: https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews

Slide 21

Slide 21 text

Amazon Review dataset
Fields: user ID, item ID, rating score, review text

Slide 22

Slide 22 text

Personality Data Preparation
• To study whether personality influences users’ behaviours differently across domains, we choose two domains: All-Beauty and Music.
• Input: each user's review text (more than 300 words)
• Output: each user’s Big 5 personality scores
• Sample dataset after filtering:

Dataset        # of items  # of users  # of ratings  % of interaction  Avg. words per user  Avg. words per review
Amazon-Beauty  85          991         5,269         6.26%             990.48               466.43
Amazon-Music   8,895       1,791       28,399        0.18%             51.01                51.18

• 80% training / 20% testing
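The "% of interaction" column is consistent with the usual density of the user-item matrix, i.e. ratings divided by (users × items). The short check below reproduces both values from the counts in the table; the function name is hypothetical.

```python
def interaction_density(n_ratings, n_users, n_items):
    """Share of the user-item matrix that is filled, as a percentage."""
    return 100.0 * n_ratings / (n_users * n_items)


# Counts taken from the table above.
beauty = interaction_density(5269, 991, 85)
music = interaction_density(28399, 1791, 8895)

print(f"Amazon-Beauty: {beauty:.2f}%")
print(f"Amazon-Music:  {music:.2f}%")
```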

Slide 23

Slide 23 text

Personality Data Preparation
• To study the difference between questionnaire-based personality trait scores and our review-based, automatically detected personality trait scores, we also include an existing dataset: Personality 2018.
• Users: 1,834
• MovieID: [1-197,529]
• Raw ratings: 1,028,751 (scores 1-7)

# of items  # of users  # of ratings  % of interaction
197,529     1,834       339,000       0.28%

Slide 24

Slide 24 text

My Work
How can we acquire personality data for RecSys?
How can we explore the impact of personality on RecSys?

Slide 26

Slide 26 text

Models
• Baseline Models:
(1) Neural Collaborative Filtering (NCF)
(2) NCF + Random: randomly assign a personality label
(3) NCF + Same: assign the same personality label to every user
• Personality-based Models:
Single personality
(1) NCF + Most salient personality: assign the most salient personality label
Multi-personality distribution
(2) NCF + Soft-labeled: take all personality scores and obtain a personality distribution with softmax
(3) NCF + Hard-coded: directly add all personality scores as an additional feature vector in the network
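The three personality-based variants differ only in how the five trait scores become a feature vector. A minimal sketch of the three encodings follows; the trait order, the `temperature` parameter, and the rescaling by an assumed 0-100 score range are illustrative assumptions, not details given on the slides.

```python
import math

TRAITS = ["OPEN", "CON", "EXT", "AGR", "NEU"]  # assumed ordering


def most_salient(scores):
    """One-hot vector marking the highest-scoring trait."""
    top = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == top else 0.0 for i in range(len(scores))]


def soft_label(scores, temperature=1.0):
    """Softmax over the trait scores (temperature is a hypothetical knob)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hard_coded(scores, scale=100.0):
    """Raw scores used as features; dividing by `scale` is an assumption."""
    return [s / scale for s in scores]


example = [60.0, 55.0, 75.06, 80.06, 62.28]
print(most_salient(example))
print(soft_label(example, temperature=10.0))
print(hard_coded(example))
```

With raw scores on a wide range, a plain softmax is close to one-hot, which is why a temperature is shown here; whether any such scaling was used in the original work is not stated.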

Slide 27

Slide 27 text

Model
[Figure: model architecture]

Slide 28

Slide 28 text

RQ1: Can we accurately detect personality from texts?
• To evaluate whether we can accurately detect personality traits from texts, we analyze the personality scores inferred by the Receptiviti API for each user.
• We select the users that receive the 10 highest scores for each personality type, for a total of 100 samples across the two datasets.
• Two graduates are given the review texts and the inferred personality. We ask them to judge whether each sampled review text accurately matches the inferred personality, choosing between three options: yes, no, or not sure.
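Picking the highest-scoring users per trait can be sketched as below. The nested-dictionary input shape and the function name are assumptions made for the example.

```python
import heapq


def top_users_per_trait(user_scores, k=10):
    """Return {trait: [user_id, ...]} with the k highest-scoring users.

    `user_scores` maps user_id -> {trait: score} (assumed shape).
    """
    traits = set()
    for scores in user_scores.values():
        traits.update(scores)

    return {
        trait: heapq.nlargest(
            k,
            (u for u in user_scores if trait in user_scores[u]),
            key=lambda u: user_scores[u][trait],
        )
        for trait in sorted(traits)
    }
```

Calling it once per dataset with `k=10` and five traits yields 50 users per dataset.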

Slide 29

Slide 29 text

RQ1: Can we accurately detect personality from texts?
• We find that the inferred personality matches the review text in 81% of the Amazon-beauty samples and 79% of the Amazon-music samples. The average Cohen’s Kappa is 0.70.

Personality type, score, and review text:
• Extraversion (75.06): Love this shampoo! Recommended by a friend! The color really lasts!!!
• Agreeableness (80.06): Great product - my wife loves it
• Agreeableness (78.18): Great deal and leaves my kids smelling awesome! I bought a box of them years ago and we still have some left!!!
• Neuroticism (62.28): Nope. It smells like artificial bananas, and this smell does linger. It’s pure liquid, there is no thickness to it at all, it’s like pouring banana water on your head that lathers. It does not help with an itchy scalp either.
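Cohen’s Kappa corrects the raw agreement between two annotators for the agreement expected by chance. A small reference implementation of the standard formula, kappa = (p_o - p_e) / (1 - p_e), is sketched below; the labels in the usage example are made up and are not the study’s annotations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up annotations using the three options from the study.
annotator_1 = ["yes", "yes", "no", "yes", "not sure", "no"]
annotator_2 = ["yes", "no", "no", "yes", "not sure", "yes"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```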

Slide 30

Slide 30 text

RQ2: What is the distribution of users’ personalities?
• We further analyze the personality distribution for all users by plotting the score histograms for each personality trait in the Amazon-beauty dataset and the Amazon-music dataset.

Slide 31

Slide 31 text

RQ2: What is the distribution of users’ personalities?
Summary
• The personality traits of users are not evenly distributed. There are more instances of people with certain personality traits (e.g., agreeableness) than others (e.g., neuroticism). A possible reason is that people with certain personalities are more willing to write product reviews.
• The distributions for the two domains are generally the same, with higher agreeableness scores and lower neuroticism scores. However, there are slight differences. For example, extraversion scores in the music domain are generally higher than those in the beauty domain. A possible explanation is that people who are passionate about music may be more emotional.

Slide 32

Slide 32 text

RQ3: Does incorporating personality improve RecSys performance?
Experiment Results: Amazon

Model                           HR@3   NDCG@3  HR@5   NDCG@5  HR@10  NDCG@10
NCF + Random                    0.923  0.675   0.965  0.605   0.975  0.660
NCF + Same                      0.918  0.683   0.967  0.630   0.975  0.662
NCF + Most salient personality  0.939  0.714   0.969  0.676   0.977  0.707
NCF + Soft-label                0.936  0.810   0.965  0.867   0.973  0.831
NCF + Hard-coded                0.948  0.849   0.961  0.826   0.977  0.848
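For readers unfamiliar with the two metrics, the sketch below shows one common way to compute HR@K and NDCG@K. It assumes a leave-one-out protocol with a single held-out item per user, which is a frequent choice for NCF-style evaluation; the slides do not state the exact protocol, so this is illustrative only.

```python
import math


def hit_ratio_at_k(ranked_items, target, k):
    """1.0 if the held-out item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k):
    """Single-relevant-item NDCG: 1 / log2(rank + 2) for a 0-based rank."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target)
    return 1.0 / math.log2(rank + 2)


def evaluate(cases, k):
    """Average HR@k and NDCG@k over (ranked_items, target) pairs."""
    hrs = [hit_ratio_at_k(ranked, target, k) for ranked, target in cases]
    ndcgs = [ndcg_at_k(ranked, target, k) for ranked, target in cases]
    return sum(hrs) / len(hrs), sum(ndcgs) / len(ndcgs)
```

HR only records whether the held-out item was retrieved, while NDCG also rewards placing it near the top, which is why the two can move differently across models.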

Slide 33

Slide 33 text

RQ3: Does incorporating personality improve RecSys performance?
Experiment Results: Amazon
• Observation 1: NCF + Most salient personality scores higher than NCF + Same / Random in terms of NDCG.
• Conclusion: adding a personality label indeed helps.

Slide 35

Slide 35 text

RQ3: Does incorporating personality improve RecSys performance?
Experiment Results: Amazon (Beauty)
• Observation 2: NCF + Soft-labeled / Hard-coded scores higher than NCF + Most salient in terms of NDCG.
• Conclusion: using multiple personality features is better than using a single personality feature.

Slide 36

Slide 36 text

RQ3: Does incorporating personality improve RecSys performance?
Experiment Results: Personality2018

Model                           HR@3   NDCG@3  HR@5   NDCG@5  HR@10  NDCG@10
NCF + Random                    0.510  0.406   0.628  0.454   0.777  0.504
NCF + Same                      0.501  0.403   0.622  0.454   0.777  0.502
NCF + Most salient personality  0.516  0.415   0.631  0.463   0.795  0.511
NCF + Soft-label                0.528  0.421   0.656  0.471   0.805  0.511
NCF + Hard-coded                0.503  0.398   0.622  0.447   0.758  0.498

Slide 37

Slide 37 text

RQ3: Does incorporating personality improve RecSys performance?
Experiment Results: Personality2018
• Observation: the NCF + Soft-labeled model achieves the best score on every metric (tied with NCF + Most salient personality on NDCG@10).
• Conclusion 1: adding a personality label indeed helps.

Slide 38

Slide 38 text

RQ3: Does incorporating personality improve RecSys performance?
Experiment Results: Personality2018
• Conclusion 2: the improvement on Personality 2018 is less pronounced than on the Amazon Beauty dataset.

Slide 39

Slide 39 text

RQ4: How does personality information improve the RecSys performance?
• HR and NDCG grouped by the 5 personality traits: Amazon (Beauty)

Group  HR (+)  HR (-)  HR change  NDCG (+)  NDCG (-)  NDCG change
OPEN   0.833   0.750   +11%       0.729     0.545     +34%
NEU    0.933   0.833   +12%       0.835     0.536     +56%
CON    0.883   0.727   +21%       0.769     0.490     +57%
EXT    0.970   0.872   +11%       0.882     0.600     +47%
AGR    0.968   0.864   +12%       0.878     0.593     +48%

+: w/ personality    -: w/o personality
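The percentages in parentheses are consistent with the relative change of the score with personality over the score without it. A quick check against two cells of the table (the function name is hypothetical):

```python
def relative_gain(with_personality, without_personality):
    """Relative change in percent: (with - without) / without * 100."""
    return 100.0 * (with_personality - without_personality) / without_personality


# Values copied from the CON and OPEN groups above.
print(f"CON HR:    {relative_gain(0.883, 0.727):+.0f}%")
print(f"OPEN NDCG: {relative_gain(0.729, 0.545):+.0f}%")
```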

Slide 40

Slide 40 text

RQ4: How does personality information improve the RecSys performance?
• HR and NDCG grouped by the 5 personality traits: Amazon (Beauty)
• Observation: CON has the largest improvement; OPEN has the smallest.
• Conclusion: personality information has the largest impact for CON users and the smallest for OPEN users.

Slide 41

Slide 41 text

RQ4: How does personality information improve the RecSys performance?
• HR and NDCG grouped by the 5 personality traits: Personality2018

Group  HR (+)  HR (-)  HR change  NDCG (+)  NDCG (-)  NDCG change
OPEN   0.535   0.547   -2%        0.420     0.422     -0.4%
NEU    0.489   0.511   -4%        0.390     0.415     -6%
CON    0.475   0.441   +8%        0.358     0.361     -0.8%
EXT    0.611   0.556   +10%       0.412     0.411     +0.2%
AGR    0.621   0.552   +13%       0.512     0.430     +19%

+: w/ personality    -: w/o personality

Slide 42

Slide 42 text

RQ4: How does personality information improve the RecSys performance?
• HR and NDCG grouped by the 5 personality traits: Personality2018
• Observation: AGR has the largest improvement.
• Conclusion: AGR has the largest impact; the results are not consistent with those on the Amazon Beauty dataset.

Slide 43

Slide 43 text

Conclusion and Limitations
In this work, we make a preliminary attempt to explore how to automatically infer users’ personality traits from product reviews and how the inferred traits can benefit state-of-the-art automated recommendation processes.
We observe that recommendation performance is indeed boosted by incorporating personality information.

Slide 44

Slide 44 text

Conclusion and Limitations
Limitations:
1. Capturing personality from review texts may lead to selection bias.
2. More in-depth investigation is necessary on how personality affects recommendation and users’ behavior.
3. Openness, conscientiousness and neuroticism features do not have an obvious impact on the recommendation performance.
4. The 5 personality traits are encoded independently of each other in our model, but in real life these traits are correlated.

Slide 45

Slide 45 text

Thank you! Any questions?
Lu Xinyuan 📧 luxinyuan@u.nus.edu

Slide 46

Slide 46 text

References
• Ivan et al. 2013. Relating personality types with user preferences in multiple entertainment domains.
• M. B. Mariappan et al. 2012. Facefetch: A user emotion driven multimedia content recommendation system based on facial expression recognition.
• Joanne Hinds, Emma J. Williams, and Adam N. Joinson. “It wouldn’t happen to me”: Privacy concerns and perspectives following the Cambridge Analytica scandal. International Journal of Human-Computer Studies, 143:102498, 2020.
• Wu Youyou, David Stillwell, H. Andrew Schwartz, and Michal Kosinski. Birds of a feather do flock together: Behavior-based personality-assessment method reveals personality similarity among couples and friends. Psychological Science, 28(3):276–284, 2017. PMID: 28059682.
• Hsin-Chang Yang and Zi-Rui Huang. Mining personality traits from social messages for game recommender systems. Knowledge-Based Systems, 165:157–168, 2019.
• Nana Yaw Asabere, Amevi Acakpovi, and Mathias Bennet Michael. Improving socially aware recommendation accuracy through personality. IEEE Transactions on Affective Computing, 9(3):351–361, 2017.
• W. Wu, L. Chen, and Y. Zhao. Personalizing recommendation diversity based on user personality. User Modeling and User-Adapted Interaction, 28(3):237–276, 2018.
• Ignacio Fernández-Tobías, Matthias Braunhofer, Mehdi Elahi, Francesco Ricci, and Iván Cantador. Alleviating the new user problem in collaborative filtering by exploiting personality information. User Modeling and User-Adapted Interaction, 26(2):221–255, 2016.

Slide 47

Slide 47 text

Supplementary Slides

Slide 48

Slide 48 text

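Data Collection Pipeline
[Figure: five-step pipeline. Step 1: all users → active users. Step 2: active users are split into A and B, and A is annotated with personality scores to give A’. Step 3: a Personality Detector is trained and validated/tested on A’. Step 4: the Personality Detector selects C from B. Step 5: C is annotated with personality scores to give C’.]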

Slide 49

Slide 49 text

Data Collection Pipeline
◎ Step 1: Filter “active” users from the raw data.
◎ Step 2: Among the active users, randomly select part of the data as A and keep the rest as B. Annotate A with personality scores using the Receptiviti API; the annotated set is A’.
Size of A: 683 users. Size of A’: currently 500 users. Size of B: 10,268 - 683 = 9,585 users.
◎ Step 3: Train and test a Personality Detector on A’: 537 users for training, 136 for testing (80% for training, 20% for testing).
◎ Step 4: Apply the Personality Detector to B to select the active users in B, which we call C (B → C: 9,585 users → 2,345 users).
◎ Step 5: Annotate C with personality scores using Receptiviti; the annotated set is C’. Size of C’ ≈ 3,028 - 683 = 2,345 users.
Output: A’ + C’ ≈ 3 million words, ~3,028 users.

Slide 50

Slide 50 text

Data Collection Pipeline
Step 1: active users satisfy the following conditions:
1. Each user purchased at least 10 items.
2. Each review contains 30 to 80 words.
After filtering, a total of 10,268 users are left. The total number of words is 10,170,213. The average number of words per user is 990.48. The average number of words per review is 51.01.

Max words  Min words  Min items  No. of users after filtering  Total words  Avg. words per user  Avg. words per review
80         30         10         10,268                        10,170,213   990.48               51.01
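The Step 1 filter can be sketched as follows. The `(user_id, item_id, text)` input shape, whitespace word counting, and the order in which the two conditions are applied (review length first, then item count) are assumptions; the slides only list the conditions.

```python
from collections import defaultdict


def filter_active_users(reviews, min_items=10, min_words=30, max_words=80):
    """Keep users with at least `min_items` reviews of 30 to 80 words.

    `reviews` is an iterable of (user_id, item_id, text) triples.
    Returns {user_id: {item_id: text}} for the retained users.
    """
    kept = defaultdict(dict)
    for user_id, item_id, text in reviews:
        n_words = len(text.split())  # whitespace word count (assumption)
        if min_words <= n_words <= max_words:
            kept[user_id][item_id] = text

    # A user counts as active only if enough in-range reviews remain.
    return {u: items for u, items in kept.items() if len(items) >= min_items}
```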