
Improving Recommendation Systems with User Personality Inferred from Product Reviews

wing.nus
March 03, 2023


Personality is a psychological factor that reflects people’s preferences, which in turn influences their decision-making. We hypothesize that accurate modeling of users’ personalities improves recommendation systems’ performance. However, acquiring such personality profiles is both sensitive and expensive. We address this problem by introducing a novel method to automatically extract personality profiles from public product review text. We then design and assess three context-aware recommendation architectures that leverage the profiles to test our hypothesis.

Experiments on our two newly contributed personality datasets — Amazon-beauty and Amazon-music — validate our hypothesis, showing performance boosts of 3–28%. Our analysis uncovers that varying personality types contribute differently to recommendation performance: open and extroverted personalities are most helpful in music recommendation, while a conscientious personality is most helpful in beauty product recommendation.


Transcript

1. Improving Recommendation Systems with User Personality Inferred from Product Reviews
    Lu Xinyuan (Presenter)1,2 Kan Min-Yen2
    IRS Workshop in WSDM’23
    March 3
    1 ISEP Program, NUS Graduate School
    2 School of Computing, National University of Singapore


  2. RecSys: Item recommendations to the end users
    2
    Recommendation System (RecSys)
    movies, music, products…
    Search
    Recommendations


3. RecSys are designed to stimulate users' consumption behavior. Such behavior is
    largely influenced by the user's profile.
    3
    Recommendation System (RecSys)
    movies, music, products…
    Search
    Recommendations
    age
    education
    demographic
    information
    user profile


  4. Traditional RecSys focuses on a user’s static profile
    User’s psychology – e.g., personality, emotion – can help model a user’s
    dynamic profile.
    4
    Recommendation System (RecSys)
    age
    education
    demographic
    information
    user’s static profile
    user’s dynamic profile
    personality
    emotion


  5. 5
    Personality has been shown to be directly related to user preference
    Example:
    o Open people are more likely to watch comedy movies [Ivan et al. 2013]
    o Open people favor energetic music genres [Mariappan et al. 2012]
    Why we need personality in RecSys
    Personality affects user preference


  6. 6
The recommendation should depend on the user's current emotional state.
    Example:
    o The same user is likely to watch comedy movies when they are happy,
    but tragedy movies when they are sad.
    Why we need emotion in RecSys
    Emotion state can influence people’s decisions


  7. Privacy of Personality Information.
    7
    Challenges
    • Personality information can be misused by malicious users to cause
    undesirable outcomes. [Hinds et al. 2020]
    • A challenging balance: utilizing information vs protecting privacy


  8. Lack of Large Datasets
    8
    Challenges
    • Ground-truth psychology
    information is expensive to collect
    from users.
• Existing works have only built small-scale datasets.
    • In 2018, the larger myPersonality dataset stopped being shared.
    https://sites.google.com/michalkosinski.com/mypersonality


  9. Subjectivity of Personality Measurement
    9
    Challenges
    • The measurement of personality
    can be very subjective.
    • The reference-group effect often
    occurs. [Wu et al. 2017]
• Inaccurate measurement of users' personality traits is likely to introduce more noise.


  10. 10
    Personality Model
    • Openness to experience: conventional vs creative
    thinking
    • Conscientiousness: disorganized vs organized
    • Extraversion: engagement with the external world
    • Agreeableness: need for social harmony
    • Neuroticism: emotional instability
❑ OCEAN (Big 5)


  11. 11
    • 10-item Big Five Inventory (BFI) test
    • 5-level Likert scale (Strongly agree,
    agree, neutral, disagree, strongly disagree)
• Example: "I am outgoing, sociable" [1 2 3 4 5] (Extraversion-related)
    • Time-consuming
    Explicit Method: Questionnaire
    Personality Detection
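
    For illustration, a minimal Python sketch of how such 5-point Likert responses could be turned into per-trait scores; the item-to-trait mapping and the reverse-keyed items below are placeholders, not the official BFI-10 scoring key.

```python
# Minimal sketch: scoring a short Big Five questionnaire from 5-point Likert
# responses (1 = strongly disagree ... 5 = strongly agree).
# NOTE: the item-to-trait assignment and reverse-keyed items are illustrative
# placeholders, not the official BFI-10 scoring key.

LIKERT_MAX = 5

# (trait, reverse_scored) for each of the 10 hypothetical items, in order.
ITEM_KEY = [
    ("extraversion", False), ("extraversion", True),
    ("agreeableness", False), ("agreeableness", True),
    ("conscientiousness", False), ("conscientiousness", True),
    ("neuroticism", False), ("neuroticism", True),
    ("openness", False), ("openness", True),
]

def score_big5(responses):
    """Average the two items per trait, flipping reverse-keyed items."""
    totals, counts = {}, {}
    for (trait, reverse), answer in zip(ITEM_KEY, responses):
        value = (LIKERT_MAX + 1 - answer) if reverse else answer
        totals[trait] = totals.get(trait, 0) + value
        counts[trait] = counts.get(trait, 0) + 1
    return {trait: totals[trait] / counts[trait] for trait in totals}

print(score_big5([4, 2, 5, 1, 3, 3, 2, 4, 5, 2]))
```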


  12. 12
• Language use differs across individuals
    • Infer personality from texts and social media posts
    Implicit Methods: Automatic personality detection
    Personality Detection
    • APIs:
    1) IBM Personality Insights: discontinued after
    2021
    2) Receptiviti: sentence level
    3) SenticNet: lexicon-based approach


  13. Receptiviti API
    13
    • Receptiviti API is a computational language psychology platform for understanding
    human behavior.
    • Receptiviti was co-founded by Prof. James W. Pennebaker, the former Chair of the
    Department of Psychology, and the inventor of LIWC -- the gold-standard algorithm
    in the field of language psychology.
    https://www.receptiviti.com/


  14. Receptiviti API: Personality API package
    14
    • We use Personality API Package
    • Our budget: $250 USD/month includes 500,000 words. 6-month subscription.
    • Ongoing work: We’ll discuss our own methods to replicate a personality API.
    https://www.receptiviti.com/personality


  15. Receptiviti example
    15
    • Input: Pieces of texts. The more
    words in the text, the higher the
    accuracy.
    • In Receptiviti, more than 300 words
    are needed.
• Output: a personality score for each Big 5 category
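
    For reference, a minimal sketch of how such an API call might look from Python. The endpoint path, authentication scheme, and response fields below are assumptions for illustration only; the official Receptiviti documentation should be consulted for the actual contract.

```python
# Minimal sketch of querying a personality-scoring REST API for one user's
# concatenated review text. The endpoint URL, auth scheme, and response
# layout below are ASSUMPTIONS for illustration only -- check the official
# Receptiviti documentation for the real request/response contract.
import requests

API_URL = "https://api.receptiviti.com/v1/score"    # assumed endpoint
API_KEY, API_SECRET = "my-key", "my-secret"          # placeholder credentials

def big5_scores(user_text: str) -> dict:
    if len(user_text.split()) < 300:
        raise ValueError("Receptiviti works best with 300+ words per user")
    resp = requests.post(
        API_URL,
        auth=(API_KEY, API_SECRET),
        json={"content": user_text},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed response shape: {"personality": {"openness": ..., ...}}
    return resp.json()["personality"]
```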


  16. 16
    • Serendipity 2018
    o A version of MovieLens dataset.
    o It is used for serendipity in RecSys.
    o There are 10 million ratings.
Drawback: It only supports offline evaluation of recommendation algorithms; it
    does not contain real-time (online) feedback for evaluation.
    • Personality 2018
    o A version of MovieLens dataset.
    o Includes: personality information of the users + movie ratings
Drawback: It only contains the Big 5 scores of 1,834 users along with the movie
    ratings given by these users.
    Current Datasets


  17. Taobao Serendipity
    17
    Datasets
    • A user survey on Mobile Taobao
    • The users first received a recommended product, then completed a
    questionnaire that assessed immediate feedback.
    • Fill in two psychological quizzes:
    1) 10-item Curiosity and Exploration Inventory-II (CEI-II)
    2) 10-Item Personality Inventory (TIPI)
    • This dataset contains 11,383 users’ feedback in the user survey.
Drawback: Due to commercial privacy concerns, the Taobao item descriptions and
    item category information are not publicly available.


  18. 18
    My Work
    How can we acquire personality data for RecSys?
    How can we explore the impact of personality on RecSys?


  19. 19
    My Work
    How can we acquire personality data for RecSys?
    How can we explore the impact of personality on RecSys?


20. Amazon Review dataset (updated version in 2018)
    2014 version
    This is a large crawl of product reviews from Amazon. It contains 82.83 million unique
    reviews from around 20 million users.
    Metadata:
    ○ reviews and ratings
    ○ item-to-item relationships (e.g. "people who bought X also bought Y")
    ○ timestamps
    ○ helpfulness votes
    ○ product images (and CNN features)
    ○ price
    ○ product descriptions
    ○ category
    ○ sales rank
    2018 version
    • More reviews: the total number of reviews is 233.1 million (142.8 million in 2014).
    • Newer reviews: current data includes reviews in the range May 1996 - Oct 2018.
    20
    Download: https://nijianmo.github.io/amazon/index.html
    Info: https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews


  21. Amazon Review dataset
    21
    user ID
    item ID
    rating score
    review text
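
    A minimal sketch of reading these four fields from the gzipped JSON-lines files of the Amazon Review dataset; the field names (reviewerID, asin, overall, reviewText) follow the public 2018 release, but should be verified against the downloaded files.

```python
# Minimal sketch: streaming a gzipped, JSON-lines Amazon review file and
# keeping the four fields used on this slide. Field names follow the 2018
# release; verify against the file you download.
import gzip
import json

def load_reviews(path):
    rows = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            r = json.loads(line)
            rows.append({
                "user_id": r["reviewerID"],        # user ID
                "item_id": r["asin"],              # item ID
                "rating": float(r["overall"]),     # rating score (1-5)
                "review": r.get("reviewText", ""), # review text
            })
    return rows

# e.g. reviews = load_reviews("All_Beauty.json.gz")
```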


  22. Personality Data Preparation
    • To study whether personality has different influences on users’ behaviours
    for different domains, we choose All-Beauty and Music as 2 domains.
• Input: Each user's review text (more than 300 words)
    • Output: Each user's Big 5 personality scores
    22
    • Sample dataset after filtering:

    Dataset | # of items | # of users | # of ratings | % of interaction | Avg. words per user | Avg. words per review
    Amazon-beauty | 85 | 991 | 5,269 | 6.26% | 990.48 | 466.43
    Amazon-music | 8,895 | 1,791 | 28,399 | 0.18% | 51.01 | 51.18

    • 80% training / 20% testing
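
    A minimal sketch of this preparation step, assuming the reviews are loaded into a pandas DataFrame as in the earlier loading sketch; the simple random 80/20 split shown here is illustrative and may differ from the exact protocol used in the paper.

```python
# Minimal sketch of the personality data preparation described above, assuming
# a pandas DataFrame with columns user_id / item_id / rating / review:
# concatenate each user's reviews, keep users with more than 300 words
# (the Receptiviti input requirement), and make a simple 80/20 split.
import pandas as pd

def prepare_personality_input(df: pd.DataFrame, min_words: int = 300):
    per_user = (df.groupby("user_id")["review"]
                  .apply(lambda texts: " ".join(t for t in texts if isinstance(t, str)))
                  .reset_index(name="all_reviews"))
    per_user["n_words"] = per_user["all_reviews"].str.split().str.len()
    return per_user[per_user["n_words"] > min_words]

def train_test_split_interactions(df: pd.DataFrame, test_frac: float = 0.2, seed: int = 42):
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)
    return train, test

# Tiny usage example with placeholder data.
df = pd.DataFrame({"user_id": ["u1", "u1", "u2"],
                   "item_id": ["i1", "i2", "i3"],
                   "rating": [5, 4, 3],
                   "review": ["great " * 400, "nice", "ok"]})
print(prepare_personality_input(df)[["user_id", "n_words"]])
```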


  23. Personality Data Preparation
• To study the difference between questionnaire-based personality trait scores
    and our review-based, automatically detected personality trait scores, we also
    include an existing dataset: Personality 2018.
    • Users: 1,834
    • Movie IDs: [1-197,529]
    • Raw ratings: 1,028,751 (scores 1-7)
    23

    # of items | # of users | # of ratings | % of interaction
    197,529 | 1,834 | 339,000 | 0.28%


  24. 24
    How can we acquire personality data for RecSys?
    How can we explore the impact of personality on RecSys?
    My Work


  25. Models
    • Baseline Models:
    • (1) Neural Collaborative Filtering (NCF)
    • (2) NCF + Random: randomly assign personality
    label
    • (3) NCF + Same: assign same personality label
    • Personality-based Models:
    • (1) NCF + Most salient personality: assign most
    salient personality label
    25
    Single
    personality


  26. Models
    • Baseline Models:
    • (1) Neural Collaborative Filtering (NCF)
    • (2) NCF + Random: randomly assign personality
    label
    • (3) NCF + Same: assign same personality label
    • Personality-based Models:
    • (1) NCF + Most salient personality: assign most
    salient personality label
    • (2) NCF + Soft-labeled: take all personality
    scores and obtain a personality distribution
    with softmax.
• (3) NCF + Hard-coded: directly add all personality scores as an additional
    feature vector in the network (see the feature-construction sketch below)
    26
    Multi
    personality
    distribution
    Single
    personality
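
    A minimal sketch of how the four personality-feature variants above could be constructed from raw Big Five scores; the trait ordering and scaling are illustrative assumptions, not necessarily those used in the paper.

```python
# Minimal sketch of the personality-feature variants compared above, assuming
# each user has raw Big Five scores in a fixed trait order.
import numpy as np

TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]
rng = np.random.default_rng(0)

def random_label(_scores):          # baseline: random single trait
    return int(rng.integers(len(TRAITS)))

def same_label(_scores):            # baseline: every user gets the same trait
    return 0

def most_salient_label(scores):     # single most salient trait
    return int(np.argmax(scores))

def soft_label(scores):             # softmax distribution over all traits
    z = np.asarray(scores, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def hard_coded(scores):             # raw scores used directly as a feature vector
    return np.asarray(scores, dtype=float)

user_scores = [62.1, 74.3, 55.0, 80.2, 40.7]   # hypothetical Big Five scores
print(TRAITS[most_salient_label(user_scores)], soft_label(user_scores))
```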


  27. Model
    27
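
    As a companion to the model slide, a minimal PyTorch sketch of an NCF-style network extended with a personality feature vector (the hard-coded variant). This is an illustrative reconstruction under stated assumptions, not the authors' exact architecture.

```python
# Minimal sketch of an NCF-style model with a personality feature vector
# concatenated to the user and item embeddings before the MLP.
import torch
import torch.nn as nn

class PersonalityNCF(nn.Module):
    def __init__(self, n_users, n_items, emb_dim=32, pers_dim=5, hidden=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim + pers_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, user_ids, item_ids, personality):
        # personality: (batch, 5) Big Five scores or a soft-label distribution
        x = torch.cat([self.user_emb(user_ids),
                       self.item_emb(item_ids),
                       personality], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)   # interaction probability

model = PersonalityNCF(n_users=1000, n_items=100)
scores = model(torch.tensor([0, 1]), torch.tensor([5, 7]),
               torch.rand(2, 5))   # hypothetical personality vectors
```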


  28. RQ1: Can we accurately detect personality from texts?
    28
    • To evaluate whether we can accurately detect personality traits from
    texts, we analyze the personality scores inferred by the Receptiviti
    API for each user.
• We select the users with the top 10 highest scores for each personality type,
    for a total of 100 samples.
    • Two graduate annotators are given the review texts and the inferred personality.
    We ask them to judge whether each sampled review text accurately matches its
    inferred personality, choosing among three options: yes, no, or not sure.


  29. RQ1: Can we accurately detect personality from texts?
    29
    • We find that the inferred personality matches with the review text in 81% of
    the Amazon-beauty samples, and 79% of the samples from Amazon-music.
    The average Cohen’s Kappa is 0.70.
Personality type | Score | Review text
    Extroversion | 75.06 | Love this shampoo! Recommended by a friend! The color really lasts!!!
    Agreeable | 80.06 | Great product - my wife loves it
    Agreeable | 78.18 | Great deal and leaves my kids smelling awesome! I bought a box of them years ago and we still have some left!!!
    Neuroticism | 62.28 | Nope. It smells like artificial bananas, and this smell does linger. It's pure liquid, there is no thickness to it at all, it's like pouring banana water on your head that lathers. It does not help with an itchy scalp either.
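
    A minimal sketch of the agreement computation behind the reported Cohen's Kappa, using hypothetical annotator judgments in place of the real annotation data.

```python
# Minimal sketch: inter-annotator agreement for the yes/no/not-sure judgments.
# The judgment lists below are hypothetical placeholders.
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["yes", "yes", "no", "not sure", "yes", "no"]
annotator_2 = ["yes", "yes", "no", "no",       "yes", "no"]

kappa = cohen_kappa_score(annotator_1, annotator_2)
match_rate = sum(a == "yes" for a in annotator_1) / len(annotator_1)
print(f"Cohen's Kappa: {kappa:.2f}, 'matches' rate: {match_rate:.0%}")
```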


  30. RQ2: What is the distribution of users’ personalities?
    30
• We further analyze the personality distribution for all users by plotting the
    score histograms for each personality trait in the Amazon-beauty dataset and
    the Amazon-music dataset.
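
    A minimal sketch of the per-trait histogram plot, assuming the inferred scores are available as a pandas DataFrame with one column per trait (placeholder data below).

```python
# Minimal sketch: one score histogram per Big Five trait.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

traits = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]
scores = pd.DataFrame(np.random.default_rng(0).uniform(0, 100, (500, 5)),
                      columns=traits)   # placeholder data, one row per user

fig, axes = plt.subplots(1, 5, figsize=(18, 3), sharey=True)
for ax, trait in zip(axes, traits):
    ax.hist(scores[trait], bins=20)
    ax.set_title(trait)
    ax.set_xlabel("score")
axes[0].set_ylabel("# users")
plt.tight_layout()
plt.show()
```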


  31. RQ2: What is the distribution of users’ personalities?
    31
    Summary
• The personality traits of users are not evenly distributed. There are more instances
    of people with certain personality traits (e.g., agreeableness) than others (e.g.,
    neuroticism). A possible reason is that people with certain personalities are more
    willing to write product reviews.
    • The distributions for the two domains are generally the same, with higher
    agreeableness scores and lower neuroticism scores. However, there is a slight difference.
    For example, the extroversion scores in the music domain are generally higher than those
    in the beauty domain. This could be explained by the possibility that people who are
    passionate about music may be more emotional.


  32. RQ3: Does incorporating personality improve RecSys performance?
    32
Model | HR@3 | NDCG@3 | HR@5 | NDCG@5 | HR@10 | NDCG@10
    NCF + Random | 0.923 | 0.675 | 0.965 | 0.605 | 0.975 | 0.660
    NCF + Same | 0.918 | 0.683 | 0.967 | 0.630 | 0.975 | 0.662
    NCF + Most salient personality | 0.939 | 0.714 | 0.969 | 0.676 | 0.977 | 0.707
    NCF + Soft-label | 0.936 | 0.810 | 0.965 | 0.867 | 0.973 | 0.831
    NCF + Hard-coded | 0.948 | 0.849 | 0.961 | 0.826 | 0.977 | 0.848
    Experiment Results: Amazon
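
    For reference, a minimal sketch of the HR@K and NDCG@K metrics under the usual NCF-style leave-one-out protocol, where each test case ranks one held-out positive item among sampled negatives; the exact evaluation setup used in the paper may differ.

```python
# Minimal sketch of HR@K and NDCG@K given the 1-based rank of each user's
# held-out positive item among the candidates.
import math

def hit_ratio_at_k(ranks, k):
    return sum(r <= k for r in ranks) / len(ranks)

def ndcg_at_k(ranks, k):
    return sum(1.0 / math.log2(r + 1) if r <= k else 0.0 for r in ranks) / len(ranks)

ranks = [1, 3, 12, 2, 6]   # hypothetical ranks of the held-out items
print(hit_ratio_at_k(ranks, 10), ndcg_at_k(ranks, 10))
```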


33. Model | HR@3 | NDCG@3 | HR@5 | NDCG@5 | HR@10 | NDCG@10
    NCF + Random | 0.923 | 0.675 | 0.965 | 0.605 | 0.975 | 0.660
    NCF + Same | 0.918 | 0.683 | 0.967 | 0.630 | 0.975 | 0.662
    NCF + Most salient personality | 0.939 | 0.714 | 0.969 | 0.676 | 0.977 | 0.707
    NCF + Soft-label | 0.936 | 0.810 | 0.965 | 0.867 | 0.973 | 0.831
    NCF + Hard-coded | 0.948 | 0.849 | 0.961 | 0.826 | 0.977 | 0.848
    33
• Observation 1: NCF + Most salient personality scores higher than
    NCF + Same / Random in terms of NDCG.
    • Conclusion: adding a personality label indeed helps
    Experiment Results: Amazon
    RQ3: Does incorporating personality improve RecSys performance?


34. Model | HR@3 | NDCG@3 | HR@5 | NDCG@5 | HR@10 | NDCG@10
    NCF + Random | 0.923 | 0.675 | 0.965 | 0.605 | 0.975 | 0.660
    NCF + Same | 0.918 | 0.683 | 0.967 | 0.630 | 0.975 | 0.662
    NCF + Most salient personality | 0.939 | 0.714 | 0.969 | 0.676 | 0.977 | 0.707
    NCF + Soft-label | 0.936 | 0.810 | 0.965 | 0.867 | 0.973 | 0.831
    NCF + Hard-coded | 0.948 | 0.849 | 0.961 | 0.826 | 0.977 | 0.848
    34
    RQ3: Does incorporating personality improve RecSys performance?
    Experiment Results: Amazon


35. Model | HR@3 | NDCG@3 | HR@5 | NDCG@5 | HR@10 | NDCG@10
    NCF + Random | 0.923 | 0.675 | 0.965 | 0.605 | 0.975 | 0.660
    NCF + Same | 0.918 | 0.683 | 0.967 | 0.630 | 0.975 | 0.662
    NCF + Most salient personality | 0.939 | 0.714 | 0.969 | 0.676 | 0.977 | 0.707
    NCF + Soft-label | 0.936 | 0.810 | 0.965 | 0.867 | 0.973 | 0.831
    NCF + Hard-coded | 0.948 | 0.849 | 0.961 | 0.826 | 0.977 | 0.848
    35
• Observation 2: NCF + Soft-labeled/Hard-coded scores higher than
    NCF + Most salient in terms of NDCG
    • Conclusion: using multiple personality features is better than using
    a single personality feature
    RQ3: Does incorporating personality improve RecSys performance?
    Experiment Results: Amazon (Beauty)


  36. 36
Model | HR@3 | NDCG@3 | HR@5 | NDCG@5 | HR@10 | NDCG@10
    NCF + Random | 0.510 | 0.406 | 0.628 | 0.454 | 0.777 | 0.504
    NCF + Same | 0.501 | 0.403 | 0.622 | 0.454 | 0.777 | 0.502
    NCF + Most salient personality | 0.516 | 0.415 | 0.631 | 0.463 | 0.795 | 0.511
    NCF + Soft-label | 0.528 | 0.421 | 0.656 | 0.471 | 0.805 | 0.511
    NCF + Hard-coded | 0.503 | 0.398 | 0.622 | 0.447 | 0.758 | 0.498
    Experiment Results: Personality2018
    RQ3: Does incorporating personality improve RecSys performance?


37. Model | HR@3 | NDCG@3 | HR@5 | NDCG@5 | HR@10 | NDCG@10
    NCF + Random | 0.510 | 0.406 | 0.628 | 0.454 | 0.777 | 0.504
    NCF + Same | 0.501 | 0.403 | 0.622 | 0.454 | 0.777 | 0.502
    NCF + Most salient personality | 0.516 | 0.415 | 0.631 | 0.463 | 0.795 | 0.511
    NCF + Soft-label | 0.528 | 0.421 | 0.656 | 0.471 | 0.805 | 0.511
    NCF + Hard-coded | 0.503 | 0.398 | 0.622 | 0.447 | 0.758 | 0.498
    37
    • Observation: NCF + Soft-labeled model outperforms
    the other models.
• Conclusion 1: adding a personality label indeed helps
    RQ3: Does incorporating personality improve RecSys performance?
    Experiment Results: Personality2018


38. Model | HR@3 | NDCG@3 | HR@5 | NDCG@5 | HR@10 | NDCG@10
    NCF + Random | 0.510 | 0.406 | 0.628 | 0.454 | 0.777 | 0.504
    NCF + Same | 0.501 | 0.403 | 0.622 | 0.454 | 0.777 | 0.502
    NCF + Most salient personality | 0.516 | 0.415 | 0.631 | 0.463 | 0.795 | 0.511
    NCF + Soft-label | 0.528 | 0.421 | 0.656 | 0.471 | 0.805 | 0.511
    NCF + Hard-coded | 0.503 | 0.398 | 0.622 | 0.447 | 0.758 | 0.498
    38
    • Observation: NCF + Soft-labeled model outperforms
    the other models.
• Conclusion 1: adding a personality label indeed helps
    • Conclusion 2: the improvement on Personality 2018 is less obvious than
    on the Amazon-beauty dataset
    RQ3: Does incorporating personality improve RecSys performance?
    Experiment Results: Personality2018


  39. RQ4: How does personality information improve
    the RecSys performance?
• HR and NDCG grouped by the 5 personality traits: Amazon (Beauty)
    39

    Trait | HR (+) | HR (-) | NDCG (+) | NDCG (-)
    OPEN | 0.833 (+11%) | 0.750 | 0.729 (+34%) | 0.545
    NEU | 0.933 (+12%) | 0.833 | 0.835 (+56%) | 0.536
    CON | 0.883 (+21%) | 0.727 | 0.769 (+57%) | 0.490
    EXT | 0.970 (+11%) | 0.872 | 0.882 (+47%) | 0.600
    AGR | 0.968 (+12%) | 0.864 | 0.878 (+48%) | 0.593

    +: w/ personality; -: w/o personality


40. • HR and NDCG grouped by the 5 personality traits: Amazon (Beauty)
    40

    Trait | HR (+) | HR (-) | NDCG (+) | NDCG (-)
    OPEN | 0.833 (+11%) | 0.750 | 0.729 (+34%) | 0.545
    NEU | 0.933 (+12%) | 0.833 | 0.835 (+56%) | 0.536
    CON | 0.883 (+21%) | 0.727 | 0.769 (+57%) | 0.490
    EXT | 0.970 (+11%) | 0.872 | 0.882 (+47%) | 0.600
    AGR | 0.968 (+12%) | 0.864 | 0.878 (+48%) | 0.593

    +: w/ personality; -: w/o personality
    • Observation : CON has the largest improvement,
    OPEN has the least improvement
    • Conclusion: CON users have the largest impact, OPEN
    users have the least impact
    RQ4: How does personality information improve
    the RecSys performance?


41. • HR and NDCG grouped by the 5 personality traits: Personality2018
    41

    Trait | HR (+) | HR (-) | NDCG (+) | NDCG (-)
    OPEN | 0.535 (-2%) | 0.547 | 0.420 (-0.4%) | 0.422
    NEU | 0.489 (-4%) | 0.511 | 0.390 (-6%) | 0.415
    CON | 0.475 (+8%) | 0.441 | 0.358 (-0.8%) | 0.361
    EXT | 0.611 (+10%) | 0.556 | 0.412 (+0.2%) | 0.411
    AGR | 0.621 (+13%) | 0.552 | 0.512 (+19%) | 0.430

    +: w/ personality; -: w/o personality
    RQ4: How does personality information improve
    the RecSys performance?


  42. RQ4: How does personality information improve
    the RecSys performance?
• HR and NDCG grouped by the 5 personality traits: Personality2018
    42

    Trait | HR (+) | HR (-) | NDCG (+) | NDCG (-)
    OPEN | 0.535 (-2%) | 0.547 | 0.420 (-0.4%) | 0.422
    NEU | 0.489 (-4%) | 0.511 | 0.390 (-6%) | 0.415
    CON | 0.475 (+8%) | 0.441 | 0.358 (-0.8%) | 0.361
    EXT | 0.611 (+10%) | 0.556 | 0.412 (+0.2%) | 0.411
    AGR | 0.621 (+13%) | 0.552 | 0.512 (+19%) | 0.430

    +: w/ personality; -: w/o personality
• Observation: AGR has the largest improvement
    • Conclusion: AGR has the largest impact; the results are not consistent with
    the Amazon-beauty dataset


  43. 43
    Conclusion and Limitations
    In this work, we make a preliminary attempt to explore how to
    automatically infer users’ personality traits from product reviews
    and how the inferred traits can benefit the state-of-the-art
    automated recommendation processes. We observe that
    recommendation performance is indeed boosted by incorporating
    personality information.


  44. 44
    Conclusion and Limitations
    Limitations:
1. Capturing personality from the review texts may lead to selection bias.
    2. More in-depth investigation is necessary on how personality affects
    recommendation and users’ behavior.
    3. Openness, conscientiousness and neuroticism features do not have an obvious
    impact on the recommendation performance.
4. The five personality traits are encoded independently of each other in our model,
    but in real life these traits are correlated.


  45. Thank you!
    Any questions?
    45
    Lu Xinyuan
📧 [email protected]


  46. References
• Ivan et al. 2013. Relating personality types with user preferences in multiple entertainment domains.
    • M. B. Mariappan et al. 2012. Facefetch: A user emotion driven multimedia content recommendation system based on facial expression recognition.
    • Joanne Hinds, Emma J. Williams, and Adam N. Joinson. "It wouldn't happen to me": Privacy concerns and perspectives following the Cambridge Analytica scandal. International Journal of Human-Computer Studies, 143:102498, 2020.
    • Wu Youyou, David Stillwell, H. Andrew Schwartz, and Michal Kosinski. Birds of a feather do flock together: Behavior-based personality-assessment method reveals personality similarity among couples and friends. Psychological Science, 28(3):276–284, 2017. PMID: 28059682.
    • Hsin-Chang Yang and Zi-Rui Huang. Mining personality traits from social messages for game recommender systems. Knowledge-Based Systems, 165:157–168, 2019.
    • Nana Yaw Asabere, Amevi Acakpovi, and Mathias Bennet Michael. Improving socially aware recommendation accuracy through personality. IEEE Transactions on Affective Computing, 9(3):351–361, 2017.
    • W. Wu, L. Chen, and Y. Zhao. Personalizing recommendation diversity based on user personality. User Modeling and User-Adapted Interaction, 28(3):237–276, Aug 2018.
    • Ignacio Fernández-Tobías, Matthias Braunhofer, Mehdi Elahi, Francesco Ricci, and Iván Cantador. Alleviating the new user problem in collaborative filtering by exploiting personality information. User Modeling and User-Adapted Interaction, 26(2):221–255, 2016.
    46


  47. 47
    Supplementary Slides


48. Data Collection Pipeline
    (Pipeline diagram: the active users are split into A and B; A is annotated with
    personality scores to give A', which is used to train and validate/test a
    Personality Detector; the detector is applied to B to select C, which is then
    annotated to give C'. Steps 1-5 are detailed on the next slide.)
    48


  49. Data Collection Pipeline
◎ Step 1: Filter "active" users from the raw data.
    ◎ Step 2: From the active users, randomly select part of the data as A and the rest as B.
    Annotate A with personality scores via the Receptiviti API, yielding A'.
    Size of A: 683 users. Size of A': currently 500 users.
    Size of B: 10,268 - 683 = 9,585 users.
    ◎ Step 3: Train and test a Personality Detector on A': 537 users for training,
    136 for testing (80% / 20%).
    ◎ Step 4: Apply the Personality Detector to B to select the active users in B,
    which we call C (B -> C: 9,585 users -> 2,345 users).
    ◎ Step 5: Annotate C with personality scores via Receptiviti, yielding C'.
    Size of C' ≈ 3,028 - 683 = 2,345 users.
    Output: A' + C' ≈ 3 million words from ~3,028 users.
    49
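
    A minimal, runnable sketch of the five-step pipeline above; the Receptiviti annotation and the personality detector are replaced with trivial placeholders, so only the control flow mirrors the slide.

```python
# Minimal sketch of the five-step data collection pipeline. The annotation and
# detector helpers are placeholder stand-ins; only the control flow is real.
import random

def annotate_with_receptiviti(users):
    # Placeholder for the real API call (see the Receptiviti sketch earlier):
    # attach a dummy Big Five score vector to each user.
    return [{**u, "big5": [50.0] * 5} for u in users]

def train_personality_detector(labeled):
    # Placeholder "detector"; in the real pipeline a text model is trained
    # on A' with an 80% / 20% train/test split.
    return lambda user: len(user["text"].split()) > 300   # dummy selection rule

def run_pipeline(active_users, size_a=683, seed=0):
    random.seed(seed)
    random.shuffle(active_users)
    a, b = active_users[:size_a], active_users[size_a:]   # Step 2: split A / B
    a_annotated = annotate_with_receptiviti(a)            # A  -> A'
    detector = train_personality_detector(a_annotated)    # Step 3
    c = [u for u in b if detector(u)]                     # Step 4: B -> C
    c_annotated = annotate_with_receptiviti(c)            # Step 5: C -> C'
    return a_annotated + c_annotated                      # final labeled set

users = [{"user_id": i, "text": "word " * (200 + i)} for i in range(1000)]
print(len(run_pipeline(users)))
```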


  50. Data Collection Pipeline
Step 1: active users satisfy the following conditions:
    1. Each user purchased at least 10 items.
    2. Each review contains 30-80 words.
    After filtering, a total of 10,268 users remain. The total word count is 10,170,213;
    the average number of words per user is 990.48, and the average number of words
    per review is 51.01 (see the filtering sketch below).
    50
Max words | Min words | Min items | No. of users after filtering | Total words | Avg. words per user | Avg. words per review
    80 | 30 | 10 | 10,268 | 10,170,213 | 990.48 | 51.01
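
    A minimal sketch of this Step-1 filter, assuming a pandas DataFrame with one row per review (columns user_id / item_id / review); it reads the criteria as "keep reviews of 30-80 words, then keep users with at least 10 such items", which is one reasonable interpretation of the slide.

```python
# Minimal sketch of the "active user" filter described above.
import pandas as pd

def filter_active_users(df, min_items=10, min_words=30, max_words=80):
    df = df.copy()
    df["n_words"] = df["review"].fillna("").str.split().str.len()
    # Keep only reviews whose length falls in the 30-80 word range.
    df = df[df["n_words"].between(min_words, max_words)]
    # Keep users who reviewed at least `min_items` distinct items.
    items_per_user = df.groupby("user_id")["item_id"].nunique()
    active_ids = items_per_user[items_per_user >= min_items].index
    return df[df["user_id"].isin(active_ids)]

# Tiny usage example with placeholder data.
reviews_df = pd.DataFrame({
    "user_id": ["u1"] * 12,
    "item_id": [f"i{k}" for k in range(12)],
    "review": ["word " * 40] * 12,
})
print(filter_active_users(reviews_df)["user_id"].nunique())
```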
