Slide 1

LLM-generated Explanations for Recommender Systems
Symbol Emergence System Lab. Journal Club, 16 June 2025
Yusuke Sasai

Slide 2

Paper Information
LLM-generated Explanations for Recommender Systems
• Authors: Sebastian Lubos, Thi Ngoc Trang Tran, Alexander Felfernig, Seda Polat Erdeniz, Viet-Man Le
• ACM UMAP 2024 (28 June 2024)
• https://dl.acm.org/doi/abs/10.1145/3631700.3665185

Slide 3

Contents
1. Background / Research Goal
2. Explanation Types
3. Experiment
4. Results
5. Conclusion

Slide 4

Contents
1. Background / Research Goal
2. Explanation Types
3. Experiment
4. Results
5. Conclusion

Slide 5

Background
• A recommender system proposes relevant items to a user, taking their individual preferences into account.
• Providing explanations that clearly communicate the reasons behind a recommendation improves the overall user experience.
(Illustration created by Gemini)

Slide 6

Problem
• Recommender systems often lack transparency and understandability.
  • Transparency: how clearly the reasons for the recommendation are disclosed.
  • Understandability: how well the user can comprehend the explanation.
• The generation of "robust and sound natural language explanations" is an ongoing research topic.
• Large Language Models (LLMs) are considered a very effective approach to this problem, thanks to their advanced NLP capabilities.

Slide 7

Research Goal
To validate the effectiveness of personalized explanations generated by Large Language Models (LLMs) for items recommended by a recommender system. The authors define the following three research questions (RQs):
1. Do users prefer LLM-generated explanations compared to those from existing methods?
2. How do users rate the quality of LLM-generated explanations compared to those from existing methods?
3. What features of LLM-generated explanations are valued by users?

Slide 8

Contents
1. Background / Research Goal
2. Explanation Types
3. Experiment
4. Results
5. Conclusion

Slide 9

Explanation Types for Recommendations (Baselines)
Explanation types can be divided into three categories based on the recommendation algorithm [Tran 21]:
• Feature-based Explanations
• Item-based Explanations
• Knowledge-based Explanations

[Tran 21] Thi Ngoc Trang Tran, Viet Man Le, Muesluem Atas, Alexander Felfernig, Martin Stettinger, and Andrei Popescu. 2021. Do Users Appreciate Explanations of Recommendations? An Analysis in the Movie Domain. In Fifteenth ACM Conference on Recommender Systems, 645–650.

Slide 10

Feature-based Explanations (FBExp)
• Feature-based recommendations: a method that recommends items based on the similarity between user preferences and item features.
• Example explanation: The movie "Legends of the Fall" is recommended because you like the Romance, Drama, War, and Western genres. Additionally, "Legends of the Fall" is similar to movies you have liked in the past.
(Diagram: a user who says "I like dinosaurs" is recommended a dinosaur movie based on feature similarity.)
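
As a rough illustration of the feature-based idea (a sketch, not code from the paper), the snippet below scores movies by the cosine similarity between a user's genre preferences and each movie's genre vector, then emits a template-style explanation. The toy catalog and genre vocabulary are invented.

```python
import numpy as np

# Toy genre vocabulary and catalog; all data here is invented for illustration.
GENRES = ["Romance", "Drama", "War", "Western", "Action", "Comedy"]

MOVIES = {
    "Legends of the Fall": {"Romance", "Drama", "War", "Western"},
    "Indiana Jones":       {"Action"},
    "Forrest Gump":        {"Drama", "Romance", "Comedy"},
}

def genre_vector(genres):
    """Binary indicator vector over the genre vocabulary."""
    return np.array([1.0 if g in genres else 0.0 for g in GENRES])

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

def recommend(user_genres):
    """Rank movies by similarity between user preferences and item features."""
    u = genre_vector(user_genres)
    ranked = sorted(MOVIES, key=lambda m: cosine(u, genre_vector(MOVIES[m])),
                    reverse=True)
    best = ranked[0]
    overlap = sorted(user_genres & MOVIES[best])
    # Template-style baseline explanation, as in the FBExp example above.
    print(f"The movie '{best}' is recommended because you like "
          f"the {', '.join(overlap)} genres.")

recommend({"Romance", "Drama", "War", "Western"})
```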

Slide 11

Item-based Explanations (IBExp)
• Item-based (collaborative filtering) recommendations: a method that uses past user ratings to recommend similar items.
• Example explanation: "The Shawshank Redemption" is recommended to you based on your ratings of "Forrest Gump", "The Hateful Eight", and "Up", because other users with similar preferences rated this movie positively.
(Diagram: a user's reviews are matched against similar users' ratings to produce a recommendation.)
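
A drastically simplified sketch of the item-based idea (not the paper's algorithm): score an unseen movie by its rating-column similarity to movies the target user has already rated, and explain the recommendation by naming those neighbour items. The rating matrix is invented.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, cols: movies); invented data.
movies = ["Forrest Gump", "The Hateful Eight", "Up", "The Shawshank Redemption"]
R = np.array([
    [5, 4, 4, 5],
    [4, 5, 3, 4],
    [1, 2, 5, 2],
    [5, 3, 4, 0],   # target user: has not rated "The Shawshank Redemption"
], dtype=float)

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

target = 3
unseen = [j for j in range(len(movies)) if R[target, j] == 0]
others = np.delete(np.arange(R.shape[0]), target)

for j in unseen:
    # Compare the unseen item's ratings (from the *other* users) against the
    # items the target user has rated, and keep the closest neighbours.
    sims = [(k, cosine(R[others, j], R[others, k]))
            for k in range(len(movies)) if k != j and R[target, k] > 0]
    top = sorted(sims, key=lambda s: s[1], reverse=True)[:3]
    neighbours = ", ".join(f"'{movies[k]}'" for k, _ in top)
    print(f"'{movies[j]}' is recommended based on your ratings of {neighbours}, "
          f"because users with similar preferences rated it positively.")
```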

Slide 12

Knowledge-based Explanations (KBExp)
• Knowledge-based recommendations: a method that recommends items based on preferences explicitly specified by the user.
• Example explanation: The movie "Indiana Jones and the Kingdom of the Crystal Skull" is recommended because you want to watch an Action movie directed by Steven Spielberg and starring Harrison Ford.
(Diagram: a user states "I want to watch an action movie" and receives an action movie matching the stated constraints.)
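
A minimal sketch of the knowledge-based idea (invented catalog, not from the paper): filter items by the constraints the user states explicitly, and let the explanation simply restate the satisfied constraints.

```python
# Knowledge-based recommendation as constraint filtering; catalog is invented.
CATALOG = [
    {"title": "Indiana Jones and the Kingdom of the Crystal Skull",
     "genre": "Action", "director": "Steven Spielberg", "actor": "Harrison Ford"},
    {"title": "Legends of the Fall",
     "genre": "Drama", "director": "Edward Zwick", "actor": "Brad Pitt"},
]

def recommend(constraints):
    """Return items satisfying every explicitly stated constraint."""
    return [item for item in CATALOG
            if all(item.get(key) == value for key, value in constraints.items())]

wants = {"genre": "Action", "director": "Steven Spielberg", "actor": "Harrison Ford"}
for item in recommend(wants):
    # The explanation restates the constraints the item satisfies.
    print(f"The movie '{item['title']}' is recommended because you want to watch "
          f"an {item['genre']} movie directed by {item['director']} "
          f"and starring {item['actor']}.")
```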

Slide 13

Compare the 3 Types of Explanations

• Feature-based Explanations (FBExp)
  • Features: Calculates recommendations based on the similarity between user preferences and item characteristics.
  • Evaluation criteria: Transparency (explains how the system works); Trust (increases user trust in the system).
• Item-based Explanations (IBExp)
  • Features: Recommends similar items based on ratings given by users to movies they have watched in the past.
  • Evaluation criteria: Efficiency (helps users make decisions faster).
• Knowledge-based Explanations (KBExp)
  • Features: Recommends items using explicitly specified user preferences.
  • Evaluation criteria: Persuasiveness (how well the recommendation matches the user's specified preferences); Satisfaction (increases satisfaction with recommended items).

Slide 14

Contents
1. Background / Research Goal
2. Explanation Types
3. Experiment
4. Results
5. Conclusion

Slide 15

Experiment Setup
• Objective: Compare explanations based on each recommendation method with LLM-generated explanations to validate the research questions:
1. Do users prefer LLM-generated explanations compared to those from existing methods?
2. How do users rate the quality of LLM-generated explanations compared to those from existing methods?
3. What features of LLM-generated explanations are valued by users?

Slide 16

Experiment Setup
• Dataset
  • Movie information + reviews: MovieLens Latest Small
  • Detailed movie information: TMDB API
• Generation of Recommendations
  • Recommended items are determined by a simple algorithm based on the baseline methods.
  • The LLM is not involved in the selection of recommended items.
(Figure: screenshots of the user study)
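
As a hedged illustration of how such a dataset could be assembled (this is not the authors' code): MovieLens Latest Small ships as CSV files, and its links.csv maps MovieLens ids to TMDB ids, which the TMDB API can resolve to detailed metadata. The local paths and the YOUR_TMDB_KEY placeholder below are assumptions.

```python
import pandas as pd
import requests

# Paths assume the unzipped ml-latest-small/ directory from MovieLens.
ratings = pd.read_csv("ml-latest-small/ratings.csv")  # userId, movieId, rating, timestamp
movies  = pd.read_csv("ml-latest-small/movies.csv")   # movieId, title, genres
links   = pd.read_csv("ml-latest-small/links.csv")    # movieId, imdbId, tmdbId

def tmdb_details(tmdb_id, api_key):
    """Fetch detailed movie information (overview, runtime, etc.) from TMDB."""
    url = f"https://api.themoviedb.org/3/movie/{int(tmdb_id)}"
    return requests.get(url, params={"api_key": api_key}, timeout=10).json()

# Example: enrich one movie with TMDB metadata (requires your own API key).
row = links.iloc[0]
# details = tmdb_details(row["tmdbId"], api_key="YOUR_TMDB_KEY")
```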

Slide 17

Experiment Setup
• LLM used in the experiment: Llama2-13B
  • An open-source LLM with high benchmark scores at the time the research was conducted.
  • Temperature = 0.01, Max Tokens = 2000
• Generation of Explanations
  • Baseline methods: explanations reflecting the features of each recommendation method are generated from a template.
  • LLM: explanations are generated using individual prompt templates (shown as a figure in the original slides) corresponding to each recommendation method; a rough sketch follows below.
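
A minimal sketch of what prompt-based explanation generation could look like under the stated settings (Llama2-13B, temperature 0.01, up to 2000 tokens), using the Hugging Face transformers pipeline. The prompt wording and the explain helper are invented for illustration; the paper's actual prompt templates are in its figure and are not reproduced here.

```python
from transformers import pipeline

# Llama2-13B chat checkpoint; any smaller instruct model works for a dry run.
generator = pipeline("text-generation", model="meta-llama/Llama-2-13b-chat-hf")

# Hypothetical FBExp-style prompt template; NOT the paper's wording.
FBEXP_PROMPT = (
    "You are a movie recommender. Explain to the user why the movie "
    "'{title}' is recommended, given that they like the genres {genres}. "
    "Write a short, friendly, personalized explanation."
)

def explain(title, genres):
    prompt = FBEXP_PROMPT.format(title=title, genres=", ".join(genres))
    out = generator(prompt, do_sample=True, temperature=0.01,
                    max_new_tokens=2000, return_full_text=False)
    return out[0]["generated_text"]

print(explain("Legends of the Fall", ["Romance", "Drama", "War"]))
```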

Slide 18

Experiment Setup
• Participants: 97
• Participants were divided into 3 groups, one per explanation type:
  • FBExp
  • IBExp
  • KBExp
• Each participant compared the following two explanations:
  • Baseline: explanation based on the template
  • LLM-Exp: explanation generated by the LLM from the template input

Slide 19

Evaluation Metrics
All questions were evaluated on a 5-point Likert scale.
• Metrics common to all explanation types:
  • Understandability of the explanation
  • Overall satisfaction with the recommendation
  • Effectiveness of the explanation in helping to evaluate the movie
• Metrics specific to each explanation type:
  • Feature-based (FBExp): transparency of the recommendation, trust in the system
  • Item-based (IBExp): efficiency in supporting decision-making
  • Knowledge-based (KBExp): persuasiveness of the recommendation, satisfaction with the item

Slide 20

Evaluation Metrics
• Explanation Preference
  • Which of the two displayed explanations do you prefer (or do you prefer neither)?
  • If you have a preferred explanation, rate its characteristics on a 5-point scale:
    Clarity, Creativity, Level of detail, Time, General quality, Consideration of preferences, Length of the presented explanation

Example FBExp explanations:
• Baseline: The movie 'Legends of the Fall' is recommended to you because you like Romance, Drama, War, and Western genres, and the movie 'Legends of the Fall' is similar to the ones you liked before.
• LLM: We recommend 'Legends of the Fall' as it aligns with your preferred genres of Drama, Romance, and War. This epic tale set in the early 20th century follows the lives of two brothers and their families, exploring themes of love, loss, and loyalty amidst the backdrop of World War I. With its sweeping landscapes and emotional depth, this film is sure to captivate you with its timeless storytelling.

Slide 21

Contents
1. Background / Research Goal
2. Explanation Types
3. Experiment
4. Results
5. Conclusion

Slide 22

Results - RQ1
Q: Compared to existing methods, do users prefer LLM-generated explanations?
A: Users clearly tend to prefer the LLM-generated explanations.
(Figure: selection of preference between the baseline explanation and the LLM-generated explanation.)

Slide 23

Results - RQ2
Q: How do users rate the quality of LLM-generated explanations compared to those from existing methods?
A: LLM-generated explanations received higher ratings on almost all metrics.
(Figure: mean ratings for each evaluation metric, 5-point scale.)

Slide 24

Results - RQ3
• Q: What characteristics of LLM-generated explanations are valued by users?
• A:
  • Significantly higher ratings were observed for many characteristics.
  • The "Length" characteristic received a low rating.
(Figure: mean ratings for each characteristic; p-values from a binomial test on the likelihood of receiving a high rating for each characteristic.)
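
For intuition, here is a minimal sketch of the kind of binomial test named in the figure caption, using scipy. The counts are invented, not the paper's data, and the paper's exact test setup may differ.

```python
from scipy.stats import binomtest

# Suppose k of n participants gave a "high" rating (4 or 5) to a
# characteristic; test whether high ratings are more likely than chance.
k, n = 48, 60  # invented counts for illustration
result = binomtest(k, n, p=0.5, alternative="greater")
print(f"high ratings: {k}/{n}, p-value = {result.pvalue:.4f}")
```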

Slide 25

Discussion
• The explanations generated by the LLM were rated highly, particularly in terms of creativity, detail, and personalization.
• The LLM's vast background knowledge may explain why its explanations added information not found in the baseline.
• Limitations:
  • A single domain (movies)
  • A single fixed LLM model and prompt
  • No use of external knowledge such as RAG

Slide 26

Conclusion
• The authors investigated the effectiveness of post-hoc explanations generated by LLMs in recommender systems.
• The experimental results showed that, compared to baseline explanations, LLM-generated explanations were preferred more often and their quality was rated significantly higher.
• In particular, LLM explanations tended to be rated highly for creativity and detail, suggesting that the LLM's inherent knowledge and natural-language capabilities have the potential to enhance the user experience.