LLMs for Social Simulation: Progress, Opportunities and Challenges

Large Language Models (LLMs) for Social Simulation by Jiaying Wu, NUS CTIC. First presented at the iGyro project meeting, NUS Centre for Trusted Internet and Community.

Video (on YouTube): https://youtu.be/ky3qpjAbNyA
Reading List: https://docs.google.com/document/d/1fULFSrK4Rv1IZhC68Aw3e7Gt8d6WV72cNrny1t9H3NU/edit#heading=h.ol0v3z2a2if6

WING NUS: http://wing.comp.nus.edu.sg
NUS CTIC: https://ctic.nus.edu.sg

Follow Jiaying Wu on LinkedIn: https://www.linkedin.com/in/jiayingwu19/
or on X: https://x.com/_JiayingWu_

wing.nus
September 25, 2024

Transcript

1. The scaling of LLMs unlocks new capabilities … and social intelligence?
   (https://research.google/blog/pathways-language-model-palm-scaling-to-540-billion-parameters-for-breakthrough-performance/)
2. Deriving predictions from LLM social simulations: an example.
   Reference: Predicting Results of Social Science Experiments Using Large Language Models (Ashokkumar et al., working paper, version of 8 Aug 2024).
3. LLMs predict social science experiment results (even for unpublished studies).
   - Data: 70 text-based social science experiments conducted in the US (50 from the TESS project, 20 unpublished), totaling 476 treatment effects.
   - Method: prompt GPT-4 to simulate over 100k research participants' responses to experimental stimuli, conditioned on specific individual profiles.
   - Observation: LLM estimates of treatment effects (pooled across prompts) are strongly correlated with the original treatment effects (r_adj: disattenuated correlations, i.e., corrected for measurement unreliability). A prompting sketch follows this item.
   Reference: Predicting Results of Social Science Experiments Using Large Language Models (Ashokkumar et al., working paper, version of 8 Aug 2024).
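   The method lends itself to a compact loop. Below is a minimal, hypothetical sketch; the call_llm helper, prompt wording, and profile fields are illustrative assumptions, not the authors' materials.

      import random
      import statistics

      def call_llm(prompt: str) -> str:
          """Placeholder for an actual LLM API call (e.g., GPT-4)."""
          raise NotImplementedError

      def simulate_response(profile: dict, stimulus: str) -> float:
          # Condition the model on an individual profile, then present the stimulus
          prompt = (
              f"You are a {profile['age']}-year-old {profile['gender']} from the US "
              f"with {profile['education']} education.\n"
              f"Read the stimulus below and answer the survey item on a 1-7 scale "
              f"(reply with a single number):\n{stimulus}"
          )
          return float(call_llm(prompt))

      def estimate_treatment_effect(profiles: list, treated: str, control: str,
                                    n: int = 1000) -> float:
          # Pool simulated responses across many profiles, then take the mean
          # difference as the treatment-effect estimate
          sample = random.sample(profiles, min(n, len(profiles)))
          t = [simulate_response(p, treated) for p in sample]
          c = [simulate_response(p, control) for p in sample]
          return statistics.mean(t) - statistics.mean(c)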
4. Road map:
   - Leveraging LLMs for social simulation
     - Individual-level: persona simulation
     - Population-level: social behavior & interaction simulation
   - Benefits of LLM simulation in practical applications
     - LLM-simulated patients facilitate health professional training
     - LLM-simulated user comments inform fake news detection
     - LLM-simulated personas enable scalable synthetic data generation
   - Opportunities & challenges: an iGyro perspective
5. Prompting / instruction-tuning LLMs to role-play different characters.
   - Nonparametric prompting: in-context demonstrations / retrieval augmentation with the top-k role-specific dialogues. A sketch follows this item.
   Reference: RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models (Wang et al., ACL Findings 2024).
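   A minimal sketch of the nonparametric route, assuming a toy lexical retriever; the actual RoleLLM retriever and prompt format may differ.

      from difflib import SequenceMatcher

      def top_k_dialogues(query: str, role_dialogues: list, k: int = 3) -> list:
          # Toy lexical similarity stands in for a real retriever
          return sorted(role_dialogues,
                        key=lambda d: SequenceMatcher(None, query, d).ratio(),
                        reverse=True)[:k]

      def build_role_prompt(role: str, query: str, role_dialogues: list) -> str:
          demos = "\n".join(top_k_dialogues(query, role_dialogues))
          return (f"You are {role}. Stay in character at all times.\n"
                  f"Example dialogues:\n{demos}\n"
                  f"User: {query}\n{role}:")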
6. Prompting / instruction-tuning LLMs to role-play different characters (continued).
   - Parametric tuning: supervised fine-tuning on a role-play corpus. Questions are generated and posed to the simulated roles to elicit more role-specific knowledge, so that responses become role-specific. A data-construction sketch follows this item.
   Reference: RoleLLM (Wang et al., ACL Findings 2024).
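   One plausible shape for the fine-tuning data; the field names are assumptions, and the paper's actual data format may differ.

      def to_sft_example(role: str, question: str, answer: str) -> dict:
          # Instruction -> in-character response pairs for supervised fine-tuning
          return {
              "instruction": f"Answer the question as {role}, using role-specific knowledge.",
              "input": question,
              "output": answer,  # in-character response from the role-play corpus
          }

      corpus = [("Sherlock Holmes",
                 "How do you approach a new case?",
                 "I observe what others merely see, and deduce from trifles.")]
      sft_data = [to_sft_example(r, q, a) for r, q, a in corpus]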
7. Effectiveness of employing retrieval augmentation (reaug) vs. system instructions (sys), reported as ROUGE-L scores.
   * The smaller RoleLLaMA and RoleGLM models (both reaug and sys variants) are instruction-tuned for the task.
   * RoleGPT is an audited GPT-4 model, i.e., human annotators verified that the model knows the characters well.
   Reference: RoleLLM (Wang et al., ACL Findings 2024).
8. Effectiveness of retrieval augmentation vs. system instructions (ROUGE-L scores, continued).
   - Smaller LLMs (LLaMA-7B, GLM-6B): sys > reaug; retrieved examples can be noisy and sparse.
   - Larger LLM (GPT-4): sys ≈ reaug; the model exhibits greater robustness, hence near-invariant performance.
   - Observation: smaller LLMs tuned on abundant task-specific data can outperform larger closed-source LLMs in persona simulation.
   Reference: RoleLLM (Wang et al., ACL Findings 2024). A ROUGE-L computation sketch follows this item.
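   For reference, the ROUGE-L metric used above can be reproduced with Google's rouge-score package (pip install rouge-score); the example strings are illustrative.

      from rouge_score import rouge_scorer

      scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
      result = scorer.score(
          target="I observe what others merely see.",  # reference response
          prediction="I see what others observe.",     # model response
      )
      print(result["rougeL"].fmeasure)  # longest-common-subsequence F-measure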
9. Leveraging LLMs to simulate social media interactions.
   - Step 1: Based on user demographics and an initialized real-world social environment, an LLM agent predicts the user's internal states, including emotion (calm / moderate / intense) and attitude (support / oppose).
   Reference: S3: Social-network Simulation System with Large Language Model-Empowered Agents (Gao et al., arXiv:2307.14984).
10. Leveraging LLMs to simulate social media interactions (continued).
    - Step 2: Based on the user's emotion and attitude, content-generation behavior is simulated through an LLM-based generation module. Then, upon receiving a message from one of the user's followees, interactive behavior is simulated as well.
    Reference: S3 (Gao et al., arXiv:2307.14984).
11. Leveraging LLMs to simulate social media interactions (continued).
    - Step 3: At each simulation step, the environment and agent memories are updated with the simulated user-generated messages. A loop sketch covering all three steps follows this item.
    Reference: S3 (Gao et al., arXiv:2307.14984).
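   Putting the three steps together, a schematic per-step loop might look like the following; the state fields mirror the slides, while call_llm, the prompt wording, and the message routing are illustrative assumptions.

      def call_llm(prompt: str) -> str:
          raise NotImplementedError  # plug in an actual LLM API here

      class Agent:
          def __init__(self, demographics: dict):
              self.demographics = demographics
              self.emotion = "calm"      # calm / moderate / intense
              self.attitude = "support"  # support / oppose
              self.memory = []           # messages the agent has seen or produced

      def simulation_step(agents: list, inbox: dict) -> list:
          new_messages = []
          for agent in agents:
              # Step 1: predict internal state from demographics + recent memory
              state = call_llm(
                  f"Profile: {agent.demographics}. Recent memory: {agent.memory[-5:]}. "
                  f"Output 'emotion,attitude'.")
              agent.emotion, agent.attitude = state.split(",", 1)
              # Step 2: simulate content generation / interactive behavior
              replies = [
                  call_llm(f"As a user feeling {agent.emotion} and holding a "
                           f"{agent.attitude} attitude, react to: {msg}")
                  for msg in inbox.get(id(agent), [])
              ]
              # Step 3: update memory / environment with generated messages
              agent.memory.extend(replies)
              new_messages.extend(replies)
          return new_messages  # fed back into the environment at the next step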
12. Scope of the S3 study: two scenarios (nuclear energy, gender discrimination) and three simulated social phenomena (information propagation, attitude, emotion). (*The data source is unspecified in the paper; from the descriptions and event selection, the data appears to come from Weibo.)
    Reference: S3 (Gao et al., arXiv:2307.14984).
13. LLMs demonstrate individual-level simulation capabilities. LLM-empowered simulation can predict:
    - emotions (calm / moderate / intense)
    - attitudes (support / oppose)
    - behaviors (forward / post relevant content)
    Reference: S3 (Gao et al., arXiv:2307.14984).
14. LLMs demonstrate individual-level simulation capabilities (continued): beyond predicting the emotions, attitudes, and behaviors above, the content generated by LLM simulations is of reasonable quality.
    Reference: S3 (Gao et al., arXiv:2307.14984).
15. LLMs demonstrate population-level simulation capabilities: as the event slowly reaches a larger community, a second peak in emotion intensity emerges (gender discrimination scenario). [Figure: real vs. simulated emotion-intensity curves.]
    Reference: S3 (Gao et al., arXiv:2307.14984).
16. Road map (revisited): turning to the benefits of LLM simulation in practical applications:
    - LLM-simulated patients facilitate health professional training
    - LLM-simulated user comments inform fake news detection
    - LLM-simulated personas enable scalable synthetic data generation
17. #1: LLM-simulated patients facilitate health professional training.
    Reference: PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals (Wang et al., arXiv:2405.19660).
18. #2: LLM-simulated users inject crowd wisdom into fake news detection. A sketch follows this item.
    Reference: Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models (Nan et al., CIKM 2024).
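   The core idea can be sketched as follows; the commenter profiles, prompts, and classifier interface are illustrative assumptions, not the paper's implementation.

      def call_llm(prompt: str) -> str:
          raise NotImplementedError  # plug in an actual LLM API here

      # Simulated commenter profiles; attribute diversity is the point
      PROFILES = [
          {"stance": "left-leaning",  "style": "skeptical"},
          {"stance": "right-leaning", "style": "credulous"},
          {"stance": "centrist",      "style": "fact-checking"},
      ]

      def generate_comments(news: str) -> list:
          return [
              call_llm(f"You are a {p['stance']}, {p['style']} social media user. "
                       f"Write a short comment on this news item:\n{news}")
              for p in PROFILES
          ]

      def detect(news: str, classifier) -> str:
          # Concatenate the article with simulated "crowd" comments,
          # then hand the result to any text classifier
          comments = generate_comments(news)
          return classifier(news + "\n[COMMENTS]\n" + "\n".join(comments))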
19. #3: LLM-simulated personas enable scalable synthetic data generation.
    - Text-to-Persona: any text can serve as input for deriving corresponding personas, simply by prompting the LLM with "Who is likely to [read|write|like|dislike|...] the text?"
    Reference: Scaling Synthetic Data Creation with 1,000,000,000 Personas (Chen et al., technical report, 2024).
20. #3 (continued):
    - Persona-to-Persona: derives diverse personas via interpersonal relationships, easily achieved by prompting the LLM with "Who is in close relationship with the given persona?" A sketch of both prompts follows this item.
    Reference: Scaling Synthetic Data Creation with 1,000,000,000 Personas (Chen et al., technical report, 2024).
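   Both derivation prompts are lightweight enough to wrap directly. The wrapper and call_llm are assumptions; the prompt wording follows the slides.

      def call_llm(prompt: str) -> str:
          raise NotImplementedError  # plug in an actual LLM API here

      def text_to_persona(text: str, relation: str = "read") -> str:
          # relation can be "read", "write", "like", "dislike", ...
          return call_llm(f"Who is likely to {relation} the following text? "
                          f"Describe that persona.\n\n{text}")

      def persona_to_persona(persona: str) -> str:
          # Expand coverage via interpersonal relationships
          return call_llm(f"Who is in close relationship with the given persona? "
                          f"Describe them.\n\nPersona: {persona}")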
21.-22. #3: LLM-simulated personas enable scalable synthetic data generation (two slides with no additional transcript text).
    Reference: Scaling Synthetic Data Creation with 1,000,000,000 Personas (Chen et al., technical report, 2024).
23. Road map (revisited): turning to opportunities & challenges, from an iGyro perspective.
24. TD1: distinguishing truth from falsity in digital media.
    Research question: How can we better define and inject crowd wisdom to enhance the detection process through LLM-simulated responses?
    Reference: Let Silence Speak (Nan et al., CIKM 2024).
25. TD1 (continued), addressing the research question above:
    → Consider designing comprehensive user profiles with a diverse and relevant set of attributes (e.g., political stances).
    → Apart from enhancing detection effectiveness, LLM-simulated reasoning processes can also help evaluate the quality of model-generated debunking explanations.
    Reference: Let Silence Speak (Nan et al., CIKM 2024).
26. TD2: resilience in the dissemination of digital information.
    Research question: How can we leverage LLM-simulated social interactions to inform recommendation and search strategies that are sensitive to users' tolerance for diversity?
27. TD2 (continued), addressing the research question above:
    → Recent finding: LLMs generate structurally realistic social networks, but overestimate political homophily (Chang et al., arXiv:2408.16629).
    → Opportunities: consider simulating the effects of recommendation strategies and rewarding / penalizing the model accordingly. (LLM simulations also help alleviate cold-start issues.)
28. TD3: empowerment of digital resilience through exploration & reasoning.
    Research question: How can we elicit personalized messages from LLMs to (1) reduce the misleading effects of misinformation and disinformation on individuals, and (2) empower individuals to perform deeper analysis on issues of interest?
29. TD3 (continued), addressing the research question above:
    → Recent finding: a three-round conversation with a GPT-4 model reduced participants' belief in their conspiracy theory by 20% on average (Durably reducing conspiracy beliefs through dialogues with AI; Costello et al., Science 2024).
    → For (1): boost effectiveness with personalized messages.
    → For (2): simulate individuals with LLMs to facilitate system design and evaluation.
30. Remaining challenge: LLM persona simulations are prone to bias. Irrelevant persona attributes can severely skew LLM predictions (e.g., a physical-disability attribute skewing answers to math problems).
    Reference: Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs (Gupta et al., ICLR 2024).
31. Remaining challenge (continued). Potential solution: filter for task-relevant attributes and instruction-tune the model on synthesized data. A filtering sketch follows this item.
    Reference: Bias Runs Deep (Gupta et al., ICLR 2024).
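   One simple way to realize the attribute-filtering half of that solution; the relevance map and field names are illustrative assumptions.

      # Keep only attributes plausibly relevant to the task, so that irrelevant
      # ones (e.g., physical disability on a math problem) cannot skew predictions
      TASK_RELEVANT = {
          "math": {"education", "occupation"},
          "political_survey": {"age", "political_stance", "region"},
      }

      def filter_persona(persona: dict, task: str) -> dict:
          keep = TASK_RELEVANT.get(task, set())
          return {k: v for k, v in persona.items() if k in keep}

      persona = {"age": 34, "education": "PhD", "physical_disability": True}
      print(filter_persona(persona, "math"))  # -> {'education': 'PhD'}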
32. Discussion and collaboration are welcome! :) Contact: [email protected]
    My curated reading list: LLMs for social simulation / computational social science [Link]