

Using AI for User Representation: An Analysis of 83 Persona Prompts

We analyzed 83 persona prompts from 27 research articles that used large language models (LLMs) to generate user personas. Findings show that the prompts predominantly generate single personas. Several prompts explicitly request short or concise persona descriptions, which deviates from the tradition of creating rich, informative, and rounded persona profiles. Text is the most common format for generated persona attributes, followed by numbers. Text and numbers are often generated together, and demographic attributes are included in nearly all generated personas. Researchers use up to 12 prompts in a single study, though most studies use only a few. Comparing and testing multiple LLMs is rare. More than half of the prompts require the persona output in a structured format, such as JSON, and 74% of the prompts insert data or dynamic variables. We discuss the implications of the increased use of computational personas for user representation.


Danial Amin

October 18, 2025


Transcript

  1. Using AI for User Representation: An Analysis of 83 Persona Prompts

    Joni Salminen (a), Danial Amin (a), and Bernard Jansen (b). a: School of Marketing and Communication, University of Vaasa, Finland; b: Qatar Computing Research Institute, Hamad Bin Khalifa University, Qatar
  2. Table of contents

    01 Research Context, 02 Research Gap & RQs, 03 Methodology, 04 Key Findings, 05 Insights and Recommendations, 06 Limitations and Future Directions
  3. Persona in User Representation

    Fictitious user representations based on real data. Central to user-centered design (UCD) and HCI. Traditionally created through human analysis of user data. Persona Evolution: Manual Persona Development → Data-Driven Persona → LLM-Generated Personas
  4. LLM-Generated Personas

    LLM-generated personas (personas created by an LLM) are becoming more prominent, and LLMs enable near-infinite prompting strategies. Yet there is no systematic analysis of how researchers use LLMs for personas, and a lack of evidence-based guidelines for safe and productive use.
  5. Research Questions

    RQ1: Why do researchers use persona prompts? RQ2: How do researchers use persona prompts? RQ3: What kind of personas do researchers generate with persona prompts?
  6. Research Methodology

    Article Search: Searched major academic databases (ACM, IEEE, Web of Science, Scopus, and arXiv) following systematic literature review (SLR) guidelines. Selection: Identified 52 relevant articles focused on generative AI's role in persona development. Extraction: Extracted 83 usable persona prompts from 27 articles (52%), including prompt text and detailed usage context. Data Analysis: Collaborative coding by two researchers using an established coding framework.
  7. RQ1: Why Use Persona Prompts?

    Primary Uses: Persona Generation 66.7%, Prediction 21.2%, Evaluation 12.1%. Researchers primarily use persona prompts for generation, with emerging applications in prediction tasks and evaluation across diverse domains, from education to climate communication. Primary Domains: Education, Design, Marketing, Storytelling, Informatics, Health, Sustainability, Communication
  8. RQ2: How do researchers use persona prompts? (1/2)

    LLM Usage: GPT 76%, Others 18%, DALL-E 6%; Single LLM 78%, Multiple LLMs 22%. [Histogram: prompt length in words vs. count.] Researchers predominantly use GPT models (76.1%) with multi-prompt strategies (avg. 3.1 prompts per study, range 1-12), while cross-model testing remains rare (only 22% of studies) and prompt complexity varies widely (22-309 words). Prompt Complexity (words): Min: 22, Mdn: 67, Mean: 107, Max: 309
  9. RQ2: How do researchers use persona prompts? (2/2)

    Most researchers integrate dynamic data or variables into prompts (74.1%), require structured outputs like JSON (51.85%), and use complex prompt-orchestration techniques. 74% insert dynamic data or variables into prompts; 52% require a structured output format; 19% assign facilitator roles to the LLM; 27% disclosed hyperparameter values. A sketch combining these patterns follows below.
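As an illustration (not taken from any surveyed paper), here is a minimal Python sketch of a persona prompt that combines the most common patterns this slide reports: a facilitator role, dynamic data insertion, a required JSON output format, and an explicit length constraint. The variable names, schema keys, and prompt wording are all assumptions.

```python
import json

# Hypothetical user data inserted into the prompt at runtime; in the surveyed
# studies, such variables typically come from surveys, logs, or analytics.
user_data = {
    "segment": "first-year university students",
    "top_behaviors": ["late-night studying", "mobile-first browsing"],
}

# The prompt combines the reported patterns: facilitator role (19% of prompts),
# dynamic data insertion (74%), structured JSON output (52%), and an explicit
# length constraint (41%). All wording here is illustrative, not prescriptive.
prompt = f"""You are a UX researcher.
Based on the following user data, generate ONE persona.

User data: {json.dumps(user_data)}

Return only valid JSON with exactly these keys:
{{"name": "...", "age": 0, "occupation": "...", "goals": ["..."], "frustrations": ["..."]}}
Keep each field under 20 words."""

print(prompt)  # pass to the LLM API of your choice
```

Note that only 27% of the surveyed studies disclosed hyperparameter values; whichever client is used, temperature and related settings should be reported alongside the prompt.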
  10. RQ3: What kind of personas do researchers generate with persona prompts? (1/2)

    Researchers generate predominantly single, text-based personas with a strong demographic emphasis, combining text and numbers while prioritizing brevity through explicit length constraints and rarely including images. [Bar chart of content formats: Text+Numbers+Image, Text+Numbers, Text only, Numbers only.] Content: 64% specified the number of personas to generate. Constraints: 71% generated a single persona only; 41% included length constraints.
  11. RQ3: What kind of personas do researchers generate with persona prompts? (2/2)

    LLM-generated personas average 5.48 information attributes (SD=3.51), which is 38% fewer than previous-generation data-driven personas, with most falling in the "Simple" category (4-7 attributes) rather than information-rich profiles. [Bar charts: Information Richness, with bands Very Simple (0-3 subcategories), Simple (4-7 subcategories), Moderate (8-10 subcategories), High (11+ subcategories); Information Categories: Demographics, Contextual Information, Behaviors, Summary, Attitudes.]
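As a reading aid (not from the paper), a toy Python mapping from a persona's attribute count to the richness bands defined on this slide; the function name and code framing are our own:

```python
def richness_category(n_attributes: int) -> str:
    """Map a persona's attribute count to the slide's information-richness bands."""
    if n_attributes <= 3:
        return "Very Simple"  # 0-3 subcategories
    if n_attributes <= 7:
        return "Simple"       # 4-7 subcategories
    if n_attributes <= 10:
        return "Moderate"     # 8-10 subcategories
    return "High"             # 11+ subcategories

# The reported average of 5.48 attributes falls in the "Simple" band:
print(richness_category(round(5.48)))  # -> Simple
```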
  12. Key Implications and Insights

    Continuation of Traditional Persona: Demographics remain the dominant information category, appearing in the majority of prompt entries. Traditional persona attributes such as behaviors, attitudes, and contextual information continue to be preserved in LLM-generated personas. Deviations from Traditional Persona: LLM-generated personas emphasize brevity over rich narratives. Images are rarely generated for personas. The majority of prompts integrate data or dynamic variables directly within the prompt structure. Researchers show a preference for structured outputs like JSON rather than narrative formats. Emerging Concerns for LLM-Generated Personas: Single-persona generation dominates, which limits the representation of user population diversity. Complex prompt chains reduce transparency and make systematic evaluation increasingly difficult. Cross-model validation (i.e., testing/using multiple models) remains limited. Personas are increasingly treated as data objects rather than tools for building empathy with users.
  13. Recommendations

    Include primary user data in prompts (maintain the data-driven principle). Ground prompting strategies in persona theory. Evaluate both individual prompts and system-level effects. Generate diverse persona sets, not just single personas. Test multiple LLM models for comparison (a sketch of the last two points follows below).
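To make the last two recommendations concrete, a hypothetical Python sketch of generating a persona set across several models. MODELS, N_PERSONAS, and call_llm are placeholders, not names from the paper or any specific SDK.

```python
# Placeholder model identifiers; swap in the models you actually compare.
MODELS = ["model-a", "model-b", "model-c"]
N_PERSONAS = 5  # request a diverse set rather than a single persona

def call_llm(model: str, prompt: str) -> str:
    # Stub standing in for a real provider client; replace before use.
    return f"[JSON persona list from {model}]"

prompt = (
    f"Generate {N_PERSONAS} distinct personas as a JSON list, varying "
    "demographics, behaviors, and goals to cover diverse user groups."
)

# Run the identical prompt against every model so that outputs can be
# compared at both the individual-prompt and system level.
results = {model: call_llm(model, prompt) for model in MODELS}
for model, output in results.items():
    print(model, output)
```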
  14. Future Research

    1. Empirical studies on prompt-design effects on persona outputs; 2. Evaluation frameworks for multi-prompt systems; 3. Algorithmic fairness and bias guidelines; 4. Cross-model performance comparisons; 5. Human-AI collaboration best practices
  15. Main Takeaways

    LLMs are predominantly used for generation (81.5%), with emerging prediction use (25.9%). GPT models dominate (76.1%), but cross-model testing is rare. Multi-prompt strategies are common (avg. 3.1 prompts per study). Generated personas emphasize brevity and structure over rich narratives. Demographics remain central (77.8% of entries) in LLM-generated personas.