NLP challenges in data-driven personas

Five NLP Challenges in Data-Driven Personas
Dr. Joni Salminen
October 20, 2021
Nanyang Technological University, Singapore

Meet the APG Team!
• Professor Jim Jansen, The Leader (Principal Scientist): inventor of APG; leads the project; customer relationships & management
• MSc. Soon-gyo Jung, The Genius (Software Engineer): creator of APG; front-end / back-end; implements like a genius, hence the nickname
• Dr. Joni Salminen, The Handyman (Scientist): helps with user studies, system development, etc.; strategic guy, likes to think about the big picture
• YOU?

Why personas?
• Summarize relevant user information for decision makers so they can do their jobs better (e.g., creating products that actually serve people’s needs)
• Are an alternative (or complement) to numbers
• Provide a different way of doing user/customer analytics (more approachable & memorable)
• Give faces to user data
…are not just about visualization, but empathetic representations of users! [1]
[1] Nielsen, L. (2019). Personas—User Focused Design (2nd ed.). Springer.

Why automate persona generation?
Manual methods: Personas are usually created with manual methods (i.e., interviews & ethnography) that are expensive and slow to implement, and the resulting personas can quickly become outdated. Because of these limitations, personas risk being inaccurate representations of the true user base.
Automation: In contrast, APG provides personas that are fast to create and updated automatically. This dramatically reduces the cost of persona creation, making personas available to organizations with limited means (e.g., startups, small businesses). Depending on the underlying dataset, APG can cover a wide range of behaviors and demographics.
Better personas ➔ better decisions ➔ better results.
An, J., Kwak, H., Salminen, J., Jung, S., & Jansen, B. J. (2018). Imaginary People Representing Real Numbers: Generating Personas from Online Social Media Data. ACM Transactions on the Web (TWEB), 12(4), 27. https://doi.org/10.1145/3265986

v. 2.0 (2021)

Literally, giving faces to user data!
• Personification = nameless, faceless segments are turned into personas that describe a behavioral and demographic pattern in the data [1]
• Enrichment = enriching the persona profiles with additional information such as sentiment, loyalty, quotes, most viewed content, and topics of interest [1]
[1] An, J., Kwak, H., Salminen, J., Jung, S., & Jansen, B. J. (2018). Imaginary People Representing Real Numbers: Generating Personas from Online Social Media Data. ACM Transactions on the Web (TWEB), 12(4), 27. https://doi.org/10.1145/3265986

Requirements:
• Enough data (e.g., >100,000 viewers/visitors/users/customers)
• Enough content (e.g., >1000 products/pages/videos/posts)
• Large and heterogeneous audience
…so, probably not good for most SMEs, startups, micro-organizations (traditional personas work best for such organizations!)
You choose the tool based on the problem!
Our ”Client Persona”

Research Roadmap for Automatic Persona Generation [1]
APG is about finding better ways to process and choose useful user information from vast amounts of online data. ”Personas are about giving faces to data.”
• Information architecture: How to determine the relevant persona information for a given user, use case, and industry? (e.g., e-health, e-commerce, politics, gaming…)
• Quotes: How to find demographically matching, non-toxic comments that describe the persona’s attitudes and are relevant for end users?
• Temporal analysis: How to analyze the change of personas over time?
• Image: How to automatically generate, tag, and choose appropriate persona profile pictures?
• Evaluation: (1) How to ensure personas are of high quality (complete, clear, consistent, and credible)? (2) How to measure the value of personas for individuals and organizations?
• Attributes & Topics of Interest: How to automatically infer user attributes, such as interests, needs, wants, goals, political orientation, and brand affinity from social media?
• Interactivity: How to design interactive features that help users cope with more personas?
[1] Salminen, J., Jansen, B. J., An, J., Kwak, H., & Jung, S. (2019). Automatic Persona Generation for Online Content Creators: Conceptual Rationale and a Research Agenda. In L. Nielsen (Ed.), Personas—User Focused Design (2nd ed., pp. 135–160). Springer London. https://doi.org/10.1007/978-1-4471-7427-1_8

Current NLP techniques in APG
• Topic classification:
  • Current: Zero-shot classification (à la HuggingFace RoBERTa) for small organizations and supervised ML (XGBoost and TF-IDF) for large clients
  • Past: LDA (crap!)
• Sentiment analysis:
  • Current: EmoLex (multiple languages, dictionary-based)
  • Future: SenticNet?
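To make the "dictionary-based" part concrete, here is a minimal sketch of lexicon lookup for sentiment. It is not APG's implementation: the lexicon `TOY_LEXICON`, the function names `emotion_counts` and `polarity`, and the scoring rule are all assumptions made up for illustration, and the entries are not taken from EmoLex.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> set of categories). The entries are
# illustrative only and are NOT copied from EmoLex or any real resource.
TOY_LEXICON = {
    "love": {"joy", "positive"},
    "great": {"positive"},
    "hate": {"anger", "negative"},
    "afraid": {"fear", "negative"},
    "terrible": {"negative"},
}


def emotion_counts(text, lexicon=TOY_LEXICON):
    """Count how often each lexicon category is triggered by the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return counts


def polarity(text, lexicon=TOY_LEXICON):
    """Return (positive - negative) / (positive + negative), or 0.0 if no hits."""
    counts = emotion_counts(text, lexicon)
    pos, neg = counts["positive"], counts["negative"]
    total = pos + neg
    return (pos - neg) / total if total else 0.0
```

A dictionary approach like this is easy to port across languages (swap the lexicon), but it ignores negation and context, which is part of why the slide asks about alternatives.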

CHA1: Generate Persona Quotes
• Objective: Generate artificial quotes that reflect the persona’s (a) attitudes and (b) demographics.
• NLP field: Conditional text generation
• Requirements:
  • Demographically accurate
  • Attitudinally accurate
  • Topically accurate (enables searching)
The key here is conditional; mere grammaticality is not enough, the quotes need to capture the persona’s ”self”.
”Quotes reflect the persona’s attitudes about given topics and about life in general.”

CHA2: Chat with Personas
• Objective: Make it possible for users to ask a persona questions and get answers that, again, reflect who the persona is in terms of (a) attitudes, (b) demographics, and (c) topics.
• NLP field: Dialogue systems
Example (input box: ”type to ask Ahmed a question…”):
You: Hi Ahmed! What do you think about the elections in Pakistan?
Ahmed: I don’t like it [negative sentiment, click to learn more]

CHA3: Frankenstein’s Personas
• Objective: solve Bødker’s [1] ”Frankenstein problem”: inconsistency of persona information
• Example cases: man vs. woman, Indian vs. Pakistani, etc. (cultural sensibilities; Häkkilä et al. [2])
• NLP field: supervised ML (language modeling)
• How to match the quotes with the personas’ demographics and actual attitudes? (And how to reflect as many aspects of the persona’s attitudes as possible?)
[1] Bødker, S., Christiansen, E., Nyvang, T., & Zander, P.-O. (2012). Personas, people and participation: Challenges from the trenches of local government. Proceedings of the 12th Participatory Design Conference: Research Papers-Volume 1, 91–100.
[2] Häkkilä, J., Wiberg, M., Eira, N. J., Seppänen, T., Juuso, I., Mäkikalli, M., & Wolf, K. (2020). Design Sensibilities-Designing for Cultural Sensitivity. Proceedings of the 11th Nordic Conference on Human-Computer Interaction: Shaping Experiences, Shaping Society, 1–3.
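The quote-matching question can be illustrated with a simple consistency filter. This is only a sketch under strong assumptions: it presumes every comment already carries demographic labels, a sentiment score in [-1, 1], and a toxicity flag, which is exactly the labeled data that is usually missing. The `Comment` and `Persona` classes, the function `consistent_quotes`, and the `max_sentiment_gap` threshold are hypothetical names and values, not part of APG.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    gender: str        # assumed to be known or inferred upstream
    country: str       # assumed to be known or inferred upstream
    sentiment: float   # assumed range: -1 (negative) .. 1 (positive)
    toxic: bool        # assumed output of a separate toxicity classifier


@dataclass
class Persona:
    name: str
    gender: str
    country: str
    sentiment: float   # the persona's overall attitude on the same scale


def consistent_quotes(persona, comments, max_sentiment_gap=0.5):
    """Keep non-toxic comments whose labels agree with the persona.

    The result is sorted so the attitudinally closest comments come first.
    """
    matches = [
        c for c in comments
        if not c.toxic
        and c.gender == persona.gender
        and c.country == persona.country
        and abs(c.sentiment - persona.sentiment) <= max_sentiment_gap
    ]
    return sorted(matches, key=lambda c: abs(c.sentiment - persona.sentiment))
```

Exact label matching avoids the obvious mismatches, but it says nothing about subtler cultural cues in the wording, which is where the supervised approach named on the slide would have to come in.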

CHA4: Drifting Personas
• Objective: Identify topical changes in personas and notify decision makers of these changes.
• NLP field: Concept drift / topic drift / model drift… (common issues in ML [1])
• All refer to CHANGE in the underlying user behavior (basically, the data: new categories appear, old ones change, distributions change, etc.)
• How often should personas be changed? How should the change be measured / detected? [2]
[1] Widmer, G., & Kubat, M. (1996). Learning in the presence of concept drift and hidden contexts. Machine Learning, 23(1), 69–101.
[2] Jansen, B. J., Jung, S., & Salminen, J. (2019). Capturing the change in topical interests of personas over time. Proceedings of the Association for Information Science and Technology, 56(1), 127–136.
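One simple way to measure "distributions change" is to compare a persona's topic distribution in two time windows with the Jensen-Shannon divergence. The sketch below is an illustration, not the method of the cited papers: the function names, the idea of using per-item topic labels as input, and the `threshold=0.1` cut-off are all assumptions that would need tuning on real data.

```python
import math
from collections import Counter


def topic_distribution(topic_labels, vocabulary):
    """Turn a list of topic labels into probabilities over a fixed vocabulary."""
    counts = Counter(topic_labels)
    total = sum(counts[t] for t in vocabulary)
    if total == 0:
        raise ValueError("no labels from the vocabulary were observed")
    return [counts[t] / total for t in vocabulary]


def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2 (0 = identical, 1 = disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def has_drifted(old_labels, new_labels, threshold=0.1):
    """Flag drift when the divergence between two periods exceeds a threshold.

    The default threshold is a hypothetical value chosen for illustration.
    """
    vocabulary = sorted(set(old_labels) | set(new_labels))
    p = topic_distribution(old_labels, vocabulary)
    q = topic_distribution(new_labels, vocabulary)
    return js_divergence(p, q) > threshold
```

Building the vocabulary from both periods means a brand-new topic shows up as probability mass that the old period lacks, so "new categories appear" is captured as well as shifts among existing topics.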

CHA5: Personas from Text Only
• User segmentation / text analytics / pattern mining
• Either for a specific use case (e.g., toxic personas, fake news personas, fandom personas…) or general representations of humanity that can be queried at will (i.e., ”stacking” different user models on top of each other to create truly multifaceted human representations)
• Needs data, help from psychologists, etc. How to validate and so on?

Common challenges:
• Modeling people based on what they write.
• Lack of resources:
  • Datasets (need demographically labeled data)
  • Baselines
  • Evaluation metrics (have to consider UX / HCI / user feedback; these are not only technical but socio-technical problems)
• Most importantly, not enough PEOPLE working on these issues

Data is available but what about information?
• People’s attitudes, fears, doubts, hopes, needs, wants… can these be inferred from unstructured (micro-)texts?
• Rosetta Stone for data-driven personas: user modeling / attribute inference from smartly sampled tweets?
• Dictionaries (LIWC, AFINN, EmoLex) vs. deep learning?
…VITALLY important because persona users’ information needs are unique: they need flexible tools to query persona attitudes in real time ➔ static data-driven personas won’t do!

Thank you! Questions?
Dr. Joni Salminen
The APG family (Davao, 2019)
Get the book from Amazon! (or your university library)
