Challenges of Online Hate Detection Systems (and Research)

Dr. Joni Salminen
The International Conference on Behavioral and Social Computing (BESC 2021)
October 31, 2021, Doha, Qatar

Transcript

  1. Online Hate Detection Systems:
    Challenges and Action Points for
    Developers, Data Scientists, and
    Researchers
    Dr. Joni Salminen1, Maria Jose Linarez2, Soon-gyo
    Jung1, Bernard J. Jansen1
    1Qatar Computing Research Institute, Doha, Qatar
    2Universidad Centroccidental Lisandro Alvarado, Barquisimeto,
    Venezuela
    The International Conference on Behavioral
    and Social Computing (BESC 2021)


  2. Towards resolving online hate
    “Automated online hate detection has garnered interest from various
    stakeholders to make online platforms safer. Despite this interest, there
    remain a plethora of unresolved issues that hinder advancement. We review
    fourteen state-of-the-art articles discussing these challenges, and present a
    meta-synthesis. Six themes are identified: (1) Dataset selection, (2)
    Detection of False Positives and Negatives, (3) Semantic Context of Hate
    Messages, (4) Privacy and Anonymity, (5) Ethical Considerations, and (6)
    Minimizing Bias. For each theme, we provide a set of action points to
    support researchers, data scientists, and developers to improve hate
    detection systems.”


  3. The Most Important Challenges in
    Online Hate Research…


  4. ISSUE 1: Most studies focus on
    detection, not what happens after that
    • Most work focuses on algorithmic detection (prediction /
    classification) of online hate, a task that may be impossible at the
    current maturity of the technology (no general AI) and given the
    subjective nature of online hate.
    • Secondly, even if detection is successful, then what? Most work
    just states the purpose is to ”help moderate,” but in what way?
    Automatic censorship is out of the question.
    • Hence, the question: are computational social scientists
    focused on a ”nice-to-have” aspect of a real problem, while
    missing the chance to actually have a positive impact?


  5. ISSUE 2: Most studies focus on detection, not
    what happens before that (prevention)
    • Why do we have hate?
    • What makes people angry?
    • What are the social conditions and root causes for hate?
    • How can we understand hateful people?
    • The current research vilifies ”haters” and conducts no proper
    anthropological or scientific inquiry to actually understand the
    root causes of the problem. Again, we work on the ”toy” aspect
    of the issue (because it is easy).
    • SOLUTION: Start understanding people, don’t hide behind
    algorithms (algorithms will not solve this problem).


  6. ISSUE 3: We like technical problems, we
    don’t like talking to people!
    • Computational social scientists see hate as a technical problem, solvable by algorithms
    and systems.
    • We use measures like accuracy, precision, recall, F1 (macro/micro), AUC... ➔ these
    measures distance us from the user experience and from what actually drives hate in the
    real world (see the sketch below).
    ...is hate a technical problem? Sociological problem? Psychological problem?
    • I argue it’s a socio-technical problem. In other words, online hate is primarily a social
    problem that is manifested on technological platforms and that can be mitigated/amplified
    by the mechanisms of those platforms (btw, very little proof of this mitigation/amplification
    exists apart from common sense --- yet, we’re quick to blame algorithms for all the bad
    things).
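
    The evaluation measures named above are easy to compute; a minimal sketch with scikit-learn (all labels and scores below are made up) shows how compact these numbers are compared to the user experience they summarize.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Hypothetical outputs of a binary hate / not-hate classifier.
y_true = [1, 0, 0, 1, 1, 0, 1, 0]                    # human labels (majority vote)
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]                    # hard predictions
y_score = [0.9, 0.2, 0.6, 0.8, 0.4, 0.1, 0.7, 0.3]   # predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1 macro :", f1_score(y_true, y_pred, average="macro"))
print("F1 micro :", f1_score(y_true, y_pred, average="micro"))
print("AUC      :", roc_auc_score(y_true, y_score))
```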


  7. ISSUE 4: Hate is very difficult to identify
    in universal terms.
    • Hate is highly subjective. At least the following factors can cause
    variation in hate interpretation [1]:
    • Demographics (age, gender, country)
    • Culture (both in the traditional sense and the community sense)
    • Sociological standing (does the working-class guy think differently than an ”elitist”
    university researcher? It seems so.)
    • Political affiliation (what’s offensive to Democrats is funny to Republicans)
    • Sense of humor (some people don’t take it seriously, others take offense at
    every turn)
    • Mood (at the specific moment of annotation, maybe I had a bad day –
    everything seems toxic due to projecting)
    • What about sarcasm and irony? A well-known problem for over ten years [2],
    but nobody has solved it thus far!


  8. Cont’d
    • If hate is subjective, how can we detect it universally? Should we
    even try? (The problem of averages [3].)
    ➔ ALTERNATIVE: User modeling / user features
    • Virtually non-existent in papers due to the tradition of using (anonymous)
    crowdsourcing. Should we start knowing things about the people
    who label our data?
    • Most papers know nothing about the raters (age, gender, socio-economic
    status, beliefs, etc.) ➔ we incorporate hidden biases into
    training sets without even knowing it ➔ the models will be a mess ---
    mess = a random mixture of beliefs from random people.
    • SOLUTION: collect socio-demographic information from the
    annotators and incorporate it into hate detection modeling (see the
    sketch below). What is hate for Joni is not the same as what is hate for Karen.
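
    A minimal sketch of the idea, assuming each training row is a (comment, annotator) pair and that hypothetical annotator age and gender columns are available: the rater's background enters the model alongside the text features.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training rows: one row per (comment, annotator) pair,
# so the same comment can carry different labels from different raters.
df = pd.DataFrame({
    "text": ["take out the trash", "you are an idiot", "nice game today"],
    "annotator_age": ["18-29", "30-44", "45-59"],
    "annotator_gender": ["F", "M", "F"],
    "label": [0, 1, 0],
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "text"),
    ("annotator", OneHotEncoder(handle_unknown="ignore"),
     ["annotator_age", "annotator_gender"]),
])

model = Pipeline([("features", features), ("clf", LogisticRegression())])
model.fit(df[["text", "annotator_age", "annotator_gender"]], df["label"])
```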


  9. ISSUE 5: Wrong labeling paradigm
    • An example of doing it wrong:
    • With two classes and three raters, we can always build a ”ground
    truth” for a sample! [4]
    • …but, in reality, we are hiding disagreements (=the true distribution
    of hate interpretation)
    • …even two raters can disagree on ”obvious” cases!
    • Solution: drop the majority-vote paradigm for annotation. Either
    model the empirical distribution [5] [hateful = 0.2, neutral = 0.5, non-
    hateful = 0.3] or build socio-demographic-specific models (see the
    sketch below). There is no ground truth when it comes to hate; just a
    distribution. Want to prove me wrong? Then build a case for immutably
    hateful messaging.
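
    A minimal sketch of the distribution approach, with made-up ratings: keep every rater's judgment, turn the counts into class proportions, and train against that soft target instead of a majority-vote label.

```python
import numpy as np

# Hypothetical annotations: each comment was rated by several people
# on three classes: hateful, neutral, non-hateful.
ratings = {
    "comment_1": ["hateful", "neutral", "neutral", "non-hateful", "neutral"],
    "comment_2": ["hateful", "hateful", "neutral"],
}
classes = ["hateful", "neutral", "non-hateful"]

def empirical_distribution(labels):
    """Share of raters choosing each class, instead of a majority vote."""
    counts = np.array([labels.count(c) for c in classes], dtype=float)
    return counts / counts.sum()

targets = {cid: empirical_distribution(labs) for cid, labs in ratings.items()}
print(targets["comment_1"])  # e.g. [0.2, 0.6, 0.2] rather than a single "neutral" label

def soft_cross_entropy(pred_probs, target_dist, eps=1e-9):
    """Loss for training a model to predict the full rating distribution."""
    return -np.sum(target_dist * np.log(pred_probs + eps))
```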


  10. Immutable & ”grey zone” toxicity
    • Immutably toxic = perceived as toxic across all possible user
    groups
    • Grey zone toxic = can be toxic or not toxic depending on the
    context and person interpreting
    • How many toxic comments in current SOTA hate datasets
    would fall into each category? (We don’t know.)


  11. Immutable & ”grey zone” toxicity
    • Are there immutably toxic words or phrases? (immutably toxic =
    no way to use them except in a toxic way)
    • …if there are, at least these are very rare…
    • The implications:
    • Systematically use adversarial examples for training (very rare, never-
    seen cases) ➔ negations, exaggeration. SOLUTION: active learning
    instead of one-time, cross-sectional training (see the sketch below).
    • What are the blind spots of current classifiers (i.e., which comments fool
    them all)? SOLUTION: error analyses, creating artificial examples
    or finding real ones to fix the blind spots.
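
    A minimal uncertainty-sampling sketch of what that active learning loop could look like (the data, budget, and function name are assumptions): retrain on what is labeled so far, then route the comments the classifier is least sure about back to human annotators.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def active_learning_round(labeled_texts, labeled_y, unlabeled_texts, budget=20):
    """One round of uncertainty sampling: retrain, then pick the comments
    the current classifier is least sure about and send them to annotators."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(labeled_texts)
    clf = LogisticRegression(max_iter=1000).fit(X, labeled_y)

    probs = clf.predict_proba(vec.transform(unlabeled_texts))
    uncertainty = 1.0 - probs.max(axis=1)          # high = model unsure
    to_label = np.argsort(-uncertainty)[:budget]   # most uncertain comments
    return clf, [unlabeled_texts[i] for i in to_label]
```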


  12. Example, n=26, two raters rating
    ”immutably” toxic and ”immutably non-toxic”
    fake comments.
    • “Your total percentage agreement is only 53.8%.”
    • “Your agreement for the Diagnostic comments that were newly
    created (and that should ALL be considered as non-toxic) is
    69.2%.”
    • “Your agreement for the Hateful comments that were derived
    from the actual datasets is 38.5%.”
    “Take out the white trash can” was considered toxic by one of the
    raters. Even after discussion! (A sketch of these agreement measures follows.)
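
    The figures quoted above are simple percentage agreement; a short sketch (with made-up binary ratings) of computing it alongside Cohen's kappa, which additionally corrects for agreement expected by chance.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical toxic (1) / non-toxic (0) judgments from two raters on the same comments.
rater_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0]
rater_b = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0]

agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa = cohen_kappa_score(rater_a, rater_b)
print(f"percentage agreement: {agreement:.1%}, Cohen's kappa: {kappa:.2f}")
```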


  13. ISSUE 6: Out of context
    • All hate datasets I’ve seen are annotated without giving the
    annotators any context for the messages. Only the message is
    given, and the annotator is asked to label it ”toxic” or ”not toxic”.
    • Toxic in response to what? In what context?
    • Solution: I don’t know; maybe LSTMs / Transformers over the
    whole context (see the sketch below)? The issue is that subjectivity
    is a source of cross-sectional ”error” and context is a source of
    temporal (sequential) error --- error margins from both sides make
    this hard!
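
    One possible way to give a classifier the conversational context, sketched with the Hugging Face transformers library (the model name and the example comments are assumptions): encode the parent comment and the reply as a sentence pair so the reply is never judged in isolation.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # toxic / non-toxic

parent = "Our new cleaner starts on Monday."
reply = "Take out the white trash can."

# Encode the pair as [CLS] parent [SEP] reply [SEP], so the model
# sees the reply in the context it responds to.
inputs = tokenizer(parent, reply, return_tensors="pt", truncation=True)
logits = model(**inputs).logits
```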


  14. ISSUE 7: “AI” doesn’t exist.
    • No understanding of human experience
    • No concept of what hate is --- what culture is, what norms are, what
    society is [6].
    • ”AI” is the most misleading word of the century – what we do is
    statistical learning (counting numbers, mapping words and
    phrases with some other words and phrases that have meaning
    for US, but none for the algorithms)
    • Algorithms don’t create mental models (like people do when we are
    learning) – they create probabilistic mappings. Totally different from
    human intelligence.
    • In consequence, most ML classifiers are glorified word frequency
    counters that don’t generalize beyond their training sets, precisely
    because of the above reasons.


  15. This is ”AI”
    (”glorified
    techniques for
    counting words”
    –Erik Cambria,
    2019)


  16. ISSUE 8: Lack of user studies,
    interventions, and IMPLEMENTATION.
    • Intervention, almost totally missing. Why aren’t the models put
    into practice? Why aren’t they tested with real users?
    • What can actually be done to mitigate toxicity?
    • Should we focus on things like plain ol’ good manners? (netiquette)
    • What’s the role of systems in the first place?
    • What does human in the loop mean for hate detection?
    • Anger stems from root causes that may not be computational
    but social problems. Hence, computational techniques may
    have very limited means of solving them. Consequently,
    cross-disciplinary collaboration (sociologists, psychologists,
    historians) is required (but is far too rare).


  17. Cont’d
    • Ironically, human-computer interaction (HCI) has great opportunities
    to develop interaction techniques that would bring people closer,
    increase empathy, and improve the health of online communities.
    The field talks about this a lot.
    • …but, where are these studies? (It’s 2022, still waiting.)
    • Some ideas:
    • Recommending non-hateful ways of saying the same thing.
    • Cool-off periods for heated discussions (“think 10 seconds before posting”); see the
    sketch after this list.
    • Sensory control (introduction of pleasant sounds / pictures to calm down
    people).
    • Newsfeed design to REDUCE anxiety and polarization, not increase it.
    (Facebook, why aren’t you doing this?)
    • Directing toxic people to human helpers. These are PEOPLE with problems,
    not problems to hide away.
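
    As an illustration of the cool-off idea above, a minimal sketch (the threshold, score, and function name are all hypothetical) of holding back a reply flagged as likely toxic for a short reflection period:

```python
import time
from typing import Optional

# Hypothetical cool-off gate: if a draft reply is flagged as likely toxic,
# hold it for a short reflection period before it can be posted.
COOL_OFF_SECONDS = 10
TOXICITY_THRESHOLD = 0.8  # assumption, not a recommended value

def submit_reply(toxicity_score: float, flagged_at: Optional[float]) -> str:
    """Return 'posted' or 'cooling_off' for a draft reply."""
    if toxicity_score < TOXICITY_THRESHOLD:
        return "posted"
    if flagged_at is None or time.time() - flagged_at < COOL_OFF_SECONDS:
        return "cooling_off"  # ask the user to reconsider, then re-submit
    return "posted"           # the user waited and still wants to post
```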


  18. Some ways forward (for cross-disciplinary
    collab)
    • Propaganda (history 1920s == 2020s?)
    • Self-perpetuating media bias
    (platform agency, implicit agenda setting,
    clickbait incentives ➔ toxicity ==
    engagement == money)
    • Social media addiction, FOMO,
    impression management ➔ online hate
    • Neurological reactions to feed content
    types (political/negative) ➔ toxicity
    correlations, intervention effects… (eye-
    tracking, EEG; methodological expansion)
    ”Those who don’t know history
    are doomed to repeat it.”


  19. Some ways forward (for ML-based approaches)
    • User modeling (measure background variables, not just an
    anonymous “crowd” → garbage in, garbage out!)
    • Use a numerical scale, not binary classes (e.g., a 10-point scale
    going from low to high toxicity); see the sketch below.
    • Capture the context, present it to annotators, and make use of it
    in the modeling.
    • Sample more than three people per row when building training sets,
    and predict the empirical distribution instead of a single class label [4].
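
    A minimal sketch of the numerical-scale idea (the comments and scores are invented): treat toxicity as a graded target, here the mean of several raters on a 1-10 scale, and fit a regressor instead of a binary classifier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical comments scored on a 1-10 toxicity scale (mean of several raters),
# so the target captures graded severity rather than a binary class.
texts = ["have a great day", "this take is dumb", "you people are vermin"]
scores = [1.0, 4.5, 9.0]

model = make_pipeline(TfidfVectorizer(), Ridge())
model.fit(texts, scores)
print(model.predict(["what a dumb idea"]))  # continuous toxicity estimate
```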


  20. Some ways forward (for the field as a whole)
    • Forget detection/classification/scoring! (Red herrings)
    • Get to the ROOT of the problem – why are people angry?
    • Engage in cross-disciplinary collaboration with social scientists
    (online hate is not a CS / NLP problem; it’s a human problem)


  21. References
    • [1] Joni Salminen, Hind Almerekhi, Ahmed Mohamed Kamel, Soon-gyo Jung, and Bernard J. Jansen. 2019. Online Hate
    Ratings Vary by Extremes: A Statistical Analysis. In Proceedings of the 2019 Conference on Human Information
    Interaction and Retrieval (CHIIR ’19), ACM, Glasgow, Scotland, UK, 213–217. DOI:
    https://doi.org/10.1145/3295750.3298954
    • [2] Sara Sood, Judd Antin, and Elizabeth Churchill. 2012. Profanity Use in Online Communities. In Proceedings of the
    SIGCHI Conference on Human Factors in Computing Systems (CHI ’12), ACM, New York, NY, USA, 1481–1490.
    • [3] Arthur L. Bowley. 1901. Elements of Statistics. Journal of the Institute of Actuaries 2 (1901).
    • [4] Joni Salminen, Ahmed Mohamed Kamel, Soon-Gyo Jung, and Bernard Jansen. 2021. The Problem of Majority
    Voting in Crowdsourcing with Binary Classes. In Proceedings of the 19th European Conference on Computer-Supported
    Cooperative Work, European Society for Socially Embedded Technologies (EUSSET), Zurich, Switzerland. DOI:
    https://doi.org/10.18420/ecscw2021_n12
    • [5] Ellery Wulczyn, Nithum Thain, and Lucas Dixon. 2017. Ex Machina: Personal Attacks Seen at Scale. In Proceedings
    of the 26th International Conference on World Wide Web (WWW ’17), International World Wide Web Conferences
    Steering Committee, Republic and Canton of Geneva, Switzerland, 1391–1399. DOI:
    https://doi.org/10.1145/3038912.3052591
    • [6] Jonna Häkkilä, Mikael Wiberg, Nils Johan Eira, Tapio Seppänen, Ilkka Juuso, Maija Mäkikalli, and Katrin Wolf. 2020.
    Design Sensibilities - Designing for Cultural Sensitivity. In Proceedings of the 11th Nordic Conference on Human-Computer
    Interaction: Shaping Experiences, Shaping Society, 1–3.


  22. Our Online Hate Research
    1. Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate
    in Online News Media
    2. Neural Network Hate Deletion: Developing a Machine Learning Model to Eliminate Hate from Online Comments
    3. Online Hate Interpretation Varies by Country, But More by Individual: A Statistical Analysis Using Crowdsourced
    Ratings
    4. Detecting Toxicity Triggers in Online Discussions
    5. Online Hate Ratings Vary by Extremes: A Statistical Analysis
    6. Exploring the Relationship Between Game Content and Culture-based Toxicity: A Case Study of League of Legends
    and MENA Players
    7. Analyzing Hate Speech Toward Players from the MENA in League of Legends
    8. Mapping online hate: A scientometric analysis on research trends and hotspots in research on online hate
    9. Developing an online hate classifier for multiple social media platforms
    10. Topic-driven toxicity: Exploring the relationship between online toxicity and news topics
    11. Four Types of Toxic People: Characterizing Online Users' Toxicity over Time
    12. Are These Comments Triggering? Predicting Triggers of Toxicity in Online Discussions


  23. Thanks for listening!
    Dr. Joni Salminen
    [email protected]
    Be in touch if you’re
    interested in collaboration on
    online hate research!
