Challenges of Online Hate Detection Systems (and Research)

Dr. Joni Salminen
The International Conference on Behavioral and Social Computing (BESC 2021)
October 31, 2021, Doha, Qatar

Transcript

  1. Online Hate Detection Systems:
    Challenges and Action Points for
    Developers, Data Scientists, and
    Researchers
    Dr. Joni Salminen1, Maria Jose Linarez2, Soon-gyo
    Jung1, Bernard J. Jansen1
    1Qatar Computing Research Institute, Doha, Qatar
    2Universidad Centroccidental Lisandro Alvarado, Barquisimeto,
    Venezuela
    The International Conference on Behavioral
    and Social Computing (BESC 2021)


  2. Towards resolving online hate
    “Automated online hate detection has garnered interest from various
    stakeholders to make online platforms safer. Despite this interest, there
    remain a plethora of unresolved issues that hinder advancement. We review
    fourteen state-of-the-art articles discussing these challenges, and present a
    meta-synthesis. Six themes are identified: (1) Dataset selection, (2)
    Detection of False Positives and Negatives, (3) Semantic Context of Hate
    Messages, (4) Privacy and Anonymity, (5) Ethical Considerations, and (6)
    Minimizing Bias. For each theme, we provide a set of action points to
    support researchers, data scientists, and developers to improve hate
    detection systems.”


  3. The Most Important Challenges in
    Online Hate Research…


  4. ISSUE 1: Most studies focus on
    detection, not what happens after that
    • Most work focuses on algorithmic detection (prediction /
    classification) of online hate, a task that may be impossible at the
    current maturity of the technology (no general AI) and given the
    subjective nature of online hate.
    • Secondly, even if detection is successful, then what? Most work
    just states the purpose is to ”help moderate,” but in what way?
    Automatic censorship is out of the question.
    • Hence, the question: are computational social scientists
    focused on a ”nice-to-have” aspect of a real problem, while
    missing the chance to actually have a positive impact?


  5. ISSUE 2: Most studies focus on detection, not
    what happens before that (prevention)
    • Why do we have hate?
    • What makes people angry?
    • What are the social conditions and root causes for hate?
    • How can we understand hateful people?
    • The current research vilifies ”haters” and conducts no proper
    anthropological or scientific inquiry to actually understand the
    root causes of the problem. Again, we work on the ”toy” aspect
    of the issue (because it is easy).
    • SOLUTION: Start understanding people, don’t hide behind
    algorithms (algorithms will not solve this problem).


  6. ISSUE 3: We like technical problems, we
    don’t like talking to people!
    • Computational social scientists see hate as a technical problem, solvable by algorithms
    and systems.
    • We use measures like accuracy, precision, recall, F1 (macro/micro), AUC... ➔ these
    measures distance us from the user experience and from what actually drives hate in the
    real world (see the sketch below).
    ...is hate a technical problem? Sociological problem? Psychological problem?
    • I argue it’s a socio-technical problem. In other words, online hate is primarily a social
    problem that is manifested on technological platforms and that can be mitigated/amplified
    by the mechanisms of those platforms (btw, very little proof of this mitigation/amplification
    exists apart from common sense --- yet, we’re quick to blame algorithms for all the bad
    things).
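
    The evaluation measures named above are easy to compute; a minimal sketch with scikit-learn (all labels and scores below are made up) shows how compact these numbers are compared to the user experience they summarize.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Hypothetical outputs of a binary hate / not-hate classifier.
y_true = [1, 0, 0, 1, 1, 0, 1, 0]                    # human labels (majority vote)
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]                    # hard predictions
y_score = [0.9, 0.2, 0.6, 0.8, 0.4, 0.1, 0.7, 0.3]   # predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1 macro :", f1_score(y_true, y_pred, average="macro"))
print("F1 micro :", f1_score(y_true, y_pred, average="micro"))
print("AUC      :", roc_auc_score(y_true, y_score))
```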


  7. ISSUE 4: Hate is very difficult to identify
    in universal terms.
    • Hate is highly subjective. At least the following factors can cause
    variation in hate interpretation [1]:
    • Demographics (age, gender, country)
    • Culture (both in the traditional sense and the community sense)
    • Sociological standing (does the working-class guy think differently than an ”elitist”
    university researcher? It seems so.)
    • Political affiliation (what’s offensive to Democrats is funny to Republicans)
    • Sense of humor (some people don’t take it seriously, others take offense at
    every turn)
    • Mood (at the specific moment of annotation, maybe I had a bad day –
    everything seems toxic due to projecting)
    • What about sarcasm and irony? A well-known problem for over ten years [2],
    but nobody has solved it thus far!


  8. Cont’d
    • If hate is subjective, how can we detect it universally? Should we
    even try? (The problem of averages [3].)
    ➔ ALTERNATIVE: User modeling / user features
    • Virtually non-existent in papers due to the tradition of using (anonymous)
    crowdsourcing. Should we start knowing things about the people
    who label our data?
    • Most papers know nothing about the raters (age, gender, socio-economic
    status, beliefs, etc.) ➔ we incorporate hidden biases into
    training sets without even knowing it ➔ the models will be a mess ---
    mess = a random mixture of beliefs from random people.
    • SOLUTION: collect socio-demographic information from the
    annotators and incorporate it into hate detection modeling (see the
    sketch below). What is hate for Joni is not the same as what is hate for Karen.
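
    A minimal sketch of the idea, assuming each training row is a (comment, annotator) pair and that hypothetical annotator age and gender columns are available: the rater's background enters the model alongside the text features.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training rows: one row per (comment, annotator) pair,
# so the same comment can carry different labels from different raters.
df = pd.DataFrame({
    "text": ["take out the trash", "you are an idiot", "nice game today"],
    "annotator_age": ["18-29", "30-44", "45-59"],
    "annotator_gender": ["F", "M", "F"],
    "label": [0, 1, 0],
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "text"),
    ("annotator", OneHotEncoder(handle_unknown="ignore"),
     ["annotator_age", "annotator_gender"]),
])

model = Pipeline([("features", features), ("clf", LogisticRegression())])
model.fit(df[["text", "annotator_age", "annotator_gender"]], df["label"])
```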


  9. ISSUE 5: Wrong labeling paradigm
    • An example of doing it wrong:
    • With two classes and three raters, we can always build a ”ground
    truth” for a sample! [4]
    • …but, in reality, we are hiding disagreements (=the true distribution
    of hate interpretation)
    • …even two raters can disagree on ”obvious” cases!
    • Solution: drop the majority-vote paradigm for annotation. Either
    model the empirical distribution [5] [hateful = 0.2, neutral = 0.5, non-
    hateful = 0.3] or build socio-demographic-specific models (see the
    sketch below). There is no ground truth when it comes to hate; just a
    distribution. Want to prove me wrong? Then build a case for immutably
    hateful messaging.
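
    A minimal sketch of the distribution approach, with made-up ratings: keep every rater's judgment, turn the counts into class proportions, and train against that soft target instead of a majority-vote label.

```python
import numpy as np

# Hypothetical annotations: each comment was rated by several people
# on three classes: hateful, neutral, non-hateful.
ratings = {
    "comment_1": ["hateful", "neutral", "neutral", "non-hateful", "neutral"],
    "comment_2": ["hateful", "hateful", "neutral"],
}
classes = ["hateful", "neutral", "non-hateful"]

def empirical_distribution(labels):
    """Share of raters choosing each class, instead of a majority vote."""
    counts = np.array([labels.count(c) for c in classes], dtype=float)
    return counts / counts.sum()

targets = {cid: empirical_distribution(labs) for cid, labs in ratings.items()}
print(targets["comment_1"])  # e.g. [0.2, 0.6, 0.2] rather than a single "neutral" label

def soft_cross_entropy(pred_probs, target_dist, eps=1e-9):
    """Loss for training a model to predict the full rating distribution."""
    return -np.sum(target_dist * np.log(pred_probs + eps))
```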


  10. Immutable & ”grey zone” toxicity
    • Immutably toxic = perceived as toxic across all possible user
    groups
    • Grey zone toxic = can be toxic or not toxic depending on the
    context and person interpreting
    • How many toxic comments in current SOTA hate datasets
    would fall into each category? (We don’t know.)


  11. Immutable & ”grey zone” toxicity
    • Are there immutably toxic words or phrases? (immutably toxic =
    no way to use them except in a toxic way)
    • …if there are, at least these are very rare…
    • The implications:
    • Systematically use adversarial examples for training (very rare, never-
    seen cases) ➔ negations, exaggeration. SOLUTION: active learning
    instead of one-time, cross-sectional training (see the sketch below).
    • What are the blind spots of current classifiers (i.e., which comments fool
    them all)? SOLUTION: error analyses, creating artificial examples
    or finding real ones to fix the blind spots.
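
    A minimal uncertainty-sampling sketch of what that active learning loop could look like (the data, budget, and function name are assumptions): retrain on what is labeled so far, then route the comments the classifier is least sure about back to human annotators.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def active_learning_round(labeled_texts, labeled_y, unlabeled_texts, budget=20):
    """One round of uncertainty sampling: retrain, then pick the comments
    the current classifier is least sure about and send them to annotators."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(labeled_texts)
    clf = LogisticRegression(max_iter=1000).fit(X, labeled_y)

    probs = clf.predict_proba(vec.transform(unlabeled_texts))
    uncertainty = 1.0 - probs.max(axis=1)          # high = model unsure
    to_label = np.argsort(-uncertainty)[:budget]   # most uncertain comments
    return clf, [unlabeled_texts[i] for i in to_label]
```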


  12. Example, n=26, two raters rating
    ”immutably” toxic and ”immutably non-toxic”
    fake comments.
    • “Your total percentage agreement is only 53.8%.”
    • “Your agreement for the Diagnostic comments that were newly
    created (and that should ALL be considered as non-toxic) is
    69.2%.”
    • “Your agreement for the Hateful comments that were derived
    from the actual datasets is 38.5%.”
    “Take out the white trash can” was considered toxic by one of the
    raters. Even after discussion! (A sketch of these agreement measures follows.)
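
    The figures quoted above are simple percentage agreement; a short sketch (with made-up binary ratings) of computing it alongside Cohen's kappa, which additionally corrects for agreement expected by chance.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical toxic (1) / non-toxic (0) judgments from two raters on the same comments.
rater_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0]
rater_b = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0]

agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa = cohen_kappa_score(rater_a, rater_b)
print(f"percentage agreement: {agreement:.1%}, Cohen's kappa: {kappa:.2f}")
```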


  13. ISSUE 6: Out of context
    • All hate datasets I’ve seen are annotated without giving the
    annotators any context for the messages. Only the message is
    given, and the annotator is asked to label it ”toxic” or ”not toxic”.
    • Toxic in response to what? In what context?
    • Solution: I don’t know; maybe LSTMs / Transformers over the
    whole context (see the sketch below)? The issue is that subjectivity
    is a source of cross-sectional ”error” and context is a source of
    temporal (sequential) error --- error margins from both sides make
    this hard!
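
    One possible way to give a classifier the conversational context, sketched with the Hugging Face transformers library (the model name and the example comments are assumptions): encode the parent comment and the reply as a sentence pair so the reply is never judged in isolation.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # toxic / non-toxic

parent = "Our new cleaner starts on Monday."
reply = "Take out the white trash can."

# Encode the pair as [CLS] parent [SEP] reply [SEP], so the model
# sees the reply in the context it responds to.
inputs = tokenizer(parent, reply, return_tensors="pt", truncation=True)
logits = model(**inputs).logits
```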


  14. ISSUE 7: “AI” doesn’t exist.
    • No understanding of human experience
    • No concept of what hate is --- what culture is, what norms are, what
    society is [6].
    • ”AI” is the most misleading word of the century – what we do is
    statistical learning (counting numbers, mapping words and
    phrases with some other words and phrases that have meaning
    for US, but none for the algorithms)
    • Algorithms don’t create mental models (like people do when we are
    learning) – they create probabilistic mappings. Totally different from
    human intelligence.
    • In consequence, most ML classifiers are glorified word frequency
    counters that don’t generalize beyond their training sets, precisely
    because of the above reasons.


  15. This is ”AI”
    (”glorified
    techniques for
    counting words”
    –Erik Cambria,
    2019)


  16. ISSUE 8: Lack of user studies,
    interventions, and IMPLEMENTATION.
    • Intervention, almost totally missing. Why aren’t the models put
    into practice? Why aren’t they tested with real users?
    • What can actually be done to mitigate toxicity?
    • Should we focus on things like plain ol’ good manners? (netiquette)
    • What’s the role of systems in the first place?
    • What does human in the loop mean for hate detection?
    • Anger stems from root causes that may not be computational
    but social problems. Hence, computational techniques may
    have very limited means of solving them. Consequently,
    cross-disciplinary collaboration (sociologists, psychologists,
    historians) is required (but is far too rare).


  17. Cont’d
    • Ironically, human-computer interaction (HCI) has great opportunities
    to develop interaction techniques that would bring people closer,
    increase empathy, and improve the health of online communities.
    The field talks about this a lot.
    • …but, where are these studies? (It’s 2022, still waiting.)
    • Some ideas:
    • Recommending non-hateful ways of saying the same thing.
    • Cool-off periods for heated discussions (“think 10 seconds before posting”); see the
    sketch after this list.
    • Sensory control (introduction of pleasant sounds / pictures to calm down
    people).
    • Newsfeed design to REDUCE anxiety and polarization, not increase it.
    (Facebook, why aren’t you doing this?)
    • Directing toxic people to human helpers. These are PEOPLE with problems,
    not problems to hide away.
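
    As an illustration of the cool-off idea above, a minimal sketch (the threshold, score, and function name are all hypothetical) of holding back a reply flagged as likely toxic for a short reflection period:

```python
import time
from typing import Optional

# Hypothetical cool-off gate: if a draft reply is flagged as likely toxic,
# hold it for a short reflection period before it can be posted.
COOL_OFF_SECONDS = 10
TOXICITY_THRESHOLD = 0.8  # assumption, not a recommended value

def submit_reply(toxicity_score: float, flagged_at: Optional[float]) -> str:
    """Return 'posted' or 'cooling_off' for a draft reply."""
    if toxicity_score < TOXICITY_THRESHOLD:
        return "posted"
    if flagged_at is None or time.time() - flagged_at < COOL_OFF_SECONDS:
        return "cooling_off"  # ask the user to reconsider, then re-submit
    return "posted"           # the user waited and still wants to post
```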


  18. Some ways forward (for cross-disciplinary
    collab)
    • Propaganda (history 1920s == 2020s?)
    • Self-perpetuating media bias
    (platform agency, implicit agenda setting,
    clickbait incentives ➔ toxicity ==
    engagement == money)
    • Social media addiction, FOMO,
    impression management ➔ online hate
    • Neurological reactions to feed content
    types (political/negative) ➔ toxicity
    correlations, intervention effects… (eye-
    tracking, EEG; methodological expansion)
    ”Those who don’t know history
    are doomed to repeat it.”


  19. Some ways forward (for ML-based approaches)
    • User modeling (measure background variables, not just an
    anonymous “crowd” → garbage in, garbage out!)
    • Use a numerical scale, not binary classes (e.g., a 10-point scale
    going from low to high toxicity); see the sketch below.
    • Capture the context, present it to annotators, and make use of it
    in the modeling.
    • Sample more than three people per row when building training sets,
    and predict the empirical distribution instead of a single class label [4].
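
    A minimal sketch of the numerical-scale idea (the comments and scores are invented): treat toxicity as a graded target, here the mean of several raters on a 1-10 scale, and fit a regressor instead of a binary classifier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical comments scored on a 1-10 toxicity scale (mean of several raters),
# so the target captures graded severity rather than a binary class.
texts = ["have a great day", "this take is dumb", "you people are vermin"]
scores = [1.0, 4.5, 9.0]

model = make_pipeline(TfidfVectorizer(), Ridge())
model.fit(texts, scores)
print(model.predict(["what a dumb idea"]))  # continuous toxicity estimate
```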


  20. Some ways forward (for the field as a whole)
    • Forget detection/classification/scoring! (Red herrings)
    • Get to the ROOT of the problem – why are people angry?
    • Engage in cross-disciplinary collaboration with social scientists
    (online hate is not a CS / NLP problem; it’s a human problem)


  21. References
    • [1] Joni Salminen, Hind Almerekhi, Ahmed Mohamed Kamel, Soon-gyo Jung, and Bernard J. Jansen. 2019. Online Hate
    Ratings Vary by Extremes: A Statistical Analysis. In Proceedings of the 2019 Conference on Human Information
    Interaction and Retrieval (CHIIR ’19), ACM, Glasgow, Scotland, UK, 213–217. DOI:
    https://doi.org/10.1145/3295750.3298954
    • [2] Sara Sood, Judd Antin, and Elizabeth Churchill. 2012. Profanity Use in Online Communities. In Proceedings of the
    SIGCHI Conference on Human Factors in Computing Systems (CHI ’12), ACM, New York, NY, USA, 1481–1490.
    • [3] Arthur L. Bowley. 1901. Elements of Statistics. Journal of the Institute of Actuaries 2 (1901).
    • [4] Joni Salminen, Ahmed Mohamed Kamel, Soon-Gyo Jung, and Bernard Jansen. 2021. The Problem of Majority
    Voting in Crowdsourcing with Binary Classes. In Proceedings of the 19th European Conference on Computer-Supported
    Cooperative Work, European Society for Socially Embedded Technologies (EUSSET), Zurich, Switzerland. DOI:
    https://doi.org/10.18420/ecscw2021_n12
    • [5] Ellery Wulczyn, Nithum Thain, and Lucas Dixon. 2017. Ex Machina: Personal Attacks Seen at Scale. In Proceedings
    of the 26th International Conference on World Wide Web (WWW ’17), International World Wide Web Conferences
    Steering Committee, Republic and Canton of Geneva, Switzerland, 1391–1399. DOI:
    https://doi.org/10.1145/3038912.3052591
    • [6] Jonna Häkkilä, Mikael Wiberg, Nils Johan Eira, Tapio Seppänen, Ilkka Juuso, Maija Mäkikalli, and Katrin Wolf. 2020.
    Design Sensibilities - Designing for Cultural Sensitivity. In Proceedings of the 11th Nordic Conference on Human-Computer
    Interaction: Shaping Experiences, Shaping Society, 1–3.


  22. Our Online Hate Research
    1. Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate
    in Online News Media
    2. Neural Network Hate Deletion: Developing a Machine Learning Model to Eliminate Hate from Online Comments
    3. Online Hate Interpretation Varies by Country, But More by Individual: A Statistical Analysis Using Crowdsourced
    Ratings
    4. Detecting Toxicity Triggers in Online Discussions
    5. Online Hate Ratings Vary by Extremes: A Statistical Analysis
    6. Exploring the Relationship Between Game Content and Culture-based Toxicity: A Case Study of League of Legends
    and MENA Players
    7. Analyzing Hate Speech Toward Players from the MENA in League of Legends
    8. Mapping online hate: A scientometric analysis on research trends and hotspots in research on online hate
    9. Developing an online hate classifier for multiple social media platforms
    10. Topic-driven toxicity: Exploring the relationship between online toxicity and news topics
    11. Four Types of Toxic People: Characterizing Online Users' Toxicity over Time
    12. Are These Comments Triggering? Predicting Triggers of Toxicity in Online Discussions


  23. Thanks for listening!
    Dr. Joni Salminen
    [email protected]
    Be in touch if you’re
    interested in collaboration on
    online hate research!
