Proactive_Normalization_KDD2022

_themessier
August 22, 2022

Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, accepted in ADS track, full paper at KDD 2022.
https://arxiv.org/abs/2206.04007

Transcript

  1. Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization
    Sarah Masud, Manjot Bedi, Mohammad Aflah Khan, Md Shad Akhtar, Tanmoy Chakraborty
    IIIT-Delhi, India; Northeastern University, USA
  2. Disclaimer: Subsequent content has extreme language (verbatim from social media), which does not reflect the opinions of myself or my collaborators. Reader discretion is advised.
  3. Clouded by Malicious Online Content
    Cyber Bullying, Abuse, Offense, Aggression, Provocation, Toxicity, Spam, Fake News, Rumours, Hate Speech, Trolling, Fraud, Personal Attacks
    • Anonymity has led to an increase in anti-social behaviour [1], hate speech being one form of it.
    • Such behaviour can be studied at a macroscopic as well as a microscopic level.
    • It exists in various mediums.
    [1]: Suler, John, CyberPsychology & Behavior, 2004
  4. Hatred is an age-old problem
    “I will surely kill thee” (story of Cain and Abel)
    Fig 1: List of Extremist/Controversial Subreddits [1]
    Fig 2: YouTube Video Incitement to Violence and Hate Crime [2]
    Fig 3, 4: Twitter Hate Speech [3]
    Fig 5: Rwanda Genocide, 1994 [5]
    [1]: Wiki [2]: YouTube [3], [4]: Anti-Semitic Schooling [5]: Radio and Rwanda, Image
  5. Definition of Hate Speech
    • Hate is subjective, temporal and cultural in nature.
    • UN defines hate speech as “any kind of communication that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are.” [1]
    Fig 1: Pyramid of Hate [2]
    [1]: UN hate [3]: Pyramid of Hate
  6. Hate Intensity
    • Intensity/severity of hate speech captures the explicitness of hate speech.
    • High-intensity hate is more likely to contain offensive lexicon, offensive spans, direct attacks, and mentions of the target entity.
    “Consuming coffee is bad, I hate it!” (the world can live with this opinion)
    “Let's bomb every coffee shop and kill all coffee makers” (this is a threat)
    Fig 1: Pyramid of Hate [1]
    [1]: Pyramid of Hate
  7. How to Combat Hate Speech
    • Reactive countering: intervene after a hateful post has been made, to prevent it from spreading further.
    • Proactive countering: intervene before the post goes public.
    Fig 1: Strategies for countering hate speech [1]
    [1]: Mudit et al.
  8. Literature Overview: Intervention during Tweet creation
    • 200k users identified in the study. 50% randomly assigned to the control group.
    • H1: Are prompted users less likely to post the current offensive content?
    • H2: Are prompted users less likely to post such content in the future?
    Fig 1: User behaviour statistics as a part of intervention study [1]
    Fig 2: Twitter reply test for offensive replies [1]
    [1]: Katsaros et al., ICWSM '22
  9. Motivation & Evidence
    • Reducing intensity is a stepping stone towards non-hate.
    • Does not force the user to change sentiment or opinion.
    • Evidently leads to less virality.
    Fig 1: Difference in predicted number of comments per set per iteration. [1]
    [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  10. NACL Dataset
    • Hateful samples collected from existing hate speech datasets.
    • Manually annotated for hate intensity and hateful spans.
    • Hate intensity is marked on a scale of 1-10.
    • Manual generation of normalised counterpart and its intensity. (k = 0.88)
    Fig 1: Original and Normalised Intensity Distribution [1]
    Fig 2: Dataset Stats [1]
    [1]: Masud et al., KDD 2022
  11. NACL Dataset: Annotation Guideline
    Span Labelling [1]
    • A sexist or racist slur term, or an abusive term directly attacking a minority group/individual.
    • A phrase that advocates violent action or hate crime against a group/individual.
    • Negatively stereotyping a group/individual with unfounded claims or false criminal accusations.
    • Hashtag(s) supporting one or more of the points mentioned above.
    Intensity Labelling [1]
    • Score [8-10]: The sample promotes hate crime and calls for violence against the individual/group.
    • Score [6-7]: The sample is mainly composed of sexist/racist terms or portrays a sense of gender/racial superiority on the part of the person sharing the sample.
    • Score [4-5]: Mainly consists of offensive hashtags, or most hateful phrases are in the form of offensive hashtags.
    • Score [1-3]: The sample uses dark humor or an implicit hateful term.
    [1]: Masud et al., KDD 2022
  12. Problem Statement
    For a given hate sample $t$, our objective is to obtain its normalized (sensitised) form $t'$ such that the intensity of hatred $\phi(t)$ is reduced while the meaning is still conveyed: $\phi(t') < \phi(t)$. [1]
    Fig 1: Example of original high intensity vs normalised sentence [1]
    [1]: Masud et al., KDD 2022
  13. Proposed Method: NACL (Neural hAte speeCh normaLizer)
    Fig 1: Flowchart of NACL [1]
    [1]: Masud et al., KDD 2022
  14. Proposed Method: NACL (Neural hAte speeCh normaLizer)
    Fig 1: Architecture overview of NACL [1]
    [1]: Masud et al., KDD 2022
  15. Hate Intensity Prediction (HIP)
    • Takes the raw tweet as input and returns the hate intensity of the tweet.
    • BERT as base model.
    • Hierarchical BiLSTM with self attention.
    • Linear activation.
    Fig 1: NACL-HIP module workflow [1]
    [1]: Masud et al., KDD 2022
  16. Hate Intensity Prediction (HIP)
    Fig 1: Baseline comparison for NACL-HIP module. [1]
    [1]: Masud et al., KDD 2022
  17. Hateful Span Identification (HSI)
    • A sample can have multiple, non-overlapping hate spans.
    • Use the BIO notation for each token in a sentence.
    • BiLSTM + CRF
    Fig 1: Overview of NACL-HSI module. [1]
    [1]: Masud et al., KDD 2022
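BIO notation tags each token as B (begins a span), I (inside a span) or O (outside any span). The sketch below shows how multiple non-overlapping spans can be encoded and decoded this way. It is illustrative only: the function names, the bare `B`/`I`/`O` tag strings, and the end-exclusive token-index convention are assumptions, not taken from the NACL code.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive (assumed convention)


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping token spans as one B/I/O tag per token."""
    tags = ["O"] * num_tokens
    for start, end in sorted(spans):
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"span {(start, end)} is out of range")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span {(start, end)} overlaps another span")
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode B/I/O tags back into (start, end) spans.

    An "I" tag with no open span is ignored.
    """
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:      # a new span closes the previous one
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:              # span running to the end of the sentence
        spans.append((start, len(tags)))
    return spans


# Placeholder tokens w0..w5 with two flagged spans: tokens 1-2 and token 4.
example_tags = spans_to_bio(6, [(1, 3), (4, 5)])
```

Because a new B always closes the previous span, two spans that sit directly next to each other remain distinguishable, which a plain inside/outside tagging could not express.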
  18. Hateful Span Identification (HSI)
    Fig 1: Baseline comparison for NACL-HSI module. [1]
    [1]: Masud et al., KDD 2022
  19. Hate Intensity Reduction
    Labels: Overall Loss, Reward
    Fig 1: Hate Normalization Framework [1]
    [1]: Masud et al., KDD 2022
  20. Hate Intensity Reduction (HIR)
    Fig 1: Baseline comparison for NACL-HIR module. [1]
    [1]: Masud et al., KDD 2022
  21. Extrinsic Evaluation
    Average difference in confidence score of pre-trained hate speech classifiers (hate probabilities) for the normalised and hateful samples.
    Fig 1: Analysing the reduction in hate confidence for normalised samples vs their non-modified counterparts
    [1]: Masud et al., KDD 2022
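The extrinsic metric can be pictured as a paired average: for each sample, subtract the classifier's hate probability on the normalised text from its probability on the original text, then average over samples. This is a minimal sketch under that reading. The function name `mean_confidence_drop` is hypothetical, and the probabilities in the example are made-up illustrative numbers, not results from the paper.

```python
from statistics import mean
from typing import Sequence


def mean_confidence_drop(
    original_probs: Sequence[float],
    normalised_probs: Sequence[float],
) -> float:
    """Average per-sample drop in a classifier's hate probability.

    original_probs[i] and normalised_probs[i] must refer to the same sample
    before and after normalisation. A positive result means the classifier is,
    on average, less confident that the normalised texts are hateful.
    """
    if len(original_probs) != len(normalised_probs):
        raise ValueError("both score lists must cover the same samples")
    if not original_probs:
        raise ValueError("at least one sample is required")
    return mean(o - n for o, n in zip(original_probs, normalised_probs))


# Made-up probabilities for two samples, for illustration only.
drop = mean_confidence_drop([0.9, 0.8], [0.5, 0.6])
```

Pairing the scores per sample, rather than comparing two separate averages, keeps the comparison tied to each original text and its own normalised counterpart.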
  22. Human Evaluation
    • Employ 20 diverse users to measure the quality of the generated texts.
    • Also performed a less restrictive human evaluation, comparing outputs for texts input by the users themselves and normalised by our system and a baseline.
    Fig 1: Human evaluation of generated normalised samples on the metrics of intensity reduction, fluency/coherence and adequacy [1].
    [1]: Masud et al., KDD 2022
  23. Cross Platform Evaluation
    • Pick 100 random samples from each of the Reddit, Gab and Facebook hate speech datasets.
    • Pass them through our model and baseline models.
    • Have 2 experts annotate the output for Intensity, Fluency and Adequacy values on a Likert-type scale.
    Fig 1: Human Evaluation on 3 other platforms [1]
    [1]: Masud et al., KDD 2022
  24. Deployed Tool: Detects hateful spans and suggests changes as you type.
    Fig 1: The normalization module kicking into action upon detection of mild or extreme hate [1]
    [1]: Masud et al., KDD 2022
  25. Error Analysis
    • Sometimes the normalised sentence is still highly hateful. This is sometimes an error of both intensity prediction and span identification, and sometimes of only one of the individual models.
    • Cascading failure!
    Fig 1: Errors in NACL-HSR (rephrasing module) [1]
    [1]: Masud et al., KDD 2022
  26. Links
    • Paper: https://arxiv.org/abs/2206.04007
    • Code: https://github.com/LCS2-IIITD/Hate_Norm
    • Queries: Sarah (sarahm@iiitd.ac.in)
    • Follow LCS2 lab on Twitter: lcs2iiitd