
Proactive_Normalization_KDD2022

_themessier
August 22, 2022


Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, accepted in ADS track, full paper at KDD 2022.
https://arxiv.org/abs/2206.04007



Transcript

  1. Proactively Reducing the Hate Intensity of Online Posts via Hate

    Speech Normalization Sarah Masud, Manjot Bedi, Mohammad Aflah Khan, Md Shad Akhtar, Tanmoy Chakraborty IIIT-Delhi, India; Northeastern University, USA
  2. Disclaimer: Subsequent content has extreme language (verbatim from social media),

    which does not reflect the opinions of myself or my collaborators. Reader’s discretion is advised.
  3. Clouded by Malicious Online Content: cyberbullying, abuse, offense, aggression,

    provocation, toxicity, spam, fake news, rumours, hate speech, trolling, fraud, personal attacks. • Anonymity has led to an increase in anti-social behaviour [1], hate speech being one of them. • These behaviours can be studied at a macroscopic as well as a microscopic level. • They exist across various media. [1]: Suler, John, CyberPsychology & Behavior, 2004
  4. Hatred is an age-old problem. "I will surely kill thee": the story of

    Cain and Abel. Fig 1: List of extremist/controversial subreddits [1] Fig 2: YouTube video inciting violence and hate crime [2] Figs 3, 4: Twitter hate speech [3] Fig 5: Rwanda genocide, 1994 [5] [1]: Wiki [2]: YouTube [3], [4]: Anti-Semitic schooling [5]: Radio and Rwanda
  5. Definition of Hate Speech • Hate is subjective, temporal, and

    cultural in nature. • The UN defines hate speech as “any kind of communication that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are.” [1] [1]: UN hate [2]: Pyramid of Hate Fig 1: Pyramid of Hate [2]
  6. Hate Intensity • The intensity/severity of hate speech captures the explicitness

    of the hate. • High-intensity hate is more likely to contain offensive lexicon, offensive spans, direct attacks, and mentions of the target entity. “Consuming coffee is bad, I hate it!” (the world can live with this opinion) vs. “Let's bomb every coffee shop and kill all coffee makers” (this is a threat). Fig 1: Pyramid of Hate [1] [1]: Pyramid of Hate
  7. How to Combat Hate Speech • Reactive countering: intervening after a hateful

    post has been made, to prevent it from spreading further. • Proactive countering: intervening before the post goes public. [1]: Mudit et al. Fig 1: Strategies for countering hate speech [1]
  8. Literature Overview: Intervention during Tweet Creation • 200k users identified

    in the study; 50% randomly assigned to the control group. • H1: Are prompted users less likely to post the current offensive content? • H2: Are prompted users less likely to post offensive content in the future? [1]: Katsaros et al., ICWSM ‘22 Fig 1: User behaviour statistics as part of the intervention study [1] Fig 2: Twitter reply test for offensive replies [1]
  9. Motivation & Evidence • Reducing intensity is a stepping stone

    towards non-hate. • Does not force users to change their sentiment or opinion. • Evidence suggests it leads to lower virality. Fig 1: Difference in the predicted number of comments per set per iteration [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  10. NACL Dataset • Hateful samples collected from existing hate speech

    datasets. • Manually annotated for hate intensity and hateful spans. • Hate intensity is marked on a scale of 1-10. • Manual generation of each sample's normalised counterpart and its intensity (κ = 0.88). Fig 1: Original and Normalised Intensity Distribution [1] Fig 2: Dataset Stats [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  11. NACL Dataset: Annotation Guideline • Span labelling [1]: a sexist or

    racist slur term, or an abusive term directly attacking a minority group/individual; a phrase that advocates violent action or hate crime against a group/individual; negatively stereotyping a group/individual with unfounded claims or false criminal accusations; hashtag(s) supporting one or more of the points mentioned earlier. • Intensity labelling [1]: Score [8-10]: the sample promotes hate crime and calls for violence against the individual/group. Score [6-7]: the sample is mainly composed of sexist/racist terms or portrays a sense of gender/racial superiority on the part of the person sharing the sample. Score [4-5]: mainly consists of offensive hashtags, or most hateful phrases are in the form of offensive hashtags. Score [1-3]: the sample uses dark humour or implicitly hateful terms. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
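    The four intensity bands above amount to a small lookup over the 1-10 scale. A minimal sketch in Python; `intensity_band` and its shorthand band names are illustrative, not the paper's wording:

    ```python
    def intensity_band(score: int) -> str:
        """Map a 1-10 hate-intensity annotation to its guideline band.

        Band names are shorthand summaries of the annotation guideline,
        not labels used in the dataset itself.
        """
        if not 1 <= score <= 10:
            raise ValueError("intensity is annotated on a 1-10 scale")
        if score >= 8:
            return "promotes hate crime / calls for violence"
        if score >= 6:
            return "sexist/racist terms or claimed superiority"
        if score >= 4:
            return "offensive hashtags dominate"
        return "dark humour or implicit hate"
    ```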
  12. Problem Statement For a given hate sample t, our objective

    is to obtain its normalized (sensitised) form t′ such that the intensity of hatred φ(t) is reduced while the meaning is still conveyed [1]: φ(t′) < φ(t). Fig 1: Example of an original high-intensity sentence vs. its normalised counterpart [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  13. Proposed Method: NACL- Neural hAte speeCh normaLizer Fig 1: Flowchart

    of NACL [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  14. Proposed Method: NACL- Neural hAte speeCh normaLizer [1]: Masud et

    al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1. Architecture overview of NACL [1]
  15. Hate Intensity Prediction (HIP) • Takes the raw tweet as

    input and returns the hate intensity of the tweet. • BERT as the base model. • Hierarchical BiLSTM with self-attention. • Linear activation. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1. NACL-HIP module workflow [1]
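    The HIP data flow (contextual token states → self-attention pooling → linear head) can be sketched in plain Python. This is a minimal sketch only: the BERT + BiLSTM encoder is stubbed out as pre-computed hidden states, and all weights are illustrative, not the trained model's:

    ```python
    import math

    def self_attention_pool(hidden, query):
        """Score each token state against a query vector, softmax the scores,
        and return the attention-weighted sum of the token states."""
        scores = [sum(q * h for q, h in zip(query, h_i)) for h_i in hidden]
        m = max(scores)                      # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(hidden[0])
        return [sum(w * h_i[d] for w, h_i in zip(weights, hidden))
                for d in range(dim)]

    def predict_intensity(hidden, query, w, b):
        """Linear head over the pooled representation -> scalar hate intensity."""
        pooled = self_attention_pool(hidden, query)
        return sum(wi * p for wi, p in zip(w, pooled)) + b

    # Stubbed "encoder outputs" for a 3-token tweet with 2-dim states.
    hidden = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    intensity = predict_intensity(hidden, query=[1.0, 0.0], w=[4.0, 4.0], b=1.0)
    ```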
  16. Hate Intensity Prediction (HIP) [1]: Masud et al., Proactively Reducing

    the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Baseline comparison for NACL-HIP module. [1]
  17. Hateful Span Identification (HSI) • A sample can have multiple

    non-overlapping hate spans. • Use BIO notation for each token in a sentence. • BiLSTM + CRF. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Overview of NACL-HSI module. [1]
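    BIO token labelling, as used for HSI-style span annotation, can be illustrated with a short sketch; `spans_to_bio` and the `HATE` tag name are hypothetical, assuming spans are given as inclusive-exclusive token index pairs:

    ```python
    def spans_to_bio(tokens, hate_spans):
        """Convert non-overlapping hateful token spans into BIO tags:
        B-HATE opens a span, I-HATE continues it, O is outside any span."""
        tags = ["O"] * len(tokens)
        for start, end in hate_spans:  # end index is exclusive
            tags[start] = "B-HATE"
            for i in range(start + 1, end):
                tags[i] = "I-HATE"
        return tags

    # A benign placeholder sentence with one annotated span over tokens 1-3.
    tokens = ["you", "people", "ruin", "everything", "here"]
    print(spans_to_bio(tokens, [(1, 4)]))
    # → ['O', 'B-HATE', 'I-HATE', 'I-HATE', 'O']
    ```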
  18. Hateful Span Identification (HSI) [1]: Masud et al., Proactively Reducing

    the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Baseline comparison for NACL-HSI module. [1]
  19. Hate Intensity Reduction (HIR): overall loss and reward. Fig 1: Hate Normalization

    Framework [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
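    The loss and reward labels in the framework figure can be read as: reward the generator when the predicted intensity drops, and fold that reward into the training objective. A minimal sketch under those assumptions; the difference-shaped reward and the weighting factor `alpha` are illustrative, not the paper's exact formulation:

    ```python
    def intensity_reward(phi_original: float, phi_normalized: float) -> float:
        """Positive reward when the normalised text has lower predicted
        hate intensity, i.e. phi(t') < phi(t)."""
        return phi_original - phi_normalized

    def overall_loss(gen_loss: float, reward: float, alpha: float = 0.5) -> float:
        """Blend the generation loss with the negative reward; alpha is a
        hypothetical weighting hyperparameter."""
        return gen_loss - alpha * reward
    ```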
  20. Hate Intensity Reduction (HIR) [1]: Masud et al., Proactively Reducing

    the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Baseline comparison for NACL-HIR module. [1]
  21. Extrinsic Evaluation Average difference in the confidence scores of pre-trained hate

    speech classifiers (hate probabilities) for the normalised vs. original hateful samples. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Analysing the reduction in hate confidence for normalised samples vs. their unmodified counterparts
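    The extrinsic metric described above is a mean of per-sample drops in a classifier's hate probability. A minimal sketch; `mean_confidence_drop` is an illustrative name, not from the paper:

    ```python
    def mean_confidence_drop(p_hate_original, p_hate_normalized):
        """Average (original - normalised) hate probability over paired samples,
        as scored by a pre-trained hate speech classifier.
        A larger positive value means the normalisation reduced hate confidence more."""
        if len(p_hate_original) != len(p_hate_normalized):
            raise ValueError("expected paired original/normalised scores")
        diffs = [o - n for o, n in zip(p_hate_original, p_hate_normalized)]
        return sum(diffs) / len(diffs)

    # e.g. two samples whose hate probability fell from 0.9→0.4 and 0.8→0.6
    drop = mean_confidence_drop([0.9, 0.8], [0.4, 0.6])
    ```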
  22. Human Evaluation • Employed 20 diverse users to rate the

    quality of the generated texts. • Also performed a less restrictive human evaluation comparing outputs for texts input by the users themselves, normalised by our system and by a baseline. Fig 1: Human evaluation of generated normalised samples on the metrics of intensity reduction, fluency/coherence, and adequacy [1]. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  23. Cross-Platform Evaluation • Pick 100 random samples each from the Reddit,

    Gab, and Facebook hate speech datasets. • Pass them through our model and the baseline models. • Have 2 experts annotate the outputs for intensity, fluency, and adequacy on a Likert-type scale. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Human evaluation on 3 other platforms [1]
  24. Deployed Tool: Detects Hateful spans and suggests changes as you

    type. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: The normalization module kicking into action upon detection of mild or extreme hate [1]
  25. Error Analysis [1]: Masud et al., Proactively Reducing the Hate

    Intensity of Online Posts via Hate Speech Normalization, KDD 2022 • Sometimes the normalised sentence is still highly hateful; this can be an error of both intensity prediction and span identification, or of one module alone. • Cascading failures! Fig 1: Errors in NACL-HSR (rephrasing module) [1]
  26. Links • Paper: https://arxiv.org/abs/2206.04007 • Code: https://github.com/LCS2-IIITD/Hate_Norm • Queries: Sarah

    [email protected] • Follow LCS2 lab on Twitter: lcs2iiitd [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022