Proactively Reducing the Hate Intensity of
Online Posts via Hate Speech
Normalization
Sarah Masud, Manjot Bedi, Mohammad Aflah Khan, Md Shad Akhtar, Tanmoy Chakraborty
IIIT-Delhi, India; Northeastern University, USA
Slide 2
Disclaimer:
Subsequent content has extreme language
(verbatim from social media), which does not
reflect the opinions of myself or my collaborators.
Reader’s discretion is advised.
Slide 3
Hate Speech
&
Hate Intensity
Slide 4
Clouded by Malicious Online Content
● Anonymity has led to an increase in
anti-social behaviour [1], hate speech
being one of them.
● These behaviours can be studied at both
a macroscopic and a microscopic level.
● They exist across various mediums.
Fig: Types of malicious online content: cyberbullying, abuse, offense, aggression,
provocation, toxicity, spam, fake news, rumours, hate speech, trolling, fraud, personal attacks
[1]: Suler, John, CyberPsychology & Behavior, 2004
Slide 5
Hatred is an age-old problem
“I will surely kill thee”
Story of Cain and Abel
Fig 1: List of extremist/controversial subreddits [1]
Fig 2: YouTube video inciting violence and hate crime [2]
Fig 3, 4: Twitter hate speech [3], [4]
Fig 5: Rwanda genocide, 1994 [5]
[1]: Wiki
[2]: Youtube
[3], [4]: Anti-Semitic schooling
[5]: Radio and Rwanda, Image
Slide 6
Definition of Hate Speech
● Hate is subjective, temporal and cultural in
nature.
● UN defines hate speech as “any kind of
communication that attacks or uses
pejorative or discriminatory language
with reference to a person or a group on
the basis of who they are.” [1]
Fig 1: Pyramid of Hate [2]
[1]: UN hate
[2]: Pyramid of Hate
Slide 7
Hate Intensity
● Intensity/severity captures the explicitness
of hate speech.
● High-intensity hate is more likely to contain
offensive lexicon, offensive spans, direct
attacks, and mentions of the target entity.
“Consuming coffee is bad, I hate it!” (the world
can live with this opinion)
“Let’s bomb every coffee shop and kill all
coffee makers.” (this is a threat)
Fig 1: Pyramid of Hate [1]
[1]: Pyramid of Hate
Slide 8
Motivation & Evidence for
Proactive Nudging
Slide 9
How to Combat Hate Speech
Reactive countering
Intervening after a hateful post has been made,
to prevent it from spreading further.
Proactive countering
Intervening before the post goes public.
[1]: Mudit et al.
Fig 1: Strategies for countering hate speech [1]
Slide 10
Literature Overview: Intervention
during Tweet creation
● 200k users identified in the study; 50% randomly assigned to the
control group.
● H1: Are prompted users less likely to post the current offensive
content?
● H2: Are prompted users less likely to post offensive content in the future?
[1]: Katsaros et al., ICWSM ‘22
Fig 1: User behaviour statistics as a part of intervention study [1]
Fig 2: Twitter reply test for offensive replies. [1]
Slide 11
Motivation & Evidence
● Reducing intensity is a stepping stone towards non-hate.
● Does not force users to change their sentiment or opinion.
● Empirically leads to lower virality.
Fig 1: Difference in predicted
number of comments per set per
iteration. [1]
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Slide 12
Data Set
Slide 13
NACL Dataset
● Hateful samples collected from existing hate speech datasets.
● Manually annotated for hate intensity and hateful spans.
● Hate intensity is marked on a scale of 1-10.
● Manual generation of the normalised counterpart and its intensity (κ = 0.88).
Fig 1: Original and Normalised Intensity Distribution [1]
Fig 2: Dataset Stats [1]
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Slide 14
NACL Dataset: Annotation Guideline
● A sexist or racist slur term, or an abusive
term directly attacking a minority
group/individual.
● A phrase that advocates violent action or
hate crime against a group/individual.
● Negatively stereotyping a group/individual
with unfounded claims or false criminal
accusations.
● Hashtag(s) supporting one or more of the
points as mentioned earlier.
Span Labelling [1]
● Score[8 − 10]: The sample promotes hate
crime and calls for violence against the
individual/group.
● Score[6 − 7]: The sample is mainly composed
of sexist/racist terms or portrays a sense of
gender/racial superiority on the part of the
person sharing the sample.
● Score[4 − 5]: Mainly consists of offensive
hashtags, or most hateful phrases are in the
form of offensive hashtags.
● Score [1 − 3]: The sample uses dark humour or
implicit hateful terms.
Intensity Labelling [1]
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
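The four intensity bands above amount to a simple bucketing rule over the 1-10 scale. A minimal Python sketch (the function name and the short band labels are illustrative, not from the paper):

```python
def intensity_band(score: int) -> str:
    """Map a 1-10 hate-intensity score to its annotation band,
    following the four score ranges in the NACL guideline."""
    if not 1 <= score <= 10:
        raise ValueError("intensity score must be in 1..10")
    if score >= 8:
        return "promotes hate crime / calls for violence"
    if score >= 6:
        return "sexist/racist terms or superiority claims"
    if score >= 4:
        return "offensive hashtags dominate"
    return "dark humour / implicit hate"
```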
Slide 15
Problem Definition
Slide 16
Problem Statement
For a given hate sample 𝑡, our objective is to obtain its normalized (sensitised) form 𝑡′ such
that the intensity of hatred 𝜙 is reduced while the meaning is still conveyed. [1]
𝜙(𝑡′) < 𝜙(𝑡)
Fig 1: Example of an original high-intensity sentence vs its normalised form [1]
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Slide 17
Proposed Method: NACL- Neural hAte speeCh normaLizer
Fig 1: Flowchart of
NACL [1]
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Slide 18
Proposed Model
Slide 19
Proposed Method: NACL- Neural hAte speeCh normaLizer
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Fig 1. Architecture overview of NACL [1]
Slide 20
Hate Intensity Prediction (HIP)
● Takes the raw tweet as input and
returns its hate intensity.
● BERT as the base model.
● Hierarchical BiLSTM with
self-attention.
● Linear activation.
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Fig 1. NACL-HIP module workflow [1]
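The pooling-and-regression idea behind HIP can be sketched in pure Python. This is a toy additive self-attention pooling over token vectors followed by a linear intensity head; it illustrates the attention and linear-activation steps only, and is not the actual NACL implementation (which encodes tokens with BERT and a hierarchical BiLSTM first):

```python
import math

def attention_pool(token_vecs, w):
    """Toy additive self-attention pooling: score each token embedding
    against a context vector w, softmax the scores, and return the
    attention-weighted sum of the token vectors."""
    scores = [sum(ti * wi for ti, wi in zip(t, w)) for t in token_vecs]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]        # attention weights, sum to 1
    dim = len(token_vecs[0])
    return [sum(a * t[d] for a, t in zip(alphas, token_vecs))
            for d in range(dim)]

def intensity_head(pooled, weights, bias):
    """Linear activation on the pooled representation, producing a
    real-valued hate-intensity score (a regression head)."""
    return sum(p * w for p, w in zip(pooled, weights)) + bias
```

With a zero context vector the attention weights are uniform, so pooling reduces to a plain mean of the token vectors.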
Slide 21
Hate Intensity Prediction (HIP)
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Fig 1:
Baseline
comparison
for
NACL-HIP
module. [1]
Slide 22
Hateful Span
Identification (HSI)
● A sample can have multiple,
non-overlapping hate spans.
● Use the BIO notation for each
token in a sentence.
● BiLSTM + CRF
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Fig 1: Overview of NACL-HSI module. [1]
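The BIO labelling step above can be made concrete with a small helper that turns non-overlapping hateful spans into per-token tags (the helper and the tag name HATE are illustrative; the NACL model predicts these tags with a BiLSTM + CRF rather than computing them from gold spans):

```python
def spans_to_bio(tokens, hate_spans):
    """Convert non-overlapping hateful spans, given as (start, end)
    token indices with end exclusive, into per-token BIO tags:
    B- marks the first token of a span, I- a continuation, O the rest."""
    tags = ["O"] * len(tokens)
    for start, end in hate_spans:
        tags[start] = "B-HATE"
        for i in range(start + 1, end):
            tags[i] = "I-HATE"
    return tags
```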
Slide 23
Hateful Span Identification (HSI)
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Fig 1: Baseline
comparison for
NACL-HSI module. [1]
Slide 24
Hate Intensity
Reduction
Overall Loss
Reward
Fig 1: Hate Normalization Framework [1]
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Slide 25
Hate Intensity Reduction (HIR)
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Fig 1: Baseline
comparison for
NACL-HIR module. [1]
Slide 26
More Evaluations
Slide 27
Extrinsic Evaluation
Average difference in the confidence scores
(hate probabilities) of pre-trained hate speech
classifiers between the hateful samples and
their normalised counterparts.
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Fig 1: Analysing the reduction in hate confidence for normalised samples vs their unmodified counterparts [1]
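The extrinsic metric is just a mean of paired probability differences. A minimal sketch, assuming paired lists of hate probabilities from any pre-trained classifier (the function name is illustrative):

```python
def avg_confidence_drop(p_hateful, p_normalised):
    """Mean drop in a hate classifier's hate probability from each
    original sample to its normalised counterpart; a larger positive
    value indicates a stronger intensity reduction."""
    if len(p_hateful) != len(p_normalised):
        raise ValueError("paired lists must have equal length")
    return sum(h - n for h, n in zip(p_hateful, p_normalised)) / len(p_hateful)
```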
Slide 28
Human Evaluation
● Employed 20 diverse users to rate the quality of the generated texts.
● Also performed a less restrictive human evaluation, comparing outputs for texts entered by the
users themselves and normalised by our system and a baseline.
Fig 1: Human evaluation of generated normalised samples on the metrics of intensity reduction, fluency/coherence and adequacy [1].
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Slide 29
Cross Platform Evaluation
● Pick 100 random samples each from Reddit, GAB and Facebook hate speech datasets.
● Pass them through our model and baseline models.
● Have 2 experts annotate the output for intensity, fluency and adequacy on a Likert-type
scale.
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Fig 1: Human Evaluation on 3 other platforms [1]
Slide 30
Deployed Tool: Detects Hateful spans and suggests changes as you type.
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
Fig 1: The normalization module kicking into action upon detection of mild or extreme hate [1]
Slide 31
Error Analysis
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
● Sometimes the normalised sentence is still highly hateful; this can
stem from errors in both intensity prediction and span identification,
and sometimes from a single module alone.
● Cascading failure!
Fig 1: Errors in NACL-HIR (rephrasing module) [1]
Slide 32
Links
● Paper: https://arxiv.org/abs/2206.04007
● Code: https://github.com/LCS2-IIITD/Hate_Norm
● Queries: Sarah [email protected]
● Follow LCS2 lab on Twitter: lcs2iiitd