Slide 1

Slide 1 text

Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization Sarah Masud, Manjot Bedi, Mohammad Aflah Khan, Md Shad Akhtar, Tanmoy Chakraborty IIIT-Delhi, India; Northeastern University, USA

Slide 2

Slide 2 text

Disclaimer: The following content contains extreme language (verbatim from social media), which does not reflect my opinions or those of my collaborators. Reader's discretion is advised.

Slide 3

Slide 3 text

Hate Speech & Hate Intensity

Slide 4

Slide 4 text

Clouded by Malicious Online Content: Cyberbullying, Abuse, Offense, Aggression, Provocation, Toxicity, Spam, Fake News, Rumours, Hate Speech, Trolling, Fraud, Personal Attacks ● Anonymity has led to an increase in anti-social behaviour [1], hate speech being one form of it. ● These behaviours can be studied at both a macroscopic and a microscopic level. ● They exist across various mediums. [1]: Suler, John, CyberPsychology & Behavior, 2004

Slide 5

Slide 5 text

Hatred is an age-old problem: "I will surely kill thee" (the story of Cain and Abel). Fig 1: List of extremist/controversial subreddits [1]. Fig 2: YouTube video inciting violence and hate crime [2]. Figs 3, 4: Twitter hate speech [3]. Fig 5: Rwanda genocide, 1994 [5]. [1]: Wiki [2]: YouTube [3], [4]: Anti-Semitic schooling [5]: Radio and Rwanda

Slide 6

Slide 6 text

Definition of Hate Speech ● Hate is subjective, temporal, and cultural in nature. ● The UN defines hate speech as "any kind of communication that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are." [1] Fig 1: Pyramid of Hate [2] [1]: UN hate [2]: Pyramid of Hate

Slide 7

Slide 7 text

Hate Intensity ● The intensity/severity of hate speech captures its explicitness. ● High-intensity hate is more likely to contain offensive lexicon, offensive spans, direct attacks, and mentions of the target entity. "Consuming coffee is bad, I hate it!" (the world can live with this opinion) vs. "Let's bomb every coffee shop and kill all coffee makers" (this is a threat). Fig 1: Pyramid of Hate [1] [1]: Pyramid of Hate

Slide 8

Slide 8 text

Motivation & Evidence for Proactive Nudging

Slide 9

Slide 9 text

How to Combat Hate Speech Reactive countering: intervening after a hateful post has been made, to prevent it from spreading further. Proactive countering: intervening before the post goes public. [1]: Mudit et al. Fig 1: Strategies for countering hate speech [1]

Slide 10

Slide 10 text

Literature Overview: Intervention during Tweet Creation ● 200k users identified in the study; 50% randomly assigned to the control group. ● H1: Are prompted users less likely to post the current offensive content? ● H2: Are prompted users less likely to post offensive content in the future? [1]: Katsaros et al., ICWSM '22 Fig 1: User behaviour statistics as part of the intervention study [1] Fig 2: Twitter reply test for offensive replies [1]

Slide 11

Slide 11 text

Motivation & Evidence ● Reducing intensity is a stepping stone towards non-hate. ● It does not force users to change their sentiment or opinion. ● It empirically leads to lower virality. Fig 1: Difference in predicted number of comments per set per iteration [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

Slide 12

Slide 12 text

Data Set

Slide 13

Slide 13 text

NACL Dataset ● Hateful samples collected from existing hate speech datasets. ● Manually annotated for hate intensity and hateful spans. ● Hate intensity is marked on a scale of 1-10. ● Normalised counterparts and their intensities were generated manually (inter-annotator agreement κ = 0.88). Fig 1: Original and normalised intensity distributions [1] Fig 2: Dataset stats [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

Slide 14

Slide 14 text

NACL Dataset: Annotation Guidelines. Span Labelling [1] ● A sexist or racist slur, or an abusive term directly attacking a minority group/individual. ● A phrase that advocates violent action or hate crime against a group/individual. ● Negative stereotyping of a group/individual with unfounded claims or false criminal accusations. ● Hashtag(s) supporting one or more of the points mentioned earlier. Intensity Labelling [1] ● Score [8-10]: The sample promotes hate crime and calls for violence against the individual/group. ● Score [6-7]: The sample is mainly composed of sexist/racist terms or portrays a sense of gender/racial superiority on the part of the person sharing the sample. ● Score [4-5]: Mainly consists of offensive hashtags, or most hateful phrases are in the form of offensive hashtags. ● Score [1-3]: The sample uses dark humour or implicit hateful terms. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
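The intensity bands above could be sketched as a first-pass keyword heuristic. A minimal sketch, assuming hypothetical cue lists (`VIOLENCE_CUES` and `SLUR_CUES` are placeholder stand-ins; the real NACL labels come from human annotators, not rules):

```python
# Hypothetical cue lists standing in for the guideline's criteria.
VIOLENCE_CUES = {"kill", "bomb", "shoot", "exterminate"}
SLUR_CUES = {"slur1", "slur2"}  # placeholders for an actual slur lexicon


def heuristic_intensity(text: str) -> int:
    """Rough first-pass bucketing mirroring the guideline's intensity bands."""
    tokens = {t.lower().strip(".,!?") for t in text.split()}
    hashtags = [t for t in text.split() if t.startswith("#")]
    if tokens & VIOLENCE_CUES:
        return 8   # band 8-10: promotes hate crime / calls for violence
    if tokens & SLUR_CUES:
        return 6   # band 6-7: sexist/racist terms, claims of superiority
    if hashtags:
        return 4   # band 4-5: hate carried mainly by offensive hashtags
    return 1       # band 1-3: dark humour or implicit hateful terms
```

In practice such a heuristic would only bootstrap annotation; the dataset's scores are manually assigned.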

Slide 15

Slide 15 text

Problem Definition

Slide 16

Slide 16 text

Problem Statement For a given hate sample t, our objective is to obtain its normalized (sensitised) form t′ such that the intensity of hatred is reduced while the meaning is still preserved [1]: φ(t′) < φ(t). Fig 1: Example of an original high-intensity sentence vs. its normalised form [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
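The objective φ(t′) < φ(t) can be read as a guard clause: accept a rewrite only when it lowers the intensity score. A minimal sketch, where `phi` and `rewrite` are stand-ins for an intensity scorer and a normalization model (both hypothetical here):

```python
def normalize_if_reduced(t, phi, rewrite):
    """Return the normalized form t' only when phi(t') < phi(t);
    otherwise keep the original sample unchanged."""
    t_prime = rewrite(t)
    return t_prime if phi(t_prime) < phi(t) else t


# Toy stand-ins for illustration only:
toy_phi = lambda s: s.count("kill")          # intensity ~ violent-term count
toy_rewrite = lambda s: s.replace("kill", "dislike")
```

Usage: `normalize_if_reduced("kill them", toy_phi, toy_rewrite)` yields the softened rewrite, while a sample the rewriter cannot improve is returned as-is.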

Slide 17

Slide 17 text

Proposed Method: NACL (Neural hAte speeCh normaLizer) Fig 1: Flowchart of NACL [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

Slide 18

Slide 18 text

Proposed Model

Slide 19

Slide 19 text

Proposed Method: NACL (Neural hAte speeCh normaLizer) [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Architecture overview of NACL [1]

Slide 20

Slide 20 text

Hate Intensity Prediction (HIP) ● Takes the raw tweet as input and returns its hate intensity. ● BERT as the base model. ● Hierarchical BiLSTM with self-attention. ● Linear activation. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: NACL-HIP module workflow [1]
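The attention-pooling and linear-regression steps of HIP can be illustrated in isolation. A minimal pure-Python sketch, assuming token states `H` already produced by the encoder (the BERT + hierarchical BiLSTM stack is omitted, and the weight vectors are hypothetical):

```python
import math


def self_attention_pool(H, w):
    """Attention-weighted pooling over token states H (list of d-dim vectors)
    using a learned query vector w; a simplification of HIP's self-attention."""
    scores = [sum(h_j * w_j for h_j, w_j in zip(h, w)) for h in H]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]          # softmax attention weights
    d = len(H[0])
    return [sum(alpha[t] * H[t][j] for t in range(len(H))) for j in range(d)]


def predict_intensity(H, w_att, w_out, b_out):
    """Linear (identity-activation) regression head on the pooled state,
    matching the slide's 'linear activation' output."""
    pooled = self_attention_pool(H, w_att)
    return sum(p * w for p, w in zip(pooled, w_out)) + b_out
```

With a zero query vector the attention weights are uniform, so the pooled state is the mean of the token states.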

Slide 21

Slide 21 text

Hate Intensity Prediction (HIP) [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Baseline comparison for NACL-HIP module. [1]

Slide 22

Slide 22 text

Hateful Span Identification (HSI) ● A sample can have multiple non-overlapping hateful spans. ● Uses BIO notation for each token in a sentence. ● BiLSTM + CRF. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Overview of the NACL-HSI module [1]
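The BIO scheme mentioned above decodes straightforwardly into token spans. A minimal sketch of the decoding step (the BiLSTM + CRF tagger that predicts the tags is omitted):

```python
def bio_to_spans(tags):
    """Decode per-token BIO tags into (start, end) spans, end exclusive.
    'B' begins a span, 'I' continues it, 'O' closes any open span."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:        # previous span ends here
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
                start = None
        # 'I' simply extends the currently open span
    if start is not None:                # flush a span open at end of sentence
        spans.append((start, len(tags)))
    return spans
```

For example, `["O", "B", "I", "O", "B"]` decodes to two non-overlapping spans, `(1, 3)` and `(4, 5)`.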

Slide 23

Slide 23 text

Hateful Span Identification (HSI) [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Baseline comparison for NACL-HSI module. [1]

Slide 24

Slide 24 text

Hate Intensity Reduction ● The overall loss combines the generation loss with an intensity-based reward. Fig 1: Hate Normalization Framework [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
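One plausible reading of the slide's "overall loss" and "reward" terms is a weighted mix of a generation loss and an intensity-drop reward. A hedged sketch; the weighting `lam` and the exact reward definition are assumptions here, not the paper's actual objective:

```python
def hir_loss(gen_loss, phi_orig, phi_norm, lam=0.5):
    """Combine a generation loss with an intensity-drop reward.
    reward = phi_orig - phi_norm is positive when intensity falls;
    subtracting it lowers the overall loss for good rewrites.
    (lam and this exact form are illustrative assumptions.)"""
    reward = phi_orig - phi_norm
    return lam * gen_loss - (1 - lam) * reward
```

A rewrite that cuts intensity from 8 to 3 under the same generation loss yields a lower overall loss than one that leaves intensity unchanged.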

Slide 25

Slide 25 text

Hate Intensity Reduction (HIR) [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Baseline comparison for NACL-HIR module. [1]

Slide 26

Slide 26 text

More Evaluations

Slide 27

Slide 27 text

Extrinsic Evaluation ● Average difference in the confidence scores (hate probabilities) of pre-trained hate speech classifiers between the normalised and hateful samples. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Analysing the reduction in hate confidence for normalised samples vs. their unmodified counterparts [1]
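The metric above is just a paired mean difference. A minimal sketch, assuming two aligned lists of classifier hate probabilities (the classifiers themselves are external pre-trained models):

```python
def mean_confidence_drop(p_hate_orig, p_hate_norm):
    """Average drop in a pre-trained classifier's hate probability between
    each original sample and its normalised counterpart; positive values
    mean normalization reduced the classifier's hate confidence."""
    assert len(p_hate_orig) == len(p_hate_norm), "lists must be paired"
    diffs = [o - n for o, n in zip(p_hate_orig, p_hate_norm)]
    return sum(diffs) / len(diffs)
```

For example, originals scored [0.9, 0.8] whose normalised forms score [0.4, 0.6] give an average drop of 0.35.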

Slide 28

Slide 28 text

Human Evaluation ● Employed 20 diverse users to assess the quality of the generated texts. ● Also performed a less restrictive human evaluation, comparing outputs for texts entered by the users themselves and normalised by our system vs. a baseline. Fig 1: Human evaluation of generated normalised samples on the metrics of intensity reduction, fluency/coherence, and adequacy [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

Slide 29

Slide 29 text

Cross-Platform Evaluation ● Pick 100 random samples each from Reddit, GAB, and Facebook hate speech datasets. ● Pass them through our model and the baseline models. ● Have 2 experts annotate the outputs for intensity, fluency, and adequacy on a Likert-type scale. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: Human evaluation on 3 other platforms [1]

Slide 30

Slide 30 text

Deployed Tool: Detects Hateful spans and suggests changes as you type. [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 Fig 1: The normalization module kicking into action upon detection of mild or extreme hate [1]

Slide 31

Slide 31 text

Error Analysis [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 ● Sometimes the normalised sentence is still highly hateful; this can stem from errors in both intensity prediction and span identification, or from a single module alone. ● Cascading failure! Fig 1: Errors in NACL-HSR (rephrasing module) [1]

Slide 32

Slide 32 text

Links ● Paper: https://arxiv.org/abs/2206.04007 ● Code: https://github.com/LCS2-IIITD/Hate_Norm ● Queries: Sarah [email protected] ● Follow LCS2 lab on Twitter: lcs2iiitd [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

Slide 33

Slide 33 text

Thank You Q & A