Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, accepted as a full paper in the ADS track at KDD 2022.
https://arxiv.org/abs/2206.04007
Provocation · Toxicity · Spam · Fake News · Rumours · Hate Speech · Trolling · Fraud · Personal Attacks
• Anonymity has led to an increase in anti-social behaviour [1], hate speech being one of them.
• These behaviours can be studied at both macroscopic and microscopic levels.
• They exist across various mediums.
[1]: Suler, John, CyberPsychology & Behavior, 2004
Fig 1: List of extremist/controversial subreddits [1]
Fig 2: YouTube video inciting violence and hate crime [2]
Fig 3, 4: Twitter hate speech [3], [4]: Anti-Semitism in schools
Fig 5: Radio and the Rwanda Genocide, 1994 [5]
“I will surely kill thee”: the story of Cain and Abel
• Hate speech is contextual and cultural in nature.
• The UN defines hate speech as “any kind of communication that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are.” [1]
Fig 1: Pyramid of Hate [2]
[1]: UN definition of hate speech; [2]: Pyramid of Hate
• Intensity varies across instances of hate speech.
• High-intensity hate is more likely to contain offensive lexicon, offensive spans, direct attacks, and mentions of the target entity.
• “Consuming coffee is bad, I hate it!” (the world can live with this opinion) vs. “Let's bomb every coffee shop and kill all coffee makers” (this is a threat).
Fig 1: Pyramid of Hate [1]
[1]: Pyramid of Hate
• Reactive countering: the post has already been made, and we intervene to prevent it from spreading further.
• Proactive countering: intervene before the post goes public.
Fig 1: Strategies for countering hate speech [1]
[1]: Mudit et al.
• 50% of the users in the study were randomly assigned to the control group.
• H1: Are prompted users less likely to post the current offensive content?
• H2: Are prompted users less likely to post offensive content in the future?
Fig 1: User behaviour statistics from the intervention study [1]
Fig 2: Twitter prompt shown before posting an offensive reply [1]
[1]: Katsaros et al., ICWSM '22
• Normalization nudges the post towards non-hate.
• It does not force the user to change sentiment or opinion.
• Evidence suggests normalised posts attract less virality.
Fig 1: Difference in predicted number of comments per set per iteration [1]
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
• Samples curated from existing hate speech datasets.
• Manually annotated for hate intensity and hateful spans.
• Hate intensity is marked on a scale of 1-10.
• A normalised counterpart and its intensity are generated manually (inter-annotator agreement κ = 0.88).
Fig 1: Original and normalised intensity distributions [1]
Fig 2: Dataset statistics [1]
[1]: Masud et al., KDD 2022
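The reported agreement (κ = 0.88) appears to be Cohen's kappa. A minimal sketch of how such agreement can be computed for two annotators; the label arrays below are hypothetical, not from the actual dataset:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two annotators labelled independently.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[l] * count_b[l] for l in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical 1-10 intensity labels from two annotators:
a = [8, 3, 5, 9, 2, 6, 7, 4]
b = [8, 3, 5, 9, 2, 6, 6, 4]
print(round(cohens_kappa(a, b), 3))   # → 0.857
```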
Span labelling [1]: a span is marked hateful if it contains:
• A derogatory or abusive term directly attacking a minority group/individual.
• A phrase that advocates violent action or a hate crime against a group/individual.
• Negative stereotyping of a group/individual with unfounded claims or false criminal accusations.
• Hashtag(s) supporting one or more of the points above.
Intensity labelling [1]:
• Score 8-10: The sample promotes hate crime and calls for violence against the individual/group.
• Score 6-7: The sample is mainly composed of sexist/racist terms or portrays a sense of gender/racial superiority on the part of the person sharing the sample.
• Score 4-5: Mainly consists of offensive hashtags, or most hateful phrases are in the form of offensive hashtags.
• Score 1-3: The sample uses dark humour or implicitly hateful terms.
[1]: Masud et al., KDD 2022
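The intensity rubric can be read as a simple score-to-band lookup. A sketch of that mapping; the short band descriptions are my paraphrases, not the paper's wording:

```python
def intensity_bucket(score: int) -> str:
    """Map a 1-10 hate-intensity score to its rubric band."""
    if not 1 <= score <= 10:
        raise ValueError("intensity must be in [1, 10]")
    if score >= 8:
        return "promotes hate crime / calls for violence"
    if score >= 6:
        return "sexist/racist terms or claims of superiority"
    if score >= 4:
        return "offensive hashtags dominate"
    return "dark humour or implicit hateful terms"

print(intensity_bucket(9))   # → promotes hate crime / calls for violence
print(intensity_bucket(2))   # → dark humour or implicit hateful terms
```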
• Given a hateful post t, the aim is to obtain its normalized (sensitised) form t′ such that the hate intensity φ is reduced while the meaning is still conveyed [1]: φ(t′) < φ(t)
Fig 1: Example of an original high-intensity sentence vs. its normalised form [1]
[1]: Masud et al., KDD 2022
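The constraint φ(t′) < φ(t) can be read as an acceptance test on the rewrite: keep the normalised version only if a scorer rates it strictly lower than the original. A sketch under that reading; the lexicon-count scorer is a toy stand-in, not the paper's intensity model:

```python
def accept_normalisation(phi, original: str, rewrite: str) -> str:
    """Return the rewrite only if it strictly lowers hate intensity
    under scorer `phi`; otherwise keep the original post."""
    return rewrite if phi(rewrite) < phi(original) else original

# Toy scorer: counts words from a tiny hypothetical offensive lexicon.
LEXICON = {"bomb", "kill"}
phi = lambda text: sum(w in LEXICON for w in text.lower().split())

out = accept_normalisation(phi, "bomb every shop", "boycott every shop")
print(out)   # → boycott every shop
```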
• Takes a tweet as input and returns its hate intensity.
• BERT as the base model.
• Hierarchical BiLSTM with self-attention.
• Linear activation at the output.
Fig 1: NACL-HIP module workflow [1]
[1]: Masud et al., KDD 2022
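A NumPy sketch of the final stage of such an intensity predictor: self-attention pooling over token states followed by a linear output. This is not the authors' code; the BERT encoder and hierarchical BiLSTM are replaced here by a random `token_states` matrix for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def intensity_head(token_states, w_att, w_out, b_out):
    """Self-attention pooling over per-token hidden states,
    then a linear layer producing a scalar intensity score."""
    scores = token_states @ w_att          # (T,) unnormalised weights
    alpha = softmax(scores)                # attention distribution
    pooled = alpha @ token_states          # (d,) weighted sum of states
    return float(pooled @ w_out + b_out)   # scalar intensity

T, d = 12, 16                              # tokens, hidden size
states = rng.standard_normal((T, d))       # stand-in for encoder output
w_att = rng.standard_normal(d)
w_out = rng.standard_normal(d)
pred = intensity_head(states, w_att, w_out, 0.0)
print(isinstance(pred, float))             # → True
```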
• Identifies non-overlapping hate spans.
• Uses the BIO notation for each token in a sentence.
• BiLSTM + CRF.
Fig 1: Overview of NACL-HSI module [1]
[1]: Masud et al., KDD 2022
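Decoding BIO tags back into token-level spans is the inverse of this labelling step. A minimal sketch (the example tokens and tags are hypothetical):

```python
def bio_to_spans(tokens, tags):
    """Convert per-token BIO tags into (start, end) hate spans,
    with `end` exclusive. Spans never overlap by construction."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":                     # a new span begins here
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":                   # close any open span
            if start is not None:
                spans.append((start, i))
                start = None
        # "I" simply extends the currently open span
    if start is not None:
        spans.append((start, len(tags)))
    return spans

tokens = ["lets", "bomb", "every", "coffee", "shop"]
tags   = ["O", "B", "I", "I", "O"]
print(bio_to_spans(tokens, tags))   # → [(1, 4)]
```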
• Evaluated using off-the-shelf hate speech classifiers (hate probabilities) on the normalised and hateful samples.
Fig 1: Reduction in hate-classifier confidence for normalised samples vs. their unmodified counterparts [1]
[1]: Masud et al., KDD 2022
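One way to summarise this evaluation is the average drop in a classifier's hate confidence over paired samples. A sketch; the probability values below are hypothetical, not the paper's results:

```python
def mean_reduction(p_original, p_normalised):
    """Average drop in a hate classifier's confidence after
    normalisation, over paired (original, normalised) samples."""
    assert len(p_original) == len(p_normalised)
    drops = [o - n for o, n in zip(p_original, p_normalised)]
    return sum(drops) / len(drops)

# Hypothetical classifier confidences before and after normalisation:
orig = [0.92, 0.85, 0.78, 0.95]
norm = [0.41, 0.35, 0.50, 0.60]
print(round(mean_reduction(orig, norm), 2))   # → 0.41
```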
• Automatic metrics capture only the quantitative aspects of the generated texts.
• We therefore also performed a less restrictive human evaluation, comparing outputs for texts input by the users themselves, normalised by our system and by a baseline.
Fig 1: Human evaluation of generated normalised samples on intensity reduction, fluency/coherence, and adequacy [1]
[1]: Masud et al., KDD 2022
• Sample posts from 3 other platforms' hate speech datasets, including Gab and Facebook.
• Pass them through our model and the baseline models.
• Have 2 experts annotate the outputs for intensity, fluency, and adequacy on a Likert-type scale.
Fig 1: Human evaluation on 3 other platforms [1]
[1]: Masud et al., KDD 2022
• The normalization module is triggered depending on the detected hate type.
Fig 1: The normalization module kicking into action upon detection of mild or extreme hate [1]
[1]: Masud et al., KDD 2022
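The routing above can be sketched as a threshold check: low-intensity posts pass through unchanged, while posts crossing the threshold are sent through normalisation. The `phi` and `normalise` functions here are toy stand-ins for the intensity-prediction and span-rewriting modules, and the threshold value is my assumption:

```python
def moderate(post, phi, normalise, threshold=3):
    """Route a post: publish low-intensity posts as-is, run the
    normalisation pipeline when intensity exceeds `threshold`."""
    score = phi(post)
    if score <= threshold:
        return post, "published as-is"
    return normalise(post), "normalised before publishing"

# Toy stand-ins for the real modules:
phi = lambda p: 9 if "kill" in p else 1
normalise = lambda p: p.replace("kill", "criticise")

post, action = moderate("kill all coffee makers", phi, normalise)
print(post, "|", action)   # → criticise all coffee makers | normalised before publishing
```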
• Sometimes the normalised sentence is still highly hateful: this can stem from errors in both intensity prediction and span identification, or from only one of the individual modules.
• Cascading failure!
Fig 1: Errors in NACL-HSR (rephrasing module) [1]
[1]: Masud et al., KDD 2022
• Contact: sarahm@iiitd.ac.in
• Follow the LCS2 lab on Twitter: lcs2iiitd