Application_of_NLP_in_hate_speech.pdf

Applications of NLP: Hate Speech on Web
Presented by: Sarah Masud
Advisors: Dr. Tanmoy Chakraborty, Dr. Vikram Goyal
Mentors/Collaborators: Dr. Shad Akhtar, Wipro AI

Outline
• Introduction and Motivation
• Overview of Datasets
• Hate Speech Detection
  ◦ Text
  ◦ Multimodal
• Associated tasks
  ◦ Overview
  ◦ Normalization
• Concluding Remarks

Disclaimer: Subsequent content has extreme language (verbatim from social media), which does not reflect the opinions of myself or my collaborators. Reader’s discretion is advised.

Introduction and Motivation

Clouded by Malicious Online Content
Cyber Bullying, Abuse, Offense, Aggression, Provocation, Toxicity, Spam, Fake News, Rumours, Hate Speech, Trolling, Fraud, Personal Attacks
• Anonymity has led to an increase in anti-social behaviour [1], hate speech being one form of it.
• Such behaviour can be studied at a macroscopic as well as a microscopic level.
• It exists across various media.
[1]: Suler, John, CyberPsychology & Behavior, 2004

Definition of Hate Speech
• Hate is subjective, temporal and cultural in nature.
• UN defines hate speech as “any kind of communication that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are.” [1]
• Social media users need to be sensitised.
Fig 1: Pyramid of Hate [2]
[1]: UN hate
[2]: Pyramid of Hate

Internet’s policy w.r.t. curbing Hate
Moderated
• Twitter
• Facebook
• Instagram
• Youtube
Semi-Moderated
• Reddit
Unmoderated
• Gab
• 4chan
• BitChute
• Parler
• StormFront

• Anonymity has led to an increase in anti-social behaviour [1], hate speech being one form of it.
• Such behaviour can be studied at a macroscopic as well as a microscopic level. [2]
• It exists across various media.
[1]: Suler, John, CyberPsychology & Behavior, 2004
[2]: Luke Munn, Humanities and Social Sciences Communications, Article 53

Workflow for Analysing and Mitigating Hate Speech
Fig 1: Various components of Hate Speech on Web [1]
[1]: Tanmoy and Sarah, Nipping in the bud: detection, diffusion and mitigation of hate speech on social media, ACM SIGWEB Winter, Invited Publication

Overview of Hate Speech Datasets

Subjectivity in Annotating Hate
• Rules of Grammar?
  ◦ IIIT-Delhi is an ____.
• Verifiability?
  ◦ Is this statement factual?
• Target and context?
  ◦ Anyone can offend anyone, but can every offense be hateful?
• Reclaimed slurs, who is talking to whom?
  ◦ Nigger vs Nigga
• What are we trying to capture? [2]
Fig 1: Different annotation groups provide different perceptions of hatefulness [1]
[1]: Jan Kocon et al., Info Processing & Management’21
[2]: Rottger et al., NAACL’22

Literature Overview: Hate Dataset
Dataset | Source & Language (Modality) | Year | Labels | Annotation
Waseem & Hovy [1] | Twitter, English, Texts | 2016 | R, S, N | 16k, E, k = 0.84
Davidson et al. [2] | Twitter, English, Texts | 2017 | H, O, N | 25k, C, k = 0.92
Wulczyn et al. [3] | Wikipedia comments, English, Texts | 2017 | PA, N | 100k, C, k = 0.45
Gibert et al. [5] | Stormfront, English, Texts | 2018 | H, N | 10k, k = 0.62
Founta et al. [4] | Twitter, English, Texts | 2018 | H, A, SM, N | 70k, C, k = ?
Albadi et al. [6] | Twitter, Arabic, Texts | 2018 | H, N | 6k, C, k = 0.81

Labels: R- Racism, S- Sexism, H- Hate, PA- Personal Attack, A- Abuse, SM- Spam, O- Offensive, L- Religion, N- Neither, I- Implicit, E- Explicit
Annotation: dataset size, annotator type (E- Internal Experts, C- Crowd Sourced), inter-annotator agreement k
[1]: Waseem & Hovy, NAACL’16
[2]: Davidson et al., WebSci’17
[3]: Wulczyn et al., WWW’17
[4]: Founta et al., WebSci’18
[5]: Gibert et al., ALW2’18
[6]: Albadi et al., ANLP’20
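The k values in the Annotation column report inter-annotator agreement. The slide does not say which statistic each dataset uses, so the following is only a minimal sketch of one common choice, Cohen's kappa for two annotators. The function name and the toy annotations are hypothetical and are not taken from any of the datasets above.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations with the H / O / N label set (illustration only).
annotator_1 = ["H", "O", "N", "N", "H", "O", "N", "N"]
annotator_2 = ["H", "O", "N", "O", "H", "N", "N", "N"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label (typically N) dominates the data.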

Literature Overview: Hate Dataset
Dataset | Source & Language (Modality) | Year | Labels | Annotation
Mathur et al. [1] | Twitter, Hinglish, Texts | 2018 | H, O, N | 3k, E, k = 0.83
Rizwan et al. [3] | Twitter, Urdu (Roman Urdu), Texts | 2020 | A, S, L, P, N | 10k, E, k = ?
Gomez et al. [4] | Twitter, English, Memes | 2020 | H, N | 150k, C, k = ?
ElSherief et al. [11] | Twitter, English, Texts | 2021 | I, E, N |

• HASOC [5], Jigsaw Kaggle [6], SemEval [7], FB Hate-Meme Challenge [8]
• WOAH [9], CONSTRAINT [10]

Labels: R- Racism, S- Sexism, H- Hate, PA- Personal Attack, A- Abuse, SM- Spam, O- Offensive, L- Religion, N- Neither, I- Implicit, E- Explicit
Annotation: dataset size, annotator type (E- Internal Experts, C- Crowd Sourced), inter-annotator agreement k
[1]: Mathur et al., AAAI’20
[3]: Rizwan et al., EMNLP’20
[4]: Gomez et al., WACV’20
[5]: HASOC
[6]: Jigsaw Kaggle
[7]: SemEval
[8]: FB Hate-Meme
[9]: WOAH
[10]: CONSTRAINT
[11]: ElSherief et al., EMNLP’21

Literature Overview: Hate Detection
Fig: Detection branches into Text and Multimodal. Node labels in the diagram: Tf-idf vectorizer, Non-contextual Embeddings, Contextual Embeddings, Image, Video, Network Connections, Historical Information, News Trends.

Literature Overview: Hate Detection
• N-gram Tf-idf + LR/SVM [1,2]
• Glove + CNN, RNN [3]
• Transformer based
  ◦ Zero, Few Shot [4]
  ◦ Fine-tuning [5]
  ◦ HateBERT [6]
• Generation for classification [7,11]
• Multimodality
  ◦ Images [8]
  ◦ Historical Context [9]
  ◦ Network and Neighbours [10]
  ◦ News, Trends, Prompts [11]
[1]: Waseem & Hovy, NAACL’16
[2]: Davidson et al., WebSci’17
[3]: Badjatiya et al., WWW’17
[4]: Pelican et al., EACL Hackashop’21
[5]: Tomer et al., EMNLP’21
[6]: Caselli et al., WOAH’21
[7]: Ke-Li et al.
[8]: Kiela et al., NeurIPS’20
[9]: Qian et al., NAACL’19
[10]: Mehdi et al., IJCA’20, Vol 13
[11]: Badr et al.,

Text-Based Hate Detection

Base Features
• Dictionary-based detection of explicitly hateful content. [1]
• Character-level and word-level n-grams. [2,3]
• Readability score of a sentence. [3]
Fig 1: Hate Lexicon based on POS [4]
Fig 2: Hate character n-gram [2]
[1]: Vargas et al., RANLP’21
[2]: Waseem & Hovy, NAACL’16
[3]: Davidson et al., WebSci’17
[4]: Wiegand et al., NAACL’18
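As a rough illustration of the character n-gram features listed above, the sketch below extracts character n-grams and weights them with TF-IDF using only the standard library. The function names, the n-gram range (2 to 4) and the smoothed IDF variant are assumptions made for illustration; they are not the exact settings of [2] or [3].

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Character n-grams of a lower-cased text, padded with one space per side."""
    padded = f" {text.lower().strip()} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def tfidf_vectors(docs, n_min=2, n_max=4):
    """One sparse TF-IDF vector (dict: n-gram -> weight) per document."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            # Smoothed IDF, so n-grams present in every document keep weight > 0.
            gram: (tf / total) * (math.log((1 + n_docs) / (1 + df[gram])) + 1.0)
            for gram, tf in c.items()
        })
    return vectors


# Benign toy documents (illustration only).
vectors = tfidf_vectors(["you are great", "you are awful"])
```

Vectors of this kind are what a linear classifier such as LR or SVM would consume. Character n-grams are also somewhat robust to the deliberate misspellings common in social media text.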

Linguistic Features
• Davidson’s HS Detection was motivated by [1]:
  ◦ Length of comments.
  ◦ # Punctuations, Capitalization.
  ◦ URLs, Hashtags, emojis etc.
  ◦ Sentiment score.
  ◦ Readability score.
Syntactic Features
• POS tags [1,2]
Fig 1: Adding various POS tags to hateful tweet [2]
Fig 2: Infusing POS for hate detection via CNN [2]
[1]: Davidson et al., WebSci’17
[2]: Giuseppe & Roberto, HaSpeeDe, EVALITA’20
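A few of the surface features listed above can be computed with the standard library alone. The sketch below is an assumption-laden illustration: the function name, the regular expressions and the feature names are hypothetical. Sentiment, readability and POS features are left out because they need external lexicons or taggers.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")


def surface_features(text):
    """Length, punctuation, capitalisation, URL and hashtag counts for one post."""
    # Drop URLs first so their characters do not inflate the other counts.
    without_urls = URL_RE.sub("", text)
    letters = [ch for ch in without_urls if ch.isalpha()]
    return {
        "n_chars": len(text),
        "n_tokens": len(text.split()),
        # Punctuation outside URLs; note that '#' counts as punctuation here.
        "n_punct": sum(ch in string.punctuation for ch in without_urls),
        "upper_ratio": (
            sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0
        ),
        "n_urls": len(URL_RE.findall(text)),
        "n_hashtags": len(HASHTAG_RE.findall(without_urls)),
    }


# Benign toy post (illustration only).
features = surface_features("STOP this NOW!!! #enough https://example.com/x")
```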

Non-Contextual Embedding
• Applies various vanilla DL methods, beyond character n-grams, to the W&H (Waseem & Hovy) dataset. [1]
• Observation: An ensemble combined with trainable embeddings works better.
Fig 1: Word affinity for offensive terms increasing post training [1]
Fig 2: Performance comparison on W&H dataset [1]
[1]: Badjatiya et al., WWW’17

HateBERT: Retraining BERT for Abusive Language in English
Training a Large LM (LLM) from Scratch: Pre-training
• Input: Large-scale unlabeled corpus from various sources across the web
• Initialise LM with random weights
• Mask prediction (unsupervised generalised training)
• Output: Large Scale General LM (saved LM with trained weights)
Training a Large LM (LLM) from Saved Checkpoint: Continued Pre-training
• Input: Medium-sized unlabeled corpus from a specific domain
• Load LM with trained weights
• Mask prediction (unsupervised domain-specific training)
• Output: Large Scale Domain LM (saved LM with updated weights)

HateBERT: Retraining BERT for Abusive Language in English
Training a Large LM (LLM) for a Task: Fine-tuning
• Input: Labelled corpus
• Add a classification head on top of the LLM
• Initialise classifier layer with random weights and load the LLM with trained weights
• Output: Large Scale Fine-tuned LM (saved LM with updated weights + classifier with trained weights)
No Training of a Large LM (LLM) for a Task: Zero-shot
• Input: Large Scale LM, unlabelled domain-specific corpus
• Add a softmax on top of the LLM
• Initialise classifier layer with frozen LM weights
• No update in any weights

HateBERT: Retraining BERT for Abusive Language in English
• Obtain unlabelled samples of potentially harmful content from banned or controversial Reddit communities. (Curated 1M+ messages)
• Re-trained BERT base on the Masked Language Modeling task. [1]
Fig 1: Performance comparison of BERT vs HateBERT [1]
[1]: Caselli et al., WOAH’21
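Continued pre-training reuses the masked language modeling objective, so the data preparation step looks roughly like the sketch below. It assumes the masking recipe of the original BERT paper (select about 15% of tokens; replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged). Whether HateBERT used exactly these settings is not stated on the slide, and the function name and toy vocabulary are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for masked language modeling.

    labels[i] holds the original token at selected positions and None
    elsewhere (positions with None are ignored by the training loss).
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: mask token
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)                # 10%: keep unchanged
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels


# Benign toy sentence and vocabulary (illustration only).
tokens = "the community was banned for breaking the rules".split()
vocab = sorted(set(tokens))
inputs, labels = mask_tokens(tokens, vocab, seed=0)
```

Pre-training from scratch and continued pre-training differ only in the starting weights and the corpus; the masking step is the same in both.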

Transformer Inspired Prompting
Fig 1: Obtaining a sentiment label from an LM without training on a sentiment dataset [1]
Fig 2: Obtaining machine translation from an LM without training on MT datasets [1]
[1]: https://www.inovex.de/de/blog/prompt-engineering-guide/

Transformer Inspired Prompting
• Zero-shot [1]
• One-shot [1]
• Few-shot [1]
[1]: Ke-Li et al., arXiv
[2]: https://openai.com/api/
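The three settings differ only in how many labelled demonstrations are placed in the prompt before the query. The sketch below builds such prompts as plain strings; the template ("Text:" / "Label:"), the function name and the example texts are assumptions for illustration and are not the prompts used in [1]. Sending the prompt to a model is deliberately left out.

```python
def build_prompt(instruction, query, examples=()):
    """Build a classification prompt with zero, one or several demonstrations."""
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    # The prompt ends with an open "Label:" slot for the model to complete.
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)


instruction = "Decide whether the post is hateful or not hateful."
# Benign, hypothetical demonstrations (illustration only).
demos = [
    ("I love the new park in our town.", "not hateful"),
    ("People from that group should be thrown out.", "hateful"),
]

zero_shot = build_prompt(instruction, "Nobody likes rainy Mondays.")
one_shot = build_prompt(instruction, "Nobody likes rainy Mondays.", demos[:1])
few_shot = build_prompt(instruction, "Nobody likes rainy Mondays.", demos)
```

No model weights are updated in any of the three settings; the demonstrations only condition the model's next-token prediction.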

Multimodal Hate Detection

Non-Contextual + Metadata
• Twitter metadata:
  ◦ # Followers
  ◦ # Followees
  ◦ # Tweets/Retweets/Likes
  ◦ Account age, etc.
• Embedding: Glove based
Fig 1: Concatenating textual and metadata information from tweets for hate detection [1]
[1]: Founta et al.
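The idea of concatenating a text representation with account metadata can be sketched as follows. This is a deliberately simplified illustration and not the architecture of [1]: it averages pre-trained word vectors instead of running a neural text encoder, and the function names, the log scaling and the toy two-dimensional embeddings are assumptions.

```python
import math


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def metadata_features(followers, followees, tweets, account_age_days):
    """Log-scale the heavy-tailed account statistics."""
    return [math.log1p(x) for x in (followers, followees, tweets, account_age_days)]


def combined_features(tokens, embeddings, dim, **meta):
    """Concatenate the text vector and the metadata vector."""
    return mean_embedding(tokens, embeddings, dim) + metadata_features(**meta)


# Toy 2-dimensional "embeddings" standing in for GloVe vectors (hypothetical).
toy_embeddings = {"good": [1.0, 0.0], "day": [0.0, 1.0]}
vector = combined_features(
    ["good", "day", "unknown"], toy_embeddings, dim=2,
    followers=0, followees=9, tweets=120, account_age_days=365,
)
```

The combined vector would then be passed to a downstream classifier.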

Non-Contextual + Network Feature
• Infusing network information with textual features. [1]
• Node2vec is employed to map graphs to an embedding space. [2]
Fig 1: Infusing textual and network information for hate detection [1]
[1]: Chowdhury et al., SRW-ACL’21
[2]: Grover et al., KDD’16
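Node2vec learns node embeddings by sampling biased second-order random walks and feeding them to a skip-gram model. The sketch below covers only the walk sampling step for an unweighted, undirected graph; the function name, the graph representation (dict of neighbour sets) and the toy graph are assumptions for illustration.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, seed=None):
    """Sample one node2vec walk. graph maps each node to a set of neighbours.

    p is the return parameter and q the in-out parameter: a low p favours
    stepping back to the previous node, a low q favours moving further away.
    """
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(graph[current])
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is uniform
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)   # return to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)       # stay at distance 1 from previous
            else:
                weights.append(1.0 / q)   # move to distance 2 from previous
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Hypothetical follower graph with four users (illustration only).
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
walk = node2vec_walk(toy_graph, "a", length=10, p=0.5, q=2.0, seed=0)
```

Many such walks per node are treated like sentences, so users who appear in similar network neighbourhoods end up with similar vectors.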

Non-Contextual + External Knowledge
Fig 1: Does KG infusion improve hate detection? [1]
[1]: Soha et al., COD-COM’21

Detecting Offensive Memes
• Proposed a collection of hateful/offensive images, mostly shared as memes on social media platforms (MMHS150K). [1]
Fig 1: Combining Textual (TT), OCR (IT) and Image (I) features for determining if a meme is hateful in nature [1]
[1]: Gomez et al., WACV’20

Associated Tasks

Non-English Hate Speech Detection
• Code-mixed
• Romanised
• Native script
Example: Roman Hindi Urdu [1]
Fig 1: Example of offensive romanised Urdu text from Pakistani Twitter [1]
Fig 2: Proposed CNN-based module for capturing hateful Roman Urdu tweets [1]
[1]: Rizwan et al., EMNLP’20

Implicit Hate Speech Detection
• Implicit hate has no cuss words.
• Lexically resembles non-hate.
• Contains innuendos, irony, sarcasm.
Fig 1: Example of explicit and implicit hate on Twitter [1]
Fig 2: Examples of annotating negatively biased implied statements [2]
[1]: ElSherief et al., EMNLP’21
[2]: Sap et al., ACL’20

Hate Intensity Prediction
• Intensity/severity of hate speech captures the explicitness of hate speech.
• High-intensity hate is more likely to contain an offensive lexicon, offensive spans, direct attacks and mentions of the target entity.
Examples:
  ◦ “Consuming Coffee is bad, I hate it!” (the world can live with this opinion)
  ◦ “Let's bomb every coffee shop and kill all coffee makers” (this is a threat)
Fig 1: Pyramid of Hate [1]
[1]: Pyramid of Hate

Hate Normalization
For a given hate sample $t$, our objective is to obtain its normalized (sensitised) form $t'$ such that the intensity of hatred $\phi_t$ is reduced while the meaning is still conveyed. [1]
$\phi_{t'} < \phi_{t}$
Fig 1: Example of original high intensity vs normalised sentence [1]
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

Hate Normalization: NACL (Neural hAte speeCh normaLizer)
Fig 1: Flowchart of NACL [1]
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

Hate Rationale / Hate Explanation
Fig 1: Annotating rationale terms that contribute to hatefulness [1]
Fig 2: Incorporating various knowledge sources to generate an explanation for hatefulness and unmask the latent stereotypes [2]
[1]: Mathew et al., AAAI’21
[2]: Rohit & Diyi, NAACL’22

Concluding Remarks

Key Takeaways
• Datasets used for hate speech:
  ◦ There is a diversity of data labels, with limited overlap/uniformity.
  ◦ Skewed in favour of English textual content.
• Methods used for hate speech detection:
  ◦ A vast array of techniques, from classical ML to prompt-based zero-shot learning, have been tested.
  ◦ Out-of-domain performance is abysmal in most cases.
  ◦ Need to move towards lifelong learning and dynamic catchphrase detection methods.
  ◦ Need to study how online hate relates to offline hate instances.
• Enhancing understanding of hateful content via:
  ◦ Intensity and span detection.
  ◦ Normalising hate to proactively counter it.
  ◦ Generating explanations for hateful connotations.

Future Scope
• Better detection of implicit hate.
• Better detection of multimodal hate.
• How can psychological traits help predict hate speech?
• Language-agnostic and topic-agnostic hate speech detection.
• Explainable hate speech classifiers.
• Multilingual and cross-lingual hate speech detection.
• Detection and removal of bias in hate speech detection. [1]
[1]: Handling Bias in Toxic Speech Detection: A Survey: https://arxiv.org/abs/2202.00126

Our Contributions
• Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter (ICDE 2021)
• Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization (KDD 2022)
• DiVA: A Scalable, Interactive and Customizable Visual Analytics Platform for Information Diffusion on Large Networks (ACM TKDD 2022)
• Nipping in the bud: detection, diffusion and mitigation of hate speech on social media (ACM SIGWEB Newsletter, Invited Publication)
• MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets (EMNLP 2021)
• Domain-aware Self-supervised Pre-training for Label-Efficient Meme Analysis (AACL-IJCNLP 2022)
• Detecting and Understanding Harmful Memes: A Survey (IJCAI 2022) (Survey Track)
• DISARM: Detecting the Victims Targeted by Harmful Memes (NAACL 2022)
• What do you MEME? Generating Explanations for Visual Semantic Role Labelling in Memes (AAAI 2023)
• Survey: Handling Bias in Toxic Speech Detection: A Survey
• Tutorials Conducted: Combating Online Hate Speech (WSDM 2021, ECML PKDD 2021)

Public Profile
Website: sara-02.github.io
Contact Official Email: [email protected]
Twitter Profile: @_themessier

Thank You
