Application_of_NLP_in_hate_speech.pdf

Slide 1

Slide 1 text

Applications of NLP: Hate Speech on Web
Presented by: Sarah Masud
Advisors: Dr. Tanmoy Chakraborty, Dr. Vikram Goyal
Mentors/Collaborators: Dr. Shad Akhtar, Wipro AI

Slide 2

Slide 2 text

Outline
● Introduction and Motivation
● Overview of Datasets
● Hate Speech Detection
  ○ Text
  ○ Multimodal
● Associated tasks
  ○ Overview
  ○ Normalization
● Concluding Remarks
Disclaimer: Subsequent content has extreme language (verbatim from social media), which does not reflect the opinions of myself or my collaborators. Reader’s discretion is advised.

Slide 3

Slide 3 text

Introduction and Motivation

Slide 4

Slide 4 text

Clouded by Malicious Online Content
Cyber Bullying, Abuse, Offense, Aggression, Provocation, Toxicity, Spam, Fake News, Rumours, Hate Speech, Trolling, Fraud, Personal Attacks
● Anonymity has led to an increase in anti-social behaviour [1], hate speech being one form of it.
● These behaviours can be studied at a macroscopic as well as a microscopic level.
● They exist in various mediums.
[1]: Suler, John, CyberPsychology & Behavior, 2004

Slide 5

Slide 5 text

Definition of Hate Speech
● Hate is subjective, temporal and cultural in nature.
● The UN defines hate speech as “any kind of communication that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are.” [1]
● Social media users need to be sensitised.
Fig 1: Pyramid of Hate [2]
[1]: UN hate
[2]: Pyramid of Hate

Slide 6

Slide 6 text

Internet’s policy w.r.t. curbing Hate
Moderated: Twitter, Facebook, Instagram, YouTube
Semi-Moderated: Reddit
Unmoderated: Gab, 4chan, BitChute, Parler, StormFront
● Anonymity has led to an increase in anti-social behaviour [1], hate speech being one form of it.
● These behaviours can be studied at a macroscopic as well as a microscopic level. [2]
● They exist in various mediums.
[1]: Suler, John, CyberPsychology & Behavior, 2004
[2]: Luke Munn, Humanities and Social Sciences Communications, Article 53

Slide 7

Slide 7 text

Workflow for Analysing and Mitigating Hate Speech
Fig 1: Various components of Hate Speech on Web [1]
[1]: Tanmoy and Sarah, Nipping in the bud: detection, diffusion and mitigation of hate speech on social media, ACM SIGWEB Winter, Invited Publication

Slide 8

Slide 8 text

Overview of Hate Speech Datasets

Slide 9

Slide 9 text

Subjectivity in Annotating Hate
● Rules of Grammar?
  ○ IIIT-Delhi is an ____.
● Verifiability?
  ○ Is this statement factual?
● Target and context?
  ○ Anyone can offend anyone, but can every offense be hateful?
● Reclaimed slurs, who is talking to whom?
  ○ Nigger vs Nigga
● What are we trying to capture? [2]
Fig 1: Different annotation groups provide different perceptions of hatefulness [1]
[1]: Jan Kocon et al., Info Processing & Management’21
[2]: Rottger et al., NAACL’22

Slide 10

Slide 10 text

Literature Overview: Hate Dataset

Dataset             | Source & Language (Modality)       | Year | Labels      | Annotation
Waseem & Hovy [1]   | Twitter, English, Texts            | 2016 | R, S, N     | 16k, E, k = 0.84
Davidson et al. [2] | Twitter, English, Texts            | 2017 | H, O, N     | 25k, C, k = 0.92
Wulczyn et al. [3]  | Wikipedia comments, English, Texts | 2017 | PA, N       | 100k, C, k = 0.45
Gibert et al. [5]   | Stormfront, English, Texts         | 2018 | H, N        | 10k, k = 0.62
Founta et al. [4]   | Twitter, English, Texts            | 2018 | H, A, SM, N | 70k, C, k = ?
Albadi et al. [6]   | Twitter, Arabic, Texts             | 2018 | H, N        | 6k, C, k = 0.81

Labels: R- Racism, S- Sexism, H- Hate, PA- Personal Attack, A- Abuse, SM- Spam, O- Offensive, L- Religion, N- Neither, I- Implicit, E- Explicit
Annotation: E- Internal Experts, C- Crowd Sourced

[1]: Waseem & Hovy, NAACL’16
[2]: Davidson et al., WebSci’17
[3]: Wulczyn et al., WWW’17
[4]: Founta et al., WebSci’18
[5]: Gibert et al., ALW2’18
[6]: Albadi et al., ANLP’20
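
The Annotation column reports an agreement value k for most datasets. As a minimal sketch, assuming k denotes a chance-corrected inter-annotator agreement coefficient such as Cohen's kappa (the cited papers may use other measures, for example Fleiss' kappa when there are more than two annotators), the computation for two annotators could look like this. The annotation lists below are hypothetical and only reuse the H / O / N label scheme from the table.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations (H = Hate, O = Offensive, N = Neither).
annotator_1 = ["H", "O", "N", "N", "H", "O", "N", "N"]
annotator_2 = ["H", "O", "N", "O", "H", "N", "N", "N"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A value near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance, which is one way to quantify the annotation subjectivity discussed earlier.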

Slide 11

Slide 11 text

Literature Overview: Hate Dataset

Dataset               | Source & Language (Modality)      | Year | Labels        | Annotation
Mathur et al. [1]     | Twitter, Hinglish, Texts          | 2018 | H, O, N       | 3k, E, k = 0.83
Rizwan et al. [3]     | Twitter, Urdu (Roman Urdu), Texts | 2020 | A, S, L, P, N | 10k, E, k = ?
Gomez et al. [4]      | Twitter, English, Memes           | 2020 | H, N          | 150k, C, k = ?
ElSherief et al. [11] | Twitter, English, Texts           | 2021 | I, E, N       |

● HASOC [5], Jigsaw Kaggle [6], SemEval [7], FB Hate-Meme Challenge [8]
● WOAH [9], CONSTRAINT [10]

Labels: R- Racism, S- Sexism, H- Hate, PA- Personal Attack, A- Abuse, SM- Spam, O- Offensive, L- Religion, N- Neither, I- Implicit, E- Explicit
Annotation: E- Internal Experts, C- Crowd Sourced

[1]: Mathur et al., AAAI’20
[3]: Rizwan et al., EMNLP’20
[4]: Gomez et al., WACV’20
[5]: HASOC
[6]: Jigsaw Kaggle
[7]: SemEval
[8]: FB Hate-Meme
[9]: WOAH
[10]: CONSTRAINT
[11]: ElSherief et al., EMNLP’21

Slide 12

Slide 12 text

Literature Overview: Hate Detection
Detection
● Text
  ○ Tf-idf vectorizer
  ○ Non-contextual Embeddings
  ○ Contextual Embeddings
● Multimodal
  ○ Image
  ○ Video
  ○ Network Connections
  ○ Historical Information
  ○ News Trends

Slide 13

Slide 13 text

Literature Overview: Hate Detection
● N-gram Tf-idf + LR/SVM [1,2]
● Glove + CNN, RNN [3]
● Transformer based
  ○ Zero-shot, Few-shot [4]
  ○ Fine-tuning [5]
  ○ HateBERT [6]
● Generation for classification [7,11]
● Multimodality
  ○ Images [8]
  ○ Historical Context [9]
  ○ Network and Neighbours [10]
  ○ News, Trends, Prompts [11]
[1]: Waseem & Hovy, NAACL’16
[2]: Davidson et al., WebSci’17
[3]: Badjatiya et al., WWW’17
[4]: Pelican et al., EACL Hackashop’21
[5]: Tomer et al., EMNLP’21
[6]: Caselli et al., WOAH’21
[7]: Ke-Li et al.
[8]: Kiela et al., NeurIPS’20
[9]: Qian et al., NAACL’19
[10]: Mehdi et al., IJCA’20, Vol 13
[11]: Badr et al.,

Slide 14

Slide 14 text

Text-Based Hate Detection

Slide 15

Slide 15 text

Base Features
● Dictionary-based detection of explicitly hateful content. [1]
● N-grams at character and word level. [2,3]
● Readability score of a sentence. [3]
Fig 1: Hate Lexicon based on POS [4]
Fig 2: Hate character n-gram [2]
[1]: Vargas et al., RANLP’21
[2]: Waseem & Hovy, NAACL’16
[3]: Davidson et al., WebSci’17
[4]: Wiegand et al., NAACL’18
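
To make the character n-gram features concrete, here is a minimal standard-library sketch of n-gram extraction followed by tf-idf weighting. It is an illustration under assumptions, not the feature pipeline of the cited papers: the n-gram range (2 to 4), the idf variant `log(N / df) + 1`, and the two example posts are all choices made here. The resulting sparse vectors would then be fed to a classifier such as LR or SVM.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Return all character n-grams of length n_min..n_max (lower-cased)."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n_min=2, n_max=4):
    """Map each document to a sparse {ngram: tf-idf weight} dictionary."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against empty documents
        vectors.append({
            gram: (k / total) * (math.log(n_docs / df[gram]) + 1.0)
            for gram, k in c.items()
        })
    return vectors


# Benign example posts, chosen only for illustration.
docs = ["coffee is bad", "coffee is great"]
vectors = tfidf_vectors(docs, n_min=2, n_max=2)
```

N-grams shared by every document (such as "co") receive the minimum idf, while n-grams specific to one document (such as "ba") are weighted higher, which is what lets a linear classifier pick up on characteristic character sequences even under creative spelling.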

Slide 16

Slide 16 text

Linguistic Features
● Davidson’s HS Detection was motivated by [2]:
  ○ Length of comments.
  ○ # Punctuations, Capitalization.
  ○ URLs, Hashtags, emojis etc.
  ○ Sentiment score.
  ○ Readability score.
Syntactic Features
● POS tags [1,2]
Fig 1: Adding various POS tags to hateful tweet [2]
Fig 2: Infusing POS for hate detection via CNN [2]
[1]: Davidson et al., WebSci’17
[2]: Giuseppe & Roberto, HaSpeeDe, EVALITA’20
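
Several of the surface-level linguistic features listed above can be computed with plain string operations. The sketch below is an assumption-laden illustration: the feature names, the regular expressions, and the whitespace tokenisation are hypothetical choices, and sentiment, readability and POS features are left out because they need external lexicons or taggers.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")


def surface_features(text):
    """Count simple surface cues of a post (hypothetical feature names)."""
    tokens = text.split()  # naive whitespace tokenisation
    return {
        "n_chars": len(text),
        "n_tokens": len(tokens),
        "n_punct": sum(ch in string.punctuation for ch in text),
        "n_upper_tokens": sum(tok.isupper() for tok in tokens),
        "n_urls": len(URL_RE.findall(text)),
        "n_hashtags": len(HASHTAG_RE.findall(text)),
    }


feats = surface_features(
    "Consuming COFFEE is bad, I hate it! #coffee https://example.com"
)
```

Such counts are typically concatenated with n-gram or embedding features rather than used on their own.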

Slide 17

Slide 17 text

Non-Contextual Embedding
● Using the W&H (Waseem & Hovy) dataset, apply various vanilla DL methods, going beyond character n-grams [1]
● Observation: an ensemble combined with trainable embeddings works better
Fig 1: Word affinity for offensive terms increasing post training [1]
Fig 2: Performance comparison on W&H dataset [1]
[1]: Badjatiya et al., WWW’17

Slide 18

Slide 18 text

HateBERT: Retraining BERT for Abusive Language in English

Training a Large LM (LLM) from Scratch: Pre-training
● Input: large-scale unlabelled corpus from various sources across the web
● Initialise LM with random weights
● Mask prediction (unsupervised generalised training)
● Output: Large Scale General LM (saved LM with trained weights)

Training a Large LM (LLM) from Saved checkpoint: Continued Pre-training
● Input: medium-sized unlabelled corpus from a specific domain
● Load LM with trained weights
● Mask prediction (unsupervised domain-specific training)
● Output: Large Scale Domain LM (saved LM with updated weights)

Slide 19

Slide 19 text

HateBERT: Retraining BERT for Abusive Language in English

Training a Large LM (LLM) for a task: Fine-tuning
● Input: labelled corpus
● Add a classification head on top of the LLM
● Initialise the classifier layer with random weights and load the LLM with trained weights
● Output: Large Scale Fine-tuned LM (saved LM with updated weights + classifier with trained weights)

Using a Large LM (LLM) for a task without training: Zero-shot
● Input: unlabelled domain-specific corpus
● Add a softmax on top of the LLM
● Initialise the classifier layer with frozen LM weights
● No update in any weights
● Output: Large Scale LM

Slide 20

Slide 20 text

HateBERT: Retraining BERT for Abusive Language in English
● Obtain unlabelled samples of potentially harmful content from banned or controversial Reddit communities (curated 1M+ messages).
● Re-trained BERT base on the Masked Language Modeling task [1]
Fig 1: Performance comparison of BERT vs HateBERT [1]
[1]: Caselli et al., WOAH’21
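
The mask-prediction objective used for pre-training and continued pre-training can be illustrated with a simplified data-preparation step. This is a sketch under assumptions, not HateBERT's training code: tokens come from whitespace splitting rather than a subword tokenizer, and every selected token is replaced by `[MASK]`, whereas the original BERT recipe selects about 15% of tokens and replaces them with `[MASK]` 80% of the time, a random token 10% of the time, and leaves them unchanged 10% of the time. The function name and the example sentence are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for masked language modelling.

    labels holds the original token at masked positions and None elsewhere,
    so the loss is computed only where the model has to predict.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the token from the model
            labels.append(tok)    # ...and keep it as the prediction target
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


tokens = "lets discuss coffee prices today".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
```

Because the targets come from the text itself, no hate labels are needed, which is why an unlabelled domain corpus is sufficient for continued pre-training.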

Slide 21

Slide 21 text

Transformer Inspired Prompting
Fig 1: Obtaining Sentiment Label From LM without training on sentiment dataset [1]
Fig 2: Obtaining Machine Translation from LM without training on MT datasets [1]
[1]: https://www.inovex.de/de/blog/prompt-engineering-guide/

Slide 22

Slide 22 text

Transformer Inspired Prompting
● Zero-shot [1]
● One-shot [1]
● Few-shot [1]
[1]: Ke-Li et al., arXiv
[2]: https://openai.com/api/
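
The difference between zero-shot, one-shot and few-shot prompting is only the number of labelled demonstrations placed before the query. The sketch below builds such prompts as plain strings; the template wording, the label names and the example posts are hypothetical and are not taken from the cited work, and the resulting string would still have to be sent to a language model of your choice.

```python
def build_prompt(query, examples=(), labels=("hateful", "not hateful")):
    """Build a classification prompt with zero or more demonstrations.

    examples is a sequence of (post, label) pairs:
    0 pairs -> zero-shot, 1 pair -> one-shot, several pairs -> few-shot.
    """
    parts = [f"Classify the post as one of: {', '.join(labels)}."]
    for post, label in examples:
        parts.append(f"Post: {post}\nLabel: {label}")
    # The query is left unlabelled so the model completes the label.
    parts.append(f"Post: {query}\nLabel:")
    return "\n\n".join(parts)


demos = [
    ("Consuming coffee is bad, I hate it!", "not hateful"),
    ("Tea drinkers are wonderful people.", "not hateful"),
]
zero_shot = build_prompt("Coffee shops open too late.")
one_shot = build_prompt("Coffee shops open too late.", demos[:1])
few_shot = build_prompt("Coffee shops open too late.", demos)
```

No model weights are updated in any of the three settings; only the prompt changes.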

Slide 23

Slide 23 text

Multimodal Hate Detection

Slide 24

Slide 24 text

Non-Contextual + Metadata
● Twitter metadata:
  ○ # Followers
  ○ # Followees
  ○ # Tweets/Retweets/Likes
  ○ Account age, etc.
● Embedding: Glove based
Fig 1: Concatenating textual and metadata information from tweets for hate detection [1]
[1]: Founta et al.

Slide 25

Slide 25 text

Non-Contextual + Network Feature
● Infusing network information with textual features [1].
● Node2vec is employed to map graphs to the embedding space [2].
Fig 1: Infusing textual and network information for hate detection [1]
[1]: Chowdhury et al., SRW-ACL’21
[2]: Grover et al., KDD’16
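
Node2vec learns node embeddings by sampling biased second-order random walks over the graph and then training a skip-gram model on those walks. The sketch below covers only the walk-sampling step for an unweighted, undirected graph; the graph of user accounts is hypothetical, and the skip-gram training that turns walks into vectors is omitted. The return parameter `p` and in-out parameter `q` follow the node2vec formulation.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, seed=None):
    """Sample one biased random walk.

    graph maps each node to the set of its neighbours.
    """
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(graph[cur])
        if not neighbours:
            break  # dead end: isolated node
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Hypothetical follower graph between four accounts.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
walk = node2vec_walk(graph, "a", length=6, p=2.0, q=0.5, seed=1)
```

A small `q` pushes walks outward (exploration), while a small `p` keeps them near the start node; the learned node vectors can then be combined with the textual features of each user's posts.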

Slide 26

Slide 26 text

Non-Contextual + External Knowledge
Fig 1: Does KG infusion improve hate detection? [1]
[1]: Soha et al., COD-COM’21

Slide 27

Slide 27 text

Detecting Offensive Memes
● Proposed a collection of hateful/offensive images mostly shared as memes on social media platforms (MMHS150K) [1]
Fig 1: Combining Textual (TT), OCR (IT) and Image (I) features for determining if a meme is hateful in nature [1]
[1]: Gomez et al., WACV’20

Slide 28

Slide 28 text

Associated Tasks

Slide 29

Slide 29 text

Non-English Hate Speech Detection
● Code-mixed
● Romanised
● Native script
Example: Roman Hindi Urdu [1]
Fig 1: Example of offensive Romanised Urdu text from Pakistani Twitter [1]
Fig 2: Proposed CNN-based module for capturing hateful Roman Urdu tweets [1]
[1]: Rizwan et al., EMNLP’20

Slide 30

Slide 30 text

Implicit Hate Speech Detection
● Implicit hate has no cuss words.
● Lexically resembles non-hate.
● Contains innuendos, irony, sarcasm.
Fig 1: Example of explicit and implicit hate on Twitter [1]
Fig 2: Examples of annotating negatively biased implied statements [2]
[1]: ElSherief et al., EMNLP’21
[2]: Sap et al., ACL’20

Slide 31

Slide 31 text

Hate Intensity Prediction
● Intensity/severity of hate speech captures the explicitness of hate speech.
● High-intensity hate is more likely to contain an offensive lexicon, offensive spans, direct attacks and mentions of the target entity.
Examples:
  ○ “Consuming Coffee is bad, I hate it!” (the world can live with this opinion)
  ○ “Let's bomb every coffee shop and kill all coffee makers” (this is a threat)
Fig 1: Pyramid of Hate [1]
[1]: Pyramid of Hate

Slide 32

Slide 32 text

Hate Normalization
For a given hate sample $t$, our objective is to obtain its normalized (sensitised) form $t'$ such that the intensity of hatred $\phi(t)$ is reduced while the meaning is still conveyed [1]:
$\phi(t') < \phi(t)$
Fig 1: Example of original high intensity vs normalised sentence [1]
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

Slide 33

Slide 33 text

Hate Normalization: NACL- Neural hAte speeCh normaLizer
Fig 1: Flowchart of NACL [1]
[1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

Slide 34

Slide 34 text

Hate Rationale / Hate Explanation
Fig 1: Annotating rationale terms that contribute to hatefulness [1]
Fig 2: Incorporating various knowledge sources to generate an explanation for hatefulness and unmask the latent stereotypes [2]
[1]: Mathew et al., AAAI’21
[2]: Rohit & Diyi, NAACL’22

Slide 35

Slide 35 text

Concluding Remarks

Slide 36

Slide 36 text

Key Takeaways
● Datasets used for hate speech:
  ○ There is a diversity of data labels, with limited overlap/uniformity.
  ○ Skewed in favour of English textual content.
● Methods used for hate speech detection:
  ○ A vast array of techniques, from classical ML to prompt-based zero-shot learning, have been tested.
  ○ Out-of-domain performance is abysmal in most cases.
  ○ Need to move towards lifelong learning and dynamic catchphrase detection methods.
  ○ Need to study offline hate instances arising from online hate.
● Enhancing understanding of hateful content via:
  ○ Intensity and span detection.
  ○ Normalising hate to proactively counter it.
  ○ Generating explanations for hateful connotations.

Slide 37

Slide 37 text

Future Scope
● Better detection of implicit hate.
● Better detection of multimodal hate.
● How can psychological traits help predict hate speech?
● Language-agnostic and topic-agnostic hate speech.
● Explainable hate speech classifier.
● Multilingual and cross-lingual hate speech.
● Detection and removal of bias in hate speech detection [1]
[1]: Handling Bias in Toxic Speech Detection: A Survey: https://arxiv.org/abs/2202.00126

Slide 38

Slide 38 text

Our Contributions
● Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter (ICDE 2021)
● Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization (KDD 2022)
● DiVA: A Scalable, Interactive and Customizable Visual Analytics Platform for Information Diffusion on Large Networks (ACM TKDD 2022)
● Nipping in the bud: detection, diffusion and mitigation of hate speech on social media (ACM SIGWEB Newsletter, Invited Publication)
● MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets (EMNLP 2021)
● Domain-aware Self-supervised Pre-training for Label-Efficient Meme Analysis (AACL-IJCNLP 2022)
● Detecting and Understanding Harmful Memes: A Survey (IJCAI 2022) (Survey Track)
● DISARM: Detecting the Victims Targeted by Harmful Memes (NAACL 2022)
● What do you MEME? Generating Explanations for Visual Semantic Role Labelling in Memes (AAAI 2023)
● Survey: Handling Bias in Toxic Speech Detection: A Survey
● Tutorials Conducted: Combating Online Hate Speech (WSDM 2021, ECML PKDD 2021)

Slide 39

Slide 39 text

Public Profile
Website: sara-02.github.io
Contact
Official Email: [email protected]
Twitter Profile: @_themessier

Slide 40

Slide 40 text

Thank You