Application_of_NLP_in_hate_speech.pdf

_themessier
November 22, 2022

Lecture Discussion for CSE 556 NLP 2022.

Transcript

  1. Applications of NLP: Hate Speech on Web
    Presented by: Sarah Masud
    Advisors: Dr. Tanmoy Chakraborty, Dr. Vikram Goyal
    Mentors/Collaborators: Dr. Shad Akhtar, Wipro AI
  2. Outline
    • Introduction and Motivation
    • Overview of Datasets
    • Hate Speech Detection
      ◦ Text
      ◦ Multimodal
    • Associated tasks
      ◦ Overview
      ◦ Normalization
    • Concluding Remarks
    Disclaimer: Subsequent content has extreme language (verbatim from social media), which does not reflect the opinions of myself or my collaborators. Reader’s discretion is advised.
  3. Clouded by Malicious Online Content
    Cyber Bullying, Abuse, Offense, Aggression, Provocation, Toxicity, Spam, Fake News, Rumours, Hate Speech, Trolling, Fraud, Personal Attacks
    • Anonymity has led to an increase in anti-social behaviour [1], hate speech being one form of it.
    • Such behaviour can be studied at a macroscopic as well as a microscopic level.
    • It exists in various media.
    [1]: Suler, John, CyberPsychology & Behavior, 2004
  4. Definition of Hate Speech
    • Hate is subjective, temporal and cultural in nature.
    • The UN defines hate speech as “any kind of communication that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are.” [1]
    • Social media users need to be sensitised.
    Fig 1: Pyramid of Hate [2]
    [1]: UN hate [2]: Pyramid of Hate
  5. Internet’s policy w.r.t. curbing Hate
    Moderated: Twitter, Facebook, Instagram, YouTube
    Semi-Moderated: Reddit
    Unmoderated: Gab, 4chan, BitChute, Parler, StormFront
    • Anonymity has led to an increase in anti-social behaviour [1], hate speech being one form of it.
    • Such behaviour can be studied at a macroscopic as well as a microscopic level. [2]
    • It exists in various media.
    [1]: Suler, John, CyberPsychology & Behavior, 2004 [2]: Luke Munn, Humanities and Social Sciences Communications, Article 53
  6. Workflow for Analysing and Mitigating Hate Speech
    Fig 1: Various components of Hate Speech on Web [1]
    [1]: Tanmoy and Sarah, Nipping in the bud: detection, diffusion and mitigation of hate speech on social media, ACM SIGWEB Winter, Invited Publication
  7. Subjectivity in Annotating Hate • Rules of Grammar? ◦ IIIT-Delhi

    is an ____. • Verifiability? ◦ Is this statement factual? • Target and context? ◦ Anyone can offend anyone, but can every offense be hateful? • Reclaimed slurs, who is talking to whom? ◦ Nigger vs Nigga • What are we trying to capture [2] Fig 1: Different Annotation groups provide different perception of hatefulness [1] [1]: Jan Kocon et al., Info Processing & Management’21 [2]: Rottger et al., NAACL’22
  8. Literature Overview: Hate Dataset
    Dataset | Source & Language (Modality) | Year | Labels | Annotation
    Waseem & Hovy [1] | Twitter, English, Texts | 2016 | R, S, N | 16k, E, k = 0.84
    Davidson et al. [2] | Twitter, English, Texts | 2017 | H, O, N | 25k, C, k = 0.92
    Wulczyn et al. [3] | Wikipedia comments, English, Texts | 2017 | PA, N | 100k, C, k = 0.45
    Gibert et al. [5] | Stormfront, English, Texts | 2018 | H, N | 10k, k = 0.62
    Founta et al. [4] | Twitter, English, Texts | 2018 | H, A, SM, N | 70k, C, k = ?
    Albadi et al. [6] | Twitter, Arabic, Texts | 2018 | H, N | 6k, C, k = 0.81
    Labels: R- Racism, S- Sexism, H- Hate, PA- Personal Attack, A- Abuse, SM- Spam, O- Offensive, L- Religion, N- Neither, I- Implicit, E- Explicit
    Annotation: E- Internal Experts, C- Crowd Sourced
    [1]: Waseem & Hovy, NAACL’16 [2]: Davidson et al., WebSci’17 [3]: Wulczyn et al., WWW’17 [4]: Founta et al., WebSci’18 [5]: Gibert et al., ALW2’18 [6]: Albadi et al., ANLP’20
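The k values in the Annotation column read as inter-annotator agreement scores. The slide does not say which statistic each dataset reports, so the following is only a minimal sketch assuming Cohen's kappa for two annotators. The function name `cohens_kappa` and the toy label lists are illustrative and are not taken from any of the cited datasets.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items that received the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy annotations using the slide's label codes (H = hate, O = offensive, N = neither).
annotator_1 = ["H", "O", "N", "N", "H", "O", "N", "N"]
annotator_2 = ["H", "O", "N", "O", "H", "N", "N", "N"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Kappa discounts the agreement expected by chance, which is why a dataset with a dominant "Neither" class can show high raw agreement but a modest k.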
  9. Literature Overview: Hate Dataset
    Dataset | Source & Language (Modality) | Year | Labels | Annotation
    Mathur et al. [1] | Twitter, Hinglish, Texts | 2018 | H, O, N | 3k, E, k = 0.83
    Rizwan et al. [3] | Twitter, Urdu (Roman Urdu), Texts | 2020 | A, S, L, P, N | 10k, E, k = ?
    Gomez et al. [4] | Twitter, English, Memes | 2020 | H, N | 150k, C, k = ?
    ElSherief et al. [11] | Twitter, English, Texts | 2021 | I, E, N |
    • HASOC [5], Jigsaw Kaggle [6], SemEval [7], FB Hate-Meme Challenge [8]
    • WOAH [9], CONSTRAINT [10]
    Labels: R- Racism, S- Sexism, H- Hate, PA- Personal Attack, A- Abuse, SM- Spam, O- Offensive, L- Religion, N- Neither, I- Implicit, E- Explicit
    Annotation: E- Internal Experts, C- Crowd Sourced
    [1]: Mathur et al., AAAI’20 [3]: Rizwan et al., EMNLP’20 [4]: Gomez et al., WACV’20 [5]: HASOC [6]: Jigsaw Kaggle [7]: SemEval [8]: FB Hate-Meme [9]: WOAH [10]: CONSTRAINT [11]: ElSherief et al., EMNLP’21
  10. Literature Overview: Hate Detection
    Diagram nodes: Detection, Text, Multimodal, Video, Image, Contextual Embeddings, Tf-idf vectorizer, Non-contextual Embeddings, Network Connections, Historical Information, News Trends
  11. Literature Overview: Hate Detection
    • N-gram Tf-idf + LR/SVM [1,2]
    • GloVe + CNN, RNN [3]
    • Transformer based
      ◦ Zero, Few Shot [4]
      ◦ Fine-tuning [5]
      ◦ HateBERT [6]
    • Generation for classification [7,11]
    • Multimodality
      ◦ Images [8]
      ◦ Historical Context [9]
      ◦ Network and Neighbours [10]
      ◦ News, Trends, Prompts [11]
    [1]: Waseem & Hovy, NAACL’16 [2]: Davidson et al., WebSci’17 [3]: Badjatiya et al., WWW’17 [4]: Pelican et al., EACL Hackashop’21 [5]: Tomer et al., EMNLP’21 [6]: Caselli et al., WOAH’21 [7]: Ke-Li et al. [8]: Kiela et al., NeurIPS’20 [9]: Qian et al., NAACL’19 [10]: Mehdi et al., IJCA’20, Vol 13 [11]: Badr et al.,
  12. Base Features
    • Dictionary-based detection of explicitly hateful content. [1]
    • N-grams at character and word level. [2,3]
    • Readability score of a sentence. [3]
    Fig 1: Hate Lexicon based on POS [4]
    Fig 2: Hate character n-gram [2]
    [1]: Vargas et al., RANLP’21 [2]: Waseem & Hovy, NAACL’16 [3]: Davidson et al., WebSci’17 [4]: Wiegand et al., NAACL’18
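A rough sketch of two of the base features listed on this slide, dictionary lookup and character n-gram TF-IDF, using only the Python standard library. The helper names, the n-gram range of 2 to 3, and the smoothed idf formula are assumptions chosen for illustration, and the lexicon entry is a harmless placeholder. None of this is claimed to match the exact pipelines of the cited papers.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """All character n-grams of length n_min..n_max from the lowercased text."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n_min=2, n_max=3):
    """Map each document to a sparse {ngram: tf-idf weight} dictionary."""
    grams = [Counter(char_ngrams(doc, n_min, n_max)) for doc in docs]
    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter(g for counts in grams for g in counts)
    n_docs = len(docs)
    vectors = []
    for counts in grams:
        total = sum(counts.values())
        vectors.append({
            # Smoothed idf stays positive even for n-grams found in every document.
            g: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1)
            for g, c in counts.items()
        })
    return vectors


def lexicon_hits(text, lexicon):
    """Whitespace tokens of the text that appear in a hate lexicon."""
    return [tok for tok in text.lower().split() if tok in lexicon]


vectors = tfidf_vectors(["abc", "abd"])
hits = lexicon_hits("You are a placeholder_slur", {"placeholder_slur"})
```

The resulting sparse vectors would typically be passed to a linear classifier such as LR or SVM; that step is left out to keep the sketch dependency-free.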
  13. Linguistic Features
    • Davidson et al.’s hate speech detection used the following features [1]:
      ◦ Length of comments.
      ◦ # Punctuations, Capitalization.
      ◦ URLs, Hashtags, emojis etc.
      ◦ Sentiment score.
      ◦ Readability score.
    Syntactic Features
    • POS tags [1,2]
    Fig 1: Adding various POS tags to hateful tweet [2]
    Fig 2: Infusing POS for hate detection via CNN [2]
    [1]: Davidson et al., WebSci’17 [2]: Giuseppe & Roberto, HaSpeeDe, EVALITA’20
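As a small illustration of the surface-level linguistic features named on this slide, the sketch below counts length, punctuation, capitalization, URLs and hashtags with the standard library. The function name, the regular expressions and the example text are assumptions. Sentiment and readability scores are omitted because they need external resources.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")


def surface_features(text):
    """Count-based features of a post; no external models required."""
    letters = [ch for ch in text if ch.isalpha()]
    return {
        "n_chars": len(text),
        "n_words": len(text.split()),
        # Note: punctuation inside URLs and hashtags is counted as well.
        "n_punct": sum(ch in string.punctuation for ch in text),
        # Share of alphabetic characters that are upper case.
        "upper_ratio": (sum(ch.isupper() for ch in letters) / len(letters)
                        if letters else 0.0),
        "n_urls": len(URL_RE.findall(text)),
        "n_hashtags": len(HASHTAG_RE.findall(text)),
    }


features = surface_features("STOP this NOW!!! #enough https://example.com")
```

Such counts are usually concatenated with n-gram or embedding features rather than used alone.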
  14. Non-Contextual Embedding
    • Using the W&H dataset, apply various vanilla DL methods, beyond char n-grams [1]
    • Observation: Ensemble combination with trainable embeddings works better
    Fig 1: Word affinity for offensive terms increasing post training [1]
    Fig 2: Performance comparison on W&H dataset [1]
    [1]: Badjatiya et al., WWW’17
  15. HateBERT: Retraining BERT for Abusive Language in English
    Training a Large LM (LLM) from Scratch: Pre-training
    • Corpus: large-scale unlabeled corpus from various sources across the web
    • Initialise LM with random weights
    • Mask prediction (unsupervised generalised training)
    • Result: Large Scale General LM (saved LM with trained weights)
    Training a Large LM (LLM) from Saved checkpoint: Continued Pre-training
    • Corpus: medium-sized unlabeled corpus from a specific domain
    • Load LM with trained weights
    • Mask prediction (unsupervised domain-specific training)
    • Result: Large Scale Domain LM (saved LM with updated weights)
  16. HateBERT: Retraining BERT for Abusive Language in English
    Training a Large LM (LLM) for a task: Fine-tuning
    • Corpus: labelled corpus
    • Add a classification head on top of the LLM
    • Initialise classifier layer with random weights and load the LLM with trained weights
    • Result: Large Scale Fine-tuned LM (saved LM with updated weights + classifier with trained weights)
    Using a Large LM (LLM) for a task without training: Zero-shot
    • Model: Large Scale LM
    • Corpus: unlabelled domain-specific corpus
    • Add a softmax on top of the LLM
    • Initialise classifier layer with frozen LM weights
    • No update in any weights
  17. HateBERT: Retraining BERT for Abusive Language in English
    • Obtain unlabelled samples of potentially harmful content from banned or controversial Reddit communities. (Curated 1M+ messages)
    • Re-trained BERT base for the Masked Language Modeling task [1]
    Fig 1: Performance comparison of BERT vs HateBERT [1]
    [1]: Caselli et al., WOAH’21
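To make the "mask prediction" objective concrete, here is a minimal sketch of how training pairs for masked language modelling can be built. It follows the scheme published for the original BERT (select about 15% of tokens; replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged). The slide does not state HateBERT's exact settings, so the probabilities, the function name and the toy vocabulary are assumptions.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for masked language modelling.

    labels[i] holds the original token at selected positions and None
    elsewhere; None marks positions the loss would ignore.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: mask token
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)                # 10%: keep unchanged
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


tokens = "this community was banned for abusive posts".split()
vocab = ["the", "a", "post", "user", "forum"]  # toy vocabulary
inputs, labels = mask_tokens(tokens, vocab, seed=0)
```

Continued pre-training, as described on the slides, would run this same objective on the domain corpus starting from saved BERT weights instead of random ones.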
  18. Transformer Inspired Prompting
    Fig 1: Obtaining Sentiment Label from LM without training on sentiment dataset [1]
    Fig 2: Obtaining Machine Translation from LM without training on MT datasets [1]
    [1]: https://www.inovex.de/de/blog/prompt-engineering-guide/
  19. Transformer Inspired Prompting
    Zero-shot [1], One-shot [1], Few-shot [1]
    [1]: Ke-Li et al., arxiv [2]: https://openai.com/api/
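The difference between zero-shot, one-shot and few-shot prompting is only the number of labelled demonstrations placed before the query. The sketch below builds such prompts as plain strings. The template, the function name `build_prompt` and the example texts are assumptions and are not the prompts used in the cited work. Sending the prompt to a language model is left out.

```python
def build_prompt(task_description, query, examples=()):
    """Assemble a classification prompt.

    examples is a sequence of (text, label) pairs: empty for zero-shot,
    one pair for one-shot, several pairs for few-shot.
    """
    lines = [task_description, ""]
    for text, label in examples:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    # The prompt ends with an open label slot for the model to complete.
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)


task = "Classify the text as 'hateful' or 'not hateful'."
demos = [
    ("I hope your whole group disappears.", "hateful"),
    ("The coffee here is terrible.", "not hateful"),
]
zero_shot = build_prompt(task, "Nobody like you belongs here.")
one_shot = build_prompt(task, "Nobody like you belongs here.", demos[:1])
few_shot = build_prompt(task, "Nobody like you belongs here.", demos)
```

No model weights change in any of the three settings; only the context given to the model differs.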
  20. Non-Contextual + Metadata
    • Twitter Meta-data:
      ◦ # Followers
      ◦ # Followees
      ◦ # Tweets/Retweets/Likes
      ◦ Account Age etc.
    • Embedding: GloVe based
    Fig 1: Concatenating textual and metadata information from tweets for hate detection [1]
    [1]: Founta et al.
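One simple way to picture the concatenation of textual and metadata information is shown below: a tweet's tokens are averaged into one embedding vector and account-level counts are appended to it. This is a simplified sketch under stated assumptions, not the architecture of the cited paper. The mean pooling, the `log1p` scaling, the function names and the two-dimensional toy embeddings (standing in for pretrained GloVe vectors) are all illustrative.

```python
import math


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of known tokens; zeros if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def metadata_features(followers, followees, n_tweets, account_age_days):
    """log1p compresses heavy-tailed counts into a comparable range."""
    return [math.log1p(followers), math.log1p(followees),
            math.log1p(n_tweets), math.log1p(account_age_days)]


def combined_features(tokens, embeddings, dim, **meta):
    """Concatenate the text vector with the metadata vector."""
    return mean_embedding(tokens, embeddings, dim) + metadata_features(**meta)


toy_embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0]}  # stand-in for GloVe
vector = combined_features(
    ["a", "b", "unknown"], toy_embeddings, dim=2,
    followers=0, followees=0, n_tweets=0, account_age_days=0,
)
```

The combined vector can then be fed to any downstream classifier.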
  21. Non-Contextual + Network Feature
    • Infusing network information with textual features [1].
    • Node2vec is employed to map graphs to the embedding space [2].
    Fig 1: Infusing textual and network information for hate detection [1]
    [1]: Chowdhury et al., SRW-ACL’21 [2]: Grover et al., KDD’16
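Node2vec maps graph nodes to vectors by sampling biased random walks and then training a skip-gram model on them. The sketch below covers only the walk-sampling step for an unweighted graph, with the return parameter p and in-out parameter q from the node2vec paper. The adjacency-dictionary format, the function name and the toy graph are assumptions, and the skip-gram training is omitted.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, seed=None):
    """Sample one second-order biased walk.

    graph maps each node to the set of its neighbours (unweighted).
    """
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(graph[current])
        if not neighbours:
            break  # dead end
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is uniform
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
walk = node2vec_walk(toy_graph, "a", length=10, p=0.5, q=2.0, seed=1)
```

A low p keeps walks local and a low q pushes them outward, which changes whether the learned embeddings reflect community membership or structural role.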
  22. Non-Contextual + External Knowledge
    Fig 1: Does KG infusion improve hate detection?
    [1]: Soha et al., COD-COM’21
  23. Detecting Offensive Memes
    • Proposed a collection of hateful/offensive images mostly shared as memes on social media platforms (MMHS150K) [1]
    Fig 1: Combining Textual (TT), OCR (IT) and Image (I) features for determining if a meme is hateful in nature [1]
    [1]: Gomez et al., WACV’20
  24. Non-English Hate Speech Detection
    • Codemixed
    • Romanised
    • Native script
    Example: Roman Hindi Urdu [1]
    Fig 1: Example of offensive romanised Urdu text from Pakistani Twitter [1]
    Fig 2: CNN based proposed module for capturing hateful Roman Urdu tweets [1]
    [1]: Rizwan et al., EMNLP’20
  25. Implicit Hate Speech Detection
    • Implicit hate has no cuss words.
    • Lexically resembles non-hate.
    • Contains innuendos, irony, sarcasm.
    Fig 1: Example of explicit and implicit hate on Twitter [1]
    Fig 2: Examples of annotating negatively biased implied statements [2]
    [1]: ElSherief et al., EMNLP’21 [2]: Sap et al., ACL’20
  26. Hate Intensity Prediction
    • Intensity/Severity of hate speech captures the explicitness of hate speech.
    • High-intensity hate is more likely to contain offensive lexicon, offensive spans, direct attacks and mentions of the target entity.
    Examples:
      ◦ Consuming Coffee is bad, I hate it! (the world can live with this opinion)
      ◦ Let's bomb every coffee shop and kill all coffee makers (this is a threat)
    Fig 1: Pyramid of Hate [1]
    [1]: Pyramid of Hate
  27. Hate Normalization
    For a given hate sample $t$, our objective is to obtain its normalized (sensitised) form $t'$ such that the intensity of hatred $\phi_t$ is reduced while the meaning is still conveyed: $\phi_{t'} < \phi_t$. [1]
    Fig 1: Example of original high intensity vs normalised sentence [1]
    [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  28. Hate Normalization: NACL- Neural hAte speeCh normaLizer
    Fig 1: Flowchart of NACL [1]
    [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  29. Hate Rationale / Hate Explanation
    Fig 1: Annotating rationale terms that contribute to hatefulness [1]
    Fig 2: Incorporating various knowledge sources to generate an explanation for hatefulness and unmask the latent stereotypes [2]
    [1]: Mathew et al., AAAI’21 [2]: Rohit & Diyi, NAACL’22
  30. Key Takeaways
    • Datasets used for hate speech:
      ◦ There is a diversity of data labels, with limited overlap/uniformity.
      ◦ Skewed in favour of English textual content.
    • Methods used for hate speech detection:
      ◦ A vast array of techniques, from classical ML to prompt-based zero-shot learning, has been tested.
      ◦ Out-of-domain performance is abysmal in most cases.
      ◦ Need to move towards lifelong learning and dynamic catchphrase detection methods.
      ◦ Need to study the link between online hate and offline hate instances.
    • Enhancing understanding of hateful content via:
      ◦ Intensity and span detection.
      ◦ Normalising hate to proactively counter it.
      ◦ Generating explanations for hateful connotations.
  31. Future Scope
    • Better detection of implicit hate.
    • Better detection of multimodal hate.
    • How can psychological traits help predict hate speech?
    • Language-agnostic and topic-agnostic hate speech detection.
    • Explainable hate speech classifiers.
    • Multilingual and cross-lingual hate speech detection.
    • Detection and removal of bias in hate speech detection [1]
    [1]: Handling Bias in Toxic Speech Detection: A Survey: https://arxiv.org/abs/2202.00126
  32. Our Contributions
    • Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter (ICDE 2021)
    • Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization (KDD 2022)
    • DiVA: A Scalable, Interactive and Customizable Visual Analytics Platform for Information Diffusion on Large Networks (ACM TKDD 2022)
    • Nipping in the bud: detection, diffusion and mitigation of hate speech on social media (ACM SIGWEB Newsletter, Invited Publication)
    • MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets (EMNLP 2021)
    • Domain-aware Self-supervised Pre-training for Label-Efficient Meme Analysis (AACL-IJCNLP 2022)
    • Detecting and Understanding Harmful Memes: A Survey (IJCAI 2022) (Survey Track)
    • DISARM: Detecting the Victims Targeted by Harmful Memes (NAACL 2022)
    • What do you MEME? Generating Explanations for Visual Semantic Role Labelling in Memes (AAAI 2023)
    • Survey: Handling Bias in Toxic Speech Detection: A Survey
    • Tutorials Conducted: Combating Online Hate Speech (WSDM 2021, ECML PKDD 2021)