CSE_KetchupTalk_Sarah.pdf

_themessier
January 27, 2023

Transcript

  1. Disclaimer: Subsequent content contains extreme language (verbatim from social media), which does not reflect the opinions of myself or my collaborators. Reader discretion is advised.
  2. Hate Speech Intensity
    • Hate is subjective, temporal, and cultural in nature.
    • The intensity (severity) of hate speech captures how explicit the hate is.
    • High-intensity hate is more likely to contain offensive lexicon, offensive spans, direct attacks, and mentions of the target entity.
    Examples:
    • "Consuming coffee is bad, I hate it!" (the world can live with this opinion)
    • "Let us bomb every coffee shop and kill all coffee makers" (this is a threat)
    Figure: Pyramid of Hate [1]
    [1]: Pyramid of Hate
  3. Internet’s policy w.r.t. curbing hate
    • Moderated: Twitter, Facebook, Instagram, YouTube
    • Semi-moderated: Reddit
    • Unmoderated: Gab, 4chan, BitChute, Parler, Stormfront
    Anonymity has led to an increase in anti-social behaviour [1], hate speech being one form of it [2].
    [1]: J Suler [2]: Luke Munn
  4. How to Combat Hate Speech
    Strategies for countering hate speech [3]:
    • Reactive countering: intervene after a hateful post has been made, to prevent it from spreading further.
    • Proactive countering: intervene before the post goes public.
    [3]: Mudit et al.
  5. Reactive Methods of Intervention [4]
    • Warn, report, or flag the user who posted.
    • Generate a text that counters the existing hate.
    • Ask influential members of the community to help spread the counter-narrative.
    Hate Interventions on Web
    [4]: Manoel Horta Ribeiro et al.
  6. Data Collection Strategy for Counter Narration
    • CRAWL: real-world samples of both hate and counter-hate.
    • CROWD: real-world samples of hate and synthetic samples of counter-hate.
    • NICHE: synthetic samples of both hate and counter-hate.
    Figure: Characteristics of counter-hate datasets [6]
    Figure: Countering hate speech on Twitter [5]
    [5]: Dominik Hangartner et al. [6]: Serra Sinem Tekiroglu et al.
  7. Literature Overview: Intervention during Tweet creation
    • 200k users identified in the study; 50% randomly assigned to the control group.
    • H1: Are prompted users less likely to post the current offensive content?
    • H2: Are prompted users less likely to post offensive content in the future?
    Figure: User behaviour statistics as part of the intervention study [7]
    Figure: Twitter reply test for offensive replies [7]
    [7]: Katsaros et al., ICWSM ‘22
  8. Problem Statement
    For a given hate sample $t$, our objective is to obtain its normalized (sensitised) form $t'$ such that the intensity of hatred $\phi_t$ is reduced while the meaning is still conveyed [8]:
    $\phi_{t'} < \phi_t$
    Figure: Example of an original high-intensity sentence vs. its normalised form [8]
    [8]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
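The objective on this slide can be checked mechanically once some intensity scorer is available. The sketch below is only an illustration: `toy_phi` and its word weights are hypothetical stand-ins for a real intensity model, and the check covers only the intensity inequality, not whether the meaning is preserved.

```python
from typing import Callable

# Hypothetical word weights, chosen only for illustration; a real system
# would use a trained intensity model instead of a lookup table.
TOY_WEIGHTS = {"hate": 2.0, "bomb": 5.0, "kill": 5.0}


def toy_phi(text: str) -> float:
    """Toy intensity score: a base of 1.0 plus the weight of each listed word."""
    cleaned = text.lower().replace(",", " ").replace("!", " ")
    return 1.0 + sum(TOY_WEIGHTS.get(tok, 0.0) for tok in cleaned.split())


def reduces_intensity(t: str, t_norm: str, phi: Callable[[str], float]) -> bool:
    """Return True when the normalized form scores strictly lower than the original."""
    return phi(t_norm) < phi(t)


original = "Let us bomb every coffee shop and kill all coffee makers"
normalized = "I want every coffee shop to close"  # hand-written illustrative rewrite
print(reduces_intensity(original, normalized, toy_phi))
```

Passing the scorer in as a callable keeps the check independent of how intensity is actually estimated.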
  9. NACL Dataset
    • Hateful samples collected from existing hate speech datasets.
    • Manually annotated for hate intensity and hateful spans.
    • Hate intensity is marked on a scale of 1-10.
    • Manual generation of the normalised counterpart and its intensity (k = 0.88).
    Figure: Original and normalised intensity distribution [8]
    Table: Dataset stats [8]
    [8]: Masud et al., KDD 2022
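One way to picture a single annotated record is as a small data structure. This is a sketch under assumptions: the class name, field names, and the character-offset span encoding are hypothetical and are not taken from the released dataset. Applying the same 1-10 range to the normalised intensity is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AnnotatedSample:
    """Hypothetical record layout for one annotated sample."""
    text: str
    intensity: int                                    # annotated on a 1-10 scale
    hate_spans: List[Tuple[int, int]] = field(default_factory=list)  # [start, end) offsets
    normalized_text: Optional[str] = None
    normalized_intensity: Optional[int] = None        # assumed to share the 1-10 scale

    def __post_init__(self) -> None:
        scores = [self.intensity]
        if self.normalized_intensity is not None:
            scores.append(self.normalized_intensity)
        for score in scores:
            if not 1 <= score <= 10:
                raise ValueError(f"intensity {score} is outside the 1-10 scale")
        for start, end in self.hate_spans:
            if not 0 <= start < end <= len(self.text):
                raise ValueError(f"span ({start}, {end}) does not fit the text")


sample = AnnotatedSample(
    text="Let us bomb every coffee shop",
    intensity=9,
    hate_spans=[(7, 11)],
    normalized_text="I want every coffee shop to close",
    normalized_intensity=2,
)
print([sample.text[s:e] for s, e in sample.hate_spans])
```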
  10. NACL Dataset: Annotation Guideline
    Span labelling [8]:
    • A sexist or racist slur, or an abusive term directly attacking a minority group/individual.
    • A phrase that advocates violent action or hate crime against a group/individual.
    • Negative stereotyping of a group/individual with unfounded claims or false criminal accusations.
    • Hashtag(s) supporting one or more of the points above.
    Intensity labelling [8]:
    • Score [8-10]: The sample promotes hate crime and calls for violence against the individual/group.
    • Score [6-7]: The sample is mainly composed of sexist/racist terms or portrays a sense of gender/racial superiority on the part of the person sharing the sample.
    • Score [4-5]: The sample mainly consists of offensive hashtags, or most hateful phrases are in the form of offensive hashtags.
    • Score [1-3]: The sample uses dark humor or implicit hateful terms.
    [8]: Masud et al., KDD 2022
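The intensity bands above amount to a lookup from a 1-10 score to a guideline description. A minimal sketch of that lookup follows; the band boundaries come from the slide, while the function name and the shortened descriptions are illustrative choices.

```python
# (low, high, short description) for each band; boundaries follow the slide,
# descriptions are abbreviated paraphrases.
INTENSITY_BANDS = [
    (8, 10, "promotes hate crime or calls for violence"),
    (6, 7, "mainly sexist/racist terms or a sense of superiority"),
    (4, 5, "mainly offensive hashtags"),
    (1, 3, "dark humor or implicit hateful terms"),
]


def band_for_score(score: int) -> str:
    """Return the guideline description for an integer intensity score."""
    for low, high, description in INTENSITY_BANDS:
        if low <= score <= high:
            return description
    raise ValueError(f"score {score} is outside the 1-10 scale")


print(band_for_score(9))
```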
  11. Motivation & Evidence
    • Reducing intensity is a stepping stone towards non-hate.
    • It does not force the author to change sentiment or opinion.
    • Evidently leads to less virality.
    Fig 1: Difference in predicted number of comments per set per iteration. [8]
    [8]: Masud et al., KDD 2022
  12. Proposed Method: NACL (Neural hAte speeCh normaLizer)
    Figure: Flowchart of NACL [8]
    [8]: Masud et al., KDD 2022
  13. BART: An overview
    BART is a Seq2Seq encoder-decoder, transformer-based architecture. [9]
    Figure: Encoder-decoder token masking scheme [1]
    [9]: Mike Lewis et al. [1]: Intro to BART
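To make the token masking idea concrete, the sketch below corrupts a token sequence and pairs it with the original as the reconstruction target. It is a simplified illustration only: the function name, the whitespace tokenization, and the fixed seed are assumptions, and BART's actual pre-training uses subword tokens and further noising schemes described in [9].

```python
import random
from typing import List, Tuple

MASK = "<mask>"


def mask_tokens(tokens: List[str], mask_prob: float, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Replace each token with MASK with probability mask_prob.

    Returns (corrupted, target): the corrupted sequence is what an encoder
    would read, and the untouched original is what a decoder would be
    trained to reconstruct.
    """
    rng = random.Random(seed)  # seeded for reproducibility of the illustration
    corrupted = [MASK if rng.random() < mask_prob else tok for tok in tokens]
    return corrupted, list(tokens)


tokens = "consuming coffee is bad".split()
corrupted, target = mask_tokens(tokens, mask_prob=0.3)
print(corrupted, target)
```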
  14. Hate Normalization Framework [8]
    Figure labels: Hate Intensity Reduction, Overall Loss, Reward
    [8]: Masud et al., KDD 2022
  15. Hate Intensity Reduction (HIR)
    Fig 1: Baseline comparison for NACL-HIR module. [8]
    [8]: Masud et al., KDD 2022
  16. Reference Literature & Resources
    1. The Online Disinhibition Effect
    2. Angry by Design: Toxic Communication and Technical Architectures
    3. Countering Online Hate Speech: An NLP Perspective
    4. Automated Content Moderation Increases Adherence to Community Guidelines
    5. Empathy-based Counterspeech Can Reduce Racist Hate Speech in a Social Media Field Experiment
    6. Generating Counter Narratives against Online Hate Speech: Data and Strategies
    7. Reconsidering Tweets: Intervening during Tweet Creation Decreases Offensive Content
    8. Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization
    9. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension