CSE_KetchupTalk_Sarah.pdf

Proactive & Reactive Flagging of Hateful Content
CSE Ketchup Talk, Jan 2023
Sarah Masud

Disclaimer: Subsequent content has extreme language (verbatim from social media),
which does not reflect the opinions of myself or my collaborators. Reader’s discretion is advised.

Hate Speech & Hate Intensity

Hate Speech Intensity
• Hate is subjective, temporal and cultural in nature.
• Intensity/severity of hate speech captures the explicitness of hate speech.
• High-intensity hate is more likely to contain offensive lexicon, offensive spans, direct attacks and mentions of the target entity.
Examples:
• "Consuming coffee is bad, I hate it!" (the world can live with this opinion)
• "Let us bomb every coffee shop and kill all coffee makers" (this is a threat)
Pyramid of Hate [1]
[1]: Pyramid of Hate

Reactive Methods of Intervention

Internet's policy w.r.t. curbing Hate
Moderated: • Twitter • Facebook • Instagram • YouTube
Semi-moderated: • Reddit
Unmoderated: • Gab • 4chan • BitChute • Parler • StormFront
Anonymity has led to an increase in anti-social behaviour [1], hate speech being one form of it [2].
[1]: J Suler [2]: Luke Munn

How to Combat Hate Speech
• Reactive countering: intervene after a hateful post has been made, to prevent it from spreading further.
• Proactive countering: intervene before the post goes public.
Strategies for countering hate speech [3]
[3]: Mudit et al.

Reactive Methods of Intervention
• Warn/report/flag the user who has posted.
• Generate a text that counters the existing hate.
• Ask influential members of the community to help spread the counter narrative.
Hate Interventions on Web
[4]: Manoel Horta Ribeiro et al.

Data Collection Strategy for Counter Narration
• CRAWL: real-world samples of both hate and counter-hate.
• CROWD: real-world samples of hate and synthetic samples of counter-hate.
• NICHE: synthetic samples of both hate and counter-hate.
Characteristics of counter hate dataset [6]
Countering hate speech on Twitter [5]
[5]: Dominik Hangartner et al. [6]: Serra Sinem Tekiroglu et al.

Proactive Methods of Intervention

Literature Overview: Intervention during Tweet creation
• 200k users identified in the study; 50% randomly assigned to the control group.
• H1: Are prompted users less likely to post the current offensive content?
• H2: Are prompted users less likely to post offensive content in the future?
User behaviour statistics as a part of intervention study [7]
Twitter reply test for offensive replies [7]
[7]: Katsaros et al., ICWSM '22
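The 50/50 split described above can be sketched as a simple randomized assignment. This is only an illustration: the slides do not say how the study actually assigned users, so the function name `assign_groups`, the fixed seed and the integer user IDs are all assumptions.

```python
import random


def assign_groups(user_ids, seed=0):
    """Randomly split users into equal-sized control and treatment groups.

    Hypothetical helper: the real study's assignment procedure is not
    described in the slides.
    """
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    control = set(shuffled[:half])     # these users are never prompted
    treatment = set(shuffled[half:])   # these users are prompted before posting
    return control, treatment


# 200k placeholder user IDs, matching the cohort size quoted on the slide.
control, treatment = assign_groups(range(200_000))
```

Comparing the rate of offensive posts between the two groups is then what H1 and H2 ask about.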

Problem Definition

Problem Statement
For a given hate sample $t$, our objective is to obtain its normalized (sensitised) form $t'$ such that the intensity of hatred $\phi_t$ is reduced while the meaning is still conveyed [1]:
$\phi_{t'} < \phi_t$
Example of original high intensity vs normalised sentence [8]
[8]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
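The intensity constraint in the problem statement can be written as a small check. This is a minimal sketch that assumes the two intensity scores are already available, for example from annotators; the function names are hypothetical and no intensity model is implemented here.

```python
def is_valid_normalization(phi_t: float, phi_t_prime: float) -> bool:
    """Return True when the normalized sample is strictly less intense.

    phi_t is the intensity of the original sample t and phi_t_prime the
    intensity of its normalized form t'. Both scores are assumed inputs.
    """
    return phi_t_prime < phi_t


def intensity_reduction(phi_t: float, phi_t_prime: float) -> float:
    """Size of the drop in intensity (positive when normalization helped)."""
    return phi_t - phi_t_prime


# Made-up scores for illustration only.
ok = is_valid_normalization(phi_t=8, phi_t_prime=4)
drop = intensity_reduction(phi_t=8, phi_t_prime=4)
```

The check says nothing about whether the meaning is preserved, which the problem statement also requires and which needs a separate judgment.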

Data Set

NACL Dataset
• Hateful samples collected from existing hate speech datasets.
• Manually annotated for hate intensity and hateful spans.
• Hate intensity is marked on a scale of 1-10.
• Manual generation of the normalised counterpart and its intensity. (k = 0.88)
Original and Normalised Intensity Distribution [8]
Dataset Stats [8]
[8]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
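One way to picture a single annotated record is the sketch below. The class name, field names and the character-offset representation of spans are assumptions made for illustration, not the dataset's actual schema, and the example values (including the normalized text and both scores) are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class NormalizationSample:
    """Hypothetical record layout for one annotated sample."""
    text: str                                  # original hateful sample
    intensity: int                             # annotated on the 1-10 scale
    hateful_spans: List[Tuple[int, int]] = field(default_factory=list)
    normalized_text: Optional[str] = None      # manually written counterpart
    normalized_intensity: Optional[int] = None

    def __post_init__(self):
        if not 1 <= self.intensity <= 10:
            raise ValueError("intensity must be on the 1-10 scale")
        for start, end in self.hateful_spans:
            if not 0 <= start < end <= len(self.text):
                raise ValueError("span falls outside the text")


def find_span(text: str, phrase: str) -> Tuple[int, int]:
    """Character offsets (start, end) of the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase))


text = "Let us bomb every coffee shop and kill all coffee makers"
sample = NormalizationSample(
    text=text,
    intensity=9,                               # made-up score
    hateful_spans=[
        find_span(text, "bomb every coffee shop"),
        find_span(text, "kill all coffee makers"),
    ],
    normalized_text="Let us avoid every coffee shop",  # made-up counterpart
    normalized_intensity=2,                    # made-up score
)
```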

NACL Dataset: Annotation Guideline
Span Labelling [1]
• A sexist or racist slur term, or an abusive term directly attacking a minority group/individual.
• A phrase that advocates violent action or hate crime against a group/individual.
• Negatively stereotyping a group/individual with unfounded claims or false criminal accusations.
• Hashtag(s) supporting one or more of the points mentioned earlier.
Intensity Labelling [1]
• Score [8-10]: The sample promotes hate crime and calls for violence against the individual/group.
• Score [6-7]: The sample is mainly composed of sexist/racist terms or portrays a sense of gender/racial superiority on the part of the person sharing the sample.
• Score [4-5]: Mainly consists of offensive hashtags, or most hateful phrases are in the form of offensive hashtags.
• Score [1-3]: The sample uses dark humor or an implicit hateful term.
[8]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
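The intensity bands in the guideline above can be read as a lookup from score to description. The sketch below only encodes the four ranges listed on the slide; the function name `intensity_band` and the shortened descriptions are assumptions.

```python
def intensity_band(score: int) -> str:
    """Map a 1-10 intensity score to the guideline band it falls in."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score >= 8:
        return "promotes hate crime or calls for violence"
    if score >= 6:
        return "sexist/racist terms or a sense of superiority"
    if score >= 4:
        return "mainly offensive hashtags"
    return "dark humor or implicit hateful term"


# One label per score, 1 through 10.
bands = [intensity_band(s) for s in range(1, 11)]
```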

Motivation & Evidence
• Reducing intensity is a stepping stone towards non-hate.
• Does not force the user to change sentiment or opinion.
• Evidently leads to less virality.
Fig 1: Difference in predicted number of comments per set per iteration. [1]
[8]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

Proposed Model

Proposed Method: NACL (Neural hAte speeCh normaLizer)
Flowchart of NACL [1]
[8]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

BART: An overview
BART is a Seq2Seq encoder-decoder, transformer-based architecture. [9]
Encoder-Decoder token masking scheme [1]
[9]: Mike Lewis et al. [1]: Intro to BART
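The token masking idea can be illustrated with a toy corruption function. This is a sketch under simplifying assumptions: it splits on whitespace instead of using subword tokens, the `<mask>` string, the masking probability and the function name `mask_tokens` are chosen for illustration, and it covers only token masking, one of several noising schemes described for BART.

```python
import random


def mask_tokens(tokens, mask_prob=0.3, mask_token="<mask>", seed=0):
    """Replace each token with mask_token independently with mask_prob."""
    rng = random.Random(seed)  # fixed seed so the corruption is repeatable
    return [mask_token if rng.random() < mask_prob else tok for tok in tokens]


# Whitespace tokens of the benign example sentence from the talk.
source = "Consuming coffee is bad , I hate it !".split()
corrupted = mask_tokens(source)

# A denoising seq2seq model is trained to map `corrupted` back to `source`:
# the encoder reads the corrupted tokens and the decoder regenerates the
# original sequence.
training_pair = (corrupted, source)
```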

Hate Normalization Framework [8] (figure labels: Hate Intensity Reduction, Overall Loss, Reward)
[8]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

Hate Intensity Reduction (HIR)
Fig 1: Baseline comparison for NACL-HIR module. [1]
[8]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022

Reference Literature & Resources
1. The Online Disinhibition Effect
2. Angry by design: toxic communication and technical architectures
3. Countering Online Hate Speech: An NLP Perspective
4. Automated Content Moderation Increases Adherence to Community Guidelines
5. Empathy-based counterspeech can reduce racist hate speech in a social media field experiment
6. Generating Counter Narratives against Online Hate Speech: Data and Strategies
7. Reconsidering Tweets: Intervening during Tweet Creation Decreases Offensive Content
8. Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization
9. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Thank You

