Revisiting Hate Speech Benchmarks KDD 2023

Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment
Atharva Kulkarni (CMU), Sarah Masud (IIIT-Delhi), Vikram Goyal (IIIT-Delhi), Tanmoy Chakraborty (IIT-Delhi)

Disclaimer: Subsequent content has extreme language (verbatim from social media),
which does not reflect the opinions of myself or my collaborators. They are employed solely for the purpose of explaining the work. Viewer’s discretion is advised.

Motivation & Data curation • Limitations of Existing Studies •
Curations of GOTHate

Definition of Hate Speech • Hate is subjective, temporal and
cultural in nature. • UN defines hate speech as “any kind of communication that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are.” [1] Fig 1: Pyramid of Hate [2] [1]: UN hate [2]: Pyramid of Hate

Limitations of Existing Studies • A myopic approach for hate
speech datasets using hate lexicons/slur terms. • Limited Study in English-Hindi code-mixed (Hinglish) context. • Limited context means systems default to non-hate. Motivation • Can we curate a large scale Indic Dataset? • Can we model contextual information into detection of hate?

GOTHate: Curation of Primary Data Geo-political topical Hate Speech Dataset
• Neutrally seeded from socio-political topics in India and USA and UK. • Source: Twitter (from Jan 2020-Jan 2021) Fig 1: Dataset Sample of GOTHate [1] [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023 Fig 2: Dataset Sample of GOTHate [1]

GOTHate: Curation of Auxiliary Data • From 50k root tweets
we have 25k unique root users. • For each root tweet we collect the list of first 100 retweeters. • For each root user we collect the +/-25 tweets from the time of posting the tweet. Thus timeline is not only user but also tweet specific. • For each root user we collect their 1-hop ego-network (followers and followee).

Data Annotation & Analysis • Annotation Guideline Overview • Annotation
Phases • Yet another Hate Speech dataset?

Annotation Guideline: Overview • We start with Hate, Offensive and
Neutral/Normal. • Observe posts that are not flagged by the system but have an context to offend (not using explicit slurs). • Introduce the class of Provocation. • Maintain a precedence to class H>O>P>N. Fig 1: Overview of Annotation Guideline [1] [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023

Annotation Phases: Overview Fig 1: 2-phased Annotation Mode [1] •
Phase I: IAA = 0.80 (3 experts, F:3). • Phase II IAA = 0.70 (10 Crowdsource workers, M:F 6:10) [2]. [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023 [2]: Xsaras

Fig 1: Dataset Stats [1] • 50k tweets. • 3k
hateful and 10k provocative. [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023 GOTHate Annotations

Does vocabulary of classes differ for differently curated datasets? Fig
1: Intra-class JS Distance of different HS datasets [1] Observations: O1: JS distance for N-P=0.063 is lowest. O2: In proposed dataset the hate class is closer to neutral than with offense class. O3: All HS datasets have lower divergence. Reasons: O1: Both cause and effect of higher disagreement in provocation class. O2: Due to neutral hate seeding and lack of lexicon for curation. O3: Curation from real-world interactions leads to fuzzy classification of hate. Experimental Setup: • Laplacian n-gram for each class in each dataset. • Pick the top-k and bottom-k terms as representative of each class. • Perform JS comparison (higher the value more similar the two distributions). [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023

Contextual Modeling • Within-dataset signals • Auxiliary dataset signals •
Attentive modeling of signals

\$MENTION\$ \$MENTION\$ \$MENTION\$ AND Remember president loco SAID MEXICO WILL
PAY FUC**kfu ck trump f*** gop f*** republicans Make go fund me FOR HEALTH CARE, COLLEGE EDUCATION , CLIMATE CHANGE, SOMETHING GOOD AND POSITIVE !! Not for a fucking wall go fund the wall the resistance resist \$URL\$" $MENTION\$ DERANGED DELUSIONAL DUMB DICTATOR DONALD IS MENTALLY UNSTABLE! I WILL NEVER VOTE REPUBLICAN AGAIN IF THEY DON'T STAND UP TO THIS TYRANT LIVING IN THE WHITE HOUSE! fk republicans worst dictator ever unstable dictator \$URL\$" $MENTION\$ COULD WALK ON WATER AND THE never trump WILL CRAP ON EVERYTHING HE DOES. SHAME IN THEM. UNFOLLOW ALL OF THEM PLEASE!" Offensive train sample Labelled Corpus E1: Offensive train sample exemplar (can be same or different author) E2: Offensive train sample exemplar (can be same or different author) In-Dataset Signal: Exemplars Module

"look at what Hindus living in mixed-population localities are facing,
what Dhruv Tyagi had to face for merely asking his Muslim neighbors not to sexually harass his daughter...and even then, if u ask why people don’t rent to Muslims, get ur head examined $MENTION\$ $MENTION\$ naah...Islamists will never accept Muslim refugees, they will tell the Muslims to create havoc in their home countries and do whatever it takes to convert Dar-ul-Harb into Dar-ul Islam..something we should seriously consider doing with Pak Hindus too One of the tweet by author before Example 2 One of the tweet by author after Example 2 Accusatory tone timestamp t-1 Hateful tweet timestamp t Accusatory and instigating timestamp t+1 Auxiliary Dataset Signal: Timeline Module

Fig 1: Motivation for Auxiliary Data Signals[1] Contextual Signal Infusion
for Hate Detection [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023

Proposed model: HEN-mBERT [1]: Kulkarni et al., Revisiting Hate Speech
Benchmarks: From Data Curation to System Deployment, KDD 2023 HEN-mBERT: History, Exemplar and Network infused mBERT model. Fig 1: Proposed model HEN-mBERT [1]

Fig 1: Baseline and Ablation [1] [1]: Kulkarni et al.,
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023 Experiments & Ablation: HEN-mBERT

Conclusion • Key Contributions and Takeaways

Contributions & Takeaways Contributions • Curate a auxiliary signal rich
hate speech dataset with Indic topics. • Perform extensive experiments on analysing the quality of dataset and its similarity to existing datasets. • Propose a signal rich attention module to fine-tune mBERT for detection of hateful content. Takeaways: • Tradeoff in time/effort vs quality of annotation from expert to crowdsourced. • Contextual signals infused in an attentive manner reduce the signal/noise and improves detection of hate.

THANK YOU Sarah, 4th year PhD@ LCS2, IIIT-Delhi sarahmasud02@gmail.com _themessier
on Twitter Paper Link

Revisiting Hate Speech Benchmarks KDD 2023

Revisiting Hate Speech Benchmarks KDD 2023

_themessier

More Decks by _themessier

Other Decks in Research

Featured

Transcript

Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment

Disclaimer: Subsequent content has extreme language (verbatim from social media),

Motivation & Data curation • Limitations of Existing Studies •

Definition of Hate Speech • Hate is subjective, temporal and

Limitations of Existing Studies • A myopic approach for hate

GOTHate: Curation of Primary Data Geo-political topical Hate Speech Dataset

GOTHate: Curation of Auxiliary Data • From 50k root tweets

Data Annotation & Analysis • Annotation Guideline Overview • Annotation

Annotation Guideline: Overview • We start with Hate, Offensive and

Annotation Phases: Overview Fig 1: 2-phased Annotation Mode [1] •

Fig 1: Dataset Stats [1] • 50k tweets. • 3k

Does vocabulary of classes differ for differently curated datasets? Fig

Contextual Modeling • Within-dataset signals • Auxiliary dataset signals •

\$MENTION\$ \$MENTION\$ \$MENTION\$ AND Remember president loco SAID MEXICO WILL

"look at what Hindus living in mixed-population localities are facing,

Fig 1: Motivation for Auxiliary Data Signals[1] Contextual Signal Infusion

Proposed model: HEN-mBERT [1]: Kulkarni et al., Revisiting Hate Speech

Fig 1: Baseline and Ablation [1] [1]: Kulkarni et al.,

Conclusion • Key Contributions and Takeaways

Contributions & Takeaways Contributions • Curate a auxiliary signal rich

THANK YOU Sarah, 4th year PhD@ LCS2, IIIT-Delhi sarahmasud02@gmail.com _themessier