Slide 1


Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment
Atharva Kulkarni (CMU), Sarah Masud (IIIT-Delhi), Vikram Goyal (IIIT-Delhi), Tanmoy Chakraborty (IIT-Delhi)

Slide 2


Disclaimer: The following content contains extreme language (reproduced verbatim from social media), which does not reflect the opinions of me or my collaborators; it is used solely to explain the work. Viewer discretion is advised.

Slide 3


Motivation & Data Curation
● Limitations of Existing Studies
● Curation of GOTHate

Slide 4


Definition of Hate Speech
● Hate is subjective, temporal, and cultural in nature.
● The UN defines hate speech as “any kind of communication that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are.” [1]
Fig 1: Pyramid of Hate [2]
[1]: UN hate
[2]: Pyramid of Hate

Slide 5


Limitations of Existing Studies
● Existing hate speech datasets take a myopic approach, curating data via hate lexicons/slur terms.
● Limited study of the English-Hindi code-mixed (Hinglish) context.
● With limited context, systems default to predicting non-hate.
Motivation
● Can we curate a large-scale Indic dataset?
● Can we incorporate contextual information into the detection of hate?

Slide 6


GOTHate: Curation of Primary Data
● Geo-political topical hate speech dataset.
● Neutrally seeded from socio-political topics in India, the USA, and the UK.
● Source: Twitter (Jan 2020 – Jan 2021).
Fig 1: Dataset sample of GOTHate [1]
Fig 2: Dataset sample of GOTHate [1]
[1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023

Slide 7


GOTHate: Curation of Auxiliary Data
● From 50k root tweets, we have 25k unique root users.
● For each root tweet, we collect the first 100 retweeters.
● For each root user, we collect the ±25 timeline tweets around the time the root tweet was posted; the timeline is thus not only user-specific but also tweet-specific.
● For each root user, we collect their 1-hop ego network (followers and followees).
A minimal collection sketch follows below.
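To make the collection procedure concrete, here is a minimal Python sketch of the auxiliary-data gathering loop. The `api.fetch_*` helpers and the tweet attributes are illustrative stand-ins for whatever Twitter API client the authors actually used; they are not real endpoints.

```python
from dataclasses import dataclass, field

@dataclass
class AuxiliarySignals:
    root_tweet_id: str
    retweeters: list = field(default_factory=list)  # first 100 retweeters
    timeline: list = field(default_factory=list)    # +/-25 tweets around posting time
    followers: list = field(default_factory=list)   # 1-hop ego network
    followees: list = field(default_factory=list)

def collect_auxiliary(root_tweet, api, window=25, max_retweeters=100):
    """Gather the three auxiliary signals for one root tweet (sketch)."""
    sig = AuxiliarySignals(root_tweet_id=root_tweet.id)
    # (1) First 100 retweeters of the root tweet.
    sig.retweeters = api.fetch_retweeters(root_tweet.id)[:max_retweeters]
    # (2) Tweet-specific timeline: +/-`window` tweets from the author's
    # chronologically sorted timeline, centred on the root tweet.
    timeline = api.fetch_timeline(root_tweet.author_id)
    idx = next(i for i, t in enumerate(timeline) if t.id == root_tweet.id)
    sig.timeline = timeline[max(0, idx - window): idx + window + 1]
    # (3) 1-hop ego network of the root user.
    sig.followers = api.fetch_followers(root_tweet.author_id)
    sig.followees = api.fetch_followees(root_tweet.author_id)
    return sig
```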

Slide 8


Data Annotation & Analysis
● Annotation Guideline Overview
● Annotation Phases
● Yet another Hate Speech dataset?

Slide 9


Annotation Guideline: Overview
● We start with the classes Hate, Offensive, and Neutral/Normal.
● We observe posts that are not flagged by the system yet carry context meant to offend (without using explicit slurs).
● We therefore introduce the class Provocation.
● We maintain a class precedence of H > O > P > N (see the sketch below).
Fig 1: Overview of annotation guideline [1]
[1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023
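A minimal sketch of how the H > O > P > N precedence can be applied mechanically when a post matches multiple classes; the function and flag encoding are illustrative, not part of the published guideline.

```python
# Class precedence from the guideline: Hate > Offensive > Provocative > Neutral.
PRECEDENCE = ["H", "O", "P", "N"]

def resolve_label(applicable: set) -> str:
    """Return the highest-precedence class among those an annotator
    found applicable; default to Neutral if none apply."""
    for label in PRECEDENCE:
        if label in applicable:
            return label
    return "N"

# e.g. a post judged both provocative and offensive resolves to Offensive:
assert resolve_label({"P", "O"}) == "O"
```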

Slide 10


Annotation Phases: Overview
Fig 1: Two-phased annotation mode [1]
● Phase I: IAA = 0.80 (3 experts, F: 3).
● Phase II: IAA = 0.70 (10 crowdsourced workers, M:F 6:10) [2].
[1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023
[2]: Xsaras
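The slide reports IAA scores without naming the statistic; assuming a Fleiss'-kappa-style agreement measure for the multi-annotator setting (my assumption, not the paper's stated choice), it could be computed as follows.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Each row is one tweet, each column one annotator's label
# (0=N, 1=P, 2=O, 3=H). Toy data for illustration only.
ratings = np.array([
    [3, 3, 3],
    [1, 1, 0],
    [0, 0, 0],
    [2, 2, 2],
    [1, 1, 1],
])

# aggregate_raters converts rater-wise labels into per-item category
# counts, the format fleiss_kappa expects.
table, _ = aggregate_raters(ratings)
print(f"Fleiss' kappa: {fleiss_kappa(table):.2f}")
```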

Slide 11


GOTHate Annotations
● 50k tweets.
● 3k hateful and 10k provocative.
Fig 1: Dataset stats [1]
[1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023

Slide 12


Does the vocabulary of classes differ across differently curated datasets?
Experimental Setup:
● Build a Laplace-smoothed n-gram distribution for each class in each dataset.
● Pick the top-k and bottom-k terms as representatives of each class.
● Compare class pairs via the JS distance (the lower the value, the more similar the two distributions); a minimal sketch follows below.
Observations:
● O1: The JS distance for N-P = 0.063 is the lowest.
● O2: In the proposed dataset, the hate class is closer to the neutral class than to the offensive class.
● O3: All HS datasets exhibit low inter-class divergence.
Reasons:
● O1: This is both a cause and an effect of the higher disagreement on the provocation class.
● O2: Due to the neutral seeding of hate and the lack of a lexicon during curation.
● O3: Curation from real-world interactions leads to fuzzy class boundaries for hate.
Fig 1: Intra-class JS distance of different HS datasets [1]
[1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023
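A minimal sketch of the JS comparison under the stated setup, using Laplace-smoothed unigram distributions over a shared vocabulary. The top-k/bottom-k term selection is omitted for brevity, and toy corpora stand in for the per-class tweet sets.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.feature_extraction.text import CountVectorizer

def class_distribution(texts, vectorizer, alpha=1.0):
    """Laplace-smoothed unigram distribution over a shared vocabulary."""
    counts = np.asarray(vectorizer.transform(texts).sum(axis=0)).ravel()
    smoothed = counts + alpha
    return smoothed / smoothed.sum()

# Toy corpora standing in for the per-class tweet sets.
neutral = ["the weather is nice today", "watching the game tonight"]
provocative = ["they are ruining this country", "wake up before it is too late"]

vec = CountVectorizer().fit(neutral + provocative)
p = class_distribution(neutral, vec)
q = class_distribution(provocative, vec)

# scipy returns the JS *distance* (square root of the JS divergence);
# 0 means identical distributions, so lower = more similar vocabularies.
print(f"JS distance N-P: {jensenshannon(p, q, base=2):.3f}")
```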

Slide 13


Contextual Modeling
● Within-dataset signals
● Auxiliary dataset signals
● Attentive modeling of signals

Slide 14


In-Dataset Signal: Exemplars Module
Offensive train sample (from the labelled corpus): "$MENTION$ $MENTION$ $MENTION$ AND Remember president loco SAID MEXICO WILL PAY FUC**k fuck trump f*** gop f*** republicans Make go fund me FOR HEALTH CARE, COLLEGE EDUCATION, CLIMATE CHANGE, SOMETHING GOOD AND POSITIVE !! Not for a fucking wall go fund the wall the resistance resist $URL$"
E1: Offensive train sample exemplar (can be the same or a different author): "$MENTION$ DERANGED DELUSIONAL DUMB DICTATOR DONALD IS MENTALLY UNSTABLE! I WILL NEVER VOTE REPUBLICAN AGAIN IF THEY DON'T STAND UP TO THIS TYRANT LIVING IN THE WHITE HOUSE! fk republicans worst dictator ever unstable dictator $URL$"
E2: Offensive train sample exemplar (can be the same or a different author): "$MENTION$ COULD WALK ON WATER AND THE never trump WILL CRAP ON EVERYTHING HE DOES. SHAME IN THEM. UNFOLLOW ALL OF THEM PLEASE!"
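One plausible way to retrieve such exemplars is nearest-neighbour search over sentence embeddings within the same class; the embedding model and similarity criterion below are my assumptions, not the paper's specified procedure.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: exemplars are same-class neighbours retrieved by embedding
# similarity; the paper may select them differently.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def top_k_exemplars(sample, corpus, labels, sample_label, k=2):
    """Return the k most similar same-class posts from the labelled corpus."""
    pool = [t for t, y in zip(corpus, labels)
            if y == sample_label and t != sample]
    emb = model.encode([sample] + pool, normalize_embeddings=True)
    sims = emb[1:] @ emb[0]  # cosine similarity via normalized dot product
    return [pool[i] for i in np.argsort(-sims)[:k]]
```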

Slide 15


"look at what Hindus living in mixed-population localities are facing, what Dhruv Tyagi had to face for merely asking his Muslim neighbors not to sexually harass his daughter...and even then, if u ask why people don’t rent to Muslims, get ur head examined $MENTION\$ $MENTION\$ naah...Islamists will never accept Muslim refugees, they will tell the Muslims to create havoc in their home countries and do whatever it takes to convert Dar-ul-Harb into Dar-ul Islam..something we should seriously consider doing with Pak Hindus too One of the tweet by author before Example 2 One of the tweet by author after Example 2 Accusatory tone timestamp t-1 Hateful tweet timestamp t Accusatory and instigating timestamp t+1 Auxiliary Dataset Signal: Timeline Module

Slide 16


Contextual Signal Infusion for Hate Detection
Fig 1: Motivation for auxiliary data signals [1]
[1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023

Slide 17


Proposed Model: HEN-mBERT
● HEN-mBERT: a History, Exemplar, and Network infused mBERT model (a fusion sketch follows below).
Fig 1: Proposed model HEN-mBERT [1]
[1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023
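A minimal PyTorch sketch of the fusion idea: the mBERT text representation attends to each auxiliary signal (history, exemplar, network), and the fused vectors feed a classifier over the four classes. The layer sizes, single-query attention form, and concatenation head are simplifications for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class HENFusion(nn.Module):
    """Sketch: fuse history, exemplar, and network embeddings into the
    mBERT text representation via cross-attention, then classify."""
    def __init__(self, n_classes=4, dim=768):  # 4 classes: H, O, P, N
        super().__init__()
        self.mbert = BertModel.from_pretrained("bert-base-multilingual-cased")
        # One cross-attention block per auxiliary signal.
        self.attn = nn.ModuleDict({
            name: nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
            for name in ("history", "exemplar", "network")
        })
        self.classifier = nn.Linear(dim * 4, n_classes)  # text + 3 signals

    def forward(self, input_ids, attention_mask, signals):
        # signals: dict mapping signal name -> (batch, seq_len, dim) tensor.
        tokens = self.mbert(input_ids,
                            attention_mask=attention_mask).last_hidden_state
        text_vec = tokens[:, 0]  # [CLS] summary of the tweet
        fused = [text_vec]
        for name, attn in self.attn.items():
            # The text representation queries each signal's sequence.
            out, _ = attn(text_vec.unsqueeze(1), signals[name], signals[name])
            fused.append(out.squeeze(1))
        return self.classifier(torch.cat(fused, dim=-1))
```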

Slide 18


Experiments & Ablation: HEN-mBERT
Fig 1: Baseline and ablation results [1]
[1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023

Slide 19


Conclusion
● Key Contributions and Takeaways

Slide 20


Contributions & Takeaways
Contributions:
● Curate an auxiliary-signal-rich hate speech dataset centred on Indic topics.
● Perform extensive experiments analysing the quality of the dataset and its similarity to existing datasets.
● Propose a signal-rich attention module to fine-tune mBERT for detecting hateful content.
Takeaways:
● There is a trade-off between time/effort and annotation quality when moving from expert to crowdsourced annotation.
● Contextual signals infused in an attentive manner improve the signal-to-noise ratio and the detection of hate.

Slide 21


THANK YOU
Sarah, 4th-year PhD @ LCS2, IIIT-Delhi
[email protected]
_themessier on Twitter
Paper Link