Slide 10
Literature Overview: Hate Speech Datasets
Dataset              Source & Language (Modality)         Year  Labels        Annotation
Waseem & Hovy [1]    Twitter, English, Texts              2016  R, S, N       16k, E, k = 0.84
Davidson et al. [2]  Twitter, English, Texts              2017  H, O, N       25k, C, k = 0.92
Wulczyn et al. [3]   Wikipedia comments, English, Texts   2017  PA, N         100k, C, k = 0.45
Gibert et al. [5]    Stormfront, English, Texts           2018  H, N          10k, k = 0.62
Founta et al. [4]    Twitter, English, Texts              2018  H, A, SM, N   70k, C, k = ?
Albadi et al. [6]    Twitter, Arabic, Texts               2018  H, N          6k, C, k = 0.81
R- Racism
S- Sexism
H- Hate
PA- Personal Attack
A- Abuse
SM- Spam
O- Offensive
L- Religion
N- Neither
I- Implicit
E- Explicit
[1]: Waseem & Hovy, NAACL’16
[2]: Davidson et al., WebSci’17
[3]: Wulczyn et al., WWW’17
[4]: Founta et al., WebSci’18
[5]: Gibert et al., ALW2’18
[6]: Albadi et al., ANLP’20
E- Internal Experts
C- Crowd Sourced