
Does hate sound the same in all languages?

Andrada Pumnea

October 11, 2019


Transcript

  1. Does hate sound the same in all languages? Andrada Pumnea

    Data Scientist @ Futurice, @alucardna. Shout out to Antti Ajanki @ Futurice and Prof. Radu Meza @ Babes-Bolyai University, Cluj-Napoca
  2. INDEX

    What you can expect from this talk:
    1. Status quo of NLP
    2. Blood, sweat and tears behind data labeling
    3. (Language) model zoo
    4. Where do we go from here?
  3. “Sexual minorities were one of the main targets of intolerant

    and hate speech in Romania last year [2018]” according to the “Annual report on the intolerant and hate speech in Romania – 2018” released on June 13th by ActiveWatch. @alucardna
  4. “More data & compute = SOTA” is NOT research news.

    • Computational intensity • Difficult reproducibility • Shallow language understanding @alucardna
  5. Project roadmap: Collect data → Label data → Train models → Results

    Current stage: Collect data
    • Collected by Prof. Dr. Radu Meza @ Babes-Bolyai University
    • Timeline: 15 Sep - 15 Oct 2018
    • 13 FB news pages / 3 FB groups
    • First 25 comments per post
    @alucardna
  6. Project roadmap. Current stage: Label data

    • Tools
    • Defining categories of hate speech
    • Defining annotation guidelines
  7. “Hate speech is language that attacks or diminishes, that incites violence or hate against groups, based on specific characteristics [...].” Fortuna et al. (2019); Fortuna and Nunes (2018)
  8. Categories of hate speech

    1. Mention of minority/actors involved in the referendum (LGBT, gay, BOR, Dragnea)
    2. Urge to action/violence (boycott, resign, vote, stay home, shoot them, kill them)
    3. Violent language (pedophile, zoophile, drug addict, junk, thief, sick)
    4. Explicit language
    BIASED towards the topic of the referendum @alucardna
  9. Annotation guidelines

    A comment is hateful if it:
    • Attacks a group of people
    • Seeks to silence a group of people
    • Negatively stereotypes a group of people
    • Promotes, but does not directly use, hate speech or violent crime
    • Blatantly misrepresents truth or seeks to distort views on a group of people
    Example: “I vote YES for Normality! I vote YES for the FAMILY!” (“Votez DA pentru Normalitate! Votez DA pentru FAMILIE!”) @alucardna
  10. Labeling workflow: Manually label 350 comments → Write Labeling Functions (LFs) → Evaluate the performance of LFs on gold labels @alucardna

  11. Labeling workflow: Manually label 350 comments → Write Labeling Functions (LFs) → Evaluate the performance of LFs on gold labels → Train a Label Model (LM) → Apply the Label Model on the data → Manually check and correct your labels @alucardna

  12. Labeling workflow: Manually label 350 comments → Write Labeling Functions (LFs) → Evaluate the performance of LFs on gold labels → Train a Label Model (LM) → Apply the Label Model on the data → Manually check and correct your labels. Result: 1,500 labeled comments (1,220 not-hateful / 280 hateful) @alucardna
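The LF → Label Model workflow above is the weak-supervision recipe popularized by the Snorkel project. A minimal sketch, assuming snorkel v0.9 and a pandas DataFrame with a text column; the keyword lists and the toy data are invented for illustration and are not the project's actual labeling functions:

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

HATEFUL, NOT_HATEFUL, ABSTAIN = 1, 0, -1

@labeling_function()
def lf_violent_language(x):
    # Fires on a tiny illustrative keyword list; real LFs would follow
    # the hate speech categories defined on slide 8.
    keywords = ["shoot them", "kill them"]
    return HATEFUL if any(k in x.text.lower() for k in keywords) else ABSTAIN

@labeling_function()
def lf_short_comment(x):
    # Very short comments are assumed (for illustration) to be not hateful.
    return NOT_HATEFUL if len(x.text.split()) < 3 else ABSTAIN

df = pd.DataFrame({"text": ["They should all resign, shoot them!", "ok"]})

# Apply all LFs to every comment -> label matrix of shape (n_comments, n_lfs).
applier = PandasLFApplier(lfs=[lf_violent_language, lf_short_comment])
L_train = applier.apply(df=df)

# The Label Model learns LF accuracies and outputs one probabilistic label
# per comment, which is then checked and corrected by hand (slide 11).
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=500, seed=42)
preds = label_model.predict(L=L_train)
```

The LF performance on the 350 manually labeled gold comments is what tells you which functions to keep before trusting the Label Model's output.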
  13. My data labeling process:
    • Binary labels - not enough to capture the complexity of hate speech
    • Single annotator

    Ideal labeling process:
    • Multi-label classification
    • Multiple annotators (3-5)
    • Measure inter- and intra-rater agreement
    • Close collaboration with social scientists and linguists
    @alucardna
  14. Hate Speech Datasets (source: http://hatespeechdata.com/)

    • Portuguese: 5,668 tweets (Fortuna et al., 2019)
    • Indonesian: 13,169 tweets (Ibrohim & Budi, 2019)
    • German: 8,451 tweets (Wiegand et al., 2018)
    • English: 24,802 tweets (Davidson et al., 2017)
    • Romanian: 1,500 comments (this project)
    @alucardna
  15. Project roadmap. Current stage: Train models

    • Feed-forward network
    • Sentence representation:
      ◦ Bag-of-words
      ◦ Pooled word embeddings
      ◦ Transformers
    @alucardna
  16. [Pipeline diagram] Input sentence (“Product is high quality and durable.”) → Sentence embedding model → Feature vector (= sentence embedding, e.g. [0.78, -1.3, ..., 2.5]) → Classifier (e.g. a neural network) → Predicted class (Hateful 96%, Not-Hateful 4%) @alucardna
  17. Bag-of-words features: TF-IDF

    Input text: “high quality and durable product”
    [Figure: sparse feature vector indexed by dictionary words (aardvark, durable, quality, zoology, high, product, ...); words that do not occur in the input, such as “aardvark” and “zoology”, get the value 0]
    TF-IDF value = term frequency × inverse document frequency
    Dimensionality = the size of the dictionary
    @alucardna
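Slide 31 notes that the scikit-learn implementation makes this step easy. A minimal sketch; the two-document toy corpus stands in for the labeled comments and is an assumption:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the labeled comments (illustrative only).
corpus = [
    "high quality and durable product",
    "terrible product, poor quality",
]

# Fit the dictionary and compute the TF-IDF weights in one step.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

print(X.shape)                      # (n_documents, dictionary_size)
print(sorted(vectorizer.vocabulary_))  # the learned dictionary
```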
  18. Word embeddings: word2vec (Google, 2013)

    [Figure: embedding matrix mapping each word in the dictionary (aardvark, ..., durable, ..., zoology) to a dense vector]
    • Dense, small dimensionality, typically around 300
    • One row per word in the dictionary
    • Pre-training: download a model trained on a large unrelated corpus, apply it to your problem
    @alucardna
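Loading such a pre-trained matrix is a one-liner with gensim. A sketch assuming the publicly released 300-dimensional GoogleNews word2vec file is on disk; the local path is an assumption:

```python
from gensim.models import KeyedVectors

# Pre-trained GoogleNews vectors (large download); path is an assumption.
wv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

vec = wv["durable"]   # one dense 300-dimensional vector per dictionary word
print(vec.shape)      # (300,)
print(wv.most_similar("durable", topn=3))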
  19. Word embeddings: FastText (Facebook, 2017)

    • Trained on subword information: apple = sum of the vectors of the n-grams “<ap”, “app”, “appl”, “apple”, “apple>”, “ppl”, “pple”, “pple>”, “ple”, “ple>”, “le>”
    • Pre-training: similar to word2vec, available in 153 languages @alucardna
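A tiny sketch of the subword idea: extracting a word's character n-grams with fastText-style boundary markers (fastText's defaults are n = 3 to 6). This reimplements only the n-gram extraction, not the trained vectors:

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams with fastText-style boundary markers < and >."""
    w = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(w) - n + 1):
            grams.append(w[i:i + n])
    return grams

# A misspelled or out-of-vocabulary word still shares most n-grams with its
# correctly spelled neighbour, which is why subword models cope better with
# noisy social-media text than plain word2vec.
print(char_ngrams("apple", max_n=4))
```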
  20. Sentence embeddings: average-pooled word2vec

    Input sentence: “high quality and durable product”
    [Figure: each word is looked up in the embedding matrix; the sentence embedding is the element-wise average of the word vectors]
    Other pooling operators: min, max, concatenated min+max, etc.
    @alucardna
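A sketch of the pooling step, reusing pre-trained vectors as in the word2vec sketch above. Skipping out-of-vocabulary words is one possible choice and an assumption here:

```python
import numpy as np
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

def sentence_embedding(sentence, wv, pooling="mean"):
    """Pool word vectors into one fixed-size sentence vector."""
    # Skip words missing from the pre-trained vocabulary.
    vectors = [wv[w] for w in sentence.lower().split() if w in wv]
    if not vectors:
        return np.zeros(wv.vector_size)
    stacked = np.stack(vectors)
    if pooling == "mean":
        return stacked.mean(axis=0)
    if pooling == "max":
        return stacked.max(axis=0)
    # Concatenated min+max doubles the dimensionality.
    return np.concatenate([stacked.min(axis=0), stacked.max(axis=0)])

emb = sentence_embedding("high quality and durable product", wv)
print(emb.shape)  # (300,) for mean or max pooling
```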
  21. Summary (model / type / dimensions / language / semantics)

    • TF-IDF: bag-of-words, 1128 dims, -, -
    • Pooled word2vec: word embeddings, 300 dims, trained from scratch, unique representation
    • Pooled FastText: word embeddings, 300 dims, trained from scratch, unique representation
    • BERT: pre-trained language model, 768 dims, multilingual, contextual representation
    • LASER: pre-trained language model, 1024 dims, cross-lingual, contextual representation
    • XLM-100: pre-trained language model, 1280 dims, cross-lingual, contextual representation
    @alucardna
  22. Project roadmap. Current stage: Results

    • Datasets: Ro, Id, Pt, De, En
    • Test set: 50 hate / 250 not-hate
    • Scores: average of 3 runs
    • Metric: F1-score
    • Baseline: 0.45
    @alucardna
  23. Confusion matrix for LASER (F1-score: 0.80)

                      Predicted no-hate   Predicted hate
    Actual no-hate    240                 10
    Actual hate       20                  30

    Hate classified correctly: “vote”, “family”, “children”, “future”, “normal”
    Hate classified incorrectly: references to politicians, unexpected combinations of words, strong language
    @alucardna
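As a sanity check on the matrix: the hate class has precision 30/40 = 0.75 and recall 30/50 = 0.60, so its F1 is about 0.67; averaging with the no-hate class's F1 of about 0.94 gives roughly 0.80, so the reported score appears to be the macro F1 (an assumption, since the deck does not name the averaging). In scikit-learn terms:

```python
import numpy as np
from sklearn.metrics import f1_score

# Rebuild the 300 test labels from the confusion matrix above.
y_true = np.array([0] * 250 + [1] * 50)                    # 0 = no-hate, 1 = hate
y_pred = np.array([0] * 240 + [1] * 10 + [0] * 20 + [1] * 30)

print(f1_score(y_true, y_pred, pos_label=1))       # hate class only: ~0.67
print(f1_score(y_true, y_pred, average="macro"))   # macro average:  ~0.80
```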
  24. Results on non-English datasets. Overall:

    • LASER: best on 3 of the 4 non-English datasets
    • Word2vec, FastText, TF-IDF: very competitive
    • BERT Multilingual and XLM-100: worse than word embeddings and TF-IDF
    @alucardna
  25. Results on English tell a different story... Absolute increase:

    • Word2vec: ~22%
    • FastText: ~22%
    • TF-IDF: ~14%
    • BERT Multilingual: ~15%
    • XLM-100: ~18%
    • LASER: ~2%
    @alucardna
  26. What happens when a model is trained from scratch on a language?

    • German BERT: ~6%
    • XLM-ende: ~2%
    • English BERT: ~8%
    • XLM-en: ~6%
    @alucardna
  27. References

    Labeling data: https://github.com/andra-pumnea/hate-speech-ro
    Sentence embedding framework: https://github.com/aajanki/fi-sentence-embeddings-eval
    Meza, Radu, Hanna Orsolya Vincze, and Andreea Mogoș. "Targets of Online Hate Speech in Context. A Comparative Digital Social Science Analysis of Comments on Public Facebook Pages from Romania and Hungary." East European Journal of Society and Politics (2018): 26.
    Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
    Bojanowski, Piotr, et al. "Enriching word vectors with subword information." Transactions of the Association for Computational Linguistics 5 (2017): 135-146.
    Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
    Artetxe, Mikel, and Holger Schwenk. "Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond." arXiv preprint arXiv:1812.10464 (2018).
    @alucardna
  28. References

    Davidson, Thomas, et al. "Automated hate speech detection and the problem of offensive language." Eleventh International AAAI Conference on Web and Social Media. 2017.
    Wiegand, Michael, Melanie Siegel, and Josef Ruppenhofer. "Overview of the GermEval 2018 shared task on the identification of offensive language." (2018).
    Ibrohim, Muhammad Okky, and Indra Budi. "Multi-label hate speech and abusive language detection in Indonesian Twitter." Proceedings of the Third Workshop on Abusive Language Online. 2019.
    Fortuna, Paula, et al. "A hierarchically-labeled Portuguese hate speech dataset." Proceedings of the Third Workshop on Abusive Language Online. 2019.
    How transformers broke NLP leaderboards: https://hackingsemantics.xyz/2019/leaderboards/
    Current issues with transfer learning in NLP: https://mohammadkhalifa.github.io/2019/09/06/Issues-With-Transfer-Learning-in-NLP/
    The #BenderRule: On naming the languages we study and why it matters: https://thegradient.pub/the-benderrule-on-naming-the-languages-we-study-and-why-it-matters/
    The Digital Language Divide: http://labs.theguardian.com/digital-language-divide/
    @alucardna
  29. Extra Slides
  30. Hate Speech Datasets (http://hatespeechdata.com/)

    Portuguese, Fortuna et al. (2019), 5,668 tweets:
    • 2-stage labeling: non-experts + experts
    • Collect data based on user profiles and keywords
    • Binary labels (hate or no hate)
    • Fine-grained hierarchical multiple-level scheme (81 hate speech categories)

    Indonesian, Ibrohim & Budi (2019), 13,169 tweets:
    • 2-stage labeling: non-experts + experts
    • Diverse background for annotators
    • Collect data based on keywords
    • Binary labels (hate speech and abusive or not)
    • Multiple labels: target, categories, level

    German, Wiegand et al. (2018), 8,451 tweets:
    • 2-stage labeling: 3 experts
    • Collect data based on user profiles
    • Active effort for de-biasing the dataset
    • Binary labels (offense or other)
    • Multiple labels: profanity, insult, abuse, or other

    English, Davidson et al. (2017), 24,802 tweets:
    • 1-stage labeling: crowd-sourced annotators
    • Collect data using keywords from Hatebase.org
    • Reflects subjective bias of annotators
  31. How easy is it to apply a pre-trained model without fine-tuning?

    • Bag-of-words TF-IDF: easy. Use the scikit-learn implementation.
    • Average-pooled word embeddings: easy. Download a word embedding matrix and compute averages of the word vectors in a sentence.
    • LASER: doable, but requires more plumbing.
    • BERT, XLM:
      ◦ HuggingFace implementation: easy to set up and allows tinkering with the internals
      ◦ Flair implementation: wrapper around HuggingFace, more suitable for embeddings
      ◦ Original implementation: requires more plumbing
    @alucardna
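A sketch of the HuggingFace route for multilingual BERT sentence embeddings, mean-pooling the token vectors in the spirit of the pooling slides above; the pooling choice is an assumption, not necessarily the deck's exact setup:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Multilingual BERT checkpoint from the HuggingFace hub.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

sentence = "Votez DA pentru FAMILIE!"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token vectors into one 768-dimensional sentence embedding
# (one possible pooling choice; using the [CLS] vector is another).
embedding = outputs.last_hidden_state.mean(dim=1).squeeze(0)
print(embedding.shape)  # torch.Size([768])
```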
  32. Next steps

    • Call for annotators
    • STILTs (supplementary training on intermediate labeled-data tasks)
    • Zero-shot cross-lingual transfer learning
    • Mitigate the echo chamber effect

    Takeaways:
    • Good data labeling is KEY
    • Less popular languages need more attention
    • Hate speech is complicated; detection needs a more unified effort
    @alucardna