Comparing Different Supervised Approaches to Hate Speech Detection

Michele Corazza

December 12, 2018

Transcript

  1.

    Comparing Different Supervised Approaches to Hate Speech Detection

    Michele Corazza (1), Stefano Menini (2), Pınar Arslan (1), Rachele Sprugnoli (2), Elena Cabrio (1), Sara Tonelli (2), Serena Villata (1)

    (1) Université Côte d’Azur, CNRS, Inria, I3S, France
    (2) Fondazione Bruno Kessler, Trento, Italy

    {firstname.lastname}@inria.fr; {menini, sprugnoli, satonelli}@fbk.eu
  2.

    EVALITA 2018: subtasks

    Goal: a system to detect hate speech in Italian tweets and Facebook posts.

    Four binary classification subtasks were proposed:
    • Task 1: HaSpeeDe-FB: hate speech detection on Facebook posts;
    • Task 2: HaSpeeDe-TW: hate speech detection on Twitter posts;
    • Task 3.1: Cross-HaSpeeDe_FB: hate speech detection on Twitter posts, training on Facebook data only;
    • Task 3.2: Cross-HaSpeeDe_TW: hate speech detection on Facebook posts, training on Twitter data only.
  3.

    Recurrent Neural Network

    Pipeline: Preprocessing → Model → Output

    Preprocessing:
    • Mention replacement
    • Hashtag splitting
    • URL replacement

    Model features:
    • Word Embeddings
    • Social Features
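
The three preprocessing steps listed on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the placeholder tokens `<user>` and `<url>` are assumptions, and hashtags are split only at lowercase-to-uppercase boundaries, whereas a real system may rely on a dictionary-based word segmenter.

```python
import re

# Assumed patterns; the slides do not specify how each step is implemented.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Zero-width split point between a lowercase and an uppercase ASCII letter.
CAMEL_RE = re.compile(r"(?<=[a-z])(?=[A-Z])")


def normalize(text):
    """Replace URLs and mentions with placeholders and split hashtags."""
    # URLs go first because they may contain '@' or '#'.
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)
    # Drop the '#' and insert spaces at camel-case boundaries.
    text = HASHTAG_RE.sub(lambda m: CAMEL_RE.sub(" ", m.group(1)), text)
    return text


example = normalize("@mario guarda https://t.co/abc #PortiAperti")
```

Hashtags written entirely in lowercase are left as a single token by this heuristic, which is one reason a dedicated segmenter may be preferable.
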
  4.

    Linear SVC

    Pipeline: Preprocessing → Model → Output

    Preprocessing:
    • Mention replacement
    • Hashtag removal
    • URL replacement
    • Stopwords removal
    • Stemmer

    Model features:
    • Unigrams
    • Emotion Features
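
As a rough sketch of how this preprocessing could lead to unigram features, the snippet below removes hashtags and stopwords and counts the remaining tokens. The stopword list is a tiny illustrative sample, the `USER` and `URL` placeholders are assumptions, and stemming is left out because it needs an external stemmer; none of this is taken from the authors' implementation.

```python
import re
from collections import Counter

# Tiny illustrative sample; a real system would use a full Italian stopword list.
STOPWORDS = {"il", "la", "di", "che", "e", "a", "in", "un", "una"}


def unigram_counts(text):
    """Return unigram counts after the (assumed) SVC-style preprocessing."""
    text = re.sub(r"https?://\S+", " URL ", text)  # URL replacement
    text = re.sub(r"@\w+", " USER ", text)         # mention replacement
    text = re.sub(r"#\w+", " ", text)              # hashtag removal
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


counts = unigram_counts("Il governo e il popolo #Italia @mario")
```

Count vectors of this kind are the usual input to a linear classifier such as a linear SVM.
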
  5.

    N-gram Based Neural Network

    Pipeline: Preprocessing → Model → Output

    Preprocessing:
    • Mention replacement
    • Hashtag splitting
    • URL replacement
    • Lemmatizer

    Model features:
    • Unigrams
    • Bigrams
    • Social Features
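
Unigram and bigram features can be extracted from a token list as in the sketch below. The function names are illustrative, and the input is assumed to be already preprocessed and lemmatized.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unigram_bigram_features(tokens):
    """Concatenate the unigram and bigram lists for one message."""
    return ngrams(tokens, 1) + ngrams(tokens, 2)


features = unigram_bigram_features(["porti", "aperti", "subito"])
```

A message with a single token yields one unigram and no bigrams, since the range in `ngrams` is then empty.
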
  6.

    Results (Subtasks 1, 2)

    Results on HaSpeeDe_FB

    First Run (3rd ranking)
    Category     P      R      F1
    Non Hate     0.763  0.687  0.723
    Hate         0.858  0.898  0.877
    Macro AVG    0.810  0.793  0.800

    Second Run (4th ranking)
    Category     P      R      F1
    Non Hate     0.716  0.703  0.709
    Hate         0.859  0.867  0.863
    Macro AVG    0.788  0.785  0.786

    Results on HaSpeeDe_TW

    First Run (6th ranking)
    Category     P      R      F1
    Non Hate     0.873  0.827  0.850
    Hate         0.675  0.750  0.711
    Macro AVG    0.774  0.788  0.780

    Second Run (4th ranking)
    Category     P      R      F1
    Non Hate     0.842  0.899  0.870
    Hate         0.755  0.648  0.698
    Macro AVG    0.799  0.774  0.784
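
The relationship between the per-class scores and the Macro AVG row can be checked with a few lines of arithmetic. The sketch assumes that the first HaSpeeDe_FB run reads as Non Hate P = 0.763, R = 0.687 and Hate P = 0.858, R = 0.898, and that Macro AVG is the unweighted mean of the two classes. Because the inputs are already rounded to three decimals, the recomputed values can differ from the reported ones in the last digit.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_avg(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


# Values assumed from the first HaSpeeDe_FB run.
non_hate_f1 = f1(0.763, 0.687)  # reported as 0.723
hate_f1 = f1(0.858, 0.898)      # reported as 0.877
macro_f1 = macro_avg([non_hate_f1, hate_f1])  # reported as 0.800
macro_p = macro_avg([0.763, 0.858])           # reported as 0.810
```
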
  7.

    Results (Subtasks 3.1, 3.2)

    Results on Cross-HaSpeeDe_FB

    First Run (2nd ranking)
    Category     P      R      F1
    Non Hate     0.810  0.675  0.736
    Hate         0.497  0.670  0.570
    Macro AVG    0.653  0.672  0.653

    Second Run (1st ranking)
    Category     P      R      F1
    Non Hate     0.818  0.660  0.731
    Hate         0.494  0.694  0.580
    Macro AVG    0.656  0.677  0.654

    Results on Cross-HaSpeeDe_TW

    First Run (4th ranking)
    Category     P      R      F1
    Non Hate     0.493  0.703  0.580
    Hate         0.822  0.656  0.730
    Macro AVG    0.658  0.679  0.655

    Second Run (2nd ranking)
    Category     P      R      F1
    Non Hate     0.537  0.653  0.589
    Hate         0.815  0.731  0.771
    Macro AVG    0.676  0.692  0.680
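  8.

    Error Analysis

    Phenomena that tend to cause errors:
    • dialects / bad orthography
      “un se ponno sentì” [“you can't stand listening to them”]
      “chia il potere in mano fa quello che vuole” [“whoever holds power does what they want”]
    • sarcasm
      “E adesso cosa gli danno? Una settimana in albergo 5 stelle?” [“And now what will they give them? A week in a 5-star hotel?”]
    • references to world knowledge
      “un certo Adolf sarebbe utile ancora oggi” [“a certain Adolf would still be useful today”]
    • metaphorical expressions
      “Ruspali” [“Bulldoze them”]
      “Esodatele!” [roughly “Drive them out!”]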
  9.

    Error Analysis

    False positive:
    • misclassification of messages containing terrorista / terrorismo / immigrato [terrorist / terrorism / immigrant]
      “Il Giappone senza immigrati a corto di forza lavoro” [“Japan, without immigrants, short of workforce”] → HATE SPEECH

    Poor coverage of EmoLex:
    • one-to-one English to Italian translation: e.g. to kill → uccidere, missing the synonyms ammazzare / eliminare
      “ammazzare tutti i bambini, che domani diventeranno terroristi” [“kill all the children, who will become terrorists tomorrow”]
      “va eliminato fisicamente” [“he must be physically eliminated”]
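
The coverage problem can be illustrated with a toy lookup. The lexicon below is hypothetical: it contains only the one-to-one translation mentioned on the slide, and the label attached to it is a placeholder rather than an actual EmoLex entry.

```python
# Hypothetical translated lexicon: "to kill" was mapped to a single Italian verb.
TRANSLATED_LEXICON = {"uccidere": {"negative"}}  # placeholder label


def emotion_hits(tokens, lexicon):
    """Return the tokens found in the lexicon with their labels."""
    return {t: lexicon[t] for t in tokens if t in lexicon}


def coverage(tokens, lexicon):
    """Fraction of tokens that have a lexicon entry."""
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)


missed = emotion_hits(["ammazzare", "tutti", "i", "bambini"], TRANSLATED_LEXICON)
found = emotion_hits(["uccidere", "tutti"], TRANSLATED_LEXICON)
```

With this lexicon the message containing ammazzare produces no emotion features at all, while the same message written with uccidere would, which is the gap the slide describes.
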
  10.

    Models are open source!

    Recurrent and N-gram based neural networks: https://gitlab.com/ashmikuz/creep-cyberbullying-classifier
    Linear SVC model: https://github.com/0707pinar/Hate-Speech-Detection/