Modelling Hate Speech on Reddit — A Three-Act Play

George Ho
November 08, 2017

Reddit is one of the most popular discussion websites today, and is famously broad-minded in what it allows to be said on its forums. However, where there is free speech, there are invariably pockets of hate speech.

In this talk, I present a recent project to model hate speech on Reddit. In three acts, I chronicle the thought processes and stumbling blocks of the project, with each act applying a different form of machine learning: supervised learning, topic modelling and text clustering. I conclude with the current state of the project, a system that allows the modelling and summarization of entire subreddits, and with possible future directions. Rest assured that both the talk and the slides have been scrubbed to be safe for work!


  1. Modelling Hate Speech on Reddit — A Three-Act Play, George Ho

  2. About me

  3. Prologue: Motivation

  4. Data Science for Social Good
    • Mandate: help nonprofits, given their data.
    • Data & Society wanted to investigate hate speech on Reddit.
  5. What is Reddit?
    • Comprised of many communities, called subreddits.
      ◦ Each has its own rules and moderators.
    • 5th most popular website in the U.S.
    • Free speech!
  6. Data
    • Reddit posts and comments, since ever.
    • Important features:
      ◦ Text body
      ◦ Author name
      ◦ Subreddit
      ◦ Time
      ◦ Number of upvotes/downvotes
  7. USENIX FOCI 2017
    tldr:
    • Word embedding + “hate vector” = classifier for hate speech
    • Used classifier to model hatefulness on Reddit
  8. Some questions
    1. What happens to hate speech as subreddits go through takeovers/shutdowns/quarantines?
    2. How do real-world events (e.g. elections, disasters) impact hate speech on subreddits?
    3. Is there some measure of similarity between subreddits?
  9. Act One: (Rising Action) Supervised Learning

  10. If we could classify hate speech...
    1. What does hatefulness look like over time?
    2. Can we quantify the hatefulness of certain users?
    3. How does the hatefulness of users influence the hatefulness of subreddits?
  11. ACL 2012
    tldr:
    • Text classification
    • Bigrams + tf-idfs + NB/SVM = pretty good!
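
The "bigrams + tf-idf + NB" recipe on this slide can be sketched with scikit-learn. This is a minimal illustration, not the classifier used in the talk: the four training comments and their labels are made up, and the choice of `MultinomialNB` (rather than an SVM) is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, made-up training data (hypothetical stand-in for labelled comments).
train_texts = [
    "thank you for the helpful edit",
    "great article, well sourced",
    "you are a worthless idiot",
    "shut up you stupid fool",
]
train_labels = [0, 0, 1, 1]  # 0 = civil, 1 = toxic

# Unigrams and bigrams, weighted by tf-idf, fed to a naive Bayes classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    MultinomialNB(),
)
model.fit(train_texts, train_labels)

predictions = model.predict(["what a helpful article", "you stupid idiot"])
print(list(predictions))
```

Swapping `MultinomialNB()` for `sklearn.svm.LinearSVC()` would give the SVM variant of the same pipeline.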
  12. Toxic comment dataset
    • Wikipedia comments
    • Labels: toxic, severe_toxic, obscene, threat, insult, identity_hate
    • Resulting classifier: precision 65%, recall 81%, F1 72%
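
As a quick consistency check on the reported metrics, F1 is the harmonic mean of precision and recall. The function name below is only illustrative.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Figures from the slide: precision 65%, recall 81%.
f1 = f1_from(0.65, 0.81)
print(f"{f1:.1%}")  # prints 72.1%
```

This agrees with the 72% shown on the slide.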
  13. Fraction of comments classified as hateful

  14. Same, but smoothed

  15. Same, but overlaid with r/The_Donald

  16. Fraction of comments of known hateful subreddits classified as hateful

  17. Same, but smoothed

  18. But… It just learnt to identify curse words
    • Almost all comments classified as hate speech contained one or more curse words!
    • This makes sense: fairly civil discussion happens on the Wikipedia writers’ forum, so cursing is a very good indicator of abuse.
  19. Supervised learning is the wrong approach to model hate speech

  20. Why?
    1. Lack of labelled data for Reddit! We only have Wikipedia.
    2. Hate speech is a spectrum, not an open-and-shut class.
      a. It ranges from mere insensitive quips to full-on identity hate.
    3. Even if it weren’t, hate speech is not conveyed using specific words. Insult and abuse do not lie in word choice!
      a. The distributional hypothesis does not hold as strongly.
      b. Hate speech is inherently a semantic distinction.
  21. Act Two: (Climax) Topic Modelling

  22. Why topic modelling?
    1) Topics will give us a holistic view of subreddits, not just hate speech.
    2) We get a very rich characterization:
      a) Comments belong to topics
      b) Topics are comprised of words
      c) Counting gives us a notion of size
  23. Latent Dirichlet allocation

  24. Topic Modelling and t-SNE Visualization LDA t-SNE
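
A pipeline of this shape (LDA to get per-comment topic mixtures, then t-SNE to project them to 2D for plotting) might look as follows in scikit-learn. This is a sketch under assumptions: the toy comments, the number of topics and the perplexity are all made up, and the talk does not say which library or settings were used.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.manifold import TSNE

# Toy, made-up comments (hypothetical stand-in for a subreddit's comments).
comments = [
    "the team played a great game this season",
    "best win of the season for the team",
    "the game was good and the team won",
    "the war changed the country and its military",
    "military power of the country in the world",
    "world war and the countries involved",
    "tax money and the news about it",
    "watch the news on tax money",
]

# LDA works on raw term counts, not tf-idf weights.
counts = CountVectorizer(stop_words="english").fit_transform(comments)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)  # shape (n_comments, n_topics), rows sum to 1

# Project the topic mixtures to 2D; perplexity must be below the number of samples.
tsne = TSNE(n_components=2, perplexity=3.0, init="random", random_state=0)
embedding = tsne.fit_transform(doc_topics)
print(embedding.shape)
```

Each row of `embedding` can then be scattered and coloured by `doc_topics.argmax(axis=1)` to see whether topics separate.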

  26. Another latent variable of interest

  27. Some good clusters...
    Topic #2: removed com https www https www tax money http watch news
    Topic #7: game team season year good win play teams playing best
    Topic #13: sure believe trump wrong saying comment post mueller evidence gt
    Topic #18: war world country israel countries china military like happy does
  28. … But mostly bad clusters
    Topic #0: got just time day like went friend told didn kids
    Topic #1: just gt people say right doesn know law like government
    Topic #3: people don just like think really good know want things
    Topic #4: years time did great ago ve just work life damn
  29. Flexible topic models are ill-suited for Reddit comments

  30. Why?
    1) High variance, low bias: need lots of (good) data to learn well
      a) LDA infers document-by-document → short documents don’t help!
    2) Even worse than being short: comments don’t even coherently talk about a specific topic. Violation of the distributional hypothesis!
    3) Breadth of topics is massive on Reddit. Easy for small pockets of hate speech to be drowned out.
  31. Examples of Reddit comments
    1. “turn on particles and use it.”
    2. “ah”
    3. “Can confirm. His Twitter is great for a chuckle.”
    4. “I'm basing my knowledge on the fact that I watched the [...] rock fall.”
  32. Problems
    1) High variance, low bias
      a) Short documents
    2) Don’t even coherently talk about a specific topic
    3) Breadth of topics too large.
  33. Act Three: (Resolution) Dimensionality Reduction and Text Clustering

  34. Solution 1: NMF vs LDA
    • Lower variance, higher bias.
    • Strong notions of additivity.
      ◦ Part-based decomposition!
    • Still gives us latent space!
  35. Dimensionality reduction vs. text clustering
    NMF doesn’t just give us a latent space (LDA does that too)... It also gives us an easy way to reconstruct the original space. So it both reduces the dimensionality and clusters!
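
The idea that NMF reduces dimensionality, clusters, and can reconstruct the original space can be sketched with scikit-learn. The toy comments, the number of components and the use of tf-idf features are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy, made-up comments (hypothetical).
comments = [
    "the team won the game this season",
    "great game and a great season for the team",
    "tax money is in the news again",
    "the news covered tax and money",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(comments)  # (n_comments, n_terms), nonnegative

nmf = NMF(n_components=2, init="nndsvd", random_state=0, max_iter=500)
W = nmf.fit_transform(X)  # comment-by-topic weights: the latent space
H = nmf.components_       # topic-by-term weights

# Reconstruction of the original space from the two nonnegative factors.
X_approx = W @ H
error = np.linalg.norm(X.toarray() - X_approx)

# Clustering: assign each comment to its largest topic weight.
clusters = W.argmax(axis=1)

# Top terms per topic, read off the rows of H.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
top_terms = [[terms[i] for i in row.argsort()[::-1][:3]] for row in H]
print(clusters, top_terms, round(float(error), 3))
```

Because both factors are nonnegative, each comment is an additive mix of topics and each topic an additive mix of terms, which is the part-based decomposition the slides mention.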
  36. Solution 2: Draconian text preprocessing
    1. Strip punctuation, tags, etc.
    2. Convert to ASCII, lowercase
    3. Lemmatize
    4. Discard 70% of comments, by count of tokens.
    Harsh preprocessing is necessary for our model to give good results. Is this a failing?
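
The four preprocessing steps can be sketched in plain Python. Two things here are assumptions rather than details from the talk: `toy_lemmatize` is a crude placeholder for a real lemmatizer, and "discard 70% by count of tokens" is read as dropping the shortest 70% of comments.

```python
import re
import unicodedata

def toy_lemmatize(token):
    """Hypothetical placeholder: strips a few suffixes; a real lemmatizer would go here."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(comment):
    """Steps 1-3: strip tags and punctuation, fold to lowercase ASCII, lemmatize."""
    text = re.sub(r"<[^>]+>", " ", comment)                # strip HTML-like tags
    text = unicodedata.normalize("NFKD", text)             # split accents from letters
    text = text.encode("ascii", "ignore").decode("ascii")  # drop non-ASCII
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())       # strip punctuation
    return [toy_lemmatize(token) for token in text.split()]

def keep_longest(token_lists, keep_fraction=0.3):
    """Step 4 (assumed reading): keep only the longest 30% of comments by token count."""
    n_keep = max(1, round(len(token_lists) * keep_fraction))
    return sorted(token_lists, key=len, reverse=True)[:n_keep]

raw = ["ah", "turn on particles and use it.", "<b>Café</b> RULES!"]
kept = keep_longest([preprocess(comment) for comment in raw])
print(kept)
```

With three comments and `keep_fraction=0.3`, only the longest one survives, which shows how aggressively short comments such as "ah" are removed.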
  37. Solution 3: Only consider specific subreddits
    Since the breadth of discourse is so large on Reddit, we limit ourselves to three subreddits traditionally known to be hateful/toxic:
    1. r/theredpill
    2. r/The_Donald
    3. r/CringeAnarchy
  38. /r/theredpill

  39. /r/The_Donald

  40. /r/CringeAnarchy

  41. We now have a way to take subreddits and tell a story about them
  42. What have we achieved?
    • A way to reduce dimensionality/cluster a subreddit
      ◦ Tells a compelling story
      ◦ A form of corpus summarization
      ◦ Rich characterization of the subreddit
        ▪ Topic-word distributions
        ▪ Topic-document distributions
  43. Epilogue: Future Directions

  44. Shortcomings
    • Distributional hypothesis does not hold strongly.
      ◦ We never really addressed this!
    • Draconian text processing
      ◦ Do we have a representative view of the subreddit?
    • This system is not specific to hate speech
      ◦ Only for subreddits with a priori expectation of hatefulness!
  45. ICML 2008 NIPS 2007

  46. Thank You! Questions? @_eigenfoo eigenfoo Blog post and slide

  47. Appendix: More Slides!

  48. Links
    • GitHub repository
    • Blog post on NMF clusters
    • Blog post on unsuitability of LDA
    • Blog post on future (matrix factorization) work
  49. Number of posts

  50. Number of posts in known hateful subreddits

  51. Distill, Oct. 2016
    tldr:
    • t-SNE is very sensitive to the perplexity value
    • Multiple t-SNE plots may be necessary
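
One way to act on this advice is to compute several embeddings of the same data at different perplexities and compare them side by side. The random input and the particular perplexity values below are made up for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

# Made-up input: 30 points in 5 dimensions (stand-in for document-topic vectors).
rng = np.random.default_rng(0)
points = rng.normal(size=(30, 5))

# One embedding per perplexity; each value must be below the number of points.
embeddings = {}
for perplexity in (2, 5, 15):
    tsne = TSNE(n_components=2, perplexity=perplexity, init="random", random_state=0)
    embeddings[perplexity] = tsne.fit_transform(points)

for perplexity, embedding in embeddings.items():
    print(perplexity, embedding.shape)
```

Structure that appears at only one perplexity is more likely to be an artefact of the setting than a property of the data.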
  54. Probabilistic Matrix Factorization Bayesian Probabilistic Matrix Factorization