
Modelling Hate Speech on Reddit — A Three-Act Play

George Ho
November 08, 2017

Reddit is one of the most popular discussion websites today, and is famously broad-minded about what may be said on its forums. However, where there is free speech, there are invariably pockets of hate speech.

In this talk, I present a recent project to model hate speech on Reddit. In three acts, I chronicle the thought processes and stumbling blocks of the project, with each act applying a different form of machine learning: supervised learning, topic modelling and text clustering. I conclude with the current state of the project: a system that allows the modelling and summarization of entire subreddits, and possible future directions. Rest assured that both the talk and the slides have been scrubbed to be safe for work!


Transcript

  1. Data Science for Social Good • Mandate: help nonprofits, given their data. • Data & Society wanted to investigate hate speech on Reddit.
  2. What is Reddit? • Comprised of many communities, called subreddits. ◦ Each has its own rules and moderators. • 5th most popular website in the U.S. • Free speech!
  3. Data • Reddit posts and comments, since ever. • Important features: ◦ Text Body ◦ Author Name ◦ Subreddit ◦ Time ◦ Number of upvotes/downvotes
  4. USENIX FOCI 2017 https://www.usenix.org/system/files/conference/foci17/foci17-paper-nithyanand.pdf tldr: • Word embedding + “hate vector” = classifier for hate speech • Used classifier to model hatefulness on Reddit
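
One possible reading of the "word embedding + hate vector" idea on slide 4 is sketched below. It is a toy illustration, not the paper's actual procedure: the 3-dimensional embeddings, the seed words, the threshold and the function names are all made up for this example. A comment is scored by the cosine similarity between its mean word vector and a "hate vector" averaged from a few seed words.

```python
import math

# Toy 3-dimensional "embeddings"; words and numbers are invented for illustration.
EMBEDDINGS = {
    "awful": [0.9, 0.1, 0.0],
    "vile": [0.8, 0.2, 0.1],
    "game": [0.0, 0.9, 0.2],
    "team": [0.1, 0.8, 0.3],
    "people": [0.3, 0.3, 0.3],
}


def mean_vector(words):
    """Average the embeddings of the known words; None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical seed words standing in for a real lexicon.
HATE_VECTOR = mean_vector(["awful", "vile"])


def hate_score(comment):
    """Similarity of the comment's mean word vector to the hate vector."""
    vector = mean_vector(comment.lower().split())
    return 0.0 if vector is None else cosine(vector, HATE_VECTOR)


def is_hateful(comment, threshold=0.8):
    """Threshold the score; 0.8 is an arbitrary illustrative cut-off."""
    return hate_score(comment) >= threshold
```

With these toy numbers, "vile people" scores well above the threshold and "game team" well below it. In practice the embeddings would come from a model trained on a large corpus.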
  5. Some questions 1. What happens to hate speech as subreddits go through takeovers/shutdowns/quarantines? 2. How do real-world events (e.g. elections, disasters) impact hate speech on subreddits? 3. Is there some measure of similarity between subreddits?
  6. If we could classify hate speech... 1. What does hatefulness look like over time? 2. Can we quantify the hatefulness of certain users? 3. How does the hatefulness of users influence the hatefulness of subreddits?
  7. Toxic comment dataset • Wikipedia comments • Labels: ◦ toxic ◦ severe_toxic ◦ obscene ◦ threat ◦ insult ◦ identity_hate • Resulting classifier: Precision 65%, Recall 81%, F-1 72%
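
As a quick check on the figures on slide 7, the F-1 score is the harmonic mean of precision and recall. The short sketch below (the function name is only for illustration) recomputes it from the reported 65% precision and 81% recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.65, 0.81)
print(f"F-1 = {f1:.2f}")  # prints "F-1 = 0.72"
```

The result agrees with the 72% quoted on the slide.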
  8. But… It just learnt to identify curse words • Almost all comments classified as hate speech contained one or more curse words! • This makes sense: fairly civil discussion happens on the Wikipedia writers’ forum, so cursing is a very good indicator of abuse.
  9. Why? 1. Lack of labelled data for Reddit! We only have Wikipedia. 2. It is a spectrum, not an open-and-shut class. a. It ranges from mere insensitive quips to full-on identity hate. 3. Even if it wasn’t, hate speech is not conveyed using specific words. Insult and abuse do not lie in word choice! a. The distributional hypothesis does not hold as strongly. b. Hate speech is inherently a semantic distinction.
  10. Why topic modelling? 1) Topics will give us a holistic view of subreddits, not just hate speech. 2) We get a very rich characterization: a) Comments belong to topics b) Topics are comprised of words c) Counting gives us a notion of size
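
The characterization on slide 10 can be illustrated with a small sketch. The document-topic and topic-word weights below are invented, and the helper names are hypothetical. In a real run both matrices would come from a fitted topic model. Each comment is assigned to its highest-weight topic, each topic is described by its highest-weight words, and counting assignments gives a size per topic.

```python
from collections import Counter

# Hypothetical document-topic weights: one row per comment, one column per topic.
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]

# Hypothetical topic-word weights: one row per topic, one column per vocabulary word.
vocab = ["game", "team", "tax", "money", "war", "country"]
topic_word = [
    [0.45, 0.35, 0.05, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.45, 0.35, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.05, 0.50, 0.30],
]


def dominant_topic(weights):
    """Index of the topic with the largest weight for one comment."""
    return max(range(len(weights)), key=weights.__getitem__)


def top_words(topic, n=2):
    """The n highest-weight vocabulary words for a topic."""
    ranked = sorted(zip(topic_word[topic], vocab), reverse=True)
    return [word for _, word in ranked[:n]]


# a) Comments belong to topics.
assignments = [dominant_topic(row) for row in doc_topic]
# c) Counting gives a notion of size.
topic_sizes = Counter(assignments)

for topic, size in sorted(topic_sizes.items()):
    # b) Topics are comprised of words.
    print(f"Topic #{topic}: {size} comments, top words: {' '.join(top_words(topic))}")
```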
  11. Some good clusters... Topic #2: removed com https www https www tax money http watch news Topic #7: game team season year good win play teams playing best Topic #13: sure believe trump wrong saying comment post mueller evidence gt Topic #18: war world country israel countries china military like happy does
  12. … But mostly bad clusters Topic #0: got just time day like went friend told didn kids Topic #1: just gt people say right doesn know law like government Topic #3: people don just like think really good know want things Topic #4: years time did great ago ve just work life damn
  13. Why? 1) High variance, low bias: need lots of (good) data to learn well a) LDA infers document-by-document → short documents don’t help! 2) Even worse than just short documents: they don’t even coherently talk about a specific topic. Violation of the distributional hypothesis! 3) Breadth of topics is massive on Reddit. Easy for small pockets of hate speech to be drowned out.
  14. Examples of Reddit comments 1. “turn on particles and use it.” 2. “ah” 3. “Can confirm. His Twitter is great for a chuckle.” 4. “I'm basing my knowledge on the fact that I watched the [...] rock fall.”
  15. Problems 1) High variance, low bias a) Short documents 2) Don’t even coherently talk about a specific topic 3) Breadth of topics too large.
  16. Solution 1: NMF vs LDA • Lower variance, higher bias. • Strong notions of additivity. ◦ Part-based decomposition! • Still gives us latent space!
  17. Dimensionality reduction vs. text clustering NMF doesn’t just give us a latent space (LDA does that too)... It also gives us an easy way to reconstruct the original space. So it both reduces the dimensionality and clusters!
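
To make the point on slide 17 concrete, here is a minimal pure-Python sketch of non-negative matrix factorization on a toy comment-by-word count matrix V. It uses the standard multiplicative update rules, which is an assumption: the talk does not say which solver was used, and a real pipeline would more likely call a library implementation such as scikit-learn's `NMF`. The rows of W are the low-dimensional (topic) representation of each comment, the rows of H describe each topic in terms of words, and the product WH reconstructs the original space.

```python
import random


def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, i.e. the reconstruction error."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V into non-negative W and H with WH close to V."""
    rng = random.Random(seed)
    n_docs, n_words = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_words)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[x * n / (d + eps) for x, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[x * n / (d + eps) for x, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


# Toy counts: comments 0-1 use only words 0-1, comments 2-3 use only words 2-3.
V = [
    [2, 1, 0, 0],
    [4, 2, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 2, 6],
]

W, H = nmf(V, n_topics=2)
# Dimensionality reduction: each comment is now 2 numbers (a row of W).
# Clustering: the largest entry in each row of W names the comment's topic.
clusters = [max(range(len(row)), key=row.__getitem__) for row in W]
print("clusters:", clusters)
print("reconstruction error:", frobenius_error(V, W, H))
```

Because every entry of W and H stays non-negative, topics can only add to a comment's word counts, never cancel each other out, which is the additivity that slide 16 refers to.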
  18. Solution 2: Draconian text preprocessing 1. Strip punctuation, tags, etc. 2. Convert to ASCII, lowercase 3. Lemmatize 4. Discard 70% of comments, by count of tokens. Harsh preprocessing is necessary for our model to give good results. Is this a failing?
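
The preprocessing steps on slide 18 might look roughly like the sketch below, using only the Python standard library. Several details are assumptions: the regular expressions, the function names, and the choice to discard the shortest 70% of comments (the slide only says "by count of tokens"). Lemmatization (step 3) is left out because it needs an external library such as NLTK or spaCy.

```python
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")            # HTML-style tags
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")  # punctuation and other symbols


def clean(comment):
    """Steps 1-2: strip tags and punctuation, convert to ASCII, lowercase, tokenize."""
    text = TAG_RE.sub(" ", comment)
    # Decompose accented characters, then drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    text = NON_ALNUM_RE.sub(" ", text)
    return text.split()


def keep_longest(token_lists, discard_fraction=0.7):
    """Step 4 (assumed reading): rank by token count and drop the shortest 70%."""
    ranked = sorted(token_lists, key=len, reverse=True)
    n_keep = max(1, round(len(ranked) * (1 - discard_fraction)))
    return ranked[:n_keep]


# The example comments from slide 14.
comments = [
    "turn on particles and use it.",
    "ah",
    "Can confirm. His Twitter is great for a chuckle.",
    "I'm basing my knowledge on the fact that I watched the [...] rock fall.",
]

kept = keep_longest([clean(c) for c in comments])
for tokens in kept:
    print(" ".join(tokens))
```

On these four comments only the longest one survives, which shows how much of a subreddit such a filter throws away.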
  19. Solution 3: Only consider specific subreddits Since the breadth of discourse is so large on Reddit, we limit ourselves to three subreddits traditionally known to be hateful/toxic: 1. r/theredpill 2. r/The_Donald 3. r/CringeAnarchy
  20. What have we achieved? • A way to reduce dimensionality/cluster a subreddit ◦ Tells a compelling story ◦ A form of corpus summarization ◦ Rich characterization of the subreddit ▪ Topic-word distributions ▪ Topic-document distributions
  21. Shortcomings • Distributional hypothesis does not hold strongly. ◦ We never really addressed this! • Draconian text processing ◦ Do we have a representative view of the subreddit? • This system is not specific to hate speech ◦ Only for subreddits with a priori expectation of hatefulness!
  22. Links • GitHub repository: ◦ https://github.com/eigenfoo/reddit-clusters • Blog post on NMF clusters: ◦ https://eigenfoo.xyz/reddit-clusters/ • Blog post on unsuitability of LDA: ◦ https://eigenfoo.xyz/lda-sucks/ • Blog post on future (matrix factorization) work: ◦ https://eigenfoo.xyz/matrix-factorizations/
  23. Distill, Oct. 2016 https://distill.pub/2016/misread-tsne/ tldr: • t-SNE is very sensitive to the perplexity value • Multiple t-SNE plots may be necessary