Reddit is one of the most popular discussion websites today, and it is famously broad-minded about what may be said on its forums. However, where there is free speech, there are invariably pockets of hate speech.
In this talk, I present a recent project to model hate speech on Reddit. In three acts, I chronicle the thought processes and stumbling blocks of the project, with each act applying a different form of machine learning: supervised learning, topic modelling and text clustering. I conclude with the current state of the project: a system that allows the modelling and summarization of entire subreddits, and possible future directions. Rest assured that both the talk and the slides have been scrubbed to be safe for work!
Modelling Hate Speech on Reddit —
A Three-Act Play
Data Science for Social Good
● Mandate: help nonprofits, given their data.
● Data & Society wanted to investigate hate speech on Reddit.
What is Reddit?
● Made up of many communities (“subreddits”),
○ Each has its own rules and moderators
● 5th most popular website in the U.S.
● Free speech!
● All Reddit posts and comments, since the site's inception.
● Important features:
○ Text Body
○ Author Name
○ Number of upvotes/downvotes
USENIX FOCI 2017
● Word embedding + “hate vector” = classifier for hate speech
● Used classifier to model hatefulness on Reddit
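A hedged sketch of what this embedding-plus-hate-vector scoring might look like (not the paper's actual code): average a comment's word vectors, then take the cosine similarity with a “hate vector” built from seed terms. Here `model` (a pretrained gensim `KeyedVectors`) and `hate_vector` are assumptions.

```python
# Sketch only: score a comment by cosine similarity between its mean
# word embedding and a precomputed "hate vector".
import numpy as np

def embed(tokens, model):
    # Average the embeddings of in-vocabulary tokens.
    vecs = [model[w] for w in tokens if w in model]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

def hate_score(tokens, model, hate_vector):
    # Cosine similarity in [-1, 1]; higher = closer to the hate direction.
    v = embed(tokens, model)
    denom = np.linalg.norm(v) * np.linalg.norm(hate_vector)
    return float(v @ hate_vector) / denom if denom else 0.0
```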
1. What happens to hate speech as subreddits go through bans and quarantines?
2. How do real-world events (e.g. elections, disasters) impact hate speech?
3. Is there some measure of similarity between subreddits?
Act One: Supervised Learning (Rising Action)
If we could classify hate speech...
1. What does hatefulness look like over time?
2. Can we quantify the hatefulness of certain users?
3. How does the hatefulness of users influence the hatefulness of subreddits?
● Text classification
● Bigrams + tf-idf + NB/SVM = pretty good! (see the sketch below)
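A minimal sketch of such a baseline, assuming a labelled dataset (`comments` and `labels` are hypothetical placeholders; this is not the project's exact pipeline):

```python
# Bigram tf-idf features fed to a linear SVM (swap in MultinomialNB
# for the Naive Bayes variant).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=5),  # unigrams + bigrams
    LinearSVC(),
)
clf.fit(comments, labels)   # comments: list[str], labels: 0/1 hate flags
print(clf.predict(["an example comment to score"]))
```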
Toxic comment dataset
● Wikipedia comments
But… it just learnt to identify curse words
● Almost all comments classified as hate speech contained one or more curse words.
● This makes sense: fairly civil discussion happens on the Wikipedia writers' forum, so cursing is a very good indicator of abuse.
Supervised learning is
the wrong approach
to model hate speech
1. Lack of labelled data for Reddit! We only have Wikipedia.
2. Hate speech is a spectrum, not a clear-cut class.
a. It ranges from mere insensitive quips to full-on identity hate.
3. Even if it weren't, hate speech is not conveyed using specific words.
Insult and abuse do not lie in word choice!
a. The distributional hypothesis does not hold as strongly.
b. Hate speech is inherently a semantic distinction.
Act Two: Topic Modelling (Climax)
Why topic modelling?
1) Topics will give us a holistic view of subreddits, not just hate speech.
2) We get a very rich characterization:
a) Comments belong to topics
b) Topics are composed of words
c) Counting gives us a notion of size
Latent Dirichlet allocation
Topic Modelling and t-SNE Visualization
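As an illustration of this stage, a minimal sketch (not the project's exact code; `comments` is a hypothetical preprocessed corpus): fit LDA on bag-of-words counts, then project each comment's topic mixture to 2-D with t-SNE.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.manifold import TSNE

bow = CountVectorizer(max_df=0.5, min_df=5).fit_transform(comments)
lda = LatentDirichletAllocation(n_components=20, random_state=0)
doc_topics = lda.fit_transform(bow)   # each row: P(topic | comment)
xy = TSNE(perplexity=30, random_state=0).fit_transform(doc_topics)
```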
Another latent variable of interest
Some good clusters...
removed com https www https www tax money http watch news
game team season year good win play teams playing best
sure believe trump wrong saying comment post mueller evidence gt
war world country israel countries china military like happy does
… But mostly bad clusters
got just time day like went friend told didn kids
just gt people say right doesn know law like government
people don just like think really good know want things
years time did great ago ve just work life damn
Flexible topic models are
ill-suited for Reddit comments
1) High variance, low bias: need lots of (good) data to learn well
a) LDA infers document-by-document → short documents don’t help!
2) Even worse than just short documents: they don't even coherently talk
about a specific topic. Violation of the distributional hypothesis!
3) Breadth of topics is massive on Reddit. Easy for small pockets of hate
speech to be drowned out.
Examples of Reddit comments
1. “turn on particles and use it.”
2. “Can confirm. His Twitter is great for a chuckle.”
3. “I'm basing my knowledge on the fact that I watched
the [...] rock fall.”
1) High variance, low bias
a) Short documents
2) Don't even coherently talk about a specific topic
3) Breadth of topics too large.
Act Three: NMF and Text Clustering (Resolution)
Solution 1: NMF vs LDA
● Lower variance, higher bias.
● Strong notions of additivity.
● Still gives us a latent space! (see the sketch below)
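A minimal sketch of NMF-based clustering under the same assumptions as before (`comments` is a hypothetical corpus; 20 components is an arbitrary choice):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

vec = TfidfVectorizer(max_df=0.5, min_df=5)
X = vec.fit_transform(comments)
nmf = NMF(n_components=20, random_state=0)
W = nmf.fit_transform(X)              # document-by-topic weights
clusters = np.argmax(W, axis=1)       # dominant topic = cluster label

# Top words per component, like the cluster listings above.
terms = vec.get_feature_names_out()
for k, row in enumerate(nmf.components_):
    print(k, " ".join(terms[i] for i in row.argsort()[-10:][::-1]))
```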
Dimensionality reduction vs. text clustering
NMF doesn’t just give us a latent
space (LDA does that too)...
It also gives us an easy way to
reconstruct the original space.
So it both reduces the
dimensionality and clusters!
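Concretely, in the standard NMF formulation (background, not from the slides): the document-term matrix factors into two nonnegative matrices whose product approximates the original space,

$$X \approx WH, \qquad W \ge 0,\ H \ge 0,$$

where $X$ is documents × terms, $W$ is documents × topics (the latent space), and $H$ is topics × terms (the map back to the original space).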
Solution 2: Draconian text preprocessing
1. Strip punctuation, tags, etc.
2. Convert to ASCII, lowercase
3. Discard 70% of comments, by count
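A sketch of what such a pipeline might look like; the exact rules, and the filter behind discarding 70% of comments, are assumptions here:

```python
import re
import unicodedata

def preprocess(comment):
    text = re.sub(r"<[^>]+>|\[.*?\]\(.*?\)", " ", comment)  # strip tags/links
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode()          # force ASCII
    text = re.sub(r"[^a-z\s]", " ", text.lower())           # strip punctuation
    text = re.sub(r"\s+", " ", text).strip()
    # Hypothetical stand-in for the "discard 70%" rule: drop comments
    # too short to carry topical signal.
    return text if len(text.split()) >= 10 else None
```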
Harsh preprocessing is necessary for our
model to give good results. Is this a failing?
Solution 3: Only consider specific subreddits
Since the breadth of discourse is so large on Reddit, we limit ourselves to
three subreddits traditionally known to be hateful/toxic:
We now have a way
to take subreddits and
tell a story about them
What have we achieved?
● A way to reduce dimensionality/cluster a subreddit
○ Tells a compelling story
○ A form of corpus summarization
○ Rich characterization of the subreddit
■ Topic-word distributions
■ Topic-document distributions
● Distributional hypothesis does not hold strongly.
○ We never really addressed this!
● Draconian text processing
○ Do we have a representative view of the subreddit?
● This system is not specific to hate speech
○ It only yields hate-speech insights for subreddits with an a priori expectation of hatefulness!
Blog post and slide deck:
● GitHub repository:
● Blog post on NMF clusters:
● Blog post on unsuitability of LDA:
● Blog post on future (matrix factorization) work:
“How to Use t-SNE Effectively”, Distill, Oct. 2016
● t-SNE is very sensitive to the perplexity value
● Multiple t-SNE plots may be necessary
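Following that advice, one would fit t-SNE at several perplexities rather than trusting a single plot (`doc_topics` as in the earlier sketch):

```python
from sklearn.manifold import TSNE

projections = {
    p: TSNE(perplexity=p, random_state=0).fit_transform(doc_topics)
    for p in (5, 30, 50)
}
```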
Probabilistic Matrix Factorization
Bayesian Probabilistic Matrix Factorization
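For reference (standard formulations from Salakhutdinov & Mnih; the slides only name the models): PMF treats observed entries as noisy inner products of low-rank factors with Gaussian priors,

$$R_{ij} \sim \mathcal{N}(U_i^\top V_j,\ \sigma^2), \qquad U_i \sim \mathcal{N}(0,\ \sigma_U^2 I), \qquad V_j \sim \mathcal{N}(0,\ \sigma_V^2 I),$$

while the Bayesian variant additionally places hyperpriors on the factor distributions and integrates them out.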