Modelling Hate Speech on Reddit — A Three-Act Play

George Ho

About me

Prologue: Motivation

Data Science for Social Good
• Mandate: help nonprofits, given their data.
• Data & Society wanted to investigate hate speech on Reddit.

What is Reddit?
• Made up of many communities, called subreddits.
  ◦ Each has its own rules and moderators.
• 5th most popular website in the U.S.
• Free speech!

Data
• All Reddit posts and comments, since the site began.
• Important features:
  ◦ Text Body
  ◦ Author Name
  ◦ Subreddit
  ◦ Time
  ◦ Number of upvotes/downvotes

USENIX FOCI 2017
https://www.usenix.org/system/files/conference/foci17/foci17-paper-nithyanand.pdf
tldr:
• Word embedding + “hate vector” = classifier for hate speech
• Used classifier to model hatefulness on Reddit

Some questions
1. What happens to hate speech as subreddits go through takeovers/shutdowns/quarantines?
2. How do real-world events (e.g. elections, disasters) impact hate speech on subreddits?
3. Is there some measure of similarity between subreddits?

Act One: (Rising Action) Supervised Learning

If we could classify hate speech...
1. What does hatefulness look like over time?
2. Can we quantify the hatefulness of certain users?
3. How does the hatefulness of users influence the hatefulness of subreddits?

ACL 2012
https://dl.acm.org/citation.cfm?id=2390688
tldr:
• Text classification
• Bigrams + tf-idfs + NB/SVM = pretty good!
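As a rough illustration of the feature side of that recipe, the sketch below builds unigram and bigram tf-idf weights in plain Python. The function names, the toy documents and the `log(N / df) + 1` weighting are assumptions made for this example, not the setup used in the talk or the paper. The resulting weights would then be fed to a Naive Bayes or linear SVM classifier, which is not shown.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all unigrams and bigrams as space-joined strings."""
    out = []
    for n in range(1, n_max + 1):
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out


def tfidf(docs):
    """Map each document to {term: tf * idf}, with idf = log(N / df) + 1."""
    term_lists = [ngrams(doc.lower().split()) for doc in docs]
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        vectors.append(
            {term: count * (math.log(n_docs / df[term]) + 1.0)
             for term, count in tf.items()}
        )
    return vectors


# Toy documents, invented for illustration only.
docs = ["you are wrong", "you are an idiot", "thanks for the edit"]
vectors = tfidf(docs)
```

Bigrams such as "an idiot" that occur in a single document receive a higher weight than bigrams such as "you are" that occur in several.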

Toxic comment dataset
• Wikipedia comments
• Labels:
  ◦ toxic
  ◦ severe_toxic
  ◦ obscene
  ◦ threat
  ◦ insult
  ◦ identity_hate

Resulting Classifier
• Precision: 65%
• Recall: 81%
• F-1: 72%
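The three figures are consistent with each other: F-1 is the harmonic mean of precision and recall. A quick check, using only the numbers on the slide:

```python
precision, recall = 0.65, 0.81

# F-1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F-1 = {f1:.2f}")  # prints "F-1 = 0.72"
```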

Fraction of comments classified as hateful

Same, but smoothed

Same, but overlaid with r/The_Donald

Fraction of comments of known hateful subreddits classified as hateful

Same, but smoothed

But… It just learnt to identify curse words
• Almost all comments classified as hate speech contained one or more curse words!
• This makes sense: fairly civil discussion happens on the Wikipedia writers’ forum, so cursing is a very good indicator of abuse.
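A diagnostic of this kind can be sketched as follows: measure what fraction of the comments a classifier flagged contain at least one word from a profanity lexicon. The lexicon, the helper names and the example comments here are hypothetical placeholders, not the data or code behind the talk.

```python
import re

# Hypothetical placeholder lexicon; a real check would use a full profanity list.
CURSE_WORDS = {"damn", "hell", "crap"}


def contains_curse(text, lexicon=CURSE_WORDS):
    """Return True if any lowercase word token of `text` is in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in lexicon for token in tokens)


def curse_rate(comments):
    """Fraction of comments that contain at least one lexicon word."""
    if not comments:
        return 0.0
    return sum(contains_curse(c) for c in comments) / len(comments)


# Invented examples standing in for classifier output.
flagged = [
    "Well damn, that edit was crap.",
    "What the hell is this?",
    "People like you should not be allowed here.",
]
unflagged = ["Thanks for fixing the citation.", "Damn, nice catch!"]

flagged_rate = curse_rate(flagged)
unflagged_rate = curse_rate(unflagged)
```

A flagged rate close to 1 alongside a much lower unflagged rate would suggest the classifier is keying on profanity rather than on hatefulness.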

Supervised learning is the wrong approach to model hate speech

Why?
1. Lack of labelled data for Reddit! We only have Wikipedia.
2. Hate speech is a spectrum, not an open-and-shut class.
  a. It ranges from mere insensitive quips to full-on identity hate.
3. Even if it weren’t, hate speech is not conveyed using specific words. Insult and abuse do not lie in word choice!
  a. The distributional hypothesis does not hold as strongly.
  b. Hate speech is inherently a semantic distinction.

Act Two: (Climax) Topic Modelling

Why topic modelling?
1) Topics will give us a holistic view of subreddits, not just hate speech.
2) We get a very rich characterization:
  a) Comments belong to topics
  b) Topics are composed of words
  c) Counting gives us a notion of size

Latent Dirichlet allocation

Topic Modelling and t-SNE Visualization
https://shuaiw.github.io/2016/12/22/topic-modeling-and-tsne-visualzation.html
(Figure panels: LDA, t-SNE)

Another latent variable of interest

Some good clusters...
Topic #2: removed com https www https www tax money http watch news
Topic #7: game team season year good win play teams playing best
Topic #13: sure believe trump wrong saying comment post mueller evidence gt
Topic #18: war world country israel countries china military like happy does

… But mostly bad clusters
Topic #0: got just time day like went friend told didn kids
Topic #1: just gt people say right doesn know law like government
Topic #3: people don just like think really good know want things
Topic #4: years time did great ago ve just work life damn

Flexible topic models are ill-suited for Reddit comments

Why?
1) High variance, low bias: need lots of (good) data to learn well
  a) LDA infers document-by-document → short documents don’t help!
2) Even worse than just short documents: they don’t even coherently talk about a specific topic. Violation of the distributional hypothesis!
3) Breadth of topics is massive on Reddit. Easy for small pockets of hate speech to be drowned out.

Examples of Reddit comments
1. “turn on particles and use it.”
2. “ah”
3. “Can confirm. His Twitter is great for a chuckle.”
4. “I'm basing my knowledge on the fact that I watched the [...] rock fall.”

Problems
1) High variance, low bias
  a) Short documents
2) Don’t even coherently talk about a specific topic
3) Breadth of topics too large.

Act Three: (Resolution) Dimensionality Reduction and Text Clustering

Solution 1: NMF vs LDA
• Lower variance, higher bias.
• Strong notions of additivity.
  ◦ Part-based decomposition!
• Still gives us latent space!

Dimensionality reduction vs. text clustering
NMF doesn’t just give us a latent space (LDA does that too)... It also gives us an easy way to reconstruct the original space. So it both reduces the dimensionality and clusters!
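To make that concrete, here is a minimal NumPy sketch of NMF using the standard multiplicative updates of Lee and Seung. It is not necessarily the implementation used in the talk, and the `nmf` function, its parameters and the toy document-term matrix are assumptions for illustration. The rows of `W` give each document's weights in the latent space, `W @ H` reconstructs the original space, and taking the largest weight per row assigns each document to a cluster.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (docs x terms) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    W = rng.random((n_docs, k)) + eps
    H = rng.random((k, n_terms)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy document-term counts: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3.
V = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 1.0],
    [0.0, 0.0, 1.0, 4.0],
])

W, H = nmf(V, k=2)
reconstruction = W @ H          # back in the original document-term space
clusters = W.argmax(axis=1)     # dominant latent component per document
error = np.linalg.norm(V - reconstruction)
```

Each row of `H` plays the role of a topic-word weighting, and each row of `W` a topic weighting for one document.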

Solution 2: Draconian text preprocessing
1. Strip punctuation, tags, etc.
2. Convert to ASCII, lowercase
3. Lemmatize
4. Discard 70% of comments, by count of tokens.
Harsh preprocessing is necessary for our model to give good results. Is this a failing?
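The four steps above might look something like the following sketch, which uses only the standard library. Several details are assumptions: the tag regex, the reading of step 4 as "drop the shortest 70% of comments", and the `lemmatize` hook, which is an identity placeholder where a real pipeline would call an actual lemmatizer (for example from NLTK or spaCy).

```python
import re
import string
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def lemmatize(token):
    """Placeholder (assumption): a real pipeline would call a lemmatizer here."""
    return token


def preprocess(comment):
    """Strip tags and punctuation, fold to lowercase ASCII, tokenize, lemmatize."""
    text = TAG_RE.sub(" ", comment)
    # Decompose accented characters, then drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = text.lower().translate(PUNCT_TABLE)
    return [lemmatize(token) for token in text.split()]


def keep_longest(token_lists, keep_fraction=0.3):
    """Keep the longest `keep_fraction` of comments by token count (assumed reading)."""
    n_keep = max(1, round(len(token_lists) * keep_fraction))
    return sorted(token_lists, key=len, reverse=True)[:n_keep]


comments = [
    "turn on particles and use it.",
    "ah",
    "Can confirm. His Twitter is great for a chuckle.",
]
tokenized = [preprocess(c) for c in comments]
kept = keep_longest(tokenized)
```

With only three comments and `keep_fraction=0.3`, a single comment survives: the longest one.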

Solution 3: Only consider specific subreddits
Since the breadth of discourse is so large on Reddit, we limit ourselves to three subreddits traditionally known to be hateful/toxic:
1. r/theredpill
2. r/The_Donald
3. r/CringeAnarchy

/r/theredpill

/r/The_Donald

/r/CringeAnarchy

We now have a way to take subreddits and tell a story about them

What have we achieved?
• A way to reduce dimensionality/cluster a subreddit
  ◦ Tells a compelling story
  ◦ A form of corpus summarization
  ◦ Rich characterization of the subreddit
    ▪ Topic-word distributions
    ▪ Topic-document distributions

Epilogue: Future Directions

Shortcomings
• Distributional hypothesis does not hold strongly.
  ◦ We never really addressed this!
• Draconian text processing
  ◦ Do we have a representative view of the subreddit?
• This system is not specific to hate speech
  ◦ Only for subreddits with a priori expectation of hatefulness!

ICML 2008
https://www.cs.toronto.edu/~amnih/papers/bpmf.pdf
NIPS 2007
https://papers.nips.cc/paper/3208-probabilistic-matrix-factorization.pdf

Thank You! Questions?
https://eigenfoo.xyz
@_eigenfoo
eigenfoo
Blog post and slide deck: eigenfoo.xyz/reddit-slides

Appendix: More Slides!

Links
• GitHub repository:
  ◦ https://github.com/eigenfoo/reddit-clusters
• Blog post on NMF clusters:
  ◦ https://eigenfoo.xyz/reddit-clusters/
• Blog post on unsuitability of LDA:
  ◦ https://eigenfoo.xyz/lda-sucks/
• Blog post on future (matrix factorization) work:
  ◦ https://eigenfoo.xyz/matrix-factorizations/

Number of posts

Number of posts in known hateful subreddits

Distill, Oct. 2016
https://distill.pub/2016/misread-tsne/
tldr:
• t-SNE is very sensitive to the perplexity value
• Multiple t-SNE plots may be necessary

Probabilistic Matrix Factorization
https://eigenfoo.xyz/matrix-factorizations/
Bayesian Probabilistic Matrix Factorization
https://eigenfoo.xyz/matrix-factorizations/
