Slide 1

Modelling Hate Speech on Reddit — A Three-Act Play
George Ho

Slide 2

About me

Slide 3

Prologue: Motivation

Slide 4

Data Science for Social Good
● Mandate: help nonprofits, given their data.
● Data & Society wanted to investigate hate speech on Reddit.

Slide 5

What is Reddit?
● Made up of many communities, called subreddits.
  ○ Each has its own rules and moderators.
● 5th most popular website in the U.S.
● Free speech!

Slide 6

Data
● All Reddit posts and comments, from the site’s launch to the present.
● Important features:
  ○ Text body
  ○ Author name
  ○ Subreddit
  ○ Time
  ○ Number of upvotes/downvotes
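As a rough illustration of how such a dump might be handled (a minimal sketch; the file name and exact column names are assumptions, not the project's actual schema):

```python
import pandas as pd

# Hypothetical CSV of Reddit comments; file and column names are placeholders.
comments = pd.read_csv("reddit_comments.csv")

# Reddit timestamps are typically Unix epochs (seconds).
comments["created"] = pd.to_datetime(comments["created_utc"], unit="s")

# Example: number of comments per subreddit per month.
monthly_counts = (
    comments.groupby([comments["created"].dt.to_period("M"), "subreddit"]).size()
)
```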

Slide 7

USENIX FOCI 2017
https://www.usenix.org/system/files/conference/foci17/foci17-paper-nithyanand.pdf
tl;dr:
● Word embedding + “hate vector” = classifier for hate speech
● Used the classifier to model hatefulness on Reddit
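A minimal sketch of the "word embedding + hate vector" idea, where `embeddings` (a word-to-vector lookup such as word2vec or GloVe) and `hate_lexicon` (a seed list of hateful terms) are assumed placeholders, not the paper's actual resources:

```python
import numpy as np

def mean_vector(words, embeddings):
    """Average the embeddings of the words that are in the vocabulary."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else None

def hate_score(comment, hate_vector, embeddings):
    """Cosine similarity between a comment's mean embedding and the hate vector."""
    v = mean_vector(comment.lower().split(), embeddings)
    if v is None:
        return 0.0
    return float(np.dot(v, hate_vector)
                 / (np.linalg.norm(v) * np.linalg.norm(hate_vector)))

# hate_vector = mean_vector(hate_lexicon, embeddings)
# is_hateful = hate_score(comment, hate_vector, embeddings) > threshold
```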

Slide 8

Some questions
1. What happens to hate speech as subreddits go through takeovers/shutdowns/quarantines?
2. How do real-world events (e.g. elections, disasters) impact hate speech on subreddits?
3. Is there some measure of similarity between subreddits?

Slide 9

Act One: (Rising Action) Supervised Learning

Slide 10

If we could classify hate speech...
1. What does hatefulness look like over time?
2. Can we quantify the hatefulness of certain users?
3. How does the hatefulness of users influence the hatefulness of subreddits?

Slide 11

ACL 2012
https://dl.acm.org/citation.cfm?id=2390688
tl;dr:
● Text classification
● Bigrams + tf-idf + NB/SVM = pretty good!
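A sketch of one common version of this baseline in scikit-learn: plain tf-idf over unigrams and bigrams feeding a linear SVM (not the paper's exact NB/SVM interpolation); `texts` and `labels` are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Unigram + bigram tf-idf features into a linear SVM.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True, min_df=2),
    LinearSVC(C=1.0),
)
# model.fit(texts, labels)
# predictions = model.predict(new_texts)
```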

Slide 12

Toxic comment dataset
● Wikipedia comments
● Labels:
  ○ toxic
  ○ severe_toxic
  ○ obscene
  ○ threat
  ○ insult
  ○ identity_hate

Resulting classifier:
  Precision: 65%
  Recall: 81%
  F1: 72%
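One plausible setup for this dataset is one-vs-rest logistic regression over the six labels, scored with micro-averaged precision/recall/F1. This is an illustrative sketch, not necessarily the classifier behind the numbers above; `train_texts`, `y_train` (a binary indicator matrix), and the test variables are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# One binary logistic regression per label, sharing a single tf-idf representation.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
# clf.fit(train_texts, y_train)
# y_pred = clf.predict(test_texts)
# precision, recall, f1, _ = precision_recall_fscore_support(
#     y_test, y_pred, average="micro"
# )
```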

Slide 13

Fraction of comments classified as hateful

Slide 14

Same, but smoothed

Slide 15

Same, but overlaid with r/The_Donald

Slide 16

Fraction of comments of known hateful subreddits classified as hateful

Slide 17

Same, but smoothed

Slide 18

But… it just learnt to identify curse words
● Almost all comments classified as hate speech contained one or more curse words!
● This makes sense: fairly civil discussion happens on the Wikipedia writers’ forum, so cursing is a very good indicator of abuse.
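One way to catch this failure mode is to inspect the most heavily weighted features of the linear model; a sketch against the hypothetical toxic-comment pipeline from the earlier example (the step names come from `make_pipeline`'s lowercased class names):

```python
import numpy as np

# Inspect the strongest features learned for each label.
vectorizer = clf.named_steps["tfidfvectorizer"]
ovr = clf.named_steps["onevsrestclassifier"]

feature_names = np.array(vectorizer.get_feature_names_out())
for label, estimator in zip(LABELS, ovr.estimators_):
    top = np.argsort(estimator.coef_[0])[-10:][::-1]  # ten largest coefficients
    print(label, feature_names[top])
```

If the top features are dominated by curse words, the model is keying on profanity rather than on hatefulness.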

Slide 19

Supervised learning is the wrong approach to modelling hate speech

Slide 20

Why?
1. Lack of labelled data for Reddit! We only have Wikipedia.
2. Hate speech is a spectrum, not an open-and-shut class.
   a. It ranges from mere insensitive quips to full-on identity hate.
3. Even if it weren’t, hate speech is not conveyed using specific words. Insult and abuse do not lie in word choice!
   a. The distributional hypothesis does not hold as strongly.
   b. Hate speech is inherently a semantic distinction.

Slide 21

Act Two: (Climax) Topic Modelling

Slide 22

Why topic modelling?
1) Topics will give us a holistic view of subreddits, not just hate speech.
2) We get a very rich characterization:
   a) Comments belong to topics
   b) Topics are composed of words
   c) Counting gives us a notion of size

Slide 23

Latent Dirichlet allocation
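A minimal LDA sketch with scikit-learn; `docs`, the list of preprocessed comment strings, is a placeholder and the hyperparameters are illustrative:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# LDA works on raw term counts, not tf-idf.
vectorizer = CountVectorizer(stop_words="english", min_df=5)
# X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(
    n_components=20, learning_method="online", random_state=0
)
# doc_topic = lda.fit_transform(X)   # document-topic distributions
# topic_word = lda.components_       # (unnormalized) topic-word weights
```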

Slide 24

Topic Modelling and t-SNE Visualization
https://shuaiw.github.io/2016/12/22/topic-modeling-and-tsne-visualzation.html
(Diagram labels: LDA, t-SNE)
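The visualization in the linked post amounts to projecting the document-topic matrix down to two dimensions; a sketch using the hypothetical `doc_topic` matrix from the LDA example above:

```python
from sklearn.manifold import TSNE

# Embed each comment's topic distribution in 2-D for plotting.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
# doc_topic_2d = tsne.fit_transform(doc_topic)
# Colouring each point by its argmax topic reveals (or fails to reveal) clusters.
```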

Slide 25

No content

Slide 26

Another latent variable of interest

Slide 27

Some good clusters...
Topic #2: removed com https www https www tax money http watch news
Topic #7: game team season year good win play teams playing best
Topic #13: sure believe trump wrong saying comment post mueller evidence gt
Topic #18: war world country israel countries china military like happy does

Slide 28

… But mostly bad clusters
Topic #0: got just time day like went friend told didn kids
Topic #1: just gt people say right doesn know law like government
Topic #3: people don just like think really good know want things
Topic #4: years time did great ago ve just work life damn
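Topic listings like the ones on this and the previous slide are typically produced by taking each topic's highest-weighted words; a sketch using the hypothetical fitted `lda` and `vectorizer` from the earlier example:

```python
import numpy as np

def top_words(components, feature_names, n=10):
    """Return the n highest-weighted words for each topic."""
    return [
        [feature_names[i] for i in np.argsort(row)[-n:][::-1]]
        for row in components
    ]

# for k, words in enumerate(top_words(lda.components_,
#                                     vectorizer.get_feature_names_out())):
#     print(f"Topic #{k}: {' '.join(words)}")
```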

Slide 29

Flexible topic models are ill-suited for Reddit comments

Slide 30

Why?
1) High variance, low bias: needs lots of (good) data to learn well.
   a) LDA infers document-by-document → short documents don’t help!
2) Even worse than just being short, the documents don’t even coherently talk about a specific topic. Violation of the distributional hypothesis!
3) The breadth of topics on Reddit is massive. Easy for small pockets of hate speech to be drowned out.

Slide 31

Examples of Reddit comments
1. “turn on particles and use it.”
2. “ah”
3. “Can confirm. His Twitter is great for a chuckle.”
4. “I'm basing my knowledge on the fact that I watched the [...] rock fall.”

Slide 32

Problems
1) High variance, low bias
   a) Short documents
2) Don’t even coherently talk about a specific topic
3) Breadth of topics too large

Slide 33

Act Three: (Resolution) Dimensionality Reduction and Text Clustering

Slide 34

Solution 1: NMF vs LDA
● Lower variance, higher bias.
● Strong notions of additivity.
  ○ Part-based decomposition!
● Still gives us a latent space!
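A minimal NMF sketch with scikit-learn, mirroring the earlier LDA one; `docs` is again a placeholder and the hyperparameters are illustrative:

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Unlike LDA, NMF is commonly fit on tf-idf features.
tfidf = TfidfVectorizer(stop_words="english", min_df=5)
# X = tfidf.fit_transform(docs)

nmf = NMF(n_components=20, init="nndsvd", random_state=0)
# W = nmf.fit_transform(X)   # document-topic weights (the latent space)
# H = nmf.components_        # topic-word weights
```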

Slide 35

Dimensionality reduction vs. text clustering
NMF doesn’t just give us a latent space (LDA does that too)... it also gives us an easy way to reconstruct the original space. So it both reduces the dimensionality and clusters!
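Concretely, with `W` (documents × topics) and `H` (topics × words) from the hypothetical NMF sketch above, both the reconstruction and the hard cluster assignment are one-liners:

```python
import numpy as np

X_approx = W @ H                  # low-rank reconstruction of the tf-idf matrix
clusters = np.argmax(W, axis=1)   # each comment's dominant topic, i.e. its cluster
sizes = np.bincount(clusters)     # counting members gives a notion of cluster size
```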

Slide 36

Solution 2: Draconian text preprocessing
1. Strip punctuation, tags, etc.
2. Convert to ASCII, lowercase
3. Lemmatize
4. Discard 70% of comments, by count of tokens.

Harsh preprocessing is necessary for our model to give good results. Is this a failing?
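A sketch of what these steps might look like. spaCy, its `en_core_web_sm` model, and the fixed token cutoff are assumptions for illustration; the deck discards the shortest 70% of comments, which would require a percentile rather than the fixed threshold shown here:

```python
import re
import unicodedata

import spacy

nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

MIN_TOKENS = 10  # illustrative cutoff, not the deck's 70%-of-comments rule

def preprocess(text):
    # Normalize to ASCII and lowercase, strip URLs and non-letters.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    text = re.sub(r"http\S+|[^a-zA-Z\s]", " ", text).lower()
    # Lemmatize and drop stop words.
    tokens = [t.lemma_ for t in nlp(text) if not t.is_stop and not t.is_space]
    # Discard comments that end up too short.
    return tokens if len(tokens) >= MIN_TOKENS else None
```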

Slide 37

Solution 3: Only consider specific subreddits
Since the breadth of discourse is so large on Reddit, we limit ourselves to three subreddits traditionally known to be hateful/toxic:
1. r/theredpill
2. r/The_Donald
3. r/CringeAnarchy

Slide 38

/r/theredpill

Slide 39

/r/The_Donald

Slide 40

/r/CringeAnarchy

Slide 41

We now have a way to take subreddits and tell a story about them

Slide 42

What have we achieved?
● A way to reduce dimensionality / cluster a subreddit
  ○ Tells a compelling story
  ○ A form of corpus summarization
  ○ Rich characterization of the subreddit
    ■ Topic-word distributions
    ■ Topic-document distributions

Slide 43

Epilogue: Future Directions

Slide 44

Shortcomings
● Distributional hypothesis does not hold strongly.
  ○ We never really addressed this!
● Draconian text processing
  ○ Do we have a representative view of the subreddit?
● This system is not specific to hate speech
  ○ Only for subreddits with a priori expectation of hatefulness!

Slide 45

ICML 2008 (Bayesian Probabilistic Matrix Factorization)
https://www.cs.toronto.edu/~amnih/papers/bpmf.pdf
NIPS 2007 (Probabilistic Matrix Factorization)
https://papers.nips.cc/paper/3208-probabilistic-matrix-factorization.pdf
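A minimal probabilistic matrix factorization sketch in PyMC3, assuming the observed matrix `R` is something like an author-by-topic or author-by-subreddit count matrix; the stand-in data, priors, and latent dimension are all illustrative, not the papers' exact models:

```python
import numpy as np
import pymc3 as pm

K = 10                                                    # latent dimension (illustrative)
R = np.random.poisson(2.0, size=(50, 30)).astype(float)   # stand-in observed matrix

with pm.Model() as pmf:
    # Gaussian priors over the latent row and column factors.
    U = pm.Normal("U", mu=0.0, sigma=1.0, shape=(R.shape[0], K))
    V = pm.Normal("V", mu=0.0, sigma=1.0, shape=(R.shape[1], K))
    # Observations are noisy inner products of the latent factors.
    R_obs = pm.Normal("R_obs", mu=pm.math.dot(U, V.T), sigma=1.0, observed=R)
    trace = pm.sample(1000, tune=1000)
```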

Slide 46

Thank You! Questions?
https://eigenfoo.xyz
@_eigenfoo
eigenfoo
Blog post and slide deck: eigenfoo.xyz/reddit-slides

Slide 47

Appendix: More Slides!

Slide 48

Links
● GitHub repository:
  ○ https://github.com/eigenfoo/reddit-clusters
● Blog post on NMF clusters:
  ○ https://eigenfoo.xyz/reddit-clusters/
● Blog post on unsuitability of LDA:
  ○ https://eigenfoo.xyz/lda-sucks/
● Blog post on future (matrix factorization) work:
  ○ https://eigenfoo.xyz/matrix-factorizations/

Slide 49

Number of posts

Slide 50

Number of posts in known hateful subreddits

Slide 51

Distill, Oct. 2016
https://distill.pub/2016/misread-tsne/
tl;dr:
● t-SNE is very sensitive to the perplexity value
● Multiple t-SNE plots may be necessary
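Following that advice, one might fit several embeddings across a range of perplexities and compare the resulting plots; `doc_topic` is again the hypothetical document-topic matrix from the earlier sketches:

```python
from sklearn.manifold import TSNE

# One 2-D embedding per perplexity value; plot them side by side.
embeddings = {
    p: TSNE(n_components=2, perplexity=p, random_state=0).fit_transform(doc_topic)
    for p in (5, 30, 50, 100)
}
```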

Slide 52

No content

Slide 53

No content

Slide 54

Probabilistic Matrix Factorization
https://eigenfoo.xyz/matrix-factorizations/

Bayesian Probabilistic Matrix Factorization
https://eigenfoo.xyz/matrix-factorizations/