Modelling Hate Speech on Reddit — A Three-Act Play

George Ho
November 08, 2017

Reddit is one of the most popular discussion websites today, and is famously broad-minded in what it allows to be said on its forums; however, where there is free speech, there are invariably pockets of hate speech.

In this talk, I present a recent project to model hate speech on Reddit. In three acts, I chronicle the thought processes and stumbling blocks of the project, with each act applying a different form of machine learning: supervised learning, topic modelling and text clustering. I conclude with the current state of the project: a system that allows the modelling and summarization of entire subreddits, and possible future directions. Rest assured that both the talk and the slides have been scrubbed to be safe for work!

Transcript

  1. Modelling Hate Speech on Reddit — A Three-Act Play
    George Ho

  2. About me

  3. Prologue:
    Motivation

  4. Data Science for Social Good
    ● Mandate: help nonprofits, given their data.
    ● Data & Society wanted to investigate hate speech on Reddit.

  5. What is Reddit?
    ● Made up of many communities, called subreddits.
    ○ Each has its own rules and moderators.
    ● 5th most popular website in the U.S.
    ● Free speech!

  6. Data
    ● Reddit posts and comments, spanning the site's entire history.
    ● Important features:
    ○ Text Body
    ○ Author Name
    ○ Subreddit
    ○ Time
    ○ Number of upvotes/downvotes

  7. USENIX FOCI 2017
    https://www.usenix.org/system/files/conference/foci17/foci17-paper-nithyanand.pdf
    tldr:
    ● Word embedding + “hate vector” = classifier for hate speech
    ● Used classifier to model hatefulness on Reddit
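    A minimal sketch of one plausible formulation (the paper's exact recipe may differ): average the embeddings of a hand-picked hate lexicon into a single "hate vector", then flag comments whose mean embedding is close to it by cosine similarity. Here embeddings, hate_lexicon, and the threshold are illustrative placeholders, not values from the paper.

        import numpy as np

        def mean_vector(tokens, embeddings):
            """Average the word vectors of all in-vocabulary tokens."""
            vecs = [embeddings[t] for t in tokens if t in embeddings]
            return np.mean(vecs, axis=0) if vecs else None

        def is_hateful(tokens, embeddings, hate_vector, threshold=0.5):
            """Flag a comment whose mean embedding lies close to the hate vector."""
            v = mean_vector(tokens, embeddings)
            if v is None:
                return False
            cosine = v @ hate_vector / (np.linalg.norm(v) * np.linalg.norm(hate_vector))
            return cosine > threshold

        # embeddings: a {word: vector} mapping, e.g. pretrained word2vec vectors.
        # hate_lexicon: a hand-picked seed list of hateful terms (placeholder).
        # hate_vector = mean_vector(hate_lexicon, embeddings)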

  8. Some questions
    1. What happens to hate speech as subreddits go through takeovers/shutdowns/quarantines?
    2. How do real-world events (e.g. elections, disasters) impact hate speech on subreddits?
    3. Is there some measure of similarity between subreddits?

  9. Act One: (Rising Action)
    Supervised Learning

  10. If we could classify hate speech...
    1. What does hatefulness look like over time?
    2. Can we quantify the hatefulness of certain users?
    3. How does the hatefulness of users influence the hatefulness of subreddits?

  11. ACL 2012
    https://dl.acm.org/citation.cfm?id=2390688
    tldr:
    ● Text classification
    ● Bigrams + tf-idf + NB/SVM = pretty good!
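    This baseline fits in a few lines of scikit-learn; a sketch, with train_comments and train_labels as placeholder data:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        # Unigrams and bigrams, tf-idf weighted; swap MultinomialNB for
        # sklearn.svm.LinearSVC to get the SVM variant of the baseline.
        model = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
            MultinomialNB(),
        )
        # model.fit(train_comments, train_labels)
        # predictions = model.predict(test_comments)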

  12. Toxic comment dataset
    ● Wikipedia comments
    ● Labels:
    ○ toxic
    ○ severe_toxic
    ○ obscene
    ○ threat
    ○ insult
    ○ identity_hate
    Resulting classifier: precision 65%, recall 81%, F-1 72%
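    One way to reproduce such a classifier is one-vs-rest over the six labels; a sketch, assuming CSV files with a comment_text column and one 0/1 column per label (the file and column names are assumptions):

        import pandas as pd
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics import precision_recall_fscore_support
        from sklearn.multiclass import OneVsRestClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        LABELS = ["toxic", "severe_toxic", "obscene",
                  "threat", "insult", "identity_hate"]

        train = pd.read_csv("toxic_train.csv")  # assumed filenames and layout
        test = pd.read_csv("toxic_test.csv")

        # One binary tf-idf + SVM classifier per label.
        model = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),
            OneVsRestClassifier(LinearSVC()),
        )
        model.fit(train["comment_text"], train[LABELS])

        # Micro-averaged precision/recall/F-1 over all six labels.
        precision, recall, f1, _ = precision_recall_fscore_support(
            test[LABELS], model.predict(test["comment_text"]), average="micro"
        )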

  13. Fraction of comments classified as hateful

  14. Same, but smoothed

  15. Same, but overlaid with r/The_Donald

  16. Fraction of comments of known hateful subreddits classified as hateful

  17. Same, but smoothed

  18. But… It just learnt to identify curse words
    ● Almost all comments classified as hate speech contained one or more curse words!
    ● This makes sense: fairly civil discussion happens on the Wikipedia writers’ forum, so cursing is a very good indicator of abuse.

  19. Supervised learning is the wrong approach to modelling hate speech

  20. Why?
    1. Lack of labelled data for Reddit! We only have Wikipedia.
    2. Hate speech is a spectrum, not an open-and-shut class.
    a. It ranges from mere insensitive quips to full-on identity hate.
    3. Even if it weren’t, hate speech is not conveyed using specific words. Insult and abuse do not lie in word choice!
    a. The distributional hypothesis does not hold as strongly.
    b. Hate speech is inherently a semantic distinction.

  21. Act Two: (Climax)
    Topic Modelling

  22. Why topic modelling?
    1) Topics will give us a holistic view of subreddits, not just hate speech.
    2) We get a very rich characterization:
    a) Comments belong to topics
    b) Topics are composed of words
    c) Counting gives us a notion of size

  23. Latent Dirichlet allocation
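    For reference, the standard LDA generative story (the slide presumably shows the plate diagram), with documents d, topics k, and token positions n:

        \begin{align*}
        \theta_d &\sim \mathrm{Dirichlet}(\alpha)
            && \text{topic mixture of document } d \\
        \phi_k &\sim \mathrm{Dirichlet}(\beta)
            && \text{word distribution of topic } k \\
        z_{dn} &\sim \mathrm{Categorical}(\theta_d)
            && \text{topic of the } n\text{-th token} \\
        w_{dn} &\sim \mathrm{Categorical}(\phi_{z_{dn}})
            && \text{the observed token}
        \end{align*}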

  24. Topic Modelling and t-SNE Visualization
    https://shuaiw.github.io/2016/12/22/topic-modeling-and-tsne-visualzation.html
    Pipeline: LDA → t-SNE
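    Roughly, the pipeline from the linked post: fit LDA on bag-of-words counts (LDA wants raw term counts, not tf-idf), then embed the document-topic vectors in 2D with t-SNE. A sketch, with comments as a placeholder list of strings and illustrative hyperparameters:

        from sklearn.decomposition import LatentDirichletAllocation
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.manifold import TSNE

        counts = CountVectorizer(stop_words="english", min_df=5).fit_transform(comments)
        lda = LatentDirichletAllocation(n_components=20, random_state=0)
        doc_topics = lda.fit_transform(counts)  # shape: (n_docs, n_topics)

        # Embed the document-topic vectors in two dimensions for plotting.
        xy = TSNE(n_components=2, perplexity=30).fit_transform(doc_topics)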

  25. [figure slide]

  26. Another latent variable of interest

  27. Some good clusters...
    Topic #2:
    removed com https www https www tax money http watch news
    Topic #7:
    game team season year good win play teams playing best
    Topic #13:
    sure believe trump wrong saying comment post mueller evidence gt
    Topic #18:
    war world country israel countries china military like happy does

  28. … But mostly bad clusters
    Topic #0:
    got just time day like went friend told didn kids
    Topic #1:
    just gt people say right doesn know law like government
    Topic #3:
    people don just like think really good know want things
    Topic #4:
    years time did great ago ve just work life damn

  29. Flexible topic models are ill-suited for Reddit comments

  30. Why?
    1) High variance, low bias: need lots of (good) data to learn well
    a) LDA infers document-by-document → short documents don’t help!
    2) Even worse than just short documents: they don’t even coherently talk about a specific topic. Violation of the distributional hypothesis!
    3) Breadth of topics is massive on Reddit. Easy for small pockets of hate speech to be drowned out.

  31. Examples of Reddit comments
    1. “turn on particles and use it.”
    2. “ah”
    3. “Can confirm. His Twitter is great for a chuckle.”
    4. “I'm basing my knowledge on the fact that I watched the [...] rock fall.”

  32. Problems
    1) High variance, low bias
    a) Short documents
    2) Don’t even coherently talk about a specific topic
    3) Breadth of topics too large.

  33. Act Three: (Resolution)
    Dimensionality Reduction and Text Clustering

  34. Solution 1: NMF vs LDA
    ● Lower variance, higher bias.
    ● Strong notions of additivity.
    ○ Part-based decomposition!
    ● Still gives us latent space!

  35. Dimensionality reduction vs. text clustering
    NMF doesn’t just give us a latent space (LDA does that too)... It also gives us an easy way to reconstruct the original space. So it both reduces the dimensionality and clusters! (See the sketch below.)
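    A sketch of this in scikit-learn (comments and the component count are placeholders): NMF factorizes the tf-idf matrix as X ≈ WH, so W embeds comments in the latent topic space, the rows of H are the topics, and an argmax over W gives a hard clustering.

        from sklearn.decomposition import NMF
        from sklearn.feature_extraction.text import TfidfVectorizer

        vectorizer = TfidfVectorizer(stop_words="english", min_df=5)
        X = vectorizer.fit_transform(comments)  # (n_docs, n_terms), nonnegative

        nmf = NMF(n_components=20, random_state=0)
        W = nmf.fit_transform(X)  # document-topic weights
        H = nmf.components_       # topic-term weights

        # Reconstruction of the original space: X is approximately W @ H.
        # A hard clustering falls out by assigning each comment to its heaviest topic.
        clusters = W.argmax(axis=1)

        # Top ten words per topic, read off the rows of H.
        terms = vectorizer.get_feature_names_out()
        top_words = [terms[row.argsort()[-10:][::-1]] for row in H]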

  36. Solution 2: Draconian text preprocessing
    1. Strip punctuation, tags, etc.
    2. Convert to ASCII, lowercase
    3. Lemmatize
    4. Discard 70% of comments, by count of tokens.
    Harsh preprocessing is necessary for our model to give good results. Is this a failing? (A sketch of the pipeline follows.)
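    A sketch of the four steps, using NLTK's WordNet lemmatizer as one choice among several; the 70% discard is interpreted here as dropping the shortest comments via a quantile cutoff, which is an assumption about what the slide means:

        import re
        import unicodedata

        import numpy as np
        from nltk.stem import WordNetLemmatizer  # needs nltk.download("wordnet")

        lemmatizer = WordNetLemmatizer()

        def preprocess(comment):
            # 1. Strip markup tags and punctuation.
            text = re.sub(r"<[^>]+>", " ", comment)
            text = re.sub(r"[^\w\s]", " ", text)
            # 2. Convert to ASCII and lowercase.
            text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
            # 3. Lemmatize.
            return [lemmatizer.lemmatize(tok) for tok in text.lower().split()]

        tokenized = [preprocess(c) for c in comments]  # comments: placeholder corpus

        # 4. Discard the shortest 70% of comments, by count of tokens.
        cutoff = np.quantile([len(toks) for toks in tokenized], 0.7)
        tokenized = [toks for toks in tokenized if len(toks) >= cutoff]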

  37. Solution 3: Only consider specific subreddits
    Since the breadth of discourse is so large on Reddit, we limit ourselves to three subreddits traditionally known to be hateful/toxic:
    1. r/theredpill
    2. r/The_Donald
    3. r/CringeAnarchy

  38. /r/theredpill

  39. /r/The_Donald

  40. /r/CringeAnarchy

  41. We now have a way to take subreddits and tell a story about them

  42. What have we achieved?
    ● A way to reduce dimensionality/cluster a subreddit
    ○ Tells a compelling story
    ○ A form of corpus summarization
    ○ Rich characterization of the subreddit
    ■ Topic-word distributions
    ■ Topic-document distributions

  43. Epilogue:
    Future Directions

  44. Shortcomings
    ● Distributional hypothesis does not hold strongly.
    ○ We never really addressed this!
    ● Draconian text preprocessing
    ○ Do we have a representative view of the subreddit?
    ● This system is not specific to hate speech
    ○ Only for subreddits with a priori expectation of hatefulness!

  45. ICML 2008: Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo
    https://www.cs.toronto.edu/~amnih/papers/bpmf.pdf
    NIPS 2007: Probabilistic Matrix Factorization
    https://papers.nips.cc/paper/3208-probabilistic-matrix-factorization.pdf

  46. Thank You!
    Questions?
    https://eigenfoo.xyz
    @_eigenfoo
    eigenfoo
    Blog post and slide deck:
    eigenfoo.xyz/reddit-slides

  47. Appendix:
    More Slides!

  48. Links
    ● GitHub repository:
    ○ https://github.com/eigenfoo/reddit-clusters
    ● Blog post on NMF clusters:
    ○ https://eigenfoo.xyz/reddit-clusters/
    ● Blog post on unsuitability of LDA:
    ○ https://eigenfoo.xyz/lda-sucks/
    ● Blog post on future (matrix factorization) work:
    ○ https://eigenfoo.xyz/matrix-factorizations/

  49. Number of posts

  50. Number of posts in known hateful subreddits

  51. Distill, Oct. 2016
    https://distill.pub/2016/misread-tsne/
    tldr:
    ● t-SNE is very sensitive to the perplexity value
    ● Multiple t-SNE plots may be necessary
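    Following that advice, a quick sketch that re-embeds the same document-topic vectors (doc_topics, as in the LDA sketch earlier) under several perplexities:

        import matplotlib.pyplot as plt
        from sklearn.manifold import TSNE

        perplexities = [5, 30, 50, 100]
        fig, axes = plt.subplots(1, len(perplexities), figsize=(16, 4))
        for ax, perp in zip(axes, perplexities):
            xy = TSNE(n_components=2, perplexity=perp).fit_transform(doc_topics)
            ax.scatter(xy[:, 0], xy[:, 1], s=2)
            ax.set_title(f"perplexity = {perp}")
        plt.show()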

  52. [figure slide]

  53. [figure slide]

  54. Probabilistic Matrix Factorization
    https://eigenfoo.xyz/matrix-factorizations/
    Bayesian Probabilistic Matrix Factorization
    https://eigenfoo.xyz/matrix-factorizations/
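    For reference, the PMF model from the NIPS 2007 paper, in its usual form, with R the observed matrix and U, V the latent factor matrices:

        \begin{align*}
        p(R \mid U, V, \sigma^2)
            &= \prod_{(i,j)\ \text{observed}} \mathcal{N}\!\left(R_{ij} \mid U_i^\top V_j,\ \sigma^2\right) \\
        p(U \mid \sigma_U^2)
            &= \prod_i \mathcal{N}\!\left(U_i \mid 0,\ \sigma_U^2 I\right),
            \qquad
            p(V \mid \sigma_V^2)
            = \prod_j \mathcal{N}\!\left(V_j \mid 0,\ \sigma_V^2 I\right)
        \end{align*}

    MAP estimation in this model reduces to minimizing a regularized squared error; BPMF (ICML 2008) goes fully Bayesian by placing priors on the Gaussian hyperparameters and sampling with MCMC.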
