A Look Into Reddit's Star Dish

An analysis of comments on Reddit.

CM Tech
Spring 2017
Cornell Tech

Frances Coronel

May 04, 2017

Transcript

  1. A Look Into Reddit’s Star Dish
    Sindhu Babu
    Frances Coronel

  2. Table of Contents
    1. Background
    2. Literature
    3. Hypotheses
    4. Dataset
    5. Analysis

  3. 1
    Background
    Human & social motivation, high-level research questions
    Since the launch of Reddit in June 2005, the site has become the 7th
    most visited in the U.S., and its users have posted billions of
    comments.
    Those comments are filled with abbreviations, internet memes and
    slang, much like the rest of the web, and collectively they form a
    trove of data about how people use language online.
    To Note
    As of the end of 2015, the site’s visitors were mostly 35 or younger
    and about 80 percent male, according to Google AdWords.

  4. 2
    Literature
    The connective media theories associated with this data analysis
    Some of the topics covered...
    ▪ Emotional Contagion
    ▪ Group Polarization
    ▪ Meforming versus Informing

  5. Emotional Contagion
    The capacity to spread emotions quickly throughout the online world

  6. Group Polarization
    Individuals tending to endorse a more extreme position in the direction
    already favored by the group

  7. Meformers vs Informers
    Users who typically post messages about themselves or their own thoughts,
    versus users who post messages that are informative in nature

  8. In other words, what components make up
    Reddit’s secret dish of comments, and what
    allows them as a whole to succeed in a
    digital world where many platforms fail to
    regulate such discussion systems?

  9. 3
    Hypotheses
    High-level research questions
    1. What kind of communication style in
    comments drives the highest reply rates?
    Passive, assertive, aggressive, or sarcastic
    sentiment?
    2. What kind of information style drives the
    highest reply rates? Meforming or informing?

  10. What kind of sentiment in comments drives the highest reply rates?
    Passive, assertive, aggressive, or sarcastic sentiment?
    Hypothesis: There will be a positive correlation between response rate
    and level of aggression.

  11. What kind of information style drives the highest reply rates?
    Meforming or informing?
    Hypothesis: There will be a positive correlation between response rate
    and meforming.

  12. 4
    Dataset
    Introduce the service & dataset you looked at
    Source: Kaggle

  13. 30GB
    Recently Reddit released an enormous dataset containing all ~1.7 billion of
    their publicly available comments. The full dataset is more than 1 terabyte
    uncompressed, so Kaggle decided to share just a small portion, the
    comments from May 2015, for folks like connective media students to
    tinker with (8 GB compressed, 30 GB uncompressed).
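One way to tinker with a sample like this is to count direct replies per comment, since reply rates are what the research questions ask about. The sketch below assumes a SQLite table named `comments` with `name`, `parent_id`, `body`, and `score` columns, modeled on the fields found in public Reddit comment dumps. The table name, the toy rows, and the `reply_counts` helper are illustrative assumptions, not the pipeline used for this deck.

```python
import sqlite3


def reply_counts(conn):
    """Return {comment name: number of direct replies} for every comment."""
    rows = conn.execute(
        """
        SELECT c.name, COUNT(r.name) AS replies
        FROM comments AS c
        LEFT JOIN comments AS r ON r.parent_id = c.name
        GROUP BY c.name
        """
    ).fetchall()
    return dict(rows)


# Toy in-memory rows standing in for the real sample (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (name TEXT, parent_id TEXT, body TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?)",
    [
        ("t1_a", "t3_post", "Top-level comment", 5),
        ("t1_b", "t1_a", "First reply", 2),
        ("t1_c", "t1_a", "Second reply", 1),
        ("t1_d", "t1_b", "Nested reply", 3),
    ],
)

counts = reply_counts(conn)
print(counts)
```

The self-join matches each comment's `name` against other comments' `parent_id`, so a comment with no replies still appears with a count of zero.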

  14. 5
    Analysis
    Describe how you addressed the questions with the data and talk about
    the results
    ▪ Sentiment Analysis
    - 4 styles
    - Aggressive, Assertive, Passive, Sarcastic
    - Identified keywords that are representative of these
    communication styles
    ▪ Meforming versus Informing
    - Identified keywords which might denote meforming
    - All other comments are identified as informing
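The keyword approach described above could look roughly like the following. This is a minimal sketch: the keyword lists, the tie-breaking rule, and the function names are hypothetical, since the deck does not list the actual keywords or matching rules that were used.

```python
import re

# Hypothetical keyword lists; the deck does not publish its actual keywords.
STYLE_KEYWORDS = {
    "aggressive": {"stupid", "idiot", "shut"},
    "assertive": {"believe", "recommend", "should"},
    "passive": {"maybe", "sorry", "guess"},
    "sarcastic": {"yeah", "sure", "obviously"},
}
MEFORMING_KEYWORDS = {"i", "me", "my", "mine", "myself"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def communication_style(text):
    """Return the style with the most keyword hits, or None if nothing matches.

    Ties go to whichever style is listed first in STYLE_KEYWORDS.
    """
    tokens = tokenize(text)
    hits = {
        style: sum(token in words for token in tokens)
        for style, words in STYLE_KEYWORDS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def information_style(text):
    """Label a comment 'meforming' if it has a meforming keyword, else 'informing'."""
    return "meforming" if MEFORMING_KEYWORDS & set(tokenize(text)) else "informing"


example = "You should read the docs, I recommend them"
print(communication_style(example), information_style(example))
```

Treating every comment without a meforming keyword as informing mirrors the rule stated on the slide. A comment that matches no style keyword is left unlabeled here, which is a design choice of this sketch.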

  15. Communication Style Analysis
    Based on these results, our hypothesis of a positive correlation between
    aggression and reply rates is rejected.
    However, the results do show a positive correlation between aggression
    and the number of upvotes.
    Aggressive comments had the highest number of upvotes, with a ranking score of 6.45,
    which is ~11% higher than the second best, assertive comments.
    In turn, assertive comments had the highest reply rates, with nearly 90,000 comments,
    which is roughly double (2x) the count for the next best, aggressive comments.
    Sarcasm, in contrast, rarely received high scores.
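Percentage comparisons like these are easy to misread: a value that is 2x another is 100% higher, not 200% higher, while 6x corresponds to 500% higher. The small sketch below shows the arithmetic. The helper name and the sample counts are hypothetical and are not figures from the analysis.

```python
def percent_higher(value, baseline):
    """Return how much higher `value` is than `baseline`, as a percentage."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Hypothetical counts chosen only to illustrate the arithmetic.
print(percent_higher(200, 100))  # 100.0 -> twice as large is 100% higher
print(percent_higher(600, 100))  # 500.0 -> six times as large is 500% higher
```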

  16. Meforming versus Informing
    Based on these results, interestingly enough, our hypothesis of a positive
    correlation between meforming and reply rates is rejected.
    Meforming fared much worse on reply rates but, surprisingly, scored
    slightly higher on number of upvotes.
    Meforming comments had the highest number of upvotes, with a ranking
    score of 5.68, which is only ~1% higher than informing.
    Informing comments had the highest reply rates, with over 1 million comments,
    which is a staggering ~500% higher than meforming.

  17. 6
    Conclusions
    Describe how you addressed the questions with the data and talk about
    the results
    ▪ A user on Reddit is more likely to have a
    higher reply rate for a comment that is
    assertive and informing.
    ▪ In turn, it can also be concluded that a user on
    Reddit is less likely to have a higher reply rate
    for a comment that is sarcastic and
    meforming.

  18. Credits
    Special thanks to all the people who made and released these awesome
    resources for free.
    ▪ Presentation template by SlidesCarnival
    ▪ Dataset provided by Kaggle
    ▪ The brains of Sindhu Babu & Frances Coronel
    ▪ See our report for academic references
