Can Cascades Be Predicted?

Presented at WWW 2014.

On many social networking web sites such as Facebook and Twitter, resharing or reposting functionality allows users to share others' content with their own friends or followers. As content is reshared from user to user, large cascades of reshares can form. In this work, we develop a framework for addressing cascade prediction problems. On a large sample of photo reshare cascades on Facebook, we find strong performance in predicting whether a cascade will continue to grow in the future. We find that the relative growth of a cascade becomes more predictable as we observe more of its reshares, that temporal and structural features are key predictors of cascade size, and that, initially, breadth rather than depth in a cascade is a better indicator of larger cascades. This prediction performance is robust in the sense that multiple distinct classes of features all achieve similar performance. Observing independent cascades of the same content, we find that while these cascades differ greatly in size, we are still able to predict which ends up the largest.

Justin Cheng

April 11, 2014

Transcript

  1. Can Cascades be predicted?
    Justin Cheng (Stanford), Lada Adamic (Facebook), Alex Dow (Facebook), Jon Kleinberg (Cornell), Jure Leskovec (Stanford)

  2. We live in networks

  3. Networks enable the diffusion
    and flow of information
    news
    rumors
    product adoption
    disease
    mobilization

  4. An information cascade

  5. Example: Reshares on Facebook

  6. Cascades form as people (re)share
    information with one another.

  7. How have cascades been studied?
    • Will information ever get shared?
      Petrovic, S., Osborne, M., & Lavrenko, V. (2011). RT to Win! Predicting Message Propagation in Twitter. ICWSM 2011.
    • Will popular content remain popular?
      Ma, Z., Sun, A., & Cong, G. (2013). On predicting the popularity of newly emerging hashtags in Twitter. JASIST 2013.
    • What do large cascades look like?
      Dow, P. A., Adamic, L. A., & Friggeri, A. (2013). The Anatomy of Large Facebook Cascades. ICWSM 2013.
    • How will a cascade grow in the future?

  8. Can we predict how a cascade
    will grow in the future?
    recommend content, identify trends, predict reach
    Hard problem?

  9. Difficulty #1: Large cascades are rare
    [Plot: empirical CCDF (0 to 1) against cascade size (0 to 1000)]
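
The empirical CCDF plotted on this slide can be computed directly from a list of cascade sizes. The following is a minimal sketch, not the authors' code; the function name `empirical_ccdf` and the sample sizes are made up for illustration.

```python
from collections import Counter


def empirical_ccdf(sizes):
    """Return sorted (x, P(X >= x)) pairs for the observed cascade sizes."""
    n = len(sizes)
    counts = Counter(sizes)
    remaining = n  # number of observations with value >= the current x
    ccdf = []
    for x in sorted(counts):
        ccdf.append((x, remaining / n))
        remaining -= counts[x]
    return ccdf


# Hypothetical cascade sizes (not Facebook data): mostly small, one large.
sizes = [1, 1, 1, 1, 2, 2, 3, 5, 8, 40]
for x, p in empirical_ccdf(sizes):
    print(x, p)
```

With heavy-tailed data like this, the curve drops quickly: only a small fraction of cascades reach the larger sizes.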

  10. Difficulty #2: Same content, different popularity

  11. Increasing the strength of social influence
    increased both inequality and
    unpredictability of success.
    Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market.

  12. Cascades are predictable.
    (*solution: cascade growth prediction problem)
    This Talk: size, structure, effect of content

  13. How do we formulate the cascade
    growth prediction problem?

  14. Possibility #1: Will a cascade get k=100 reshares?
    Most cascades are small
    (class imbalance)

  15. Possibility #2: How big will a small cascade get?
    Most cascades are small
    (outliers skew results)

  16. Possibility #3: Just look at the large cascades?
    Selection bias
    (only considers small subset of data)

  17. Our Approach: Will a cascade reach the median size?
    k reshares observed → less than the median f(k), or more than the median f(k)?

  18. Empirical median cascade size, f(k)
    [Plot: f(k) (0 to 50) against number of reshares observed, k (5 to 25)]

  19. Let X = { cascades of size ≥ k }.
    Then the median of X is 2k.
    (see paper)
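
The slide defers the argument to the paper. One way to see where 2k can come from, as a sketch under the assumption (not stated on the slide) that cascade size S has a continuous power-law tail with exponent α close to 2:

```latex
% Assumption: cascade size S has a continuous power-law tail with exponent \alpha > 1.
\begin{align*}
\Pr(S \ge x \mid S \ge k) &= \left(\frac{x}{k}\right)^{-(\alpha - 1)}, \qquad x \ge k, \\
% The conditional median m solves \Pr(S \ge m \mid S \ge k) = 1/2:
\left(\frac{m}{k}\right)^{-(\alpha - 1)} = \frac{1}{2}
\;&\Longrightarrow\; m = 2^{1/(\alpha - 1)}\, k, \\
\alpha = 2 \;&\Longrightarrow\; m = 2k .
\end{align*}
```

Under this assumption the median of cascades that reached size k is a fixed multiple of k, which is what makes "will it double?" a balanced question at every k.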

  20. Will a cascade reach the median size?
    k reshares observed → less than the median f(k), or more than the median f(k)?

  21. Will a cascade double in size?
    k reshares observed → ≤ 2k reshares, or > 2k reshares? (f(k) = 2k)

  22. Cascade Growth Prediction Problem
    Given that a cascade has obtained
    k reshares, will it double?
    Balanced; can track growth over time
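
The problem statement translates into a simple labeling rule: keep only cascades that reached at least k reshares, and label each one by whether it ended with more than 2k. The following is a minimal sketch of that rule, not the authors' pipeline; `doubling_labels` and the sample sizes are hypothetical.

```python
def doubling_labels(final_sizes, k):
    """Label each cascade that reached at least k reshares by whether it ended above 2k."""
    eligible = [s for s in final_sizes if s >= k]
    return [s > 2 * k for s in eligible]


# Hypothetical final cascade sizes (not Facebook data).
final_sizes = [5, 6, 9, 10, 11, 14, 25, 60, 3, 4]
labels = doubling_labels(final_sizes, k=5)

# Fraction of eligible cascades that doubled; close to 0.5 means balanced classes.
positive_rate = sum(labels) / len(labels)
print(labels, positive_rate)
```

Because the threshold moves with k, the same rule can be re-applied as more reshares are observed, which is what lets the problem track growth over time.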

  23. We looked at photos uploaded to
    Facebook in June 2013 that obtained at
    least 5 reshares, and tracked reshares of
    these photos for 28 days following their
    initial uploads.
    Using features of the cascade, we
    evaluate the performance of a classifier.
    150k photos, 9m reshares

  24. What factors affect predictability?
    Content (e.g. has overlaid text)
    User (e.g. friend count)
    Structural (e.g. proximity to root in G)
    Temporal (e.g. time between reshares)

  25. How well can we predict cascade doubling?
    Accuracy (k=5), by feature set:
    All: 0.795
    Temporal: 0.78
    All but temporal: 0.722
    Structural: 0.671
    User: 0.637
    Content: 0.558

  26. Cascade Growth Prediction Problem
    Given that a cascade has obtained
    k reshares, will it double?

  27. How does performance
    change with k?
    5 reshares → > 10 reshares? vs. 20 reshares → > 40 reshares?

  28. How does performance
    change with k?
    5 reshares → > 10 reshares? (less data) vs. 20 reshares → > 40 reshares? (more data)

  29. How does performance
    change with k?
    5 reshares → > 10 reshares? (short-term prediction) vs. 20 reshares → > 40 reshares? (long-term prediction)

  30. Easier to predict if larger cascades will
    double in size
    [Plot: accuracy (0.78 to 0.82) against number of reshares observed, k (0 to 100)]

  31. Fix the minimum cascade size R ≥ k.
    How does performance change with k?
    5 reshares → > 40 reshares? vs. 20 reshares → > 40 reshares?

  32. Fix the minimum cascade size R ≥ k.
    How does performance change with k?
    5 reshares → > 40 reshares? vs. 20 reshares → > 40 reshares?
    Is there a saturation effect?

  33. More information about a cascade is always
    better, with no saturation effect
    [Plot: accuracy (0.5 to 0.9) against number of reshares observed, k (0 to 100), assuming minimum cascade size R = 100]

  34. How do various factors affect
    predictability?

  35. The original post (and poster) get less
    important with increasing k
    [Plot: correlation (0 to 0.28) of original poster’s friend count, against minimum cascade size k (0 to 100)]
    [Plot: correlation (0 to 0.12) of whether the photo is a meme, against minimum cascade size k (0 to 100)]

  36. Successful cascades get many views quickly,
    but with low, or high conversion rates?
    [Plot: correlation (0 to 0.26) of unique views per unit time, against minimum cascade size k (0 to 100)]
    [Plot: correlation (-1 to 1) of # users who saw the first k reshares, against minimum cascade size k (0 to 100); left open as a question]

  37. Successful cascades get many views quickly,
    and achieve high conversion rates
    [Plot: correlation (0 to 0.26) of unique views per unit time, against minimum cascade size k (0 to 100)]
    [Plot: correlation (-0.18 to 0) of # users who saw the first k reshares, against minimum cascade size k (0 to 100)]

  38. How does the initial structure
    of a cascade affect its growth?

  40. User (e.g. you or me)
    Page (e.g. brands, celebrities)

  41. Initial structure matters
    [Plot: proportion above median (0.1 to 0.6), for Users and Pages]


  42. Probability of doubling
    Your friends weren’t interested
    Only your friends were interested
    Broad appeal

  43. Can we predict the structure?

  44. Wiener index (mean all-pairs shortest distance)
    Anderson, A., Goel, S., Hofman, J. & Watts, D. J. The structural virality of online diffusion.
    d = 1.98, d = 2.47, d = 14.4
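
The measure named on this slide, the mean shortest-path distance over all pairs of nodes in a cascade, can be computed with one breadth-first search per node. The following is a minimal sketch for small cascades, assuming the cascade is given as an undirected edge list; `mean_pairwise_distance` and the two toy cascades are hypothetical and are not the examples pictured on the slide.

```python
from collections import deque


def mean_pairwise_distance(edges):
    """Average shortest-path length over all unordered node pairs of a connected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    if n < 2:
        raise ValueError("need at least two nodes")
    total = 0
    for src in adj:
        # Breadth-first search gives hop distances from src to every other node.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != n:
            raise ValueError("graph must be connected")
        total += sum(dist.values())
    # Every unordered pair was counted twice (once from each endpoint).
    return total / (n * (n - 1))


# Toy cascades: a broadcast (star) and a chain, each with five nodes.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(mean_pairwise_distance(star), mean_pairwise_distance(chain))
```

A star-shaped broadcast stays below 2 no matter how large it gets, while chains and deep trees score higher, which is why the measure separates hub-driven cascades from person-to-person spread.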

  45. Page cascades are driven by hub nodes
    [Plot: Wiener index (0 to 40) against cascade size (bins from [1, 10) to [10000, 100000)), for Users and Pages]

  46. Will a cascade have structural virality above
    the median?
    Accuracy of 0.725
    Temporal and structural features
    equally predictive
    Pages are easier than users to predict

  47. Increasing the strength of social influence
    increased both inequality and
    unpredictability of success.
    Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market.

  50. Can we differentiate cascades
    of the same content?

  51. We consider clusters of identical
    photos uploaded to Facebook. For
    each cluster, we select ten random
    cascades, and predict which was
    the largest (random guessing: 10%).
    983 clusters, 38k photo uploads, 13m reshares

  52. Can we predict the largest of 10 randomly
    chosen cascades of the same image?
    Accuracy of 0.497
    Mean Reciprocal Rank of 0.662
    Gini Coefficient of 0.787
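
Two of the metrics on this slide have short standard definitions. The following is a minimal sketch of both, not the authors' evaluation code. It assumes each prediction yields the 1-based rank given to the truly largest cascade, and it assumes the Gini coefficient is taken over cascade sizes (the slide does not say what it is computed over). All names and inputs are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Mean of 1/rank, where rank is the 1-based position given to the true largest cascade."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def gini(values):
    """Gini coefficient of non-negative values: 0 means equal, near 1 means concentrated."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Standard formula over ascending values x_1..x_n with 1-based index i.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical inputs: ranks from three clusters, sizes from one cluster.
print(mean_reciprocal_rank([1, 2, 4]))
print(gini([0, 0, 0, 10]))
```

A high Gini coefficient indicates that a few cascades account for most reshares, which is the sense in which cascades of the same content differ greatly in size.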

  53. Next: how do cascades interact?

  54. How do I make my posts “go viral”?
    • Favor memes (i.e. post popular content)
    • Be a page and have lots of followers (i.e. be popular)
    • Know what your friends like, and what your friends’ friends
    like (i.e. be a marketing guru)

  55. Cascades are predictable.
    The cascade growth prediction
    problem allows us to accurately
    predict the future growth of a
    cascade.
    Justin Cheng Lada Adamic Alex Dow Jon Kleinberg Jure Leskovec
    (Technology Review has a good summary: http://bit.ly/trcascades)