Can Cascades Be Predicted?

Presented at WWW 2014.

On many social networking web sites such as Facebook and Twitter, resharing or reposting functionality allows users to share others' content with their own friends or followers. As content is reshared from user to user, large cascades of reshares can form. In this work, we develop a framework for addressing cascade prediction problems. On a large sample of photo reshare cascades on Facebook, we find strong performance in predicting whether a cascade will continue to grow in the future. We find that the relative growth of a cascade becomes more predictable as we observe more of its reshares, that temporal and structural features are key predictors of cascade size, and that initially, breadth, rather than depth, in a cascade is a better indicator of larger cascades. This prediction performance is robust in the sense that multiple distinct classes of features all achieve similar performance. Observing independent cascades of the same content, we find that while these cascades differ greatly in size, we are still able to predict which ends up the largest.

Justin Cheng

April 11, 2014

Transcript

  1. Can Cascades Be Predicted?
    Justin Cheng (Stanford), Lada Adamic (Facebook), Alex Dow (Facebook), Jon Kleinberg (Cornell), Jure Leskovec (Stanford)
  2. We live in networks

  3. Networks enable the diffusion and flow of information: news, rumors, product adoption, disease, mobilization
  4. An information cascade

  5. Example: Reshares on Facebook

  6. Cascades form as people (re)share information with one another.

  7. How have cascades been studied?
    • Will information ever get shared?
      Petrovic, S., Osborne, M., & Lavrenko, V. (2011). RT to Win! Predicting Message Propagation in Twitter. ICWSM 2011.
    • Will popular content remain popular?
      Ma, Z., Sun, A., & Cong, G. (2013). On predicting the popularity of newly emerging hashtags in Twitter. JASIST 2013.
    • What do large cascades look like?
      Dow, P. A., Adamic, L. A., & Friggeri, A. (2013). The Anatomy of Large Facebook Cascades. ICWSM 2013.
    • How will a cascade grow in the future?
  8. Can we predict how a cascade will grow in the future?
    recommend content, identify trends, predict reach
    Hard problem?
  9. Difficulty #1: Large cascades are rare
    [Figure: empirical CCDF (0 to 1) against cascade size (0 to 1000)]
  10. Difficulty #2: Same content, different popularity

  11. Increasing the strength of social influence increased both inequality and unpredictability of success.
    Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market.
  12. This Talk: Cascades are predictable. (*solution: cascade growth prediction problem)
    size, structure, effect of content
  13. How do we formulate the cascade growth prediction problem?
  14. Possibility #1: Will a cascade get k=100 reshares?
    Most cascades are small (class imbalance)
  15. Possibility #2: How big will a small cascade get?
    Most cascades are small (outliers skew results)
  16. Possibility #3: Just look at the large cascades?
    Selection bias (only considers a small subset of the data)
  17. Our Approach: Will a cascade reach the median size?
    Given k reshares: less than the median f(k), or more than the median f(k)?
  18. [Figure: empirical median cascade size, f(k) (0 to 50), against number of reshares observed, k (5 to 25)]
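
The quantity plotted on slide 18 can be sketched in a few lines. This is only an illustration: the function name `empirical_median_size` and the sample sizes are hypothetical, not taken from the talk or its data.

```python
from statistics import median


def empirical_median_size(final_sizes, k):
    """Median final size among cascades that reached at least k reshares."""
    reached = [s for s in final_sizes if s >= k]
    if not reached:
        raise ValueError(f"no cascade reached {k} reshares")
    return median(reached)


# Hypothetical final cascade sizes, for illustration only.
sizes = [5, 6, 7, 9, 10, 12, 20, 25, 40, 100]
for k in (5, 10, 20):
    print(k, empirical_median_size(sizes, k))
```

Computing f(k) this way for each k gives the curve that the later slides compare against 2k.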
  19. Let X = { cascades of size ≥ k }. Then the median of X is 2k. (see paper)
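
Slide 19 defers the argument to the paper. One way such a result can arise, sketched here under assumptions that are not stated on the slide, is a continuous power-law tail for the cascade size S with exponent α. The value α = 2 below is an assumption chosen because it reproduces the 2k on the slide.

```latex
\begin{align*}
  \Pr(S \ge s) &\propto s^{-(\alpha - 1)}
    && \text{(assumed power-law tail)} \\
  \Pr(S \ge m \mid S \ge k) &= \left(\frac{m}{k}\right)^{-(\alpha - 1)} = \frac{1}{2}
    && \text{(definition of the conditional median } m\text{)} \\
  m &= 2^{1/(\alpha - 1)}\, k
    && \text{(solve for } m\text{)} \\
  \alpha = 2 \;&\Rightarrow\; m = 2k
\end{align*}
```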
  20. Will a cascade reach the median size?
    Given k reshares: less than the median f(k), or more than the median f(k)?
  21. Will a cascade double in size?
    Given k reshares: ≤ 2k reshares, or > 2k reshares? (2k = f(k))
  22. Cascade Growth Prediction Problem: Given that a cascade has obtained k reshares, will it double?
    Balanced; can track growth over time
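
A minimal sketch of how the labels for this problem might be built from final cascade sizes. It assumes "double" means strictly more than 2k reshares, following the "> 2k" wording on slide 21. The function names and sample sizes are hypothetical.

```python
def doubling_labels(final_sizes, k):
    """One label per cascade that reached k reshares: True if it exceeded 2k.

    Assumption: "doubling" is taken as strictly more than 2k reshares.
    """
    return [s > 2 * k for s in final_sizes if s >= k]


def positive_fraction(labels):
    """Share of positive labels; near 0.5 means the classes are balanced."""
    return sum(labels) / len(labels)


# Hypothetical final cascade sizes, for illustration only.
sizes = [5, 6, 7, 9, 10, 12, 20, 25, 40, 100]
for k in (5, 10):
    print(k, positive_fraction(doubling_labels(sizes, k)))
```

Because the threshold moves with k, the same labeling can be repeated at each k to follow a cascade's growth over time, which is the second property named on the slide.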
  23. We looked at photos uploaded to Facebook in June 2013 that obtained at least 5 reshares, and tracked reshares of these photos for 28 days following their initial uploads. Using features of the cascade, we evaluate the performance of a classifier.
    150k photos, 9m reshares
  24. What factors affect predictability?
    Content (e.g. has overlaid text)
    User (e.g. friend count)
    Structural (e.g. proximity to root in G)
    Temporal (e.g. time between reshares)
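
To make the temporal category concrete, here is a hedged sketch of features derived from the timestamps of the first k reshares. The feature names and the split into first-half and second-half gaps are illustrative assumptions, not the talk's actual feature set.

```python
from statistics import mean


def temporal_features(reshare_times, k):
    """Illustrative temporal features from the first k reshares.

    reshare_times: seconds since the original upload, sorted ascending.
    All feature names here are hypothetical.
    """
    if k < 3 or len(reshare_times) < k:
        raise ValueError("need k >= 3 and at least k reshare times")
    first_k = reshare_times[:k]
    # Time elapsed between each pair of consecutive reshares.
    gaps = [later - earlier for earlier, later in zip(first_k, first_k[1:])]
    half = len(gaps) // 2
    return {
        "time_to_kth_reshare": first_k[-1],
        "mean_gap": mean(gaps),
        "mean_gap_first_half": mean(gaps[:half]),
        "mean_gap_second_half": mean(gaps[half:]),
    }


# Hypothetical timestamps (seconds) for a cascade that is slowing down.
print(temporal_features([60, 120, 240, 480, 960], k=5))
```

Comparing the first-half and second-half gaps gives a rough signal of whether resharing is speeding up or slowing down.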
  25. How well can we predict cascade doubling?
    [Figure: bar chart of accuracy at k=5 (axis 0.00 to 0.80) for each feature set: All, Temporal, All but temporal, Structural, User, Content. Bar values, in ascending order: 0.558, 0.637, 0.671, 0.722, 0.78, 0.795]
  26. Cascade Growth Prediction Problem: Given that a cascade has obtained k reshares, will it double?
  27. How does performance change with k?
    5 reshares: > 10 reshares? vs. 20 reshares: > 40 reshares?
  28. How does performance change with k?
    5 reshares: > 10 reshares? (less data) vs. 20 reshares: > 40 reshares? (more data)
  29. How does performance change with k?
    5 reshares: > 10 reshares? (short-term prediction) vs. 20 reshares: > 40 reshares? (long-term prediction)
  30. Easier to predict if larger cascades will double in size
    [Figure: accuracy (0.78 to 0.82) against number of reshares observed, k (0 to 100)]
  31. Fix the minimum cascade size R ≥ k. How does performance change with k?
    5 reshares: > 40 reshares? vs. 20 reshares: > 40 reshares?
  32. Fix the minimum cascade size R ≥ k. How does performance change with k?
    5 reshares: > 40 reshares? vs. 20 reshares: > 40 reshares?
    Is there a saturation effect?
  33. More information about a cascade is always better, with no saturation effect
    [Figure: accuracy (0.5 to 0.9) against number of reshares observed, k (0 to 100), assuming minimum cascade size R = 100]
  34. How do various factors affect predictability?

  35. The original post (and poster) get less important with increasing k
    [Figure, two panels: correlation against minimum cascade size k (0 to 100), for the original poster’s friend count and for whether the photo is a meme]
  36. Successful cascades get many views quickly, but with low or high conversion rates?
    [Figure, two panels: correlation against minimum cascade size k (0 to 100), for unique views per unit time and for # users who saw the first k reshares]
  37. Successful cascades get many views quickly, and achieve high conversion rates
    [Figure, two panels: correlation against minimum cascade size k (0 to 100), for unique views per unit time (0 to 0.26) and for # users who saw the first k reshares (-0.18 to 0)]
  38. How does the initial structure of a cascade affect its growth?
  39. None
  40. User (e.g. you or me); Page (e.g. brands, celebrities)

  41. Initial structure matters
    [Figure: proportion above median (0.1 to 0.6) for Users and Pages]
  42. [Figure: probability of doubling for three initial structures, labeled “Your friends weren’t interested”, “Only your friends were interested”, and “Broad appeal”]
  43. Can we predict the structure?

  44. Wiener index (mean all-pairs shortest distance)
    Anderson, A., Goel, S., Hofman, J. & Watts, D. J. The structural virality of online diffusion.
    [Figure: three example cascades with d = 1.98, d = 2.47, d = 14.4]
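
The measure on slide 44, the mean shortest-path distance over all pairs of nodes, can be computed with one breadth-first search per node. The sketch below assumes the cascade is given as an undirected, connected edge list. The function name and the two toy cascades are illustrative only.

```python
from collections import deque


def mean_pairwise_distance(edges):
    """Mean shortest-path distance over all node pairs of a connected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    if n < 2:
        raise ValueError("need at least two nodes")
    total = 0
    for source in adj:
        # Breadth-first search gives hop distances from this source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())
    # Each unordered pair was counted twice, once from each endpoint.
    return total / (n * (n - 1))


# Toy cascades: a broadcast (star) and a chain of reshares (path).
star = [("root", leaf) for leaf in ("a", "b", "c", "d")]
chain = [("root", "a"), ("a", "b"), ("b", "c")]
print(mean_pairwise_distance(star), mean_pairwise_distance(chain))
```

A broadcast from one hub stays close to 2 however large it gets, while long reshare chains push the value up, consistent with the range of d values shown on the slide.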
  45. Page cascades are driven by hub nodes
    [Figure: Wiener index (0 to 40) by cascade size bin ([1, 10), [10, 100), [100, 1000), [1000, 10000), [10000, 100000)) for Users and Pages]
  46. Will a cascade have structural virality above the median?
    Accuracy of 0.725
    Temporal and structural features equally predictive
    Pages are easier to predict than users
  47. Increasing the strength of social influence increased both inequality and unpredictability of success.
    Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market.
  48. None
  49. None
  50. Can we differentiate cascades of the same content?

  51. We consider clusters of identical photos uploaded to Facebook. For each cluster, we select ten random cascades, and predict which was the largest (random guessing: 10%).
    983 clusters, 13m reshares, 38k photo uploads
  52. Can we predict the largest of 10 randomly chosen cascades of the identical image?
    Accuracy of 0.497
    Mean Reciprocal Rank of 0.662
    Gini Coefficient of 0.787
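
For readers unfamiliar with the two summary statistics on slide 52, here is a minimal sketch of their standard definitions. The slide does not say what the Gini coefficient is computed over; treating it as a measure of inequality among cascade sizes is an assumption, and all inputs below are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Mean of 1/rank, where rank is the 1-based position at which the
    truly largest cascade appears in each predicted ordering."""
    return sum(1 / r for r in ranks) / len(ranks)


def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted sum over the sorted values.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical inputs, for illustration only.
print(mean_reciprocal_rank([1, 2, 4]))  # ranks of the true largest cascade
print(gini([1, 1, 1, 1]), gini([0, 0, 0, 1]))
```

With ten candidates per cluster, always ranking the true largest cascade first would give a mean reciprocal rank of 1.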
  53. Next: how do cascades interact?

  54. How do I make my posts “go viral”?
    • Favor memes (i.e. post popular content)
    • Be a page and have lots of followers (i.e. be popular)
    • Know what your friends like, and what your friends’ friends like (i.e. be a marketing guru)
  55. Cascades are predictable. The cascade growth prediction problem allows us to accurately predict the future growth of a cascade.
    Justin Cheng, Lada Adamic, Alex Dow, Jon Kleinberg, Jure Leskovec
    (Technology Review has a good summary: http://bit.ly/trcascades)