Can Cascades Be Predicted?

Justin Cheng (Stanford), Lada Adamic (Facebook), Alex Dow (Facebook), Jon Kleinberg (Cornell), Jure Leskovec (Stanford)

We live in networks

Networks enable the diffusion and flow of information: news, rumors, product adoption, disease, mobilization.

An information cascade

Example: reshares on Facebook

Cascades form as people (re)share information with one another.

How have cascades been studied?
• Will information ever get shared?
  Petrovic, S., Osborne, M., & Lavrenko, V. (2011). RT to Win! Predicting Message Propagation in Twitter. ICWSM 2011.
• Will popular content remain popular?
  Ma, Z., Sun, A., & Cong, G. (2013). On predicting the popularity of newly emerging hashtags in Twitter. JASIST 2013.
• What do large cascades look like?
  Dow, P. A., Adamic, L. A., & Friggeri, A. (2013). The Anatomy of Large Facebook Cascades. ICWSM 2013.
• How will a cascade grow in the future?

Can we predict how a cascade will grow in the future? Uses: recommending content, identifying trends, predicting reach. Is this a hard problem?

Difficulty #1: Large cascades are rare.
[Figure: empirical CCDF versus cascade size.]
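For readers unfamiliar with the plot, a minimal sketch of how an empirical CCDF like this one can be computed: for each observed size s, take the fraction of cascades with size at least s. The helper name `empirical_ccdf` and the toy sizes are hypothetical, not the Facebook data.

```python
from collections import Counter


def empirical_ccdf(sizes):
    """Return sorted (s, P(S >= s)) pairs for each distinct observed size s.

    `sizes` is a hypothetical list of final cascade sizes.
    """
    n = len(sizes)
    counts = Counter(sizes)
    remaining = n  # number of cascades with size >= the current s
    result = []
    for s in sorted(counts):
        result.append((s, remaining / n))
        remaining -= counts[s]
    return result


# Toy example: mostly small cascades and a single large one.
print(empirical_ccdf([1, 1, 1, 2, 2, 5, 1000]))
```

On a log-log scale, a heavy tail shows up as a slowly decaying curve: most mass sits at small sizes, while a few cascades are orders of magnitude larger.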

Difficulty #2: Same content, different popularity.

Increasing the strength of social influence increased both inequality and unpredictability of success.
Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market.

This talk: Cascades are predictable.*
• Size
• Structure
• Effect of content
*Solution: the cascade growth prediction problem.

How do we formulate the cascade growth prediction problem?

Possibility #1: Will a cascade get k = 100 reshares?
Problem: most cascades are small (class imbalance).

Possibility #2: How big will a small cascade get?
Problem: most cascades are small (outliers skew results).

Possibility #3: Just look at the large cascades?
Problem: selection bias (only considers a small subset of the data).

Our approach: Will a cascade reach the median size? Having observed k reshares, will it end with fewer reshares than the median f(k), or more than the median f(k)?

[Figure: empirical median cascade size, f(k), versus number of reshares observed, k.]
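A minimal sketch of how the empirical f(k) in this figure can be computed: restrict to cascades that reached at least k reshares and take the median of their final sizes. The function name and the toy sizes are hypothetical.

```python
import statistics


def median_size_given_at_least(sizes, k):
    """Empirical f(k): median final size among cascades with at least k reshares.

    `sizes` is a hypothetical list of final cascade sizes.
    """
    eligible = [s for s in sizes if s >= k]
    if not eligible:
        raise ValueError(f"no cascades with size >= {k}")
    return statistics.median(eligible)


sizes = [5, 6, 8, 10, 12, 20, 40]
print(median_size_given_at_least(sizes, 5))
print(median_size_given_at_least(sizes, 10))
```

Note that `statistics.median` averages the two middle values when the number of eligible cascades is even.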

Let X = {cascades of size ≥ k}.
Then the median size of the cascades in X is 2k (see paper).
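One way to see where a median of 2k can come from, under an assumption not stated on these slides: suppose cascade sizes S follow a continuous power law with density exponent α, so that conditioned on S ≥ k the tail is a rescaled power law. Setting the conditional tail probability to 1/2 gives the median m:

```latex
\begin{aligned}
P(S \ge s \mid S \ge k) &= \left(\frac{s}{k}\right)^{-(\alpha - 1)}, \qquad s \ge k \\
\frac{1}{2} &= \left(\frac{m}{k}\right)^{-(\alpha - 1)}
  \;\Longrightarrow\; m = 2^{1/(\alpha - 1)}\, k \\
\alpha = 2 &\;\Longrightarrow\; m = 2k
\end{aligned}
```

So f(k) = 2k holds exactly only for α = 2 in this idealized model; for other exponents the median is a different constant multiple of k.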

Will a cascade double in size? Having observed k reshares, will it end with ≤ 2k reshares or > 2k reshares, where 2k = f(k)?

Cascade Growth Prediction Problem: Given that a cascade has obtained k reshares, will it double?
• Balanced (since the median is 2k, about half of the cascades double)
• Can track growth over time
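A minimal sketch of how labels for this problem could be built, assuming each cascade's final size is known at the end of the tracking window. `doubling_label` and `build_dataset` are hypothetical helpers; the split follows the slide (≤ 2k reshares is negative, > 2k is positive). In a real setup, features would be computed only from the first k reshares.

```python
def doubling_label(final_size, k):
    """Label for a cascade that has already reached k reshares.

    Returns True if the cascade ends with more than 2k reshares.
    """
    if final_size < k:
        raise ValueError("cascade never reached k reshares; not in the prediction set")
    return final_size > 2 * k


def build_dataset(final_sizes, k):
    """Return (index, label) pairs for every cascade with at least k reshares."""
    return [(i, doubling_label(s, k)) for i, s in enumerate(final_sizes) if s >= k]


# Cascade 0 never reaches k = 5, so it is excluded rather than labeled negative.
print(build_dataset([3, 5, 9, 10, 11, 40], k=5))
```

Excluding cascades that never reach k, instead of labeling them negative, is what keeps the classes roughly balanced at every k.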

We looked at photos uploaded to Facebook in June 2013 that obtained at least 5 reshares, and tracked reshares of these photos for 28 days following their initial upload. Using features of the cascade, we evaluate the performance of a classifier.
Data: 150k photos, 9M reshares.

What factors affect predictability?
• Content (e.g. has overlaid text)
• User (e.g. friend count)
• Structural (e.g. proximity to root in G)
• Temporal (e.g. time between reshares)
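As an illustration of the temporal category, here is a minimal sketch that turns the timestamps of the first k reshares into a few features. The feature names and the choice of "seconds since upload" as the time unit are hypothetical; the actual feature set is described in the paper.

```python
def temporal_features(reshare_times, k):
    """Hypothetical temporal features from the first k reshares.

    `reshare_times` are reshare timestamps in seconds since the original upload.
    """
    if k < 2 or len(reshare_times) < k:
        raise ValueError("need k >= 2 and at least k reshare timestamps")
    first_k = sorted(reshare_times)[:k]
    # Time elapsed between consecutive reshares among the first k.
    gaps = [b - a for a, b in zip(first_k, first_k[1:])]
    return {
        "time_to_kth": first_k[-1],
        "mean_gap": sum(gaps) / len(gaps),
        "max_gap": max(gaps),
    }


print(temporal_features([10, 20, 40, 100, 500], k=4))
```

Only the first k timestamps are used, so the features are available at the moment the prediction is made.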

How well can we predict cascade doubling?
Accuracy at k = 5, by feature set:
• Content: 0.558
• User: 0.637
• Structural: 0.671
• All but temporal: 0.722
• Temporal: 0.78
• All: 0.795

How does performance change with k?
Compare: after 5 reshares, will a cascade exceed 10 reshares? vs. after 20 reshares, will it exceed 40 reshares?
• 5 reshares: less data; 20 reshares: more data.
• > 10 reshares: short-term prediction; > 40 reshares: long-term prediction.

It is easier to predict whether larger cascades will double in size.
[Figure: accuracy versus number of reshares observed, k.]

Fix the minimum cascade size R ≥ k. How does performance change with k?
Compare: after 5 reshares, will a cascade exceed 40 reshares? vs. after 20 reshares, will it exceed 40 reshares?
Is there a saturation effect?

[Figure: accuracy versus number of reshares observed, k, with minimum cascade size R = 100.]
More information about a cascade is always better, with no saturation effect.

How do various factors affect predictability?

The original post (and poster) become less important with increasing k.
[Figure, two panels: correlation versus minimum cascade size k for (a) the original poster’s friend count and (b) whether the photo is a meme.]

Successful cascades get many views quickly, but with low or high conversion rates?
[Figure, two panels: correlation versus minimum cascade size k for (a) the number of users who saw the first k reshares (marked “?”) and (b) unique views per unit time.]

Successful cascades get many views quickly, and achieve high conversion rates.
[Figure, two panels: correlation versus minimum cascade size k for (a) unique views per unit time (positive) and (b) the number of users who saw the first k reshares (negative).]
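The slides plot a correlation for each feature as the minimum cascade size k varies. A minimal sketch of one way to compute such a curve point, assuming Pearson correlation between a feature and the binary doubling outcome over cascades with at least k reshares; the slides do not specify the exact correlation measure, and the helper names and dict keys are hypothetical. For simplicity the feature here is a static value (like friend count); features derived from the first k reshares would be recomputed per k.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def feature_correlation_at_k(cascades, feature, k):
    """Correlation between `feature` and doubling (> 2k), over cascades of size >= k."""
    eligible = [c for c in cascades if c["final_size"] >= k]
    xs = [float(c[feature]) for c in eligible]
    ys = [1.0 if c["final_size"] > 2 * k else 0.0 for c in eligible]
    return pearson(xs, ys)


cascades = [
    {"final_size": 5, "friends": 1},
    {"final_size": 12, "friends": 3},
    {"final_size": 8, "friends": 2},
    {"final_size": 30, "friends": 4},
]
print(feature_correlation_at_k(cascades, "friends", k=5))
```

Sweeping k and plotting the result reproduces the shape of these panels: a feature whose correlation shrinks as k grows matters less once more of the cascade has been observed.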

How does the initial structure of a cascade affect its growth?

User (e.g. you or me) vs. Page (e.g. brands, celebrities)

Initial structure matters.
[Figure: proportion of cascades above the median size by initial structure, for users and for pages.]

[Figure: probability of doubling for different initial structures: “Your friends weren’t interested”, “Only your friends were interested”, “Broad appeal”.]

Can we predict the structure?

Structural virality: the Wiener index (mean all-pairs shortest-path distance) of the cascade tree.
Anderson, A., Goel, S., Hofman, J., & Watts, D. J. The structural virality of online diffusion.
[Figure: example cascades with d = 1.98, d = 2.47, and d = 14.4.]
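A minimal sketch of the mean all-pairs shortest-path distance for a cascade tree, treating reshare edges as undirected and running a breadth-first search from every node. The edge-list input format and the function name are assumptions for illustration; BFS from every node is O(n²), which is fine for a sketch but slow for very large cascades.

```python
from collections import defaultdict, deque


def wiener_index(edges):
    """Mean shortest-path distance over all pairs of distinct nodes.

    `edges` is a list of (parent, child) reshare edges; direction is ignored.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        raise ValueError("need at least two nodes")
    total = 0
    for source in nodes:
        # BFS gives exact shortest-path distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    # Each unordered pair was counted twice, once from each endpoint.
    return total / (n * (n - 1))


star = [(0, 1), (0, 2), (0, 3)]   # broadcast: everyone reshares the root
chain = [(0, 1), (1, 2), (2, 3)]  # person-to-person chain
print(wiener_index(star), wiener_index(chain))
```

A star (one hub broadcasting to many) keeps d low, while long chains of person-to-person reshares push it up, which is why hub-driven cascades sit at the low end.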

Page cascades are driven by hub nodes.
[Figure: Wiener index by cascade size bin, from [1, 10) to [10000, 100000), for users and for pages.]

Will a cascade have structural virality above the median?
• Accuracy of 0.725
• Temporal and structural features are equally predictive
• Pages are easier to predict than users

Can we differentiate cascades of the same content?

We consider clusters of identical photos uploaded to Facebook. For each cluster, we select ten random cascades and predict which one was the largest (random guessing: 10%).
Data: 983 clusters, 38k photo uploads, 13M reshares.

Can we predict the largest of 10 randomly chosen cascades of the same image?
• Accuracy of 0.497
• Mean reciprocal rank of 0.662
• Gini coefficient of 0.787
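A minimal sketch of the two ranking and inequality measures named here. Mean reciprocal rank averages 1/rank of the truly largest cascade in each cluster's predicted ordering. The slides do not say exactly what the Gini coefficient is computed over, so the sketch only shows the standard Gini formula applied to a list of non-negative values such as cascade sizes. Function names and inputs are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, true_largest):
    """ranked_lists[i]: predicted ordering of cascade ids in cluster i (most likely largest first).
    true_largest[i]: id of the cascade in cluster i that actually grew largest."""
    total = 0.0
    for ranking, truth in zip(ranked_lists, true_largest):
        total += 1.0 / (ranking.index(truth) + 1)
    return total / len(ranked_lists)


def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Sorted-data formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(mean_reciprocal_rank([["a", "b", "c"], ["c", "a", "b"]], ["a", "b"]))
print(gini([1, 1, 1, 1]), gini([0, 0, 0, 1]))
```

A Gini near 1 means a few cascades account for most of the reshares, which fits the observation that identical content can reach very different popularity.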

Next: how do cascades interact?

How do I make my posts “go viral”?
• Favor memes (i.e. post popular content)
• Be a page and have lots of followers (i.e. be popular)
• Know what your friends like, and what your friends’ friends like (i.e. be a marketing guru)

Cascades are predictable. The cascade growth prediction problem allows us to accurately predict the future growth of a cascade.
(Technology Review has a good summary: http://bit.ly/trcascades)
