Why is analysing cascades important?
1. They are a proxy for unravelling how information spreads on the Web
2. They explain the popularity-gaining phenomenon on the Web
3. They help estimate influence and homophily between users
4. They help estimate the value of content
5. They are better indicators of users’ interests and trust networks
6. They help explain social network evolution
Tumblr
• Diffusion on Tumblr is powered by the reblogging functionality
• The ability to reblog allows content to spread
• The traces of the spread create cascades that can be observed
• Reblogging events appear as a list of notes attached to posts and their reblogged copies
The two facets of a cascade
• Structural: Who influenced whom to spread the content?
• Temporal: How many shares are there at any point in time (day/hour)?
[Figure: number of shares against days after publishing]
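The temporal facet can be illustrated with a minimal sketch that bins share timestamps by whole days since publication. The function name `shares_per_day` and the timestamps are hypothetical, chosen only for illustration; they are not data from the study.

```python
from collections import Counter
from datetime import datetime


def shares_per_day(published, share_times):
    """Count shares by the number of whole days elapsed since publication."""
    counts = Counter()
    for t in share_times:
        days = (t - published).days  # whole days since publishing
        counts[days] += 1
    return dict(sorted(counts.items()))


# Hypothetical timestamps, for illustration only.
published = datetime(2014, 1, 1, 12, 0)
shares = [
    datetime(2014, 1, 1, 13, 0),
    datetime(2014, 1, 1, 23, 0),
    datetime(2014, 1, 2, 13, 0),
    datetime(2014, 1, 4, 12, 0),
]
daily = shares_per_day(published, shares)
print(daily)
```

Binning by hour instead of by day would only change the unit used when converting the time difference.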
Where do research tasks fit?
• Fields: Network Science, Data Science, Web Science
• Tasks: cascade construction, structural analysis, data collection & preprocessing, temporal & platform analysis
Tumblr’s Year in Review “Tumblr’s Year in Review is a showcase of the best stuff on the Internet from 2014. Follow along for a daily dose of creativity, humor, humanity, fandom, and sharing. And GIFs. Lots of GIFs!” http://2014inreblogs.tumblr.com/2014
Cascade network construction
• Ideally, cascade networks would have a neat tree topology
• However, that is not always the case
• This is due to cases where:
  • Reblogs are deleted
  • Users deactivate their accounts
  • Users reblog more than once
• Resulting artefacts: isolated components and repeated appearances of users
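A minimal sketch of how such a cascade might be assembled and checked, assuming each note can be reduced to a `(reblogger, source)` pair. That pair format, the function name `build_cascade`, and the user names are assumptions made for illustration, not Tumblr's actual note format or the method used in the study.

```python
from collections import Counter, defaultdict, deque


def build_cascade(root, notes):
    """Build a cascade from (reblogger, source) pairs.

    Returns the child lists, the users not reachable from the root
    (isolated components), and the users who reblogged more than once.
    """
    children = defaultdict(list)
    for reblogger, source in notes:
        children[source].append(reblogger)

    # Breadth-first search from the original poster.
    reachable = {root}
    queue = deque([root])
    while queue:
        user = queue.popleft()
        for child in children.get(user, []):
            if child not in reachable:
                reachable.add(child)
                queue.append(child)

    users = {root} | {r for r, _ in notes} | {s for _, s in notes}
    isolated = users - reachable

    reblog_counts = Counter(r for r, _ in notes)
    repeated = {u for u, c in reblog_counts.items() if c > 1}
    return dict(children), isolated, repeated


# Hypothetical notes: "d" never reblogged from anyone we can see
# (e.g. the reblog was deleted), and "b" reblogs twice.
notes = [("b", "a"), ("c", "b"), ("e", "d"), ("b", "c")]
children, isolated, repeated = build_cascade("a", notes)
print(isolated, repeated)
```

In this toy input, `d` and `e` form an isolated component because the link from `d` back to the rest of the cascade is missing, and `b` appears twice because it reblogged from both `a` and `c`.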
High reblogging rate = cascades are large!
[Figure: P(Number of reblogs >= x) against x (×10^5), annotated with 78% and 18%]
Yet another long tail on the Web!
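The quantity P(Number of reblogs >= x) is an empirical complementary cumulative distribution. A minimal sketch of how it could be computed follows; the function name `ccdf` and the list of cascade sizes are hypothetical and are not the dataset behind the figure.

```python
def ccdf(sizes, x):
    """Fraction of cascades with at least x reblogs."""
    if not sizes:
        raise ValueError("sizes must not be empty")
    return sum(1 for s in sizes if s >= x) / len(sizes)


# Hypothetical cascade sizes, for illustration only.
sizes = [131, 500, 2000, 50000, 600000]
p_1000 = ccdf(sizes, 1000)
p_all = ccdf(sizes, 0)
print(p_1000, p_all)
```

Evaluating `ccdf` over a range of thresholds and plotting the result gives a curve of the kind described above; a long tail shows up as a small but non-zero probability at very large x.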
Q: Small branching factor but high impact?
- Compute:
  - Ratio = subcascade size / branching factor
- If ratio > 1:
  - The user generates a subcascade that goes beyond their immediate effect, i.e. their branching factor
- If ratio = 1:
  - The user's subcascade consists only of their direct reblogs
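A minimal sketch of this measure, assuming the ratio is read as subcascade size divided by branching factor, which is the direction under which a ratio above 1 signals reach beyond direct reblogs. The `children` mapping, the function names, and the toy cascade are hypothetical.

```python
def subcascade_size(children, user):
    """Number of distinct users reached below `user` in the cascade."""
    seen = set()
    stack = list(children.get(user, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    seen.discard(user)  # guard against repeated appearances of the user
    return len(seen)


def impact_ratio(children, user):
    """Subcascade size divided by branching factor (None if no reblogs)."""
    branching_factor = len(children.get(user, []))
    if branching_factor == 0:
        return None
    return subcascade_size(children, user) / branching_factor


# Hypothetical cascade: a -> b, c; b -> d, e, f; d -> g
children = {"a": ["b", "c"], "b": ["d", "e", "f"], "d": ["g"]}
ratio_a = impact_ratio(children, "a")
ratio_d = impact_ratio(children, "d")
ratio_c = impact_ratio(children, "c")
print(ratio_a, ratio_d, ratio_c)
```

Here `a` has only two direct reblogs but six users below it, so its ratio is 3.0, while `d` reaches exactly its one direct reblogger and scores 1.0.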
Post age
• The oldest post was active for 617 days
• The youngest was active for 28 days
• Surprisingly, the oldest post has a very small cascade of 131 reblogs, yet it stayed active for 617 days, accumulating popularity slowly but steadily!
Wrapping up
• Tumblr’s ‘Year in Review’ blog features some really large cascades!
• Cascades matter!
• Users’ influence might be underestimated if only the branching factor is taken into account
• Cascades on Tumblr have non-trivial sizes and depths
• Cascades grow in size in many different ways
• Large cascades exist!