Information contagion: An empirical study of the spread of news on Digg and Twitter social networks

{{Summary
 * title=Information contagion: An empirical study of the spread of news on Digg and Twitter social networks
 * authors=K. Lerman, R. Ghosh
 * url=http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/view/1509/1839
 * tags=sociology, Digg, Twitter, network structure, information spreading, social networks, information cascades
 * summary=Using empirical analysis of two social networks (Digg and Twitter), this paper studies how network structure influences the dynamics of information spread. The authors map the spread of interest in news stories, hypothesizing that friend networks are the primary way news spreads before a story reaches the front page of Digg.

They distinguish between diggs/retweets from a user's fans/followers and those from the community at large, effectively studying information cascades. One difference from previous research is that they directly harvest the fans/followers to understand the local structure of the network, rather than trying to infer this network from the cascades themselves.
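The fan/non-fan distinction can be illustrated with a minimal sketch. It assumes a vote counts as in-network when the voter is a fan of the submitter or of any earlier voter; that rule, the function name `label_fan_votes`, and the `fans_of` mapping are illustrative assumptions, not the authors' code.

```python
def label_fan_votes(submitter, voters, fans_of):
    """Label each vote as in-network (True) or out-of-network (False).

    submitter: name of the user who posted the story.
    voters:    voter names in chronological order.
    fans_of:   hypothetical mapping from a user to the set of that user's
               fans/followers, harvested directly from the site.
    Assumption: a vote is in-network if the voter is a fan of the
    submitter or of any user who voted earlier.
    """
    # Users who could have seen the story through the social network so far.
    exposed = set(fans_of.get(submitter, ()))
    labels = []
    for voter in voters:
        labels.append(voter in exposed)
        # After voting, this user's own fans become exposed as well.
        exposed.update(fans_of.get(voter, ()))
    return labels


if __name__ == "__main__":
    fans_of = {"alice": {"bob"}, "bob": {"carol"}}
    print(label_fan_votes("alice", ["bob", "dave", "carol", "erin"], fans_of))
```

Because the fan lists are supplied as input rather than inferred, the sketch mirrors the idea of using the harvested network directly instead of reconstructing it from the cascades.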

Digg
3,553 stories promoted to the front page in June 2009. The data for each story include the title, story id, link, submitter’s name, submission time, list of voters and the time of each vote, the time the story was promoted to the front page, and the list of voters’ friends.

Twitter
398 stories from Tweetmeme [frequently retweeted URLs] originally posted between June 11, 2009 and July 3, 2009. The data for each story include the name of the user who posted the link, the time it was posted, the number of times the link was retweeted, and details of up to 1000 of the most recent retweets (name of the retweeting user, text, and timestamp of the retweet). (329 stories had fewer than 1000 retweets.)

Active Users & Friends
They define "active users" as those who digged/retweeted at least one story.

At the time of the study, Digg was much more interconnected than Twitter: Digg's fraction of mutual links is about two orders of magnitude larger for triads and about one order of magnitude larger for dyads. The authors suspect that Twitter will become denser with time, since it is a newer service.

Digg
139,409 active Digg users. Of these active users, 71,834 listed a friend; there were 258,220 of these friend links.

There were 125,219 reciprocal links between 279,725 distinct users in the Digg sample, giving the fraction of mutual links for dyads as $3.20 \times 10^{-6}$.

The fraction of mutual links for triads (clustering coefficient) was $7.60 \times 10^{-12}$.

Twitter
137,582 active Twitter users. There were 6,200,051 followers.

There were 3,973,892 reciprocal links among the 6,200,051 followers in the Twitter sample, giving the fraction of mutual links for dyads as $2.07 \times 10^{-7}$.

The fraction of mutual links for triads (clustering coefficient) was $1.92 \times 10^{-14}$.
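The dyad figures above can be checked with a short calculation. The sketch assumes the fraction is the number of reciprocal links divided by the number of possible user pairs, n(n-1)/2; this normalization is an assumption, chosen because it reproduces the reported values to the precision given. The function name is illustrative.

```python
def mutual_dyad_fraction(reciprocal_links, n_users):
    """Fraction of possible user pairs (dyads) joined by a reciprocal link.

    Assumption: the denominator is the number of unordered user pairs,
    n * (n - 1) / 2.
    """
    possible_dyads = n_users * (n_users - 1) / 2
    return reciprocal_links / possible_dyads


# Counts taken from the summary above.
digg = mutual_dyad_fraction(125_219, 279_725)
twitter = mutual_dyad_fraction(3_973_892, 6_200_051)

if __name__ == "__main__":
    print(f"Digg dyad fraction:    {digg:.2e}")
    print(f"Twitter dyad fraction: {twitter:.2e}")
    print(f"Digg / Twitter ratio:  {digg / twitter:.1f}")
```

Under this assumption the Digg value is roughly fifteen times the Twitter value, consistent with the "one order of magnitude" comparison for dyads.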

"Frontpage" Story Spread
They trace the "cascade of interest" in a story through the underlying social network. For Digg, this has a normal distribution; for Twitter, some stories don't spread, yet the distribution is broader, making it possible for some stories to spread farther.

Further, they analyze the evolution of in-network ("fan") votes.

Digg
The probability of spreading stories is about 0.74 before promotion and about 0.3 after promotion. The most common number of votes for front page stories is around 500.

Twitter
The most common number of retweets is around 400.

Distributions
There is a lognormal distribution on the number of votes for a story.
 * relevance=Brings attention to the question of why "heavy tail" lognormal distributions are common in peer production. They suggest that aggregating quality contributions, and avoiding noise, may be possible by separating the activity of a user's chosen network from the network at large.

Interesting facts: }}
 * After about a day, the number of votes/retweets saturates to "a value that reflects their popularity".
 * Digg's social network is denser and more interconnected.
 * Relatively few stories accrue thousands of votes.
 * journal=ICWSM 2010
 * pub_date=2010
 * subject=Computer Science
 * pub_open_access=Yes