Information contagion: An empirical study of the spread of news on Digg and Twitter social networks

From AcaWiki

Citation: K. Lerman, R. Ghosh (2010) Information contagion: An empirical study of the spread of news on Digg and Twitter social networks. ICWSM 2010
Download: http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/view/1509/1839
Tagged: Computer Science, sociology, Digg, Twitter, network structure, information spreading, social networks, information cascades

Summary

Using empirical analysis of two social networks (Digg and Twitter), this paper studies how network structure influences the dynamics of information spread. The authors map the spread of interest in news stories, hypothesizing that friend networks are the primary way news spreads before a story reaches the front page of Digg.

They distinguish between diggs/retweets from a user's fans/followers and those from the community at large, effectively studying information cascades. One difference from previous research is that they directly harvest the fans/followers to understand the local structure of the network, rather than trying to infer this network from the cascades themselves.
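The distinction between in-network and out-of-network votes can be illustrated with a minimal sketch. It assumes that a vote counts as in-network when the voter is a fan of the submitter or of any earlier voter; that rule, the function name `label_fan_votes`, and the sample data are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List, Set


def label_fan_votes(voters: List[str], fans: Dict[str, Set[str]]) -> List[bool]:
    """Label each vote True if it is in-network, False otherwise.

    voters: users in chronological order of voting (the submitter comes first).
    fans:   maps a user to the set of users who follow them (hypothetical data).
    Assumption: a vote is in-network if the voter is a fan of any earlier voter.
    """
    exposed: Set[str] = set()  # users who could have seen the story via a friend
    labels: List[bool] = []
    for voter in voters:
        labels.append(voter in exposed)
        # Everyone who follows this voter is now exposed to the story.
        exposed |= fans.get(voter, set())
    return labels


# Hypothetical example: bob follows alice, dave follows bob.
votes = ["alice", "bob", "carol", "dave"]
fan_graph = {"alice": {"bob"}, "bob": {"dave"}}
labels = label_fan_votes(votes, fan_graph)
print(labels)       # [False, True, False, True]
print(sum(labels))  # 2 in-network votes
```

Because the fan lists are harvested directly, the labels depend only on the observed network and the vote order, not on a network inferred from the cascades.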


Corpus

Digg

3,553 stories promoted to the front page in June 2009. For each story, the data include the story title, story id, link, submitter’s name, submission time, list of voters and the time of each vote, the time the story was promoted to the front page, and the list of voters’ friends.

Twitter

398 stories from Tweetmeme [frequently retweeted URLs] originally posted between June 11, 2009 and July 3, 2009. For each story, the data include the name of the user who posted the link, the time it was posted, the number of times the link was retweeted, and details of up to 1000 of the most recent retweets (name of the retweeting user, text and time stamp of the retweet). (329 stories had fewer than 1000 retweets.)

Active Users & Friends

They define "active users" as those who digged/retweeted at least one story.

At the time of the study, Digg was much more interconnected than Twitter: Digg's fraction of mutual links is one order of magnitude larger for dyads and two orders of magnitude larger for triads. The authors suspect that Twitter will become denser with time, since it is a newer service.

Digg

There were 139,409 active Digg users. Of these, 71,834 listed a friend, for a total of 258,220 friend links.

There were 125,219 reciprocal links between 279,725 distinct users in the Digg sample, giving the fraction of mutual links for dyads as $3.20 \times 10^{-6}$.

The fraction of mutual links for triads (clustering coefficient) was $7.60 \times 10^{-12}$.

Twitter

137,582 active Twitter users. There were 6,200,051 followers.

There were 3,973,892 reciprocal links among the 6,200,051 followers in the Twitter sample, giving the fraction of mutual links for dyads as $2.07 \times 10^{-7}$.

The fraction of mutual links for triads (clustering coefficient) was $1.92 \times 10^{-14}$.
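The dyad fractions reported for both networks can be reproduced from the counts above. The sketch below assumes the normalization is the number of possible user pairs, n(n-1)/2; the summary does not state the formula, but this choice matches both reported values to three significant figures. The function name is illustrative.

```python
def mutual_dyad_fraction(reciprocal_links: int, users: int) -> float:
    """Fraction of all possible user pairs that are mutually linked.

    Assumption: the denominator is the number of unordered pairs, n(n-1)/2.
    """
    possible_pairs = users * (users - 1) // 2
    return reciprocal_links / possible_pairs


digg = mutual_dyad_fraction(125_219, 279_725)
twitter = mutual_dyad_fraction(3_973_892, 6_200_051)

print(f"Digg:    {digg:.2e}")     # Digg:    3.20e-06
print(f"Twitter: {twitter:.2e}")  # Twitter: 2.07e-07
print(f"Ratio:   {digg / twitter:.1f}")
```

The ratio of the two fractions is roughly 15, which is the "one order of magnitude" gap for dyads.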

"Frontpage" Story Spread

They trace the "cascade of interest" in a story through the underlying social network. For Digg, this has a normal distribution; for Twitter, some stories don't spread, yet the distribution is broader, making it possible for some stories to spread farther.

Further, they analyze the evolution of in-network ("fan") votes.

Digg

The probability of a story spreading is about 0.74 before promotion and about 0.3 after promotion. The most common number of votes for front page stories is around 500.

Twitter

The most common number of retweets is around 400.

Distributions

The number of votes a story receives follows a lognormal distribution.
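As a rough illustration of what a lognormal fit involves, the sketch below estimates the two parameters from a list of vote counts by taking the mean and standard deviation of their logarithms. The function names and the sample counts are hypothetical, and this is a generic estimate rather than the fitting procedure used in the paper.

```python
import math
from statistics import fmean, pstdev
from typing import Iterable, Tuple


def fit_lognormal(counts: Iterable[float]) -> Tuple[float, float]:
    """Return (mu, sigma): mean and population std. dev. of log(counts).

    Non-positive counts are skipped because their logarithm is undefined.
    """
    logs = [math.log(c) for c in counts if c > 0]
    return fmean(logs), pstdev(logs)


def lognormal_mode(mu: float, sigma: float) -> float:
    """Most common value of a lognormal distribution: exp(mu - sigma^2)."""
    return math.exp(mu - sigma ** 2)


# Hypothetical vote counts, not data from the paper.
sample_counts = [210, 340, 480, 510, 620, 900, 1500, 3200]
mu, sigma = fit_lognormal(sample_counts)
print(f"mu={mu:.3f}, sigma={sigma:.3f}, mode={lognormal_mode(mu, sigma):.0f}")
```

The mode is the quantity that corresponds to a "most common number of votes"; in a lognormal distribution it lies below the median, with a long tail of a few very popular stories.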

Theoretical and Practical Relevance

The paper draws attention to the question of why "heavy tail" lognormal distributions are common in peer production. The authors suggest that aggregating quality contributions while avoiding noise may be possible by separating the activity of a user's chosen network from that of the network at large.

Interesting facts:

  • After about a day, the number of votes/retweets saturates to "a value that reflects their popularity".
  • Digg's social network is denser and more interconnected.
  • Relatively few stories accrue thousands of votes.

This was published in an open access journal.