Cooperation and Quality in Wikipedia

From AcaWiki

Citation: Dennis Wilkinson, Bernardo Huberman (2007) Cooperation and Quality in Wikipedia. Proceedings of the 2007 International Symposium on Wikis
DOI (original publisher): 10.1145/1296951.1296968
Download: http://doi.acm.org/10.1145/1296951.1296968
Tagged: Computer Science, collaborative writing, cooperation, groupware, Wikipedia, quantitative, complex systems

Summary

This article asks two questions: how can we measure the quality of a Wikipedia article, and how are quality articles on Wikipedia produced? Wikipedia enables collaboration at a scale never before seen, but measuring the quality of its articles remains an open problem. Researchers have used the number of edits, the number of unique editors, and other measures as stand-ins for quality, but none of them seem quite right.

On the question of how quality articles are produced: some people, including influential Wikipedians, believe that quality articles are often the product of a small number of dedicated editors. In this study, however, the authors find that articles with a larger number of distinct editors and a larger number of edits are of the highest quality. Evidence of collaboration matters more than the number of edits per editor. They also use a stochastic process model (expressed as a differential equation), in which edits beget edits, to explain how articles accrue edits. Instead of a power law, they find that edits to articles of a given age follow a log-normal distribution. The authors claim that the log-normal (rather than power-law) distribution "means that a small but significant population of articles experience a disproportionately high number of edits and editors, while the vast majority of articles undergo far less activity."
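The "edits beget edits" idea can be illustrated with a small simulation. A minimal sketch, not the paper's exact model: if each article's edit count grows by a small random proportional increment at each step, the resulting counts follow an approximately log-normal distribution (multiplicative growth is the classic route to log-normality). All parameters below are illustrative assumptions.

```python
import math
import random
import statistics

random.seed(0)

def simulate_edit_counts(n_articles=5000, n_steps=100, base_rate=0.05, noise=0.05):
    """Simulate a multiplicative 'edits beget edits' process.

    Each step, an article's edit count grows by a random increment
    proportional to its current size. This is an illustrative sketch
    of multiplicative growth, not the authors' actual model.
    """
    counts = []
    for _ in range(n_articles):
        n = 1.0
        for _ in range(n_steps):
            n *= 1.0 + base_rate + random.gauss(0, noise)
        counts.append(n)
    return counts

counts = simulate_edit_counts()
logs = [math.log(c) for c in counts]
# If the counts are roughly log-normal, their logs look roughly normal;
# the summary statistics of the logs are then well-behaved.
print(round(statistics.mean(logs), 2), round(statistics.stdev(logs), 2))
```

A histogram of `logs` would look approximately bell-shaped, while a histogram of `counts` would show the heavy right tail the authors describe: a few articles with very many edits, most with far fewer.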

They used featured articles as a benchmark for quality, with statistical controls for topic popularity (Google PageRank) and edit and editor counts normalized by article age; they also removed edits made in the two weeks before an article was featured. This let them make some interesting and impressive plots showing that featured articles had more edits and editors than other articles.
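The age-normalization step can be sketched as follows. The records here are hypothetical, invented for illustration; the point is simply that dividing raw edit counts by article age makes old and young articles comparable before featured and non-featured groups are contrasted.

```python
from statistics import median

# Hypothetical records: (article_id, total_edits, age_in_days, is_featured)
articles = [
    ("a1", 900, 1200, True),
    ("a2", 45, 1100, False),
    ("a3", 620, 800, True),
    ("a4", 30, 950, False),
]

def edits_per_day(edits, age_days):
    # Normalize raw edit counts by article age, so an old article
    # is not counted as "high activity" merely for having existed longer.
    return edits / age_days

featured = [edits_per_day(e, a) for _, e, a, f in articles if f]
others = [edits_per_day(e, a) for _, e, a, f in articles if not f]
print(median(featured) > median(others))
```

With real data, the paper's comparison would be done over the full populations of featured and non-featured articles rather than point estimates like the medians above.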

Next, they use similar methods to compare the number of talk page edits between featured and other articles. This gives them an approximate measure of how much cooperation went into the work on each article.

They also showed that the featured articles tended to have a much higher fraction of "quick-turnaround" edits compared to non-featured articles. This seems to be further evidence that collaboration is important for making high quality articles.

One nice thing about this paper is that it relies on simple measures and straightforward transformations of the data in order to produce compelling plots that make each of its points. Other studies might have used multiple regression instead.

In conclusion, they reiterate that Wikipedia enables the coordination and organization that facilitate cooperative work on articles. Their analysis points not to a small number of dedicated editors developing articles alone, but to the efforts of a larger number of cooperating editors.

Theoretical and Practical Relevance

This early quantitative analysis of Wikipedia challenged the notion that quality Wikipedia articles are largely the product of the efforts of a small number of dedicated editors. Instead, featured articles have a greater number of contributing editors, more edits on talk pages, and more quick-turnaround edits. This paper helped establish the idea that Wikipedia is a truly collaborative project.