Size matters: Word count as a measure of quality on Wikipedia


Citation: Joshua E. Blumenstock (2008) Size matters: Word count as a measure of quality on Wikipedia.
Download: http://dx.doi.org/10.1145/1367497.1367673
Tagged: Computer Science, Wikipedia, information quality, word count

Summary

Blumenstock's poster and extended abstract is one of a series of attempts to measure the quality of Wikipedia articles. Whereas previous approaches, including Stvilia et al.'s Information quality discussions in Wikipedia and Zeng et al.'s Computing trust from revision history, were rather complicated, Blumenstock simply measures an article's quality by the number of words it contains. He takes all articles marked as featured and compares them with a selection of 9513 random articles, using 2/3 of the articles for training and 1/3 for testing. By classifying articles with more than 2,000 words as featured and those with fewer as random, he achieves 96.3% accuracy. More sophisticated methods improve on this only marginally.
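
The thresholding idea can be illustrated with a minimal Python sketch. This is not Blumenstock's code: the function names, the whitespace-based word count, the fixed 2/3 split, and the toy data are all assumptions made for illustration. Only the 2,000-word cutoff and the 2/3 to 1/3 split come from the summary above.

```python
import random

THRESHOLD = 2000  # word-count cutoff reported in the summary


def word_count(text):
    """Count whitespace-separated tokens (an assumed tokenization)."""
    return len(text.split())


def predict_featured(n_words, threshold=THRESHOLD):
    """Label an article as featured if it has more than `threshold` words."""
    return n_words > threshold


def accuracy(samples, threshold=THRESHOLD):
    """Fraction of (n_words, is_featured) pairs that are classified correctly."""
    correct = sum(predict_featured(n, threshold) == label for n, label in samples)
    return correct / len(samples)


def split_train_test(samples, seed=0):
    """Shuffle a copy of the samples and split it 2/3 training, 1/3 testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def best_threshold(train, candidates):
    """Pick the candidate cutoff with the highest training accuracy."""
    return max(candidates, key=lambda t: accuracy(train, t))


# Toy, made-up (word_count, is_featured) pairs; not data from the paper.
toy = [(350, False), (800, False), (1200, False),
       (1900, False), (2500, True), (4100, True)]
train, test = split_train_test(toy)
cutoff = best_threshold(train, candidates=range(500, 5000, 500))
print("chosen cutoff:", cutoff, "test accuracy:", accuracy(test, cutoff))
```

With real data, each pair would hold an article's word count and whether it carries featured status; the cutoff is chosen on the training portion and then evaluated on the held-out third.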

Of course, this assumes that featured status is a good proxy for quality, which may be tenuous because so few articles are featured. As a result, he writes that "we can only conclude that long articles are featured, and featured articles are long."

Theoretical and Practical Relevance

Measuring quality in Wikipedia is useful, and a metric as simple as word count would be especially practical to apply.