GroupLens: an open architecture for collaborative filtering of netnews
Citation: Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, John Riedl (1994) GroupLens: an open architecture for collaborative filtering of netnews. CSCW '94: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, pages 175-186.
They introduce their collaborative filtering system by discussing the setting in which they make an intervention. Usenet news groups were a big deal, over 100MB in traffic per day, with up to 140,000 participants! They were very popular and were becoming important for computing professionals and for academics.
Yet there was a lot of junk on the newsgroups, and many found it difficult to receive feedback on what they were sharing. Existing mechanisms for managing the volume of news included the branching tree structure of Usenet groups and moderation, in which a moderator filtered out articles they thought were not appropriate. Newsreaders displayed articles as lists of titles for readers to select from, and collapsed threads so that only the top comments were shown. They also allowed readers to blacklist subjects and authors. Some even provided rudimentary search.
GroupLens provides a social mechanism for predicting which articles people will like, using collaborative filtering.
They situate their system in broader work on information filtering. There are three categories of filtering techniques: cognitive, social, and economic (Malone et al.). Content blacklists and search are simple cognitive filters; they look at the content of the documents. Cognitive filters could be made more complex by taking user feedback into account and applying machine learning. Social filtering is like the author blacklist. Collaborative filtering is an advanced kind of social filtering. Social filtering has an advantage because humans are good at reading, understanding, and judging text. Moderation is a primitive form of collaborative filtering.
Tapestry was a prior collaborative filtering system. It accepts evaluations from many people to socially filter news, but it was monolithic and did not aggregate ratings into personalized recommendations.
Economic filtering recommends articles based on their costs and benefits.
Design of GroupLens
They built GroupLens: a system consisting of Unix and Macintosh clients and servers they call "Better Bit Bureaus" (BBBs).
Their design goals are openness (allowing any Usenet client to participate in GroupLens, and allowing the creation of alternative BBBs), ease of use, compatibility with Usenet, scalability, and privacy.
Usenet is a distributed system, but articles have global identifiers which they can use to track articles.
The BBB servers share ratings with each other to afford scalability to many sites.
There are some lovely diagrams of the netnews architecture and its augmentation by GroupLens.
There are many alternative designs for collaborative filtering. They chose theirs because they realized that a group larger than 7 people would probably be required to provide good ratings, but at larger group sizes people's knowledge of others in the group would be low, so anonymity would be more desirable.
They modified three existing newsreaders by adding a rating box where users would rate articles from good to bad on a 1-5 scale.
GroupLens reuses the Usenet protocol by publishing ratings to a special-purpose newsgroup. This way the ratings get propagated and are available to clients.
How does GroupLens turn ratings into recommendations? The BBB models ratings using matrix completion: it predicts the missing entries of the ratings matrix. Their algorithm for matrix completion is based on reinforcement learning, regression, and pairwise correlation coefficients. The method is actually pretty simple. They just take the average of all the ratings of an article, weighted by the user's correlation with each rater. The assumption is that people who agreed in the past are likely to agree again, which is a pretty bad assumption but leads to reasonable predictions.
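The correlation-weighted average can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the toy ratings, and the choice to weight each rater's deviation from their own mean (rather than their raw rating) are assumptions in the style of correlation-based collaborative filtering.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two users over the articles both have rated."""
    common = [item for item in a if item in b]
    if len(common) < 2:
        return 0.0  # not enough overlap to say anything
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user who rates everything the same carries no signal
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, item):
    """Predict a rating as the user's own mean plus the correlation-weighted
    average of how far other raters deviated from their own means (assumed form)."""
    own = ratings[user]
    own_mean = sum(own.values()) / len(own)
    num = 0.0
    den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(own, theirs)
        other_mean = sum(theirs.values()) / len(theirs)
        num += w * (theirs[item] - other_mean)
        den += abs(w)
    if den == 0:
        return own_mean  # no correlated raters: fall back to the user's mean
    return own_mean + num / den


# Hypothetical toy data on the 1-5 scale: bob agrees with ann, cat disagrees.
ratings = {
    "ann": {"a1": 1, "a2": 5, "a3": 2},
    "bob": {"a1": 1, "a2": 5, "a3": 2, "a4": 5},
    "cat": {"a1": 5, "a2": 1, "a3": 4, "a4": 1},
}
print(round(predict(ratings, "ann", "a4"), 2))  # about 4.42
```

Note that cat's low rating of a4 pushes ann's prediction up, not down: a rater who reliably disagrees with you is as informative as one who reliably agrees.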
They put a lot of effort into modifying each of the Usenet clients so that the additions conform to that client's look and feel and normal interaction motifs. The users appreciated this a lot. It is pretty cool to be able to integrate with multiple ecosystems like this. They also have a cute explanation of threads and triangle collapse interfaces.
They also discuss scale issues. How well does GroupLens predict, and how fast is it? How much network traffic does it use? How do the BBB machines scale? It turns out that computing the ratings scales superlinearly, so running the BBB machines could get more expensive as the system grows. They provide a lot of technical details about how they optimize this or limit the number of ratings to be considered. If they had 1,000,000 users they would need 10GB of network traffic per day! This is a pretty fair amount today. At the time it was enormous!
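A back-of-the-envelope sketch makes these numbers concrete. The per-user figure follows directly from the 10GB and 1,000,000-user estimate above (taking 1GB as 10^9 bytes). The pair count is only an assumed illustration of why a method built on pairwise correlations can grow faster than linearly, not the paper's own cost analysis; both function names are hypothetical.

```python
def per_user_bytes(total_bytes_per_day, users):
    """Average ratings traffic attributable to each user per day."""
    return total_bytes_per_day / users


def user_pairs(users):
    """Number of distinct user pairs whose correlation could be computed."""
    return users * (users - 1) // 2


GB = 10**9  # decimal gigabyte, an assumption about the unit

# 10GB per day spread over 1,000,000 users is 10,000 bytes (about 10KB) each.
print(per_user_bytes(10 * GB, 1_000_000))

# Ten times the users means roughly a hundred times the pairs.
print(user_pairs(1_000), user_pairs(10_000))
```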
In the section called "Ongoing Experimentation" they describe experimental installations they are using to iterate on and refine their system. They are going to evaluate the system's scaling performance and how well it can predict missing entries from the ratings matrix. They use participants in these installations to collect data in order to evaluate rating prediction. They suggest that future BBBs might improve upon their current design by performing a 'combination of content filtering and collaborative filtering.'
They imagine how GroupLens could change how people consume news on the net. They suggest that GroupLens could be a sort of distributed moderation that improves the quality of articles on newsgroups and reduces the need for moderators. It may also reduce the need for "splits" that break news down into smaller categories. GroupLens might also reduce the need for blacklists. This is interesting because while collaborative filtering systems have been adopted, they often eschew the personalization component (Reddit's collaborative filtering produces an overall popularity rank that is not personalized). Their influential vision remains only partly realized.
Next they consider why individuals would contribute ratings to GroupLens. They suggest that users may be motivated by altruism or guilt to "do their share" of rating, but they suppose that ratings will be under-provisioned. This is what happened in one of their pilot tests, where the volume of posts was too high. They consider providing external incentives.
This is a bit puzzling since it seems like if users understand the system then they will want to rate articles in order to keep the algorithm trained.
In conclusion they summarize their architecture and approach, with emphasis on the open architecture that allows others to create new clients and BBBs.
Theoretical and practical relevance:
This seminal work on collaborative filtering presents GroupLens, a system for crowdsourcing ratings of news articles. The hugely influential system laid the foundation for recommendation engines based on correlating user behaviors. The work anticipates the rise of systems that aggregate small contributions, like rating an article, into information about quality and decisions about what content to recommend, and it anticipates how such systems may structure the ways that humans interact in computer-mediated sites.