Mining the peanut gallery: Opinion extraction and semantic classification of product reviews

{{Summary
 * title=Mining the peanut gallery: Opinion extraction and semantic classification of product reviews
 * authors=Kushal Dave and Steve Lawrence and David M. Pennock
 * url=http://doi.acm.org/10.1145/775152.775226
 * tags=opinion mining, sentiment analysis, product reviews, machine learning, information extraction
 * summary=This 2003 paper provides a useful guide to related work in several areas:
 * objectivity/subjectivity classification
 * word classification (e.g. using 'textual conjunctions like "fair and legitimate" or "simplistic but well-received" to separate similarly- and oppositely-connoted words', as in Predicting the semantic orientation of adjectives)
 * sentiment classification
 * recommendations
 * commercial products

They compare information retrieval approaches with machine learning.

Substitutions
They try several kinds of substitutions and feature transformations, listed below. Overgeneralization seems to cause many of the problems with the substitutions chosen.
 * number and category (e.g. replacing the product names with a generic variable)
 * linguistic substitutions using WordNet and collocations
 * Porter's stemming
 * negatives
 * N-grams and proximity
 * substrings
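
A few of the substitutions above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the product-name list, the negation list, the placeholder tokens `_NUMBER` and `_PRODUCT`, and the `NOT_` prefix are all assumptions made for the example.

```python
import re

# Hypothetical lists for illustration; the paper's actual lists are not given here.
PRODUCT_NAMES = {"nikon", "canon"}
NEGATIONS = {"not", "no", "never"}


def substitute(text, product_names=PRODUCT_NAMES):
    """Lowercase, tokenize, and apply number, product, and negation substitutions."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    out = []
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            # Remember the negation and attach it to the next token.
            negate = True
            continue
        if tok.isdigit():
            tok = "_NUMBER"      # replace any number with a generic variable
        elif tok in product_names:
            tok = "_PRODUCT"     # replace product names with a generic variable
        if negate:
            tok = "NOT_" + tok
            negate = False
        out.append(tok)
    return out


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, `substitute("The Nikon 990 is not good")` yields `["the", "_PRODUCT", "_NUMBER", "is", "NOT_good"]`. Replacing too many distinct tokens with the same placeholder is one way the overgeneralization mentioned above can arise.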

Outcomes
Then they count:
 * how many times each term occurs
 * how many documents each term occurs in
 * how many categories a term occurs in
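
The counts listed above could be collected along these lines. This is a sketch under assumptions: the function name `count_terms` and the `(category, tokens)` input format are hypothetical, chosen only for the example.

```python
from collections import Counter, defaultdict


def count_terms(labeled_docs):
    """Collect per-term statistics from an iterable of (category, tokens) pairs."""
    term_freq = Counter()               # total occurrences of each term
    doc_freq = Counter()                # number of documents containing each term
    term_categories = defaultdict(set)  # categories in which each term appears
    for category, tokens in labeled_docs:
        term_freq.update(tokens)
        unique = set(tokens)
        doc_freq.update(unique)         # count each term once per document
        for tok in unique:
            term_categories[tok].add(category)
    cat_freq = {term: len(cats) for term, cats in term_categories.items()}
    return term_freq, doc_freq, cat_freq
```

A term that appears in every category (high category count) carries little information for classification, which is why this count is worth tracking alongside the raw frequencies.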

They then smooth the counts, score the reviews (trying various machine learning algorithms), and reweight. New documents can then be classified based on their feature vectors. They detail further experiments, such as scaling the feature records.
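
One simple way to smooth, score, and classify is sketched below. It is an illustration only: add-one (Laplace) smoothing and log-odds term scores are assumptions chosen for the example and are not necessarily the smoothing or scoring the paper settles on. The labels `"pos"` and `"neg"` and the function names are likewise hypothetical.

```python
import math
from collections import Counter


def train(labeled_docs, alpha=1.0):
    """Return a smoothed log-odds score per term from (label, tokens) pairs.

    Labels are assumed to be "pos" or "neg"; alpha is the additive smoothing constant.
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for label, tokens in labeled_docs:
        counts[label].update(tokens)
    vocab = set(counts["pos"]) | set(counts["neg"])
    # Smoothed denominators: observed tokens plus alpha for every vocabulary entry.
    totals = {c: sum(counts[c].values()) + alpha * len(vocab) for c in counts}
    scores = {}
    for term in vocab:
        p_pos = (counts["pos"][term] + alpha) / totals["pos"]
        p_neg = (counts["neg"][term] + alpha) / totals["neg"]
        scores[term] = math.log(p_pos / p_neg)
    return scores


def classify(tokens, scores):
    """Sum the term scores of a document; unseen terms contribute nothing."""
    total = sum(scores.get(tok, 0.0) for tok in tokens)
    return "pos" if total >= 0 else "neg"
```

With this scheme, a term seen equally often in both classes scores zero, and a new review is labeled by the sign of its summed term scores.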

They present a system called ReviewSeer that collects mentions of particular products from search engines, groups them into categories, and gives assessments.

As an initial corpus, they select and manually tag 600 sentences (200 for each of 3 products). Many sentences are ambiguous out of context, do not express an opinion, or do not describe the product. They conclude that it is important to first find "coherent, topical opinions".

They also present conclusions and ideas for future work.

Selected References
}}
 * Vasileios Hatzivassiloglou and Kathleen R. McKeown. Predicting the semantic orientation of adjectives. In Proceedings of the 35th Annual Meeting of ACL, 1997.
 * journal=Proceedings of the 12th international conference on World Wide Web
 * pub_date=2003
 * doi=10.1145/775152.775226
 * subject=Computer Science