From tweets to polls: Linking text sentiment to public opinion time series

{{Summary
 * title=From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
 * authors=Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, Noah A. Smith
 * url=http://aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/view/1536
 * tags=politics, public opinion, polls, Twitter, sentiment, aggregate sentiment, sentiment analysis, aggregate sentiment analysis, political science, economics
 * summary=This study finds that poll data on consumer confidence and presidential job approvals can be approximated with straightforward sentiment analysis of Twitter data.

Methodology
Sentiment analysis based on keyword searches is compared to poll data, using various window and smoothing options.

Corpus and Data Sources
They use Twitter along with polling data on consumer confidence and two types of political opinion: presidential approval and elections.

Twitter
1 billion Twitter messages posted over the years 2008 and 2009, collected by querying the Twitter API, as well as archiving the “Gardenhose” real-time stream. This comprises a roughly uniform sample of public messages, in the range of 100,000 to 7 million messages per day. (The primary source of variation is the growth of Twitter itself; its message volume increased by a factor of 50 over this two-year period.)

Consumer Confidence Polls

 * The Index of Consumer Sentiment (ICS) from the Reuters/University of Michigan Surveys of Consumers.
 * The Gallup Organization’s “Economic Confidence” index.

Presidential job approval ratings

 * Gallup’s daily tracking poll for the presidential job approval rating for Barack Obama over the course of 2009

Election Polls

 * Election polls compilation provided by Pollster.com, consisting of 491 data points from 46 different polls.

Methodology
Rather than training a topic-sentiment model for jointly inferring the topic and sentiment of tweets, they get topic from keywords and sentiment from a lexicon.

Their methodology consists of three steps: first they retrieve topical messages, then they estimate day-level opinion, and finally they smooth the resulting signal and compare it to poll data.

Message Retrieval
They make three topic subsets, using keywords as follows:
 * consumer confidence keywords: economy, job, jobs
 * presidential approval keywords: obama
 * elections keywords: obama, mccain
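The retrieval step above can be sketched as a simple keyword filter; this is a minimal illustration with the keyword sets reported in the paper, assuming tweets are plain strings (the function and data-structure names are hypothetical, not from the paper):

```python
# Keyword sets per topic, as reported in the paper.
TOPIC_KEYWORDS = {
    "consumer_confidence": {"economy", "job", "jobs"},
    "presidential_approval": {"obama"},
    "elections": {"obama", "mccain"},
}

def retrieve(messages, topic):
    """Return the subset of messages containing any keyword for the topic."""
    keywords = TOPIC_KEYWORDS[topic]
    hits = []
    for msg in messages:
        tokens = set(msg.lower().split())  # case-insensitive token match
        if tokens & keywords:
            hits.append(msg)
    return hits
```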

Each topic subset contains about 0.1–0.5% of all messages on a given day. The keywords occur more frequently on weekdays, peaking mid-week, than on weekends.

The size of the datasets varies substantially over time, since Twitter was growing quickly: early daily datasets may contain hundreds of messages, while by late 2008 most contain thousands.

Opinion Estimation
They count positive and negative messages, where a message can be both positive and negative if it contains both positive and negative words. Words are drawn from the OpinionFinder subjectivity lexicon, containing roughly 1,600 positive and 1,200 negative words (see Recognizing contextual polarity in phrase-level sentiment analysis).

They note that since Twitter messages are short this is similar to counting the positive and negative words on a day.

Sentiment score is the ratio of positive over negative messages on a day.
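The counting scheme and ratio described above can be sketched as follows; this is a minimal illustration using tiny stand-in word lists in place of the full OpinionFinder lexicon:

```python
# Stand-in word lists; the paper uses the OpinionFinder lexicon instead.
POSITIVE = {"good", "great", "hope"}
NEGATIVE = {"bad", "sad", "fear"}

def sentiment_ratio(day_messages):
    """Ratio of messages with at least one positive word to messages with
    at least one negative word; a message containing both counts toward both."""
    pos = neg = 0
    for msg in day_messages:
        tokens = set(msg.lower().split())
        if tokens & POSITIVE:
            pos += 1
        if tokens & NEGATIVE:
            neg += 1
    return pos / neg if neg else float("inf")
```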

Since they are looking at aggregate sentiment analysis (more below), they ignore falsely detected sentiment. However, they note that a web-derived lexicon could reduce this problem (The viability of web-derived polarity lexicons). Other open questions include whether news headlines and retweets should be used.

Smoothing Data, Comparing Data
Smoothing out the noise while retaining the signal is a critical issue.

They would like a leading indicator for consumer confidence changes and polls, in part because polls are expensive to conduct.

Thus, they investigate various smoothing windows and lead/lag alignments between the text signal and the polls.
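The smoothing-and-alignment step can be sketched as a moving average over the past k days plus a lagged correlation against the poll series; this is a minimal illustration assuming aligned daily lists of floats (the function names are hypothetical, and the paper's actual analysis is more involved):

```python
def moving_average(series, k):
    """Average each value with the previous k-1 days (shorter windows at the start)."""
    out = []
    for t in range(len(series)):
        window = series[max(0, t - k + 1): t + 1]
        out.append(sum(window) / len(window))
    return out

def lagged_correlation(text, poll, lag):
    """Pearson correlation of text[t] with poll[t + lag]:
    a positive lag tests whether the text signal leads the polls."""
    xs = text[:len(text) - lag] if lag else text
    ys = poll[lag:]
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5
```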

One issue is that Twitter changed substantially over the 2008-2009 time of data collection. They suggest that future work should investigate vector autoregressions and other techniques for dealing with historical signals. A larger issue, they say, is that poll data is merely a proxy for true sentiment.

Selected References

 * Velikovich, L.; Blair-Goldensohn, S.; Hannan, K.; and McDonald, R. 2010. The viability of web-derived polarity lexicons. In Proceedings of Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics.


 * Wilson, T.; Wiebe, J.; and Hoffmann, P. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing.
 * relevance=Their smoothing issues (e.g. how to average to see the trends and avoid the noise) are probably of wider interest.

Aggregate sentiment analysis
The idea of aggregate sentiment is particularly interesting: errors are treated as noise which is expected to cancel out in aggregate. They point to Hopkins and King (2010) to show that standard text analysis techniques are inappropriate for assessing aggregate populations. Further, they provide some evidence from their own experiment: they mention filtering out "will", which is treated as positive sentiment despite usually occurring as a modal verb, since they do not do POS tagging. However, they mention one caution: errors could potentially correlate with information of interest, for example if certain demographic groups tweet in ways that are harder to analyze.

Their review of prior studies using aggregated text sentiment may be useful elsewhere.

Interesting facts
 * Twitter messages average about 11 words.
 * journal=Fourth International AAAI Conference on Weblogs and Social Media
 * pub_date=2010
 * subject=Computer Science
 * pub_open_access=yes
}}