Citation: Daniel Ramage, Susan Dumais, Dan Liebling (2010) Characterizing microblogs with topic models. Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media
Download: http://aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/view/1528/1846
Tagged: Computer Science, Twitter, microblogging, streams, filtering, machine learning

Summary

Motivated by a model of user behavior drawn from interviews and surveys of Twitter users at Microsoft, this paper argues that better models of tweet content would help with two major problems Twitter users face: finding new users and topics to follow, and filtering "noise" out of their feeds.

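To model tweets, the paper uses Labeled LDA (Ramage et al., 2009), a supervised extension of Latent Dirichlet Allocation (Blei, Ng, & Jordan, 2003) in which topics correspond to observed labels and each document draws only on the topics of its own labels. The labels are derived from features of the tweets themselves: hashtags, @user mentions, replies, emoticons, and questions.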
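The label-constrained sampling at the heart of Labeled LDA can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the label rules in the hypothetical `extract_labels` (regexes for hashtags, mentions and emoticons, a leading "@" meaning reply, a "?" meaning question), the catch-all "latent" label, and the hyperparameters `alpha` and `beta` are all assumptions. The sampler is a standard collapsed Gibbs sampler in which each word may only be assigned to a topic from its own tweet's label set.

```python
import random
import re
from collections import defaultdict

# Hypothetical label rules; the paper's exact definitions may differ.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
SMILE = re.compile(r"[:;]-?\)")
FROWN = re.compile(r":-?\(")


def extract_labels(tweet):
    """Derive a label set for one tweet from its surface features."""
    labels = {"#" + h.lower() for h in HASHTAG.findall(tweet)}
    labels |= {"@" + m.lower() for m in MENTION.findall(tweet)}
    if tweet.startswith("@"):
        labels.add("reply")
    if SMILE.search(tweet):
        labels.add(":)")
    if FROWN.search(tweet):
        labels.add(":(")
    if "?" in tweet:
        labels.add("question")
    labels.add("latent")  # hypothetical catch-all so no tweet has an empty label set
    return labels


def labeled_lda(docs, labels, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling where each word's topic must be one of its doc's labels."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    n_tw = defaultdict(int)                    # count of word w assigned to topic t
    n_t = defaultdict(int)                     # total words assigned to topic t
    n_dt = [defaultdict(int) for _ in docs]    # per-document topic counts
    allowed = [sorted(labs) for labs in labels]
    z = []
    for d, words in enumerate(docs):           # random initial assignments
        z.append([rng.choice(allowed[d]) for _ in words])
        for w, t in zip(words, z[d]):
            n_tw[t, w] += 1
            n_t[t] += 1
            n_dt[d][t] += 1
    for _ in range(iters):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                t = z[d][i]                    # remove the current assignment
                n_tw[t, w] -= 1
                n_t[t] -= 1
                n_dt[d][t] -= 1
                weights = [(n_dt[d][k] + alpha) * (n_tw[k, w] + beta) / (n_t[k] + V * beta)
                           for k in allowed[d]]
                t = rng.choices(allowed[d], weights=weights)[0]
                z[d][i] = t                    # record the resampled assignment
                n_tw[t, w] += 1
                n_t[t] += 1
                n_dt[d][t] += 1
    return z


tweets = ["@alice great talk on #nlp :)",
          "is anyone going to #nlp tomorrow?",
          "lunch was great :)"]
docs = [re.findall(r"[a-z]+", t.lower()) for t in tweets]
labels = [extract_labels(t) for t in tweets]
z = labeled_lda(docs, labels)
```

Every word ends up assigned only to a topic among its own tweet's labels; with real data, the per-topic word counts would then describe what each hashtag, emoticon, or latent dimension is about.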

The dataset consists of 8,214,019 Twitter posts collected over one week in November 2009.

The top terms of each of the 200 latent dimensions learned by the model were manually labeled by four raters, using the "4S" categories plus "other":

substance
social
status
style
other

These categories first emerged from the user interviews. According to the paper, "At the word level, Twitter is 11% substance, 5% status, 16% style, 10% social, and 56% other."
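The word-level percentages can be read as token shares: each word token inherits the category of the topic it was assigned to, and a category's share is its token count divided by the total number of tokens. The following is a minimal sketch under exactly that interpretation (the paper's precise computation may differ); the function name, the topic-to-category mapping, and the toy assignments are hypothetical.

```python
from collections import Counter


def category_shares(assignments, topic_category):
    """Fraction of word tokens whose assigned topic falls in each category.

    assignments: per-document lists of topic ids (one per word token).
    topic_category: hypothetical mapping from topic id to a 4S category;
    unmapped topics count as "other".
    """
    counts = Counter(topic_category.get(t, "other") for doc in assignments for t in doc)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


assignments = [[0, 0, 1], [2, 3, 3, 0]]  # toy data, not from the paper
topic_category = {0: "substance", 1: "social", 2: "style"}
shares = category_shares(assignments, topic_category)
```

Here 3 of the 7 tokens fall under substance, 2 under other (topic 3 is unmapped), and 1 each under social and style.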

The model is applied to two tasks:

ranking posts from a user's current feed
recommending new users to follow

both of which were evaluated with users at Microsoft.
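One simple way a topic model can serve both tasks is content-based matching: represent the user's interests and each candidate post (or candidate account) as a topic vector, then sort candidates by cosine similarity. This is a generic sketch, not necessarily the ranking method evaluated in the paper; the profile, the topic weights, and the function names are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse topic vectors stored as dicts."""
    dot = sum(weight * q.get(topic, 0.0) for topic, weight in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def rank_by_profile(profile, candidates):
    """Sort (candidate_id, topic_vector) pairs by similarity to the profile, best first."""
    return sorted(candidates, key=lambda c: cosine(profile, c[1]), reverse=True)


profile = {"#nlp": 0.7, "social": 0.3}     # hypothetical user interest vector
posts = [("p1", {"social": 1.0}),
         ("p2", {"#nlp": 0.9, "social": 0.1}),
         ("p3", {"#sports": 1.0})]
ranked = rank_by_profile(profile, posts)
```

The same `rank_by_profile` call could serve user recommendation if each candidate account were summarized by, for example, the averaged topic vector of its recent posts (again an assumption, not a detail from the paper).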

Selected References

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
Ramage, D., Hall, D., Nallapati, R., & Manning, C. D. (2009). Labeled LDA: A supervised topic model for credit attribution in multi-label corpora. EMNLP 2009.

Theoretical and Practical Relevance

Labeled LDA, the technique used, could be useful for other studies, and the authors' choice of which tweet features (hashtags, mentions, replies, emoticons, questions) to use as labels is interesting. The "4S" dimensions could be validated by further studies.

See a summary and discussion of other papers using LDA on Twitter

This was published in open access conference proceedings.
