Cats rule and dogs drool!: Classifying stance in online debate
From AcaWiki
Citation: Pranav Anand, Marilyn Walker, Rob Abbott, Jean E. Fox Tree, Robeson Bowmani, and Michael Minor (2011) Cats rule and dogs drool!: Classifying stance in online debate. Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis
Tagged: Computer Science, online argumentation, NLP, stance, disagreement, Mechanical Turk, natural language processing, discourse analysis, cue words
Summary:
The paper investigates stance classification using 1113 two-sided debates on 12 topics from the debate website Convinceme.com.
The authors note three elements of dialogue structure at Convinceme.com:
- the side (e.g. pro or con)
- explicit rebuttal links
- temporal context/state of the debate at a particular time.
They identify rebuttals with 63% accuracy and classify the side of a post with 54-69% accuracy using lexical and contextual features (compared to 49-60% for a unigram baseline).
Human annotators, however, identify the correct side 73% of the time for rebuttals and 87% of the time for non-rebuttals. Some posts were difficult to classify: short comments and ad hominem responses (39% of the posts that only 4-6 of the 9 annotators classified correctly), ambiguous and out-of-context comments (17%), and meta-debate comments (10%).
They distinguish ideological and non-ideological topics. Ideological topics have:
- more posts per author
- more rebuttals per topic
- more context-dependence
Post length, however, is not correlated with these properties.
Rebuttals have:
- more "markers of dialogic interaction"
- more pronouns (you, that, it)
- more ellipsis
- more dialogic cue words
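As a rough illustration of one of these markers, the pronoun observation can be turned into a per-post rate. This is a minimal sketch and not the authors' method: the tokenizer and the function name `pronoun_rate` are assumptions made here, and only the three pronouns named above are counted.

```python
import re

# Pronouns the summary lists as more frequent in rebuttals.
PRONOUNS = {"you", "that", "it"}

def pronoun_rate(post: str) -> float:
    """Return the fraction of word tokens in the post that are in PRONOUNS.

    Tokenization is a simple lowercase regex split (an assumption of this
    sketch, not something the paper specifies).
    """
    tokens = re.findall(r"[a-z']+", post.lower())
    if not tokens:
        return 0.0
    return sum(token in PRONOUNS for token in tokens) / len(tokens)
```

Comparing the average of this rate over rebuttal and non-rebuttal posts would be one simple way to check the pattern on a corpus.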
Human Annotation
Mechanical Turk workers judged which side each post was on, without seeing its context. The authors report which kinds of posts were harder and easier to place.
Machine Learning
They used two classifiers from the Weka toolkit, NaiveBayes and JRip, with the following feature sets:
- Post info (IsRebuttal, Poster)
- Unigrams
- Bigrams
- Cue words (initial unigram, bigram, and trigram)
- Repeated punctuation (collapsed into ??, !!, ?!)
- LIWC measures and frequencies
- Dependencies derived from the Stanford parser
- Generalized dependencies (POS of the head word, opinion polarity of both words)
- Context features ("matching features used for the post from the parent post")
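A few of the simpler feature sets above (unigrams, bigrams, initial cue n-grams, and collapsed repeated punctuation) could be extracted roughly as follows. This is a hedged sketch in Python rather than the authors' Weka pipeline: the function names, the feature-name prefixes, the tokenizer, and the choice to map any mixed run of ? and ! to ?! are all assumptions made here. LIWC, dependency, and context features are omitted.

```python
import re
from collections import Counter

def collapse_punctuation(text: str) -> str:
    """Collapse runs of two or more ?/! characters into ??, !!, or ?!.

    Assumption of this sketch: any run mixing ? and ! becomes ?!.
    """
    def repl(match):
        run = match.group(0)
        if "?" in run and "!" in run:
            return "?!"
        return run[0] * 2
    return re.sub(r"[?!]{2,}", repl, text)

def extract_features(post: str) -> Counter:
    """Build a sparse feature vector (feature name -> value) for one post."""
    text = collapse_punctuation(post.lower())
    words = re.findall(r"[a-z0-9']+", text)
    feats = Counter()
    # Unigram and bigram counts over word tokens.
    for word in words:
        feats["uni=" + word] += 1
    for first, second in zip(words, words[1:]):
        feats["bi=" + first + "_" + second] += 1
    # Cue words: the initial unigram, bigram, and trigram of the post.
    for n in (1, 2, 3):
        if len(words) >= n:
            feats["cue%d=%s" % (n, "_".join(words[:n]))] = 1
    # Repeated punctuation, after collapsing.
    for mark in re.findall(r"\?\?|!!|\?!", text):
        feats["punct=" + mark] += 1
    return feats
```

For example, `extract_features("Well, cats rule!!! Really???!")` yields cue features for "well", "well_cats", and "well_cats_rule", plus one count each for the !! and ?! punctuation features. The resulting dictionaries could then be fed to any classifier that accepts sparse features.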
See also
An extended journal version of this paper was submitted as That’s your evidence?: Classifying stance in online political debate
Theoretical and practical relevance:
Their corpus is available!
The work may contribute to longer-term goals; the authors say it is motivated by:
- Automatic summarization
- Understanding persuasiveness
- "Identifying the linguistic reflexes of perlocutionary acts" (e.g. persuasion, disagreement)
The paper also includes a substantial review of related work.