Cats rule and dogs drool!: Classifying stance in online debate


Citation: Pranav Anand, Marilyn Walker, Rob Abbott, Jean E. Fox Tree, Robeson Bowmani, and Michael Minor (2011) Cats rule and dogs drool!: Classifying stance in online debate. Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis.
Tagged: Computer Science, online argumentation, NLP, stance, disagreement, Mechanical Turk, natural language processing, discourse analysis, cue words

Summary

The paper investigates the problem of classifying stance, using 1,113 two-sided debates on 12 topics from the debate website Convinceme.com.

They note that there are three dialogue structure elements at Convinceme.com (a rough sketch of a post record follows this list):

  1. the side (e.g. pro or con)
  2. explicit rebuttal links
  3. temporal context/state of the debate at a particular time.
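A minimal sketch of how a post carrying these three elements might be represented, assuming a simple Java record; the class and field names here are hypothetical, not taken from the paper:

  // Hypothetical per-post record mirroring the three structure elements above.
  import java.time.Instant;
  import java.util.Optional;

  public record DebatePost(
          String author,
          String text,
          String side,                  // which side of the two-sided debate the post argues
          Optional<String> rebuttalOf,  // explicit rebuttal link to a parent post, if any
          Instant postedAt              // temporal context: when the post entered the debate
  ) {
      public boolean isRebuttal() {
          return rebuttalOf.isPresent();
      }
  }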

They identify rebuttals with 63% accuracy and classify the side of a post with 54-69% accuracy using lexical and contextual features (compared to 49-60% for a unigram baseline).

Human annotators, by contrast, can classify the side of a post correctly 73% of the time for rebuttals and 87% of the time for non-rebuttals. Some posts were hard to classify: among posts where only 4-6 of the 9 annotators were correct, 39% were short comments or ad hominem responses, 17% were ambiguous or out-of-context comments, and 10% were meta-debate comments.

They distinguish ideological and non-ideological topics. Ideological topics have:

  • more posts per author
  • more rebuttals per topic
  • more context-dependence

Yet post length is not correlated with these.

Rebuttals have the following characteristics (a small counting sketch follows this list):

  • more "markers of dialogic interaction"
  • more pronouns (you, that, it)
  • more ellipsis
  • more dialogic cue words
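As an illustration of how two of these markers could be measured, here is a small counting sketch; this is not the authors' code, the pronoun list comes from the item above, and the cue-word list is only an illustrative guess:

  import java.util.Arrays;
  import java.util.List;
  import java.util.Map;
  import java.util.stream.Collectors;

  public class DialogicMarkers {
      // "you", "that", "it" are the pronouns named above; the cue words are illustrative.
      private static final List<String> PRONOUNS = List.of("you", "that", "it");
      private static final List<String> CUE_WORDS = List.of("actually", "so", "well", "no", "because");

      private static String[] tokenize(String post) {
          return post.toLowerCase().replaceAll("^\\W+", "").split("\\W+");
      }

      // Count occurrences of each dialogic pronoun in a post.
      public static Map<String, Long> pronounCounts(String post) {
          String[] tokens = tokenize(post);
          return PRONOUNS.stream().collect(Collectors.toMap(
                  p -> p,
                  p -> Arrays.stream(tokens).filter(p::equals).count()));
      }

      // Check whether the post opens with one of the cue words.
      public static boolean startsWithCueWord(String post) {
          String[] tokens = tokenize(post);
          return tokens.length > 0 && CUE_WORDS.contains(tokens[0]);
      }
  }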

Human Annotation

Mechanical Turk was used to collect human annotations judging which side a post was on, without surrounding context. The authors present some interesting results about which posts are easier and harder to place.

Machine Learning

They used the Weka toolkit with two classifiers, NaiveBayes and JRip, and the following feature sets (a hedged sketch of this setup follows the list):

  • Post info (IsRebuttal, Poster)
  • Unigrams
  • Bigrams
  • Cue words (initial unigram, bigram, and trigram)
  • Repeated punctuation (collapsed into ??, !!, ?!)
  • LIWC measures and frequencies
  • Dependencies derived from the Stanford parser
  • Generalized dependencies (POS of the head word, opinion polarity of both words)
  • Context features (the same features computed for the parent post)
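As a concrete illustration of two of the surface features above, here is a rough sketch of extracting the initial n-gram cue words and collapsing repeated punctuation into the classes ??, !!, and ?!. This is not the authors' code; the method names, the cue-word prefix, and the exact collapsing rules are assumptions:

  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.List;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class SurfaceFeatures {
      private static final Pattern PUNCT_RUN = Pattern.compile("[?!]{2,}");

      // Cue-word features: the initial unigram, bigram, and trigram of the post.
      public static List<String> initialNgrams(String post) {
          String[] tokens = post.trim().toLowerCase().split("\\s+");
          List<String> feats = new ArrayList<>();
          if (tokens.length == 0 || tokens[0].isEmpty()) return feats;
          for (int n = 1; n <= 3 && n <= tokens.length; n++) {
              feats.add("cue_" + String.join("_", Arrays.copyOfRange(tokens, 0, n)));
          }
          return feats;
      }

      // Collapse runs of repeated punctuation into the classes ??, !!, and ?!.
      public static String collapsePunctuation(String post) {
          Matcher m = PUNCT_RUN.matcher(post);
          StringBuilder out = new StringBuilder();
          while (m.find()) {
              String run = m.group();
              String collapsed = run.contains("?") && run.contains("!") ? "?!"
                               : run.startsWith("?") ? "??" : "!!";
              m.appendReplacement(out, Matcher.quoteReplacement(collapsed));
          }
          m.appendTail(out);
          return out.toString();
      }
  }

And a minimal sketch of the modeling setup, assuming the features have already been written to an ARFF file (the file name, attribute layout, and 10-fold cross-validation are assumptions, not details from the paper); it runs the two classifiers named above, NaiveBayes and JRip, through Weka's standard evaluation API:

  import java.util.Random;
  import weka.classifiers.Classifier;
  import weka.classifiers.Evaluation;
  import weka.classifiers.bayes.NaiveBayes;
  import weka.classifiers.rules.JRip;
  import weka.core.Instances;
  import weka.core.converters.ConverterUtils.DataSource;

  public class StanceClassification {
      public static void main(String[] args) throws Exception {
          // Load a (hypothetical) feature file; the last attribute is assumed to be the side label.
          Instances data = new DataSource("stance_features.arff").getDataSet();
          data.setClassIndex(data.numAttributes() - 1);

          for (Classifier clf : new Classifier[] { new NaiveBayes(), new JRip() }) {
              Evaluation eval = new Evaluation(data);
              eval.crossValidateModel(clf, data, 10, new Random(1));
              System.out.printf("%s accuracy: %.1f%%%n",
                      clf.getClass().getSimpleName(), eval.pctCorrect());
          }
      }
  }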

See also

An extended journal version of this paper was submitted as That’s your evidence?: Classifying stance in online political debate

Theoretical and Practical Relevance

Their corpus is available!

The work may contribute to several long-term goals; the authors suggest that it is motivated by:

  1. Automatic summarization
  2. Understanding persuasiveness
  3. "Identifying the linguistic reflexes of perlocutionary acts" (e.g. persuasion, disagreement)

The paper includes a significant review of related work.