Cats rule and dogs drool!: Classifying stance in online debate

{{Summary
 * title=Cats rule and dogs drool!: Classifying stance in online debate
 * authors=Pranav Anand, Marilyn Walker, Rob Abbott, Jean E. Fox Tree, Robeson Bowmani, and Michael Minor
 * tags=online argumentation, NLP, stance, disagreement, Mechanical Turk, natural language processing, discourse analysis, cue words
 * summary=Investigates the problem of classifying stance using 1113 two-sided debates on 12 topics from the debate website Convinceme.com.

They note three dialogue-structure elements on Convinceme.com:
 * 1) the side (e.g. pro or con)
 * 2) explicit rebuttal links
 * 3) temporal context/state of the debate at a particular time.

They identify rebuttals with 63% accuracy, and classify the side of debate posts with 54-69% accuracy using lexical and contextual features (versus 49-60% for a unigram baseline).

Human annotators, however, can classify the side of debate posts 73% of the time for rebuttals and 87% of the time for non-rebuttals. Some posts were hard to classify: short comments and ad hominem responses (39% of the posts where only 4-6 of the 9 annotators were correct), ambiguous and out-of-context comments (17%), and meta-debate comments (10%).

They distinguish ideological and non-ideological topics. Ideological topics have:
 * more posts per author
 * more rebuttals per topic
 * more context-dependence
Yet post length is not correlated with these.

Rebuttals have:
 * more "markers of dialogic interaction"
 * more pronouns (you, that, it)
 * more ellipsis
 * more dialogic cue words

Human Annotation
Mechanical Turk was used to get human annotations to judge what side a post was on, without context. They present some interesting results about which posts are harder and easier to place.

Machine Learning
They used the Weka toolkit and two classifiers: NaiveBayes and JRip with the following feature sets:
 * Post info (IsRebuttal, Poster)
 * Unigrams
 * Bigrams
 * Cue words (initial unigram, bigram, and trigram)
 * Repeated punctuation (collapsed into ??, !!, ?!)
 * LIWC measures and frequencies
 * Dependencies derived from the Stanford parser
 * Generalized dependencies (POS of the head word, opinion polarity of both words)
 * Context features ("matching features used for the post from the parent post")
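Two of the feature sets above are easy to make concrete: collapsing runs of repeated punctuation into the canonical tokens ??, !!, and ?!, and taking the initial unigram/bigram/trigram of a post as rough cue-word features. A minimal Python sketch of both (function names are mine, not from the paper, and the authors' actual pipeline used Weka):

```python
import re

def collapse_punctuation(text):
    """Collapse runs of 2+ ?/! characters into ??, !!, or ?! tokens,
    as in the paper's repeated-punctuation feature."""
    def repl(match):
        run = match.group(0)
        if "?" in run and "!" in run:
            return "?!"          # mixed run -> ?!
        return run[0] * 2        # pure run -> ?? or !!
    return re.sub(r"[?!]{2,}", repl, text)

def cue_ngrams(text, n_max=3):
    """Return the post-initial unigram, bigram, and trigram,
    a rough stand-in for the cue-word features."""
    tokens = text.lower().split()
    return [" ".join(tokens[:n]) for n in range(1, min(n_max, len(tokens)) + 1)]
```

For example, `collapse_punctuation("What?!?! Really!!!")` yields `"What?! Really!!"`, and `cue_ngrams("well actually I disagree")` yields the three initial n-grams starting with `"well"`.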