A microtext corpus for persuasion detection in dialog


Citation: Joel Young, Craig Martell, Pranav Anand, Pedro Ortiz, Henry Tucker Gilbert IV (2011) A microtext corpus for persuasion detection in dialog. Analyzing Microtext: Papers from the 2011 AAAI Workshop
Tagged: Computer Science, persuasion, argumentation mining, dialogue, corpora, microtexts, transcripts of spoken dialogue, hostage negotiation

Summary

This paper presents a microtext corpus derived from hostage negotiation transcripts. This source was chosen for its availability and its density of persuasion: traditional microtext sources (Twitter, SMS, chat rooms) showed "limited occurrences of directly persuasive attempts". Even in the negotiation transcripts, fewer than 12% of utterances were persuasive.

The authors define persuasion as "the ability of one party to convince another party to act or believe in some desired way". They adopt Cialdini's model of persuasion, which identifies six principles:

  1. Reciprocity
  2. Commitment and Consistency
  3. Scarcity
  4. Liking
  5. Authority
  6. Social Proof

Corpus

Four sets of transcripts totaling 18,847 utterances, drawn from police and FBI hostage negotiations.

Annotation

In the initial annotation pass, the annotators found persuasive utterances that did not fit any of Cialdini's categories. The final scheme has nine categories: the six above, with "commitment and consistency" split into two separate categories (making seven), plus "other" and "no persuasion".

Their codebook is shown on the second and third pages.

Supervised Machine Learning

The features are gappy word bigrams (GWBs), which pair two words separated by up to a given distance (Bikel and Sorensen 2007), and orthogonal sparse bigrams (OSBs), which record the same word pairs but keep the gap size as part of the feature, preserving distinctions that GWBs collapse. A maximum gap of four words was used, and no stemming was applied.

GWBs collapse, for example, "the purple dog" and "the big purple dog" to the same feature "the dog"; OSBs keep them distinct as "the dog 1" and "the dog 2".
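
For concreteness, here is a minimal sketch of the two feature extractors (the function names are ours, not the paper's, and the gap is encoded here as token distance, so the counts differ by one from the "1"/"2" skip-count convention above):

```python
def gappy_word_bigrams(tokens, max_gap=4):
    """Pair each token with every later token up to max_gap positions
    ahead; the distance itself is discarded, so differently spaced
    pairs collapse into the same feature."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in range(i + 1, min(i + 1 + max_gap, len(tokens)))]

def orthogonal_sparse_bigrams(tokens, max_gap=4):
    """Same pairs, but the distance is kept as part of the feature,
    so "the ... dog" at different spacings stays distinct."""
    return [(tokens[i], tokens[j], j - i)
            for i in range(len(tokens))
            for j in range(i + 1, min(i + 1 + max_gap, len(tokens)))]

a = "the purple dog".split()
b = "the big purple dog".split()
# Both phrases yield the same GWB feature ("the", "dog") ...
assert ("the", "dog") in gappy_word_bigrams(a)
assert ("the", "dog") in gappy_word_bigrams(b)
# ... but distinct OSB features, because the distance differs.
assert ("the", "dog", 2) in orthogonal_sparse_bigrams(a)
assert ("the", "dog", 3) in orthogonal_sparse_bigrams(b)
```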

Algorithms

Naive Bayes with add-one smoothing, maximum entropy, and support vector machines (SVMs) were used. Results were evaluated with precision, recall, and F-score.
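
As a sketch of one of the three classifiers and the evaluation metrics (this is an illustrative multinomial Naive Bayes with add-one smoothing, not the authors' implementation; the toy utterances and labels are invented):

```python
from collections import Counter
import math

def train_nb(docs, labels):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    vocab = {w for d in docs for w in d}
    counts = {c: Counter() for c in set(labels)}
    priors = Counter(labels)
    for d, c in zip(docs, labels):
        counts[c].update(d)
    model = {}
    for c, cnt in counts.items():
        denom = sum(cnt.values()) + len(vocab)  # add-one smoothing denominator
        model[c] = (math.log(priors[c] / len(docs)),
                    {w: math.log((cnt[w] + 1) / denom) for w in vocab},
                    math.log(1 / denom))        # probability of an unseen word
    return model

def predict(model, doc):
    def score(c):
        prior, loglik, unseen = model[c]
        return prior + sum(loglik.get(w, unseen) for w in doc)
    return max(model, key=score)

def precision_recall_f(gold, pred, positive):
    """Precision, recall, and F-score for one class of interest."""
    tp = sum(g == p == positive for g, p in zip(gold, pred))
    fp = sum(p == positive and g != positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f

# Invented toy utterances for illustration only.
train = [("i guarantee your safety".split(), "persuasion"),
         ("trust me cooperate now".split(), "persuasion"),
         ("hello how are you".split(), "none"),
         ("um hum yeah".split(), "none")]
model = train_nb([d for d, _ in train], [c for _, c in train])
assert predict(model, "i guarantee your".split()) == "persuasion"
```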

Results

The unigrams and bigrams most and least predictive of persuasion are shown in Table 4:

Most predictive unigrams

  • sincere
  • honorable
  • answers
  • clubs
  • legitimate
  • abord
  • guaranteed
  • bout
  • trusting
  • cooperate

Most predictive bigrams

  • your friends
  • that gun
  • I guarantee
  • your family
  • your cells
  • get all
  • your safety
  • good job
  • what I'd
  • gun down

Least Predictive Unigrams

  • jesus
  • thanks
  • shall
  • huh
  • seals
  • hello
  • hi
  • christ
  • bye
  • hum

Least Predictive Bigrams

  • yeah I'm
  • me in
  • hang up
  • name is
  • I tried
  • mm hm
  • my wife
  • of God
  • you doing
  • um hum
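
Rankings like those in Table 4 can be produced by scoring each token's association with the persuasive class. The paper does not spell out its exact scoring method, so the following is one common approach (add-one-smoothed log-odds) over invented toy utterances:

```python
from collections import Counter
import math

def rank_by_log_odds(docs, labels, positive="persuasion"):
    """Rank tokens by the smoothed log-odds of occurring in
    positive-class utterances versus all others; high scores mark
    tokens predictive of persuasion, low scores unpredictive ones."""
    pos = Counter(w for d, c in zip(docs, labels) if c == positive for w in d)
    neg = Counter(w for d, c in zip(docs, labels) if c != positive for w in d)
    vocab = set(pos) | set(neg)
    pt, nt, v = sum(pos.values()), sum(neg.values()), len(vocab)
    def score(w):
        return (math.log((pos[w] + 1) / (pt + v))
                - math.log((neg[w] + 1) / (nt + v)))
    return sorted(vocab, key=score, reverse=True)

# Invented toy data: persuasive tokens should rank above greetings.
docs = ["i guarantee your safety".split(),
        "cooperate and nobody gets hurt".split(),
        "um hum yeah hello".split(),
        "hello how are you".split()]
labels = ["persuasion", "persuasion", "none", "none"]
ranking = rank_by_log_odds(docs, labels)
assert ranking.index("guarantee") < ranking.index("hello")
```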

Selected References

"Side Detection"

  • (Lin et al. 2006)--political opinion pieces
  • (Somasundaran and Wiebe 2010)--online forums

SMS Work

  • Cormack, G. V.; Gómez Hidalgo, J. M.; and Sánz, E. P. 2007. Spam filtering for short messages. In CIKM '07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, 313–320. Lisbon, Portugal: ACM.

Theoretical and Practical Relevance

The Naval Postgraduate School Persuasion Corpus is available on request from the authors.

This work demonstrates the difficulty of the problem and suggests some possible future approaches.