A microtext corpus for persuasion detection in dialog
Citation: Joel Young, Craig Martell, Pranav Anand, Pedro Ortiz, and Henry Tucker Gilbert IV (2011). A microtext corpus for persuasion detection in dialog. In Analyzing Microtext: Papers from the 2011 AAAI Workshop.
Tagged: Computer Science, persuasion, argumentation mining, dialogue, corpora, microtexts, transcripts of spoken dialogue, hostage negotiation
Summary
This paper presents a microtext corpus derived from hostage negotiation transcripts, a source chosen for its availability and its density of persuasion: traditional microtext sources (Twitter, SMS, chat rooms) showed "limited occurrences of directly persuasive attempts". Even the negotiation transcripts contain fewer than 12% persuasive utterances.
They define persuasion as "the ability of one party to convince another party to act or believe in some desired way". Cialdini's persuasion model was used, focusing on:
- Reciprocity
- Commitment and Consistency
- Scarcity
- Liking
- Authority
- Social Proof
Corpus
Four sets of transcripts totaling 18,847 utterances, drawn from police and FBI negotiations.
Annotation
During the initial annotation pass, they found persuasive utterances that did not fit any Cialdini category. The final scheme has 9 categories: the six above, with "commitment and consistency" split into two separate categories (making 7), plus "other" and "no persuasion".
Their codebook is shown on the second and third pages.
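To make the category count concrete, here is a minimal sketch of the nine-label scheme; the identifier names are mine and need not match the codebook's exact tag strings.

```python
from enum import Enum


class PersuasionLabel(Enum):
    RECIPROCITY = "reciprocity"
    COMMITMENT = "commitment"          # "commitment and consistency" split in two
    CONSISTENCY = "consistency"
    SCARCITY = "scarcity"
    LIKING = "liking"
    AUTHORITY = "authority"
    SOCIAL_PROOF = "social proof"
    OTHER = "other"                    # persuasive, but fits no Cialdini category
    NO_PERSUASION = "no persuasion"


assert len(PersuasionLabel) == 9
```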
Supervised Machine Learning
Gappy word bigrams (GWBs), which pair two words separated by up to a given distance, are used (Bikel and Sorensen 2007), along with orthogonal sparse bigrams (OSBs), which keep word pairs at different distances distinct rather than mapping them all to the same gappy bigram. A maximum gap of four words was used, with no stemming.
For example, GWBs collapse both "the purple dog" and "the big purple dog" to "the dog", whereas OSBs keep them distinct as "the dog 1" and "the dog 2". A sketch of both feature types follows.
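A minimal sketch of the two feature types, assuming the four-word gap means up to four intervening words; the function names and the gap convention are illustrative, not taken from the paper or from Bikel and Sorensen's implementation.

```python
def gappy_word_bigrams(tokens, max_gap=4):
    """Pair each word with every later word up to max_gap intervening words away,
    discarding the distance (so 'the ... dog' is one feature regardless of gap)."""
    feats = []
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            feats.append((left, tokens[j]))
    return feats


def orthogonal_sparse_bigrams(tokens, max_gap=4):
    """Same pairs as GWBs, but the number of intervening words is kept,
    so pairs at different distances remain distinct features."""
    feats = []
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            feats.append((left, tokens[j], j - i - 1))
    return feats


print(gappy_word_bigrams("the purple dog".split()))
# [('the', 'purple'), ('the', 'dog'), ('purple', 'dog')]
print(orthogonal_sparse_bigrams("the big purple dog".split()))
# ('the', 'dog', 2) stays distinct from ('the', 'dog', 1) in "the purple dog"
```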
Algorithms
Naive Bayes with add-one smoothing, maximum entropy, and support vector machines (SVMs) were used. Results were evaluated with precision, recall, and F-score.
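A minimal sketch of that setup, using scikit-learn as a stand-in toolkit; the paper does not name an implementation, and the toy data, binary labels, and train/test split below are illustrative assumptions.

```python
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression   # maximum-entropy model
from sklearn.svm import LinearSVC
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split


def featurize(utterance, max_gap=4):
    """Count GWB and OSB features for one utterance (see the sketch above)."""
    tokens = utterance.lower().split()
    feats = Counter()
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            feats[f"gwb:{left}_{tokens[j]}"] += 1
            feats[f"osb:{left}_{tokens[j]}_{j - i - 1}"] += 1
    return feats


# Hypothetical (utterance, label) pairs; 1 = persuasive, 0 = not persuasive.
data = [("I guarantee your safety if you cooperate", 1),
        ("think of your family and your friends", 1),
        ("yeah I'm still here", 0),
        ("um hum", 0)]

X = DictVectorizer().fit_transform([featurize(u) for u, _ in data])
y = [label for _, label in data]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

for clf in (MultinomialNB(alpha=1.0),            # Naive Bayes, add-one smoothing
            LogisticRegression(max_iter=1000),   # maximum entropy
            LinearSVC()):                        # support vector machine
    clf.fit(X_train, y_train)
    p, r, f, _ = precision_recall_fscore_support(
        y_test, clf.predict(X_test), average="binary", zero_division=0)
    print(type(clf).__name__, f"P={p:.2f} R={r:.2f} F={f:.2f}")
```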
Results
The unigrams and bigrams most and least predictive of persuasion are shown in Table 4:
Most predictive unigrams
- sincere
- honorable
- answers
- clubs
- legitimate
- abord
- guaranteed
- bout
- trusting
- cooperate
Most predictive bigrams
- your friends
- that gun
- I guarantee
- your family
- your cells
- get all
- your safety
- good job
- what I'd
- gun down
Least Predictive Unigrams
- jesus
- thanks
- shall
- huh
- seals
- hello
- hi
- christ
- bye
- hum
Least Predictive Bigrams
- yeah I'm
- me in
- hang up
- name is
- I tried
- mm hm
- my wife
- of God
- you doing
- um hum
Selected References
Persuasion Model
- R. Cialdini, Influence: The psychology of persuasion. New York, NY: Collins, 2007.
Data Sources
- Rogan, R., and Hammer, M. 2002. Crisis/hostage negotiations: A communication-based approach. In Giles, H., ed., Law Enforcement, Communication, and Community, 229–254. Philadelphia: Benjamins.
- Taylor, P., and Thomas, S. 2008. Linguistic style matching and negotiation outcome. Negotiation and Conflict Management Research 1:263–281.
- Waco, TX transcripts
- San Diego Police Negotiation
"Side Detection"
- Thomas, M.; Pang, B.; and Lee, L. 2006. Get out the vote: Determining support or opposition from congressional floor-debate transcripts. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, EMNLP ’06, 327–335.
- (Lin et al. 2006) on political opinion pieces
- (Somasundaran and Wiebe 2010) on online forums
SMS Work
- Cormack, G. V.; Gómez Hidalgo, J. M.; and Sánz, E. P. 2007. Spam filtering for short messages. In CIKM ’07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, 313–320. Lisbon, Portugal: ACM.
Text summarization
- Hearst, M. 1997. Texttiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics 23(1):33–64.
Theoretical and Practical Relevance
The Naval Postgraduate School Persuasion Corpus is available on request from the authors.
This work demonstrates the difficulty of the problem and suggests some possible future approaches.