Discourse-level argumentation in scientific articles: Human and automatic annotation

{{Summary
 * title=Discourse-level argumentation in scientific articles: Human and automatic annotation
 * authors=Simone Teufel, Marc Moens
 * url=http://www.cl.cam.ac.uk/~sht25/papers/acl99.pdf
 * tags=discourse, argumentation, rhetorical structure, rhetorical moves, argumentative zoning, summarization
 * summary=This paper is closely related to An annotation scheme for discourse-level argumentation in research articles; annotation evidence from that paper (contemporaneous research by the same authors) is used to support the main argument of this paper: machine annotation of discourse-level argumentation is as reliable as annotation by humans with no prior training.

The two papers share some discussions, particularly of the annotation scheme and human annotation. This paper provides further details. The bulk of the new material in this paper is found in the discussions of the experiments on automatic annotation, which are based around the aforementioned human annotations of 80 papers (around 12,000 sentences).

Human annotation
Beyond the discussion in An annotation scheme for discourse-level argumentation in research articles, here the authors also report on experiments with annotating only selections from the paper (e.g. the abstract and introduction), rather than the entire paper.

One highlighted result is that annotators are good at identifying AIM sentences, which provide the best summaries of the paper and compress it to 1.8% of its original length. Clear instructions and very specific guidelines for ambiguous cases help the annotator, in contrast to previous summarization work (e.g. Rath 1961) that, somewhat unsuccessfully, simply asked annotators to select "relevant" sentences.

Automatic annotation
The automatic annotation is based on supervised learning, building on Kupiec 1995 (which uses estimates of the probability that a sentence is contained in the abstract), but revising it to consider the probability that a sentence has a particular rhetorical role.
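The idea of estimating the probability of a rhetorical role from sentence features can be illustrated with a small Naive Bayes sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names `train` and `classify`, the add-one smoothing, the toy feature values, and every role label other than AIM are placeholders chosen for the example.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count roles and per-role feature values.

    examples: list of (feature_dict, role) pairs (hypothetical input format).
    """
    role_counts = Counter()
    value_counts = defaultdict(Counter)  # (role, feature name) -> value counts
    feature_values = defaultdict(set)    # feature name -> values seen in training
    for features, role in examples:
        role_counts[role] += 1
        for name, value in features.items():
            value_counts[(role, name)][value] += 1
            feature_values[name].add(value)
    return role_counts, value_counts, feature_values

def classify(model, features):
    """Return the role with the highest (log) Naive Bayes score."""
    role_counts, value_counts, feature_values = model
    total = sum(role_counts.values())
    best_role, best_score = None, -math.inf
    for role, count in role_counts.items():
        score = math.log(count / total)  # prior P(role)
        for name, value in features.items():
            seen = value_counts[(role, name)][value]
            n_values = len(feature_values[name]) or 1
            # Add-one smoothing is an assumption made for this sketch.
            score += math.log((seen + 1) / (count + n_values))
        if score > best_score:
            best_role, best_score = role, score
    return best_role

# Toy training data; feature values and the BACKGROUND label are illustrative.
toy = [
    ({"agent": "authors", "location": 1}, "AIM"),
    ({"agent": "authors", "location": 1}, "AIM"),
    ({"agent": "others", "location": 4}, "BACKGROUND"),
    ({"agent": "nothing", "location": 2}, "BACKGROUND"),
    ({"agent": "others", "location": 2}, "BACKGROUND"),
]
model = train(toy)
print(classify(model, {"agent": "authors", "location": 1}))
```

In a Kupiec-style summarizer the only two classes would be "in the abstract" and "not in the abstract"; replacing them with one class per rhetorical role is the change described above.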

Figure 8 shows the features for supervised learning. Each sentence is described by the following features:

Explicit structure

 * Type of headline
 * Relative position within the paragraph
 * Relative position within the section

Relative location

 * Relative location (1st-10th segment of the paper)
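As a rough illustration of the relative location feature, a sentence's position can be mapped to one of ten segments of the paper. The sketch assumes equal-sized segments measured in sentences; the summary does not say how the segments are actually delimited, and the function name `location_segment` is hypothetical.

```python
def location_segment(sentence_index, total_sentences, n_segments=10):
    """Map a 0-based sentence index to a segment number in 1..n_segments.

    Assumes equal-sized segments counted in sentences (an assumption of
    this sketch, not a detail taken from the paper).
    """
    if not 0 <= sentence_index < total_sentences:
        raise ValueError("sentence_index out of range")
    return sentence_index * n_segments // total_sentences + 1

# Example: positions in a hypothetical 150-sentence paper.
print(location_segment(0, 150), location_segment(15, 150), location_segment(149, 150))
```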

Syntactic features
These features rely on the first finite verb in the sentence.


 * Tense
 * Modal auxiliaries
 * Voice
 * Negation

Semantic features
These features use template matching.


 * Action type of the first verb in the sentence (see Figure 9 -- e.g. 'comparison', 'better solution', 'future interest')
 * Type of Agent (Authors, Others, Nothing)
 * Type of formulaic expression (see Figure 9 -- e.g. 'general agent', 'previous context', 'textstructure')

Content features

 * Whether the sentence contains keywords (according to tf/idf)
 * Whether the sentence contains words also occurring in the title or headlines

Of these, location, type of headline, citations, and semantic classes are the strongest predictors.
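The two content features listed above could be computed roughly as follows. This is a minimal sketch, assuming pre-tokenized text, a plain `tf * log(N / df)` weighting, and a fixed number of top-scoring keywords per document; the function names, the `top_n` cutoff, and the toy corpus are all hypothetical, and the paper may weight and select keywords differently.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, top_n=3):
    """Return the top_n terms of doc_tokens ranked by tf * log(N / df).

    corpus: list of token lists, assumed to include the document itself.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, freq in tf.items():
        # Document frequency; fall back to 1 if the term is absent from corpus.
        df = sum(1 for doc in corpus if term in doc) or 1
        scores[term] = freq * math.log(n_docs / df)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:top_n])

def content_features(sentence_tokens, keywords, title_tokens):
    """Binary content features for one sentence."""
    words = set(sentence_tokens)
    return {
        "has_keyword": bool(words & keywords),
        "has_title_word": bool(words & set(title_tokens)),
    }

# Toy corpus of three tokenized "documents" (illustrative only).
doc = ["zoning", "zoning", "argument", "the", "the"]
corpus = [doc, ["the", "parser"], ["the", "corpus", "argument"]]
keywords = tfidf_keywords(doc, corpus, top_n=1)
features = content_features(["we", "propose", "zoning"], keywords,
                            ["argument", "annotation"])
print(keywords, features)
```

A real system would probably also filter stop words before matching against the title or headlines; that step is left out here to keep the sketch short.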

Selected References
}}
 * Julian Kupiec, Jan O. Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the 18th ACM-SIGIR Conference, pages 68-73.
 * G.J. Rath, A. Resnick, and T. R. Savage. 1961. The formation of abstracts by the selection of sentences. American Documentation 12(2):139-143.
 * This is part of the 'argumentative zoning' project of Simone Teufel's thesis.
 * journal=Towards Standards and Tools for Discourse Tagging, Workshop at ACL 1999
 * pub_date=1999
 * subject=Computer Science