Citation: Simone Teufel, Marc Moens (1999) Discourse-level argumentation in scientific articles: Human and automatic annotation. Towards Standards and Tools for Discourse Tagging, Workshop at ACL 1999
Download: http://www.cl.cam.ac.uk/~sht25/papers/acl99.pdf
Tagged: Computer Science, discourse, argumentation, rhetorical structure, rhetorical moves, argumentative zoning, summarization

Summary

This paper is closely related to An annotation scheme for discourse-level argumentation in research articles, contemporaneous work by the same authors. Annotation evidence from that paper supports the main argument here: machine annotation of discourse-level argumentation is as reliable as annotation by humans with no prior training.

The two papers share some discussion, particularly of the annotation scheme and human annotation, and this paper adds further detail. Most of the new material here is in the experiments on automatic annotation, which build on the aforementioned human annotations of 80 papers (around 12,000 sentences).

Human annotation

Beyond the discussion in An annotation scheme for discourse-level argumentation in research articles, here the authors also report on experiments with annotating only selections from the paper (e.g. the abstract and introduction), rather than the entire paper.

One highlighted result is that annotators are good at identifying AIM sentences, which provide the best summaries of the paper and compress it to 1.8% of its original length. Clear instructions and very specific guidelines for ambiguous cases help the annotator, especially compared with previous summarization work (e.g. Rath et al. 1961), which somewhat unsuccessfully just asked annotators to select "relevant" sentences.

Automatic annotation

The automatic annotation uses supervised learning, building on Kupiec et al. 1995 (which estimates the probability that a sentence is contained in the abstract) but revising it to estimate the probability that a sentence has a particular rhetorical role.
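
One common way to realize such an estimate is a naive Bayes classifier over categorical sentence features, scoring each role by P(role) multiplied by the product of P(feature | role). The sketch below is a minimal illustration under that assumption; this summary does not spell out the estimator, and the class name `RoleClassifier`, the add-one smoothing, the toy feature values, and the `NOT_AIM` label are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class RoleClassifier:
    """Naive Bayes over categorical features (illustrative sketch only)."""

    def __init__(self):
        self.role_counts = Counter()
        # feature_counts[role][feature_name][value] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # values observed for each feature

    def train(self, examples):
        """examples: iterable of (features_dict, role) pairs."""
        for features, role in examples:
            self.role_counts[role] += 1
            for name, value in features.items():
                self.feature_counts[role][name][value] += 1
                self.values[name].add(value)

    def log_scores(self, features):
        """Return log P(role) + sum of log P(feature | role) for each role."""
        total = sum(self.role_counts.values())
        scores = {}
        for role, count in self.role_counts.items():
            score = math.log(count / total)
            for name, value in features.items():
                seen = self.feature_counts[role][name][value]
                # Add-one smoothing (an assumption) avoids log(0).
                vocab = len(self.values[name] | {value})
                score += math.log((seen + 1) / (count + vocab))
            scores[role] = score
        return scores

    def predict(self, features):
        scores = self.log_scores(features)
        return max(scores, key=scores.get)


# Toy training data; feature names, values, and NOT_AIM are illustrative.
train = [
    ({"location": 1, "has_citation": False, "agent": "Authors"}, "AIM"),
    ({"location": 1, "has_citation": False, "agent": "Authors"}, "AIM"),
    ({"location": 2, "has_citation": True, "agent": "Others"}, "NOT_AIM"),
    ({"location": 5, "has_citation": True, "agent": "Others"}, "NOT_AIM"),
    ({"location": 6, "has_citation": False, "agent": "Nothing"}, "NOT_AIM"),
]

clf = RoleClassifier()
clf.train(train)
prediction = clf.predict({"location": 1, "has_citation": False, "agent": "Authors"})
```

The shared denominator of the Bayes rule is omitted because it is the same for every role and does not change which role scores highest.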

Figure 8 of the paper lists the features used for supervised learning. Each sentence is described by the following features:

Explicit structure

Type of headline
Relative position within the paragraph
Relative position within the section

Relative location

Relative location (1st-10th segment of the paper)
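
As a rough illustration, the segment feature could be computed from a sentence's index in the paper. This sketch assumes ten equal-sized segments measured in sentences; the paper may define its segments differently, and the function name `location_segment` is hypothetical.

```python
def location_segment(sentence_index, total_sentences, segments=10):
    """Return the 1-based segment containing a 0-based sentence index.

    Assumes equal-sized segments by sentence count (an assumption).
    """
    if total_sentences <= 0:
        raise ValueError("total_sentences must be positive")
    if not 0 <= sentence_index < total_sentences:
        raise ValueError("sentence_index out of range")
    return sentence_index * segments // total_sentences + 1
```

For a 150-sentence paper, the first 15 sentences fall in segment 1 and the last 15 in segment 10.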

Citations

Whether the sentence contains a citation or the name of an author in the reference list
Whether the sentence contains a self-citation
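
The two citation features could be approximated with simple string matching, as sketched below. The regular expression, the function name `citation_features`, and the rule that a self-citation is a reference-list surname that also belongs to the paper's own authors are assumptions for illustration, not the paper's actual procedure.

```python
import re

# Crude citation markers: a parenthesized year such as (1995) or a
# bracketed reference number such as [12]. Purely illustrative.
CITATION_MARK = re.compile(r"\((?:19|20)\d{2}[a-z]?\)|\[\d+\]")


def citation_features(sentence, reference_authors, paper_authors):
    """Return (has_citation, has_self_citation) for one sentence.

    reference_authors: surnames appearing in the reference list.
    paper_authors: surnames of the paper's own authors.
    """
    names = set(re.findall(r"\b[A-Z][a-z]+\b", sentence))
    mentioned = names & set(reference_authors)
    has_citation = bool(mentioned) or bool(CITATION_MARK.search(sentence))
    has_self_citation = bool(mentioned & set(paper_authors))
    return has_citation, has_self_citation
```

Hyphenated or multi-word surnames would need a more careful tokenizer than this sketch uses.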

Syntactic features

These features rely on the first finite verb in the sentence.

Tense
Modal auxiliaries
Voice
Negation

Semantic features

These features use template matching.

Action type of the first verb in the sentence (see Figure 9 -- e.g. 'comparison', 'better solution', 'future interest')
Type of Agent (Authors, Others, Nothing)
Type of formulaic expression (see Figure 9 -- e.g. 'general agent', 'previous context', 'textstructure')
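
Template matching of this kind can be sketched as an ordered list of patterns, each tied to a category. The category names below are the examples quoted above, but every regular expression is an invented placeholder and does not reproduce the paper's Figure 9; the function name `formulaic_type` is also hypothetical.

```python
import re

# Hypothetical patterns: category names come from the summary's examples,
# but each regular expression is an invented placeholder.
FORMULAIC_TEMPLATES = [
    ("textstructure", re.compile(r"\bin (the next |the following )?section \d+\b", re.I)),
    ("previous context", re.compile(r"\b(previously|in earlier work|elsewhere)\b", re.I)),
    ("general agent", re.compile(r"\b(researchers|linguists|many authors)\b", re.I)),
]


def formulaic_type(sentence):
    """Return the first matching template category, or None if none match."""
    for category, pattern in FORMULAIC_TEMPLATES:
        if pattern.search(sentence):
            return category
    return None
```

Because the list is ordered, an earlier template wins when a sentence matches more than one pattern.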

Content features

Whether the sentence contains keywords (according to tf/idf)
Whether the sentence contains words also occurring in the title or headlines
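
The two content features can be sketched as follows. The tf/idf weighting shown (raw term frequency times log inverse document frequency), the keyword cutoff `k`, the small stopword list, and all function names are assumptions for illustration; the summary does not give the paper's exact settings.

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "of", "in", "and", "for", "to", "on"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(document, corpus, k=3):
    """Return the k terms of `document` with the highest tf*idf score.

    corpus: list of document strings used for document frequency;
    it is assumed to include `document` itself.
    """
    tf = Counter(tokenize(document))
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    n_docs = len(corpus)
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf if df[t]}
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def content_features(sentence, keywords, title_and_headlines):
    """Return (has_keyword, has_title_word) for one sentence."""
    words = set(tokenize(sentence))
    title_words = set()
    for heading in title_and_headlines:
        title_words.update(tokenize(heading))
    title_words -= STOPWORDS
    return bool(words & keywords), bool(words & title_words)
```

Removing stopwords from the title and headline words keeps function words such as "the" from triggering the second feature on almost every sentence.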

Of these, location, type of header, citations, and semantic classes are the strongest predictors.

Selected References

Julian Kupiec, Jan O. Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the 18th ACM-SIGIR Conference, pages 68-73.
G.J. Rath, A. Resnick, and T. R. Savage. 1961. The formation of abstracts by the selection of sentences. American Documentation 12(2):139-143.
This is part of the 'argumentative zoning' project of Simone Teufel's thesis.
