Discourse-level argumentation in scientific articles: Human and automatic annotation

From AcaWiki

Citation: Simone Teufel, Marc Moens (1999) Discourse-level argumentation in scientific articles: Human and automatic annotation. Towards Standards and Tools for Discourse Tagging, Workshop at ACL 1999


Download: http://www.cl.cam.ac.uk/~sht25/papers/acl99.pdf

Tagged: Computer Science, discourse, argumentation, rhetorical structure, rhetorical moves, argumentative zoning, summarization


Summary:

This paper is closely related to An annotation scheme for discourse-level argumentation in research articles, contemporaneous research by the same authors. Annotation evidence from that paper is used to support the main argument of this one: machine annotation of discourse-level argumentation is as reliable as annotation by humans with no prior training.

The two papers share some discussion, particularly of the annotation scheme and human annotation, but this paper provides further details. Most of the new material is in the discussion of the experiments on automatic annotation, which build on the aforementioned human annotations of 80 papers (around 12,000 sentences).

Human annotation

Beyond the discussion in An annotation scheme for discourse-level argumentation in research articles, the authors also report experiments on annotating only selected parts of a paper (e.g. the abstract and introduction) rather than the entire paper.

One major result highlighted is that annotators are good at identifying AIM sentences, which provide the best summaries of a paper and compress it to 1.8% of its original length. Clear instructions and very specific guidelines for ambiguous cases help the annotators, in contrast to previous summarization work (e.g. Rath 1961) that, somewhat unsuccessfully, simply asked for "relevant" sentences to be selected.

Automatic annotation

The automatic annotation is based on supervised learning. It builds on Kupiec 1995, which estimates the probability that a sentence is contained in the abstract, but revises that approach to estimate the probability that a sentence has a particular rhetorical role.
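To make the idea of estimating the probability of a rhetorical role concrete, here is a minimal sketch in Python. It assumes a Naive Bayes classifier with add-one smoothing over categorical sentence features. The function names, the toy training data, and the role labels other than AIM are illustrative placeholders, not taken from the paper or this summary.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count roles and per-role feature values.

    examples: list of (feature_dict, role) pairs (hypothetical format).
    """
    role_counts = Counter(role for _, role in examples)
    value_counts = defaultdict(Counter)  # (role, feature name) -> value counts
    feature_values = defaultdict(set)    # feature name -> values seen in training
    for features, role in examples:
        for name, value in features.items():
            value_counts[(role, name)][value] += 1
            feature_values[name].add(value)
    return role_counts, value_counts, feature_values


def classify(model, features):
    """Return the role with the highest smoothed log-probability."""
    role_counts, value_counts, feature_values = model
    total = sum(role_counts.values())
    scores = {}
    for role, n in role_counts.items():
        score = math.log(n / total)  # log prior P(role)
        for name, value in features.items():
            count = value_counts[(role, name)][value]
            n_values = len(feature_values[name]) or 1
            # add-one smoothed estimate of P(value | role)
            score += math.log((count + 1) / (n + n_values))
        scores[role] = score
    return max(scores, key=scores.get)


# Toy training data; only AIM is a label named in this summary.
training = [
    ({"location": "start", "has_citation": False}, "AIM"),
    ({"location": "start", "has_citation": False}, "AIM"),
    ({"location": "start", "has_citation": True}, "BACKGROUND"),
    ({"location": "middle", "has_citation": True}, "BACKGROUND"),
    ({"location": "middle", "has_citation": False}, "OWN"),
    ({"location": "end", "has_citation": False}, "OWN"),
]
model = train(training)
predicted = classify(model, {"location": "start", "has_citation": False})
```

With this toy data, a sentence near the start of a paper without a citation is assigned the AIM role. A real system would train on the annotated corpus and use the full feature set.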

Figure 8 of the paper lists the features used for supervised learning. Each sentence is described by:

Explicit structure

Relative location

Citations

Syntactic features

These features rely on the first finite verb in the sentence.

Semantic features

These features use template matching.

Content features

Of these, location, type of header, citations, and semantic classes are the strongest predictors.
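As a rough illustration of how three of the stronger predictors (location, type of header, citations) might be turned into feature values, here is a hedged sketch in Python. The ten-segment location split, the citation pattern, and all function names are assumptions made for this example; the paper's actual feature definitions may differ, and the syntactic, semantic, and content features are omitted.

```python
import re

# Assumed citation pattern: matches forms such as "Kupiec 1995",
# "Kupiec et al. 1995", or "Kupiec (1995)". Purely illustrative.
CITATION_RE = re.compile(
    r"\b[A-Z][A-Za-z-]+(?: et al\.)?,? \(?(?:19|20)\d{2}\)?"
)


def location_bucket(index, total, n_buckets=10):
    """Map a 0-based sentence index to one of n_buckets equal segments."""
    return min(index * n_buckets // total, n_buckets - 1)


def extract_features(sentences, index, header):
    """Build a feature dictionary for the sentence at the given index.

    header is the title of the section containing the sentence.
    """
    sentence = sentences[index]
    return {
        "location": location_bucket(index, len(sentences)),
        "header": header.strip().lower(),
        "has_citation": bool(CITATION_RE.search(sentence)),
    }


sentences = [
    "We present a new method.",
    "Kupiec 1995 estimates abstract-worthiness.",
    "Results follow.",
]
features = extract_features(sentences, 1, "Introduction")
```

Here `features` is a dictionary with one entry per feature, which is the kind of categorical input a supervised classifier could consume.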
