An annotation scheme for discourse-level argumentation in research articles

Citation: Simone Teufel, Jean Carletta, Marc Moens (1999) An annotation scheme for discourse-level argumentation in research articles. Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics



Tagged: Computer Science, annotation, discourse, argumentation, rhetorical structure, rhetorical moves, argumentative zoning


This paper works towards improved summarization of research articles by developing an annotation scheme for the argumentative role of sentences, based on Swales' (1990) CARS (Creating a Research Space) model, and applying it to a corpus of computational linguistics articles.

Sentence structure is not the only determinant of whether a sentence is appropriate for a summary: its argumentative role is also very important. The authors give an example. A sentence near the beginning of a paper that points out a weakness in existing work can set the stage for the paper, and so characterizes its content well. A sentence in the future work section that points out a weakness in the paper's own work would not provide a good summary of the paper.

Two annotation schemes

With attribution and authorship in mind, the authors propose two schemes. The basic scheme has three elements:

  • Background
  • Other (work described in other papers, including previous work by the same authors)
  • Own (contribution of this paper)

The full scheme has seven elements, adding the following four to the basic scheme:

  • Aim (main research aims of the paper)
  • Textual (explicit statements of the paper structure)
  • Contrast (comparison and contrast statements)
  • Basis (statements that the contribution is based on this other work)
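
The relationship between the two schemes can be written down as a small data structure. The sketch below is illustrative only: the Python names (`BASIC_SCHEME`, `ADDED_IN_FULL`, `FULL_SCHEME`, `invalid_labels`) are hypothetical and do not come from the paper. Only the category names and glosses are taken from the lists above.

```python
# Category names and glosses follow the lists above. All variable and
# function names are hypothetical, chosen for this illustration.
BASIC_SCHEME = {
    "Background": "background (no further gloss in this summary)",
    "Other": "work described in other papers, including previous work by the same authors",
    "Own": "contribution of this paper",
}

ADDED_IN_FULL = {
    "Aim": "main research aims of the paper",
    "Textual": "explicit statements of the paper structure",
    "Contrast": "comparison and contrast statements",
    "Basis": "statements that the contribution is based on this other work",
}

# The full scheme is the basic scheme plus the four added categories.
FULL_SCHEME = {**BASIC_SCHEME, **ADDED_IN_FULL}


def invalid_labels(labels, scheme):
    """Return the labels in an annotation that the given scheme does not define."""
    return [label for label in labels if label not in scheme]


# One label per sentence; "Aim" exists only in the full scheme.
sentence_labels = ["Background", "Other", "Aim", "Own"]
print(invalid_labels(sentence_labels, BASIC_SCHEME))  # ['Aim']
print(invalid_labels(sentence_labels, FULL_SCHEME))   # []
```

A check like this is only a convenience for catching labelling slips; it says nothing about whether a label was assigned correctly.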

Three annotation experiments

Using these schemes, the authors carried out three annotation experiments, in order to determine whether the basic and full schemes could be learned by trained users (in the first two experiments), and whether they could be used by minimally trained users (in the third experiment). In fact, in the third experiment, annotations by those who were not formally trained in the domain were reasonably good, comparable to the annotations by the less-savvy experts.

They used Kappa scores to measure both stability (whether a single annotator produces the same classification at different times) and reproducibility (whether multiple annotators produce the same classification).
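
For readers unfamiliar with Kappa, the general idea is to correct the observed agreement P(A) for the agreement P(E) expected by chance: K = (P(A) - P(E)) / (1 - P(E)). The sketch below computes the two-rater case (Cohen's kappa). This variant is an assumption made for illustration; the exact formulation used in the paper, and its handling of more than two annotators, may differ. The function name and the toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Two-rater kappa over parallel label sequences (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)

    # P(A): proportion of items on which the two sequences agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # P(E): chance agreement, from each rater's own category frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both sequences use a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels using the basic scheme's categories (invented for this example).
first = ["Own", "Own", "Other", "Background"]
second = ["Own", "Other", "Other", "Background"]
print(round(cohen_kappa(first, second), 3))  # 0.636
```

The same computation serves both measures: comparing two passes by one annotator at different times gives a stability figure, while comparing two different annotators gives a reproducibility figure.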

Theoretical and practical relevance:

Variations in reproducibility among paper types point to differences in paper structure. For instance, a review paper was discarded due to the difficulty of annotating it according to this scheme. The authors suggest that the scheme might be a pragmatic way to test the clarity of argumentation in papers.

The authors suggest that shallow information extraction can be useful; besides the success of non-experts in the domain, they point to the usefulness of the physical layout and metatextual terminology as two aids for understanding the arguments.

This paper also has two features of interest to those learning the domain: a nice summary of related work on rhetorically-based annotation and summarization, and a decision tree for annotations (in Figure 2) which might help in preparing guides for similar annotations.