What's yours and what's mine: Determining intellectual attribution in scientific text

Citation: Simone Teufel, Marc Moens (2000) What's yours and what's mine: Determining intellectual attribution in scientific text. Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
DOI (original publisher): 10.3115/1117794.1117796
Download: http://www.cl.cam.ac.uk/~sht25/papers/emnlp00.pdf
Tagged: Computer Science, discourse, argumentation, rhetorical structure, rhetorical moves, argumentative zoning, summarization

Summary

This paper discusses the idea of 'argumentative zones' in research papers. Each of the seven zone categories (Background, Own, Other, Aim, Textual, Contrast, Basis) groups sentences that share the same "intellectual attribution" (the authors' own work, specific other work, or general background) and serve a particular global rhetorical purpose.

Distilling and Extending Previous Work

This is a distillation and extension of the authors' previous work: they use the seven categories previously defined in An annotation scheme for discourse-level argumentation in research articles; Figure 3 provides new example sentences for each category.

The decision tree given (Figure 2) is a slight modification of the one from An annotation scheme for discourse-level argumentation in research articles. Its first question asks whether the scientific statement expressed in the sentence being annotated is attributed to the authors themselves (own work, which subdivides into the Aim, Textual, and Own categories), to the general field (the Background category), or to specific other researchers (other work, which subdivides into the Contrast and Basis categories).
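That branching can be summarized as a small decision procedure. The sketch below (Python) only illustrates the structure of the tree; the boolean arguments stand in for the annotator's (or classifier's) answers to the later questions and are hypothetical, not the paper's actual wording.

  def zone_for_sentence(attribution, states_goal=False, refers_to_structure=False,
                        contrasts=False, is_basis=False):
      """Map answers to the tree's questions onto one of the seven categories.
      attribution is 'own', 'background', or 'other'."""
      if attribution == 'background':
          return 'BACKGROUND'
      if attribution == 'own':
          if states_goal:            # states the paper's research goal
              return 'AIM'
          if refers_to_structure:    # refers to the paper's own section structure
              return 'TEXTUAL'
          return 'OWN'
      # attribution == 'other'
      if contrasts:                  # other work is contrasted with or criticised
          return 'CONTRAST'
      if is_basis:                   # other work is used as a basis for the own work
          return 'BASIS'
      return 'OTHER'                 # neutral description of other work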

They further discuss the features useful for machine learning, first introduced in Discourse-level argumentation in scientific articles: Human and automatic annotation. Specifically, they use position, syntactic features such as tense and voice, and meta-discourse (agents and actions, as well as citations). They also consider the order of zones. To these existing features they add an inheritance rule: when a sentence carries no explicit attribution, they assume that the previous sentence's attribution holds.
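That inheritance rule is simple enough to state as code. A minimal sketch, assuming a caller-supplied explicit_attribution function (a hypothetical stand-in for the paper's agent, action, and citation cues) that returns an attribution only when the sentence contains an overt cue:

  def attribute_sentences(sentences, explicit_attribution):
      """explicit_attribution(sentence) -> 'own', 'other', 'background', or None."""
      labels = []
      previous = None
      for sentence in sentences:
          label = explicit_attribution(sentence)
          if label is None and previous is not None:
              label = previous        # no explicit cue: inherit from the previous sentence
          labels.append(label)
          previous = label
      return labels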

Detecting Agents and Actions

The pattern-matching work of Paice 1981 (e.g. "the aim/purpose of this paper/article/study") is extended; two lexicons are used to avoid polysemy problems without resorting to word-sense disambiguation. The agent lexicon has 168 patterns in 13 classes (e.g. THEM_PRONOUN_AGENT, TEXTSTRUCTURE_AGENT), where each pattern may expand to multiple matches (e.g. paper, article, study, or chapter for @WORK_NOUN). The action lexicon has 366 verbs in 20 classes (e.g. NEED, SIMILAR, TEXTSTRUCTURE).
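To illustrate how the two lexicons are organized (the class names and the @WORK_NOUN expansion are from the paper; the individual entries shown are abbreviated, illustrative guesses rather than the actual lexicon contents):

  # Placeholders expand to several surface forms
  PLACEHOLDERS = {
      '@WORK_NOUN': ['paper', 'article', 'study', 'chapter'],
  }

  # Agent lexicon: 168 patterns in 13 classes (two classes shown)
  AGENT_LEXICON = {
      'THEM_PRONOUN_AGENT': ['they'],
      'TEXTSTRUCTURE_AGENT': ['this section', 'the rest of this @WORK_NOUN'],
  }

  # Action lexicon: 366 verbs in 20 classes (three classes shown)
  ACTION_LEXICON = {
      'NEED': ['need', 'require'],
      'SIMILAR': ['resemble'],
      'TEXTSTRUCTURE': ['begin', 'conclude'],
  }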

The algorithm for detecting agents and actions is as follows:

  1. Start from the first finite verb in the sentence
  2. Find the main "semantic" verb (Do this by checking the right context of the finite verb to find more complex tenses: stay within assumed clause boundaries and do not cross commas or other finite verbs.) Determine its lemma, tense, and voice.
  3. Look up the semantic verb (as a lemma) in the Action Lexicon to find the corresponding Action Class; if no class is found, return Action 0.
  4. "Determine if one of 32 fixed negation words contained in the lexicon is present within a fixed window of 6 to the right of the finite verb."
  5. Search for the agent. Depending on the voice from step 2, look for either a by-PP to the right (passive) or a subject-NP to the left (active). Stay within assumed clause boundaries and do not cross commas or other finite verbs.
  6. Return the Agent Type, if found, otherwise return Agent 0.
  7. Repeat the above steps until there are no more finite verbs left.
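A condensed sketch of this loop follows. The helper functions (find_finite_verbs, main_semantic_verb, and so on) are hypothetical stand-ins for the shallow processing the paper describes, not its actual implementation:

  NEGATION_WINDOW = 6   # fixed window to the right of the finite verb (step 4)

  def detect_agents_and_actions(sentence, action_lexicon, agent_lexicon, negation_words):
      results = []
      for finite_verb in find_finite_verbs(sentence):                          # steps 1 and 7
          sem_verb, tense, voice = main_semantic_verb(finite_verb, sentence)   # step 2
          action = action_lexicon.get(lemma(sem_verb), 'Action 0')             # step 3
          right_context = tokens_to_right(finite_verb, sentence)[:NEGATION_WINDOW]
          negated = any(token in negation_words for token in right_context)    # step 4
          if voice == 'passive':                                               # step 5
              agent_phrase = find_by_pp_to_right(finite_verb, sentence)
          else:
              agent_phrase = find_subject_np_to_left(finite_verb, sentence)
          agent = agent_class_of(agent_phrase, agent_lexicon) or 'Agent 0'     # step 6
          results.append((action, negated, agent, tense, voice))
      return results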

Evaluation

Two evaluations were conducted: first to determine the precision and recall of the pattern matching (with quite strong results), and second to test how well agent and action recognition helps in performing automatic argumentative zoning. The Aim and Textual categories are the easiest to recognize, both for humans and for the algorithm. Since machine performance lags human performance, the authors suggest further work on both enlarging the training data and making the set of features more distinctive.

Theoretical and Practical Relevance

In addition to improving summarization (see also Wellons and Purcell 1999), they argue that argumentative zoning can improve citation indexes by indicating *how* references are used, rather than merely counting the number of references. The authors take up that issue in later work: Argumentative Zoning for improved citation indexing.