What's yours and what's mine: Determining intellectual attribution in scientific text
Citation: Simone Teufel, Marc Moens (2000) What's yours and what's mine: Determining intellectual attribution in scientific text. Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora
DOI (original publisher): 10.3115/1117794.1117796
Semantic Scholar (metadata): 10.3115/1117794.1117796
Sci-Hub (fulltext): 10.3115/1117794.1117796
Internet Archive Scholar (search for fulltext): What's yours and what's mine: Determining intellectual attribution in scientific text
Download: http://www.cl.cam.ac.uk/~sht25/papers/emnlp00.pdf
Tagged: Computer Science
discourse, argumentation, rhetorical structure, rhetorical moves, argumentative zoning, summarization
Summary
This paper discusses the idea of 'argumentative zones' in research papers. Each zone (background, own, other; aim, textual, contrast, basis) is a stretch of text that shares the same "intellectual attribution" and is used for a particular global rhetorical purpose.
Distilling and Extending Previous Work
This is a distillation and extension of the authors' previous work: they use the 7 categories previously defined in An annotation scheme for discourse-level argumentation in research articles; Figure 3 provides new example sentences in each category.
The decision tree given (Figure 2) is a slight modification of the one from An annotation scheme for discourse-level argumentation in research articles. The first question asks whether the scientific statements expressed in the sentence being annotated are attributed to the authors (Own work, which subdivides into the Aim, Textual, and Own categories), the general field (Background category), or specific other researchers (Other work, which subdivides into the Contrast and Basis categories).
They further discuss the features useful for machine learning, first introduced in Discourse-level argumentation in scientific articles: Human and automatic annotation. Specifically, they use position, syntactic features like tense and voice, and meta-discourse (with agents and actions as well as citations). They also consider the order of zones. To the existing features they add a fallback for attribution: when a sentence has no explicit attribution, they assume that the previous sentence's attribution holds.
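The carry-over rule for attribution can be illustrated with a minimal sketch. The function name, the label strings, and the use of `None` for "no explicit attribution" are assumptions made for illustration; the paper's actual feature encoding is not described in this summary.

```python
from typing import List, Optional


def propagate_attribution(labels: List[Optional[str]]) -> List[Optional[str]]:
    """Give each sentence without an explicit attribution the attribution
    of the closest preceding sentence that has one."""
    filled: List[Optional[str]] = []
    current: Optional[str] = None  # nothing to inherit before the first explicit label
    for label in labels:
        if label is not None:
            current = label
        filled.append(current)
    return filled


# Hypothetical per-sentence attributions; None marks "no explicit attribution".
sentences = [None, "OWN", None, "OTHER", None, None, "BACKGROUND"]
print(propagate_attribution(sentences))
```

Sentences before the first explicit attribution stay unlabeled in this sketch, since there is nothing to inherit.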
Detecting Agents and Actions
The pattern matching work of Paice 1981 (e.g. "the aim/purpose of this paper/article/study") is extended; two lexicons are used to avoid polysemy problems without the use of word-sense disambiguation. For instance, the agent lexicon has 168 patterns in 13 classes (e.g. THEM_PRONOUN_AGENT, TEXTSTRUCTURE_AGENT), where each pattern may have multiple matches (e.g. paper, article, study, or chapter for @WORK_NOUN). The action lexicon has 366 verbs in 20 classes (e.g. NEED, SIMILAR, TEXTSTRUCTURE).
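To make the idea of one pattern with multiple matches concrete, here is a small sketch of placeholder expansion. Only the four fillers of @WORK_NOUN come from the summary above; the `PLACEHOLDERS` table, the `expand` function, and the example pattern "the aim of this @WORK_NOUN" are hypothetical and do not reproduce the paper's lexicon format.

```python
from itertools import product
from typing import Dict, List

# Placeholder table. The @WORK_NOUN fillers are the ones listed in the text;
# the table structure itself is an assumption of this sketch.
PLACEHOLDERS: Dict[str, List[str]] = {
    "@WORK_NOUN": ["paper", "article", "study", "chapter"],
}


def expand(pattern: str) -> List[str]:
    """Expand every placeholder token in a pattern into its concrete strings."""
    options = [PLACEHOLDERS.get(token, [token]) for token in pattern.split()]
    return [" ".join(choice) for choice in product(*options)]


# Hypothetical pattern in the spirit of Paice's self-indicating phrases.
for phrase in expand("the aim of this @WORK_NOUN"):
    print(phrase)
```

Keeping placeholders in the patterns means that a single lexicon entry covers several surface forms, which is one plausible reading of how 168 patterns can match many more strings.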
The algorithm for detecting agents and actions is as follows:
- Start from the first finite verb in the sentence
- Find the main "semantic" verb (Do this by checking the right context of the finite verb to find more complex tenses: stay within assumed clause boundaries and do not cross commas or other finite verbs.) Determine its lemma, tense, and voice.
- Look up the semantic verb (as a lemma) in the Action Lexicon to find the corresponding Action Class. Otherwise return Action 0.
- "Determine if one of 32 fixed negation words contained in the lexicon is present within a fixed window of 6 to the right of the finite verb."
- Search for the agent. Depending on the voice from step 2, look for it either as a by-PP to the right or as a subject-NP to the left. Stay within assumed clause boundaries and do not cross commas or other finite verbs.
- Return the Agent Type, if found, otherwise return Agent 0.
- Repeat the above steps until there are no more finite verbs left.
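The steps above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it assumes input that is already lemmatized and tagged with Penn Treebank-style POS tags, matches agents word by word instead of matching noun phrases against patterns, and uses tiny toy lexicons whose entries are guesses (only the class names NEED, SIMILAR, and THEM_PRONOUN_AGENT appear in the summary).

```python
from typing import Iterator, List, NamedTuple


class Token(NamedTuple):
    word: str
    lemma: str
    tag: str  # Penn Treebank-style tag (an assumption of this sketch)


class Finding(NamedTuple):
    verb: str
    tense: str
    voice: str
    action: str
    negated: bool
    agent: str


FINITE = {"VBD", "VBP", "VBZ", "MD"}
VERBAL = FINITE | {"VB", "VBG", "VBN"}
# Toy lexicons: entries are illustrative guesses, not the paper's contents.
ACTION_LEXICON = {"need": "NEED", "resemble": "SIMILAR"}
AGENT_LEXICON = {"they": "THEM_PRONOUN_AGENT", "them": "THEM_PRONOUN_AGENT"}
NEGATIONS = {"not", "n't", "never", "no"}  # stand-in for the 32 negation words


def clause_span(tokens: List[Token], start: int, step: int) -> Iterator[int]:
    """Walk from start in direction step; stop at a comma or a finite verb."""
    i = start
    while 0 <= i < len(tokens):
        if tokens[i].word == "," or tokens[i].tag in FINITE:
            return
        yield i
        i += step


def detect(tokens: List[Token]) -> List[Finding]:
    findings = []
    for f, finite in enumerate(tokens):  # one pass per finite verb
        if finite.tag not in FINITE:
            continue
        # Follow the verb group to the right to find the semantic verb.
        chain = [f]
        for i in clause_span(tokens, f + 1, +1):
            if tokens[i].tag in VERBAL:
                chain.append(i)
            elif tokens[i].tag != "RB":  # adverbs may sit inside the verb group
                break
        sem = tokens[chain[-1]]
        passive = sem.tag == "VBN" and any(tokens[i].lemma == "be" for i in chain[:-1])
        tense = {"VBD": "past", "MD": "modal"}.get(finite.tag, "present")  # coarse
        # Look up the action class; "0" stands for Action 0.
        action = ACTION_LEXICON.get(sem.lemma, "0")
        # Negation within a window of 6 tokens to the right of the finite verb.
        negated = any(t.word.lower() in NEGATIONS for t in tokens[f + 1 : f + 7])
        # Agent: by-PP to the right if passive, subject to the left otherwise.
        agent = "0"  # "0" stands for Agent 0
        if passive:
            seen_by = False
            for i in clause_span(tokens, chain[-1] + 1, +1):
                word = tokens[i].word.lower()
                if word == "by":
                    seen_by = True
                elif seen_by and word in AGENT_LEXICON:
                    agent = AGENT_LEXICON[word]
                    break
        else:
            for i in clause_span(tokens, f - 1, -1):
                word = tokens[i].word.lower()
                if word in AGENT_LEXICON:
                    agent = AGENT_LEXICON[word]
                    break
        voice = "passive" if passive else "active"
        findings.append(Finding(sem.lemma, tense, voice, action, negated, agent))
    return findings


# "They do not need a parser" with hand-assigned lemmas and tags.
sentence = [
    Token("They", "they", "PRP"), Token("do", "do", "VBP"),
    Token("not", "not", "RB"), Token("need", "need", "VB"),
    Token("a", "a", "DT"), Token("parser", "parser", "NN"),
]
print(detect(sentence))
```

Treating commas and finite verbs as hard stops is a cheap stand-in for clause boundary detection, which is why the sketch needs no parser.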
Evaluation
Two evaluations were conducted: the first determined the precision and recall of pattern matching (with quite strong results), and the second tested how well agent and action recognition helps automatic argumentative zoning. The Aim and Textual categories are the easiest to recognize, both for humans and for the algorithm. Since machine performance lags human performance, the authors suggest further work on both enlarging the training data and making the set of features more distinctive.
Selected References
- Paice 1981 The automatic generation of literary abstracts: An approach based on the identification of self-indicating phrases. In Robert Norman Oddy, Stephen E. Robertson, Cornelis Joost van Rijsbergen, and P.W. Williams, eds. Information Retrieval Research, 172-191. London, UK: Butterworth.
- M.E. Wellons and G.P. Purcell. 1999. Task-specific extracts for using the medical literature. In Proceedings of the American Medical Informatics Symposium. 1004-1008.
- This is part of the 'argumentative zoning' project of Simone Teufel's thesis. See also An annotation scheme for discourse-level argumentation in research articles and Discourse-level argumentation in scientific articles: Human and automatic annotation, which this work extends.
Theoretical and Practical Relevance
In addition to improving summarization (see also Wellons and Purcell 1999), they argue that argumentative zoning can improve citation indexes by indicating *how* references are used, rather than merely counting the number of references. The authors take up that issue in later work: Argumentative Zoning for improved citation indexing.