Argumentative Zoning for improved citation indexing
Citation: Simone Teufel Argumentative Zoning for improved citation indexing.
Tagged: Computer Science
Keywords: citations, citation indexing, argumentation, argumentation zoning, information retrieval
Summary
This paper addresses the "citation indexing" task suggested in "Task-based evaluation of summary quality: Describing relationships between scientific papers" as a possible outcome of argumentative zoning.
Rhetorical citation maps could provide a glanceable summary of how the work in a particular paper relates to the literature it cites. First, contrasts and continuations are distinguished (using grey vs. black lines). Second, "the most important textual sentence about each citation can be displayed" directly in the citation map; this is taken to be an evaluative sentence about the citation.
Since evaluative statements about a citation may appear in neighboring sentences or in another section, the paper uses machine learning (a Naive Bayes classifier based on Kupiec 1995) to classify and extract them. The 15 features used are summarized in Figure 5 (see 'Features' below). They are related to the features tested in "Discourse-level argumentation in scientific articles: Human and automatic annotation", but some may have been renamed (e.g. "AbsLoc" was previously called "Relative Location"), added (e.g. SentLength - Is the sentence longer than a certain threshold?), or dropped (e.g. verb negation).
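To make the classification step concrete, here is a minimal sketch of a Naive Bayes classifier over discrete feature values with add-one smoothing. It is not the paper's implementation: the class name, the three features chosen, the agent values "US" and "THEM", and the four training rows are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over discrete feature values with add-one smoothing."""

    def fit(self, rows, labels):
        self.label_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[label][feature][value] -> number of training rows
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        self.feature_values = defaultdict(set)
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.value_counts[label][feature][value] += 1
                self.feature_values[feature].add(value)
        return self

    def log_scores(self, row):
        scores = {}
        for label, n_label in self.label_counts.items():
            score = math.log(n_label / self.total)  # log prior
            for feature, value in row.items():
                count = self.value_counts[label][feature][value]
                # One extra slot so values unseen in training keep some mass.
                n_values = len(self.feature_values[feature]) + 1
                score += math.log((count + 1) / (n_label + n_values))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Toy training data (invented): each row is one sentence's feature values.
train_rows = [
    {"Cit": "Citation (other)", "Agent": "THEM", "VerbVoice": "Active"},
    {"Cit": "Citation (other)", "Agent": "THEM", "VerbVoice": "Passive"},
    {"Cit": "None", "Agent": "US", "VerbVoice": "Active"},
    {"Cit": "None", "Agent": "US", "VerbVoice": "Active"},
]
train_labels = ["CONTRAST", "CONTRAST", "OWN", "OWN"]

model = CategoricalNaiveBayes().fit(train_rows, train_labels)
prediction = model.predict({"Cit": "None", "Agent": "US", "VerbVoice": "Passive"})
```

Here `prediction` is "OWN": two of the three feature values match the OWN training rows, which outweighs the single passive-voice match for CONTRAST. A real system would train on annotated sentences and use all 15 features.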
Features
- AbsLoc - Absolute location of the sentence (ten document sections)
- SectStruct - Relative and absolute position of sentence within section
- ParaStruct - Relative position of sentence within a paragraph (Initial, Medial, Final)
- Headline - Type of headline of this section (16 classes)
- SentLength - Is the sentence longer than a certain threshold?
- Title Content - Do words from the title or headlines also appear in this sentence?
- TF*IDF Content - Are "significant terms" (by TF*IDF) also in this sentence?
- VerbVoice - Voice of the first finite verb in the sentence (Active, Passive, NoVerb)
- VerbTense - Tense of the first finite verb in the sentence (9 tenses or NoVerb)
- VerbModal - Is the first finite verb modified by a modal auxiliary?
- Cit - Does the sentence contain a citation or the name of an author contained in the reference list? (Citation (self), Citation (other), Author Name, or None). Where in the sentence does the citation appear? (Beginning, Middle, End)
- History - Most probable previous category (7 Zone Categories + BEGIN)
- Formulaic - Type of formulaic expression used in the sentence (18 types + 9 agent types or none)
- Agent - Type of Agent (9 Agent Types or None)
- Action - Type of Action, with or without Negation (27 Action Types or None)
The most important features (in order) are: Absolute Sentence Location, Agent, Citations, Headlines, History, Formulaic, and Action.
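Three of the simpler surface features above (ParaStruct, SentLength, Title Content) can be sketched as follows. The tokenizer, the stopword list, and the threshold of 12 tokens are assumptions for illustration; the summary only says "a certain threshold" and does not give a value.

```python
import re

# Hypothetical stopword list, so that words like "for" do not count as title overlap.
STOPWORDS = {"a", "an", "the", "for", "of", "to", "in", "and", "we"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def para_struct(index_in_paragraph, paragraph_length):
    """Relative position within the paragraph: Initial, Medial, or Final."""
    if index_in_paragraph == 0:
        return "Initial"  # a one-sentence paragraph also counts as Initial here
    if index_in_paragraph == paragraph_length - 1:
        return "Final"
    return "Medial"


def sent_length(tokens, threshold=12):
    """True if the sentence is longer than the (assumed) token threshold."""
    return len(tokens) > threshold


def title_content(tokens, title_tokens):
    """True if a non-stopword from the title also appears in the sentence."""
    return bool((set(tokens) - STOPWORDS) & (set(title_tokens) - STOPWORDS))


def extract_features(sentence, index_in_paragraph, paragraph_length, title):
    tokens = tokenize(sentence)
    return {
        "ParaStruct": para_struct(index_in_paragraph, paragraph_length),
        "SentLength": sent_length(tokens),
        "Title Content": title_content(tokens, tokenize(title)),
    }


features = extract_features(
    "We apply zoning to citation indexing.",
    index_in_paragraph=0,
    paragraph_length=3,
    title="Argumentative Zoning for improved citation indexing",
)
```

For this example sentence, `features` holds "Initial" for ParaStruct, False for SentLength (six tokens), and True for Title Content (it shares "zoning", "citation", and "indexing" with the title).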
Relation to previous work
This work follows from the 'argumentative zoning' project of Simone Teufel's thesis. See also "An annotation scheme for discourse-level argumentation in research articles" (which introduces the zones), "Discourse-level argumentation in scientific articles: Human and automatic annotation", "What's yours and what's mine: Determining intellectual attribution in scientific text", and "Task-based evaluation of summary quality: Describing relationships between scientific papers".
The argumentative zoning categories first introduced in "An annotation scheme for discourse-level argumentation in research articles" are given, along with examples of each. The CONTRAST and BASIS categories are used as a proxy for finer classification schemes in the field of Content Citation Analysis (see Weinstock 1971).
Metadiscourse
This paper gives a useful description of metadiscourse. Based on Myers 1992, the paper uses metadiscourse to mean "the set of expressions that talk about the act of presenting research in a paper, rather than the research itself." Drawing from Swales 1990, who observed that "the argumentation of the paper is rather prototypical", the paper uses the Formulaic feature to collect 1762 phrases and their variations. Likewise, the Agent feature represents the grammatical subjects; often this agent is the one being attributed. The Verb features are also related.
Verbs
"There is a set of verbs that is often used when the overall scientific goal of a paper is defined." ("propose, present, report, suggest" -- and to a lesser extent "describe, discuss, give, introduce, put forward, show, sketch, state, and talk about").
Verbs are very useful in distinguishing CONTRAST sentences (which use verbs of failure and contrast - see Figs 8 and 9) from CONTINUATION ones (which use verbs of continuation and change - see Figs 6 and 7).
Verbs of continuation: adopt, agree with, base, be based on, be derived from, be originated in, be inspired by, borrow, build on, follow, originate from, originate in, side with
Verbs of change: adapt, adjust, augment, combine, change, decrease, elaborate on, expand, extend, derive, incorporate, increase, manipulate, modify, optimize, refine, render, replace, revise, substitute, tailor, upgrade
Verbs of failure: abound, aggravate, arise, be cursed, be incapable of, be forced to, be limited to, be problematic, be restricted to, be troubled, be unable to, contradict, damage, degrade, degenerate, fail, fall prey, fall short, force oneself, force, hinder, impair, impede, inhibit, lack, misclassify, misjudge, mistake, misuse, neglect, obscure, overestimate, overfit, overgeneralize, overgenerate, overlook, pose, plague, preclude, prevent, resort to, restrain, run into problems, settle for, spoil, suffer from, threaten, thwart, underestimate, undergenerate, violate, waste, worsen
Verbs of contrast: be different from, be distinct from, conflict, contrast, clash, differ from, distinguish oneself, differentiate, disagree, disagreeing, dissent, oppose
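As an illustration of how such verb lists could serve as cues, the sketch below looks up a small subset of the verbs above and maps each verb class to the stance it suggests. It assumes the sentence has already been lemmatized (a hypothetical preprocessing step), and the function names and the lookup approach are illustrative only, not the paper's method.

```python
import re

# A small subset of the verb lists above, keyed by verb class.
VERB_CLASSES = {
    "continuation": ["adopt", "agree with", "be based on", "build on", "follow"],
    "change": ["adapt", "extend", "modify", "refine", "replace"],
    "failure": ["fail", "fall short", "lack", "overlook", "suffer from"],
    "contrast": ["be different from", "contrast", "differ from", "disagree", "oppose"],
}

# Failure and contrast verbs point to CONTRAST; continuation and change
# verbs point to CONTINUATION.
CLASS_TO_STANCE = {
    "continuation": "CONTINUATION",
    "change": "CONTINUATION",
    "failure": "CONTRAST",
    "contrast": "CONTRAST",
}


def verb_classes(lemmas):
    """Return the verb classes whose phrases occur in a lemmatized sentence."""
    text = " ".join(lemmas).lower()
    found = set()
    for verb_class, phrases in VERB_CLASSES.items():
        for phrase in phrases:
            # Word boundaries keep "lack" from matching inside "black".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                found.add(verb_class)
    return found


def stance_cues(lemmas):
    """Map the verb classes found in the sentence to stance labels."""
    return {CLASS_TO_STANCE[c] for c in verb_classes(lemmas)}


contrast_example = stance_cues(["this", "approach", "suffer", "from", "data", "sparseness"])
continuation_example = stance_cues(["we", "build", "on", "the", "model", "of", "smith"])
```

The first example yields a CONTRAST cue (via "suffer from") and the second a CONTINUATION cue (via "build on"). In the classifier these cues would be one feature among many, not a decision rule on their own.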
Human Annotation
This section briefly reviews the annotation work from "What's yours and what's mine: Determining intellectual attribution in scientific text" but extends it with the citation indexing task in mind.
In this case, annotators needed to identify the citation(s) associated with each evaluative statement (CONTRAST, BASIS). Citations can appear before or after the evaluative statement, so the distance between the evaluative statement and the citation must be indicated as well. Further, each citation may have several statements made about it. Contrastive evaluation statements, for instance, can be several sentences after the citation.
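One way to picture the resulting annotation is as a record that links each evaluative statement to a citation and stores their positions. The sketch below is a hypothetical data structure, not the paper's annotation format; the field names, the use of sentence indices, and the citation keys are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluativeLink:
    """Links one evaluative statement to the citation it evaluates."""

    citation_key: str        # e.g. an author-year string (made up below)
    citation_sentence: int   # index of the sentence containing the citation
    statement_sentence: int  # index of the evaluative sentence
    category: str            # "CONTRAST" or "BASIS"

    @property
    def distance(self):
        """Signed distance in sentences; negative if the statement comes first."""
        return self.statement_sentence - self.citation_sentence


def group_by_citation(links):
    """Collect all evaluative statements made about each citation."""
    grouped = defaultdict(list)
    for link in links:
        grouped[link.citation_key].append(link)
    return dict(grouped)


links = [
    EvaluativeLink("Smith 1990", citation_sentence=4, statement_sentence=7, category="CONTRAST"),
    EvaluativeLink("Smith 1990", citation_sentence=4, statement_sentence=4, category="BASIS"),
    EvaluativeLink("Jones 1995", citation_sentence=12, statement_sentence=10, category="CONTRAST"),
]
grouped = group_by_citation(links)
distances = [link.distance for link in links]
```

The signed distance covers both cases described above: a positive value means the evaluation follows the citation, a negative value means it precedes it, and grouping by citation allows several statements per citation.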
Citation patterns
The authors identified 6 "patterns of citing and author stance statements":
- citation, CONTRAST follows after a few sentences
- citation, BASIS follows after a few sentences
- approach criticized (CONTRAST), then described in following sentences
- citing with no evaluation
- BASIS embedded in a paragraph about OWN work
- CONTRAST embedded in OWN (without repetition of the citation) -- e.g. to contrast results from the work being contrasted
Future Work
The future work section of the paper is detailed, particularly regarding planned evaluations.
Selected References
- Julian Kupiec, Jan O. Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the 18th ACM-SIGIR Conference, pages 68-73.
- Simon Buckingham Shum. 1998. Evolving the web for scientific knowledge. First steps towards an "HCI knowledge web". Interfaces, British HCI Group Magazine 39:16-21.
- M. Weinstock. 1971. Citation indexes. In Encyclopedia of Library and Information Science, volume 5, pages 16-40. Dekker, New York.