Using automatically labelled examples to classify rhetorical relations: An assessment

{{Summary
 * title=Using automatically labelled examples to classify rhetorical relations: An assessment
 * authors=Caroline Sporleder, Alex Lascarides
 * url=http://homepages.inf.ed.ac.uk/alex/pubs/jnle.rhetorical.html
 * tags=discourse analysis, computational linguistics, rhetorical relations, discourse markers, rhetorical structure, discourse relations
 * summary=This paper asks whether rhetorical relations can be automatically derived and classified. It focuses, in particular, on discourse markers. These may be ambiguous (e.g. 'since' and 'yet' have multiple uses and are sometimes, but not always, discourse markers), and they may also be missing altogether.

The authors comment that "what is needed is a model which can classify rhetorical relations in the absence of an explicit discourse marker" (p. 4). Previous work (e.g. Marcu & Echihabi 2002) has suggested creating training data for a classifier by labelling examples that contain an unambiguous, lexically marked rhetorical relation and then removing the markers. The main purpose of this paper is to test this approach empirically.
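
The labelling procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the marker table, the relation names, and the function `label_and_strip` are hypothetical and chosen only for illustration. A sentence is kept only when exactly one marker from the table matches, the matching relation becomes its label, and the marker is then deleted from the text.

```python
import re

# Hypothetical marker-to-relation table (an assumption for illustration,
# not the marker list used in the paper).
MARKER_TO_RELATION = {
    "because": "EXPLANATION",
    "although": "CONTRAST",
    "in short": "SUMMARY",
}


def label_and_strip(sentence):
    """Return (relation, sentence_without_marker), or None when the
    sentence contains no marker from the table or more than one."""
    hits = []
    for marker, relation in MARKER_TO_RELATION.items():
        # Match the marker as a whole word, plus an optional comma and
        # the whitespace that follows it.
        pattern = re.compile(r"\b" + re.escape(marker) + r"\b,?\s*", re.IGNORECASE)
        if pattern.search(sentence):
            hits.append((pattern, relation))
    if len(hits) != 1:
        return None  # unmarked, or marked more than once: skip
    pattern, relation = hits[0]
    stripped = pattern.sub("", sentence, count=1)
    stripped = re.sub(r"\s+", " ", stripped).strip()
    return relation, stripped


if __name__ == "__main__":
    for text in [
        "She stayed home because it was raining.",
        "In short, the plan failed.",
        "It was raining.",
    ]:
        print(text, "->", label_and_strip(text))
```

The resulting pairs of relation label and marker-free text are what a classifier would be trained on. A plain word-list lookup like this cannot tell when a word such as 'since' is not acting as a discourse marker, which is why the approach is restricted to unambiguous markers.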

It also makes an interesting theoretical observation: training on marked examples works well only if two conditions hold.

"First, there has to be a certain amount of redundancy between the discourse marker and the general linguistic context, i.e. removing the discourse marker should still leave enough residual information for the classifier to learn how to distinguish different relations."

Second, marked and unmarked examples must be similar enough for a classifier trained on the former to generalize to the latter.

The paper suggests that texts with lexically marked and lexically unmarked rhetorical relations may be inherently different, in that removing a discourse marker may change the meaning of a sentence. It also suggests that classifiers trained on labelled sentences from which the markers have been removed work little better than chance.

Selected References

 * Marcu, D. and A. Echihabi (2002). An unsupervised approach to recognizing discourse relations. In Proceedings of ACL-02, pp. 368–375.
 * Soria, C. and G. Ferrari (1998). Lexical marking of discourse relations – some experimental findings. In Proceedings of the ACL-98 Workshop on Discourse Relations and Discourse Markers.
 * Sporleder, C. and M. Lapata (2005). Discourse chunking and its application to sentence compression. In Proceedings of the 2005 Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP-05).


 * relevance=Section 2, on related research, summarizes a number of important related works.

Examples of both lexically marked and unmarked rhetorical relations, given in the introduction and in the appendices, will be useful elsewhere. }}
 * journal=Natural Language Engineering
 * pub_date=2008
 * subject=Computer Science