# Annotation

From AcaWiki

**Citation:** *Eduard Hovy (2010) Annotation. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics*

**Download:** http://aclweb.org/anthology-new/P/P10/P10-5004.pdf

**Tagged:** Computer Science, annotation, annotations

**Summary:**

This tutorial presents annotation as a science, organized into a clear sequence of steps:

- Q1: Selecting a corpus
- Q2: Instantiating the theory
- Q3: Designing the interface
- Q4: Selecting and training the annotators
- Q5: Designing and managing the annotation procedure
- Q6: Validating results
- Q7: Delivering and maintaining the product

## Corpus selection

- Consider availability (existing corpora) and openness (so others can evaluate and build on your work).
- Consider representativeness.
- Different corpora are appropriate for different purposes.

## Theory instantiation

- Annotation guidelines are essential. These must be developed iteratively.
- There's a tradeoff between the granularity of the categories and the practical attainability. Use as few categories as possible and make distinctions between them clear.
- Measure interannotator agreement and disagreement:
- "Precision" measures the correctness of annotators compared to a gold standard; it reflects how easy the categorization is: <math>P_i = \text{number correct}/N</math> for annotator <math>i</math> (see Lipsitz et al. 1991 in the references below).
- "Entropy" measures ambiguity (the clarity of the category definitions): <math>E = -\sum_c P_c \log P_c</math>, where <math>P_c</math> is the proportion of items assigned to category <math>c</math>.
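To make the two measures concrete, here is a minimal Python sketch computing them on a toy example; the labels, gold standard, and data are invented for illustration only.

```python
from collections import Counter
from math import log2

# Hypothetical annotation data (illustrative only): one annotator's
# labels compared against a gold standard.
gold      = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB"]
annotated = ["NOUN", "VERB", "ADJ",  "ADJ", "NOUN", "NOUN"]

# "Precision": fraction of the annotator's labels that match the
# gold standard (number correct / N).
correct = sum(g == a for g, a in zip(gold, annotated))
precision = correct / len(gold)

# "Entropy" over the annotator's label distribution: low entropy
# suggests the categories are used consistently; high entropy
# suggests ambiguous category definitions.
counts = Counter(annotated)
n = len(annotated)
entropy = -sum((c / n) * log2(c / n) for c in counts.values())

print(f"precision = {precision:.2f}")
print(f"entropy   = {entropy:.2f} bits")
```

In a real annotation project these quantities would be computed per annotator and per category, and tracked across iterations of the guideline development.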

## Selected References

### Stability of annotator agreement

- Bayerl, P.S. 2008. The human factor in manual annotations: Exploring annotator reliability. Language Resources and Evaluation.
- Lipsitz, S.R., N.M. Laird, and D.P. Harrington. 1991. Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association. Biometrika 78(1):156–160.
- Teufel, S., A. Siddharthan, and D. Tidhar. 2006. An annotation scheme for citation function. Proceedings of the SIGDIAL Workshop.
- Bhardwaj, V., R.J. Passonneau, A. Salleb-Aouissi, and N. Ide. 2010. Anveshan: A framework for analysis of multiple annotators’ labeling behavior. Proceedings of the 4th Linguistic Annotation Workshop (LAW-IV) at the ACL conference.
- Dligach, D., R.D. Nielsen, and M. Palmer. 2010. To annotate more accurately or to annotate more. Proceedings of the 4th Linguistic Annotation Workshop (LAW-IV) at the ACL conference.

### Validation / evaluation / agreement

- Artstein, R. and M. Poesio. 2008. Inter-coder agreement for computational linguistics. Computational Linguistics 34(4):555–596.
- Bortz, J. 2005. Statistik für Human- und Sozialwissenschaftler. Springer Verlag.
- Cohen’s Kappa: Cohen, J. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1):37–46.

#### Kappa agreement studies and extensions

- Reidsma, D., and J. Carletta. 2008. Reliability measurement without limits (squib). Computational Linguistics 34(3):319–326.
- Devillers, L., R. Cowie, J.-C. Martin, and E. Douglas-Cowie. 2006. Real life emotions in French and English TV clips. Proceedings of the 5th LREC, 1105–1110.
- Rosenberg, A. and E. Binkowski. 2004. Augmenting the Kappa statistics to determine interannotator reliability for multiply labeled data points. Proceedings of the HLT-NAACL Conference, 77–80.
- Krippendorff, K. 2007. Computing Krippendorff’s Alpha reliability. Online technical report (exact method, with example matrices).
- Hayes, A.F. and K. Krippendorff. 2007. Answering the call for a standard reliability measure for coding data. Communication Methods and Measures 1:77–89.
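As a concrete illustration of the chance-corrected agreement measures cited above, here is a minimal sketch of Cohen's kappa for two annotators; the labels and data are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa (Cohen 1960): chance-corrected agreement
    between two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators label alike.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's marginal
    # label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels from two annotators (illustrative only).
a1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
a2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(a1, a2), 3))
```

Krippendorff's alpha and the multi-label extensions cited above generalize this idea to more than two annotators, missing data, and non-nominal category scales.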

**Theoretical and practical relevance:**

Annotation is widely used, for example:

- to provide training examples for supervised machine learning in NLP
- to support corpus analysis in linguistics
- to empirically test theories in linguistics and NLP
- to survey previous work, identify trends, etc. (biosciences, political science)

Other useful annotation tools and resources include:

- ATLAS.ti (qualitative data analysis)
- Qualitative Data Analysis Program (Pitt)
- UIMA Fit
- GATE
- CrowdFlower
- SamaSource

Sample corpora and corpus sources:

- European Language Resources Association
- UPenn Linguistic Data Consortium
- American National Corpus

Beyond the abstract, slides may be obtained upon request.