Citation: Eduard Hovy (2010) Annotation. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Download: http://aclweb.org/anthology-new/P/P10/P10-5004.pdf
Tagged: Computer Science, annotation, annotations

Summary

This tutorial treats annotation as a science and lays out clear steps for carrying it out.

Corpus selection

Consider availability (existing corpora) and openness (so others can evaluate and build on your work).
Consider representativeness:
- Different corpora will be appropriate for different purposes.

Annotation guidelines are essential. These must be developed iteratively.
There is a tradeoff between the granularity of the categories and how reliably annotators can apply them in practice. Use as few categories as possible, and make the distinctions between them clear.
Measure interannotator agreement and disagreement:
- "Precision" measures the correctness of annotators compared to a gold standard; it reflects how easy the categorization task is: $P_{i}=\text{number correct}/N$, where $N$ is the number of items labeled (see Lipsitz et al. 1991).
- "Entropy" measures ambiguity, i.e. how clear the category definitions are: $E=-\sum_{i}P_{i}\ln P_{i}$ (see Lipsitz et al. 1991).
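A minimal Python sketch of the two measures above, assuming each annotator's labels and the gold standard are equal-length lists of category strings. The function names `precision` and `entropy` and the example data are hypothetical. The entropy helper also assumes that $P_i$ in the entropy formula is read as the proportion of the annotations for a single item that fall into category $i$. The tutorial may define it differently.

```python
import math
from collections import Counter


def precision(labels, gold):
    """P = number correct / N: share of an annotator's labels that match the gold standard."""
    if len(labels) != len(gold) or not gold:
        raise ValueError("labels and gold must be non-empty and of equal length")
    correct = sum(1 for label, ref in zip(labels, gold) if label == ref)
    return correct / len(gold)


def entropy(labels):
    """E = -sum_i P_i ln P_i over the categories assigned to one item (in nats).

    Assumption: P_i is the proportion of annotations that fall into category i.
    """
    counts = Counter(labels)
    n = len(labels)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


gold = ["POS", "NEG", "NEG", "NEU", "POS"]
annotator_a = ["POS", "NEG", "POS", "NEU", "POS"]
print(precision(annotator_a, gold))  # 0.8

# Labels that five annotators assigned to one item: higher entropy means more ambiguity.
item_labels = ["POS", "POS", "NEU", "POS", "NEG"]
print(round(entropy(item_labels), 3))  # 0.95
```

An item on which every annotator agrees has entropy 0. An even split between two categories gives $\ln 2$.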

Bayerl, P.S. 2008. The human factor in manual annotations: Exploring annotator reliability. Language Resources and Engineering.
Lipsitz, S.R., N.M. Laird, and D.P. Harrington. 1991. Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association. Biometrika 78(1):156–160.
Teufel, S., A. Siddharthan, and D. Tidhar. 2006. An annotation scheme for citation function. Proceedings of the SIGDIAL Workshop.
Bhardwaj, V., R.J. Passonneau, A. Salleb-Aouissi, and N. Ide. 2010. Anveshan: A framework for analysis of multiple annotators’ labeling behavior. Proceedings of the 4th Linguistic Annotation Workshop (LAW-IV) at the ACL conference.
Dligach, D., R.D. Nielsen, and M. Palmer. 2010. To annotate more accurately or to annotate more. Proceedings of the 4th Linguistic Annotation Workshop (LAW-IV) at the ACL conference.

Artstein, R. and M. Poesio. 2008. Inter-coder agreement for computational linguistics. Computational Linguistics, 555–596.
Bortz, J. 2005. Statistik für Human- und Sozialwissenschaftler. Springer Verlag.
Cohen’s Kappa: Cohen, J. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 37–46.
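Cohen's kappa corrects the observed agreement $p_o$ between two annotators for the agreement $p_e$ expected by chance: $\kappa = (p_o - p_e)/(1 - p_e)$. The sketch below computes it for nominal labels and estimates $p_e$ from each annotator's own label distribution. The function name and the example labels are illustrative and do not come from the tutorial.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators who labeled the same items with nominal categories."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of items on which the annotators chose the same label.
    p_o = sum(1 for x, y in zip(a, b) if x == y) / n
    # Chance agreement: product of each annotator's marginal proportion, summed over categories.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum((count_a[k] / n) * (count_b[k] / n) for k in set(count_a) | set(count_b))
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


annotator_1 = ["Y", "Y", "N", "N", "Y", "N", "Y", "Y", "N", "Y"]
annotator_2 = ["Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "Y"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.583
```

In this example the annotators agree on 8 of 10 items ($p_o = 0.8$), but chance alone predicts $p_e = 0.6 \cdot 0.6 + 0.4 \cdot 0.4 = 0.52$, so $\kappa = 0.28/0.48 \approx 0.583$. A category used by only one annotator contributes nothing to $p_e$, because `Counter` returns 0 for missing keys.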

Reidsma, D., and J. Carletta. 2008. Squib in Computational Linguistics.
Devillers, L., R. Cowie, J.-C. Martin, and E. Douglas-Cowie. 2006. Real life emotions in French and English TV clips. Proceedings of the 5th LREC, 1105–1110.
Rosenberg, A. and E. Binkowski. 2004. Augmenting the Kappa statistics to determine interannotator reliability for multiply labeled data points. Proceedings of the HLT-NAACL Conference, 77–80.
Krippendorff, K. 2007. Computing Krippendorff’s Alpha reliability. papers/43 (exact method, with example matrices; available online).
Hayes, A.F. and K. Krippendorff. 2007. Answering the call for a standard reliability measure for coding data. Communication Methods and Measures 1:77–89.
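Krippendorff's alpha, listed above, extends agreement measurement to any number of coders and tolerates missing judgments. The following is a minimal sketch for nominal data only, using the coincidence-matrix form $\alpha = 1 - D_o/D_e$. Each unit coded by $m \ge 2$ coders adds $1/(m-1)$ to the matrix for every ordered pair of its values. For nominal data this reduces to $\alpha = 1 - (n-1)\sum_{c \ne k} o_{ck} / \sum_{c \ne k} n_c n_k$. Units are rows, coders are columns, and `None` marks a missing judgment. The function name and the small 4-coder, 12-unit data set are illustrative assumptions. Ordinal or interval data would need a different difference function.

```python
from collections import defaultdict
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data; each unit lists the coders' values (None = missing)."""
    o = defaultdict(float)  # coincidence matrix, keyed by (value_c, value_k)
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit coded by fewer than two coders has no pairable values
        for c, k in permutations(values, 2):  # ordered pairs of distinct positions
            o[(c, k)] += 1.0 / (m - 1)
    n_c = defaultdict(float)  # marginal totals per value
    for (c, _k), weight in o.items():
        n_c[c] += weight
    n = sum(n_c.values())
    observed = sum(w for (c, k), w in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when all pairable values are identical")
    return 1.0 - (n - 1) * observed / expected


reliability_data = [
    [1, 1, None, 1],
    [2, 2, 3, 2],
    [3, 3, 3, 3],
    [3, 3, 3, 3],
    [2, 2, 2, 2],
    [1, 2, 3, 4],
    [4, 4, 4, 4],
    [1, 1, 2, 1],
    [2, 2, 2, 2],
    [None, 5, 5, 5],
    [None, None, 1, 1],
    [None, 3, None, None],
]
alpha = krippendorff_alpha_nominal(reliability_data)
print(round(alpha, 3))  # 0.743
```

The last unit has only one judgment, so it drops out. The remaining 40 pairable values give 8 units of observed disagreement against $40^2 - 384 = 1216$ expected, so $\alpha = 1 - 39 \cdot 8 / 1216 \approx 0.743$.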

Annotation is widely used. For example,

Other useful annotation tools and resources include:

Sample corpora and corpus sources