Citation: Eduard Hovy (2010). Annotation. Tutorial at the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010).
This tutorial views annotation as a science, with clear steps.
- Q1: Selecting a corpus
- Q2: Instantiating the theory
- Q3: Designing the interface
- Q4: Selecting and training the annotators
- Q5: Designing and managing the annotation procedure
- Q6: Validating results
- Q7: Delivering and maintaining the product
- Consider availability (existing corpora) and openness (so others can evaluate and build on your work).
- Consider representativeness
- Different corpora will be appropriate for different purposes
- Annotation guidelines are essential. These must be developed iteratively.
- There is a tradeoff between the granularity of the categories and practical attainability: use as few categories as possible, and make the distinctions between them clear.
- Measure interannotator agreement and disagreement:
- "Precision" measures annotator correctness against a gold standard; it reflects how easy the categorization is: <math>P_i = \text{number correct}/N</math> (see Lipsitz et al. 1991)
- "Entropy" measures ambiguity (clarity of the category definitions): <math>E_i = -\sum_j P_{ij} \log P_{ij}</math>, where <math>P_{ij}</math> is the proportion of annotators assigning category <math>j</math> to item <math>i</math>
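The two per-item measures above can be sketched in a few lines of Python. The function names (`item_precision`, `item_entropy`) are illustrative, not from the tutorial; entropy is computed in bits (base-2 log), which is one common convention.

```python
import math

def item_precision(labels, gold):
    """P_i = number correct / N: fraction of annotators matching the gold label."""
    return sum(1 for lab in labels if lab == gold) / len(labels)

def item_entropy(labels):
    """E_i = -sum_j P_ij log2 P_ij over the label distribution for one item.
    0 bits means all annotators agree; higher values mean more ambiguity."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

# Four annotators label one item; the gold label is "pos".
labels = ["pos", "pos", "pos", "neg"]
print(item_precision(labels, "pos"))        # 0.75
print(round(item_entropy(labels), 3))       # 0.811
```

An item with high precision and low entropy is easy; an item with low precision but also low entropy suggests a systematically confusing guideline rather than random noise.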
Stability of annotator agreement
- Bayerl, P.S. 2008. The human factor in manual annotations: Exploring annotator reliability. Language Resources and Evaluation.
- Lipsitz, S.R., N.M. Laird, and D.P. Harrington. 1991. Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association. Biometrika 78(1):156–160.
- Teufel, S., A. Siddharthan, and D. Tidhar. 2006. An annotation scheme for citation function. Proceedings of the SIGDIAL Workshop.
- Bhardwaj, V., R.J. Passonneau, A. Salleb-Aouissi, and N. Ide. 2010. Anveshan: A framework for analysis of multiple annotators’ labeling behavior. Proceedings of the 4th Linguistic Annotation Workshop (LAW-IV) at the ACL conference.
- Dligach, D., R.D. Nielsen, and M. Palmer. 2010. To annotate more accurately or to annotate more. Proceedings of the 4th Linguistic Annotation Workshop (LAW-IV) at the ACL conference.
Validation / evaluation / agreement
- Artstein, R. and M. Poesio. 2008. Inter-coder agreement for computational linguistics. Computational Linguistics 34(4):555–596.
- Bortz, J. 2005. Statistik für Human- und Sozialwissenschaftler. Springer Verlag.
- Cohen’s Kappa: Cohen, J. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:37–46.
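Cohen's (1960) kappa corrects raw agreement for chance: kappa = (P_o − P_e) / (1 − P_e), with chance agreement P_e estimated from each annotator's marginal label distribution. A minimal sketch for two annotators (the function name is illustrative):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items:
    kappa = (P_o - P_e) / (1 - P_e), where P_o is observed agreement and
    P_e is chance agreement from each annotator's marginal distribution."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum((ca[c] / n) * (cb[c] / n) for c in set(labels_a) | set(labels_b))
    return (p_o - p_e) / (1 - p_e)

# Two annotators agree on 8 of 10 items, but "yes" dominates both marginals.
a = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
b = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]
print(round(cohens_kappa(a, b), 3))  # 0.583
```

Note how 80% raw agreement drops to kappa ≈ 0.58 once the skewed marginals are discounted; this is exactly the correction the extensions below generalize.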
Kappa agreement studies and extensions
- Reidsma, D. and J. Carletta. 2008. Reliability measurement without limits (squib). Computational Linguistics 34(3):319–326.
- Devillers, L., R. Cowie, J.-C. Martin, and E. Douglas-Cowie. 2006. Real life emotions in French and English TV clips. Proceedings of the 5th LREC, 1105–1110.
- Rosenberg, A. and E. Binkowski. 2004. Augmenting the Kappa statistics to determine interannotator reliability for multiply labeled data points. Proceedings of the HLT-NAACL Conference, 77–80.
- Krippendorff, K. 2007. Computing Krippendorff’s Alpha reliability. (Exact method, with example matrices, available online.)
- Hayes, A.F. and K. Krippendorff. 2007. Answering the call for a standard reliability measure for coding data. Communication Methods and Measures 1:77–89.
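Krippendorff's alpha, advocated by Hayes and Krippendorff (2007) as a general-purpose reliability measure, handles any number of annotators and missing labels. A minimal sketch of the nominal-data case via a coincidence matrix (the function name is illustrative; see Krippendorff 2007 for the exact method):

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data: alpha = 1 - D_o/D_e.
    `units` is a list of label lists, one per item; annotators who skipped
    an item are simply absent, and items with fewer than two labels are ignored."""
    coincidence = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # Each ordered pair of values within a unit contributes 1/(m-1).
        for x, y in permutations(labels, 2):
            coincidence[(x, y)] += 1 / (m - 1)
    n = sum(coincidence.values())          # total pairable values
    marginals = Counter()
    for (x, _), w in coincidence.items():
        marginals[x] += w
    d_o = sum(w for (x, y), w in coincidence.items() if x != y)
    d_e = (n * n - sum(v * v for v in marginals.values())) / (n - 1)
    return 1 - d_o / d_e

# Three annotators, one disagreement, one unlabelable item (ignored).
units = [["a", "a", "a"], ["a", "b"], ["b", "b", "b"], ["a"]]
print(round(krippendorff_alpha_nominal(units), 4))  # 0.5625
```

Unlike pairwise kappa, nothing has to be recomputed when an annotator misses some items, which is why alpha is often preferred for multiply-annotated corpora with gaps.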
Theoretical and practical relevance:
Annotation is widely used. For example,
- to provide examples to supervised machine learning for NL
- to explain corpus analysis in linguistics
- to empirically test theories of linguistics and NLP
- to survey previous work, find trends, etc. (biosciences, political science)
Other useful annotation tools and resources include:
- ATLAS.ti (qualitative data analysis)
- Qualitative Data Analysis Program (Pitt)
- uimaFIT
Sample corpora and corpus sources