
Citation: Eduard Hovy (2010) Annotation. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics


Tagged: Computer Science, annotation, annotations


This tutorial treats annotation as a science with seven clearly defined steps, each framed as a question:

  • Q1: Selecting a corpus
  • Q2: Instantiating the theory
  • Q3: Designing the interface
  • Q4: Selecting and training the annotators
  • Q5: Designing and managing the annotation procedure
  • Q6: Validating results
  • Q7: Delivering and maintaining the product

Corpus selection

  • Consider availability (existing corpora) and openness (so others can evaluate and build on your work).
  • Consider representativeness.
    • Different corpora will be appropriate for different purposes.

Theory instantiation

Selected References

Stability of annotator agreement

Validation / evaluation / agreement

Kappa agreement studies and extensions
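
As a rough illustration of the kind of agreement measure this heading refers to, the sketch below computes Cohen's kappa for two annotators who labeled the same items. It is not taken from the tutorial: the function name `cohen_kappa`, the toy labels, and the choice to return 1.0 when chance agreement is already perfect are all assumptions made for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    own label distribution.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Assumption for this sketch: if both annotators always use the same
    # single label, kappa is undefined; we report 1.0 by convention.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical toy data: part-of-speech labels from two annotators.
annotator_1 = ["N", "N", "V", "V"]
annotator_2 = ["N", "V", "V", "V"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

With the toy data above, the annotators agree on 3 of 4 items (0.75) against a chance agreement of 0.5, giving a kappa of 0.5. Extensions of kappa to more than two annotators or to weighted disagreements are not covered by this sketch.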

Theoretical and practical relevance:

Annotation is widely used. For example,

  • to provide training examples for supervised machine learning on natural language (NL)
  • to explain corpus analysis in linguistics
  • to empirically test theories of linguistics and NLP
  • to survey previous work, find trends, etc. (e.g. in the biosciences and political science)

Other useful annotation tools and resources include:

Sample Corpora and corpora sources

  • European Language Resources Association
  • UPenn Linguistic Data Consortium
  • American National Corpus

Beyond the abstract, slides may be obtained upon request.