Annotation



Citation: Eduard Hovy (2010) Annotation. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics


Download: http://aclweb.org/anthology-new/P/P10/P10-5004.pdf

Tagged: Computer Science, annotation, annotations


Summary:

This tutorial treats annotation as a science with seven clearly defined steps:

  • Q1: Selecting a corpus
  • Q2: Instantiating the theory
  • Q3: Designing the interface
  • Q4: Selecting and training the annotators
  • Q5: Designing and managing the annotation procedure
  • Q6: Validating results
  • Q7: Delivering and maintaining the product

Corpus selection

  • Consider availability (existing corpora) and openness (so others can evaluate and build on your work).
  • Consider representativeness
    • Different corpora will be appropriate for different purposes

Theory instantiation

Selected References

Stability of annotator agreement


Validation / evaluation / agreement

Kappa agreement studies and extensions
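
As an illustration of the kind of agreement measure studied under this heading, here is a minimal sketch of Cohen's kappa for two annotators who labelled the same items. It is not taken from the tutorial; the function name `cohen_kappa` and the toy part-of-speech labels are assumptions made for the example. Kappa compares observed agreement with the agreement expected by chance from each annotator's label distribution.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label frequency,
    # summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data (invented): four tokens tagged noun (N) or verb (V).
annotator_1 = ["N", "N", "V", "V"]
annotator_2 = ["N", "N", "V", "N"]
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

In this toy case the annotators agree on 3 of 4 items (0.75), chance agreement is 0.5, and kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. Extensions of kappa to more than two annotators or to weighted disagreements are not covered by this sketch.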

Theoretical and practical relevance:

Annotation is widely used, for example:

  • to provide training examples for supervised machine learning in natural language (NL) processing
  • to explain corpus analysis in linguistics
  • to empirically test theories of linguistics and NLP
  • to survey previous work, find trends, etc. (biosciences, political science)

Other useful annotation tools and resources include:

Sample Corpora and corpora sources

  • European Language Resources Association
  • UPenn Linguistic Data Consortium
  • American National Corpus

Beyond the abstract, slides may be obtained upon request.


