Dbnary: Wiktionary as a LMF based Multilingual RDF network

From AcaWiki

Citation: Gilles Sérasset (2012-05) Dbnary: Wiktionary as a LMF based Multilingual RDF network. Proceedings of the Eighth International Conference on Language Resources and Evaluation
Internet Archive Scholar (search for fulltext): Dbnary: Wiktionary as a LMF based Multilingual RDF network
Download: http://www.lrec-conf.org/proceedings/lrec2012/pdf/387_Paper.pdf
Tagged: wiktionary, semantic web

Summary

Wiktionary entries are specified in terms of form rather than structure, with different forms for different language editions, and they are constantly changing. Previous extraction efforts have been ad hoc; the authors believe progress can be made by creating an open database built from the extracted data, together with open source code for performing the extraction.

The structure of Wiktionary language editions and entries is described. Each edition potentially describes every word in every language, and uses its own templates for consistent formatting.

In order to extract variously formatted entries and obtain a single interoperable database, the authors separated extraction from data handling. Language edition-specific subclasses of the extractor specify the patterns used by each edition. A tool compares the graphs produced by successive extractions in order to detect errors that may be introduced by the evolving nature of Wiktionary.
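The separation described above might look roughly like the following Python sketch. It is not the Dbnary code: the class names, the predicate strings, and the regular expressions are all hypothetical, and the wikitext patterns are heavily simplified stand-ins for what real editions use.

```python
import re


class LexiconStore:
    """Data-handling side: collects triples and knows nothing about wikitext."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))


class WiktionaryExtractor:
    """Extraction side: shared driver; subclasses supply edition-specific patterns."""

    # Hypothetical, simplified pattern for part-of-speech markers (set by subclasses).
    pos_pattern = None
    # Numbered definition lines ("# ..."), skipping examples ("#:") and sub-items.
    definition_pattern = re.compile(r"^#[ \t]*([^:*#\s].*)$", re.M)

    def extract(self, title, wikitext, store):
        for match in self.pos_pattern.finditer(wikitext):
            store.add(title, "partOfSpeech", match.group(1))
        for match in self.definition_pattern.finditer(wikitext):
            store.add(title, "definition", match.group(1).strip())


class EnglishExtractor(WiktionaryExtractor):
    # Assumes level-3 headings such as "===Noun===".
    pos_pattern = re.compile(r"^===[ \t]*(Noun|Verb|Adjective)[ \t]*===[ \t]*$", re.M)


class FrenchExtractor(WiktionaryExtractor):
    # Assumes section templates such as "{{-nom-|fr}}".
    pos_pattern = re.compile(r"\{\{-(nom|verb|adj)-\|fr\}\}")


def diff_graphs(old, new):
    """Return (removed, added) triples between two extraction runs."""
    return old - new, new - old
```

Running the same extractor on two dumps and passing both triple sets to `diff_graphs` gives a cheap regression check: if an edition changes a template, the triples that depended on the old pattern show up in the removed set.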

Data is extracted into RDF using a vocabulary based on the Lexical Markup Framework (LMF); for simplicity, the authors avoid reification, unlike previous LMF/RDF mappings.
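To illustrate why avoiding reification keeps the data simpler, the sketch below contrasts a direct triple with the same statement expressed through the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). It assumes "reification" is meant in this general RDF sense; the `http://example.org/lexicon/` namespace and the `synonym` property are hypothetical and are not taken from the Dbnary vocabulary.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/lexicon/"  # hypothetical namespace, for illustration only


def direct(subject, predicate, obj):
    """State a relation as a single triple."""
    return [(subject, predicate, obj)]


def reified(statement_id, subject, predicate, obj):
    """Describe the same relation through an intermediate rdf:Statement node."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


plain = direct(EX + "cat", EX + "synonym", EX + "feline")
wrapped = reified(EX + "stmt1", EX + "cat", EX + "synonym", EX + "feline")
```

The direct form needs one triple and can be queried with a single pattern, while the reified form needs four triples and an extra node per relation. The trade-off is that the reified form offers a place to attach metadata about the relation itself.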

The authors hope that Dbnary can eventually become the Wiktionary equivalent of DBpedia.