Wikidata: A Free Collaborative Knowledge Base

Citation: Denny Vrandečić, Markus Krötzsch (2014) Wikidata: A Free Collaborative Knowledge Base. Communications of the ACM
DOI (original publisher): 10.1145/2629489
Semantic Scholar (metadata): 10.1145/2629489
Sci-Hub (fulltext): 10.1145/2629489
Internet Archive Scholar (search for fulltext): Wikidata: A Free Collaborative Knowledge Base
Wikidata (metadata): Q18507561
Download: http://cacm.acm.org/magazines/2014/10/178785-wikidata/fulltext

Summary

The paper starts from the problem that "Wikipedia’s data is buried within 30 million Wikipedia articles in 287 languages, from where it is very difficult to extract". The same data is equally hard to update, correct, and keep consistent across all of these articles (e.g., the population of Rome).

Wikidata is a new, multilingual sister project of Wikipedia that centralizes data for all projects. It rests on the following design decisions:

  • open editing
  • community control
  • plurality (allows conflicting data and provides mechanisms for managing it, e.g., references and marking statements as preferred or deprecated)
  • secondary data (gathered from other sources)
  • multilingual data
  • easy access (various formats)
  • continuous evolution
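
The plurality decision can be illustrated with a minimal Python sketch of how a consumer might choose among conflicting statements. The rank names "preferred" and "deprecated" come from the summary above; the "normal" default, the dictionary layout, and the function name best_statements are assumptions made for illustration, and the values are placeholders rather than real data.

```python
from typing import Dict, List

# "preferred" and "deprecated" are named in the summary; "normal" is an
# assumed default rank for statements that carry no explicit marking.
PREFERRED, NORMAL, DEPRECATED = "preferred", "normal", "deprecated"


def best_statements(statements: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the preferred statements if any exist, otherwise the normal ones.

    Deprecated statements are never returned, but they stay in the input
    list, so conflicting claims remain recorded instead of being deleted.
    """
    preferred = [s for s in statements if s.get("rank", NORMAL) == PREFERRED]
    if preferred:
        return preferred
    return [s for s in statements if s.get("rank", NORMAL) == NORMAL]


# Placeholder values (hypothetical, not real data).
claims = [
    {"value": "figure from source A", "rank": NORMAL},
    {"value": "figure from source B", "rank": PREFERRED},
    {"value": "figure known to be outdated", "rank": DEPRECATED},
]

if __name__ == "__main__":
    for statement in best_statements(claims):
        print(statement["value"])
```

The point of the sketch is that ranking filters what is shown by default without removing the competing claims from the data.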

Forerunners: Semantic MediaWiki, OpenCyc, Freebase, and extracts from Wikipedia (DBpedia, Yago). Existing consumers of such data include Google Knowledge Graph, Facebook Open Graph, Wolfram Alpha, Evi, IBM Watson, and Google Maps. These and more could benefit from more readily available structured data from Wikipedia.

Wikidata launched in October 2012, first consolidating the links between articles on the same topic in different language Wikipedias. It has become the most-edited Wikimedia project, with 90% of edits made by user-created bots.

Data takes the form of property-value pairs. Each property is given a type on the page that defines it; values belong to one of a small number of datatypes. Pairs can have subordinate pairs, called qualifiers (e.g., to specify a source).
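
As a rough illustration of this structure, the following Python sketch models a property-value pair that carries subordinate pairs as qualifiers. The class names Pair and Statement, the helper describe, and the plain-string property names are all hypothetical and are not Wikidata's actual schema; the values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Assumed stand-in for the "small number of datatypes" a value can take.
Value = Union[str, int, float]


@dataclass
class Pair:
    """A single property-value pair (hypothetical representation)."""
    prop: str
    value: Value


@dataclass
class Statement:
    """A main pair plus subordinate pairs, i.e. its qualifiers."""
    main: Pair
    qualifiers: List[Pair] = field(default_factory=list)


def describe(statement: Statement) -> str:
    """Render a statement as 'prop = value (qualifier: value, ...)'."""
    text = f"{statement.main.prop} = {statement.main.value}"
    if statement.qualifiers:
        details = ", ".join(f"{q.prop}: {q.value}" for q in statement.qualifiers)
        text += f" ({details})"
    return text


# Placeholder content, not real data.
example = Statement(
    main=Pair("population", 1000),
    qualifiers=[Pair("source", "hypothetical census")],
)

if __name__ == "__main__":
    print(describe(example))
```

Keeping qualifiers as ordinary pairs mirrors the description above: a qualifier has the same shape as the pair it annotates, it is only attached to it.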

Wikidata reuses identifiers from other projects (eg MusicBrainz). Each Wikidata entity has a unique identifier, eg http://www.wikidata.org/entity/Q42
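
A small Python sketch of working with such identifiers follows. The URI prefix is taken from the example above; the assumption that an item identifier is the letter Q followed by a positive integer is inferred from the identifiers shown on this page (Q42, Q18507561), and the function names are hypothetical.

```python
import re

# Prefix taken from the example URI in the text.
ENTITY_PREFIX = "http://www.wikidata.org/entity/"

# Assumption: item identifiers look like "Q" plus a positive integer.
_ITEM_ID = re.compile(r"Q[1-9][0-9]*")


def entity_uri(item_id: str) -> str:
    """Build the entity URI for an item identifier such as 'Q42'."""
    if not _ITEM_ID.fullmatch(item_id):
        raise ValueError(f"not an item identifier: {item_id!r}")
    return ENTITY_PREFIX + item_id


def item_id_from_uri(uri: str) -> str:
    """Extract the item identifier from an entity URI."""
    if not uri.startswith(ENTITY_PREFIX):
        raise ValueError(f"not an entity URI: {uri!r}")
    item_id = uri[len(ENTITY_PREFIX):]
    if not _ITEM_ID.fullmatch(item_id):
        raise ValueError(f"not an item identifier: {item_id!r}")
    return item_id


if __name__ == "__main__":
    print(entity_uri("Q42"))
    print(item_id_from_uri("http://www.wikidata.org/entity/Q42"))
```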

Wikidata can be used for, eg:

  • obtaining labels in many languages (358 as of writing)
  • language-independent identifiers
  • an interesting database in its own right
  • enhancing applications, drop-in contextual data
  • advanced analytics, eg derive new facts
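
The first two uses combine naturally: an application can store a language-independent identifier and look up a label in the reader's language only when displaying it. The Python sketch below shows one way this might work, assuming labels have already been loaded into a plain dictionary keyed by identifier and language code. The data layout, the function name pick_label, and the label texts are hypothetical placeholders, not actual Wikidata content or API output.

```python
from typing import Dict, Iterable, Optional

# Hypothetical in-memory layout: identifier -> language code -> label.
# The label texts are placeholders, not real Wikidata labels.
LABELS: Dict[str, Dict[str, str]] = {
    "Q42": {"en": "example label", "de": "Beispielbezeichnung"},
}


def pick_label(
    item_id: str,
    preferred_languages: Iterable[str],
    labels: Dict[str, Dict[str, str]] = LABELS,
) -> Optional[str]:
    """Return the first available label in the given language order.

    Returns None when the item is unknown or has no label in any of the
    requested languages, so the caller can fall back to showing the
    identifier itself.
    """
    by_language = labels.get(item_id, {})
    for code in preferred_languages:
        if code in by_language:
            return by_language[code]
    return None


if __name__ == "__main__":
    # A reader who prefers French, then German, then English.
    print(pick_label("Q42", ["fr", "de", "en"]) or "Q42")
```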

Future

  • support complex queries
  • whether Wikidata will earn the trust of Wikipedia communities and other Wikimedia projects, and what the interplay between them will be
  • more external tools, including interacting with editorial processes
  • data resource for research and improving applications

Theoretical and Practical Relevance

https://www.wikidata.org/wiki/Wikidata:Main_Page