Wednesday, 3 June 2015

A Preliminary Study on Wikipedia, DBpedia and Wikidata





From simple free-text contributions and semi-structured data curation to complex ontology construction, knowledge bases (KBs) were increasingly moving toward collaborative construction in the first half of 2015.

Within the context of Linked Open Data for Libraries, Archives, and Museums (LODLAM), Wikidata has been introduced to the community and analysed and evaluated as part of a semantic strategy, and it is on its way to becoming one of Europeana's major partners. In the wider LOD picture, we have witnessed the birth of DBpedia Wikidata, which offers an alternative, semantically rich representation of Wikidata "through the Eyes of DBpedia". Figure 1 shows the timeline of the three major KBs alongside others such as OpenCyc, Freebase, YAGO and Google's KBs.

In search of an approach that relates more directly to our LOD prototype, we begin by focusing on the recent convergence of the collaborative KBs mentioned above. Since both DBpedia and Wikidata draw their data mainly from Wikipedia, an overall understanding of all three KBs is essential. In this study and the accompanying presentation, we follow the argument of Hovy et al. (2013) that the major advantages of collaboratively generated knowledge are that it is semantified, wide-coverage, up-to-date, multilingual, and free.

Our own institution, Academia Sinica (AS), serves as the main case for illustrating how a knowledge entity is curated and represented in Wikipedia, DBpedia and Wikidata. AS, as a knowledge unit, was used to explain how semantic links are constructed from semi-structured and structured data. Web tools such as LodLive, Resonator and DBpedia Atlas were used for visual semantic representation. Table 1 (below) gives a preliminary comparison of the three KBs, with some hyperlinks to the original sources.
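To make "constructing semantic links from semi-structured data" concrete, here is a minimal sketch, loosely in the spirit of DBpedia's infobox extraction but not the actual DBpedia extraction framework. It turns infobox-style key/value pairs into (subject, predicate, object) triples: values written as wiki links become resource IRIs (links to other entities), everything else stays a literal. The infobox fields and the `example.org` property namespace are hypothetical, chosen only for illustration.

```python
from urllib.parse import quote

# Assumption: English DBpedia resources live under this namespace.
RESOURCE_NS = "http://dbpedia.org/resource/"
# Hypothetical property namespace, used only for illustration.
PROPERTY_NS = "http://example.org/property/"


def to_resource_iri(title: str) -> str:
    """Turn a Wikipedia page title into a DBpedia-style resource IRI."""
    return RESOURCE_NS + quote(title.strip().replace(" ", "_"))


def infobox_to_triples(page_title: str, infobox: dict) -> list:
    """Emit one (subject, predicate, object) triple per infobox field.

    A value written as a wiki link, [[Target]] or [[Target|label]], becomes
    a resource IRI (a semantic link); any other value stays a plain literal.
    """
    subject = to_resource_iri(page_title)
    triples = []
    for key, raw in infobox.items():
        value = raw.strip()
        if value.startswith("[[") and value.endswith("]]"):
            target = value[2:-2].split("|")[0]  # drop the display label
            obj = to_resource_iri(target)
        else:
            obj = value
        triples.append((subject, PROPERTY_NS + key, obj))
    return triples


# Hypothetical infobox fragment for the Academia Sinica article.
example_infobox = {
    "established": "1928",
    "location": "[[Taipei]]",
}
for triple in infobox_to_triples("Academia Sinica", example_infobox):
    print(triple)
```

The point of the sketch is the split between literals and links: only the linked value ("Taipei") produces a node that other triples can connect to, which is what makes the data navigable in tools like LodLive.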


Table 1. Preliminary comparison of Wikipedia, DBpedia and Wikidata

| | Wikipedia | DBpedia | Wikidata |
|---|---|---|---|
| Website | | | |
| Release time | 15 January 2001 | 23 January 2007 | 30 October 2012 |
| Description | | | |
| Host | Wikimedia Foundation, Inc. | University of Leipzig; University of Mannheim; OpenLink Software | Wikimedia Foundation, Inc. |
| Creators | Jimmy Wales, Larry Sanger | | Wikimedia community |
| Data mainly from | Wikipedians | Wikipedia | Wikipedia and sister projects |
| Generation method | Manual / community-created | Automatic / semi-automatic | Semi-automatic; manual / community-created |
| Advantages | Free text; ease of access and contribution | LOD hub; semantic coverage and depth | Quality (accuracy): URI / provenance / contextual representation |
| Operation | | | MediaWiki extension: Wikibase |
| URI/IRI scheme | (language).wikipedia.org/wiki/Name | Wikipedia-like IRIs: (language).dbpedia.org/resource/Name | http://wikidata.org/wiki/Qxxx or Pxxx |
| Data structure | Mostly unstructured text; semi-structured: infobox, category… | | |
| Data access | Free-text search | DBpedia Spotlight (annotating mentions of DBpedia resources in text) | |
| License | CC Attribution / Share-Alike 3.0; text dual-licensed under GFDL; media licensing varies | GNU General Public License (extraction software); data under CC BY-SA 3.0 and GFDL | CC0 1.0 |
| Language support | | >125 (as of March 2015) | |
| Mutual relation | | | |
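The URI/IRI schemes in Table 1 are regular enough to map between programmatically. The sketch below derives the DBpedia resource IRI for a Wikipedia article URL and checks whether a string looks like a Wikidata entity ID (Qxxx or Pxxx). It assumes, beyond what the table states, that English resources are served from the bare `dbpedia.org` host while other editions use `(language).dbpedia.org`; the function names are hypothetical.

```python
import re
from urllib.parse import unquote, urlparse

# Wikidata item (Q...) or property (P...) identifiers: a letter plus a
# positive integer without leading zeros.
WIKIDATA_ID = re.compile(r"[QP][1-9][0-9]*")


def wikipedia_to_dbpedia(url: str) -> str:
    """Map (language).wikipedia.org/wiki/Name to the DBpedia resource IRI.

    Assumption: English resources use the bare dbpedia.org host, while other
    language editions use (language).dbpedia.org. Percent-encoded titles are
    decoded, so non-Latin names come back as IRIs rather than URIs.
    """
    parsed = urlparse(url)
    host_parts = parsed.netloc.lower().split(".")
    if len(host_parts) != 3 or host_parts[1:] != ["wikipedia", "org"]:
        raise ValueError(f"not a Wikipedia article URL: {url}")
    if not parsed.path.startswith("/wiki/"):
        raise ValueError(f"not a Wikipedia article path: {url}")
    lang = host_parts[0]
    name = unquote(parsed.path[len("/wiki/"):])
    host = "dbpedia.org" if lang == "en" else f"{lang}.dbpedia.org"
    return f"http://{host}/resource/{name}"


def is_wikidata_id(text: str) -> bool:
    """True if text looks like a Wikidata entity ID such as Q42 or P31."""
    return WIKIDATA_ID.fullmatch(text) is not None


print(wikipedia_to_dbpedia("https://en.wikipedia.org/wiki/Academia_Sinica"))
print(is_wikidata_id("P31"), is_wikidata_id("Academia_Sinica"))
```

The contrast it shows matches the table: Wikipedia and DBpedia identifiers are name-based and language-dependent, whereas Wikidata identifiers are opaque and language-neutral, so a Wikidata ID cannot be derived from a page title this way and has to be looked up.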

After presenting a general overview of the three KBs, we reviewed the current status of our LOD prototype and discussed questions such as:

(1) What concepts are represented in artifact A?
(2) What artifacts are described by concept X?
(3) What relations exist between A and B (or more)?
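As a rough illustration of what the three questions above ask of the data, here is a minimal sketch over a tiny in-memory set of (subject, predicate, object) triples. The artifact and concept names and the `depicts` / `relatedTo` predicates are hypothetical placeholders, not terms from our ontologies; a real prototype would more likely run SPARQL queries against a triple store.

```python
from collections import deque

# Hypothetical triples: artifacts linked to the concepts they depict,
# plus one concept-to-concept link.
TRIPLES = {
    ("artifactA", "depicts", "conceptX"),
    ("artifactA", "depicts", "conceptY"),
    ("artifactB", "depicts", "conceptX"),
    ("conceptX", "relatedTo", "conceptZ"),
}


def concepts_of(artifact, triples=TRIPLES):
    """(1) Which concepts are represented in the given artifact?"""
    return {o for s, p, o in triples if s == artifact and p == "depicts"}


def artifacts_about(concept, triples=TRIPLES):
    """(2) Which artifacts are described by the given concept?"""
    return {s for s, p, o in triples if p == "depicts" and o == concept}


def relation_path(a, b, triples=TRIPLES):
    """(3) How are a and b related?

    Breadth-first search over the triples, followed in either direction,
    returning the shortest chain of original triples from a to b
    (an empty list if a == b, None if they are not connected).
    """
    neighbours = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append(((s, p, o), o))
        neighbours.setdefault(o, []).append(((s, p, o), s))
    queue = deque([(a, [])])
    seen = {a}
    while queue:
        node, path = queue.popleft()
        if node == b:
            return path
        for triple, nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [triple]))
    return None


print(concepts_of("artifactA"))
print(artifacts_about("conceptX"))
print(relation_path("artifactA", "artifactB"))
```

Questions (1) and (2) are the same pattern read in opposite directions, while (3) needs graph traversal; this is where links into external KBs such as DBpedia or Wikidata can add intermediate nodes that connect otherwise unrelated records.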

By using the dat Ontology and the R4R ontology, we are able to represent and reason over our data in a more semantic and reusable manner. However, problems such as semantic enrichment and coreference with external resources, the cost of mapping tasks, and data cleaning lie at the heart of the difficulties created by our long history of large-scale metadata curation (more than 5 million metadata records since 2002).
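One common way to handle the coreference problem mentioned above, offered here only as an illustration and not as the method used in our prototype, is to collect pairwise "same as" links (produced by mapping work) and group them into clusters with a union-find structure, so that a local record, a DBpedia resource and a Wikidata item for the same entity end up together. The identifiers below, including `wd:Q_EXAMPLE`, are hypothetical placeholders.

```python
class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the way directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def coreference_clusters(same_as_links):
    """Group identifiers connected by pairwise 'same as' links."""
    uf = UnionFind()
    for a, b in same_as_links:
        uf.union(a, b)
    clusters = {}
    for node in list(uf.parent):
        clusters.setdefault(uf.find(node), set()).add(node)
    return list(clusters.values())


# Hypothetical links between local records and external identifiers.
links = [
    ("local:record-001", "dbr:Academia_Sinica"),
    ("dbr:Academia_Sinica", "wd:Q_EXAMPLE"),
    ("local:record-002", "local:record-003"),
]
for cluster in coreference_clusters(links):
    print(sorted(cluster))
```

Clustering is cheap once links exist; the expensive part, as noted above, is producing and cleaning the links themselves, since one wrong "same as" link merges two clusters.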

Without collaborative KBs, our data would remain largely invisible as we move toward LOD and a sustainable future. The complementary characteristics of Wikipedia, DBpedia and Wikidata offer a more balanced view of the choices we have to make. This study, although preliminary, has opened our imagination, through three-colored glasses, to semantically linked and collaboratively constructed data curation.

References
  1. Arnold, P., & Rahm, E. (2015). Automatic extraction of semantic relations from Wikipedia. International Journal on Artificial Intelligence Tools, 24(2), 1540010.
  2. Alexiev, V. (2015). Name data sources for semantic enrichment. http://vladimiralexiev.github.io/CH-names/README.html
  3. Erxleben, F., et al. (2014). Introducing Wikidata to the Linked Data Web. The Semantic Web – ISWC 2014, 50-65.
  4. Hovy, E., Navigli, R., & Ponzetto, S. P. (2013). Collaboratively built semi-structured content and Artificial Intelligence: The story so far. Artificial Intelligence, 194, 2-27.
  5. Ismayilov, A., et al. (2015). Wikidata through the Eyes of DBpedia. Submitted to ISWC 2015.
  6. Lehmann, J., et al. (2015). DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 6(2).
  7. Vrandečić, D., & Krötzsch, M. (2014). Wikidata: A free collaborative knowledgebase. Communications of the ACM, 57(10), 78-85.


Citation Information: Andrea Wei-Ching Huang (2015) A Preliminary Study on Wikipedia, DBpedia and Wikidata. URL: http://andrea-index.blogspot.tw/2015/06/wikipedia-dbpedia-wikidata.html
