Wikidata:WikiProject Scholia/Talks/Wikidata and VIVO in 2018
About
This page hosts the basis for a presentation given at the VIVO Development Interest Group's webinar on 18 December 2018 at 11 am Eastern Time, as per this event page and the corresponding announcement on the associated mailing list. The presenter is Daniel Mietchen (Q20895785), a biophysicist at the Data Science Institute of the University of Virginia (Q213439) (see bio sketch).
Introductions to Wikidata and its role(s) in the Wikimedia ecosystem
Wikipedia
- the encyclopedia that anyone can edit
- ca. 300 instances, organized by language
- ca. 15 of them have more than 1 million entries; the largest (English) has over 5 million
Wikimedia ecosystem
- ca. 1000 wikis
- shared vision: "Imagine a world in which every single human being can freely share in the sum of all knowledge."
- all using the MediaWiki software
- content managed by the respective wiki communities
- infrastructure managed by the Wikimedia Foundation
- projects organized by
- language
- from Amharic to Zulu
- some are multilingual, e.g. Wikimedia Commons and Wikidata
- information channel
- Wikibooks, Wikisource, Wikiquote, Wikimedia Commons, Wikidata etc.
- topic
- Wikispecies, Wikivoyage
- coordination via the Meta wiki, mailing lists, Wikimedia movement affiliate organizations and various other channels
- media files from Wikimedia Commons can be transcluded into other MediaWiki instances
Wikidata
- the database that anyone can edit
- In one page
- 7-min intro
- multilingual
- over 50 million concepts
- about 20000 contributors per month
- hundreds of edits per minute, often using automated tools
Wikibase
- A software suite for reading and writing structured data using MediaWiki
- Server version is a core component of Wikidata
- Client version is installed across Wikimedia sites
- Can be installed on independent MediaWiki instances
WikiCite
- Vision: Imagine a world in which anyone could use an open citation database to support free knowledge, with rich information about every citable source.
- a cross-project initiative and a community coordinating the creation of such a database, leveraging Wikidata as an infrastructure
- home base on Meta
- various other coordination fora, e.g. Wikidata:WikiProject Source MetaData
- organized three conferences so far, most recently WikiCite 2018 last month
- A hands-on introduction to Wikidata and WikiCite
Scholia
- a project to present bibliographic information and scholarly profiles of authors and institutions using Wikidata
- includes various additional functionalities, e.g. creating BibTeX entries from Wikidata (see the sketch below)
- Wikidata:Scholia
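To make the BibTeX functionality concrete, here is a minimal Python sketch of the underlying idea (an illustration, not Scholia's actual code): it pulls a work's title (P1476), venue (P1433) and publication date (P577) from the Wikidata Query Service and formats a BibTeX entry; the QID passed in at the end is a placeholder.

```python
# Minimal sketch (not Scholia's actual code): format a BibTeX entry
# from a scholarly article's Wikidata statements, via the SPARQL endpoint.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

def bibtex_for(qid):
    query = """
    SELECT ?title ?journalLabel ?date WHERE {
      wd:%s wdt:P1476 ?title .                  # title (P1476)
      OPTIONAL { wd:%s wdt:P1433 ?journal . }   # published in (P1433)
      OPTIONAL { wd:%s wdt:P577 ?date . }       # publication date (P577)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    } LIMIT 1
    """ % (qid, qid, qid)
    r = requests.get(ENDPOINT, params={"query": query, "format": "json"})
    r.raise_for_status()
    b = r.json()["results"]["bindings"][0]
    return "@article{%s,\n  title = {%s},\n  journal = {%s},\n  year = {%s}\n}" % (
        qid,
        b["title"]["value"],
        b.get("journalLabel", {}).get("value", ""),
        b.get("date", {}).get("value", "")[:4],
    )

print(bibtex_for("Q12345678"))  # placeholder QID: replace with a scholarly article item
```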
Relationships between the semantic parts of the Wikimedia ecosystem
Wikidata and Wikibase
Overview of Wikidata usage outside Wikidata
- Wikidata-powered
- coming soon: use Wikidata properties in other Wikibase instances
Overview of WikiCite in Wikidata
- WikiProject Source Metadata
- WikiProject Books
- WikiProject Periodicals
- WikiProject Names
- WikiProject Biographical Identifiers
- WikiProject Zika Corpus
- WikiCite 2018 opening keynote
- Scholia as of November 2018 (talk)
Wikibase and Wikimedia Commons
- Structured data on Commons
- converts information about the 50 million media files on Wikimedia Commons into a structured, machine-readable format, making it easier to view, search, edit, organize and re-use them in many languages
- overview
Internationalization in Wikidata
- https://tools.wmflabs.org/reasonator/?q=Q80&lang=en
- https://tools.wmflabs.org/reasonator/?q=Q80&lang=el
- https://tools.wmflabs.org/reasonator/?q=Q80&lang=ru
- https://tools.wmflabs.org/reasonator/?q=Q80&lang=ko
- https://tools.wmflabs.org/reasonator/?q=Q80&lang=ja
- https://tools.wmflabs.org/reasonator/?q=Q80&lang=ar
- https://tools.wmflabs.org/reasonator/?q=Q80&lang=zh
- https://tools.wmflabs.org/reasonator/?q=Q80&lang=ne
- etc.
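These Reasonator views are driven by the per-language labels stored on each item. A minimal sketch of fetching those labels directly, using the standard MediaWiki API on wikidata.org (Q80 is Tim Berners-Lee):

```python
# Sketch: fetch the multilingual labels that power views like Reasonator,
# via the MediaWiki API action wbgetentities on wikidata.org.
import requests

resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "wbgetentities",
        "ids": "Q80",                        # Tim Berners-Lee
        "props": "labels",
        "languages": "en|el|ru|ko|ja|ar|zh|ne",
        "format": "json",
    },
)
labels = resp.json()["entities"]["Q80"]["labels"]
for lang, entry in sorted(labels.items()):
    print(lang, entry["value"])
```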
Overview of semantic capabilities for these tools
Wikidata Query Service
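The query service exposes Wikidata as RDF through a public SPARQL endpoint at query.wikidata.org. A minimal sketch of calling it from Python (the User-Agent string is just a courtesy placeholder):

```python
# Minimal sketch: run a SPARQL query against the Wikidata Query Service.
import requests

query = """
SELECT ?article ?articleLabel WHERE {
  ?article wdt:P31 wd:Q13442814 .   # instance of (P31): scholarly article (Q13442814)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

r = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "wikidata-vivo-demo/0.1"},  # placeholder
)
for row in r.json()["results"]["bindings"]:
    print(row["article"]["value"], row.get("articleLabel", {}).get("value", ""))
```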
Brief overview of Wikidata's data model
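Wikidata's model consists of items and properties carrying multilingual labels, plus statements (claims) that can have qualifiers and references. One way to see this structure is the JSON export at Special:EntityData; a small sketch that walks a few statements of Q80:

```python
# Sketch: inspect Wikidata's data model (labels; claims/statements with
# qualifiers and references) via the Special:EntityData JSON export.
import requests

entity = requests.get(
    "https://www.wikidata.org/wiki/Special:EntityData/Q80.json"
).json()["entities"]["Q80"]

print(entity["labels"]["en"]["value"])
for prop, statements in list(entity["claims"].items())[:5]:
    for st in statements:
        snak = st["mainsnak"]
        print(prop,
              snak.get("datavalue", {}).get("value"),
              "qualifiers:", list(st.get("qualifiers", {}).keys()),
              "references:", len(st.get("references", [])))
```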
Time check
- We should be around 30 minutes in at this point
My interest in VIVO
- interested in integrating research workflows with open workflows, particularly around collaborative curation
- appreciate that VIVO code is open source and openly licensed
- unclear about licensing of the VIVO ontology and some of the VIVO data
- engaged in the Joint Roadmap for Open Science Tools
- community overlap between VIVO and WikiCite, e.g. as per
- Dario Taraborelli's VIVO 2016 keynote "Verifiable, linked open knowledge that anyone can edit"
- representatives from both sides involved in FORCE11
- occasional interactions in other contexts, e.g. on Twitter
- VIVO-centric lunch meetup at the recent WikiCite 2018
Entity matching
- Wikidata has tools like Mix'n'Match, OpenRefine and many others; how would this work with VIVO? (see the sketch below)
- Also other tools, e.g. Author Disambiguator (demo)
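The dedicated tools above do sophisticated matching; as a point of comparison for a VIVO integration, a naive baseline is a plain label search via the wbsearchentities API, which any real pipeline would combine with extra signals (ORCID, affiliation, co-authors). A sketch:

```python
# Naive baseline for entity matching (the dedicated tools above do much
# more): look up candidate Wikidata items for a name via wbsearchentities.
import requests

def candidates(name, language="en", limit=5):
    r = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": name,
            "language": language,
            "limit": limit,
            "format": "json",
        },
    )
    return [(hit["id"], hit.get("label", ""), hit.get("description", ""))
            for hit in r.json()["search"]]

# A VIVO profile name would be matched against these candidates,
# ideally with extra signals (ORCID, affiliation, co-authors).
for qid, label, desc in candidates("Daniel Mietchen"):
    print(qid, label, desc)
```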
Next steps?
Ontology mapping between VIVO and Wikidata
Publication metadata as a potential joint project
- Identifying the actual sources of data in Wikidata, e.g. attribution of the data source
- Usage of the Triple Pattern Fragments endpoint in VIVO and perhaps Wikidata (see the sketch after this list)
- endpoint down?
- Rules for VIVO
- Attribution
- Data must be CC0
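For the Triple Pattern Fragments point above, a hedged sketch of requesting a single pattern from Wikidata's experimental TPF endpoint; both the URL and its availability are assumptions here, since (as noted) the endpoint has been intermittently down:

```python
# Sketch: request one triple pattern from Wikidata's (experimental)
# Triple Pattern Fragments endpoint. The URL and its availability are
# assumptions -- the endpoint has been intermittently offline.
import requests

r = requests.get(
    "https://query.wikidata.org/bigdata/ldf",
    params={
        "subject": "http://www.wikidata.org/entity/Q80",
        "predicate": "http://www.wikidata.org/prop/direct/P496",  # ORCID iD
    },
    headers={"Accept": "text/turtle"},
)
print(r.status_code)
print(r.text[:500])  # first few triples of the returned fragment
```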
Country data as a potential joint project
- Simplify VIVO data
- Possibly replace VIVO's use of FAO data
- Timely updates via Wikimedia process
- Scholia on
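A sketch of what the country-data idea could look like in practice: pulling per-country population figures (population, P1082, on items that are instance of country, Q6256) straight from the Wikidata Query Service rather than from a separate FAO extract:

```python
# Sketch of the "country data" idea: pull per-country population figures
# directly from Wikidata instead of a separate FAO dataset.
import requests

query = """
SELECT ?country ?countryLabel ?population WHERE {
  ?country wdt:P31 wd:Q6256 ;        # instance of (P31): country (Q6256)
           wdt:P1082 ?population .   # population (P1082)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
LIMIT 10
"""

r = requests.get("https://query.wikidata.org/sparql",
                 params={"query": query, "format": "json"})
for row in r.json()["results"]["bindings"]:
    print(row["countryLabel"]["value"], row["population"]["value"])
```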
Potential future projects
- VIVO's depth in the humanities, grants and advising
- Signed statements
Open questions
- Q: Architecture?
- A: See the Wikidata Architecture Overview, as well as Wikidata Query Service and Wikidata Query Service/Implementation
- Q: Relationship between Wikibase and the Semantic MediaWiki project
- A: See RDFIO: extending Semantic MediaWiki for interoperable biomedical data management (Q38599953)
- Q: how to link data between VIVO and Wikidata semantic architectures?
- Q1: Can/should we use owl:sameAs to relate individuals that exist in multiple semantic ecosystems? It would be nice to use owl:sameAs for entities that can be expressed as RDF.
- A1: There is exact match (P2888) that could be used this way (e.g. as per Q36133#P2888). The more frequent approach in such cases, however, is to create dedicated Wikidata properties that map from Wikidata to external resources, e.g. as per VIAF ID (P214). If we wanted dedicated Wikidata properties for things like "OpenVIVO person ID" or "Colorado experts ID", these would first have to be proposed. However, in the OpenVIVO case, the ID seems to be based on ORCID, for which there is already ORCID iD (P496), so we could just give it another third-party formatter URL (P3303) (see Property:P496#P3303). To see what this looks like in RDF, check https://www.wikidata.org/wiki/Special:EntityData/Q56867901.rdf (see also the sketch at the end of this section).
- Q2: many institutions have high value curated data - how can we map and preserve the quality
- A2: One mechanism proposed for this is Signed Statements, though these have yet to be fully implemented; for progress tracking, see this ticket (and some background here). There are multiple existing mechanisms for quality assurance, e.g. property constraint (P2302) and the associated constraint violation reports, and there will be a Workshop on Data Quality Management in Wikidata in Berlin on January 18, 2019, where such matters will be discussed in detail.
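As a follow-up to A1, a small sketch (using the rdflib library) of how the exact match (P2888) links of the example item cited above, Q36133, appear when its RDF export is parsed; these links play the owl:sameAs-like role discussed in Q1:

```python
# Sketch: list the "exact match" (P2888) links of an item by parsing
# its RDF export; these play the owl:sameAs-like role discussed in Q1.
from rdflib import Graph, URIRef

g = Graph()
g.parse("https://www.wikidata.org/wiki/Special:EntityData/Q36133.rdf",
        format="xml")

P2888_DIRECT = URIRef("http://www.wikidata.org/prop/direct/P2888")
for subject, target in g.subject_objects(P2888_DIRECT):
    print(subject, "-> exact match ->", target)
```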