Wikidata:WikiProject Wikidata for research/EINFRA-9-2015/Notes from drafting
Contributions to the "Wikidata for research" project (including Wikidata:WikiProject Wikidata for research and all its pages) are dual licensed under CC BY-SA 3.0 (the Wikimedia default) and the Creative Commons Attribution 4.0 license.
Contributions by the project to the item and project namespaces of Wikidata shall be under CC0.
This page contains notes that have been used for drafting the EINFRA-9-2015 proposal Enabling Open Science: Wikidata for Research (Wiki4R), which are kept here for archival purposes.
Summary
The aim of the project is to draft:
- a research proposal (Q947859) that aims at establishing Wikidata (Q2013) as a collaborative hub around research data (Q15809982) or, in other words, a virtual research environment (Q1674651) for citizen science (Q1093434) around data management (Q1149776) across fields of research (Q42240), building on similar work in genetics (Q7162);
- in response to the Horizon 2020 (Q13583472) EINFRA-9-2015 call;
- in public;
- in time for submission by January 14.
In the long run, WikiProject Wikidata for research is intended to facilitate activities at the Wikidata/research interface more generally (e.g. with a regional or disciplinary focus).
Proposal management
Call
- "These virtual research environments (VRE) should integrate resources across all layers of the e-infrastructure (networking, computing, data, software, user interfaces), should foster cross-disciplinary data interoperability and should provide functions allowing data citation and promoting data sharing and trust."
Political context
We are collecting key EU policy information in this document.
- elaborate on alignment with objectives of the call and of the wider work program for 2014-2015 (PDF) and 2016-17 (PDF)
- Take into consideration existing EU policies and initiatives (big data, digital agenda EU, open access, global challenges, citizen science).
- also related national, regional or local initiatives, e.g. Vetenskapsrådet's Nationella riktlinjer för open access, the Long-term strategy to make the UK the best place in the world for science and business or open-data cities like Glasgow
Submission forms
The EU has defined very precisely in which form proposals have to be submitted in response to this call. Since those template documents are not under an open license, they cannot be copied here. So we have copied them into Google docs that are editable by anyone.
- Main folder
- Template section 1-3 (70 pages maximum)
- Template section 4-5 (no page limits, but good to be concise)
Timeline
The draft has now been completed and submitted.
Upcoming
- January 14: Submission deadline is 17:00 Brussels time
- January 15:
- update wiki page
- Wikipedia as a research tool (in Amsterdam)
- January 16:
- blog post
- January 17: start thinking about follow-up activities for WikiProject Wikidata for research
- national/regional proposals
- Germany (DFG)
- discipline-specific proposals ("Structured X")
- Biodiversity
- Chemistry
- Computer science
- History
- Mathematics
- Medicine
Past
- January 13:
- drafting continued on remaining parts of the proposal
- test submission successful
- January 12:
- test submission failed
- drafting continued on all parts of the proposal
- January 11:
- drafting continued on all parts of the proposal
- WD4R in the GLAM-Wiki newsletter
- January 10:
- drafting continued on all parts of the proposal
- January 9:
- drafting continued on all parts of the proposal
- multiple letters of support/ collaboration/ intent received from Associate partners
- January 8:
- continued to flesh out Excellence, Impact, workpackages
- suggested start and end month for some tasks
- first concrete budget estimate
- January 7:
- excluded tasks lexicalization and nanopublications
- suggested leader for each task
- first estimate of effort in person months per task and partner
- drafting, brushing, revising in all parts of the proposal
- Wikidata meetup London
- January 6:
- Objectives section fleshed out
- GDocs switched from publicly editable to view/ comment only
- January 5:
- a flurry of post-holiday activity
- drafting continued on all aspects of the proposal, with a focus on objectives and workpackages
- added a new Task "Wikidata for cultural heritage" to WP4
- suggested new Task "Wikidata as a repository for nanopublications" for WP4
- revised timeline
- January 4:
- continued drafting deliverables and milestones
- reframed WP4 as "Enabling the use of Wikidata in research contexts"
- started to flesh out Workpackage 4
- moved the LOD for Wikidata task (3.4) into WP4
- moved the Wikidata identifiers in lab contexts task (3.3) into WP4
- added a new Task "Citizen science" to WP4
- January 3
- provisionally determined leaders of workpackages 3 (UM) and 4 (WMDE)
- brought the remaining tasks suggested by UPS and UPM into Workpackage 4; still very rough
- continued to flesh out Workpackage 3
- first estimate of effort in person months per task; summary to wiki
- added detailed instructions for letters of support/ collaboration
- January 2:
- started to flesh out Workpackage 3
- merged Task 2.3 into Task 3.1
- added Task 3.4 "Linked Open Data for Wikidata"
- defined tasks in WP4
- started to think about #Illustrations
- article published about the proposal in The Signpost, a Wikimedia community publication
- January 1:
- started to flesh out the Excellence section
- continued to flesh out Workpackage 2
- revised #Timeline
- December 31:
- Workpackage 5 fleshed out; summary to wiki
- section on Associate Partners added to Google doc
- added more detailed history examples
- started to flesh out Workpackage 2
- December 30:
- moved use cases out of WP4, with the aim of preselecting a few and building the proposal around them
- merged "optimizing openness" (from WP4) and "Identification of resources" tasks (3.1): we will identify CC0 sources suitable for use on Wikidata, then look at the reasons behind their choice of CC0, with the aim of compiling best-practice guidelines
- merged Task 2.4 (Consistency management), Task 3.3 (Data citation and provenance) and Task 3.4 (Review and verification) into Task 3.3: Quality assurance
- rearranged tasks in WP3
- Public Hangout: 10pm CET (which is UTC +1; see it in your timezone)
- redefined WP4 as being about Wikibase tools for research
- incorporated suggestions from UPM and UPS into respective workpackages
- December 29: administrative work; scope of workpackages
- December 28: Whole consortium section initially fleshed out; summary to wiki
- December 27: fine tuning of tasks, partner descriptions and EU forms; refined timeline
- December 26: fine tuning of tasks, partner descriptions and EU forms
- December 25: fine tuning of tasks
- December 24: Workpackages restructured, WP2 (software development) merged into tasks in other WPs
- December 23: refinement of tasks; Public Hangout: 10pm CET (which is UTC +1; see it in your timezone)
- December 22: Public Hangout: 10am-noon CET (which is UTC +1; see it in your timezone)
- December 21: refinement of workpackages and scope
- December 20: refinement of workpackages and scope
- December 19: switch focus of drafting to EU templates (via public GDocs); second blog post
- December 18: Definition of partners; set up mailing list; edited project summary
- December 17: refinement of workpackages and scope; discussions of partner roles
- December 16: refinement of workpackages and scope; discussions of partner roles
- December 15: refinement of workpackages and scope; discussions of partner roles
- December 14: NYC Wikidata Workshop and Skill Share; discussions of partner roles
- December 13: refinement of workpackages and scope
- December 12: refinement of workpackages and scope
- December 11: first somewhat complete description of scope of workpackages
- December 10: started to define tasks
- December 9: more outreach; WikiProject has 11 members; 7 institutions have publicly expressed interest in joining, several more in private
- December 8: more outreach; WikiProject has 5 members
- December 7: refining scope; workpackages finally get numbers
- December 6: refinement of workpackage structure; added long-term perspective by starting WikiProject Wikidata for research
- December 5: initial blog post; initial sketch of workpackage structure
- December 4: launch of this wiki page out of user space
Project duration
- 3 years
Project title
- Official title: Enabling Open Science: Wikidata for Research
- Short title: Wikidata for Research
- Acronym: Wiki4R
Project partners
Partner descriptions are available via the EU forms in Google docs.
- Natural History Museum, Berlin (Q233098) (MfN; coordinator) - contact person: User:Daniel Mietchen
- Wikimedia Deutschland (Q8288) (WMDE) - contact person: User:Abraham Taherivand (WMDE)
- Maastricht University (Q1137652) (UM) - contact person: User:Egon Willighagen
- Open University of Catalonia (Q3042433) (UOC) - IN3 (Open Science & Innovation group) - contact person: Eduard Aibar
- Europeana (Q234110) (EF)/The European Library (Q240304) - contact person: Alastair Dunning
- University of Paris-Sud (Q1480643) (UPS)/ ComUE Paris-Saclay University (Q13531686) (CDS - Center for Data Science of University of Paris-Saclay) contact person: Karima Rafes
- Technical University of Madrid (Q25864) (UPM)/ DBpedia (Q465) (esDBpedia) - contact person: Asuncion Gomez-Perez
- The consortium as a whole thus covers the major branches of the natural sciences, along with the information sciences and Semantic Web, the arts and humanities, the cultural and natural heritage sector, and civil society. Plus, working in the open facilitates getting feedback on whatever aspect is missing.
Associate partners
This section has now been migrated to the submission forms too.
We have received more interest from potential partners than we can accommodate in this project. We are grateful for that, since we see this proposal only as a start to a both deeper and broader engagement between the research and Wikidata communities, and encourage institutions with an interest in that to join the project as an Associate Partner. Importantly, this role is not limited to institutions eligible for funding under Horizon 2020 (Q13583472), so it is open to entities based outside the European Research Area (Q1377820). The caveat is that Associate Partners are not eligible to receive direct funding through the project, except for travel costs related to project meetings.
To get this process going, please sign up your institution below. Institutions that had previously signaled an interest in becoming a partner have all been moved here too.
- York Museums Trust (Q15233904) (YMT) - contact persons: Pat Hadley or Martin Fell
- Interdisciplinary Centre for Mathematical and Computational Modelling at University of Warsaw (Q11713358) (ICM) – contact persons: Piotr Jan Dendek, Łukasz Bolikowski (expertise: scalable text mining, social network analysis, and their applications to analysis of scholarly communication)
- Open Science Framework (Q18691678) (OSF), developed by the Center for Open Science (http://cos.io) - contact person: Andrew Sallans - expertise: infrastructure and services to support reproducibility, integrity, and transparency; main platform is the Open Science Framework (http://osf.io)
- Histropedia - contact person: User:NavinoEvans. Historical research, academic use and facilitating third party use of data.
- interested: University of Koblenz - Landau (Q448590) (UKOB) - Institute for Web Science and Technologies - contact person: User:renepick and Steffen Staab (Q1764755)
- interested: Bern University of Applied Sciences (Q466455) (BUAS), E-Government Institute (competencies relevant to the project: linked open data; identity and access management; link to digital humanities and heritage institutions) - contact person: User:Beat Estermann
- interested: University of Bologna (Q131262) (UNIBO), Department of Computer Science and Engineering, Bologna, Italy - contact person: Silvio Peroni - expertise: development and use of models, such as XML schemas and OWL ontologies (e.g., SPAR Ontologies), for describing the various aspects of the publishing domain (bibliographic metadata, functions and contexts of citations, document structures, research context related to publications, etc.), development of workflows for automatically converting legacy bibliographic data (e.g., Elsevier's Scopus) into RDF data compliant with SPAR Ontologies (e.g., Semantic Lancet Project), development of RDF-aware and Web-based graphical interfaces for browsing publishing data and provenance data
- interested: Ruđer Bošković Institute (Q7383690) (RBI) - contact person: Alen Vodopijevec
- interested: Medical University of Vienna (Q700731), Center for Medical Statistics, Informatics and Intelligent Systems, Austria - contact person: Matthias Samwald
- interested: Ontotext (Q7095072) - contact person: User:Vladimir_Alexiev. Industrial uses of Wikidata (e.g. for semantic enrichment), improving the Wikidata class hierarchy (which currently is a horrible mess)
- interested: Meise Botanic Garden (Q3052500) - contact person: Quentin Groom. Contributing to and using Wikidata in the context of biodiversity and environmental change.
- interested Petermr (talk) 11:21, 22 December 2014 (UTC) running http://contentmine.org project to extract scientific facts on a daily basis from the scholarly literature.
- At Consumer Reports (Q1957782) I and my organization are interested in improving the quality of health information in Wikimedia projects, and especially making data more accessible to layman audiences. Blue Rasberry (talk) 16:36, 24 December 2014 (UTC)
- Royal Society of Chemistry (Q905549), around chemistry links - contact person: Richard Kidd
- your institution?
Mailing list
A mailing list has been set up for proposal preparation:
- https://groups.google.com/a/wikimedia.de/d/forum/wd4r
- To post to the group, use
- wd4r (at) wikimedia (dot) de
Budget
The call suggests 2-8 million euros per funded project. We aim below that, with an estimated total effort of ca. 200 person months.
- Shall contain provisions for
- staff time + overhead
- travel (including for Associate Partners)
- materials, equipment and other resources
Advisory Board
- We need one that consists of people that reviewers recognize as both relevant and competent. Let's aim at about 10 initially, about half of which should be from within the European Research Area (Q1377820), half from outside of it. Having more than 10 is fine.
- role for community
Illustrations
Custom-made
- several figures in suggestions by UPM and UPS
Images
- is this image depicting black death or leprosy?
- Citation needed
- WLM 2014 participating countries
Screenshots
For copyright reasons, screenshots may not be compatible with inclusion into the proposal, which is licensed CC BY 4.0.
Wikidata UI
- Marie Curie (Q7186)
- Albert Einstein (Q937)
- Mona Lisa (Q12418)
- Girl with a Pearl Earring (Q185372)
- The Prison of Copenhagen (Q14951231)
- Liancourt Rocks (Q20317)
- especially the located in the administrative territorial entity (P131) statement, which has values for both a South Korean and a Japanese province
- an alternative would be the territory claimed by (P1336) statement in Crimea (Q7835), which has both Ukraine and Russia
Reasonator
- Marie Curie http://tools.wmflabs.org/reasonator/?&q=7186
- Albert Einstein http://tools.wmflabs.org/reasonator/?q=Q937
- Mona Lisa http://tools.wmflabs.org/reasonator/?&q=12418
- Het Meisje met de Parel http://tools.wmflabs.org/reasonator/?&q=185372
- eye contact https://tools.wmflabs.org/reasonator/?q=633546
- found multiple images, e.g. File:Affe vor Skelett.jpg
- The Prison of Copenhagen https://tools.wmflabs.org/reasonator/?q=14951231
Autolist
- paintings by Kandinsky http://tools.wmflabs.org/autolist/autolist1.html?q=claim%5B170%3A61064%5D (incomplete)
- paintings by Frans Hals http://tools.wmflabs.org/autolist/autolist1.html?q=claim%5B170%3A61064%5D (incomplete)
- isotopes of beryllium http://tools.wmflabs.org/autolist/autolist1.html?q=claim%5B279%3A463820%5D
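The Autolist links above carry their query in a percent-encoded `q` parameter of the form `claim[property:item]`. As a minimal sketch of how such a link could be assembled (the helper name `autolist_url` is hypothetical; the base URL and query syntax are taken from the links listed above, and nothing here is checked against the tool itself):

```python
from urllib.parse import quote, unquote

# Base URL copied from the Autolist links listed above.
AUTOLIST_BASE = "http://tools.wmflabs.org/autolist/autolist1.html"


def autolist_url(property_id: int, item_id: int) -> str:
    """Build an Autolist link for items whose `property_id` points to `item_id`.

    Hypothetical helper: IDs are passed as plain numbers, without the
    P/Q prefix, mirroring the links above.
    """
    query = f"claim[{property_id}:{item_id}]"
    # safe="" makes quote() percent-encode '[', ':' and ']' as well.
    return f"{AUTOLIST_BASE}?q={quote(query, safe='')}"


url = autolist_url(279, 463820)
print(url)
# Decoding the parameter recovers the human-readable query.
print(unquote(url.split("?q=", 1)[1]))
```

Decoding the `q` parameter in this way also makes it easier to spot links that were copied without adjusting the query.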
Stats
- http://tools.wmflabs.org/wikidata-todo/stats.php
- Special:Statistics
- Wikidata:Database reports/Popular items
- Wikidata:Database reports/List of properties/Top100
- Wikidata:Statistics is outdated
Similar projects
The introduction has to provide an overview of the current state of relevant research. This includes appropriate mentions of initiatives with overlapping focus.
- within a given research field
- many silos
- some attempts to bring things together
- EAGLE-wiki, a wikibase-based repository of epigraphies.
- across science
- government data
- http://www.data.gov/open-gov/
- similar setups in multiple countries
- very little science data
Specific use cases
- Requires easier access to Wikidata content. Currently there are two complementary APIs, the official Wikidata API (broadly speaking, for id-based queries) and Magnus' Wikidata query (mainly, for property-based queries). These need improved documentation.
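To illustrate what an id-based query against the official Wikidata API looks like, here is a minimal sketch using the `wbgetentities` action of the MediaWiki API at `https://www.wikidata.org/w/api.php`. The function names are hypothetical, and the sample response is hand-written in the shape of a `wbgetentities` reply rather than fetched live, so the example makes no network call.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def entity_request_url(item_ids, languages=("en",)):
    """Return a wbgetentities URL for the given item IDs, e.g. ["Q7186"]."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(item_ids),
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def labels_from_response(payload, language="en"):
    """Map each item ID in a wbgetentities response to its label."""
    return {
        item_id: entity["labels"][language]["value"]
        for item_id, entity in payload.get("entities", {}).items()
        if language in entity.get("labels", {})
    }


# Hand-written sample shaped like a wbgetentities response (not live data).
SAMPLE = json.loads(
    '{"entities": {"Q7186": {"labels": '
    '{"en": {"language": "en", "value": "Marie Curie"}}}}}'
)

print(entity_request_url(["Q7186", "Q937"]))
print(labels_from_response(SAMPLE))
```

Property-based queries (the domain of Magnus' Wikidata query) are deliberately left out of this sketch.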
Cross-disciplinary use cases
- Bibliographic metadata
- journal articles
- PLOS thesaurus https://github.com/PLOS/plos-thesaurus
- consider other research outputs too, especially data, code and OER
- coordinate with Wikidata:Wikisource integration
- History/Cultural heritage collections
- MfN
- TEL
- YMT
- Wikidata:WikiProject sum of all paintings
- http://www.histropedia.com/
- https://lists.wikimedia.org/pipermail/wikidata-l/2014-December/005123.html
- http://www.jeffersonbailey.com/speak-to-the-eyes-the-history-and-practice-of-information-visualization/
- http://www.dispar.org/reference.php?id=92
- history of science
- Wikipedia as a time machine
- now increasingly applies to Wikidata too
- People
- Researchers, collectors
- Citizen science
- example: Mechanical Curator collection
- public surveys (e.g. in this form: en:Breeding bird survey)
- requires a platform that gives proper credit to participants (similar to authorship on Commons)
- out of scope?
- multimedia
- audio
- video
- 3D?
- Experimental Wikimedia Commons RDF extraction with DBpedia
- Audubon Core
- Multilingual information
- any of the above
- interested: UPM
- see also Wikidata:Wiktionary
Discipline-specific use cases
- Chemistry
- Small molecules
- Compounds
- Analytical techniques
- Minerals
- Meteorites
- Biochemical pathways
- Wikipathways
- example https://en.wikipedia.org/wiki/Alpha-Ketoglutaric_acid#Interactive_pathway_map
- example template: https://en.wikipedia.org/wiki/Special:WhatLinksHere/Template:TCACycle_WP58
- category on enWP: https://en.wikipedia.org/wiki/Category:WikiPathways
- category on Wikipathways: http://www.wikipathways.org/index.php?title=Special:CurationTags&showPathwaysFor=Curation:Wikipedia
- cf. multilingual
- isotopes
- Mathematics
- mathematical reasoning
- "The challenge in putting mathematics on the World Wide Web is to capture both notation and content (that is, meaning) in such a way that documents can utilize the highly-evolved notational forms of written and printed mathematics, and the potential for interconnectivity in electronic media."
- automated theorem proving (Q431667)
- example
- equation search?
- robot biologists
- Classifying mathematical structures
- Classifying algorithms
- somewhat exists for implementations
- Robotic reasoning
- Epigraphy
- EAGLE-wiki, a wikibase-based repository of epigraphies.
- Philosophy
- visualization of interactions of ancient philosophers: http://projetjourdain.org/network/index.html
- see also blog post on the matter
- WikiGalaxy (not Wikidata-based, but could be)
- tempo spatial display:
- http://tools.wmflabs.org/wikidata-todo/tempo_spatial_display.html?q=Q311 Père Lachaise Cemetery
- http://tools.wmflabs.org/wikidata-todo/tempo_spatial_display.html?q=Q361 World War One
- Biodiversity
- taxon concepts
- tree of life demo: https://tools.wmflabs.org/tree-of-life/
- Agriculture
- OpenFarm
- Data science
- DBpedia
Other fields
- out of scope?
- only some will be possible with project funding, but others should ideally get some basic level of generic support
- see also How to use Wikidata: Things to make and do with 30 million statements
- in other research areas
- consider the Gene Wiki
- what kinds of research are facilitated by it?
- try to do similar things in related areas
- avian genomes
- plant genomes
- try to do similar things in different areas
- small molecules
- taxonomy
- checklists (cf. en:List of plants of Cerrado vegetation of Brazil)
- fossils (The maintainer of fossilworks.org, John Alroy, allows us to map the sites identifiers)
- Rfam/Pfam
- subsets of celestial objects, e.g. exoplanets
- EAGLE-wiki, a wikibase-based repository of epigraphies.
- differentiate between Wikidata-enabled and Wikibase-enabled?
- out of scope?
- Wiki Atlas http://4thmain.github.io/projects/hacks/wiki-atlas.html
- Wikidata map view http://tools.wmflabs.org/wikidata-analysis/map/map.html
- TEL has contacts with various projects, particularly in history and literature, that want to make use of the data
- for citizen science
- consider multiple languages
- identify community needs and feed them back to software development and WP2 (mapping), WP3 (import), WP5 (training)
- some will be brought in by project partners as per project's Description of Work, others can be proposed by the community
- ideally aligned with existing task forces or WikiProjects on Wikidata
- http://wikimediadc.org/wiki/User:Econterms/Wikiscience_proposal_draft
- Maps
- http://tools.wmflabs.org/hay/directory/#/keyword/wikidata
- http://tools.wmflabs.org/wikidata-exports/miga/#_item=1340
- xkcd episodes: http://tools.wmflabs.org/autolist/autolist1.html?q=claim%5B361%3A13915%5D
Workpackages
The content of this section is gradually being migrated into the corresponding GDocs.
WP1: Management, coordination and communication
- WP leader: MfN
- Objectives
- Consortium management
- EU forms in Google doc
Task 1.1: Management of the consortium
- Task leader: MfN
- Decision-making
- small consortium, so simple governance structure
- ensuring that milestones are reached and deliverables produced
- quality assurance
- independent external evaluator
- Internal communication
Task 1.2: Finance and reporting
- Task leader: MfN
- all partners involved
- Accounting
- Financial reports
- Progress reports
WP2: Semantic mapping
- WP-leader: UM
- Objectives
- Based on input from WP1, WP3, and WP4 define classes of interest
- Define property profiles for each of the selected classes
- Mapping identifiers of properties and items across multiple sources
- relative to existing database mapping services, the strength of Wikidata should be in the ability for users to correct annotation errors and add missing links
- provide mappings to research community in useful formats (e.g. "scientific lenses")
- EU forms in Google doc
Task 2.1: Property profiles for item classes
- Task leader: UPM
- interested: UM?
- interested: UPM
- Wikidata covers many applications, scientific and other
- The total number of properties applicable to a given item class is the union of the properties used for various use cases.
- This task focuses on the development of VRE-specific property profiles (sets of properties and associated policies and best practices that meet the functional requirements of specific research use cases).
- Wikidata development where property profiles require it
- e.g. quantities with units and the geo-shape datatype
- Example of user community specific profiles (cf. WP4), e.g. as in Wikidata:WikiProject Source MetaData/Bibliographic metadata for scholarly articles in Wikidata
- DOI (P356)
- NCBI taxonomy ID (P685)
- etc
- that should include ORCID iD (P496) for authors of a paper and perhaps some identifiers for their institution
- Examples for history: Identification of at least three subject areas and time periods of historical significance and diverse characteristics to capture in the ontological system or systems determined to be most applicable, and prepare Wikidata entries for these subject areas using the selected ontology or ontologies.
- see also Representing knowledge – metadata, data and linked data
- for selected item classes, create list of properties for which statements would be expected on such items
- Facilitate establishing guidelines on where to draw the line between data that do and do not fit into Wikidata (Notability).
- Short term
- Intermediate term
- Long term
- mention network of Wikibase instances as an alternative to putting everything into Wikidata
- EAGLE-wiki, a wikibase-based repository of epigraphies.
- UPS examples
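The idea that the properties applicable to an item class are the union of what individual use cases need can be sketched as follows. This is an illustration only: the use-case names and their groupings are hypothetical, P356 (DOI) comes from the notes above, and P50 (author) and P577 (publication date) are added here purely as further examples.

```python
# Hypothetical property profiles for one item class: each use case lists
# the Wikidata properties it expects on items of that class.
SCHOLARLY_ARTICLE_PROFILES = {
    "data citation": {"P356"},               # DOI
    "bibliographic search": {"P356", "P50"},  # DOI, author
    "timeline views": {"P577"},               # publication date
}


def applicable_properties(use_case_profiles):
    """Union of the properties required by all use cases of an item class."""
    return set().union(*use_case_profiles.values())


def missing_properties(required, present):
    """Properties a use case expects but an item does not yet carry."""
    return set(required) - set(present)


all_properties = applicable_properties(SCHOLARLY_ARTICLE_PROFILES)
print(sorted(all_properties))

# An item that so far only carries a DOI statement:
print(missing_properties(SCHOLARLY_ARTICLE_PROFILES["bibliographic search"], {"P356"}))
```

Keeping the per-use-case sets separate, rather than storing only the union, makes it possible to report which use case a missing statement would serve.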
Task 2.2: Semantic mapping for properties
- Task leader: UPM
- interested: UM?
- interested: UPM
- Map properties identified in 2.1 as relevant for the VRE across multiple sources relevant to the user communities
- Map to global standards as well as to local semantics used in the data sources to be connected to Wikidata in WP3.
- using statements on properties in general
- especially equivalent property (P1628) and subproperty of (P1647)
- coordinate with ontologies that are already widely used in research contexts
- e.g. OBO/ ChEBI (see talk page)
- DBpedia
- Freebase
- RoboBrain
- for the properties identified in the property profiles in 2.1, build a pattern that allows statements about those item classes to be expressed in RDF
- There is a need to harmonize the ontologies employed in Wikidata and elsewhere. Ideally, a single ontology may be identified that will serve this purpose, but it may also be the case that several ontological systems would be preferred, to avoid our understanding of knowledge in a given area being constrained by ontological categories.
- Subject the Wikidata entries so developed to a range of research queries or “reasoning” analyses, to see if the ontological properties are of practical utility.
- Establish as a deliverable a workflow containing the recommendations from this activity as to the best ontological approach for any of the use cases, and recommendations for the most effective methods for incorporating data from relevant institutions.
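As a rough sketch of what an exported property mapping could look like, the snippet below serializes a small mapping table as N-Triples, using `owl:equivalentProperty` and `rdfs:subPropertyOf` as the RDF counterparts of the equivalent/subproperty distinction mentioned above. The target URIs under `example.org` are placeholders, the table and function names are hypothetical, and the choice of `http://www.wikidata.org/entity/` as the subject namespace is an assumption about the eventual RDF export.

```python
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"  # assumed subject namespace

PREDICATES = {
    "equivalent": OWL_EQUIVALENT_PROPERTY,
    "subproperty": RDFS_SUBPROPERTY_OF,
}

# Hypothetical mapping table: (Wikidata property, relation, external URI).
# The example.org targets are placeholders, not real vocabulary terms.
MAPPINGS = [
    ("P356", "equivalent", "http://example.org/vocab/doi"),
    ("P685", "subproperty", "http://example.org/vocab/taxonIdentifier"),
]


def mapping_triples(mappings):
    """Serialize property mappings as N-Triples lines."""
    return [
        f"<{WIKIDATA_ENTITY}{property_id}> <{PREDICATES[relation]}> <{target}> ."
        for property_id, relation, target in mappings
    ]


for line in mapping_triples(MAPPINGS):
    print(line)
```

A table of this kind could be reviewed by domain experts before any of it is recorded on Wikidata as statements on the properties themselves.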
WP3: Integrating research resources with Wikidata
- WP-leader: ?
- interested: TEL
- TEL has an open data set based on bibliographic data from national (and some research) libraries across Europe. Currently 90m records are converted into RDF. This number is growing. In this project we want to integrate this data with Wikidata.
- interested: UPM
- UPM (as developer of datos.bne.es) could bring bibliographic metadata from the Spanish National Library. Currently, 8 million records are converted into RDF and linked with other library catalogues such as: VIAF, Library of Congress, the French National Library, the German National Library, LIBRIS, SUDOC and DBpedia.
- UPM (as developer of http://geo.linkeddata.es) could bring geographic data from the Geographic Spanish Institute.
- interested: MfN
- involved in numerous initiatives aimed at making biodiversity research data or cultural heritage available as open data
- interested: UM
- contribute isotope data from the Blue Obelisk project
- Objectives
- increase the amount and semanticity of scientific information available on Wikidata in the focal areas defined in WP4
- EU forms in Google doc
Task 3.1: Import into Wikidata
- Task leader: ?
- interested: TEL
- interested: MfN
- interested: UPM
- generally requires approval from Wikidata community (ideally via relevant WikiProject)
- and has to meet Notability criteria
- should build on the property profiles identified in 2.1 and populate items with corresponding statements, importing necessary information from external databases as appropriate.
- may require development work (e.g. for bots)
- the GLAMwiki Toolset may be useful
- and the bots powering Wikidata:WikiProject sum of all paintings
- provide Wikibase interfaces to research workflow tools
- supported in mw:Wikidata Toolkit
- improve documentation
- may require mapping work (e.g. creating data models) - WP2
- with provenance - Task 3.3
- may require verification - Task 3.3
- some data from the H2020 Open Data pilot
- Freebase integration as a test case
- Map identifiers of items across multiple sources relevant to the user communities
- including DBpedia
- example: ethanol (Q153)
- statements and qualifiers
- for the classes identified in 2.1, apply the patterns created in 2.2 to existing Wikidata items belonging to those classes and to newly created items (see also consistency management, 3.3)
- expose that information as RDF and via SPARQL in real time
- Experimental Wikimedia Commons RDF extraction with DBpedia
Task 3.2: Quality assurance
- Task leader: UPM
- interested: UPM
- interested: MfN
Data citation and provenance
- Facilitate establishing guidelines around provenance and data citation
- use standards like W3C PROV
- see discussion by Paul Groth here
- Freebase integration as a test case
- see also Wikidata:Primary sources tool
Review and verification
- Develop Wikidata-based professional review and verification mechanisms
- checking against third-party databases
- couple this with other forms of expert review, ideally with technical support, e.g. via annotations
- signal to Wikidata users any available measures of trust in a given statement
Consistency management
- Consistency checking across multiple sources of data or vocabularies
- identify cases where multiple databases agree
- need original source
- identify cases where they do not agree, and then sort out why
- if this is systematic, it might get people engaged
- example: is this image depicting black death or leprosy?
- pingback mechanisms to original sources
- propagation to DBpedia
- see also Tasks 2 and 3 in UPM suggestions
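The agree/disagree split described in the list above could be prototyped along these lines. This is a minimal sketch under stated assumptions: the record layout, the source names and all values are made up for illustration, and a real check would also need value normalization and a pointer back to the original source.

```python
from collections import defaultdict

# Hypothetical records: (source, item, property, value). All placeholders.
RECORDS = [
    ("database A", "item-1", "property-x", "42"),
    ("database B", "item-1", "property-x", "42"),
    ("database A", "item-2", "property-x", "7"),
    ("database B", "item-2", "property-x", "9"),
    ("database A", "item-3", "property-y", "only one source"),
]


def compare_sources(records):
    """Split statements into those where sources agree and those in conflict.

    Statements reported by a single source are in neither group, since
    there is nothing to compare them against.
    """
    by_statement = defaultdict(lambda: defaultdict(set))
    for source, item, prop, value in records:
        by_statement[(item, prop)][value].add(source)

    agreed, conflicting = {}, {}
    for key, sources_by_value in by_statement.items():
        if len(sources_by_value) > 1:
            conflicting[key] = {
                value: sorted(sources) for value, sources in sources_by_value.items()
            }
        else:
            (value, sources), = sources_by_value.items()
            if len(sources) > 1:
                agreed[key] = value
    return agreed, conflicting


agreed, conflicting = compare_sources(RECORDS)
print(agreed)
print(conflicting)
```

The conflicting group is the one that would be surfaced to contributors for sorting out why the sources differ.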
Task 3.3: Optimizing openness
- Task leader: ?
- interested: MfN
Identification of sources of CC0 data
- Identification of relevant data available under CC0
- from scholarly curated databases
- from the peer-reviewed literature
- facts: example workflow described in http://magnusmanske.de/wordpress/?p=245 , using ContentMine
- bibliographic metadata about scholarly references
- CrossRef
- incl. licensing information
- CrossMark
- FundRef
- Open Citation Corpus and related-work.net (including)
- PubMed
- PMC
- arXiv
- citation types http://wikimediadc.org/wiki/User:Econterms/Wikiscience_proposal_draft
- including on images and media from scholarly sources
- from other scholarly sources
- from Wikidata:Data collaborators
- Identification of a set of institutions who would participate in making their collection information available under CC0.
- could highlight barriers to reuse, such as sui generis database rights in the EU (see also talk page)
Analyzing motivations for CC0 licensing
- Optimizing the general societal benefits of Wikidata enabled research.
The sharing of data and other resources is an integral part of research endeavours. In the Web age, most new research objects are digital, and many legacy ones are being digitized. Once digital, they can be easily shared over the Web, and from there, it is technically only a very small step towards opening them up for reuse by a potentially global and cross-disciplinary audience. Socially, though, this step is larger, and few incentives beyond altruism exist for institutions, research groups or individuals to fully embrace openness.
This task is concerned with identifying benefits that accrue to those who share their research openly. Mid- to long-term effects of openness have been the subject of prior investigations that established, for instance, citation advantages for open-access articles or for publications associated with open data or open-source software. The situation is much less clear for immediate and short-term benefits, but if such benefits exist, an analysis of best practices around them can help harness their potential for data providers, the wider research community, and society at large.
- Open can be more efficient; details depend on data type, discipline, workflows, identifier integration, the ability to analyse types of changes, IPR situation, use potential, and communities
- Concrete scientific/ societal benefit analysis and optimization for data providers and organisations investing in the collaboration.
- see also Crowdsourcing: the Wiki Way of Working
- Workflow requirements to optimize the mutual benefit for both scientific organisations and the open knowledge movement.
- Develop best-practice recommendations for collaboration
- Under what conditions is it beneficial for data providers to share their data openly, and via Wikidata? Under what conditions is it beneficial for Wikidata to integrate with research data and communities?
- Including analysis of citizen observation and citizen data science
- Coding da Vinci
WP4: Enabling the use of Wikidata in research contexts
- WP-leader: WMDE
- Objectives
- Facilitate research that makes use of Wikibase instances, especially Wikidata and Wikimedia Commons
- EU forms in Google doc
- Wikidata development for WD4R that goes beyond the Wikidata:Development plan (either in speed or scope) and facilitates integration of Wikidata with research activities
- provide tools (incl. bots and gadgets) to facilitate use of Wikidata in research contexts
- according to the use cases
- improve documentation
- see also Wikidata:Tools
- Wikidata:Data access
- access to RDF dumps provided by mw:Wikidata Toolkit
- Wikibase has to export in real time, without intermediate steps, to a SPARQL endpoint without latency.
- developing an API for Project Jupyter (Q18633895) - pulling data right into NumPy (Q197520) or pandas (Q15967387) data structures would ensure that Wikidata integrates well into scientific workflows.
- interfaces for other workflow tools - scientific workflow system (Q7433829) - would be useful too, e.g. Taverna workbench (Q7689070) or Open Science Framework (Q18691678)
- developing a library (or, even better, an app) to get painless transformations of API queries to some standard dataviz (e.g. social graph…) - cf. use cases
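As a rough illustration of the Jupyter idea in the list above, the following sketch turns a result in the W3C SPARQL 1.1 Query Results JSON layout into plain columns that could be handed to pandas or NumPy. The sample result, the example.org URIs and the function name `bindings_to_columns` are made up for illustration; no existing Wikidata API is assumed.

```python
import json

# Made-up sample in the W3C SPARQL 1.1 Query Results JSON layout.
SAMPLE_RESULT = json.loads("""
{
  "head": {"vars": ["item", "label"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "http://example.org/entity/A"},
     "label": {"type": "literal", "value": "first"}},
    {"item": {"type": "uri", "value": "http://example.org/entity/B"}}
  ]}
}
""")


def bindings_to_columns(result):
    """Return {variable: [values...]}, with None where a variable is unbound."""
    variables = result["head"]["vars"]
    columns = {name: [] for name in variables}
    for row in result["results"]["bindings"]:
        for name in variables:
            cell = row.get(name)
            columns[name].append(cell["value"] if cell is not None else None)
    return columns


columns = bindings_to_columns(SAMPLE_RESULT)
print(columns)
```

In a notebook, `pandas.DataFrame(columns)` would then give a table with one column per query variable.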
Linked Open Data for Wikidata
- Task leader: UPM
There are multiple ways in which Wikidata could be integrated with Linked Open Data. This task will explore two of them: Linked Data Fragments and a SPARQL endpoint. A third one, mapping through DBpedia, is explored as part of the mapping to external databases in WP2.
Linked Data Fragments
- The basic idea here is to shift some of the load from SPARQL endpoint servers to the client side by making intelligent use of dumps.
- A demo for Wikidata already exists at http://wikidataldf.com/ but needs work in order to become fit for research needs.
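The load-shifting idea can be sketched with a toy simulation: the "server" only answers single triple patterns, and the client combines several such answers into a join. This is not the actual Linked Data Fragments protocol; the triples, the `ex:` identifiers and both function names are invented for illustration.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Made-up triples standing in for the contents of a dump.
TRIPLES: List[Triple] = [
    ("ex:item1", "ex:instanceOf", "ex:Mineral"),
    ("ex:item2", "ex:instanceOf", "ex:Mineral"),
    ("ex:item3", "ex:instanceOf", "ex:Taxon"),
    ("ex:item1", "ex:colour", "green"),
    ("ex:item2", "ex:colour", "red"),
]


def fragment(triples: List[Triple], s: Optional[str] = None,
             p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """'Server' side: answer one triple pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def join_on_subject(triples: List[Triple], p1: str, o1: str, p2: str):
    """'Client' side: evaluate { ?x p1 o1 . ?x p2 ?y } from single-pattern requests."""
    pairs = []
    for subject, _, _ in fragment(triples, p=p1, o=o1):
        for _, _, value in fragment(triples, s=subject, p=p2):
            pairs.append((subject, value))
    return pairs


pairs = join_on_subject(TRIPLES, "ex:instanceOf", "ex:Mineral", "ex:colour")
print(pairs)
```

The server-side function stays trivially cheap, while the join logic, and hence most of the query cost, sits with the client.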
Basic SPARQL endpoint
- as per Task 1.1 Integrating Wikibase with research workflows in USP/CDS suggestions for WP4
- see also Task 1 in UPM suggestions
- what level of "Basic" would be useful here?
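To make the question of what "basic" could mean more tangible, here is a minimal sketch of the kind of request such an endpoint would at least have to answer: a query sent via HTTP GET in the `query` parameter, as defined by the SPARQL 1.1 Protocol. The endpoint URL is a placeholder, not an existing service, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder address; no such Wikidata endpoint is implied to exist.
ENDPOINT = "https://example.org/sparql"

# Deliberately generic query that does not depend on any particular RDF mapping.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```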
Editing via the Wikibase API
- task leader: UPS
- using external tools, e.g. from lab notebooks, intranet
- cf. task 1.5 in UPS/CDS suggestions in WP4
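A sketch of what an external tool, such as a lab notebook, would need to assemble for one edit: the form fields of a `wbcreateclaim` request to the Wikibase API, adding an item-valued statement. The request is only built, not sent; the identifiers are illustrative, the token is a placeholder, and the helper name `build_create_claim_params` is hypothetical. A real tool would first have to log in and obtain an edit token.

```python
import json


def build_create_claim_params(entity_id, property_id, target_numeric_id, token):
    """Form fields for a wbcreateclaim request that adds an item-valued statement."""
    # Item values are passed as a JSON-encoded object.
    value = {"entity-type": "item", "numeric-id": target_numeric_id}
    return {
        "action": "wbcreateclaim",
        "format": "json",
        "entity": entity_id,
        "property": property_id,
        "snaktype": "value",
        "value": json.dumps(value),
        "token": token,  # placeholder; must come from an authenticated session
    }


# Illustrative identifiers only.
params = build_create_claim_params("Q4115189", "P31", 5, "PLACEHOLDER_TOKEN")
print(params)
```

These fields would be sent as an HTTP POST to the wiki's `api.php`.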
Wikidata identifiers in the lab
- task leader: UPS
- Extend Wikibase to connect the private wikis in the laboratories with the identification keys in Wikidata
- OK for Center for Data Science: interconnect the different laboratories' data via the concepts in Wikidata and an internal wiki for humans, and via a SPARQL endpoint for machines (in WP4). Contact person: Karima Rafes
Citizen science
- Task leader: MfN
Wikidata for cultural heritage
- Task leader: Europeana
WP5: Dissemination and stakeholder engagement
- WP-leader: UOC
- Objectives
- Education about using Wikidata as a VRE and how to collaborate with the community (Wikimedia-culture, WikiProjects)
- EU forms in Google doc
Task 5.1: Dissemination
- Task leader: MfN
- moved to Google doc
Task 5.2: Community engagement
- Task leader: MfN
- all partners involved
- moved to Google doc
- Wikidata community
- participation in Wikimedia hackathon and Wikimania
- possibly with workshops
- project meetings as satellite meetings to above
- Research community
- Publications
- Drafted and reviewed as openly as possible
- includes publication of this research proposal
- participation in scientific meetings focused on data and identifiers
- possibly with workshops
- Developer community
- GitHub organization
- could initially reside under https://github.com/wpoa
- Meetings at the interfaces between the researcher, Wikimedia and developer communities
- Upon submission, proposal shall be put on Zenodo and possibly published in a more formal way
- Forking encouraged
Task 5.3: Development of tutorials
- Task leader: UOC
- moved to Google doc
Task 5.4: Development of course materials
- Task leader: UOC
- moved to Google doc
Task 5.5: Development of a MOOC
- Task leader: UOC
- moved to Google doc
Task 5.6: Organization of training events
- Task leader: UM
- moved to Google doc
Notes
Here, information is dumped that may be useful in drafting but for which no better place could be found so far.
- List of all data types available
- Wikidata:Database reports
- User:Pasleim/Sitelink statistics
- User:Pasleim/Badge statistics
- Is access to sources an issue for Wikidata contributors and verifiability?
- see also Wikidata:Primary sources tool
- Wikidata:Verifiability
- Wikidata stats by project
- Wikidata:Glossary#Query
- Wikidata:Database download
- Wikidata:Data access
- see also somewhat outdated 3 Ways To Access Wikidata Data: Python, Dumps, and Linked Data
- mw:Manual:ContentHandler
- Linked Data Fragments
- original E-biomed
- should probably go to Wikisource
- Webcitation copy
- Resource Identification Initiative (Force11)
- WMFLabs tools for Wikidata
- for using dashes, see en:WP:DASH
Ideas not to be included in this proposal
This section collects activities that have been considered for inclusion in the project but were determined to be out of scope. We are keeping them here for the time being in case some aspect thereof should become relevant during further development of the proposal.
- Update management/ version control
- solved for item/ property etc. pages via MediaWiki
- out of scope for ontologies?
- Wikibase outside Wikidata
- Wikimedia Commons
- basically covered by Wikidata roadmap
- elsewhere
- out of scope?
- annotations
- https://github.com/EwaKowalczuk/annot
- out of scope?
- out of scope: activities not related to research
- won't do: dumps, because mw:Wikidata Toolkit exists
- Wikidata would be well suited to store legislative data. The wording of laws is very formulaic and could be made even more understandable through a relevant ontology
- out of scope?
- as a complement to pulling info into Wikidata, nanopublications may provide a push mechanism
- out of scope?
- see also https://groups.google.com/a/wikimedia.de/forum/#!topic/wd4r/wxOlJuiSqqE
- ...
- Lexicalization
- a tool - perhaps as part of Wikidata games - as per Task 5 in UPM suggestions
- creating natural language text, e.g. for bot-created Wikipedia articles, or database content more generally
Visualizations
- especially for use cases
- e.g. of interactions of ancient philosophers: http://projetjourdain.org/network/index.html
- pathways
- family trees
- map with geo-located Wikidata items http://tools.wmflabs.org/wikidata-todo/around.html?lat=50.938056&lon=6.956944&radius=15
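For network-style visualizations like the interaction graph mentioned above, statements would first have to be reshaped into nodes and links. The sketch below does this for one property; the statements, the `ex:` identifiers and the function name `to_node_link` are invented, and the output layout is only one common convention for graph-drawing libraries, not a requirement of any particular tool.

```python
# Made-up (subject, property, object) statements.
STATEMENTS = [
    ("ex:A", "ex:influenced", "ex:B"),
    ("ex:B", "ex:influenced", "ex:C"),
    ("ex:A", "ex:bornIn", "ex:place1"),
]


def to_node_link(statements, property_id):
    """Keep only statements with the given property and return nodes plus links."""
    nodes = set()
    links = []
    for subject, prop, obj in statements:
        if prop != property_id:
            continue
        nodes.update((subject, obj))
        links.append({"source": subject, "target": obj})
    return {"nodes": sorted(nodes), "links": links}


graph = to_node_link(STATEMENTS, "ex:influenced")
print(graph)
```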
Content mashups
- gadget that displays additional information when viewing a Wikipedia article
Identification keys
- Task leader: ?
- interested: MfN
- Build a framework for identification keys
- e.g. for minerals, chemicals, taxa, developmental stages, historical figures
- see also Wikidata:WikiProject Identification Keys
Identification keys for research outputs
- interested: UPS
- OK for Center for Data Science with Laboratory LRI of National Center for Scientific Research (Q280413): implementation of a workflow to help scientists connect challenges in the sciences with the artefacts of scientists via Wikidata. Contact person: Karima Rafes
Identification keys for chemistry
- Identification keys for small molecules
- interested: MU
- Identification keys for Analytical Chemistry
- interested: UPS
- compounds and analytical techniques
- OK for Center for Data Science with Laboratory Analytical Chemistry: build a framework for establishing identification keys for Analytical Chemistry. Contact person: Karima Rafes
- Identification keys for metabolic pathways
- interested: MU
- Identification keys for minerals
- interested: MfN
Identification keys for taxa
- interested: MfN
Versioning
- consider versioning effects of Wikibase and external data or vocabularies
- simple and robust solution: via versioned dumps
- theoretical alternative: via version history of Wikidata pages
- perhaps try out on small scale, e.g. with data extracted from living meta analyses
- out of scope?
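The "versioned dumps" option could be sketched as follows: two snapshots are compared item by item, and added and removed statements are reported. The snapshots here are tiny made-up dictionaries rather than real dumps, and the function name `diff_snapshots` is hypothetical.

```python
def diff_snapshots(old, new):
    """Compare two {item: set of (property, value)} snapshots.

    Returns (added, removed), each mapping items to the statements that
    appeared or disappeared between the two versions.
    """
    added, removed = {}, {}
    for item in sorted(set(old) | set(new)):
        before = old.get(item, set())
        after = new.get(item, set())
        if after - before:
            added[item] = after - before
        if before - after:
            removed[item] = before - after
    return added, removed


# Made-up snapshots standing in for two dated dumps.
OLD = {
    "ex:item1": {("ex:colour", "green")},
    "ex:item2": {("ex:colour", "red")},
}
NEW = {
    "ex:item1": {("ex:colour", "green"), ("ex:hardness", "7")},
    "ex:item3": {("ex:colour", "blue")},
}

added, removed = diff_snapshots(OLD, NEW)
print(added, removed)
```

A result computed against one snapshot can then be checked for changes against a later one without touching the page histories.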
Improve existing code
- can't be a task of its own, but has to be kept in mind
- code hardening, updating and enhancement of community-developed tools and gadgets
- Wikidata game
Risk assessment
- required in several ways
- see GDocs for respective workpackage
Impact
- mention tight integration with other Wikimedia projects
- mention readership and editor/developer community of Wikipedia
- focus on research-related topics (e.g. WP:COMPBIO Popular pages, or WP:MED500, or OA)
- stress existing VRE aspects of Wikidata
- mention DBpedia as a hub for research and centerpiece of the Linked Open Data interconnections (Graph 2014)
- mention Freebase's merge into Wikidata
- see also Wikidata:WikiProject Freebase
- mention multilinguality
- mention GLAM-Wiki/ OpenGLAM
- mention the Letters of support
- mention open drafting
- mention forkability
- manage inconsistencies
- mention Wikidata:Glossary
- L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12, p. GRDI75-GRDI81 DOI: http://dx.doi.org/10.2481/dsj.GRDI-013
- "The lack of community uptake has cascading effects on the entire VRE research domain, in particular its impacts on sustainability."
- "services and resources that are aggregated and offered by such infrastructures should, as much as possible, be independent of a specific application domain and “designed for reuse”."
- "Actually, Virtual Research Environments should be linked to existing infrastructures with both roles of consumer, i.e., VREs should benefit from the services offered by these infrastructures, and provider, i.e., the resources produced in the context of the VRE operation should contribute to the infrastructures offering."
- "Virtual Research Environments should be designed, since the beginning, to promote uptake, ensure usability, and guarantee sustainability."
- "As regards usability, Virtual Research Environments building should be mainly a community building process rather than a technology development process. This implies that the focus should be primarily on using technology to identify and rationalise workflows, procedures, and processes characterising a certain research scenario rather than having technology invading the research scenario and distracting effort from its real needs. As far as sustainability is concerned, it is fundamental that the resulting VRE service is conceived as a vital tool in the community of practice it is dedicated to. Moreover, sustainability is further enhanced whenever the VRE is perceived as a useful tool in the context of larger research initiatives and communities so to benefit from economies of scale, i.e., savings gained by an incremental level of production, and economies of scope,i.e., savings gained by producing two or more distinct goods when the costs of doing so is less than that of producing each of them separately."
- these notes about the paper are now incorporated into the Concept and approach section
Open drafting
- The very fact of drafting the proposal in the open creates a level of community engagement that can rarely be found in the context of research projects not yet funded.
- Ioannidis JPA (2014) How to Make More Published Research True. PLoS Med 11(10): e1001747. doi:10.1371/journal.pmed.1001747
- Gurwitz D, Milanesi E, Koenig T (2014) Grant Application Review: The Case of Transparency. PLoS Biol 12(12): e1002010. doi:10.1371/journal.pbio.1002010
- comment in Mietchen D (2014) The Transformative Nature of Transparency in Research Funding. PLoS Biol 12(12): e1002027. doi:10.1371/journal.pbio.1002027
- Khan R, Goodman L, Mittelman D (2014) Dragging scientific publishing into the 21st century. Genome Biology 15:556 doi:10.1186/s13059-014-0556-2