Wikidata:Primary sources tool
The primary sources tool enables a workflow for processing data donations to Wikidata, in which Wikidata contributors can review, edit, or reject the data offered to the community. The workflow is integrated into Wikidata.
Try it
- go to your Gadgets preferences;
- check the Primary Sources box (Wikidata-centric section), then click the Save button at the bottom of the page;
- in the left sidebar, click the gear icon next to the Random Primary Sources item link;
- select a dataset, such as strephit-confident, soweego, or freebase;
- you can try the tool in three ways:
  - via the Primary Sources list link in the sidebar (Tools section): click the Load button for a quick look at the dataset content; optionally, enter a property identifier to filter the statements;
  - via the Random page link in the sidebar, which displays an Item to improve;
  - via any Item.

N.B.: Wikidata gadgets are written in JavaScript; make sure your browser allows it to run.
Proposal to improve the primary sources tool
The first and current version of the primary sources tool (PST) stems from the donation of Freebase by Google.[1]
Based on community feedback collected since its deployment as a Wikidata gadget,[2][3][4] the StrepHit team here submits a radical uplift proposal, which will lead to the next version of the tool.
Please note that the mockups mentioned in this document are available at phab:M218.
Version 1 of the code base
Goals
The general goal is to make the tool self-sustainable. To achieve this, the highest priority is given to:
- Web standards;
- stability, i.e., choices driven by the Wikidata stable interface policy;[7]
- programming-language adoption by the Wikimedia community.
In addition, the tool should become the first choice for data releases by third-party providers.[8] This makes the need for a standardized release procedure all the more important.
User workflow
The user can approve or reject a new statement suggested by the tool:
- given an Item page, the suggested statement is highlighted with a blue background;
- the user can approve or reject it by clicking either on the "approve claim" or on the "reject claim" links respectively;
- after that, the page will update with the new statement in the first case or without it in the second one.
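As an illustration, the approval step described above boils down to a call to the MediaWiki wbcreateclaim module. The helper below only builds the request parameters; the function name and its arguments are illustrative, not the tool's actual code:

```javascript
// Hypothetical sketch of the parameters a gadget could send to the
// MediaWiki API when the user approves a suggested statement.
// "buildApproveParams" and its arguments are illustrative names.
function buildApproveParams(entityId, propertyId, value, editToken) {
  return {
    action: 'wbcreateclaim',   // MediaWiki API module that adds a claim
    entity: entityId,          // e.g. 'Q42'
    property: propertyId,      // e.g. 'P19'
    snaktype: 'value',
    value: JSON.stringify(value),
    token: editToken,
    format: 'json'
  };
}

// Approve a suggested "place of birth" (P19) statement on Q42
const params = buildApproveParams(
  'Q42', 'P19',
  { 'entity-type': 'item', 'numeric-id': 350 },
  'sample+token'
);
```

Rejection, by contrast, would not touch Wikidata at all: it would only flag the suggestion in the tool's own storage.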
In the same way, the tool can suggest new references for an existing statement:[9]
- the new reference is highlighted with a blue background;
- the user can approve or reject it by clicking on the approve reference or reject reference links, respectively;
- the user can also see a preview tooltip showing where the source comes from by clicking on preview reference;[10][11]
- if the dataset contains fine-grained provenance information, e.g., the text snippet from which the suggested statement was extracted,[12] the preview tooltip will highlight that exact piece of information;[13]
- in case the interaction between the front end and the back end is not smooth, a tooltip will show up with an alert message.[14]
Primary Sources configuration
- When the user clicks on the gear icon next to the Random Primary Sources item link (cf. the section below) in the main menu on the left sidebar, a modal window will open;[15][16][17]
- the user can search for and select which datasets to use;
- essential information is shown, namely Dataset description, Missing statements and Total statements;
- the user can either Save or Cancel the new settings.
Random Primary Sources item
- The user can jump to a random Item containing suggested statements by clicking on the Random Primary Sources item link located in the main menu on the left sidebar;
- the Item will be randomly picked from the datasets selected in the Primary Sources configuration.
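The random pick could be sketched as follows; the data structure holding the datasets and their candidate Items is an assumption made for illustration:

```javascript
// Minimal sketch of picking a random Item from the datasets selected
// in the Primary Sources configuration. The shape of "datasets" is
// assumed, not taken from the tool's actual code.
function pickRandomItem(datasets, selected, random = Math.random) {
  // Gather candidate Items from the selected datasets only
  const candidates = selected.flatMap(name => datasets[name] || []);
  if (candidates.length === 0) return null;
  return candidates[Math.floor(random() * candidates.length)];
}

const datasets = {
  'strephit-confident': ['Q42', 'Q5593'],
  'soweego': ['Q254'],
  'freebase': []
};
pickRandomItem(datasets, ['soweego']); // → 'Q254' (only candidate)
```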
Browse Primary Sources
- The user can browse the suggested statements grouped by property by clicking on the appropriate property link below the Browse Primary Sources menu on the left sidebar;
- the user can move back to the top of the page by clicking on the back to top link right below the Browse Primary Sources menu on the left sidebar.
Filter-based tool
A similar workflow applies to a filter-based tool, located in the Tools menu of the left sidebar.
- When the user clicks on the Primary Sources filter link (currently Primary Sources list), a modal window will open;[18]
- the user can view a table of suggested statements with their references, if any, by building filters in several ways:
  - Domain of interest: the user starts typing a domain they are interested in and gets autocompletion based on simple constraints, typically the instance of (P31) property. For example, list all the Items that are a chemical compound (Q11173);
  - Property: the user starts typing a property they are interested in and gets autocompletion based on property labels. This filter only shows suggested statements with the given property. For instance, list all the date of birth (P569) statements;
  - SPARQL query: this filter is intended for power users and accepts arbitrary SPARQL queries;
  - Source language: shows only statements in the selected language;
  - Dataset: lets the user pick one or more specific datasets to use, similarly to the Primary Sources configuration.
After building the filters, the tool shows a table of statements, where the user can either approve or reject suggestions after a preview of the reference source, as per the "User workflow" section. The approve and reject actions can be blocked if the source preview has not been opened.[19]
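Under the hood, the Domain of interest filter could translate into a SPARQL query against the back-end triple store. The sketch below assumes the Wikidata RDF vocabulary; the function and variable names are illustrative:

```javascript
// Sketch of turning the "Domain of interest" filter into a SPARQL
// query, assuming the back end exposes data with the standard
// Wikidata RDF prefixes. Names are assumptions, not the tool's code.
function domainFilterQuery(classId, limit = 100) {
  return [
    'PREFIX wd: <http://www.wikidata.org/entity/>',
    'PREFIX wdt: <http://www.wikidata.org/prop/direct/>',
    'SELECT ?item WHERE {',
    `  ?item wdt:P31 wd:${classId} .`,  // instance of (P31) the chosen class
    '}',
    `LIMIT ${limit}`
  ].join('\n');
}

// Candidate Items that are a chemical compound (Q11173)
const query = domainFilterQuery('Q11173');
```

The Property and SPARQL query filters would follow the same pattern, with the power-user filter passing the query text through unchanged.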
Architecture
Server-side implementation
Data format
The tool currently accepts datasets serialized in QuickStatements (Q20084080). While it is indeed a very compact format, useful for uploading large datasets, it is completely non-standard: the only available documentation is contained in the QuickStatements service page itself.[20] Hence, we plan to support stable formats, both for the self-sustainability of the project and for a standardized data donation workflow. Still, we will keep the QuickStatements support.
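To give an idea of the format, a QuickStatements statement is essentially a tab-separated item / property / value triple. A minimal, hedged parsing sketch (with no claim to cover the full syntax, e.g., qualifiers or sources):

```javascript
// Parse one basic line of the QuickStatements syntax, i.e., a
// tab-separated item / property / value triple. This is a sketch:
// the real syntax also covers qualifiers, sources, and special values.
function parseQuickStatement(line) {
  const fields = line.split('\t');
  if (fields.length < 3) {
    throw new Error('Malformed QuickStatements line: ' + line);
  }
  const [item, property, value] = fields;
  return { item, property, value };
}

parseQuickStatement('Q42\tP569\t+1952-03-11T00:00:00Z/11');
// → { item: 'Q42', property: 'P569', value: '+1952-03-11T00:00:00Z/11' }
```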
Datasets from third-party providers should be serialized in RDF and follow the Wikidata RDF data model.[21] We believe this is the most standard way for 2 reasons:
- RDF is a W3C standard;[22]
- the Wikidata RDF serialization is covered by the stable data formats policy.[23]
Core component
Given these premises, a Wikidata Query Service[24] instance is a good fit for the back end, since it:
- uses an RDF triple store, i.e., Blazegraph as the storage engine;[25]
- is claimed to be a stable Wikidata public API;[26]
- is written in Java, probably a more widely adopted programming language than the C++ of the current implementation;
- has facilities to upload datasets in Wikidata RDF dump format;[27]
- exposes APIs to access data via SPARQL, specifically useful for both the domain filter and the query text box features.[28]
The main tool will support full statements, while the filter-based tool should be fed with truthy statements.
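The difference between the two can be illustrated with a short Turtle sketch following the Wikidata RDF vocabulary (the statement and reference node names below are made up for the example):

```turtle
@prefix wd:    <http://www.wikidata.org/entity/> .
@prefix wdt:   <http://www.wikidata.org/prop/direct/> .
@prefix p:     <http://www.wikidata.org/prop/> .
@prefix ps:    <http://www.wikidata.org/prop/statement/> .
@prefix pr:    <http://www.wikidata.org/prop/reference/> .
@prefix wds:   <http://www.wikidata.org/entity/statement/> .
@prefix wdref: <http://www.wikidata.org/reference/> .
@prefix prov:  <http://www.w3.org/ns/prov#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .

# Truthy form: one direct triple, with no room for provenance.
wd:Q42 wdt:P569 "1952-03-11T00:00:00Z"^^xsd:dateTime .

# Full form: a statement node that can carry references.
wd:Q42 p:P569 wds:Q42-example .
wds:Q42-example ps:P569 "1952-03-11T00:00:00Z"^^xsd:dateTime ;
    prov:wasDerivedFrom wdref:example .
wdref:example pr:P854 <https://example.org/source> .
```

The full form is what the main tool needs in order to suggest statements together with their references, while the filter-based tool can work with the lighter truthy triples.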
API d'ingestion
The Ingestion API is responsible for the interaction with third-party data providers. Incoming datasets are first validated against the Wikidata RDF data model. It will then provide the following facilities for datasets:
- upload;
- update;
- drop.
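The validation step mentioned above could, at its simplest, check that incoming triples stay within the Wikidata namespaces. The sketch below is an assumption about what such a check might look like, not a specification:

```javascript
// Illustrative validation check for the Ingestion API: accept a triple
// only if its subject and predicate use the Wikidata namespaces.
// The names and the rule itself are assumptions for illustration.
const WD_ENTITY = 'http://www.wikidata.org/entity/';
const WD_PROPERTY_PREFIXES = [
  'http://www.wikidata.org/prop/direct/',
  'http://www.wikidata.org/prop/'
];

function isValidTriple(triple) {
  return triple.subject.startsWith(WD_ENTITY) &&
         WD_PROPERTY_PREFIXES.some(p => triple.predicate.startsWith(p));
}

isValidTriple({
  subject: 'http://www.wikidata.org/entity/Q42',
  predicate: 'http://www.wikidata.org/prop/direct/P569',
  object: '"1952-03-11T00:00:00Z"'
}); // → true
```

A real implementation would of course validate against the full Wikidata RDF data model, not just the namespaces.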
Curation API
The Curation API is responsible for the interaction with Wikidata users through 2 main services: it will suggest claims for addition, and it will flag the rejected suggestions in the back-end storage.
Front-end implementation
The main self-sustainability goal is to avoid breaking the front end whenever a change is made in the Wikidata user interface. To achieve this, the current gadget will become a MediaWiki extension for Wikibase (Q16354758). A major refactoring of the code base is essential and will:
- include unit tests. Failures are expected in case of changes in the Wikidata user interface, and will break the Wikidata build instead of breaking the tool;
- make a clear distinction between the interaction with the back end and the users;
- port the HTML templates.
The code will be split into the typical components of a MediaWiki extension, written in PHP and JavaScript.
PHP component
The PHP component will only be responsible for the extension configuration. Everything else will be handled by the JavaScript component.
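For modern MediaWiki extensions, such configuration typically lives in an extension.json registration file. The fragment below is a hypothetical sketch; the extension name, paths, and dependencies are assumptions:

```json
{
    "name": "PrimarySources",
    "type": "other",
    "requires": {
        "MediaWiki": ">= 1.29"
    },
    "ResourceModules": {
        "ext.primarySources": {
            "scripts": [ "modules/ext.primarySources.js" ],
            "dependencies": [ "mediawiki.api" ]
        }
    },
    "manifest_version": 2
}
```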
JavaScript component
The JavaScript component will:
- render the final templates. More specifically, it will attach the template to Item statements where needed;
- handle user interactions. More specifically, it will:
  - notify the tool server upon approval or rejection of a suggested statement or reference;
  - add a statement or reference to Wikidata through the MediaWiki API;[29]
  - implement the features described in the "User workflow" section.
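The server-notification step could be sketched as a plain HTTP request to the tool's back end. The endpoint path, payload shape, and function name below are assumptions made for illustration:

```javascript
// Sketch of the curation call the JavaScript component could issue to
// the tool's back end when the user approves or rejects a suggestion.
// The endpoint and payload shape are assumptions, not the actual API.
function buildCurationRequest(statementId, dataset, approved, user) {
  return {
    url: '/primarysources/curate',   // hypothetical back-end endpoint
    method: 'POST',
    body: JSON.stringify({
      statement: statementId,        // e.g. 'Q42$some-uuid' (illustrative)
      dataset: dataset,              // e.g. 'soweego'
      state: approved ? 'approved' : 'rejected',
      user: user
    })
  };
}

// Record a rejection without touching Wikidata itself
const req = buildCurationRequest('Q42$some-uuid', 'soweego', false, 'ExampleUser');
```

Only on approval would the component additionally write to Wikidata through the MediaWiki API.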
References
- ↑ Pellissier Tanon, T., Vrandečić, D., Schaffert, S., Steiner, T., and Pintscher, L. (2016, April). From Freebase to Wikidata: The Great Migration. In Proceedings of the 25th International Conference on World Wide Web (pp. 1419-1428). ACM (2016)
- ↑ RFC: Semi-automatic Addition of References to Wikidata Statements
- ↑ Wikidata_talk:Primary_sources_tool
- ↑ phab:project/view/2788
- ↑ https://github.com/Wikidata/primarysources/tree/d24ca9ecef71b93f0feed27f6fdc4e479c929004/backend
- ↑ https://github.com/Wikidata/primarysources/tree/d24ca9ecef71b93f0feed27f6fdc4e479c929004/frontend
- ↑ Wikidata:Stable Interface Policy
- ↑ Data donation: 3. Work with the Wikidata community to import the data
- ↑ Help:Sources
- ↑ PST - Wireframe 1
- ↑ PST - Mockup 1
- ↑ m:Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
- ↑ PST - Mockup 2
- ↑ PST - Mockup 3
- ↑ PST configuration - Mockup 1
- ↑ PST configuration - Mockup 2
- ↑ PST configuration - Mockup 3
- ↑ PS filter - Mockup 1
- ↑ PS filter - Mockup 2
- ↑ https://tools.wmflabs.org/wikidata-todo/quick_statements.php
- ↑ mw:Wikibase/Indexing/RDF_Dump_Format#Data_model
- ↑ https://www.w3.org/TR/PR-rdf-syntax/
- ↑ Wikidata:Stable_Interface_Policy#Stable_Data_Formats
- ↑ mw:Wikidata_query_service
- ↑ phab:T166503
- ↑ Wikidata:Stable_Interface_Policy#Stable_Public_APIs
- ↑ https://github.com/wikimedia/wikidata-query-rdf/tree/master/dist/src/script
- ↑ phab:T166512
- ↑ https://www.wikidata.org/w/api.php