Wikidata:Flemish art collections, Wikidata and Linked Open Data/Whitepaper/fr
The whitepaper below was written in October 2015 for the project Linked Open Data publication with Wikidata. In this project, several Flemish museums contributed collection data to Wikidata. This whitepaper may also be of interest to third parties, in particular other cultural institutions that are considering importing data into Wikidata.
All feedback is welcome. You can post questions and remarks on the talk page.
This article was written by Sandra Fauconnier ([[User:Spinster]]), Bert Lemmens and Barbara Dierickx (PACKED vzw). It is published under the Creative Commons Attribution-ShareAlike license.
Notes from the contributors
This whitepaper is the first deliverable of the project Linked Open Data publication with Wikidata (D1. Whitepaper open data management in Wikidata) and was described in the project concept as follows:
PACKED vzw and Wikimedia develop a shared vision on how data managers in museums may publish collection data on Wikidata, and how they can update this collection data at regular intervals. This includes:
- how collection data is modeled in Wikidata;
- how the upload and export to Wikidata works.
This vision is recorded in a whitepaper that is proposed to the steering committee of this project. The aim of the whitepaper is to match the objectives of Wikimedia and museums, and Wikidata.
This document was written in Dutch and then translated into English in order to distribute it more widely within the Wikimedia community.
Based on this whitepaper, further actions are planned within the project. These will depend on the Wikimedia volunteers who carry out the actions.
Summary
This document comprises three chapters.
The conclusion of this whitepaper contains an analysis of the strengths, weaknesses, opportunities and threats of using Wikidata to make your collection data available on the web.
Wikimedia, Wikipedia and Wikidata
This chapter introduces Wikidata in relation to its 'parent', the Wikimedia project, and its very well-known sibling Wikipedia. This information may be quite familiar to experienced Wikimedia contributors.
Wikimedia
Wikimedia is a worldwide movement with a mission to make educational content freely available to the world. Its best-known project is Wikipedia, the free encyclopedia. More than a dozen lesser-known projects (such as Wikimedia Commons, Wikidata and the software MediaWiki) belong to the same family.[1] All Wikimedia projects are edited by a community of users (mostly volunteers) and run on MediaWiki software. All contributions to the projects fall under a Creative Commons license, so that the contents can be re-used, edited, copied and freely distributed, including for commercial purposes.
The various projects in the Wikimedia movement support each other, and exchange content where possible. Wikimedia Commons is the free media bank where images, sound files and videos of the other Wikimedia projects are hosted. The free database Wikidata serves as a central 'data hub' between the various Wikimedia projects.
Wikipedia
Wikipedia is the best-known project in the Wikimedia movement. This free encyclopedia was founded in 2001 and exists in 290 languages (as of October 2015). It is a reference work; in other words, it summarizes information from other sources (i.e. it is a secondary source) and it is not a place for original research. In September–October 2015, the English Wikipedia had almost 5 million articles, edited by nearly 30,000 active users.[2]
Wikidata
Wikidata was founded in October 2012.
It is a free knowledge base that intends to cover the whole world, and it is designed to be readable by humans and by machines. It provides data in all languages covered by the Wikimedia projects. The data on Wikidata is even more 'free' than the information on other Wikimedia projects: Wikidata's data is made available under the Creative Commons CC0 license, in order to allow third parties to re-use the data as freely as possible (see below for more info about this license). The data on Wikidata is explicitly intended to be universally useful and re-usable by anyone, for any purpose, from educational to commercial.
Wikidata is financially supported by, among others, Google. In 2015, Google deactivated its own free knowledge base, Freebase.[3] Presumably the Google Knowledge Graph will therefore be partly based on data in Wikidata. Large search engines are not the only ones that can use Wikidata's data; because of the CC0 license, every developer is allowed to do so.
On Wikipedia itself, the first steps are being taken to re-use data from Wikidata in Wikipedia articles. With the scripting language Lua, Wikipedia editors can retrieve data from Wikidata and use it in so-called templates (such as infoboxes). This development proceeds at a different pace in the various Wikipedias; decisions to retrieve data from Wikidata depend on consensus within the local Wikipedia community. Some Wikipedias are more open to experimentation in this area than others. Examples:
External developers can pull data out of Wikidata via its API. Since mid-2015, SPARQL queries are possible as well.
Items in Wikidata
Wikidata consists of a collection of interlinked items. An item refers to an object from the real world (e.g. a building, artwork or person), a concept or an event. Each item has a label (a human-readable name) in at least one language, has its own identifier, and contains metadata. Each item also has its own page on Wikidata and a unique Q number. Examples of items on Wikidata:
In October 2015, Wikidata had approximately 15 million unique items. The 'earliest' items result from a mass import of concepts that correspond with existing Wikipedia articles. Since this large-scale import of concepts from Wikipedia, volunteers and bots (scripts) have been adding thousands of new items daily.
What is Wikidata's scope? Which items belong on Wikidata, and which do not? For an explanation of notability on Wikidata, see below (section 'Notable').
Data in Wikidata
Wikidata started with (open) data imported from all Wikipedias in the world. Every topic with a Wikipedia article received its own Wikidata item. Metadata about these topics was retrieved from the infoboxes on Wikipedia, mostly by bots or scripts, and added to the items as properties or statements.
In addition to this information from Wikipedia, external (open) data is constantly added to Wikidata. Examples:
- Artists who don't have a Wikipedia article yet, but who are relevant (they are mentioned in reputable publications/sources) and who fulfil a structural role, for instance as creators of artworks on Wikidata. An example is the Dutch artist Klaas Kloosterboer (Q19938879), a few of whose artworks are already included on Wikidata.
- All Dutch Rijksmonumenten (protected buildings) have their own Wikidata item, even if the building or monument doesn't have a Wikipedia article yet. Example: the Nederlands Hervormde Kerk in Sprang-Capelle (Dutch Reformed Church (Q17441238)).
The Wikidata community is open to larger-scale uploads of open data, initiated by external institutions. More information about this can be found on the project page Wiki Loves Open Data. This project page mentions the expectations of the Wikidata community with regard to donated open data. Such data ideally has the following characteristics:
Several of these points are explained further below.
Free
As a museum, you only add metadata to Wikidata for which you have waived all copyright. You do not claim any rights over the use of the data. You do this in order to lower the barrier to re-using and modifying this data in other applications as much as possible, and to have your data disseminated as widely as possible.
Licenses
The data that finds its way to Wikidata should be available under a Creative Commons CC0 license, so that third parties can freely re-use the data. This implies that anyone can use the data for any purpose, from educational to commercial applications. CC0 allows anyone to re-use what is published under the license without any kind of attribution. The license does not legally demand attribution; in contrast to, for example, a CC-BY license, attribution is not an intrinsic part of the licensing conditions. For those who re-use the data, CC0 is a clear added value: when you combine data from different sources, or enrich or rework it, giving proper attribution could otherwise become quite complex.
Usage guidelines
Yet there are also re-users with the best intentions, who would really like to give you attribution for what you made available to them. How do you solve this when you are not legally demanding it through a license? One solution lies in the shift from a legal requirement to a social request[4]. You accept that attribution is not required, but you also bring a moral element to the table that encourages it. This can be done by creating usage guidelines and adding them to your data. Europeana was one of the first to create such usage guidelines alongside its metadata publication policy using CC0. The Europeana Usage Guidelines for Metadata contain the following:
Dan Cohen, the Executive Director of the Digital Public Library of America, referred to this kind of attribution request as follows:
He also mentions that, if you are cynical, you could state that people with bad intentions may do bad things with all that open data. But that is an intrinsic characteristic of the web. It does not really matter what license you apply to your information; someone with bad intentions will take it anyway. We worry so much about possible misuse that the use which is in line with what we hope to achieve goes almost unnoticed. It is the experience of the DPLA that many software developers who work with its data give proper attribution of their own accord, following the DPLA Data Use Best Practices, even though the CC0 license does not oblige them to do so.
The DPLA and Europeana are not alone in this way of working; others have followed in their tracks. Tate recently opened up metadata on about 70,000 works of art and 3,500 artists. It did so under a CC0 license, but next to the license declaration a user also finds the heading 'Usage guidelines'. The American institutions MoMA and Cooper-Hewitt followed the same idea. (See the annex of this whitepaper for a summary.) In the project Linked Open Data publication with Wikidata, such usage guidelines are also an integral part of the Data Usage Agreement signed by the project partners. Although these guidelines are non-binding, they will be published alongside the different published datasets. The minimal usage guidelines proposed in this project make clear:
Depending on each institution's own intentions, these may of course be further specified or extended.
Notable
Which information belongs on Wikidata, and which doesn't? In its initial phase, Wikidata has the following two goals:
An item is acceptable on Wikidata if and only if it fulfills at least one of these two goals, that is if it meets at least one of the criteria below:
The data provided in the project Linked Open Data publication with Wikidata usually falls under goal #2 and criterion #2. In a few cases, Wikipedia articles already exist about artworks in the contributing collections, which makes these fall under goal #1 and criterion #1. The same principle applies to the artists who created the artworks in the contributing collections.
As of late 2015, no significant problems have emerged in terms of notability of unique artworks (paintings, drawings, installations, unique sculptures) from public collections, described in art historical literature and/or whose creator is mentioned in reputable sources.
Notability of items produced in series is under discussion. Individual copies of a widely distributed publication (like a book) don't belong on Wikidata. In October 2015, there is no community consensus or 'best practice' yet on describing individual prints of e.g. engravings or lithographs, of which different copies may exist in various art collections.
Individual everyday objects in art collections are usually not relevant or notable enough for Wikidata. An exception can be made if it is a very special object, described individually in independent and reputable sources. A good example is the Saliera (Cellini Salt Cellar (Q697208)) by Benvenuto Cellini, in the collection of the Kunsthistorisches Museum Wien. This object has a Wikipedia article in many languages and is covered in many publications.
Referenced
Datasets in the project Linked Open Data publication with Wikidata have a number of (persistent) URI fields. These URIs refer to sources for a number of statements, such as the creator of an artwork, its date and inventory number.
Queryable
Datasets in the project Linked Open Data publication with Wikidata are made available for upload/import as static (csv) files. In the upload phase, they can be queried and edited/cleaned by Wikidata volunteers.
Ideally, such datasets are made available publicly and permanently, like MoMA and Tate have done via GitHub, and/or are queryable through an API, like Europeana. The participating collections of the project Linked Open Data publication with Wikidata plan to build their own data hub which will also make this possible.
Editable
Wikidata, like any Wikimedia project, is filled and maintained by a community of (mainly) volunteers. A data donor maintains and controls its own data in its own databases and platforms/websites. After import into an active platform like Wikidata, the data will be enriched and edited there by volunteers and bots. Data donors must be aware of this, and must be open to additions and improvements by external parties on Wikidata.
Maintained
What is the relationship between the work of volunteers on Wikidata and the content carefully compiled and controlled by experts? And how does the Wikidata community find partners that want to engage in long-term management of their information on Wikidata?
Who edits (art and culture on) Wikidata?
In September 2015, Wikidata had 25,917 registered users, of which 6,126 can be considered active.[5] These users edit Wikidata mainly in their free time. According to their areas of interest, Wikidata volunteers organize themselves (among other things) in so-called WikiProjects. In the area of the visual arts, the following WikiProjects are active:
Most volunteers of the cultural WikiProjects are passionate, well-read culture and art lovers; some work for cultural institutions. Several edit Wikidata both by hand and with bots they have written. A typical Wikimedia volunteer keeps track of watchlists of articles to which he/she actively contributes. With these watchlists, a user can keep an eye on recent edits in his/her area of interest and can react promptly if needed. Most Wikimedia projects, including Wikidata, have a specific workflow and dedicated volunteers who focus on countering vandalism. Nonsensical edits are typically reverted within a few minutes.
Museums as authorities
On Wikidata, museums are considered authorities on their own collections. Wikidata strives to have reliable sources for all statements; references to reputable (online) publications by museums are very suitable for this. The data donation in the project Linked Open Data publication with Wikidata contains a number of such references: persistent links to artwork descriptions on the participating museums' websites. These references are included in the upload to Wikidata. Of course, after the upload volunteers can add other references to these statements as well.
Contradictory and 'incorrect' information on Wikidata?
Contradictory statements can find a place on Wikidata. When various (reputable) sources contradict each other (for instance in the attribution of an artwork or the birth date of a person), both statements, each with its own sources, can be included. If one statement is considered 'the most up-to-date', it is possible to give it a 'preferred' status. For historical and research purposes, it is very interesting to maintain (and not delete!) an older, 'deprecated' statement that might have been considered 'true' in the past. It must be emphasized that references to sources are crucial.
If a volunteer or an expert considers a specific statement 'true' or 'false', he/she must be able to support this claim with an independent, trustworthy source.
Contributing incomplete and unchecked data?
Perfect is the enemy of good, said Voltaire (or Montesquieu?). Most museum collection websites show only a selection of the whole collection: only those items that have been approved by curators or other museum staff. These items have been thoroughly checked and are considered good enough for publication. However, collection management databases usually contain a multitude of information that has not been 'cleaned' or checked yet. Is it acceptable (or even preferable) to also publish 'messy', potentially unclean and incomplete data online, and to include this in a data donation? MoMA, for instance, has decided to do this when publishing its collection data under a CC0 license on GitHub. Sufficiently 'clean' and checked data is marked 'curator approved' in the dataset; other data is included too, but without this notice. Fiona Romeo, MoMA's Director of Digital Content & Strategy, states that this decision is inspired by a proven need from researchers[6]:
Also literally: 'authority control' on Wikidata
Wikidata is a knowledge base that wants to cover the whole world. In October 2015, Wikidata contained, for instance, items about almost 3 million people. In order to clearly identify and distinguish all items, and in order to embed Wikidata as a data hub among other information sources, authority control is a central activity for many Wikidata volunteers. Wikidata items are linked, as much as possible, with reputable external authority databases. An up-to-date overview of the many authority properties on Wikidata can be found on Wikidata:List of properties/Generic#Authority control. In the visual arts, the following selection of authority databases is (among many others) referred to on Wikidata:
If external, donated datasets (like the datasets in Linked Open Data publication with Wikidata) already contain a matching with external authority databases (example: artist names are already linked with their identifiers in RKDartists), then this helps to find exactly the right people on Wikidata. It is important to note that, for the sake of clarity, concepts are (almost) always linked directly; only on a second level is a connection made to an external authority database. Artworks, for instance, are described as follows:
<item (artwork)> creator (P170) <item (person)> RKDartists ID (P650) identifier in RKDartists
Wikidata and new terminology
Creating and maintaining authority databases is time-intensive and often requires long discussions between publishers and experts (such as between Getty, RKD and the international cultural sector for the maintenance of the Art and Architecture Thesaurus). Wikidata, on the contrary, is quick to react to new developments: new terminology emerges quickly, for instance as soon as a Wikipedia article is written about a topic for the first time. For instance, the concept internet art (Q1569950) is not present yet in the Art and Architecture Thesaurus, but does have an item on Wikidata.
Conclusion
When data from cultural institutions is uploaded to Wikidata, it will be edited there by Wikidata volunteers. Therefore, Wikidata should be considered an external and open platform for a dialogue about museum objects and cultural heritage. In that regard, Wikidata is complementary to, and does not replace, the institutions' own, internally managed collection databases and websites. The dialogue on Wikidata consists of enrichment, corrections, and the juxtaposition of different opinions. The Wikidata community expects a certain commitment from the museum and heritage community to effectively participate in that dialogue. Ideally this also involves regular updates of the data, for instance when new items have been added to the collection.
How do both the Wikidata community and the museum/heritage community benefit from such a dialogue? The next chapter investigates costs and benefits for Wikimedia projects, for museums and for society in the broad sense.
Costs and benefits of contributing data to Wikidata
This chapter presents an analysis of the costs and benefits of using Wikidata to make information about works of art available as open data. The analysis is made for museums, the Wikimedia community and society as a whole.
For museums / art collections
The benefits of a data donation to Wikidata for museums were also explained in a screencast specifically made for the project Linked Open Data publication with Wikidata. (Screencast is in Dutch.)
For Wikimedia
The Wikimedia community regularly works together with social and cultural organisations, from UNESCO and the British Library to educational institutions and museums all over the world. Collaboration with cultural partners happens under the umbrella of the GLAMwiki project (Galleries/Libraries/Archives/Museums). Liam Wyatt, the first Wikipedian in Residence (British Museum, 2010): “We’re doing the same thing, for the same reason, for the same people, in the same medium. Let’s do it together.” A data donation from art collections brings the following benefits and costs for Wikimedia and Wikidata:
For society (funding body, commissioning organisation, the public, taxpayers...)
Crosswalk, data delivery and upload method
This chapter proposes a minimal input profile for an artwork on Wikidata. Which CC0 data is needed to create a minimal but complete Wikidata item for an artwork? How is this described and recorded in Wikidata? Next, we briefly outline the method of delivery and how the delivered data is integrated in Wikidata by volunteers (October 2015).
Crosswalk
A good example artwork item is A reading by Emile Verhaeren (Q21012032), a painting by Théo van Rysselberghe.
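As a rough illustration of what a crosswalk can look like in practice, the sketch below maps the fields of a museum record to Wikidata property IDs. It is only a sketch under stated assumptions: the field names, the record values and the selection of properties are invented for illustration and are not the project's actual input profile. The property IDs themselves (P31 instance of, P170 creator, P195 collection, P217 inventory number) and the item Q3305213 (painting) are existing Wikidata identifiers.

```python
# Hypothetical crosswalk: museum field name -> Wikidata property ID.
# The property IDs exist on Wikidata; the selection is an assumption,
# not the project's actual input profile.
CROSSWALK = {
    "object_type": "P31",        # instance of
    "creator": "P170",           # creator
    "collection": "P195",        # collection
    "inventory_number": "P217",  # inventory number
}


def to_statements(record, crosswalk=CROSSWALK):
    """Return (property, value) pairs for every mapped, non-empty field."""
    statements = []
    for field, prop in crosswalk.items():
        value = record.get(field, "").strip()
        if value:
            statements.append((prop, value))
    return statements


# Invented record: the creator/collection values and the inventory number
# are placeholders, not real Wikidata items or museum data.
example_record = {
    "title": "Example painting",  # would become the item's label, not a statement
    "object_type": "Q3305213",    # painting
    "creator": "Q_CREATOR_PLACEHOLDER",
    "collection": "Q_COLLECTION_PLACEHOLDER",
    "inventory_number": "INV-0001",
}

statements = to_statements(example_record)
for prop, value in statements:
    print(prop, value)
```

Keeping the mapping in a single dictionary makes it easy for a data donor and an uploading volunteer to review and adjust the crosswalk without touching the rest of the script.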
Data delivery and upload method
At the time of writing this whitepaper (October 2015), no tools exist yet for easy/straightforward mass upload of external data to Wikidata. At this point, upload of external data is performed by an experienced volunteer. He/she usually uploads the data with a custom script (a 'bot'). Such an upload bot can handle data in many different formats. The most important condition is a clear and logical structure in the dataset. Among others, the following delivery formats are suitable. If a data donor has any questions or doubts, he/she is advised to contact the uploading volunteer.
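To illustrate why a clear structure matters more than the exact layout, here is a minimal sketch of how a script might read a csv delivery by column name rather than by position. The column names and sample rows are hypothetical, and this is not the actual upload bot used in the project; it only uses Python's standard csv module.

```python
import csv
import io

# Hypothetical set of columns an upload script might expect in a delivery.
REQUIRED_FIELDS = {"inventory_number", "title", "creator"}


def read_delivery(csv_text):
    """Parse a csv delivery into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_FIELDS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [dict(row) for row in reader]


# Two invented deliveries with the same content but a different column order.
delivery_a = (
    "inventory_number,title,creator\n"
    "INV-0001,Example painting,Example Artist\n"
)
delivery_b = (
    "creator,inventory_number,title\n"
    "Example Artist,INV-0001,Example painting\n"
)

rows_a = read_delivery(delivery_a)
rows_b = read_delivery(delivery_b)
print(rows_a == rows_b)
```

Because each row is addressed by field name, both deliveries yield the same records, while a delivery that lacks an expected column is rejected early with a clear error.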
The order of fields/metadata in the dataset is not important. After receiving the dataset, the uploading volunteer and his/her bot will
Maintenance of the data, manual changes and updates, and RDF extraction will be covered in this project's handbook (Deliverable 4, December 2015).
Conclusion
Wikidata is still a young project. It was launched in 2012 and is continuously in development, both in terms of technology and data modelling. In October 2015, many questions and issues remain open, such as
Advantages of early participation in Wikidata:
SWOT analysis of data donation to Wikidata
Annex: Usage guidelines
MoMA
The Museum of Modern Art has made its collection data available as a csv file, under CC0 license, on GitHub: https://github.com/MuseumofModernArt/collection This includes a README file with additional usage guidelines: https://github.com/MuseumofModernArt/collection/blob/master/README.md In brief, these guidelines outline:
Tate
Tate has made its collection data available as csv files, under CC0 license, on GitHub: https://github.com/tategallery/collection This includes a README file with additional usage guidelines: https://github.com/tategallery/collection/blob/master/README.md In brief, these guidelines outline:
Cooper-Hewitt
The Smithsonian Cooper-Hewitt, National Design Museum has made its collection data available as csv files, under CC0 license, on GitHub: https://github.com/cooperhewitt/collection This includes a README file with additional usage guidelines: https://github.com/cooperhewitt/collection/blob/master/README.md In brief, these guidelines outline:
Notes and references (if not included as hyperlinks)
- [1] For an overview of all Wikimedia projects, see https://wikimediafoundation.org/wiki/Our_projects
- [2] A Wikimedia user is considered 'active' when (s)he makes at least five edits per month. For extensive statistics, see https://stats.wikimedia.org/EN/TablesWikipediaEN.htm
- [3] Google's announcement on deactivating Freebase: https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc
- [4] See a blog post by Dan Cohen from November 2013, available at http://www.dancohen.org/2013/11/26/cc0-by/
- [5] Statistics about Wikidata's editors can be found at http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaWIKIDATA.htm
- [6] “...their most important wish is that online access to museum databases to be provided as quickly as possible, even if the records are imperfect or incomplete.” From http://www.rin.ac.uk/our-work/using-and-accessing-information-resources/discovering-physical-objects-meeting-researchers-