Flemish art collections, Wikidata and Linked Open Data: a whitepaper

Adoration of the Magi (Q5861089) by Peter Paul Rubens (Q5599), collection Royal Museum of Fine Arts Antwerp (Q1471477)

This whitepaper was written in October 2015 for the project Linked Open Data publication with Wikidata, in which several Flemish museums contributed data from their collections to Wikidata. It may therefore be of interest to other cultural institutions that are considering donating their data to Wikidata.

Questions, criticism and suggestions are welcome and can be posted on this page.

This whitepaper was written by Sandra Fauconnier (User:Spinster), Bert Lemmens and Barbara Dierickx (PACKED vzw) and is published under a Creative Commons Attribution-ShareAlike license.

Note from the authors

This whitepaper is the first deliverable of the project Linked Open Data publication with Wikidata (D1. Whitepaper open data management in Wikidata) and was described as follows in the original project proposal:

PACKED vzw and Wikimedia develop a shared vision on how data managers in museums can publish their data on Wikidata and how they can update it at regular intervals. This includes:

  • how to map data to Wikidata;
  • how import and export of data on Wikidata work.

This vision is written down in a whitepaper that is submitted to the project's steering committee. The whitepaper aims to align the goals of Wikimedia, the museums and Wikidata.

The whitepaper was originally published in Dutch and later translated into English, to allow wider dissemination within the Wikimedia community.

Based on this whitepaper, further actions have been planned. They will be carried out in consultation with the Wikimedia volunteers who commit to pursuing these goals.

Summary
This whitepaper consists of three chapters.
  1. First, we briefly introduce the Wikimedia movement and the projects Wikipedia and Wikidata.
  2. Next comes a description of examples of artworks on Wikidata, aimed mainly at museums, the Wikimedia community and society at large.
  3. Finally, a walkthrough explains how data is mapped to the Wikidata data model.

The conclusions of this whitepaper also include a SWOT analysis of using Wikidata to make your data available on the web.

Wikimedia, Wikipedia and Wikidata

This chapter introduces Wikidata in relation to its "sister" projects, including the best known of them: Wikipedia. This information may already be familiar to experienced Wikimedia users.

Wikimedia

 

Wikimedia is a worldwide movement whose mission is to make educational materials freely available to everyone on the planet. Its best-known project is Wikipedia, the free encyclopedia, but about a dozen other projects (such as Wikimedia Commons, Wikidata and the MediaWiki software) belong to the same family.[1]

All Wikimedia projects are run by a community of users (almost all of them volunteers) and are built on the MediaWiki software. All contributions are published under a Creative Commons license that allows them to be freely re-used, modified, copied and distributed, including for commercial purposes.

The various projects of the Wikimedia community support each other and exchange content wherever possible. Wikimedia Commons is a repository of free media files that hosts the images, audio and video files of all Wikimedia projects. Wikidata is a free database that serves as a "data hub" for the various Wikimedia projects.

Wikipedia

Wikipedia is the best-known Wikimedia project. The free encyclopedia was created in 2001 and exists in about 290 languages (as of October 2015). It is a secondary or tertiary source: it summarises and organises information from other sources (themselves secondary or tertiary) and therefore cannot host original research.

In September and October 2015, the English Wikipedia had about 5 million articles, edited by about 30,000 active users.[2]

Wikidata

Wikidata was created in October 2012. It is a free knowledge base that collects data about all areas of human knowledge and is designed to be read by both humans and machines. All data is provided in all languages supported by the Wikimedia projects.

Wikidata's data is even more "free" than the information on the other Wikimedia projects, because it is published under the Creative Commons CC0 license, which allows the data to be re-used as freely as possible (see below for more information). Wikidata's data is explicitly meant to be useful and re-usable by anyone for any purpose, from educational to commercial.

Wikidata is financially supported by, among others, Google. In 2015, Google discontinued its own knowledge base, Freebase.[3] Presumably, the Google Knowledge Graph will also draw on data from Wikidata. But it is not only the large search engines that can use this data: thanks to the CC0 license, any developer can.

 
Articles about cheeses on the French Wikipedia are enriched with infoboxes that draw their data from Wikidata. See for example Pont-l'évêque

Wikipedia, too, is taking its first steps towards re-using Wikidata's data inside articles. Using the Lua programming language, Wikipedia editors can retrieve data from Wikidata and use it in so-called "templates" (a kind of summary table). Adoption varies by language edition, since each community decides for itself whether to import data from Wikidata and which data to import. Some Wikipedias are more open to experimentation than others.

Examples:

 

Developers can extract data from Wikidata through its API and, since mid-2015, can also run SPARQL queries.
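As an illustration of these two access routes, the following Python sketch builds a request URL for the `wbgetentities` action of the Wikidata API and one for the public SPARQL endpoint. The helper names (`entity_url`, `sparql_url`, `fetch_json`), the example query and the User-Agent string are assumptions made for this example and are not part of the original whitepaper.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://www.wikidata.org/w/api.php"
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def entity_url(qid, languages=("en",)):
    """Build a wbgetentities request URL for a single item."""
    params = {
        "action": "wbgetentities",
        "ids": qid,
        "props": "labels|claims",
        "languages": "|".join(languages),
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def sparql_url(query):
    """Build a GET request URL for the SPARQL endpoint, asking for JSON."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


# Example query: paintings (Q3305213) whose creator (P170)
# is Peter Paul Rubens (Q5599).
RUBENS_PAINTINGS = """
SELECT ?painting WHERE {
  ?painting wdt:P31 wd:Q3305213 ;
            wdt:P170 wd:Q5599 .
}
LIMIT 10
"""


def fetch_json(url):
    """Send the request and decode the JSON reply (requires network access)."""
    request = Request(url, headers={"User-Agent": "whitepaper-example/0.1"})
    with urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    print(entity_url("Q5599"))
    print(sparql_url(RUBENS_PAINTINGS))
```

The sketch only prints the two URLs; passing either of them to `fetch_json` would perform the actual request.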

Gli elementi di Wikidata

Wikidata consists of a collection of interconnected items. An item refers to a real-world object (for example a building, an artwork or a person), a concept or an event. Every item has a label (a human-readable title) in at least one language, has its own identification number and contains metadata. Every item has its own page on Wikidata, identified by a unique number preceded by a Q. Some examples of items on Wikidata are:

As of October 2015, Wikidata has about 15 million items. The first items resulted from a mass import of concepts that had at least one corresponding Wikipedia article. Since the end of that first phase, volunteers and bots (automated scripts) have been adding thousands of items every day.

What is the scope of Wikidata? Which items can be added to Wikidata and which cannot? To learn more about Wikidata's guidelines on this, see the section "Notable".

Data on Wikidata

Wikidata started with (open) data imported from the Wikipedias in all languages. Every topic covered by a Wikipedia article received its own item. Metadata about these topics was extracted from the articles' infoboxes, often by bots or scripts, and added to the items as "properties" or "statements".

In addition to this information, other (open) data, including data from external sources, is constantly being added to Wikidata. For example:

  • artists who do not yet have a Wikipedia article but who meet the notability criteria (that is, they are mentioned in relevant publications) or who fulfil a structural need, for example as creators of artworks present on Wikidata. One example is the Dutch artist Klaas Kloosterboer (Q19938879), some of whose artworks are already included on Wikidata;
  • all Rijksmonumenten (protected Dutch monuments) have an item on Wikidata, even though some of them do not yet have a Wikipedia article. One example is the Nederlands Hervormde Kerk in Sprang-Capelle (Dutch Reformed Church (Q17441238)).

The Wikidata community welcomes large-scale data uploads by external institutions. More information can be found on the page Wiki Loves Open Data.

This project page explains what the Wikidata community expects from open data donations, which should ideally be:

  • Free: more specifically, published under a CC0 license, so that the data can be used, re-used and modified without legal obstacles;
  • Notable: about items that are or could be relevant to the Wikimedia projects (see also Wikidata:Notability);
  • Referenced: with sources, so that the data can be verified and several values attributable to different sources can be published side by side;
  • Queryable: so that tools can make publication and maintenance as automated as possible;
  • Editable: like all other content on the Wikimedia projects, which implies that you too must be open to integrating improvements back into your own data;
  • Maintained: not simply donated once and then forgotten. Wikidata is here to build a long-term relationship.

Some of these points are explained in more detail below.

Free

As a museum, you only add metadata to Wikidata for which you have waived all copyright. You do not claim any exclusive right over the use of the data. You do this to lower the barrier to re-using and modifying the data in other applications as far as possible, and to have your data disseminated as widely as possible.

Licenses

 

The data that finds its way to Wikidata should be available under a Creative Commons CC0 license, in order for third parties to freely re-use the data. This implies that anyone can use the data for any purpose, from educational to commercial applications.

CC0 allows anyone to re-use what is published under the license without any kind of attribution. Unlike, for example, a CC-BY license, CC0 does not legally demand attribution: it is not part of the licensing conditions. For those who re-use the data, CC0 is a clear added value. When you combine data from different sources, or enrich or rework it, you might otherwise end up with a quite complex way of giving proper attribution.

Usage guidelines

Yet there are also re-users with the best intentions who would really like to give you attribution for what you made available to them. How do you handle this when you do not legally demand it through a license? One solution lies in the shift from a legal requirement to a social request[4]. You accept that attribution does not have to happen, but you bring in a moral element that encourages it. This can be done by creating usage guidelines and adding them to your data.

 

Europeana was one of the pioneers in creating such usage guidelines, alongside its policy of publishing metadata under CC0. The Europeana Usage Guidelines for Metadata contain the following:

  • Give credit where credit is due: give attribution to who made available the digital material and its metadata information. These organisations play a crucial role in collecting, managing and harmonising data so that they may become widely available and interoperable.
  • Metadata is dynamic: consider using the metadata via the Europeana APIs or by linking to it. The metadata can change (renewals, additions, etc.) and is therefore best retrieved dynamically.
  • Mention your modifications of the metadata and make your modified metadata available under the same terms: don’t claim to be the source of the data if it already comes from another provider.
  • Please note that you use the metadata at your own risk: the information may be incomplete, since Europeana collects metadata that was delivered to it by third parties.
 

Dan Cohen, the Executive Director of the Digital Public Library of America, referred to this kind of attribution request as follows:

I have been calling this implied or ethical attribution. Or, if you like short and snappy symbols, think of it as CC0 (+BY) rather than CC-BY (or ODB-BY).

He also notes that a cynic could argue that people with bad intentions may do bad things with all that open data. But that is an intrinsic characteristic of the web: whatever license you apply to your information, someone with bad intentions will take it anyway. We worry so much about possible misuse that the use that is in line with what we hope to achieve goes almost unnoticed. In the DPLA's experience, many software developers who work with its data give proper attribution of their own accord, following the DPLA Data Use Best Practices, even though the CC0 license does not force them to do so.

I think CC0 (+BY) is the best of both worlds: the data in a free-flowing environment that enables creativity and reuse, with attribution still maintained by the vast majority of people who consider themselves part of a social contract. – Dan Cohen

The DPLA and Europeana are not alone in this way of working: others have followed in their tracks. Tate recently opened up metadata on about 70,000 works of art and 3,500 artists. It did so under a CC0 license, but next to the license declaration a user also finds the heading ‘Usage guidelines’. The American institutions MoMA and Cooper-Hewitt followed the same approach. (See the annex of this whitepaper for a summary.)

In the project Linked Open Data publication with Wikidata, such usage guidelines are also an integral part of the Data Usage Agreement signed by the project partners. Although these guidelines are non-binding, they will be published alongside the different published datasets. The minimal usage guidelines proposed in this project make clear:

  1. that the material only contains metadata (no images);
  2. that attribution of the collection of origin is appreciated;
  3. that deceptive and irresponsible use is not appreciated;
  4. that changes and improvements to the material may occur and may be integrated by the project partners;
  5. that (re)use of the material happens at the user's own risk.

Depending on an institution's own intentions, these guidelines may of course be further specified or extended.

Notable

  OK. Saint Jerome (Q2566840) by Hieronymus Bosch (Q130531), collection Museum of Fine Arts Ghent (MSK) (Q2365880), has six Wikipedia articles and is notable on Wikidata under goal #1 and notability criterion #1.

  OK. Shipwreck (Q20020184) by Joseph Vernet (Q315819), collection Groeningemuseum (Q1948674), has no Wikipedia articles, but it is a painting by a notable painter and is part of a notable museum collection. This artwork belongs on Wikidata according to its goal #2 and notability criterion #2.

Which information belongs on Wikidata, which doesn't? In its initial phase, Wikidata has the following two goals:

  1. to centralize interlanguage links across Wikimedia projects
  2. and to serve as a general knowledge base for the world at large.

An item is acceptable on Wikidata if and only if it fulfills at least one of these two goals, that is if it meets at least one of the criteria below:

  1. It contains at least one valid sitelink to a page on Wikipedia, Wikivoyage, Wikisource, Wikiquote, Wikinews, Wikibooks, Wikidata or Wikimedia Commons.
  2. It refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references.
  3. It fulfills some structural need, for example: it is needed to make statements made in other items more useful.

The data provided in the project Linked Open Data publication with Wikidata usually falls under goal #2 and criterion #2. In a few cases, Wikipedia articles already exist about artworks in the contributing collections, which makes these fall under goal #1 and criterion #1. The same principle applies to the artists who created the artworks in the contributing collections.

As of the end of 2015, no significant problems have emerged in terms of notability of unique artworks (paintings, drawings, installations, unique sculptures) from public collections, described in art historical literature and/or whose creator is mentioned in reputable sources.

Notability of items produced in series is under discussion. Individual copies of a mass-produced publication (like a book) don't belong on Wikidata. As of October 2015, there is no community consensus or 'best practice' yet on describing individual prints of, for example, engravings or lithographs, of which different copies may exist in various art collections.

Individual everyday objects in art collections are usually not relevant or notable enough for Wikidata. An exception can be made if it is a very special object, described individually in independent and reputable sources. A good example is the Saliera (Cellini Salt Cellar (Q697208)) by Benvenuto Cellini, in the collection of the Kunsthistorisches Museum Wien. This object has a Wikipedia article in many languages and is covered in many publications.

Referenced

Datasets in the project Linked Open Data publication with Wikidata have a number of (persistent) URI fields. These URIs refer to sources for a number of statements, such as the creator of an artwork, its date and inventory number.

Queryable

Datasets in the project Linked Open Data publication with Wikidata are made available for upload/import as static (csv) files. In the upload phase, they can be queried and edited/cleaned by Wikidata volunteers. Ideally, such datasets are made available publicly and permanently, like MoMA and Tate have done via GitHub, and/or are queryable through an API, like Europeana. The participating collections of the project Linked Open Data publication with Wikidata plan to build their own data hub which will also make this possible.
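To make the link between a static CSV file and referenced statements more concrete, here is a minimal Python sketch that turns CSV rows into property/value pairs, each carrying the persistent URI of the source record as its reference. The column names, the inventory numbers and the example.org URIs are hypothetical and do not reflect the actual layout of the project datasets. The property IDs are the Wikidata properties for creator (P170), inventory number (P217) and collection (P195).

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical CSV column names mapped to Wikidata property IDs.
PROPERTY_FOR_COLUMN = {
    "creator_qid": "P170",       # creator
    "inventory_number": "P217",  # inventory number
    "collection_qid": "P195",    # collection
}


@dataclass(frozen=True)
class Statement:
    subject: str        # title of the artwork in the source dataset
    property_id: str    # Wikidata property
    value: str          # item ID or literal value
    reference_url: str  # persistent URI that backs the statement


def statements_from_csv(text):
    """Turn every non-empty mapped cell into one referenced statement."""
    statements = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, property_id in PROPERTY_FOR_COLUMN.items():
            value = (row.get(column) or "").strip()
            if value:  # empty cells produce no statement
                statements.append(
                    Statement(row["title"].strip(), property_id, value, row["uri"].strip())
                )
    return statements


# Made-up sample rows; only the Q-numbers in the first row come from this whitepaper.
SAMPLE = """title,creator_qid,inventory_number,collection_qid,uri
Shipwreck,Q315819,INV-0001,Q1948674,https://example.org/artwork/1
Untitled,,INV-0002,Q1948674,https://example.org/artwork/2
"""

if __name__ == "__main__":
    for statement in statements_from_csv(SAMPLE):
        print(statement)
```

Keeping the reference URI on every statement, instead of once per row, mirrors the idea that each individual claim on Wikidata can be verified against its source.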

Editable

Wikidata, like any Wikimedia project, is filled and maintained by a community of (mainly) volunteers. A data donor maintains and controls its own data in its own databases and platforms/websites. After import to a lively platform like Wikidata, the data will be enriched and edited there by volunteers and bots. Data donors must be aware of this, and must be open to additions and improvements by external parties on Wikidata.

Maintained

What is the relationship between the work of volunteers on Wikidata and the carefully compiled and controlled content by experts? And how does the Wikidata community find partners that want to engage in long-term management of their information on Wikidata?

 
Wikidata coders and volunteers at work during the Wikidata & Culture hackathon held in Amsterdam in 2014.

Who edits (art and culture on) Wikidata?

In September 2015, Wikidata had 25,917 registered users, of which 6,126 can be considered active.[5] These users edit Wikidata mainly in their free time. According to their areas of interest, Wikidata volunteers organize themselves (among other things) in so-called WikiProjects. In the area of the visual arts, the following WikiProjects are active:

  • WikiProject Visual Arts – 14 active volunteers in October 2015 – discussions on 'best practices' for the description of visual art on Wikidata
  • WikiProject Sum of All Paintings – 24 active volunteers in October 2015 – strives to create a Wikidata item for every notable painting in the world

Most volunteers of the cultural WikiProjects are passionate, well-read culture and art lovers; some work for cultural institutions. Several edit Wikidata both by hand and with bots they have written.

A typical Wikimedia volunteer keeps track of watchlists of articles to which he/she actively contributes. With these watchlists, a user can keep an eye on recent edits in his/her area of interest and can react promptly if needed. Most Wikimedia projects, including Wikidata, have a specific workflow and dedicated volunteers who focus on countering vandalism. Nonsensical edits are typically reverted within a few minutes.

Museums as authorities

On Wikidata, museums are considered authorities on their own collections. Wikidata strives to have reliable sources for all statements; references to reputable (online) publications by museums are very suitable for this.

The data donation in the project Linked Open Data publication with Wikidata contains a number of such references: persistent links to artwork descriptions on the participating museums' websites. These references are included in the upload to Wikidata. Of course, after the upload volunteers can add other references to these statements as well.

Contradictory and 'wrong' information on Wikidata?

Contradictory statements can find a place on Wikidata. When various (reputable) sources contradict each other (for instance in the attribution of an artwork or the birth date of a person), both statements – with their own sources – can be included. If one statement is considered 'the most up-to-date', it is possible to give it a 'preferred' status. For historical and research purposes, it is very interesting to maintain (and not delete!) an older, 'deprecated' statement that might have been considered 'true' in the past.

It must be emphasized that references to sources are crucial. If a volunteer or an expert considers a specific statement 'true' or 'false', he/she must be able to support this claim with an independent, trustworthy source.
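The interplay of contradictory statements, sources and ranks can be sketched in a few lines of Python. This is a simplified model, not Wikidata's actual implementation: the `Claim` class, the `best_claims` helper and the sample attributions are invented for illustration. The selection rule (use 'preferred' statements if any exist, otherwise the 'normal' ones, and never the 'deprecated' ones) reflects how ranks are commonly interpreted by tools that read Wikidata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    value: str            # e.g. the attributed creator
    source: str           # reference supporting this value
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def best_claims(claims):
    """Return the preferred claims if there are any, else the normal ones.

    Deprecated claims are never returned, but they stay in the input list,
    so the older view remains available for historical research.
    """
    for rank in ("preferred", "normal"):
        selected = [claim for claim in claims if claim.rank == rank]
        if selected:
            return selected
    return []


# Hypothetical example: two sources disagree on the attribution of an artwork.
attribution = [
    Claim("Artist A", "museum catalogue (older edition)", rank="deprecated"),
    Claim("Artist B", "museum catalogue (current edition)", rank="preferred"),
    Claim("Workshop of Artist B", "independent monograph"),
]

if __name__ == "__main__":
    for claim in best_claims(attribution):
        print(claim.value, "according to", claim.source)
```

All three claims stay stored with their own source; only the choice of which one to show by default changes with the rank.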

Contributing incomplete and unchecked data?

Perfect is the enemy of good, said Voltaire (or Montesquieu?). Most museum collection websites show only a selection of the whole collection: only those items that have been approved by curators or other museum staff. These items have been thoroughly checked and are considered good enough for publication.

However, collection management databases usually contain a multitude of information that has not been 'cleaned' or checked yet. Is it acceptable (or even preferable) to also publish 'messy', potentially unclean and incomplete data online, and to include this in a data donation? MoMA, for instance, has decided to do this when publishing its collection data under a CC0 license on GitHub. Sufficiently 'clean' and checked data is marked 'curator approved' in the dataset; other data is included too, but without this notice. Fiona Romeo, MoMA's Director of Digital Content & Strategy, states that this decision is inspired by a proven need from researchers[6]:

...a bigger cultural shift lies behind the records that are marked “not curator approved.” More than half of the records included in this data release have incomplete information and may contain errors. There is established evidence that researchers want online access to collection records as quickly as possible, “whatever the perceived imperfections or gaps in the records.” We therefore decided that we would share this work in progress in order to provide a more comprehensive view of MoMA’s collection.

 
Peter Paul Rubens (Q5599) has identifiers on Wikidata for the following authority files: Library of Congress Authors, VIAF, ISNI, GND, Freebase, ULAN, Bibliothèque Nationale de France, CANTIC, BBC Your Paintings, RKDartists, Sandrart.net, Oxford Biography Index, NTA, Digitale Bibliotheek der Nederlandse Letteren, EMLO, genealogics.org, Smithsonian American Art Museum, National Portrait Gallery, Web Gallery of Art, BALat, SUDOC, Artsy, Open Library, NGA, NNDB.

Also literally: 'authority control' on Wikidata

Wikidata is a knowledge base that wants to cover the whole world. In October 2015, Wikidata contained, for instance, almost 3 million people. In order to clearly identify and distinguish all items, and in order to embed Wikidata as a data hub among other information sources, authority control is a central activity for many Wikidata volunteers.

Wikidata items are linked, as much as possible, with reputable external authority databases. An up-to-date overview of the many authority properties on Wikidata can be found on Wikidata:List of properties/Generic#Authority control.

In the visual arts, the following selection of authority databases is (among many others) referred to on Wikidata:

  • People and organisations: ULAN, RKDartists, VIAF
  • Places: Thesaurus of Geographic Names
  • Concepts/keywords: Art and Architecture Thesaurus

If external, donated datasets (like the datasets in Linked Open Data publication with Wikidata) are already matched to external authority databases (for example, artist names that are already linked to their identifiers in RKDartists), this helps to find exactly the right people on Wikidata.

It's important to note that, for the sake of clarity, concepts are (almost) always linked directly to other Wikidata items; the connection to an external authority database is only made at a second level, on the linked item.

Artworks, for instance, are described as follows:

<item (artwork)> creator (P170) <item (person)> RKDartists ID (P650) identifier in RKDartists
<item (artwork)> depicts (P180) <item> Art & Architecture Thesaurus ID (P1014) identifier in the Art and Architecture Thesaurus
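The two-level pattern above can be sketched with a small in-memory model in Python. The dictionary layout and the `external_ids` helper are assumptions made for this example, and the RKDartists value is a placeholder, not a real identifier. The item and property numbers are the ones used in this whitepaper.

```python
# Simplified in-memory model of two Wikidata items.
# "RKD-PLACEHOLDER" stands in for a real RKDartists identifier.
ITEMS = {
    "Q5861089": {
        "label": "Adoration of the Magi",
        "claims": {"P170": ["Q5599"]},  # creator -> item for the person
    },
    "Q5599": {
        "label": "Peter Paul Rubens",
        "claims": {"P650": ["RKD-PLACEHOLDER"]},  # RKDartists ID
    },
}


def external_ids(item_id, link_property, id_property, items=ITEMS):
    """Follow link_property to the linked items, then read id_property there."""
    found = []
    for linked_id in items[item_id]["claims"].get(link_property, []):
        linked_item = items.get(linked_id, {})
        found.extend(linked_item.get("claims", {}).get(id_property, []))
    return found


if __name__ == "__main__":
    # The artwork has no RKDartists ID of its own; its creator does.
    print(external_ids("Q5861089", "P170", "P650"))
```

The authority identifier is stored once, on the item for the person, and every artwork that links to that item benefits from it.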

Wikidata and new terminology

Creating and maintaining authority databases is time-intensive and often requires long discussions between publishers and experts (such as between Getty, RKD and the international cultural sector for the maintenance of the Art and Architecture Thesaurus). Wikidata, by contrast, is quick to react to new developments: new terminology emerges quickly, for instance as soon as a Wikipedia article is written about a topic for the first time. For example, the concept internet art (Q1569950) is not present yet in the Art and Architecture Thesaurus, but does have an item on Wikidata.

Conclusion

When data from cultural institutions is uploaded to Wikidata, it will be edited there by Wikidata volunteers. Therefore, Wikidata should be considered an external and open platform for a dialogue about museum objects and cultural heritage. In that regard, Wikidata is complementary to – and does not replace – the institutions' own, internally managed collection databases and websites.

The dialogue on Wikidata consists of enrichment, corrections, and the juxtaposition of different opinions. The Wikidata community expects a certain commitment from the museum and heritage community to effectively participate in that dialogue. Ideally this also involves regular updates of the data, for instance when new items have been added to the collection.

How do both the Wikidata community and the museum/heritage community benefit from such a dialogue? The next chapter investigates costs and benefits for Wikimedia projects, for museums and for society in the broad sense.

Costs and benefits of contributing to Wikidata

This chapter presents an analysis of the costs and benefits of using Wikidata to publish information about your artworks as open data. The analysis is aimed at museums, the Wikimedia community and society at large.

For museums / art collections

The benefits of a data donation to Wikidata for museums were also explained in a screencast specifically made for the project Linked Open Data publication with Wikidata. (Screencast is in Dutch.)

Benefits

Low cost

The Wikidata platform is a cheap, solid infrastructure to make data available for re-use on the web. The platform offers a robust interface and API to manage data and to integrate it in other applications.

Museums save on the development and technical expertise to create and manage a similar platform in-house.

Public outreach

Wikidata and the related Wikimedia projects have a wide public reach. The Wikidata platform itself has 20,500 active users. This is a very high number, given that it is a specialised and technically experienced crowd who often develop applications themselves. Because Wikidata is a well-known and open platform, developers from outside Wikimedia also easily find their way to it (cf. the recent support by Google).

Through Wikidata, museums may find access to a vast, international and very diverse audience. Through Wikidata, the collection reaches a much wider audience than a museum can realise through its own educational and communication departments.

Creativity

Beyond pure outreach, the 20,500 active Wikidata users offer something extra: the capacity to open new perspectives on the collection. Wikidata reaches a specific public of ‘digital natives’ who mix and process data in web applications very spontaneously and creatively. This is also a young group that could help museums to make the necessary translation towards younger people.

Through Wikidata, museums find access to a precious reservoir of creativity that can help them to communicate (about) collections in an efficient way to this growing group of ‘digital natives’.

Context

Museums’ data doesn’t end up in Wikidata in a specialised ‘silo’, but in a knowledge base that covers the entire world. This means that the data is placed in a wide and rich context. Wikidata also contains metadata of (and links to authorities about) subjects that are depicted in artworks, like historical events and famous persons. Wikidata is a data hub of external authority and terminology sources like VIAF and the Art and Architecture Thesaurus. Lastly, artworks become part of artists’ oeuvres across the boundaries of individual collections.

Costs

Loss of data exclusivity

The data that you publish on Wikidata is released for re-use under a CC0 license. Museums distance themselves explicitly from any form of exclusivity over the data they publish to the Wikidata platform. Once the data has been published under CC0, this license is irrevocable.

Through Wikidata, museums donate data to society. By doing so, they give up any revenue model based on making data available to third parties for re-use. Specifically, this means for example that a museum cannot earn revenue by selling licenses on the collection data. Museums also cannot claim a share of the profit that third parties make by re-using the data in a product.

Time investment for updates

Museums that publish data on Wikidata are expected to commit to updating that data regularly and to engage in dialogue with other (non-professional) Wikidata users about the correctness and completeness of the data.

This requires data managers in the museums to become familiar with the Wikidata interfaces through which they can manage data, and to engage as Wikipedians. This engagement is voluntary, but essential in order to facilitate the re-use of the data.

Time investment for data cleanup

Data from collection management systems needs to be cleaned and normalised before it can be uploaded to the Wikidata platform. With the tools that are currently available to do so, this requires a considerable amount of manual work – including exports, linking to authorities, normalisation and mapping of data, etc.

This requires the data manager to have specific expertise in transferring data from one system to another, and familiarity with specific tools for data cleaning.
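As a small illustration of what normalisation and linking to authorities can involve, the following Python sketch tidies artist names exported from a collection management system and matches them against a lookup table. The functions `normalise_name` and `link_to_authority`, the "Surname, Forename" convention and the lookup table are assumptions made for this example; real cleanup work typically relies on dedicated tools and far more elaborate rules.

```python
import re
import unicodedata


def normalise_name(raw):
    """Normalise Unicode, collapse whitespace and turn 'Surname, Forename'
    into 'Forename Surname'."""
    text = unicodedata.normalize("NFC", raw)
    text = re.sub(r"\s+", " ", text).strip()
    if text.count(",") == 1:
        surname, forename = (part.strip() for part in text.split(","))
        if surname and forename:
            text = f"{forename} {surname}"
    return text


def link_to_authority(names, authority):
    """Match normalised names against a lookup table keyed by casefolded name.

    Returns the matches and a list of names left over for manual review.
    """
    matched, unmatched = {}, []
    for raw in names:
        name = normalise_name(raw)
        key = name.casefold()
        if key in authority:
            matched[name] = authority[key]
        elif name not in unmatched:
            unmatched.append(name)
    return matched, unmatched


if __name__ == "__main__":
    # Hypothetical lookup table: casefolded name -> Wikidata item.
    lookup = {"peter paul rubens": "Q5599"}
    exported = ["  Rubens,   Peter Paul ", "Unknown master", "Unknown  master"]
    print(link_to_authority(exported, lookup))
```

Names that cannot be matched automatically are collected separately, which reflects the considerable amount of manual work mentioned above.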

The project Linked Open Data publication with Wikidata is part of a broader strategy to renew the digital infrastructure of the Flemish art museums. This group of museums has already gone through an intensive trajectory in which data was cleaned, normalised, enriched and identified with persistent URIs. Because of this, a large part of this cost is already covered and the data can be uploaded to the Wikidata platform with minimal adaptations.

For Wikimedia

The Wikimedia community regularly works together with social and cultural organisations, from UNESCO and the British Library to educational institutions and museums all over the world. Collaboration with cultural partners happens under the umbrella of the GLAMwiki project (Galleries/Libraries/Archives/Museums).

Liam Wyatt, the first Wikipedian in Residence (British Museum, 2010): “We’re doing the same thing, for the same reason, for the same people, in the same medium. Let’s do it together.”

A data donation from art collections brings the following benefits and costs for Wikimedia and Wikidata:

Benefits

Mission-compliant content

Institutions donate data that is compliant with the mission of the Wikimedia movement and that falls within the notability criteria of Wikidata.

High-quality data

Institutions donate data that is carefully edited, of high quality, and contains references to reliable sources.

Learning opportunity

A data donation project like Linked Open Data publication with Wikidata offers a learning opportunity for the Wikidata community, in the areas of collaboration with experts from the cultural sector, data modelling, import and re-use.

Stepping stone for inclusion of more free content

The donated data will hopefully encourage the participating museums/collections to enrich it and to add more free information (like images and other media).

Costs

Storage

The donated data takes up server space of the Wikimedia Foundation and, therefore, generates a certain cost in terms of storage, maintenance and energy use.

Time

The donated data is edited, managed and maintained by volunteers. This requires a significant amount of goodwill and investment of people's free time.

Need for new tools

The donation of data creates a need for more and better tools (e.g. for mass uploading, measuring and updating data); there may be no budget or time for this in the short term.

For society (funding body, commissioning organisation, the public, taxpayers...)

Benefits

Visibility

A data donation increases the findability, visibility and accessibility of the heritage that is preserved in (Flemish) museums (in accordance with the mission of the Flemish Art Collection).

Low cost

The use of an open platform is also cost-efficient in terms of infrastructure for society as a whole.

Open data

Donating data to Wikidata is a tangible implementation of the European PSI directive. Data produced by institutions financed with public money is made available as open data, in a sustainable way. Museums can use an additional instrument, the Wikidata API, to re-use the data in their own daily work.

Wikipedia articles

The donated data can serve as a good source for article writers on Wikipedia. In this way, the public also benefits: writers gain a source of information, and a worldwide audience can consult it. The same is true for developers who use the Wikidata API.

An organisation like the Flemish Art Collection wants to present Flemish art heritage to an international audience. Through a platform like Wikidata, information on such works ends up in Wikipedia articles in different languages. Because an artwork has only one Wikidata item to fall back on, every article writer draws on this same record, which contains authoritative information.

Costs

Investment of taxpayers' money

The data is produced with public budgets.

Roadmap for data delivery and upload

This chapter proposes a minimal profile for an artwork on Wikidata. Which CC0 data is needed to create a minimal but comprehensive profile of an artwork? And how is this data recorded and described?

Next, we briefly outline the method of delivery and how the delivered data is integrated in Wikidata by volunteers (October 2015).

Crosswalk

A good example artwork item is A reading by Emile Verhaeren (Q21012032), a painting by Théo van Rysselberghe.

Metadata to deliver in dataset.
  • Black = required
  • Blue = optional

(Please note: data that was not provided in the original dataset may be added by volunteers later.)

Each entry below lists the metadata to deliver, where it is stored in Wikidata, and remarks.

Metadata: Title of the work, in at least one language
In Wikidata: Label of the Wikidata item (in the language(s) provided)
Remarks: Titles may be provided in more than one language, if the dataset clearly indicates which language(s). Alternative titles are welcome too; these are stored as aliases in Wikidata and improve the findability of each item.

Metadata: Creator(s) of the work
In Wikidata: Property creator (P170)
Remarks: Preferably formatted as Firstname Lastname, or with first name and last name in separate fields. Formatting as Lastname, Firstname is less clear. The uploader must 'match' all creators in Wikidata, i.e. the exact, correct person with his/her Q number must be found on Wikidata. It therefore helps if the original dataset already contains a match with Wikidata. Wikidata also stores VIAF, ULAN and RKDartists identifiers, through which artists can be found as well. Providing (a selection of) these IDs in the original dataset is thus also very helpful.

Metadata: Type of object (what kind of artwork is it?)
In Wikidata: Property instance of (P31)
Remarks: The uploader must 'match' all object types in Wikidata, i.e. the exact, correct concept with its Q number must be found on Wikidata. It therefore helps if the original dataset already contains a match with Wikidata. Wikidata also stores AAT identifiers, through which types and genres of artworks can be found as well. Providing (a selection of) these IDs in the original dataset is thus also very helpful.

Metadata: Collecting institution
In Wikidata: Property collection (P195)
Remarks: The uploader must 'match' all organisations in Wikidata, i.e. the exact, correct organisation with its Q number must be found on Wikidata. It therefore helps if the original dataset already contains a match with Wikidata. Wikidata also stores ISIL identifiers, through which organisations can be found as well. Providing (a selection of) these IDs in the original dataset is thus also very helpful.

Metadata: Inventory number in this collection
In Wikidata: Property inventory number (P217)

Metadata: Date (if known)
In Wikidata: Property inception (P571)
Remarks: Wikidata only supports precise dates that coincide with
  • ...
  • day
  • month
  • year
  • decade
  • century
  • millennium
  • ...
Dates like 'circa 1856' and 'between 1574 and 1603' can't be expressed in Wikidata. Approximations will be used in a data import.

Metadata: URL / URI
In Wikidata:
  1. As a reference for property inventory number (P217) and/or inception (P571) and/or creator (P170) and/or collection (P195)
  2. Property described at URL (P973)
Remarks: Preferably persistent / a permalink. A URL that refers to more information about the artwork.

Metadata: Image(s)
In Wikidata: Property image (P18)
Remarks: Like all other Wikimedia projects, Wikidata only includes (links to) images and media that are available under a free license (public domain, CC-BY, CC-BY-SA) and that are uploaded to the media bank Wikimedia Commons.

Metadata: What is depicted on the artwork?
In Wikidata: Property depicts (P180)

Metadata: Location of the artwork
In Wikidata: Property location (P276)
Remarks: In most cases the same as the collecting institution, but may be different (e.g. in case of long-term loan, art in public space...)

Metadata: Material
In Wikidata: Property made from material (P186)

Metadata: Genre
In Wikidata: Property genre (P136)

Metadata: Art movement
In Wikidata: Property movement (P135)

Metadata: Width
In Wikidata: Property width (P2049)

Metadata: Height
In Wikidata: Property height (P2048)

Metadata: Weight
In Wikidata: Property mass (P2067)
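
The crosswalk above can be read as a simple lookup from dataset fields to Wikidata properties. The following Python sketch illustrates that reading; it is not the script used in the project. The dataset field names, the sample record and the rule for approximating dates (take the first four-digit year and store it with year precision) are assumptions made for illustration. The property IDs come from the crosswalk, and the time value follows the general shape of Wikidata's time datatype, where precision 9 means "year".

```python
import re

# Property IDs are taken from the crosswalk; the dataset field names on the
# left are hypothetical and would follow the donor's own export.
CROSSWALK = {
    "creator": "P170",
    "object_type": "P31",
    "collection": "P195",
    "inventory_number": "P217",
    "date": "P571",
    "described_at_url": "P973",
    "image": "P18",
    "depicts": "P180",
    "location": "P276",
    "material": "P186",
    "genre": "P136",
    "movement": "P135",
    "width": "P2049",
    "height": "P2048",
    "weight": "P2067",
}

# Item for the proleptic Gregorian calendar, used in Wikidata time values.
CALENDAR = "http://www.wikidata.org/entity/Q1985727"
YEAR_PRECISION = 9  # Wikidata time precision: 9 = year


def year_value(text):
    """Approximate a free-text date by its first four-digit year (assumed rule).

    Returns a time value with year precision, or None if no year is found.
    """
    match = re.search(r"\b(\d{4})\b", text)
    if match is None:
        return None
    return {
        "time": f"+{match.group(1)}-00-00T00:00:00Z",
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": YEAR_PRECISION,
        "calendarmodel": CALENDAR,
    }


def to_statements(record):
    """Return (labels, statements) for one artwork record.

    Titles become labels keyed by language code. Every other non-empty field
    listed in CROSSWALK becomes a (property, value) pair. Values are kept as
    delivered; matching names to Q numbers is a separate step.
    """
    labels = dict(record.get("titles", {}))
    statements = []
    for field, prop in CROSSWALK.items():
        value = record.get(field)
        if not value:
            continue  # optional metadata that was not delivered
        if field == "date":
            value = year_value(value)
            if value is None:
                continue
        statements.append((prop, value))
    return labels, statements


# Entirely hypothetical record, for illustration only.
record = {
    "titles": {"en": "Example painting"},
    "creator": "Firstname Lastname",
    "inventory_number": "EXAMPLE-001",
    "date": "circa 1856",
}
labels, statements = to_statements(record)
```

Note that the 'circa' qualifier is lost in this approximation, which is exactly the limitation described in the remarks on dates.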

Data delivery and upload method

At the time of writing this whitepaper (October 2015), no tools exist yet for easy/straightforward mass upload of external data to Wikidata.

At this point, upload of external data is performed by an experienced volunteer. He/she usually uploads the data with a custom script (a 'bot'). Such an upload bot can handle data in many different formats. The most important condition is a clear and logical structure in the dataset.

Among others, the following delivery formats are suitable. If a data donor has any questions or doubts, he/she is advised to contact the uploading volunteer.

  • CSV, TSV or otherwise 'delimited' text file
  • Excel file, Google Sheet, OpenOffice spreadsheet...
  • XML or RDF
  • a Microsoft Access export (though a 'flat' file is preferred)
  • a publicly accessible API

The order of fields/metadata in the dataset is not important.
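
As an illustration of why field order does not matter for a delimited text file, here is a minimal Python sketch that reads rows by column header rather than by position. The column names, the semicolon delimiter and the sample values are hypothetical; a real dataset would use the donor's own headers.

```python
import csv
import io

# Hypothetical sample; column names and values are illustrative only.
SAMPLE = """inventory_number;title_en;creator
EX-001;Example title;Firstname Lastname
"""


def read_records(handle, delimiter=","):
    """Yield one dict per row, keyed by column header.

    Whitespace is stripped and empty cells are dropped, so optional
    metadata that was not delivered simply does not appear in the record.
    """
    for row in csv.DictReader(handle, delimiter=delimiter):
        yield {
            key.strip(): value.strip()
            for key, value in row.items()
            if key and value and value.strip()
        }


records = list(read_records(io.StringIO(SAMPLE), delimiter=";"))
```

Because each record is keyed by header, a file with the same columns in a different order yields the same records.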

After receiving the dataset, the uploading volunteer and his/her bot will

  • match people, organisations, concepts in the datasets with Wikidata (i.e. look up the corresponding Q items)
  • create missing people, organisations and concepts on Wikidata
  • check if any artworks in the dataset already exist on Wikidata and if so, make sure that they are not duplicated during the upload
  • add each new artwork, one by one, as a new Wikidata item with its own Q number; all delivered metadata will be added as properties, according to the principles outlined in the crosswalk above
  • add persistent links / URIs to/as properties and references where relevant
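
The matching and duplicate-check steps above can be sketched as follows. This is a simplified, offline illustration, not the volunteer's actual bot: the function name `plan_upload`, the in-memory lookup table standing in for a Wikidata search, the use of (collection, inventory number) as the duplicate key, and the inventory numbers are all assumptions. The Q numbers are the ones mentioned on this page.

```python
def plan_upload(records, known_entities, existing_artworks):
    """Sort records into new items, duplicates and records needing matching.

    known_entities: dict mapping a name to a Q number (stand-in for a lookup
        on Wikidata).
    existing_artworks: set of (collection Q number, inventory number) pairs
        for artworks already on Wikidata (assumed duplicate key).
    """
    seen = set(existing_artworks)  # copy, so the caller's set is not modified
    to_create, duplicates, unmatched = [], [], []
    for record in records:
        creator_q = known_entities.get(record["creator"])
        collection_q = known_entities.get(record["collection"])
        if creator_q is None or collection_q is None:
            # A person or organisation must first be found or created.
            unmatched.append(record)
            continue
        key = (collection_q, record["inventory_number"])
        if key in seen:
            duplicates.append(record)
            continue
        seen.add(key)
        to_create.append({**record, "creator": creator_q, "collection": collection_q})
    return to_create, duplicates, unmatched


known = {
    "Peter Paul Rubens": "Q5599",
    "Royal Museum of Fine Arts Antwerp": "Q1471477",
}
museum = "Royal Museum of Fine Arts Antwerp"
# Inventory numbers and the second and third records are hypothetical.
dataset = [
    {"title": "Adoration of the Magi", "creator": "Peter Paul Rubens",
     "collection": museum, "inventory_number": "EX-001"},
    {"title": "Example work", "creator": "Peter Paul Rubens",
     "collection": museum, "inventory_number": "EX-002"},
    {"title": "Another example", "creator": "Unknown Painter",
     "collection": museum, "inventory_number": "EX-003"},
]
already_on_wikidata = {("Q1471477", "EX-001")}

to_create, duplicates, unmatched = plan_upload(dataset, known, already_on_wikidata)
```

In this sketch, records in `unmatched` correspond to the people, organisations or concepts that still have to be looked up or created before the artwork itself can be added.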

Maintenance of the data, manual changes and updates, and RDF extraction will be covered in this project's handbook (Deliverable 4, December 2015).

Conclusion

Wikidata is still a young project. It was launched in 2012 and is continuously in development, both in terms of technology and data modelling.

In October 2015, many questions and issues remain open, such as

  • data modelling of artworks produced in series
  • precise and correct dates for artworks
  • tools for import, statistics, maintenance and mutual updates of donated data

Advantages of early participation in Wikidata:

  • A large data donation places the issues above higher on the agenda of the Wikidata community
  • Practical experiences and arguments of early data donors can influence future developments
  • Earlier donated data will, in case of changes to Wikidata's data model, be updated towards the new situation, together with all other data on Wikidata.

SWOT analysis of data donation to Wikidata

Strengths
  • Reaching larger audiences
  • Save costs on infrastructure
  • Dialogue leads to enrichment of information
  • New applications are made possible

Weaknesses
  • Lack of time from data managers and volunteers
  • Lack of functionality of Wikidata's UI
  • Lack of appropriate tools
  • Imperfect data model on Wikidata

Opportunities
  • Dialogue on Wikidata leads to involvement with the collection
  • Enabling and monitoring re-use
  • Enabling contributions on other Wikimedia projects

Threats
  • Loss of exclusive control on data
  • Loss of interest from data managers and volunteers

Appendix: Usage guidelines

MoMA

The Museum of Modern Art has made its collection data available as a CSV file, under CC0 license, on GitHub: https://github.com/MuseumofModernArt/collection

This includes a README file with additional usage guidelines: https://github.com/MuseumofModernArt/collection/blob/master/README.md

In brief, these guidelines outline:

  • Images not included
  • Research in progress
  • Give attribution to MoMA
  • Do not misrepresent the dataset

Tate

Tate has made its collection data available as CSV files, under CC0 license, on GitHub: https://github.com/tategallery/collection

This includes a README file with additional usage guidelines: https://github.com/tategallery/collection/blob/master/README.md

In brief, these guidelines outline:

  • Give attribution to Tate
  • Metadata is dynamic
  • Mention your modifications of the Metadata and contribute your modified Metadata back
  • Be responsible

Cooper-Hewitt

The Smithsonian Cooper-Hewitt, National Design Museum has made its collection data available as CSV files, under CC0 license, on GitHub: https://github.com/cooperhewitt/collection

This includes a README file with additional usage guidelines: https://github.com/cooperhewitt/collection/blob/master/README.md

In brief, these guidelines outline:

  • Give credit where credit is due. Give attribution to Smithsonian Cooper-Hewitt, National Design Museum
  • Metadata is dynamic
  • Mention your modifications of the Metadata and contribute your modified Metadata back
  • Be responsible
  • Ensure that you do not mislead others or misrepresent the Metadata or its sources
  • Please note that you use the Metadata at your own risk

Notes and references

  1. For more information on the Wikimedia projects, see https://wikimediafoundation.org/wiki/Our_projects
  2. A Wikimedia user is considered "active" when they make at least five edits per month. For more detailed statistics, see https://stats.wikimedia.org/EN/TablesWikipediaEN.htm
  3. Google's announcement of the shutdown of Freebase: https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc
  4. See a blog post by Dan Cohen from November 2013, available at http://www.dancohen.org/2013/11/26/cc0-by/
  5. Statistics about Wikidata's editors can be found at http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaWIKIDATA.htm
  6. “...their most important wish is that online access to museum databases to be provided as quickly as possible, even if the records are imperfect or incomplete.” From http://www.rin.ac.uk/our-work/using-and-accessing-information-resources/discovering-physical-objects-meeting-researchers-