Wikidata:Property proposal/original catalog description

original catalog description

Originally proposed at Wikidata:Property proposal/Commons

Description: content of field labeled "description" in original institutional catalog metadata
Represents: description (Q3024326)
Data type: Monolingual text
Domain: Wikimedia Commons media entities
Allowed values: text strings
Example 1: File:120th Infantry band at practice - DPLA - 690447140723f385df06cb61e5fba986 (page 1).jpg → The 120th Infantry band practicing at Camp Atterbury, Indiana during World War II (No. 348 A).
Example 2: File:"Don't let this surprise you" → A postcard from the Ken Levin Toledo Postcard Collection, donated by Toledo resident, Ken Levin. The collection contains picture postcards about the Toledo area. Mr. Levin’s collection was published by the Toledo Blade in a book entitled “You Will Do Better in Toledo: From Frogtown to Glass City”, edited by Sandy and John R. Husman.
Example 3: File:"Cato" on constitutional "money" and legal tender. In 12 no. from the Charleston mercury - DPLA - 20b4f8f4b36bd2c33baf94189f183c71 (page 30).jpg → Issued in a case; "Cato" on constitutional "money" and legal tender. In 12 no. from the Charleston mercury, Charleston, Evans & Cogswell, 1862, Duke University Libraries
Expected completeness: always incomplete (Q21873886)

Motivation

DPLA has recently contributed almost 2.5 million files to Wikimedia Commons, and I am migrating all of the metadata to SDC. When it comes to museum or archival collections, we have ways of representing almost all of the original catalog metadata in structured data, except for "description", which is an element common to many metadata schemas (see, e.g., Q3024326 vs. Q1200750). There are some basic reasons we would want to do this: being able to use this data via Lua in templates across other wikis; being able to distinguish the Commons template "description" field and the SDC caption field, which often contain Wikimedian-generated descriptions, from the original catalog descriptions provided by the institution; being able to add structured references, qualifiers, and language codes to these statements; and being able to detect and synchronize changes to metadata.

I am being very careful here to describe and scope this proposed property as only the original element from the source metadata which is known as "description", and not just a property for descriptions generally. There was previously a failed proposal at Wikidata:Property proposal/description which suffered due to a lack of agreement about what was even being discussed, with the suggestion that it could be used for anything from abstracts to song lyrics. This proposal is only about GLAM metadata (or equivalent), and only to be used for importing data from the source catalog.

As parallels, I will point out that Dublin Core has only 15 basic elements, and one of them is "description". Similarly, an element named "description" is also present in both the Europeana Data Model and the DPLA Metadata Application Profile, the data models used by the two largest metadata aggregators in the world. My goal here is to demonstrate that while this is a property composed of strings of text, it is not "unstructured", as it is an element integral to many standard metadata schemas. In particular, as I am referencing the DPLA work on Commons, I have given three real examples above, each from a different institution, of institution-provided "description" fields which we would immediately add as SDC statements. There are currently over one million files which could have this property added as SDC statements in DPLA uploads alone. This property could just as reasonably be applied to Wikidata items describing cataloged museum objects as well.

Finally, as there is usually concern raised about copyright in these discussions, I note that while this property could clearly encompass values that would be copyrighted, such as if an institution does not make its metadata open, these should not be added. That is not a reason to not allow such a property, though, as there are already millions of potential values either from public domain US federal government sources or institutions (such as all 4,000+ DPLA contributing institutions in the United States) that make their original catalog descriptions available under CC0. Dominic (talk) 16:34, 1 November 2021 (UTC)
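
For readers unfamiliar with how a monolingual text value is stored, the following is a minimal sketch of what one such statement might look like in Wikibase-style JSON, using the text of Example 1. The helper name `build_description_claim` is hypothetical, the language code "en" is an assumption about the example text, and P10358 is the property ID assigned when this proposal was closed. It only builds the data structure and does not call any API.

```python
import json


def build_description_claim(text, language, property_id="P10358"):
    """Return a Wikibase-style statement dict holding one monolingual text value.

    Hypothetical helper for illustration only; it does not contact any server.
    """
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {
                # Monolingual text pairs the string with a language code.
                "value": {"text": text, "language": language},
                "type": "monolingualtext",
            },
        },
        "type": "statement",
        "rank": "normal",
    }


# Example 1 from the proposal; "en" is assumed, not stated in the source catalog.
claim = build_description_claim(
    "The 120th Infantry band practicing at Camp Atterbury, Indiana "
    "during World War II (No. 348 A).",
    "en",
)
print(json.dumps(claim, indent=2))
```

Qualifiers and references, which the motivation above mentions, would be attached to the same statement object; they are left out here to keep the sketch short.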

Discussion

If the proposal is only for Commons, I leave it to Commons to determine if they want to have it. It's clear that the same content could easily be added otherwise there and the information in the samples should be in structured form as well. The proposal does create a bulky "must carry" load on Commons SDC server. --- Jura 11:47, 23 November 2021 (UTC)

  Strong support A wonderful proposal! There needs to be a way to record different stages of a media description and be able to add structured data to it, for example designate the creator to record the provenance. Reading the previous proposal, I understand that it is needed in both Commons and Wikidata, as Wikidata stores collections items, and this is a central metadata element. – Susanna Ånäs (Susannaanas) (talk) 08:28, 7 December 2021 (UTC)

  Strong support I'm glad to see this proposal here, as I believe it is a very important one. In my activities as a volunteer, as well as in my present and past positions within the GLAM and Culture team (User:GFontenelle_(WMF)) and Wiki Movimento Brasil (User:GFontenelle_(WMB)), several GLAM staff, especially museum workers and librarians, from different languages and communities, have shared distinct versions of this with me. In their understanding, a description like this one would improve the content of Wikidata and raise it to a new level -- especially as it is sometimes difficult to understand what the WD item is really about when the WD label description is too short and very objective. On a similar note, this would also help take SDC to a new level, as it would open up new possibilities of usage with Structured Data on Wikimedia Commons and even on Wikipedia, if we consider the Structured Data Across Wikimedia project. Commons is, indeed, a different project, but the SD part of it is connected to Wikidata and, therefore, there are some needs of that project that we have to address here. However, as I said above, I believe it would also serve Wikidata itself quite a lot. --GiFontenelle (talk) 17:34, 13 December 2021 (UTC)

  • Maybe we should add in the description as well that this is for SDC. Is the SDC team ok with it? Not that we drown their server with unstructured data that is already in the template. --- Jura 11:59, 16 December 2021 (UTC)
  •   Support, an important property for art.--Arbnos (talk) 12:02, 20 December 2021 (UTC)
  •   Comment I marked this as "ready", but Jura removed that and added the above concern about SDC. I know next to nothing about SDC; would somebody please clarify what else is needed here if anything? Is there a particular communication protocol with Commons people that we should follow on such things? ArthurPSmith (talk) 20:27, 20 December 2021 (UTC)
Just contact the SDC team and ask them to confirm they are ok with the duplication of the text data on the SDC server. Apparently they are already more concerned about its stability than for Wikidata. @MPham_(WMF): maybe you want to comment. --- Jura 12:00, 22 December 2021 (UTC)
If it helps, this was our vertical analysis on Wikidata: in short, descriptions made up the largest proportion of triples by a good margin. MPham (WMF) (talk) 18:09, 22 December 2021 (UTC)
@MPham_(WMF): The property is meant for Commons. The initial plan above seems to be to add triples for 2.5 million files as SDC, i.e. on Wikimedia Commons Query Service (WCQS). If there is no capacity issue in sight, let's go ahead then. --- Jura 19:35, 22 December 2021 (UTC)
Does that mean you support this proposal now? Ooligan (talk) 00:37, 28 December 2021 (UTC)
Let MPham respond to the comment (or someone from the SDC development team). --- Jura 09:50, 28 December 2021 (UTC)
Thanks for your patience with my response. As of Feb 1, 2022, WCQS is in beta 2, which includes better infrastructure to support the growth of WCQS's triple store, which is currently around 3.66 B triples. This is considerably below the current size of WDQS (~13.6B triples), where we expect capacity problems. I don't foresee 2.5M triples being added to WCQS to cause any capacity problems, but please keep in mind that graph and querying functionality isn't completely dependent on only the number of triples. MPham (WMF) (talk) 17:28, 13 February 2022 (UTC)
  •   Oppose given the open question about the viability of the duplication from file description pages on the SDC server of unstructured text. Querying is already possible in its current location. --- Jura 12:00, 22 December 2021 (UTC)
  •   Strong support Let's get it done. Thank you Dominic for the timely suggestion. The DPLA is now focusing on getting smaller non-profit historical organizations and local museums to become new members in order to add their digitized collections as well. I'm looking forward to your next suggestion. --Ooligan (talk) 00:52, 28 December 2021 (UTC)
@Dominic, Jura1, Multichill, Susannaanas, GiFontenelle: @Arbnos, ArthurPSmith, MPham (WMF), Ooligan:   Done original catalog description (P10358) Pamputt (talk) 17:52, 12 February 2022 (UTC)